
Unlock PDF Search in Insurance with MongoDB and SuperDuperDB: AI Reading Summary (包阅AI)

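Reading guide summary

1. Keywords: insurance, PDF search, MongoDB, SuperDuperDB, RAG applications

2. Summary: The insurance industry is document-driven, and handling documents is time-consuming. Combining MongoDB and SuperDuperDB makes it possible to build a RAG-powered PDF search system that improves efficiency and accuracy. The article covers the system architecture and flow, the advantages of SuperDuperDB, and an example that shows the application in use.

3. Main points:

– Current state of document handling in insurance

  – The industry relies on documents, and professionals spend considerable time handling them.

– Why RAG matters for insurance

  – It makes unstructured data usable and speeds up PDF search.

– Building the PDF search system

  – The user adds the PDFs to be searched.

  – A script scans, chunks, and vectorizes them.

  – Vectors and metadata are stored in MongoDB and an index is created.

  – The user asks a question, and the system returns an answer and its source.

– Introduction to SuperDuperDB

  – An open-source Python framework that integrates AI models and workflows.

  – Advantages include flexibility, scalability, and data protection.

  – It provides a range of sample use cases and notebooks.

– Practical example

  – An insurance underwriter's case illustrates the application in use.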


Article URL: https://www.mongodb.com/blog/post/unlock-pdf-search-in-insurance-mongodb-superduperdb

Source: mongodb.com

Author: Luca Napoli

Published: 2024/7/19 12:50

Language: English

Word count: 1002

Estimated reading time: 5 minutes

Score: 86

Tags: PDF search, insurance, MongoDB, SuperDuperDB, document processing


Original article:


As industries go, the insurance industry is particularly document-driven. Insurance professionals, including claim adjusters and underwriters, spend considerable time handling documentation with a significant portion of their workday consumed by paperwork and administrative tasks. This makes solutions that speed up the process of reviewing documents all the more important.

Retrieval-augmented generation (RAG) applications are a game-changer for insurance companies, enabling them to harness the power of unstructured data while promoting accessibility and flexibility. This is especially true for PDFs, which despite their prevalence are difficult to search, leading claim adjusters and underwriters to spend hours reviewing contracts, claims, and guidelines in this common format.

By combining MongoDB and SuperDuperDB you can build a RAG-powered system for PDF search, thus bringing efficiency and accuracy to this cumbersome task. With a PDF search application, users can simply type a question in natural language and the app will sift through company data, provide an answer, summarize the content of the documents, and indicate the source of the information, including the page and paragraph where it was found.


In this blog, we will dive into the architecture of how this PDF search application can be created and what it looks like in practice.

Why should insurance companies care about PDF Search?

Insurance firms rely heavily on data processing. To make investment decisions or handle claims, they leverage vast amounts of data, mostly unstructured. As previously mentioned, underwriters and claim adjusters need to comb through numerous pages of guidelines, contracts, and reports, typically in PDF format. Manually finding and reviewing every piece of information is time-consuming and can easily lead to expensive mistakes, such as incorrect risk estimations. Quickly finding and accessing relevant content is key. Combining Atlas Vector Search and LLMs to build RAG apps can directly impact the bottom line of an insurance company.

Behind the scenes: System architecture and flow

As mentioned, MongoDB and SuperDuperDB underpin our information retrieval system. Let’s break down the process of building it:

  1. The user adds the PDFs that need to be searched.

  2. A script scans them, creates the chunks, and vectorizes them (see Figure 1). The chunking step is carried out using a sliding window methodology, which ensures that potentially important transitional data between chunks is not lost, helping to preserve continuity of context.

  3. Vectors and chunk metadata are stored in MongoDB, and an Atlas Vector Search index is created (see Figure 1).

  4. The PDFs are now ready to be queried. The user selects a customer and asks a question; the system returns an answer, indicates where it was found, and highlights the relevant section with a red frame (see Figure 3).
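The sliding-window chunking in step 2 can be sketched as follows. This is a minimal illustration in plain Python, not the demo's actual code: the `Chunk` class, the function name, and the `window` and `overlap` values are all assumptions, and the demo may chunk by characters or tokens rather than words.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int   # page number the chunk came from (hypothetical metadata field)
    start: int  # index of the chunk's first word within the page
    text: str


def sliding_window_chunks(page_text, page, window=200, overlap=50):
    """Split one page of text into overlapping word windows.

    Consecutive chunks share `overlap` words, so a sentence that
    straddles a chunk boundary still appears whole in one of them.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    words = page_text.split()
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(page, start, " ".join(words[start:start + window])))
        if start + window >= len(words):
            break  # this window already reaches the end of the page
    return chunks
```

Each chunk would then be embedded and stored alongside its metadata (here `page` and `start`), which is what later lets the app point back to the place in the PDF where an answer was found.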

Figure 1: PDF chunking, embedding creation, and storage orchestrated with SuperDuperDB

Each customer has a guidelines PDF associated with their account based on their residency. When the user selects a customer and asks a question, the system runs a Vector Search query on that particular document, seamlessly filtering out the non-relevant ones. This is made possible by the pre-filtering field included in the search query.
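As a rough sketch of what such a pre-filtered query could look like, the snippet below builds an Atlas Vector Search index definition and a `$vectorSearch` aggregation pipeline as plain Python dictionaries. The field names (`embedding`, `guideline_id`, `text`, `page`, `paragraph`), the index name, and the candidate multiplier are hypothetical; the post does not show the demo's actual schema.

```python
def vector_index_definition(num_dimensions, vector_path="embedding",
                            filter_path="guideline_id"):
    """Atlas Vector Search index definition with one pre-filter field.

    `num_dimensions` must match the output size of the embedding model.
    """
    return {
        "fields": [
            {
                "type": "vector",
                "path": vector_path,
                "numDimensions": num_dimensions,
                "similarity": "cosine",
            },
            # Fields used in a $vectorSearch filter must be indexed as "filter".
            {"type": "filter", "path": filter_path},
        ]
    }


def prefiltered_search_pipeline(query_vector, guideline_id,
                                index_name="pdf_chunks_index", limit=5):
    """Aggregation pipeline that searches only one customer's guidelines PDF."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "embedding",
                "queryVector": list(query_vector),
                "numCandidates": limit * 20,  # assumed oversampling factor
                "limit": limit,
                # Pre-filter: chunks from other documents are never considered.
                "filter": {"guideline_id": {"$eq": guideline_id}},
            }
        },
        {
            "$project": {
                "_id": 0,
                "text": 1,
                "page": 1,
                "paragraph": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, the returned list could be passed to `collection.aggregate(pipeline)`. Because the filter is applied inside the `$vectorSearch` stage, the similarity search itself is restricted to the selected customer's document rather than being filtered afterwards.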

Atlas Vector Search also takes advantage of MongoDB’s new dedicated Search Nodes architecture, which lets you match resources to the needs of each workload. Search Nodes provide dedicated infrastructure for Atlas Search and Vector Search workloads, so you can optimize compute resources and scale search independently of the database. At scale, this delivers workload isolation, higher availability, and more efficient resource usage.

Figure 2: PDF querying flow, orchestrated with SuperDuperDB

SuperDuperDB

SuperDuperDB is an open-source Python framework for integrating AI models and workflows directly with and across major databases, enabling more flexible and scalable custom enterprise AI solutions. It lets developers build, deploy, and manage AI on their existing data and data infrastructure while using their preferred tools, eliminating data migration and duplication.

With SuperDuperDB, developers can:

  • Bring AI to their databases, eliminate data pipelines and data movement, and minimize engineering effort, time to production, and compute resources.

  • Implement AI workflows with any open- or closed-source AI models and APIs, on any type of data, with any AI and Python framework, package, class, or function.

  • Safeguard their data by switching from APIs to hosting and fine-tuning their own models on their existing infrastructure, whether on-premises or in the cloud.

  • Easily switch between embedding models and LLMs, move to other API providers, or host their own models on HuggingFace or elsewhere, just by changing a small piece of configuration.

Build next-generation AI apps on your existing database

SuperDuperDB provides an array of sample use cases and notebooks that developers can use to get started, including vector search with MongoDB, embedding generation, multimodal search, retrieval-augmented generation (RAG), transfer learning, and many more. The demo showcased in this post is adapted from an app previously developed by SuperDuperDB.

Let’s put it into practice

To show you how this could work in practice, let’s look at an underwriter handling a specific case. The underwriter wants to identify the risk control measures, as shown in Figure 3 below, but has to look through documentation to find them. Analyzing the guidelines PDF associated with a specific customer helps determine the loss in the event of an accident or the new premium in the case of a policy renewal. The app assists by answering questions and displaying the relevant sections of the document.

Figure 3: Screenshot of the UI of the application, showing the question asked, the LLM’s answer, and the reference document where the information is found
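To illustrate how retrieved chunks might be turned into an answer with traceable sources, here is a minimal sketch of the prompt-assembly step. The `hits` structure (dictionaries with `text`, `page`, and `paragraph` keys) and the prompt wording are assumptions made for illustration; the post does not show the prompt the demo actually uses.

```python
def build_prompt(question, hits):
    """Combine retrieved chunks into a prompt that asks the LLM to cite them."""
    excerpts = "\n".join(
        f"[{i}] (page {hit['page']}, paragraph {hit['paragraph']}) {hit['text']}"
        for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below, "
        "and cite the excerpt numbers you relied on.\n\n"
        f"Excerpts:\n{excerpts}\n\n"
        f"Question: {question}\nAnswer:"
    )


def format_sources(hits):
    """Human-readable source list to display next to the answer."""
    return [f"page {hit['page']}, paragraph {hit['paragraph']}" for hit in hits]
```

Keeping the page and paragraph with each chunk from ingestion through retrieval is what allows the interface to show where an answer came from and to highlight that section of the PDF.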

By integrating MongoDB and SuperDuperDB, you can create a RAG-powered system for efficient and accurate PDF search. This application allows users to type questions in natural language, enabling the app to search through company data, provide answers, summarize document content, and pinpoint the exact source of the information, including the specific page and paragraph.

If you would like to learn more about Vector Search powered apps and SuperDuperDB, visit the following resources: