
New Computer Boosts Memory Retrieval Recall by 50% with LangSmith: AI Reading Summary (包阅AI)

包阅AI Reading Guide Summary

1. Keywords: New Computer, Dot, Memory Retrieval, LangSmith, Precision

2. Summary: New Computer used LangSmith to improve Dot's memory retrieval system, achieving higher recall and precision, and also refined its conversation prompt, while continuing to explore ways to deepen its relationship with users.

3. Main points:

– The New Computer team built Dot, which features an innovative agentic memory system.

– Its memory system learns preferences by observing the user's verbal and behavioral cues, continually evolving its picture of the user to provide personalized assistance.

– Improving memory retrieval with LangSmith

– Tested multiple retrieval methods and iterated quickly on synthetic data and labeled datasets, significantly improving recall and precision.

– Adjusting the conversation prompt

– Used synthetic users to generate queries for prompt optimization, comparing and adjusting prompts visually in LangSmith, which sped up iteration.

– Outlook

– Continues to seek ways to make users feel understood; the partnership with the LangChain team remains pivotal.


Article URL: https://blog.langchain.dev/customers-new-computer/

Source: blog.langchain.dev

Author: LangChain

Published: 2024/7/17 4:57

Language: English

Word count: 830

Estimated reading time: 4 minutes

Score: 86

Tags: AI memory retrieval, LangSmith, New Computer, Dot AI, synthetic data


The original article content follows below


About New Computer

New Computer is the team behind Dot, the first personal AI designed to truly understand its users. Dot’s long-term memory system learns users’ preferences over time by observing verbal and behavioral cues. Dot’s memory system goes beyond simple recall: it constantly evolves its picture of who the user is in order to provide timely and personalized assistance, creating a perception of true understanding.

With LangSmith, New Computer has been able to test and improve their memory retrieval systems, leading to 50% higher recall and 40% higher precision compared to a previous baseline implementation of dynamic memory retrieval.

A brief overview of Dot’s agentic memory

The New Computer team has built an innovative, first-of-its-kind agentic memory system. Unlike standard RAG methods that rely on a static set of documents, agentic memory involves dynamically creating or pre-calculating documents that will only be retrieved later. This means that information must be structured at memory-creation time so that retrieval remains possible, and stays accurate and efficient, as memories accumulate over time.

In addition to the raw content, Dot’s memories have a set of optional “meta-fields” that are useful for retrieval. These include status (e.g. COMPLETED or IN PROGRESS) and datetime fields like start or due date. These can be used as additional filter methods for high-frequency queries during retrieval, such as “Which tasks did I want to get done this week?”, or “What do I have left to complete for today?”
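
To make the meta-field idea concrete, here is a minimal sketch of what such a memory record and a meta-field pre-filter might look like; the schema, field names, and filter logic are illustrative assumptions rather than New Computer's actual data model:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional


class Status(Enum):
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"


@dataclass
class Memory:
    """A memory: raw content plus optional meta-fields that double as retrieval filters."""
    content: str
    status: Optional[Status] = None      # e.g. task state
    due_date: Optional[datetime] = None  # e.g. deadline captured at memory-creation time


def tasks_due_this_week(memories: List[Memory], now: datetime) -> List[Memory]:
    """Meta-field pre-filter for a query like 'Which tasks did I want to get done this week?'."""
    week_start = now - timedelta(days=now.weekday())
    week_end = week_start + timedelta(days=7)
    return [
        m for m in memories
        if m.due_date is not None and week_start <= m.due_date < week_end
    ]
```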

Improving memory retrieval with LangSmith

With their diverse range of retrieval methods (semantic, keyword, BM25, and meta-field filtering techniques, used individually or in combination), New Computer needed a way to iterate quickly on a dataset of labeled examples. To test performance while preserving user privacy, they generated synthetic data by creating a cohort of synthetic users with LLM-generated backstories. After an initial conversation to seed the memory database for each synthetic user, the team began storing queries (messages by synthetic users) along with the full set of available memories in a LangSmith dataset.
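
To illustrate how such a dataset might be assembled with the LangSmith Python SDK, here is a hedged sketch; the dataset name, example fields, and memory IDs are hypothetical placeholders, not New Computer's internal tooling:

```python
from langsmith import Client

client = Client()  # reads the LangSmith API key from the environment

# Hypothetical dataset of synthetic-user queries plus the memories available to each user.
dataset = client.create_dataset(
    "dot-memory-retrieval-synthetic",
    description="Synthetic-user queries with the full set of available memories",
)

examples = [
    {
        "inputs": {
            "query": "What do I have left to complete for today?",
            "memories": ["mem_001", "mem_002", "mem_003"],  # IDs of all stored memories
        },
        # Ground-truth labels, added via the team's labeling workflow.
        "outputs": {"relevant_memories": ["mem_002"]},
    },
]

client.create_examples(
    inputs=[e["inputs"] for e in examples],
    outputs=[e["outputs"] for e in examples],
    dataset_id=dataset.id,
)
```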

Using an in-house tool connected to LangSmith, the New Computer team labeled relevant memories for each query and defined evaluation metrics like precision, recall and F1, allowing them to quickly iterate on improving retrieval for the agentic memory system.
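
The metrics themselves reduce to set comparisons between the retrieved memory IDs and the labeled relevant set; a minimal sketch of how precision, recall, and F1 could be computed inside such an evaluator:

```python
from typing import Dict, List


def retrieval_metrics(retrieved: List[str], relevant: List[str]) -> Dict[str, float]:
    """Compare retrieved memory IDs against the labeled relevant set for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```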

For this set of experiments, they started out with a simple baseline system using semantic search that retrieves a fixed number of the most relevant memories per query. They then tested other techniques to assess performance across different query types. In some cases, similarity search or keyword methods like BM25 worked better; in others, these methods required some pre-filtering by meta-fields in order to perform effectively.
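
The baseline described here amounts to a fixed top-k nearest-neighbor search over memory embeddings; a minimal illustrative sketch (the embedding model, similarity measure, and value of k are assumptions):

```python
import numpy as np


def semantic_top_k(query_embedding: np.ndarray, memory_embeddings: np.ndarray, k: int = 10) -> np.ndarray:
    """Baseline retrieval: indices of the k memories most similar to the query by cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    m = memory_embeddings / np.linalg.norm(memory_embeddings, axis=1, keepdims=True)
    scores = m @ q  # cosine similarity of each memory against the query
    return np.argsort(scores)[::-1][:k]
```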

As you might imagine, running these multiple methods in parallel can lead to a combinatorial explosion of experiments, so validating different methods quickly on a diverse dataset is crucial for making progress. LangSmith’s easy-to-use SDK and Experiments UI enabled New Computer to run, evaluate, and inspect the results of these experiments quickly and efficiently.
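
In SDK terms, each retrieval configuration can be run as its own experiment against the same labeled dataset. The sketch below reuses the hypothetical dataset and metrics helper from earlier; `retrieve` is a placeholder for the retrieval system under test, and the exact `evaluate` entry point may differ by SDK version:

```python
from langsmith import evaluate  # older SDK versions: from langsmith.evaluation import evaluate


def make_target(method: str):
    """Wrap one retrieval configuration as an evaluation target."""
    def target(inputs: dict) -> dict:
        # `retrieve` is a hypothetical stand-in for the retrieval system under test.
        retrieved = retrieve(inputs["query"], inputs["memories"], method=method)
        return {"retrieved_memories": retrieved}
    return target


def f1_evaluator(run, example) -> dict:
    """Score one run against its labeled example using the metrics helper defined above."""
    metrics = retrieval_metrics(
        run.outputs["retrieved_memories"],
        example.outputs["relevant_memories"],
    )
    return {"key": "f1", "score": metrics["f1"]}


# One experiment per retrieval configuration, all evaluated over the same dataset.
for method in ["semantic", "bm25", "semantic+meta_filter", "bm25+meta_filter"]:
    evaluate(
        make_target(method),
        data="dot-memory-retrieval-synthetic",  # hypothetical dataset name from above
        evaluators=[f1_evaluator],
        experiment_prefix=f"retrieval-{method}",
    )
```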

(Figure: An overview of F1 performance across different experiments that New Computer ran in LangSmith)

These experiments enabled New Computer to significantly improve their memory systems, leading to 50% higher recall and 40% higher precision compared to a previous baseline implementation of dynamic memory retrieval.

Adjusting the conversation prompt with LangSmith

Dot’s responses are generated by a dynamic conversational prompt: in addition to including relevant memories, the system may also rely on tool usage (e.g. search results) and highly contextual behavioral instructions in order to respond in an accurate and natural way.
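
As an illustration only (not Dot's actual prompt), a dynamic prompt of this kind might be assembled from whichever components are available for the current turn:

```python
from typing import List


def build_conversation_prompt(
    user_message: str,
    relevant_memories: List[str],
    tool_results: List[str],
    behavioral_instructions: str,
) -> str:
    """Assemble a dynamic prompt from memories, tool output, and contextual instructions."""
    sections = [behavioral_instructions]
    if relevant_memories:
        sections.append("Relevant memories:\n" + "\n".join(f"- {m}" for m in relevant_memories))
    if tool_results:
        sections.append("Tool results:\n" + "\n".join(f"- {r}" for r in tool_results))
    sections.append(f"User: {user_message}")
    return "\n\n".join(sections)
```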

Developing a highly variable system like this can be challenging, as a change that improves one query can have detrimental effects on others.

To optimize the prompt, the New Computer team again used a cohort of synthetic users to generate user queries covering a wide range of intents. They were then able to easily inspect the global effects of prompt changes in LangSmith’s experiment comparison view, which let them visually identify runs that regressed as a result of a prompt change.
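
One way to set this up, sketched under the same assumptions as before, is to run each prompt variant as a separate experiment over the same synthetic-user dataset so regressions surface side by side in the comparison view; `respond`, `quality_evaluator`, and the dataset name are hypothetical placeholders:

```python
from langsmith import evaluate

# Hypothetical prompt variants to compare over the same synthetic-user dataset.
prompt_variants = {
    "baseline": "Answer concisely.",
    "candidate": "Answer concisely and reference relevant memories explicitly.",
}

for name, instructions in prompt_variants.items():
    evaluate(
        # `respond` is a placeholder for the conversational system under test.
        lambda inputs, _instr=instructions: {"response": respond(inputs["query"], instructions=_instr)},
        data="dot-synthetic-user-queries",  # hypothetical dataset of synthetic-user queries
        evaluators=[quality_evaluator],     # hypothetical response-quality check
        experiment_prefix=f"conversation-prompt-{name}",
    )
```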

In addition, in failure cases where the output was inaccurate, the team could directly adjust prompts without leaving the LangSmith UI using the built-in prompt playground. This greatly improved the team’s iteration speed while evaluating and adjusting their conversation prompts.

What’s next for New Computer

As New Computer pushes to deepen human-AI relationships, the team is constantly seeking ways to make users feel truly perceived and understood. This includes enabling Dot to adapt to conversational or tonal preferences of the user, or becoming fully bespoke on a per-user basis by proactively reaching out to users with tailored messages.

Their recent launch has brought in a new wave of users (more than 45% of whom converted to the app’s paid tier after hitting the free message limit) who expect Dot to grow and evolve alongside them over time. New Computer’s partnership with the LangChain team and use of LangSmith will remain pivotal to how the team uses novel AI materials to simulate the complexities of a deepening relationship with human users.
