Meta Releases Its Largest Open-Source Model to Date, Llama 3.1 405B (AI Reading Summary)

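包阅 Reading Guide Summary

1. `Meta`, `Llama 3.1 405B`, `open-source model`, `language model`, `parameter scale`

2. Meta has released its latest language model, Llama 3.1 405B, currently the largest open-source model. Several cloud vendors support running it, and its benchmark results approach or surpass those of some competitors. It is not multimodal. More details are available on the Hugging Face Hub and in the technical paper.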

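– Meta releases the Llama 3.1 405B language model

– The release also includes 8B and 70B versions

– 405 billion parameters, about 15 trillion training tokens, and 16,000 GPUs

– Features and strengths

– Optimizes for data, scale, and managing complexity

– More careful data-processing pipelines

– Supports multiple languages and several tools

– Open source, with downloadable weights

– Performance

– Leads the rankings in some areas

– Close to or ahead of GPT-4o and similar models

– Running the model

– Several cloud vendors have announced support

– Requires powerful hardware; an ordinary desktop PC cannot run it

– Not multimodal for now; multimodal Llamas are planned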

Article URL: https://www.infoq.com/news/2024/07/meta-releases-llama31-405b/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global

Source: infoq.com

Author: Andrew Hoblitzell

Published: 2024/7/31 0:00

Language: English

Word count: 612

Estimated reading time: 3 minutes

Score: 90

Tags: AI models, open source, Meta, language models, cloud providers

The original article follows.

Meta recently unveiled its latest language model, Llama 3.1 405B. It is the largest of the new Llama 3.1 models, which also include 8B and 70B versions. The model has 405 billion parameters and was trained on about 15 trillion tokens using 16,000 GPUs.

“We believe there are three key levers in the development of high-quality foundation models: data, scale, and managing complexity. We seek to optimize for these three levers in our development process. These improvements include the development of more careful pre-processing and curation pipelines for pre-training data and the development of more rigorous quality assurance and filtering approaches for post-training data.” – Meta AI

After the announcement, several cloud vendors announced support for running Llama 3.1 405B. The model launched with partners including Databricks, Dell, Nvidia, IBM, Snowflake, Scale AI, and more. “Amazon Bedrock offers a turnkey way to build generative AI applications with Llama,” Amazon wrote. “Microsoft is announcing Llama 3.1 405B available today through Azure AI’s Models-as-a-Service as a serverless API endpoint,” Microsoft announced. “We’re excited to be one of Meta’s launch partners to make their newest Llama 3.1 8B model available,” Cloudflare said. Groq said that early API access to Llama 3.1 405B is currently available to select customers only.

The open-source models have a context window of 128k tokens, meaning users can enter hundreds of pages of content in their prompts. They are multilingual, with support for eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The models also come with tools for web search, math reasoning, and code execution.
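
As a rough illustration of what a 128k-token window holds, the sketch below converts tokens to pages. The ratios of 0.75 English words per token and 500 words per page are common rules of thumb assumed here, not figures from Meta; actual ratios depend on the tokenizer, the language, and the page layout.

```python
# Back-of-the-envelope estimate of how much text fits in a 128k-token context.
# WORDS_PER_TOKEN and WORDS_PER_PAGE are assumed rules of thumb, not Meta's figures.

CONTEXT_TOKENS = 128_000   # context window stated in the article
WORDS_PER_TOKEN = 0.75     # assumption: typical for English text
WORDS_PER_PAGE = 500       # assumption: a dense single-spaced page


def estimate_pages(tokens: int,
                   words_per_token: float = WORDS_PER_TOKEN,
                   words_per_page: int = WORDS_PER_PAGE) -> float:
    """Return the approximate number of pages that `tokens` tokens represent."""
    return tokens * words_per_token / words_per_page


if __name__ == "__main__":
    pages = estimate_pages(CONTEXT_TOKENS)
    print(f"{CONTEXT_TOKENS:,} tokens is roughly {pages:.0f} pages")
```

Under these assumptions the window holds a little under 200 pages; with a sparser 250-word page the figure doubles, which is consistent with the "hundreds of pages" description.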

“Compared to prior versions of Llama (Touvron et al., 2023a,b), we improved both the quantity and quality of the data we use for pre-training and post-training. These improvements include the development of more careful pre-processing and curation pipelines for pre-training data and the development of more rigorous quality assurance and filtering approaches for post-training data. We pre-train Llama 3 on a corpus of about 15T multilingual tokens, compared to 1.8T tokens for Llama 2,” Meta wrote.

One of the most significant aspects of the Llama 3.1 models is that they are open source. Users can download the weights and use them in their applications. The 405B model's benchmark scores are close to, and sometimes surpass, those of GPT-4o and Claude 3.5 Sonnet. Results can be seen in the model card.

According to Scale AI’s SEAL leaderboard, Llama 3.1 405B ranks second in math and reasoning, fourth in coding, and first in instruction following. The exact performance will depend on the use case, but it is expected to be on par with the top closed LLMs.

“Today, several tech companies are developing leading closed models. But open source is quickly closing the gap. Last year, Llama 2 was only comparable to an older generation of models behind the frontier. This year, Llama 3 is competitive with the most advanced models and leading in some areas.” – Mark Zuckerberg

The release of Llama 3.1 405B is potentially the first time anyone can download a GPT-4-class large language model for free and run it on their own hardware. However, users will still need powerful hardware: Meta says the model can run on a “single server node,” which is beyond the capabilities of a desktop PC. The release of Llama 3.1 405B is not just a technical achievement but also a strategic move in the AI industry.
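
A weights-only memory estimate helps show why a desktop PC falls short. The sketch below multiplies the 405 billion parameter count by an assumed number of bytes per parameter. The precisions listed and the 640 GB node (eight accelerators with 80 GB each) are illustrative assumptions, not a configuration given in the article, and the estimate ignores the KV cache and activations.

```python
# Weights-only memory estimate for a 405B-parameter model.
# Precisions and the node size below are illustrative assumptions.

PARAMS = 405_000_000_000  # parameter count stated in the article

BYTES_PER_PARAM = {       # assumption: common numeric formats
    "bf16": 2.0,
    "fp8": 1.0,
    "int4": 0.5,
}

NODE_MEMORY_GB = 8 * 80   # hypothetical node: 8 accelerators x 80 GB each


def weights_gb(params: int, bytes_per_param: float) -> float:
    """Return the size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits_on_node(params: int, bytes_per_param: float,
                 node_gb: float = NODE_MEMORY_GB) -> bool:
    """True if the weights alone fit in the node's accelerator memory."""
    return weights_gb(params, bytes_per_param) <= node_gb


if __name__ == "__main__":
    for name, nbytes in BYTES_PER_PARAM.items():
        size = weights_gb(PARAMS, nbytes)
        verdict = "fits" if fits_on_node(PARAMS, nbytes) else "does not fit"
        print(f"{name}: {size:.1f} GB of weights, {verdict} in {NODE_MEMORY_GB} GB")
```

Under these assumptions, 16-bit weights (810 GB) exceed the hypothetical 640 GB node while 8-bit weights (405 GB) fit. Either figure is far beyond the memory of a typical desktop GPU.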

It is worth noting that these models are not multimodal and do not understand or create images. Meta has promised that multimodal Llamas are on the way. Developers interested in learning more about the model can find it on the Hugging Face Hub or read the technical paper.
