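包阅 Reading Guide Summary

1. Keywords: Mistral Large 2, open source, inference, fine-tuning, model

2. Summary: Mistral has announced its flagship model, Mistral Large 2, with stronger code generation, math, and reasoning, plus support for many natural and programming languages. The article provides the model link, download instructions, and inference code, and shows how to run self-cognition fine-tuning and inference with ms-swift, covering installation, scripts, and GPU memory usage.

3. Main points:

- Mistral Large 2 release:
  - Improved performance, with support for many natural and programming languages
  - Strong results on evaluation metrics

- Getting the model:
  - Model link
  - Download method and code

- Inference:
  - Inference code example
  - Inference results on math and other tasks

- Fine-tuning:
  - Using the ms-swift tool
  - Environment setup before fine-tuning
  - Fine-tuning script and its parameters
  - GPU memory usage and loss visualization during fine-tuning
  - Post-fine-tuning inference script and acceleration
  - Model deployment and GPU memory utilization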
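Article URL: https://mp.weixin.qq.com/s/pXqpCVd2pmFwqu6cCqrz3Q

Source: mp.weixin.qq.com

Author: 黄锦涛

Published: 2024/7/25 13:39

Original language: Chinese

Length: 1398 characters

Estimated reading time: 6 minutes

Score: 91

Tags: large language models, Mistral Large 2, open source, multilingual support, code generation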

The original article follows.

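Mistral has announced its new-generation flagship model, Mistral Large 2. Compared with its predecessor, Mistral Large 2 is significantly stronger at code generation, math, and reasoning. It also offers much better multilingual support and advanced function calling.

Mistral Large 2 has a 128k context window. It supports dozens of natural languages, including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, and more than 80 programming languages, including Python, Java, C, C++, JavaScript, and Bash.

On evaluation metrics, Mistral Large 2 sets a new standard for performance relative to serving cost. On MMLU in particular, the pretrained version reaches 84.0% accuracy, a new reference point for performance versus cost among open models.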

Model link:

https://modelscope.cn/models/LLM-Research/Mistral-Large-Instruct-2407

Model download:

from modelscope import snapshot_download

# Optionally download only the safetensors model files
model_dir = snapshot_download('LLM-Research/Mistral-Large-Instruct-2407', ignore_file_pattern=['^consolidated'])
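To make the effect of ignore_file_pattern=['^consolidated'] concrete, here is a minimal sketch of the filtering it implies. It assumes each pattern is applied as a regular-expression search against the file name; that matches how the argument reads, but it is not taken from the modelscope source. The helper files_to_download and the file names are hypothetical, for illustration only.

```python
import re


def files_to_download(file_names, ignore_file_pattern):
    """Return the names that match none of the ignore patterns.

    Hypothetical helper: it mimics the assumed regex filtering and is not
    part of the modelscope API.
    """
    return [
        name
        for name in file_names
        if not any(re.search(pattern, name) for pattern in ignore_file_pattern)
    ]


# Hypothetical repository listing, for illustration only.
example_files = [
    "config.json",
    "consolidated-00001-of-00051.safetensors",
    "model-00001-of-00051.safetensors",
    "tokenizer.json",
]

kept = files_to_download(example_files, ["^consolidated"])
print(kept)
```

The ^ anchor means only names that start with "consolidated" are skipped, so the sharded model-*.safetensors files and the config and tokenizer files are still fetched.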

Model license: Mistral Research License, which permits academic and non-commercial use only.

Upgrade transformers:

pip install git+https://github.com/huggingface/transformers.git

Inference code:

from transformers import pipeline
from modelscope import snapshot_download

model_dir = snapshot_download('LLM-Research/Mistral-Large-Instruct-2407', ignore_file_pattern=['^consolidated'])

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
chatbot = pipeline("text-generation", model=model_dir)
chatbot(messages)
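The call chatbot(messages) returns a nested structure rather than a plain string. The sketch below shows one way to pull out the reply, assuming the chat-style return shape of the transformers text-generation pipeline: a list with one dict whose "generated_text" holds the full conversation including the new assistant turn. The helper last_assistant_reply is hypothetical, and mock_output is a hand-written stand-in with an invented reply, not real model output.

```python
def last_assistant_reply(pipeline_output):
    """Return the text of the most recent assistant message.

    Assumes the chat-style shape: [{"generated_text": [message, ...]}],
    where each message is a dict with "role" and "content".
    """
    conversation = pipeline_output[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Hand-written stand-in for a real pipeline result (the reply is invented).
mock_output = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "Arr, I be a pirate chatbot!"},
        ]
    }
]

print(last_assistant_reply(mock_output))
```

With a real model loaded, the same helper would be called as last_assistant_reply(chatbot(messages)).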

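Math: the recently popular "which number is larger" question

Chinese prompt: wrong answer.

English prompt: correct answer.

Code: write a 24-point game

Chinese prompt: wrong answer.

English prompt: correct answer.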
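For readers who want a reference to check a model's 24-point answer against, here is a minimal brute-force solver. It is an independent sketch, not the model's output and not code from the original article. It assumes the usual rules: use each of the four numbers exactly once with +, -, *, / and parentheses. Exact fractions avoid floating-point errors in cases that need division.

```python
from fractions import Fraction
from itertools import combinations


def solve_24(numbers, target=24):
    """Return one expression string that reaches target, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: one value left, check whether it hits the target.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None

    # Pick any two values, combine them, and recurse on the smaller list.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea}+{eb})"),
            (a * b, f"({ea}*{eb})"),
            (a - b, f"({ea}-{eb})"),
            (b - a, f"({eb}-{ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb}/{ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


print(solve_24([3, 3, 8, 8]))
print(solve_24([1, 1, 1, 1]))
```

The second call has no solution, so it prints None. The first needs a fractional intermediate value, which is why the sketch uses Fraction instead of float.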

Common-sense Q&A (city names):

Chinese prompt:

English prompt:

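Next, we use ms-swift to run self-cognition fine-tuning on mistral-large-instruct-2407 and to run inference with the model before and after fine-tuning. swift is the official LLM toolbox of the ModelScope (魔搭) community. It supports fine-tuning, inference, quantization, evaluation, and deployment for 300+ large language models and 50+ multimodal large models.

swift repository:

https://github.com/modelscope/swift

Self-cognition dataset:

https://modelscope.cn/datasets/swift/self-cognition

Only a directly runnable demo is shown here. To fine-tune on a different dataset, change --dataset. A custom dataset can be given as a local path or as a dataset_id from ModelScope or Hugging Face.

Documentation: https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86

Before fine-tuning, make sure your environment is set up correctly: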

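# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

# Quote version specifiers so the shell does not treat ">" as a redirect
pip install "transformers>=4.43"

# Only needed for inference acceleration
pip install "vllm>=0.5.3.post1"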

Fine-tuning script (if you run out of GPU memory, add more GPUs):

# Environment: 4 * A100
# Training time: 40 hours
# 4 * 80GB GPU memory
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type mistral-large-instruct-2407 \
    --dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
    --logging_steps 5 \
    --max_length 2048 \
    --learning_rate 1e-4 \
    --output_dir output \
    --lora_target_modules ALL \
    --model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope \
    --deepspeed default-zero3
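In the --dataset argument, the #500 suffix asks swift to sample 500 rows from each dataset. The sketch below splits such specs and totals the requested rows. The function parse_dataset_spec is a hypothetical illustration of the name#count convention, not swift's own parser, and it assumes the dataset name itself contains no "#".

```python
def parse_dataset_spec(spec):
    """Split 'name#count' into (name, count); count is None when omitted.

    Hypothetical helper for illustration; swift parses this internally.
    """
    name, separator, count = spec.partition("#")
    return name, (int(count) if separator else None)


specs = ["alpaca-zh#500", "alpaca-en#500", "self-cognition#500"]
parsed = [parse_dataset_spec(spec) for spec in specs]
total_rows = sum(count for _, count in parsed)

print(parsed)
print(total_rows)
```

For the script above this gives 1500 sampled rows in total, split evenly across the Chinese Alpaca, English Alpaca, and self-cognition datasets.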

[Figure: GPU memory usage during fine-tuning]
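As a rough cross-check on why four 80GB GPUs are used, the arithmetic below estimates the memory taken by the base weights alone when they are sharded evenly across GPUs, as ZeRO-3 does. Two inputs are assumptions not stated in the article: the 123B parameter count (from Mistral's release announcement) and 2 bytes per parameter (bf16 or fp16 weights). The estimate ignores activations, LoRA weights, optimizer state, and temporary buffers, so measured usage will be higher.

```python
def sharded_weight_gb(num_params, bytes_per_param, num_gpus):
    """Return (total GB, GB per GPU) for evenly sharded weights.

    Uses decimal gigabytes (1 GB = 1e9 bytes). Weights only: activations,
    LoRA parameters, and optimizer state are deliberately left out.
    """
    total_gb = num_params * bytes_per_param / 1e9
    return total_gb, total_gb / num_gpus


# Assumptions: 123e9 parameters, 2 bytes each, 4 GPUs as in the script above.
total, per_gpu = sharded_weight_gb(123e9, 2, 4)
print(f"{total:.1f} GB total, {per_gpu:.1f} GB per GPU")
# prints: 246.0 GB total, 61.5 GB per GPU
```

Under these assumptions the weights alone occupy most of each 80GB card, which is consistent with the advice to add GPUs if memory runs out.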

[Figure: loss curve during fine-tuning]

The inference script after fine-tuning is shown below. Change ckpt_dir to the last checkpoint folder produced by training. The merged checkpoint can then be run through vLLM for faster inference.

# Environment: 4 * A100
# 4 * 80GB GPU memory
# Merge the LoRA weights into the base model
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir output/mistral-large-instruct-2407/vx-xxx/checkpoint-xxx \
    --merge_lora true --merge_device_map cpu

# Run inference with vLLM acceleration
CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --ckpt_dir output/mistral-large-instruct-2407/vx-xxx/checkpoint-xxx-merged \
    --tensor_parallel_size 4 --gpu_memory_utilization 0.9 \
    --infer_backend vllm

[Figure: inference results]

Model deployment

Deploy the mistral-large-instruct-2407 model on a 4-GPU machine:

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve <local_path> --served_model_name mistral-large-instruct-2407 --tensor_parallel_size 4
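Once the server is up, it can be queried over HTTP. The sketch below builds a chat request using only the standard library. It assumes vLLM's OpenAI-compatible /v1/chat/completions route and its default port 8000 on localhost; neither appears in the article, so adjust base_url if the server was started with a different host or port. The helper names build_chat_request and ask are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt, model="mistral-large-instruct-2407",
                       base_url="http://localhost:8000"):
    """Build a POST request for an OpenAI-style chat completion.

    base_url is an assumption (vLLM's default port); model must match the
    --served_model_name given to `vllm serve`.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the reply text; needs a running server."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Who are you?")
print(request.full_url)
```

Building the request is separated from sending it so the payload can be inspected without a live server; ask("Who are you?") performs the actual call.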

[Figure: GPU memory utilization during deployment]
