Posted in

OpenAI 发布 GPT-4o mini 模型,增强抗越狱能力_AI阅读总结 — 包阅AI

包阅导读总结

1.

关键词:OpenAI、GPT-4o mini、Jailbreak Resistance、Instruction Hierarchy、Model Performance

2.

总结:OpenAI 发布 GPT-4o mini 模型,它是 GPT-4o 的缩小版,在多项基准测试中表现出色,采用指令层级训练方法提高了防越狱和防系统提示提取能力,支持多种语言和模态,可通过 OpenAI API 和 ChatGPT 获取。

3.

主要内容:

– OpenAI 发布 GPT-4o mini 模型

– 是 GPT-4o 的较小版本

– 性能超越 GPT-3.5 Turbo

– 采用新的指令层级训练方法

– 提升对越狱和系统提示提取的防御

– 有更好的模型稳健性

– 模型特点

– 支持与 GPT-4o 相同的语言和模态

– 具有相同的上下文窗口和训练知识截止时间

– 内置安全缓解措施

– 性能表现

– 在 MMLU 和 HumanEval 等基准测试中表现出色

– 早期版本在 Arena 测试中得分接近 GPT-4-Turbo

– 但在复杂教育提示方面有不足

– 可获取途径

– 通过 OpenAI API

– 在 ChatGPT 中可用

思维导图:

文章地址:https://www.infoq.com/news/2024/07/gpt-4o-mini/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global

文章来源:infoq.com

作者:Anthony Alford

发布时间:2024/7/23 0:00

语言:英文

总字数:521字

预计阅读时间:3分钟

评分:88分

标签:AI 模型,OpenAI,GPT-4o mini,AI 安全,成本效益


以下为原文内容

本内容来源于用户推荐转载,旨在分享知识与观点,如有侵权请联系删除 联系邮箱 media@ilingban.com

OpenAI released GPT-4o mini, a smaller version of their flagship GPT-4o model. GPT-4o mini outperforms GPT-3.5 Turbo on several LLM benchmarks and is OpenAI’s first model trained with an instruction hierarchy method that improves the model’s resistance to jailbreaks and system prompt extraction.

GPT-4o mini supports the same languages and modalities as the full GPT-4o model, although currently the OpenAI API only allows text and vision, with audio and video input/output “coming in the future.” The model also has the same context window, 128k tokens, and the same October 2023 training knowledge cutoff. It has the same built-in safety mitigations as GPT-4o, and in addition was trained using OpenAI’s instruction hierarchy training method which gives models up to 30% better robustness against jailbreaks and 60% improved defense against system prompt extraction. On LLM benchmarks such as MMLU and HumanEval, GPT-4o mini outperforms comparable small LLMs such as Gemini Flash and Claude Haiku as well as GPT-3.5. According to OpenAI:

Over the past few years, we’ve witnessed remarkable advancements in AI intelligence paired with substantial reductions in cost…We’re committed to continuing this trajectory of driving down costs while enhancing model capabilities. We envision a future where models become seamlessly integrated in every app and on every website. GPT-4o mini is paving the way for developers to build and scale powerful AI applications more efficiently and affordably. The future of AI is becoming more accessible, reliable, and embedded in our daily digital experiences, and we’re excited to continue to lead the way.

While OpenAI has not published many technical details of the model, the company did recently publish a research paper on training models to follow an instruction hierarchy. The key idea is that many attack vectors against LLMs use the fact that “LLMs often consider system prompts to be the same priority as text from untrusted users and third parties.” To address this, OpenAI developed a training dataset that teaches LLMs to ignore “lower-privileged” instructions when they conflict with higher ones.

To evaluate this method, the researchers first fine-tuned a model on the dataset then tested it on a set of both open-source attack benchmarks and proprietary ones. The fine-tuned model showed improved robustness on all benchmarks. The team did notice, however, that the model tended to “over-refuse” on some benchmarks, but they said they do not expect this “to cause noticeable degradations in model behavior” for real-world use cases.

OpenAI CEO Sam Altman posted on X that the company’s best model in 2022, text-davinci-003, was “much, much worse” than GPT-4o mini. Also on X, the LMSYS team revealed that:

GPT-4o mini’s early version “upcoming-gpt-mini” was tested in Arena in the past week. With over 6K user votes, we are excited to share its early score reaching GPT-4-Turbo performance, while offering significant cost reduction.

However, Wharton professor Ethan Mollick wrote:

First impressions with GPT-4o-mini (what a name) is that it is impressive for a small model but no replacement for a frontier model. When given complex education prompts it can’t follow instructions as well & misses nuance GPT-4o nails.

GPT-4o mini is available via the OpenAI API as well as in ChatGPT.