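Baoyue Reading Guide: Summary
1. `Google Translate`, `language expansion`, `110 new languages`, `language diversity`, `translation support`
2. Google Translate has carried out its largest expansion to date, adding 110 languages that range from major world languages to small-community languages and extending translation support to more than 614 million people. The article introduces some of the new languages, explains the considerations behind choosing language varieties, and outlines plans to support more languages in the future.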
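3.
– Google Translate adds 110 new languages, its largest expansion to date
  – The new languages, including Cantonese and Qʼeqchiʼ, represent more than 614 million speakers
  – About a quarter of the new languages come from Africa
  – Some of the new languages are introduced, such as Afar, Cantonese and Manx
– Considerations in choosing language varieties
  – The most commonly used varieties are prioritized
  – Languages vary widely, and many have no single standard form
– Future plans
  – As technology advances and partnerships continue, more language varieties and spelling conventions will be supported
  – The new languages are available now on the Google Translate website and apps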
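Article URL: https://blog.google/products/translate/google-translate-new-languages-2024/
Source: blog.google
Author: Isaac Caswell
Published: 2024/7/2 20:38
Language: English
Word count: 651
Estimated reading time: 3 minutes
Score: 83
Tags: Google Translate, language technology, artificial intelligence, PaLM 2, translation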
The original article follows.
Translation support for more than half a billion people
From Cantonese to Qʼeqchiʼ, these new languages represent more than 614 million speakers, opening up translations for around 8% of the world’s population. Some are major world languages with over 100 million speakers. Others are spoken by small communities of Indigenous people, and a few have almost no native speakers but active revitalization efforts. About a quarter of the new languages come from Africa, representing our largest expansion of African languages to date, including Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof.
Here are some of the newly supported languages in Google Translate:
- Afar is a tonal language spoken in Djibouti, Eritrea and Ethiopia. Of all the languages in this launch, Afar had the most volunteer community contributions.
- Cantonese has long been one of the most requested languages for Google Translate. Because Cantonese often overlaps with Mandarin in writing, it’s tricky to find data and train models.
- Manx is the Celtic language of the Isle of Man. It almost went extinct with the death of its last native speaker in 1974. But thanks to an island-wide revival movement, there are now thousands of speakers.
- NKo is a standardized form of the West African Manding languages that unifies many dialects into a common language. Its unique alphabet was invented in 1949, and it has an active research community that develops resources and technology for it today.
- Punjabi (Shahmukhi) is the variety of Punjabi written in Perso-Arabic script (Shahmukhi), and is the most spoken language in Pakistan.
- Tamazight (Amazigh) is a Berber language spoken across North Africa. Although there are many dialects, the written form is generally mutually understandable. It’s written in Latin script and Tifinagh script, both of which Google Translate supports.
- Tok Pisin is an English-based creole and the lingua franca of Papua New Guinea. If you speak English, try translating into Tok Pisin — you might be able to make out the meaning!
How we choose language varieties
There’s a lot to consider when adding new languages to Translate — everything from what varieties we offer, to what specific spellings we use.
Languages have an immense amount of variation: regional varieties, dialects, different spelling standards. In fact, many languages have no one standard form, so it’s impossible to pick a “right” variety. Our approach has been to prioritize the most commonly used varieties of each language. For example, Romani is a language that has many dialects all throughout Europe. Our models produce text that is closest to Southern Vlax Romani, a commonly used variety online. But it also mixes in elements from others, like Northern Vlax and Balkan Romani.
PaLM 2 was a key piece of the puzzle, helping Translate learn closely related languages more efficiently, including languages close to Hindi, like Awadhi and Marwadi, and French creoles like Seychellois Creole and Mauritian Creole. As technology advances, and as we continue to partner with expert linguists and native speakers, we’ll support even more language varieties and spelling conventions over time.
Visit the Help Center to learn more about these newly supported languages. And get started translating at translate.google.com or on the Google Translate app on Android and iOS.