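Baoyue Reading Guide: Summary
1. `DARPA`, `TRACTOR`, `Rust`, `C language`, `memory safety`
2. Summary: DARPA has launched the TRACTOR program to migrate C language applications to Rust in order to address memory-safety problems. Although the program faces many challenges and doubts, it is considered worthwhile and may yield new solutions and accumulated knowledge.
3.
– DARPA has launched the TRACTOR program, which aims to automatically translate the world's legacy C code to Rust
– Memory-safety vulnerabilities are a common problem in languages such as C, and Rust helps address them
– The program raises doubts about whether the original code's semantics can be preserved while its bugs are avoided
– Expert views
– Some think DARPA is well suited to drive this program, while others question whether its goals are feasible
– Some worry about the risk of enterprises migrating code at large scale
– Expectations and challenges
– DARPA hopes for proposals that combine software analysis with large language models
– Relying on large language models raises issues of translation quality and testing
– The program is daunting but meaningful, and may bring accumulated knowledge and progress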
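Article URL: https://thenewstack.io/can-darpas-tractor-pull-c-to-rust-for-memory-safe-overhaul/
Source: thenewstack.io
Author: Darryl K. Taft
Published: 2024/8/14 16:03
Language: English
Word count: 1885
Estimated reading time: 8 minutes
Score: 89
Tags: memory safety, Rust, C language, Defense Advanced Research Projects Agency, large language models (LLMs)
The original article follows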
The ongoing effort to limit and prevent software vulnerabilities has a powerful champion in the Defense Advanced Research Projects Agency (DARPA), which has launched an ambitious effort to migrate C language applications to Rust for memory-safety reasons.
The recently announced Translating All C to Rust (TRACTOR) program seeks to automate the translation of the world’s legacy C code to Rust, said Dan Wallach, the DARPA program manager in charge of the effort.
DARPA notes that memory-safety vulnerabilities are the most prevalent type of disclosed software vulnerability. They can occur with programming languages like C that allow programmers to manipulate memory directly, which makes it easy to accidentally introduce errors that let a seemingly routine operation corrupt the state of memory. Memory-safety issues can also arise when a programming language exhibits “undefined behavior,” according to a post on the DARPA website.
“After more than two decades of grappling with memory-safety issues in C and C++, the software engineering community has reached a consensus. Relying on bug-finding tools is not enough,” the post said.
However, the shift toward the use of Rust and recent breakthroughs in machine learning techniques, like large language models (LLMs), have created an environment that may lend itself to a new class of solutions, DARPA proposes.
“Rust forces the programmer to get things right,” Wallach said in a statement. “It can feel constraining to deal with all the rules it forces, but when you acclimate to them, the rules give you freedom. They’re like guardrails; once you realize they’re there to protect you, you’ll become free to focus on more important things.”
Indeed, memory-safe languages such as Rust “really protect us from ourselves, namely bad or lazy software development that may work just fine but contains the potential for ruin if run or attacked in the right way,” said Brad Shimmin, an analyst at Omdia.
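As a rough illustration of the “guardrails” Wallach describes, the sketch below shows two operations that are easy to get wrong in C and how safe Rust shapes them. The function names `read_at` and `copy_bounded` are hypothetical and chosen only for this example; they do not come from DARPA or from any TRACTOR material.

```rust
/// Read the element at `index`, or `None` if the index is out of range.
/// In C, `buf[index]` with a bad index is undefined behavior; here the
/// out-of-range case is an ordinary value the caller has to handle.
/// (Hypothetical helper, for illustration only.)
fn read_at(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

/// Copy as many bytes as fit from `src` into `dst` and return the count.
/// Each slice carries its own length, so there is no separate length
/// argument to get wrong, as can happen with a C `memcpy` call.
/// (Hypothetical helper, for illustration only.)
fn copy_bounded(dst: &mut [u8], src: &[u8]) -> usize {
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]);
    n
}
```

Calling `read_at(&[10, 20, 30], 3)` yields `None` instead of reading past the end of the buffer, and `copy_bounded` truncates the copy to the smaller of the two slices.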
DARPA Alone Can Fix It?
Despite the ambitious scope of the project, Rust experts see value in DARPA taking on the effort above others.
“It is a formidable task, and that’s precisely why an agency like DARPA should support it,” Tim McNamara, founder of Accelerant and author of “Rust in Action,” told The New Stack.
However, “I’m quite skeptical that the stated aims in the abstract are even possible. How can one preserve the semantics of the original code without preserving its bugs?” he asked. “Still, the worst case is that the software industry gains more information about how to translate old code into memory-safe alternatives. The best case is that we gain a tool that can protect critical infrastructure against digital sabotage.”
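McNamara's question can be made concrete with a small, hedged sketch. Suppose a C routine sums `int` values and overflows on large inputs; in C that overflow is undefined behavior. A translator has to pick an interpretation, and the two hypothetical functions below (names invented for this example) show the tension: one reproduces the wraparound that is commonly observed in practice, the other reports the overflow and therefore behaves differently from the original.

```rust
/// Mirrors what the C code is commonly observed to do on two's-complement
/// hardware: the sum silently wraps around. In C the overflow itself is
/// undefined behavior, so even this is an interpretation, not a fact
/// about the original program.
fn sum_wrapping(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}

/// Treats overflow as an error to be reported instead of reproduced.
/// Safer, but no longer behaves like the original on overflowing inputs.
fn sum_checked(values: &[i32]) -> Option<i32> {
    values.iter().try_fold(0i32, |acc, &v| acc.checked_add(v))
}
```

On ordinary inputs the two agree; on `[i32::MAX, 1]` the first returns `i32::MIN` while the second returns `None`. Which one counts as “preserving the semantics” is exactly the kind of judgment an automated translator would have to make.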
Moreover, some are concerned about whether it’s even a good idea to move an entire codebase from C to Rust unless that codebase is actively evolving or in use within a high-risk environment, which is surely the case with DARPA.
For the average enterprise, however, refactoring at this scale is liable to incur risk for a couple of reasons, Shimmin told The New Stack.
“There’s a good reason the Linux kernel development team has taken its time incorporating just a tiny bit of Rust into its predominantly C codebase,” he said. “First, unless the company has a solid foundation in Rust, such a move would be like driving a car blindfolded, just asking for an issue to surface unexpectedly. Second, even though GenAI can certainly transcribe snippets from one language into another, that transcription will lack contextual awareness of the entire codebase.”
Proposals Expected
Wallach is hoping for proposals that include novel combinations of software analysis, such as static and dynamic analysis, and large language models. The DARPA program will host public competitions throughout the effort to test the capabilities of the LLM-powered solutions.
DARPA will sponsor a Proposers Day on Aug. 26, which you can attend in person or virtually. Participants must register by Aug. 19. Details and registration info are available at SAM.Gov.
Relying on LLMs
“You can go to any of the LLM websites, start chatting with one of the AI chatbots, and all you need to say is ‘Here’s some C code. Please translate it to safe idiomatic Rust code,’ cut, paste, and something comes out, and it’s often very good, but not always,” Wallach said in a statement. “The research challenge is to dramatically improve the automated translation from C to Rust, particularly for program constructs with the most relevance.”
The progress of LLMs in the last year has been substantial, and their ability to formulate and translate human language has risen to academic and professional translation levels, said Holger Mueller, an analyst at Constellation Research. “Not surprisingly it is DARPA constructing an LLM that transfers C code into Rust – thus eliminating the dreaded memory issues that often have been created in C,” he told The New Stack. “Extra benefits of modern programming languages will be having a larger, usually growing developer base and better documentation of code.”
Further, even the best GenAI models are not at this time generating code at the same level you’d get from an accomplished programmer.
“Still, if motivated, a company interested in moving to Rust could fine-tune a suitable LLM using the existing C codebase and state-of-the-art Rust code,” Shimmin said. “Then, if that model has a sufficiently large enough context window, that company might see some usable results.”
But “even then, that’s just code generation,” he noted. “What about unit, regression and other forms of testing? That in and of itself would require some extensive expertise both in testing and in Rust to end up with a code base that’s at least ‘as safe’ as the original. In other words, it can be done, but it will not be easy, and it will prove costly in resources.”
Daunting but Doable
Tony Aiello, head of product and innovation at AdaCore, called TRACTOR a very interesting, very ambitious project that earns the label “DARPA hard.”
“First, writing safe Rust requires a disciplined use of pointers that is not typically reflected in C,” he told The New Stack. “Second, once safe Rust has been generated, there is the validation question: How can you be sure that your Rust is functionally equivalent to your C?” he said.
Aiello added that AdaCore is watching TRACTOR with interest for potential future solutions.
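Aiello's two concerns, pointer discipline and validation, can be sketched in a few lines. The example below is an assumption-laden illustration, not the output of any real translation tool: `checksum_raw` imitates the shape a mechanical, C-style translation might take, `checksum_safe` is a hand-written idiomatic version, and `agree_on` is a hypothetical differential check that compares the two on sample inputs. Passing such a check raises confidence but does not prove functional equivalence.

```rust
/// C-style transliteration: a raw pointer plus a separate length.
/// (Hypothetical example; not generated by an actual tool.)
///
/// # Safety
/// `data` must point to at least `len` readable, initialized bytes.
unsafe fn checksum_raw(data: *const u8, len: usize) -> u32 {
    let mut sum: u32 = 0;
    let mut i = 0;
    while i < len {
        // SAFETY: the caller guarantees `data..data + len` is valid.
        let byte = unsafe { *data.add(i) };
        sum = sum.wrapping_add(u32::from(byte));
        i += 1;
    }
    sum
}

/// Idiomatic safe version: the slice carries its own length, so no
/// `unsafe` and no caller-side contract are needed.
fn checksum_safe(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |sum, &b| sum.wrapping_add(u32::from(b)))
}

/// Differential check: run both versions on the same inputs and compare.
/// Agreement on samples is evidence of equivalence, not a proof.
fn agree_on(inputs: &[&[u8]]) -> bool {
    inputs.iter().all(|input| {
        // SAFETY: pointer and length come from the same live slice.
        let raw = unsafe { checksum_raw(input.as_ptr(), input.len()) };
        raw == checksum_safe(input)
    })
}
```

The gap between the first and second function is the “disciplined use of pointers” problem; the third function is one modest answer to the validation question, and it only covers the inputs it is given.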
“While the work of moving critical code from C to Rust does seem daunting, we have already made a lot of progress,” Josh Aas, executive director of the Internet Security Research Group, told The New Stack.
“Today there are memory-safe and high-performance options for TLS, NTP, Sudo, DNS and AV1 video decoding,” Aas explained. “We’ve experimented with translation tools to develop some of those tools, and we’re glad to see DARPA invest in making that a more robust option. As memory-safe software becomes more readily available, we’d like to start seeing it deployed more widely in production.”
This is definitely an incredibly ambitious project from DARPA. And that’s a good thing. Memory-unsafe programming languages like C and C++ represent huge amounts of security risk to the overall digital ecosystem and therefore necessitate proportional investments to improve resilience.
A resolution will require a balanced-portfolio approach, as DARPA’s efforts here are from the research portion of that portfolio and offer higher risk and higher reward. But DARPA has invested in work like this before.
DARPA’s mission is to avoid technological surprises. Therefore, it funds “high risk, high reward” work and doesn’t expect every project to produce a working solution, said Per Larsen, co-founder and CEO of Immunant.
“My employer, Immunant, is maintaining a C-to-Rust translation tool, which has been developed with support from DARPA,” he said. “We’ve also migrated C code to Rust using this tool plus manual effort…”
“Ultimately, while this is not likely to be a short-term or easy answer to the challenges of memory unsafety, I think it’s reasonable to expect that this is an ‘aim for the moon, even if you miss, you’ll end up among the stars’-type of project,” said one developer familiar with TRACTOR, who requested anonymity as a federal government employee. “Even if DARPA falls short of their vision of totally automatic rewrites, there are still huge opportunities for large advances that reduce cost and improve security.”
Building Knowledge
As many have noted, getting code to work is one thing; having understandable and maintainable code is another.
However, there is value in building knowledge.
“If you have a team rewriting a piece of software, like we did NTP, you’ll end up with a group of experts on, in this case, time synchronization — a group of people who understand how the software works and can maintain and improve it,” said Erik Jonkers, director of open source at Tweede golf, a Dutch software consultancy specializing in Rust development.
“When you translate code, you get developers who know the tricks to polish translated code but don’t, to put it bluntly, know how the software works,” Jonkers explained. “Of course, you can plug that hole by investing time in understanding the translated code, the architecture, or underlying specifications, but the question then becomes whether that is still a more productive [expertise] and efficient [cost] approach.”
This is not necessarily the way to stimulate talented individuals to become leading experts in a specific field. It seems more likely that developers would abandon the project and leave the project owner with code that runs, but nobody dares to touch.
Risk vs. Reward
Yet, despite the risk, Jonkers said Tweede golf is interested in this approach.
“Undoubtedly, there are pieces of software components, and other types of software, for which it does make sense to take a translation approach,” he said.
For instance, Immunant created c2rust, a C-to-Rust translator, and is using it for AOMedia Video 1 (AV1). Immunant has done C-to-Rust work similar to what Tweede golf has done, Larsen said.
“We are planning to use c2rust for a bzip2 port. We will use that experience to better judge its feasibility and compare it to the ‘from scratch’ approach we used for Sudo and NTP,” Jonkers said.
“I expect we will know more in a couple of months,” he added. “Our first experiments say that it is not a silver bullet currently, but it can be useful. It works reasonably well for certain pieces of code, but not for all. In any case, it requires additional manual work to get to a production-ready state.”
Don’t Wait to Act
The time to start is now as DARPA takes an active lead in this effort to promote memory safety.
“I have no doubt that DARPA can improve the situation and that the code produced will be of higher quality, in more cases, and ‘LLM-powered solutions’ will contribute to that,” Jonkers said. “It will take a while before they get there, I’d say.”
However, companies that have large existing exposure to security risk from C or C++ should not treat this as an excuse to wait for years to act. Companies need to be taking proactive steps today to both mitigate that risk and start replacing C and C++ entirely.
“I believe that it is an extremely hard problem, but also one that is worth pursuing,” Larsen said.