Scaling Up Malware Analysis with Gemini 1.5 Flash | Google Cloud Blog (AI Reading Summary)

Baoyue AI Reading Summary

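1. Keywords: Malware Analysis, Gemini 1.5 Flash, Google Compute Engine, Hex-Rays Decompiler, Google Threat Intelligence

2. Summary: The article describes Mandiant's internal malware analysis service, including its architecture on Google Compute Engine, its processing flow, the challenges it faces, and its ongoing development. It also covers improvements being made in collaboration with Hex-Rays, and how the work will be applied and further developed in Google Threat Intelligence.

3. Main content:

– Mandiant malware analysis service

– Provides scalable malware configuration extraction, hosted on Google Compute Engine

– Performs malware deobfuscation, decryption, and unpacking

– Extracts configurations and IOCs used to identify and track threats

– Hex-Rays IDA Pro decompiler cluster

– Provides the necessary decompilation capacity

– Integrated into the pipeline, reading unpacked binaries fed by Mandiant Backscatter

– Stores the decompiled pseudo-C code

– Challenges and ongoing development

– Stresses that the stages are interdependent and that each of them must keep improving

– Three key areas of improvement being pursued with Hex-Rays

– Google Threat Intelligence

– Code analysis reports will be integrated into the Code Insight section of VirusTotal

– An advanced version is being developed that uses Gemini 1.5 Pro and AI agents for more powerful analysis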

Article URL: https://cloud.google.com/blog/topics/threat-intelligence/scaling-up-malware-analysis-with-gemini/

Source: cloud.google.com

Author: Threat Intelligence

Published: 2024/7/15 0:00

Language: English

Total word count: 2668

Estimated reading time: 11 minutes

Rating: 85

Tags: Malware Analysis, Gemini 1.5 Flash, Google Cloud, AI Applications in Cybersecurity, Large Language Models

The original article follows:

Mandiant Backscatter

Our internal Mandiant Malware Analysis Backscatter Service, hosted on Google Compute Engine, provides scalable malware configuration extraction. As part of extracting configurations, Backscatter also performs malware deobfuscation, decryption, and unpacking in-line with our VirusTotal pipeline to decompose the malware into artifacts. Configurations are extracted from these artifacts, and the resulting IOCs are used to identify and track malware threats and actors across hundreds of malware families in our Google Threat Intelligence platform. The artifacts, including unpacked binaries, are also resubmitted to the pipeline, where tools such as Gemini 1.5 Flash perform additional processing that extends our understanding of what the malware does with the IOCs identified in earlier stages.
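The resubmission loop described above can be pictured as a breadth-first work queue in which every artifact produced by unpacking is fed back in. The sketch below illustrates that idea under stated assumptions and is not Backscatter's implementation. `unpack` and `extract_config` are hypothetical callables. Artifacts are deduplicated by SHA-256. `max_depth` is an arbitrary guard against unpacking loops. The toy "packer" only hex-encodes its payload behind a `PACKED:` prefix.

```python
import hashlib
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class BackscatterResult:
    iocs: set = field(default_factory=set)         # e.g. C2 addresses from extracted configs
    processed: list = field(default_factory=list)  # SHA-256 of each artifact, in visit order


def process_sample(
    sample: bytes,
    unpack: Callable[[bytes], Iterable[bytes]],  # hypothetical: returns child artifacts
    extract_config: Callable[[bytes], set],      # hypothetical: returns IOCs found in a blob
    max_depth: int = 5,
) -> BackscatterResult:
    """Unpack breadth-first, resubmitting every child artifact to the same queue."""
    result = BackscatterResult()
    seen: set = set()
    queue = deque([(sample, 0)])
    while queue:
        blob, depth = queue.popleft()
        digest = hashlib.sha256(blob).hexdigest()
        if digest in seen:  # identical artifact already analyzed
            continue
        seen.add(digest)
        result.processed.append(digest)
        result.iocs |= extract_config(blob)
        if depth < max_depth:  # guard against unpacking loops
            queue.extend((child, depth + 1) for child in unpack(blob))
    return result


# Toy stand-ins: the "packer" hex-encodes its payload behind a PACKED: prefix.
def toy_unpack(blob: bytes) -> list:
    prefix = b"PACKED:"
    return [bytes.fromhex(blob[len(prefix):].decode())] if blob.startswith(prefix) else []


def toy_extract(blob: bytes) -> set:
    return {tok.decode() for tok in blob.split() if tok.startswith(b"c2=")}
```

For example, wrap `b"payload c2=203.0.113.5"` in two toy layers and pass it to `process_sample`. The result lists three processed artifacts, and the only IOC comes from the innermost one. Each outer layer exposes only its encoded payload.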

Hex-Rays Decompiler

Our cluster of Hex-Rays IDA Pro decompilers, hosted on Google Compute Engine, provides the scalable decompilation capacity this pipeline needs. We use the new IDA LIB, a headless version of IDA Pro designed for automated workflows, which is scheduled for release in Q3 2024. The cluster integrates with the pipeline by reading unpacked binaries from a Google Cloud Pub/Sub queue fed by Mandiant Backscatter. The resulting decompiled pseudo-C code is stored in a Google Cloud Storage bucket, ready for analysis by Gemini 1.5 Flash. Currently, each node in the cluster can decompile more than 3,000 files per hour, which lets us keep pace with the high volume of incoming binaries.
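To make the data flow concrete, here is a single-process sketch of one cluster node. It deliberately avoids the real Pub/Sub, Cloud Storage, and IDA LIB APIs. A `queue.Queue` and a `dict` stand in for the queue and the bucket. `decompile` is a hypothetical callable wrapping whatever headless decompiler interface is available, and the `decompiled/<sha256>.c` object key is an assumed naming scheme. `nodes_needed` applies the 3,000 files/hour figure. At that rate one node handles about 72,000 binaries per day, so 1,000,000 binaries per day would need 14 nodes.

```python
import math
import queue
from typing import Callable, Dict, List, Tuple

FILES_PER_NODE_PER_HOUR = 3_000  # throughput figure quoted in the post


def nodes_needed(binaries_per_day: int, per_node_hourly: int = FILES_PER_NODE_PER_HOUR) -> int:
    """Smallest node count whose combined daily capacity covers the inflow."""
    return math.ceil(binaries_per_day / (per_node_hourly * 24))


def run_worker(
    inbox: "queue.Queue[Tuple[str, bytes]]",  # stands in for the Pub/Sub subscription
    bucket: Dict[str, str],                   # stands in for the Cloud Storage bucket
    decompile: Callable[[bytes], str],        # hypothetical wrapper around headless IDA
) -> List[str]:
    """Drain the inbox once; return the SHA-256s that failed to decompile."""
    failed: List[str] = []
    while True:
        try:
            sha256, binary = inbox.get_nowait()
        except queue.Empty:
            return failed
        try:
            pseudo_c = decompile(binary)
        except Exception:
            failed.append(sha256)  # e.g. route to a retry or dead-letter queue
        else:
            bucket[f"decompiled/{sha256}.c"] = pseudo_c  # object key is an assumption
        finally:
            inbox.task_done()
```

In a real Pub/Sub subscriber, the `task_done` step would roughly correspond to acknowledging a message after its pseudo-C is stored, or negatively acknowledging it so it can be redelivered.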

Challenges and Ongoing Development

As expected, our tests highlighted a crucial aspect of this pipeline: the performance of Gemini 1.5 Flash is heavily dependent on the quality of the preceding unpacking and decompilation stages. For instance, if the unpacking phase fails to fully unpack a sample protected by a new or unknown packer, the decompiler can only extract the code of the packer itself, not the original program logic hidden within. In such cases, Gemini correctly reports that it’s analyzing a program performing unpacking, decryption, or deobfuscation operations, and that it won’t be able to analyze the true purpose of the code concealed by the packer.

Similarly, the quality of the decompiled code directly impacts Gemini’s ability to understand and analyze the program’s behavior. The decompiled code is the raw material for Gemini’s analysis, so any errors or inconsistencies in this code will propagate to the final report. Moreover, Gemini must also contend with various code-level obfuscation methods, including new approaches employed by attackers, requiring it to continuously adapt and improve its analysis capabilities in this evolving landscape.

This interdependence underscores the importance of continuously improving all three stages of the pipeline. Because the workflow is sequential, a weakness in any stage directly degrades the phases that follow; conversely, better output from the earlier stages translates into more successful analysis by Gemini. Our ongoing development therefore focuses not only on enhancing Gemini’s analytical capabilities but also on refining the unpacking and decompilation stages so that they deliver the highest-quality output for analysis.
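One way to keep upstream weaknesses visible is to have each stage attach caveats that travel with its output to the final report, so they do not silently degrade that report. The sketch below illustrates this under stated assumptions. The three stage callables are hypothetical. The byte-entropy check is a generic packing heuristic, not something described in the post, and the 7.2 bits/byte threshold is a rule-of-thumb value chosen for illustration. It flags unpacked output that may still be packed or encrypted, which is the failure mode described above.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

# Assumption: a rule-of-thumb threshold for illustration, not a value from the post.
PACKED_ENTROPY_THRESHOLD = 7.2


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def run_pipeline(
    sample: bytes,
    unpack: Callable[[bytes], bytes],   # hypothetical stage 1
    decompile: Callable[[bytes], str],  # hypothetical stage 2
    analyze: Callable[[str], str],      # hypothetical stage 3 (e.g. an LLM call)
) -> Dict[str, object]:
    """Run the stages in order and carry upstream quality warnings into the report."""
    caveats: List[str] = []
    unpacked = unpack(sample)
    if shannon_entropy(unpacked) >= PACKED_ENTROPY_THRESHOLD:
        caveats.append("unpacked output still looks packed or encrypted")
    pseudo_c = decompile(unpacked)
    if not pseudo_c.strip():
        caveats.append("decompiler produced no code")
    return {"summary": analyze(pseudo_c), "caveats": caveats}
```

Entropy alone is a coarse signal. Compressed resources or embedded images can also score high, so a flag like this is better treated as a note for the reader than as a hard failure.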

On the decompilation side, we are working closely with Hex-Rays to enhance their decompiler, focusing on three key areas:

– Improved Language-Specific Structure Recognition: We aim to enhance the decompiler’s ability to recognize structures unique to specific programming languages. This includes elements like try-catch statements or class member definitions within C++, Rust, and Golang code. By adding a new semantic layer to the decompiler, we can enable it to interpret the underlying code more effectively. This leads to more accurate and readable output, ultimately benefiting Gemini’s analysis.
– More Meaningful Function and Variable Naming: Clear and descriptive names for functions and variables within the decompiled code significantly aid Gemini’s analysis. We’re exploring techniques to generate such names during the decompilation process, including the possibility of integrating Gemini for this purpose.
– Richer Contextual Information: Beyond improved decompiled code, we’re investigating methods to provide the model with richer contextual data. This might include visual representations like data flow diagrams and control flow graphs, or even a complete export of IDA Pro’s IDB. This additional information can provide valuable insights into the program’s overall structure and logic, enabling a more thorough and accurate analysis.
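The naming and context ideas in the list above can be illustrated with a rough sketch. It first renames IDA-style placeholder identifiers such as `sub_401000` using a suggestion map. Per the post, such suggestions might one day come from a model like Gemini. It then prepends a plain-text call graph to the pseudo-C. Everything here is hypothetical: the function names, the prompt layout, and the character budget used as a crude stand-in for a model's token limit. None of it reflects the actual Hex-Rays or Gemini integration.

```python
from __future__ import annotations

import re

# Whole identifiers only, so "sub_401000" never matches inside "sub_4010001".
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def apply_names(text: str, suggestions: dict[str, str]) -> str:
    """Replace each identifier that has a suggested name; leave all others unchanged."""
    return IDENTIFIER.sub(lambda m: suggestions.get(m.group(0), m.group(0)), text)


def build_context(
    pseudo_c: str,
    call_edges: list[tuple[str, str]],
    suggestions: dict[str, str],
    max_chars: int = 20_000,
) -> str:
    """Prepend a textual call graph to the pseudo-C, then apply suggested names."""
    graph = "\n".join(f"{caller} -> {callee}" for caller, callee in sorted(call_edges))
    text = f"// Call graph\n{graph}\n\n// Decompiled pseudo-C\n{pseudo_c}"
    # Assumption: a character budget as a crude stand-in for a model's token limit.
    return apply_names(text, suggestions)[:max_chars]
```

Renaming after the call graph is assembled keeps the graph and the code consistent, so a reader (or model) sees `decrypt_config -> xor_block` rather than two sets of names.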

Google Threat Intelligence: The Next Evolution

This is just the beginning of our exploration into leveraging AI for large-scale threat analysis. We are excited to announce that these types of code analysis reports will soon be integrated into VirusTotal’s Code Insight section. This integration will provide the VirusTotal community with valuable insights into the behavior of binary files, powered by the speed and scalability of Gemini 1.5 Flash.

For an even more powerful analysis experience, we are developing an advanced version of this pipeline within Google Threat Intelligence. This implementation will leverage the capabilities of Gemini 1.5 Pro enhanced by AI agents that can use specialized malware analysis tools and correlate threat information from across Google, Mandiant, and VirusTotal. This advanced analysis will be available within our Private Scanning service, ensuring the confidentiality of the content processed. Watch our recent webinar for more on Gemini in Google Threat Intelligence.

We will continue to share our progress and new advancements in AI-driven threat analysis as we strive to make the digital world a safer place. Here at GSEC Malaga, we are dedicated to pushing the boundaries of what’s possible in cybersecurity and exploring new ways to apply AI to protect users from evolving threats.

Sample Details

The following table contains details on the binary samples discussed in this post.

| Filename | SHA-256 |
|---|---|
| `goopdate.dll` | `0d2115d3de900bcd5aeca87b9af0afac90f99c5a009db7c162101a200fbfeb2c` |
| `BootstrapPackagedGame-Win64-Shipping.exe` | `07db922be22e4feedbacea7f92983f51404578bd0c495abaae3d4d6bf87ae6d0` |
| `svrwsc.exe` | `0cdb71e81b07247ee9d4ea1e1005c9454a5d3eb5f1078279a905f0095fd88566` |
| `colto.exe` | `091e505df4290f1244b3d9a75817bb1e7524ac346a2f28b0ef3c689c445beb45` |
| `3DViewer2009.exe` | `08f20e0a2d30ba259cd3fe2a84ead6580b84e33abfcec4f151c5b2e454602f81` |
| `AdvProdTool.exe` | `04af0519d0dbe20bc8dc8ba4d97a791ae3e3474c6372de83087394d219babd47` |
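A SHA-256 digest lookup is enough to check whether a local file matches one of the samples listed above. The following is a minimal sketch using Python's standard `hashlib`. The helper names `sha256_file` and `identify` are illustrative, and the digests are copied from the table.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA-256 -> filename, copied from the sample table.
KNOWN_SAMPLES = {
    "0d2115d3de900bcd5aeca87b9af0afac90f99c5a009db7c162101a200fbfeb2c": "goopdate.dll",
    "07db922be22e4feedbacea7f92983f51404578bd0c495abaae3d4d6bf87ae6d0": "BootstrapPackagedGame-Win64-Shipping.exe",
    "0cdb71e81b07247ee9d4ea1e1005c9454a5d3eb5f1078279a905f0095fd88566": "svrwsc.exe",
    "091e505df4290f1244b3d9a75817bb1e7524ac346a2f28b0ef3c689c445beb45": "colto.exe",
    "08f20e0a2d30ba259cd3fe2a84ead6580b84e33abfcec4f151c5b2e454602f81": "3DViewer2009.exe",
    "04af0519d0dbe20bc8dc8ba4d97a791ae3e3474c6372de83087394d219babd47": "AdvProdTool.exe",
}


def sha256_file(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Hash in fixed-size chunks so large binaries are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify(path: str | Path) -> str | None:
    """Return the table's filename for a matching sample, else None."""
    return KNOWN_SAMPLES.get(sha256_file(path))
```

Matching on the digest rather than the filename matters here, because a sample can easily be renamed while its content hash stays the same.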
