AI & Malware Workshop

The Inria PIRAT project team is organizing an "AI & Malware" workshop hosted at the CentraleSupelec Rennes campus on Tuesday, March 19, 2024.

This one-day event, funded by DefMal, a targeted project of the PEPR Cybersecurity program, is dedicated to PhD students and post-doctoral researchers working on AI applications in malware analysis. We warmly invite senior researchers in this domain to participate and present their latest advances. The workshop covers a wide range of research topics, including but not limited to AI-based malware detection.


Tuesday, March 19, 10 am to 4 pm – Room 151, CentraleSupelec, Rennes
Remote attendance link: https://bbb.inria.fr/han-rea-stk-bpr

10:00-10:10 Welcome talk

10:10-10:35 Active Learning for Botnet Campaign Attribution

Speaker: Hélène Orsini, PhD student in the Inria PIRAT team

Abstract: Network attack attribution is crucial for identifying malicious actors, understanding attack campaigns, and implementing preemptive measures. Traditional machine learning approaches face challenges such as labor-intensive campaign annotation, imbalanced attack data distribution, and evolving attack patterns causing concept drift. To address these challenges, we propose DYNAMO, a novel weakly supervised and human-in-the-loop machine learning framework for automated network attack attribution using raw network traffic records. DYNAMO integrates self-supervised learning and density-aware active learning techniques to reduce the overhead of exhaustive annotation, querying human analysts to label only a few selected highly representative network traffic samples. This approach ensures comprehensive and balanced training data coverage, overcoming imbalanced attack data distribution. Our experiments on the CTU-13 dataset demonstrate that annotating less than 3% of the records achieves attribution accuracy comparable to fully supervised approaches with twice as many labeled records. Moreover, compared to classic active learning and semi-supervised techniques, DYNAMO achieves 20% higher attribution accuracy and nearly perfect detection accuracy for unknown botnet campaigns with minimal annotations.
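The density-aware selection step described in the abstract can be illustrated with a small sketch. This is a generic illustration, not the actual DYNAMO code: the scoring rule (local density times distance to the nearest labeled sample), the toy data, and the function names are all assumptions.

```python
import numpy as np

def density_aware_select(X, labeled_idx, k=5, n_query=3):
    """Pick unlabeled points that sit in dense regions (many close
    neighbours) but are far from already-labeled points, so each
    query to the analyst covers a representative cluster."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # density: inverse of the mean distance to the k nearest neighbours
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    density = 1.0 / (knn.mean(axis=1) + 1e-9)
    # diversity: distance to the closest already-labeled sample
    diversity = dists[:, labeled_idx].min(axis=1)
    score = density * diversity          # labeled points score 0
    order = np.argsort(-score)
    return [i for i in order if i not in set(labeled_idx)][:n_query]

# toy data: two tight clusters plus one outlier; sample 0 is labeled
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)),
               rng.normal(5, 0.1, (20, 2)),
               [[20.0, 20.0]]])
queries = density_aware_select(X, labeled_idx=[0], n_query=2)
```

On this toy set, the selected queries come from the dense cluster that has no labeled representative yet, mirroring the abstract's goal of representative, balanced coverage with few annotations.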

Bio: Hélène Orsini is a third-year PhD student in Cybersecurity and AI in the Inria PIRAT team. Her research focuses on using AI techniques to analyze the behavior of botnet campaigns.

10:35-11:00 CROISSANT: malware behavioral clustering based on ontological pattern signatures

Speaker: Vincent Raulin, PhD student in the Inria PIRAT team

Abstract: Malware analysis consists of studying a sample of suspicious code to understand it and producing a representation or explanation of this code that can be used by a human expert or by a clustering/classification/detection tool. Most detection tools, including anti-viruses, are based on static signatures, meaning that they look for specific, already-seen patterns in samples. Static analysis can be evaded by attackers using obfuscation or dynamic code-loading techniques and performs poorly on zero-day malware. Dynamic analysis instead studies the malware's behavior: the question is then to decide whether an observed behavior is malicious or not. We introduce a new malware analysis method named CROISSANT: a behavioral clustering system that groups malware samples exhibiting similar behaviors and provides dynamic, explainable signatures based on the BAGUETTE ontology.
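As a generic illustration of behavioral clustering (not CROISSANT's actual algorithm: the Jaccard threshold, the greedy single-link strategy, and the toy behavior sets below are assumptions), one can group samples whose sets of observed behaviors overlap strongly:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of observed behaviors."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster_by_behavior(samples: dict, threshold: float = 0.5):
    """Greedy single-link clustering sketch: a sample joins the first
    cluster containing a member whose behavior set is similar enough,
    otherwise it starts a new cluster."""
    clusters = []
    for name, behaviors in samples.items():
        for cluster in clusters:
            if any(jaccard(behaviors, samples[m]) >= threshold for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# toy behavioral reports: two dropper-like samples and one ransomware-like
reports = {
    "sample_a": {"write_registry", "spawn_process", "http_beacon"},
    "sample_b": {"write_registry", "spawn_process"},
    "sample_c": {"encrypt_files", "delete_shadow_copies"},
}
clusters = cluster_by_behavior(reports)
```

The two dropper-like samples land in one cluster and the ransomware-like sample in another; CROISSANT's contribution is to do this over the richer, graph-structured BAGUETTE ontology with explainable signatures.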

Bio: Vincent Raulin is a third-year PhD student in the PIRAT team. His research interests include explainable malware clustering and graph-structured representations of malware behavioral reports. He is also the author of the open-source tool BAGUETTE, which encodes malware behavioral reports into graph-structured representations to facilitate behavior explanation and similarity matching: https://gitlab.inria.fr/vraulin/baguette-verse.

11:00-11:10 Coffee break

11:10-12:10 Invited talk: Features Analysis of Threats in Microprocessors: Detection & Mitigation Techniques

Speaker: Dr. Alessandro Palumbo, Associate Professor, Inria SUSHI team, CentraleSupelec, Rennes campus

Abstract: Software-exploitable Hardware Trojan Horses can be inserted into microprocessors, allowing attackers to run their own software or to gain unauthorized privileges. Moreover, by observing certain features of a microprocessor (apparently unrelated to the program it runs), a malicious user may gain enough information to steal secrets and private data. As a consequence, even devices built in trusted foundries can be attacked. Implementing Hardware Security Modules that monitor the microprocessor's runtime behavior is a new approach to detecting whether attacks are underway.

Why do we need hardware modules to protect against attacks? Aren't software solutions enough? It is extremely challenging for software to protect against vulnerabilities that sit close to the hardware; Hardware Security Modules operate at the circuit level and are therefore well-suited to detect and defend against low-level attacks.

Bio: Alessandro Palumbo is an Associate Professor at CentraleSupelec, Université Paris-Saclay, and an Associate Researcher at Univ Rennes, CNRS, Inria, IRISA (France). He received a Ph.D. in Electronics Engineering in 2022 from the University of Rome Tor Vergata for his research titled "Features Analysis of Microarchitectural Attacks and Hardware Trojans in Microprocessors: Detection & Mitigation Techniques." He received his master's degree in Electronics Engineering for Telecommunications and Multimedia and his bachelor's degree in Electronics Engineering from the same university. In the 2022/2023 academic year, he was an Assistant Researcher at Politecnico di Milano, where his research activity was titled "Design of Integrated Circuits for High-Security Primitive of In-Memory Computing." His research focus is hardware security. In particular, his interests include hardware acceleration of networking functions and CPU microarchitectures, with particular emphasis on machine learning techniques and probabilistic data structures to guarantee security and reliability in microprocessor-based systems in both FPGA and In-Memory Computing scenarios. More information: https://palessumbo.github.io/

12:10 – 13:35 Lunch break (lunch offered)

13:35 – 14:00 Defensive Randomization Against Adversarial Attacks in Image-Based Android Malware Detection

Speaker: Tianwei Lan, PhD student at Université Paris Cité

Abstract: The extensive popularity of the Android operating system has drawn increasing malware attacks that threaten the Android ecosystem. Machine learning is a versatile tool for detecting both legacy and new malware with high accuracy. However, Machine Learning (ML) models are vulnerable to adversarial attacks, which severely threaten their deployment in cybersecurity. To harden ML models against adversarial attacks, we propose a novel randomization method as a defense for image-based detection systems. In addition to defensive randomization, we also introduce a novel method, called AutoE, for transforming an APK into an image by leveraging API calls only. To evaluate the effectiveness of randomization as a defense in adversarial settings, we compare AutoE with two state-of-the-art image-based Android malware detection systems. The experimental results reveal that randomization is a strong defense for image-based Android malware detection systems against adversarial attacks. Moreover, AutoE detects malware with 96% accuracy, and the randomization approach makes it more robust against adversarial attacks.
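The two ingredients of the abstract, an API-call-to-image encoding and randomized inference, can be sketched generically. This is not the paper's AutoE implementation: the hash-based pixel encoding, the noise range, and the toy model below are assumptions.

```python
import hashlib
import numpy as np

def api_calls_to_image(api_calls, side=8):
    """Map each API-call name to a byte via a stable hash and pack the
    sequence into a fixed-size grayscale image, padding with zeros."""
    pixels = [hashlib.md5(c.encode()).digest()[0] for c in api_calls]
    pixels = (pixels + [0] * side * side)[:side * side]
    return np.array(pixels, dtype=np.uint8).reshape(side, side)

def randomized_predict(model, image, n_draws=10, rng=None):
    """Defensive randomization: average the model's score over several
    randomly perturbed copies of the input, so a perturbation tuned to
    one exact input is less likely to carry over."""
    rng = rng or np.random.default_rng()
    scores = []
    for _ in range(n_draws):
        noise = rng.integers(-8, 9, image.shape)
        noisy = np.clip(image.astype(int) + noise, 0, 255).astype(np.uint8)
        scores.append(model(noisy))
    return float(np.mean(scores))

calls = ["getDeviceId", "sendTextMessage", "openConnection"]
img = api_calls_to_image(calls)
# toy stand-in model: mean pixel intensity as a "maliciousness" score
score = randomized_predict(lambda im: im.mean() / 255.0, img,
                           rng=np.random.default_rng(42))
```

Averaging over randomly perturbed copies means an adversarial perturbation crafted against one exact input must also survive the noise, which is the intuition behind randomization as a defense.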

Bio: Tianwei Lan received the French engineering degree in electrical engineering from INSA Lyon (France), and the M.Sc. degree in artificial intelligence from Sorbonne University (France). He is currently pursuing the Ph.D. degree with Université Paris Cité (France). His research interests include machine learning, security, and malware detection.

14:00 – 15:00 Invited talk: Pentesting Windows malware detectors with Adversarial EXEmples (co-hosted with the webinar of DefMal)

Speaker: Dr. Luca Demetrio, Assistant Professor at University of Genoa

Abstract: Machine learning for malware detection has received a great boost in popularity, given its impressive performance with extremely low numbers of false alarms compared to static signatures, which are unable to cope with all possible variants. However, recent research shows that these techniques are not bullet-proof, since they are vulnerable to Adversarial EXEmples: carefully crafted malware samples optimised to bypass detection.

These are implemented through manipulations that preserve the original functionality, and their generation can be easily automated and targeted against both machine learning models and commercially available antivirus programs. In this talk, we will provide insights on how to properly formulate these novel threats and how they can be used to test malware detectors. Building on cutting-edge advancements, we will also share details on possible defenses and mitigations against Adversarial EXEmples, and we will close by highlighting limitations and possible future directions for this novel research field.
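A classic functionality-preserving manipulation in this line of work is overlay padding: bytes appended after the end of a PE file are never mapped by the Windows loader, so behavior is unchanged while the raw-byte features a static ML detector sees shift. A minimal sketch (the helper names and the toy file are illustrative; real attacks optimize the injected bytes rather than using a constant filler):

```python
import numpy as np

def byte_histogram(data: bytes) -> np.ndarray:
    """Normalized 256-bin byte histogram, a common static feature
    vector for ML-based executable classifiers."""
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    return counts / max(len(data), 1)

def overlay_padding(pe_bytes: bytes, payload: bytes) -> bytes:
    """Append bytes to the overlay: the loader ignores them, so
    functionality is preserved while static features change."""
    assert pe_bytes[:2] == b"MZ", "expects a PE file (MZ header)"
    return pe_bytes + payload

# toy stand-in for a real PE file
fake_pe = b"MZ" + bytes(100)
padded = overlay_padding(fake_pe, b"\xff" * 64)
```

The original bytes are untouched (so the program still runs), yet the feature vector the detector consumes has moved, which is exactly the attack surface Adversarial EXEmples exploit.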

Bio: Luca Demetrio is an Assistant Professor at the University of Genoa; he received his Ph.D. in 2021. His research focuses on assessing the security of machine-learning threat detectors, with a strong focus on Windows malware. He is first author of several papers on the topic and maintainer of SecML Malware (https://github.com/pralab/secml_malware), which automates the generation of Adversarial EXEmples. In 2023 he received an honourable mention from "Gruppo 2003" for young researchers for his contributions on the topic, and he is a reviewer for top-tier conferences such as USENIX and ICLR. He has also taken part in industrial conferences such as TROOPERS and, together with colleagues, will deliver a training at Black Hat 2024 covering machine learning for malware detection and pentesting techniques with EXEmples.

15:00 – 15:10 Coffee break

15:10-15:35 Inducing systematic targeted mismatches to ML-based binary function classifiers

Speaker: Gabriel Sauger, PhD student in the CARBONE team at LORIA.

Abstract: Machine learning has become prominent in solutions to the problem of binary function code classification. The goal is, given the binary code of an unknown function, to recognize it against a database of known patterns. Many classifiers have been published that use features extracted by static analysis and show extremely good performance on their benchmarks, recognizing functions across compilation optimization options, target architectures, compiler or project versions, and even obfuscations. However, our work shows that with carefully crafted modifications to the source code and the compiled assembly code of a function p, we are able, under selected function-size conditions, to have p misclassified as a target function t in more than 50% of cases, and up to 80% depending on the defender's classifier. We achieve this without querying the defender's classifier.
This raises questions about the relevance of the features currently selected in the literature to identify the semantics of binary function code, and about the performance of these models in the context of an attack.

Bio: Gabriel Sauger is a fourth-year PhD student under the supervision of Jean-Yves Marion. He first built a static disassembler with a code classification module using Capstone, which gave him insight into the related literature and led to his current work.
He trained as a generalist engineer at Mines Nancy, specializing in mathematics.

15:35-16:00 Summary talk

Please contact Dr. Yufei Han (yufei.han@inria) for more information or to register for the workshop.
