In the “ELSA Use Case Updates” series, we share insights into the progress of research within the ELSA Use Cases. We speak with ELSA’s Use Case owners, leading researchers, project managers, and engineers.
Have you read about a new, better navigation app? Is your neighbour trying out a helpful tax software? Has a parent recommended an educational maths game for your 10-year-old? These apps sound great, so let’s download and try them out! This is how most users treat software. Based on information and sources they consider trustworthy, they visit their phone’s “app store” and download an app, unaware of the risks that can come with the installation alone. There is a catch: all of this software could be malware that can cause harm simply by being opened.
The ELSA researchers in our “Cybersecurity – Malware Detection” use case stepped up to take on this challenge. They are developing a model to identify malware in the Android app store, helping users avoid downloading potentially malicious apps. The use case is led by the University of Cagliari and Pluribus One. To learn more about their approach, ELSA Communication Manager Sandra Engel interviewed Maura Pintor, Assistant Professor at the University of Cagliari (UniCa), and Davide Ariu, co-founder and CEO of Pluribus One.
1. The Why: The Challenge of Detecting Malware
End users’ phones and their personal data are prone to attacks. And users can often do very little about it. They rely on external information and trustworthy sources to determine whether software is safe. But we all know that on the internet, comments, ratings, or shops that look trustworthy can easily be fake.
Hence, we need professional cybersecurity solutions and malware-detecting software that make the digital world safer.
Some malware-detecting systems use Artificial Intelligence. What sounds like a handy approach actually poses a thorny problem, as the description of our use case explains:
Many anti-malware solutions are empowered by machine learning and data-driven AI algorithms. However, such algorithms fail to generalize well outside their training data distribution. As malware developers constantly manipulate their malicious samples to bypass detection, AI models need frequent retraining on past and newly collected data. This demands constant human intervention and dedicated resources.
In this use case, the teams at UniCa and Pluribus One aim to overcome this problem by automating the process of building and deploying AI-based malware detection systems that can be maintained with less effort and respond more promptly to novel threats. Their approach aims to reduce the effort for (re)training AI models, minimize human input, and improve malware detection systems.
2. The How: Model, Benchmark, Challenge(s)
The use case consists of three main pillars:
- A robust, low-maintenance statistical model that predicts whether an Android app is malicious,
- A benchmark for testing the developed model and easily comparing its performance with other models,
- Challenges that test the model in real-world scenarios.
Let’s dive into each topic:
The use case aims to create a reliable malware detection model that requires neither frequent, extensive retraining nor constant human supervision.
Maura Pintor explains how their model works:
“In the Android store, every app is published with a manifest that lists which phone data, apps, etc., the app is going to access. The apps can only access the items listed in their manifests. Our model checks a specific pattern: the combination of the app’s access requests stated in the manifest, the instructions the app gives during and after installation, and the consistency between the two. Based on statistics, it predicts the likelihood that an app is actually malware. One indicator, for example, is when apps ask for excessive permissions. It should make you suspicious if a weather app asks for access to your contacts or camera.”
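The permission-mismatch idea Pintor describes can be illustrated with a toy sketch. This is not the use case’s actual model; the category profiles and permission names are illustrative assumptions, and a real detector would learn such patterns statistically rather than from a hand-written table.

```python
# Toy illustration (not the use case's actual model): score how suspicious
# an app's requested permissions look given its stated category.
# The category profiles below are made-up assumptions for the example.
EXPECTED = {
    "weather": {"INTERNET", "ACCESS_COARSE_LOCATION"},
    "messaging": {"INTERNET", "READ_CONTACTS", "CAMERA"},
}


def suspicion_score(category: str, requested: set[str]) -> float:
    """Fraction of requested permissions that are unexpected for the category."""
    expected = EXPECTED.get(category, set())
    if not requested:
        return 0.0
    unexpected = requested - expected
    return len(unexpected) / len(requested)


# A weather app asking for contacts and camera access looks suspicious:
score = suspicion_score("weather", {"INTERNET", "READ_CONTACTS", "CAMERA"})
```

Here two of the three requested permissions fall outside the expected profile, so the score is high; the same app requesting only `INTERNET` would score zero.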
To test their model, the team created a benchmark. It can be downloaded via the ELSA Benchmarks Platform and includes:
- A collection of app IDs for downloading the apps outside the app stores, including the respective labels (“malware” or “goodware”)
- Non-malicious apps (in a .zip file)
- App metadata, such as the app release date
- Set of instructions for the apps
- The apps’ manifests and characteristics (features)
The data has been collected via AndroZoo from 2019 to 2023. VirusTotal and VirusShare were used to classify and label the data. The benchmark does not include malicious apps themselves, Maura Pintor explains. It only contains instructions on where and how to download the apps, along with the precautions to take. “It is important that the use case does not distribute malicious software itself”, she states.
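The manifests and characteristics in the benchmark are typically turned into binary feature vectors before a statistical model can be trained on them. The following sketch shows that encoding step in its simplest form; the permission names are illustrative, and the benchmark’s actual schema may differ.

```python
# Minimal sketch: encode each app's manifest permissions as a binary
# feature vector over a shared vocabulary -- the typical input format
# for a statistical malware detector. Permission names are illustrative.
def build_vocab(manifests: list[set[str]]) -> list[str]:
    """Collect every permission seen across all apps, in a fixed order."""
    return sorted(set().union(*manifests))


def to_vector(permissions: set[str], vocab: list[str]) -> list[int]:
    """One binary entry per vocabulary permission: 1 if requested, else 0."""
    return [1 if p in permissions else 0 for p in vocab]


apps = [
    {"INTERNET", "CAMERA"},
    {"INTERNET", "READ_CONTACTS"},
]
vocab = build_vocab(apps)  # ['CAMERA', 'INTERNET', 'READ_CONTACTS']
vectors = [to_vector(a, vocab) for a in apps]
```

With labels attached, such vectors can be fed to any off-the-shelf classifier.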
The use cases’ own model has been trained on this data, providing researchers with all the information they need to determine whether it correctly classifies malicious apps.
In true ELSA style, the use case outlined several challenges to put their model to the acid test.
Maura Pintor: “UniCa planned and supervised the competition along with the SaTML 2025 conference. For the competition, we released the datasets and benchmarks and leveraged the SaTML conference to promote the challenge and attract submissions. We organised three tracks to test our model’s robustness:
- Track 1: Adversarial Robustness to Feature-space Attacks.
- Track 2: Adversarial Robustness to Problem-space Attacks.
- Track 3: Temporal Robustness to Data Drift.
The competition gave us deep insights into our model’s performance: Does it stay robust under attacks and over time? Which methods destabilise our model? Which ones beat it and stopped it from detecting malware? Is robustness to both attacks and temporal decay achievable?”
Via these challenges, the use case found that “the most effective methods are those that don’t give too much weight to one piece of information, but combine several relevant characteristics to make their final decision,” Pintor shares. Her colleague Angelo Sotgiu, Assistant Professor at the University of Cagliari, also played a significant role in organising the competitions.
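The feature-space attacks of Track 1 can be made concrete with a toy sketch. Against a linear scorer, an attacker adds benign-looking features (flipping 0 to 1) with the most score-reducing weights until the sample drops below the detection threshold. The weights, threshold, and budget below are made up for illustration; real attacks must additionally respect problem-space constraints, which is what Track 2 evaluates.

```python
# Toy feature-space evasion sketch (Track 1 style) against a linear
# detector. All numbers are illustrative, not from the use case.
def evade(x: list[int], w: list[float], threshold: float, budget: int) -> list[int]:
    """Greedily add absent features with negative weights to lower the score."""
    x = x[:]
    # Candidate feature additions, most score-reducing first.
    candidates = sorted(
        (i for i in range(len(x)) if x[i] == 0 and w[i] < 0),
        key=lambda i: w[i],
    )
    for i in candidates[:budget]:
        x[i] = 1
        if sum(wi * xi for wi, xi in zip(w, x)) < threshold:
            break  # sample now classified as benign
    return x


w = [2.0, -1.5, 0.5, -2.0]  # positive weights indicate malicious traits
x = [1, 0, 1, 0]            # initial malicious sample, score 2.5
adv = evade(x, w, threshold=0.0, budget=2)
```

This also hints at Pintor’s finding: a detector that concentrates its weight on a few features is exactly the kind this greedy attack defeats cheaply.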
3. The Where: Practical Use
Techniques very similar to those used in Maura Pintor’s research have been applied in another cybersecurity domain: web security. Davide Ariu from Pluribus One explains:
“We have developed a complete prototype that includes the model. The research insights have also been published in a paper titled ‘ModSec-AdvLearn: Countering Adversarial SQL Injections with Robust Machine Learning’. The insights are also relevant for Seer Box and OWASP WARM. Seer Box is an Application Detection and Response (ADR) solution that detects and responds to attacks against web applications and APIs. OWASP WARM is an open-source project that improves ‘the effectiveness of the rulesets used by Web Application Firewalls (WAFs) to protect web applications and APIs from attacks’.”
The benchmark will remain available to researchers and anyone interested via the ELSA Benchmarks Platform.
4. The Collaboration: Academia and Industry Working Hand in Hand
In this use case, the University of Cagliari covers the academic part of the research, while Pluribus One serves as the connection to industry and real-world use.
Maura Pintor highlights Pluribus One’s expertise: “The team of Pluribus One helped us with their knowledge about the cybersecurity field: Is the scenario that we are painting realistic? Will it be helpful? Is the benchmark’s design realistic? Last but not least, they provided essential support to store a large amount of data.”
Davide Ariu values the stable, close connection to academia: “For Pluribus One, this is the ideal case: people in our company are working on something with impact for the market, but still do research and stay closely connected to academia. We bring research to life.”

