Publications - SECRET

Ceschin, Fabrício; Botacin, Marcus; Bifet, Albert; Pfahringer, Bernhard; Oliveira, Luiz S; Gomes, Heitor Murilo; Grégio, André

Machine Learning (In) Security: A Stream of Problems Journal Article

Digital Threats, 2023, ISSN: 2692-1626, (Just Accepted).

Tags: cybersecurity, Data streams, Machine learning

Giovanini, Luiz; Gilda, Shlok; Silva, Mirela; Ceschin, Fabrício; Shrestha, Prakash; Brant, Christopher; Fernandes, Juliana; Silva, Catia S; Grégio, André; Oliveira, Daniela

People Still Care About Facts: Twitter Users Engage More with Factual Discourse than Misinformation Inproceedings

Security and Privacy in Social Networks and Big Data, pp. 3–22, Springer Nature Singapore, Singapore, 2023, ISBN: 978-981-99-5177-2.

Pimenta, Thalita Scharr Rodrigues; Ceschin, Fabricio; Gregio, Andre

ANDROIDGYNY: Reviewing Clustering Techniques for Android Malware Family Classification Journal Article

Digital Threats, 2023, ISSN: 2692-1626, (Just Accepted).

Tags: Classification, Mobile Malware, Phylogeny

Ceschin, Fabrício; Botacin, Marcus; Gomes, Heitor Murilo; Pinagé, Felipe; Oliveira, Luiz S; Grégio, André

Fast & Furious: On the modelling of malware detection as an evolving data stream Journal Article

Expert Systems with Applications, pp. 118590, 2022, ISSN: 0957-4174.

Tags: Android, Concept drift, Data streams, Machine learning, malware detection

@article{CESCHIN2022118590,
title = {Fast \& Furious: On the modelling of malware detection as an evolving data stream},
author = {Fabrício Ceschin and Marcus Botacin and Heitor Murilo Gomes and Felipe Pinagé and Luiz S Oliveira and André Grégio},
url = {https://www.sciencedirect.com/science/article/pii/S0957417422016463
https://secret.inf.ufpr.br/papers/fabricio_eswa_22.pdf},
doi = {https://doi.org/10.1016/j.eswa.2022.118590},
issn = {0957-4174},
year = {2022},
date = {2022-08-22},
journal = {Expert Systems with Applications},
pages = {118590},
abstract = {Malware is a major threat to computer systems and imposes many challenges to cyber security. Targeted threats, such as ransomware, cause millions of dollars in losses every year. The constant increase of malware infections has been motivating popular antiviruses (AVs) to develop dedicated detection strategies, which include meticulously crafted machine learning (ML) pipelines. However, malware developers unceasingly change their samples’ features to bypass detection. This constant evolution of malware samples causes changes to the data distribution (i.e., concept drifts) that directly affect ML model detection rates, something not considered in the majority of the literature work. In this work, we evaluate the impact of concept drift on malware classifiers for two Android datasets: DREBIN (≈130K apps) and a subset of AndroZoo (≈285K apps). We used these datasets to train an Adaptive Random Forest (ARF) classifier, as well as a Stochastic Gradient Descent (SGD) classifier. We also ordered all datasets samples using their VirusTotal submission timestamp and then extracted features from their textual attributes using two algorithms (Word2Vec and TF-IDF). Then, we conducted experiments comparing both feature extractors, classifiers, as well as four drift detectors (Drift Detection Method, Early Drift Detection Method, ADaptive WINdowing, and Kolmogorov–Smirnov WINdowing) to determine the best approach for real environments. Finally, we compare some possible approaches to mitigate concept drift and propose a novel data stream pipeline that updates both the classifier and the feature extractor. 
To do so, we conducted a longitudinal evaluation by (i) classifying malware samples collected over nine years (2009–2018), (ii) reviewing concept drift detection algorithms to attest its pervasiveness, (iii) comparing distinct ML approaches to mitigate the issue, and (iv) proposing an ML data stream pipeline that outperformed literature approaches, achieving an improvement of 22.05 percentage points of F1Score in the DREBIN dataset, and 8.77 in the AndroZoo dataset.},
keywords = {Android, Concept drift, Data streams, Machine learning, malware detection},
pubstate = {published},
tppubtype = {article}
}

Giovanini, Luiz; Ceschin, Fabrício; Silva, Mirela; Chen, Aokun; Kulkarni, Ramchandra; Banda, Sanjay; Lysaght, Madison; Qiao, Heng; Sapountzis, Nikolaos; Sun, Ruimin; Matthews, Brandon; Wu, Dapeng Oliver; Grégio, André; Oliveira, Daniela

Online Binary Models are Promising for Distinguishing Temporally Consistent Computer Usage Profiles Journal Article

IEEE Transactions on Biometrics, Behavior, and Identity Science, pp. 1-1, 2022.

Botacin, Marcus; Moreira, Francis B; Navaux, Philippe O A; Grégio, André; Alves, Marco A Z

Terminator: A Secure Coprocessor to Accelerate Real-Time AntiViruses Using Inspection Breakpoints Journal Article

ACM Trans. Priv. Secur., 25 (2), 2022, ISSN: 2471-2566.

Tags: antivirus, coprocessor, malware

@article{10.1145/3494535,
title = {Terminator: A Secure Coprocessor to Accelerate Real-Time AntiViruses Using Inspection Breakpoints},
author = {Marcus Botacin and Francis B Moreira and Philippe O A Navaux and André Grégio and Marco A Z Alves},
url = {https://doi.org/10.1145/3494535
https://secret.inf.ufpr.br/papers/marcus_coproc.pdf},
doi = {10.1145/3494535},
issn = {2471-2566},
year = {2022},
date = {2022-03-01},
journal = {ACM Trans. Priv. Secur.},
volume = {25},
number = {2},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
abstract = {AntiViruses (AVs) are essential to face the myriad of malware threatening Internet users. AVs operate in two modes: on-demand checks and real-time verification. Software-based real-time AVs intercept system and function calls to execute AV’s inspection routines, resulting in significant performance penalties as the monitoring code runs among the suspicious code. Simultaneously, dark silicon problems push the industry to add more specialized accelerators inside the processor to mitigate these integration problems. In this article, we propose Terminator, an AV-specific coprocessor to assist software AVs by outsourcing their matching procedures to the hardware, thus saving CPU cycles and mitigating performance degradation. We designed Terminator to be flexible and compatible with existing AVs by using YARA and ClamAV rules. Our experiments show that our approach can save up to 70 million CPU cycles per rule when outsourcing on-demand checks for matching typical, unmodified YARA rules against a dataset of 30 thousand in-the-wild malware samples. Our proposal eliminates the AV’s need for blocking the CPU to perform full system checks, which can now occur in parallel. We also designed a new inspection breakpoint mechanism that signals to the coprocessor the beginning of a monitored region, allowing it to scan the regions in parallel with their execution. Overall, our mechanism mitigated up to 44% of the overhead imposed to execute and monitor the SPEC benchmark applications in the most challenging scenario.},
keywords = {antivirus, coprocessor, malware},
pubstate = {published},
tppubtype = {article}
}

Botacin, Marcus; Alves, Marco Zanata; Oliveira, Daniela; Grégio, André

HEAVEN: A Hardware-Enhanced AntiVirus ENgine to accelerate real-time, signature-based malware detection Journal Article

Expert Systems with Applications, pp. 117083, 2022, ISSN: 0957-4174.

Tags: antivirus, Branch prediction, malware, Performance, Signatures

@article{BOTACIN2022117083,
title = {HEAVEN: A Hardware-Enhanced AntiVirus ENgine to accelerate real-time, signature-based malware detection},
author = {Marcus Botacin and Marco Zanata Alves and Daniela Oliveira and André Grégio},
url = {https://www.sciencedirect.com/science/article/pii/S0957417422004882
https://secret.inf.ufpr.br/papers/marcus_heaven.pdf},
doi = {https://doi.org/10.1016/j.eswa.2022.117083},
issn = {0957-4174},
year = {2022},
date = {2022-01-01},
journal = {Expert Systems with Applications},
pages = {117083},
abstract = {Antiviruses (AVs) are computing-intensive applications that rely on constant monitoring of OS events and on applying pattern matching procedures on binaries to detect malware. In this paper, we introduce HEAVEN, a framework for Intel x86/x86-64 and MS Windows that combines hardware and software to improve AVs’ performance. The HEAVEN workflow consists of a hardware-assisted signature matching process as its first step (triage), which is fast, and only invokes the software-based AV when the software is suspicious, i.e., with an unknown hardware signature for malignity. We implement a PoC for HEAVEN by instrumenting Intel’s x86/x86-64 branch predictor, which allows for the generation of malware signatures based on branch pattern history. To validate our PoC, we evaluate HEAVEN with a dataset composed of 10,000 malware and 1,000 benign software samples from different categories and accomplished malware detection rates of 100% (no false-positives). The detection occurred before the execution of 10% of the samples’ code. HEAVEN is designed to be memory efficient: it identified unique 32-bit signatures for each sample at the storage cost of only 35KB of SRAM. HEAVEN is also designed with processing efficiency in mind: its hardware extensions present negligible performance overhead and reduce the average workload of the chosen software AV counterpart (ClamWin)—10% for CPU usage, 5.61% for memory throughput, 16.22% for disk writes, and 20.22% for disk reads. With HEAVEN, we may decrease the number of CPU cycles used for malware scanning by 87.5%, which is a promising result regarding the feasibility of our proposal: the combination of hardware-/software-based AVs for practical and effective malware detection that flags suspicious software while posing negligible performance overhead.},
keywords = {antivirus, Branch prediction, malware, Performance, Signatures},
pubstate = {published},
tppubtype = {article}
}

Botacin, Marcus; Grégio, André

Why We Need a Theory of Maliciousness: Hardware Performance Counters in Security Inproceedings

Susilo, Willy; Chen, Xiaofeng; Guo, Fuchun; Zhang, Yudi; Intan, Rolly (Ed.): Information Security, pp. 381–389, Springer International Publishing, Cham, 2022, ISBN: 978-3-031-22390-7.

Botacin, Marcus; Grégio, André

Dissecting Applications Uninstallers and Removers: Are They Effective? Inproceedings

Susilo, Willy; Chen, Xiaofeng; Guo, Fuchun; Zhang, Yudi; Intan, Rolly (Ed.): Information Security, pp. 339–359, Springer International Publishing, Cham, 2022, ISBN: 978-3-031-22390-7.

@inproceedings{10.1007/978-3-031-22390-7_20,
title = {Dissecting Applications Uninstallers and Removers: Are They Effective?},
author = {Marcus Botacin and André Grégio},
editor = {Willy Susilo and Xiaofeng Chen and Fuchun Guo and Yudi Zhang and Rolly Intan},
url = {https://link.springer.com/chapter/10.1007/978-3-031-22390-7_20
https://secret.inf.ufpr.br/papers/isc_uninstallers.pdf},
isbn = {978-3-031-22390-7},
year = {2022},
date = {2022-01-01},
booktitle = {Information Security},
pages = {339--359},
publisher = {Springer International Publishing},
address = {Cham},
abstract = {Developing a safe application is as important as properly installing it in a system, rather than a tampered version of it. On a similar note, developers should take proper care of applications' uninstall process to avoid leaving traces of sensitive data behind in the system or interfering with the remaining applications. Until now, the academic literature has paid little attention to uninstall procedures. Moreover, a whole ecosystem of application uninstallers has been created, making multiple uninstallers available in software repositories. A key point is to understand how these applications work so as to develop stronger systems. To this end, we present a landscape work evaluating the operation of the 11 most downloaded uninstaller applications from the three most popular Internet software repositories. We discovered that most of these applications are not very different from the native Windows uninstaller. Although evaluated uninstallers present a more organized User Interface, thus enhancing usability, they are only able to find the same installed applications as the native Windows uninstaller, but not broken installations. Few uninstallers apply heuristics to find broken application installations. However, we show that those heuristics can be abused by attackers to remove third applications. Finally, we also show that none of the removers is resistant to malicious uninstallers that terminate the remover process.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Botacin, Marcus; Aghakhani, Hojjat; Ortolani, Stefano; Kruegel, Christopher; Vigna, Giovanni; Oliveira, Daniela; Geus, Paulo Lício De; Grégio, André

One Size Does Not Fit All: A Longitudinal Analysis of Brazilian Financial Malware Journal Article

ACM Trans. Priv. Secur., 24 (2), 2021, ISSN: 2471-2566.

Tags: banking, malware, reverse engineer

@article{10.1145/3429741,
title = {One Size Does Not Fit All: A Longitudinal Analysis of Brazilian Financial Malware},
author = {Marcus Botacin and Hojjat Aghakhani and Stefano Ortolani and Christopher Kruegel and Giovanni Vigna and Daniela Oliveira and Paulo Lício De Geus and André Grégio},
url = {https://doi.org/10.1145/3429741
https://secret.inf.ufpr.br/papers/marcus_tops_br.pdf},
doi = {10.1145/3429741},
issn = {2471-2566},
year = {2021},
date = {2021-01-01},
journal = {ACM Trans. Priv. Secur.},
volume = {24},
number = {2},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
abstract = {Malware analysis is an essential task to understand infection campaigns, the behavior of malicious codes, and possible ways to mitigate threats. Malware analysis also allows better assessment of attackers’ capabilities, techniques, and processes. Although a substantial amount of previous work provided a comprehensive analysis of the international malware ecosystem, research on regionalized, country-, and population-specific malware campaigns has been scarce. Moving towards addressing this gap, we conducted a longitudinal (2012-2020) and comprehensive (encompassing an entire population of online banking users) study of MS Windows desktop malware that actually infected Brazilian banks’ users. We found that the Brazilian financial desktop malware has been evolving quickly: it started to make use of a variety of file formats instead of typical PE binaries, relied on native system resources, and abused obfuscation techniques to bypass detection mechanisms. Our study on the threats targeting a significant population on the ecosystem of the largest and most populous country in Latin America can provide invaluable insights that may be applied to other countries’ user populations, especially those in the developing world that might face cultural peculiarities similar to Brazil’s. With this evaluation, we expect to motivate the security community/industry to seriously consider a deeper level of customization during the development of next-generation anti-malware solutions, as well as to raise awareness towards regionalized and targeted Internet threats.},
keywords = {banking, malware, reverse engineer},
pubstate = {published},
tppubtype = {article}
}

Botacin, Marcus; Ceschin, Fabricio; Sun, Ruimin; Oliveira, Daniela; Grégio, André

Challenges and Pitfalls in Malware Research Journal Article

Computers & Security, pp. 102287, 2021, ISSN: 0167-4048.

@article{BOTACIN2021102287,
title = {Challenges and Pitfalls in Malware Research},
author = {Marcus Botacin and Fabricio Ceschin and Ruimin Sun and Daniela Oliveira and André Grégio},
url = {https://www.sciencedirect.com/science/article/pii/S0167404821001115
https://secret.inf.ufpr.br/papers/marcus_challenges.pdf},
doi = {https://doi.org/10.1016/j.cose.2021.102287},
issn = {0167-4048},
year = {2021},
date = {2021-01-01},
journal = {Computers \& Security},
pages = {102287},
abstract = {As the malware research field became more established over the last two decades, new research questions arose, such as how to make malware research reproducible, how to bring scientific rigor to attack papers, or what is an appropriate malware dataset for relevant experimental results. The challenges these questions pose also bring pitfalls that affect the multiple malware research stakeholders. To help answer those questions and to highlight potential research pitfalls to be avoided, in this paper, we present a systematic literature review of 491 papers on malware research published in major security conferences between 2000 and 2018. We identified the most common pitfalls present in past literature and propose a method for assessing current (and future) malware research. Our goal is towards integrating science and engineering best practices to develop further, improved research by learning from issues present in the published body of work. As far as we know, this is the largest literature review of its kind and the first to summarize research pitfalls in a research methodology that avoids them. In total, we discovered 20 pitfalls that limit current research impact and reproducibility. The identified pitfalls range from (i) the lack of a proper threat model, that complicates paper’s evaluation, to (ii) the use of closed-source solutions and private datasets, that limit reproducibility. We also report yet-to-be-overcome challenges that are inherent to the malware nature, such as non-deterministic analysis results. 
Based on our findings, we propose a set of actions to be taken by the malware research and development community for future work: (i) Consolidation of malware research as a field constituted of diverse research approaches (e.g., engineering solutions, offensive research, landscapes/observational studies, and network traffic/system traces analysis); (ii) design of engineering solutions with clearer, direct assumptions (e.g., positioning solutions as proofs-of-concept vs. deployable); (iii) design of experiments that reflects (and emphasizes) the target scenario for the proposed solution (e.g., corporation, user, country-wide); (iv) clearer exposition and discussion of limitations of used technologies and exercised norms/standards for research (e.g., the use of closed-source antiviruses as ground-truth).},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Botacin, Marcus; Moia, Vitor Hugo Galhardo; Ceschin, Fabricio; Henriques, Marco Amaral A; Grégio, André

Understanding uses and misuses of similarity hashing functions for malware detection and family clustering in actual scenarios Journal Article

Forensic Science International: Digital Investigation, 38, pp. 301220, 2021, ISSN: 2666-2817.

@article{BOTACIN2021301220,
title = {Understanding uses and misuses of similarity hashing functions for malware detection and family clustering in actual scenarios},
author = {Marcus Botacin and Vitor Hugo Galhardo Moia and Fabricio Ceschin and Marco A Amaral Henriques and André Grégio},
url = {https://www.sciencedirect.com/science/article/pii/S2666281721001281
https://secret.inf.ufpr.br/papers/marcus_similarity_hashing.pdf},
doi = {https://doi.org/10.1016/j.fsidi.2021.301220},
issn = {2666-2817},
year = {2021},
date = {2021-01-01},
journal = {Forensic Science International: Digital Investigation},
volume = {38},
pages = {301220},
abstract = {An everyday growing number of malware variants target end-users and organizations. To reduce the amount of individual malware handling, security analysts apply techniques for finding similarities to cluster samples. A popular clustering method relies on similarity hashing functions, which create short representations of files and compare them to produce a score related to the similarity level between them. Despite the popularity of those functions, the limits of their application to malware samples have not been extensively studied so far. To help in bridging this gap, we performed a set of experiments to characterize the application of these functions on long-term, realistic malware analysis scenarios. To do so, we introduce SHAVE, an ideal model of similarity hashing-based antivirus engine. The evaluation of SHAVE consisted of applying two distinct hash functions (ssdeep and sdhash) to a dataset of 21 thousand actual malware samples collected over four years. We characterized this dataset based on the performed clustering, and discovered that: (i) smaller groups are more prevalent than large ones; (ii) the threshold value chosen may significantly change the conclusions about the prevalence of similar samples in a given dataset; (iii) establishing a ground-truth for similarity hashing functions comparison has its issues, since the clusters originated from traditional AV labeling routines may result from a completely distinct approach; (iv) the application of similarity hashing functions improves traditional AVs’ detection rates by up to 40%; and finally (v) taking specific binary regions into account (e.g., instructions), leads to better classification results than hashing the entire binary file.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Botacin, Marcus; Domingues, Felipe Duarte; Ceschin, Fabrício; Machnicki, Raphael; Alves, Marco Antonio Zanata; de Geus, Paulo Lício; Grégio, André

AntiViruses under the Microscope: A Hands-On Perspective Journal Article

Computers & Security, pp. 102500, 2021, ISSN: 0167-4048.

@article{BOTACIN2021102500,
title = {AntiViruses under the Microscope: A Hands-On Perspective},
author = {Marcus Botacin and Felipe Duarte Domingues and Fabrício Ceschin and Raphael Machnicki and Marco Antonio Zanata Alves and Paulo Lício de Geus and André Grégio},
url = {https://www.sciencedirect.com/science/article/pii/S0167404821003242
https://secret.inf.ufpr.br/papers/marcus_av_handson.pdf},
doi = {https://doi.org/10.1016/j.cose.2021.102500},
issn = {0167-4048},
year = {2021},
date = {2021-01-01},
journal = {Computers \& Security},
pages = {102500},
abstract = {AntiViruses (AVs) are the main defense line against attacks for most users and much research has been done about them, especially proposing new detection procedures that work in academic prototypes. However, as most current and commercial AVs are closed-source solutions, in practice, little is known about their real internals: information such as what is a typical AV database size, the detection methods effectively used in each operation mode, and how often on average the AVs are updated are still unknown. This prevents research work from meeting the industrial practices more thoroughly. To fill this gap, in this work, we systematize the knowledge about AVs. To do so, we first surveyed the literature and identified existing knowledge gaps in AV internals’ working. Further, we bridged these gaps by analyzing popular (Windows, Linux, and Android) AV solutions to check their operations in practice. Our methodology encompassed multiple techniques, from tracing to fuzzing. We detail current AV’s architecture, including their multiple components, such as browser extensions and injected libraries, regarding their implementation, monitoring features, and self-protection capabilities. We discovered, for instance, a great disparity in the set of API functions hooked by the distinct AV’s libraries, which might have a significant impact in the viability of academically-proposed detection models (e.g., machine learning-based ones).},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Ceschin, Fabricio; Botacin, Marcus; Lüders, Gabriel; Gomes, Heitor Murilo; Oliveira, Luiz; Gregio, Andre

No Need to Teach New Tricks to Old Malware: Winning an Evasion Challenge with XOR-Based Adversarial Samples Inproceedings

Reversing and Offensive-Oriented Trends Symposium, pp. 13–22, Association for Computing Machinery, Vienna, Austria, 2020, ISBN: 9781450389747.

Botacin, Marcus; Ceschin, Fabricio; de Geus, Paulo; Grégio, André

We Need to Talk About AntiViruses: Challenges & Pitfalls of AV Evaluations Journal Article

Computers & Security, pp. 101859, 2020, ISSN: 0167-4048.

@article{BOTACIN2020101859,
title = {We Need to Talk About AntiViruses: Challenges \& Pitfalls of AV Evaluations},
author = {Marcus Botacin and Fabricio Ceschin and Paulo de Geus and André Grégio},
url = {http://www.sciencedirect.com/science/article/pii/S0167404820301310
https://secret.inf.ufpr.br/papers/marcus_av.pdf},
doi = {https://doi.org/10.1016/j.cose.2020.101859},
issn = {0167-4048},
year = {2020},
date = {2020-04-29},
journal = {Computers \& Security},
pages = {101859},
abstract = {Security evaluation is an essential task to identify the level of protection accomplished in running systems or to aid in choosing better solutions for each specific scenario. Although antiviruses (AVs) are one of the main defensive solutions for most end-users and corporations, AV evaluations are conducted by few organizations and often limited to comparing detection rates. Moreover, other important factors of AVs’ operating mode (e.g., response time and detection regression) are usually underestimated. Ignoring such factors creates an “understanding gap” on the effectiveness of AVs in actual scenarios, which we aim to bridge by presenting a broader characterization of current AVs’ modes of operation. In our characterization, we consider distinct file types, operating systems, datasets, and time frames. To do so, we daily collected samples from two distinct, representative malware sources and submitted them to the VirusTotal (VT) service for 30 consecutive days. In total, we considered 28,875 unique malware samples. For each day, we retrieved the submitted samples’ detection rates and assigned labels, resulting in more than 1M distinct VT submissions overall. Our experimental results show that: (i) phishing contexts are a challenge for all AVs, turning malicious Web pages detectors less effective than malicious files detectors; (ii) generic procedures are insufficient to ensure broad detection coverage, incurring in lower detection rates for particular datasets (e.g., country-specific) than for those with world-wide collected samples; (iii) detection rates are unstable since all AVs presented detection regression effects after scans in different time frames using the same dataset and (iv) AVs’ long response times in delivering new signatures/heuristics create a significant attack opportunity window within the first 30 days after we first identified a malicious binary. 
To address the effects of our findings, we propose six new metrics to evaluate the multiple aspects that impact the effectiveness of AVs. With them, we hope to assist corporate (and domestic) users to better evaluate the solutions that fit their needs more adequately.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Botacin, Marcus; de Geus, Paulo Lício; Grégio, André

Leveraging branch traces to understand kernel internals from within Journal Article

Journal of Computer Virology and Hacking Techniques, 2020, ISSN: 2263-8733.

Botacin, Marcus; Zanata, Marco; Grégio, André

The self modifying code (SMC)-aware processor (SAP): a security look on architectural impact and support Journal Article

Journal of Computer Virology and Hacking Techniques, 2020, ISSN: 2263-8733.

Sun, R; Botacin, M; Sapountzis, N; Yuan, X; Bishop, M; Porter, D E; Li, X; Gregio, A; Oliveira, D

A Praise for Defensive Programming: Leveraging Uncertainty for Effective Malware Mitigation Journal Article

IEEE Transactions on Dependable and Secure Computing, pp. 1-1, 2020.

Botacin, Marcus; Bertão, Giovanni; de Geus, Paulo; Grégio, André; Kruegel, Christopher; Vigna, Giovanni

On the Security of Application Installers and Online Software Repositories Conference

Detection of Intrusions and Malware, and Vulnerability Assessment, Springer International Publishing, Cham, 2020, ISBN: 978-3-030-52683-2.

@conference{10.1007/978-3-030-52683-2_10b,
title = {On the Security of Application Installers and Online Software Repositories},
author = {Marcus Botacin and Giovanni Bertão and Paulo de Geus and André Grégio and Christopher Kruegel and Giovanni Vigna},
editor = {Clémentine Maurice and Leyla Bilge and Gianluca Stringhini and Nuno Neves},
url = {https://link.springer.com/chapter/10.1007/978-3-030-52683-2_10
https://secret.inf.ufpr.br/papers/marcus_dimva_bundle.pdf},
isbn = {978-3-030-52683-2},
year = {2020},
date = {2020-01-01},
booktitle = {Detection of Intrusions and Malware, and Vulnerability Assessment},
pages = {192--214},
publisher = {Springer International Publishing},
address = {Cham},
abstract = {The security of application installers is often overlooked, but the security risks associated with these pieces of code are not negligible. Online public repositories have been one of the most popular ways for end users to obtain software, but there is a lack of systematic security evaluation of popular public repositories. In this paper, we bridge this gap by analyzing five popular software repositories. We focus on their software updating dynamics, as well as the presence of traces of vulnerable and/or trojanized applications among the top-100 most downloaded Windows programs on each of the evaluated repositories. We analyzed 2,935 unique programs collected in a period of 144 consecutive days. Our results show that: (i) the repositories frequently exhibit rank changes due to applications quickly climbing toward the first positions; (ii) the repositories often update their payloads, which may cause the distribution of distinct binaries for the same intended application (binaries for the same applications may also be different in each repository); (iii) the installers are composed of multiple components and often download payloads from the Internet to complete their installation steps, posing new risks for users (we demonstrate that some installers are vulnerable to content tampering through man-in-the-middle attacks); (iv) the ever-changing nature of repositories and installers makes them prone to abuse, as we observed that 30% of all applications were reported malicious by at least one AV.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}

Botacin, Marcus; Grégio, André; Alves, Marco Antonio Zanata

Near-Memory & In-Memory Detection of Fileless Malware Inproceedings

The International Symposium on Memory Systems, pp. 23–38, Association for Computing Machinery, Washington, DC, USA, 2020, ISBN: 9781450388993.

Tags: antivirus, malware, pattern matching, processing in memory

Botacin, Marcus; Galante, Lucas; de Geus, Paulo; Grégio, André

RevEngE is a Dish Served Cold: Debug-Oriented Malware Decompilation and Reassembly Inproceedings

Proceedings of the 3rd Reversing and Offensive-Oriented Trends Symposium, Association for Computing Machinery, Vienna, Austria, 2019, ISBN: 9781450377751.

@inproceedings{10.1145/3375894.3375895,
title = {RevEngE is a Dish Served Cold: Debug-Oriented Malware Decompilation and Reassembly},
author = {Marcus Botacin and Lucas Galante and Paulo de Geus and André Grégio},
url = {https://doi.org/10.1145/3375894.3375895
https://secret.inf.ufpr.br/papers/roots_revenge.pdf},
doi = {10.1145/3375894.3375895},
isbn = {9781450377751},
year = {2019},
date = {2019-11-28},
booktitle = {Proceedings of the 3rd Reversing and Offensive-Oriented Trends Symposium},
publisher = {Association for Computing Machinery},
address = {Vienna, Austria},
series = {ROOTS’19},
abstract = {Malware analysis is key to the overall improvement of cybersecurity. Analysis tools have been evolving from complete static analyzers to decompilers. Malware decompilation allows for code inspection at higher abstraction levels, easing incident response. However, the decompilation procedure has many challenges, such as opaque constructions, irreversible mappings, and semantic gap bridging, among others. In this paper, we propose a new approach that leverages the human analyst's expertise to overcome decompilation challenges. We name this approach "DoD---debug-oriented decompilation", in which the analyst is able to reverse engineer the malware sample on his own and to instruct the decompiler to translate selected code portions (e.g., decision branches, fingerprinting functions, payloads, etc.) into high-level code. With DoD, the analyst might group all decompiled pieces into new code to be analyzed by another tool, or develop a novel malware sample from previous pieces of code and thus exercise a Proof-of-Concept (PoC). To validate our approach, we propose RevEngE, the Reverse Engineering Engine for malware decompilation and reassembly, a set of GDB extensions that intercept and introspect into executed functions to build an Intermediate Representation (IR) in real-time, enabling any-time decompilation. We evaluate RevEngE with x86 ELF binaries collected from VirusShare, and show that a new malware sample created from the decompilation of independent functions of five known malware samples is considered "clean" by all VirusTotal's AVs.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Ceschin, Fabrício; Botacin, Marcus; Gomes, Heitor Murilo; Oliveira, Luiz S; Grégio, André

Shallow Security: On the Creation of Adversarial Variants to Evade Machine Learning-Based Malware Detectors Inproceedings

Proceedings of the 3rd Reversing and Offensive-Oriented Trends Symposium, Association for Computing Machinery, Vienna, Austria, 2019, ISBN: 9781450377751.

Botacin, Marcus; de Geus, Paulo Lício; Grégio, André

“VANILLA” malware: vanishing antiviruses by interleaving layers and layers of attacks Journal Article

Journal of Computer Virology and Hacking Techniques, 2019, ISSN: 2263-8733.

@article{Botacin2019,
title = {``VANILLA'' malware: vanishing antiviruses by interleaving layers and layers of attacks},
author = {Marcus Botacin and Paulo Lício de Geus and André Grégio},
url = {https://secret.inf.ufpr.br/papers/marcus-vanilla.pdf
https://doi.org/10.1007/s11416-019-00333-y},
doi = {10.1007/s11416-019-00333-y},
issn = {2263-8733},
year = {2019},
date = {2019-06-11},
journal = {Journal of Computer Virology and Hacking Techniques},
abstract = {Malware is a persistent threat to any networked system. The recent increase in multi-core, distributed systems has created new opportunities for malware authors to exploit such capabilities. In particular, the distributed execution of malware across multiple cores may be used to evade currently widespread single-core-based detectors (e.g., antiviruses, or AVs) and malware analysis solutions that are unable to correlate data from multiple sources. In this paper, we propose a technique for distributing the malware functions in several distinct ``vanilla'' processes to show that AVs can be easily evaded. Our technique thus allows malware to interleave layers of attacks to remain undetected by current AVs. Our goal is to expose a real menace and to discuss it so as to provide insights for the development of better AVs. We discuss the role of distributed and multicore-based malware in current and future threat scenarios with practical examples that we specially crafted for testing (e.g., a distributed sample synchronized via cache side channels). We (i) review multi-threaded/processed implementation issues (from kernel and userland) and present a multi-core-based monitoring solution; (ii) present strategies for code distribution, exemplified via DLL injectors, and discuss their weak and strong points; and (iii) evaluate how real security solutions perform when exposed to distributed malware. We converted real, serial malware to parallel code and showed that current AVs are not fully able to detect multi-core malware.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Botacin, Marcus; Galante, Lucas; Ceschin, Fabricio; Santos, Paulo Cesar; Carro, Luigi; de Geus, Paulo Licio; Gregio, Andre; Zanata, Marco

The AV says: Your hardware definitions were updated! Conference

14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC 2019), IEEE, 2019, ISBN: 978-1-7281-4770-3.

Botacin, Marcus; Kalysch, Anatoli; Grégio, André

The Internet Banking [in]Security Spiral: Past, Present, and Future of Online Banking Protection Mechanisms Based on a Brazilian Case Study Inproceedings

Proceedings of the 14th International Conference on Availability, Reliability and Security, pp. 49:1–49:10, ACM, Canterbury, CA, United Kingdom, 2019, ISBN: 978-1-4503-7164-3.

Beppler, Tamy; Botacin, Marcus; Ceschin, Fabrício; Oliveira, Luiz E S; Grégio, André

L(a)ying in (Test)Bed: How Biased Datasets Produce Impractical Results for Actual Malware Families’ Classification Inproceedings

Lin, Zhiqiang; Papamanthou, Charalampos; Polychronakis, Michalis (Ed.): Information Security, pp. 381–401, Springer International Publishing, Cham, 2019, ISBN: 978-3-030-30215-3.

Tags: learning (artificial intelligence)

@inproceedings{10.1007/978-3-030-30215-3_19,
title = {L(a)ying in (Test)Bed: How Biased Datasets Produce Impractical Results for Actual Malware Families’ Classification},
author = {Tamy Beppler and Marcus Botacin and Fabrício Ceschin and Luiz E S Oliveira and André Grégio},
editor = {Zhiqiang Lin and Charalampos Papamanthou and Michalis Polychronakis},
url = {https://link.springer.com/chapter/10.1007/978-3-030-30215-3_19
https://secret.inf.ufpr.br//papers/malware_textures_tamy.pdf},
isbn = {978-3-030-30215-3},
year = {2019},
date = {2019-01-01},
booktitle = {Information Security},
pages = {381--401},
publisher = {Springer International Publishing},
address = {Cham},
abstract = {The number of malware variants released daily has turned manual analysis into an impractical task. Although potentially faster, automated analysis techniques (e.g., static and dynamic) have shortcomings that are exploited by malware authors to thwart each of them, i.e., prevent malicious software from being detected or classified accordingly. Researchers then invested in traditional machine learning algorithms to try to produce efficient, effective classification methods. The produced models are also prone to errors and attacks. Novel representations of the ``subject'' were proposed to overcome previous limitations, such as malware textures. In this paper, our initial proposal was to evaluate the application of texture analysis for malware classification using samples collected in-the-wild in order to compare them with state-of-the-art results. During our tests, we discovered that texture analysis may be unfeasible for the task at hand, if we use the same malware representation employed by other authors. Furthermore, we also discovered that naive premises associated with the selection of samples in the datasets caused the introduction of biases that, in the end, produced unrealistic results. Finally, our tests with a broader unfiltered dataset show that texture analysis may be impractical for correct malware classification in a real world scenario, in which there is a great variety of families and some of them make use of quite sophisticated obfuscation techniques.},
keywords = {learning (artificial intelligence)},
pubstate = {published},
tppubtype = {inproceedings}
}

Ceschin, Fabrício; Pinage, Felipe; Castilho, Marcos; Menotti, David; Oliveira, Luis S; Gregio, André

The Need for Speed: An Analysis of Brazilian Malware Classifiers Journal Article

IEEE Security & Privacy, 16 (6), pp. 31-41, 2018, ISSN: 1540-7993.

Tags: Brazilian malware classifiers, Feature extraction, invasive software, learning (artificial intelligence), Machine learning, machine-learning systems, malware, malware classification, pattern classification, security, Security of data, Support vector machines

Botacin, Marcus; de Geus, Paulo Lício; Grégio, André

The other guys: automated analysis of marginalized malware Journal Article

Journal of Computer Virology and Hacking Techniques, 14 (1), pp. 87–98, 2018, ISSN: 2263-8733.

Botacin, Marcus; de Geus, Paulo Lício; Grégio, André

Who Watches the Watchmen: A Security-focused Review on Current State-of-the-art Techniques, Tools, and Methods for Systems and Binary Analysis on Modern Platforms Journal Article

ACM Comput. Surv., 51 (4), pp. 69:1–69:34, 2018, ISSN: 0360-0300.

Tags: Binary analysis, HVM, introspection, malware, security, SMM

Botacin, Marcus; de Geus, Paulo Lício; Grégio, André

Enhancing Branch Monitoring for Security Purposes: From Control Flow Integrity to Malware Analysis and Debugging Journal Article

ACM Trans. Priv. Secur., 21 (1), pp. 4:1–4:30, 2018, ISSN: 2471-2566.

Tags: branch monitor, debug, malware, ROP

Afonso, Vitor; Kalysch, Anatoli; Müller, Tilo; Oliveira, Daniela; Grégio, André; de Geus, Paulo Lício

Lumus: Dynamically Uncovering Evasive Android Applications Inproceedings

Chen, Liqun; Manulis, Mark; Schneider, Steve (Ed.): Information Security, pp. 47–66, Springer International Publishing, Cham, 2018, ISBN: 978-3-319-99136-8.

Sun, R; Yuan, X; Lee, A; Bishop, M; Porter, D E; Li, X; Grégio, André; Oliveira, Daniela

The dose makes the poison — Leveraging uncertainty for effective malware detection Inproceedings

2017 IEEE Conference on Dependable and Secure Computing, pp. 123-130, 2017.