Here we present the PhD theses and Master's dissertations developed in the SECRET lab and/or advised or co-advised by SECRET members.

Rogue One: Rebelling Against Machine Learning (In) Security
PhD Thesis by Fabrício Ceschin (UFPR)
Advisor: André Grégio (UFPR)
Co-Advisor: Heitor Murilo Gomes (Victoria University of Wellington)
Archived Here  

Machine Learning (ML) is nowadays widely used in many cybersecurity tasks and is considered state of the art because it helps improve the detection of new attacks, keeping pace with their evolution. However, ML-based solutions may be difficult to evaluate in some scenarios, making them prone to gaps and pitfalls that could invalidate their use in practice. One reason is that cybersecurity data follows a non-stationary distribution, since threats constantly change to evade detection, and this requires special attention. Thus, it is essential to know how to correctly use ML in cybersecurity, considering all the challenges faced when proposing or deploying defense solutions. In this thesis, I propose to investigate the main challenges of applying Machine Learning to cybersecurity, showing how existing solutions fail and, in some cases, proposing possible mitigations. Based on that, I present a critical analysis of the state-of-the-art literature and point out directions for future research. The main objectives of this work are to (i) understand the main problems of applying ML in cybersecurity; (ii) detect what can be improved; (iii) discuss the future of ML for security; and (iv) reduce the gap between industry and academia. Finally, the main contributions of this thesis are (i) an extensive comparative analysis of the recent literature on ML applied to cybersecurity; (ii) directions for cybersecurity research that consider its particularities and show how to correctly apply ML to improve the quality of solutions and allow their effective use in real-world applications; and (iii) a set of modules or frameworks to support and improve further ML solutions for cybersecurity that can be used by both industry and academia.

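Avaliação da Viabilidade de Modelos Filogenéticos na Classificação de Aplicações Maliciosas (Evaluating the Feasibility of Phylogenetic Models in the Classification of Malicious Applications)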
PhD Thesis by Thalita Scharr Rodrigues Pimenta (UFPR)
Advisor: André Grégio (UFPR)
Archived Here  

Thousands of malicious programs are created, modified with the support of automation tools, and released on the web every day. Among these threats, malware are programs specifically designed to interrupt, damage, or gain unauthorized access to a system or device. To facilitate the identification and categorization of common behaviors, structures, and other characteristics of malware, and thus enable the development of defense solutions, some analysis strategies classify malware into groups known as families. One of these strategies is phylogeny, a technique borrowed from biology that investigates the historical and evolutionary relationships of a species or other group of elements. In addition, clustering similar samples facilitates reverse engineering tasks when analyzing unknown variants. A variant is a new version of malicious code created by modifying existing malware. The present work investigates the feasibility of using phylogenies and clustering methods to classify malware variants for the Android platform. First, 82 related works were analyzed to identify the experimental setups used in the state of the art. After this study, four experiments were carried out to evaluate the use of similarity measures and clustering algorithms in the classification of variants and in the similarity analysis between families. In addition to these experiments, a five-phase Flow of Activities for Malware Grouping was proposed. This flow aims to help define the parameters of clustering techniques, including the similarity measure, the type of clustering algorithm, and feature selection. After defining this flow, the Androidgyny framework was proposed: a prototype for sample analysis, feature extraction, and classification of variants based on the medoids and unique features of known families.
To validate Androidgyny, two experiments were carried out: a comparison with the related tool Gefdroid and another using samples of the 25 most populous families in the AndroZoo dataset.

On the Malware Detection Problem: Challenges & Novel Approaches
PhD Thesis by Marcus Botacin (UFPR)
Advisor: André Grégio (UFPR)
Co-Advisor: Paulo Lício de Geus (UNICAMP)
Archived Here Summary Here 

Malware is a major threat to most current computer systems, causing reputational damage and financial losses to individuals and corporations, which requires the development of detection solutions that prevent malware from causing harm and allow safe computer usage. Many initiatives and solutions to detect malware have been proposed over time, from AntiViruses (AVs) to sandboxes, but effective and efficient malware detection remains an open problem. Therefore, in this work, I examine some malware detection challenges, pitfalls, and consequences to contribute towards increasing the capabilities of malware detection systems. More specifically, I propose a new approach to conducting malware research experiments in a practical yet scientific manner and leverage this approach to investigate four issues: (i) the need for understanding context to allow proper detection of localized threats; (ii) the need for developing better metrics to evaluate AV solutions; (iii) the feasibility of leveraging hardware-software collaboration for efficient AV implementation; and (iv) the need for predicting future threats to allow faster incident responses.

Need for Speed: Analysis of Brazilian Malware Classifiers’ Expiration Date
Master Dissertation by Fabrício Ceschin (UFPR)
Advisor: André Grégio (UFPR)
Co-Advisor: David Menotti (UFPR)
Archived Here

New malware variants are produced and released daily to deceive users and overcome defense solutions, thus demanding continuous improvements to these mechanisms (e.g., constant antivirus updates). Although most malware samples are “generic” enough to infect the same type of operating system worldwide, some are tied to the specificities of the cyberspace of certain target countries. In this work, we present an analysis of thousands of malware samples collected in the Brazilian cyberspace over several years, including their evolution and the impact of this evolution on malware classification. We also share a labeled dataset of this Brazilian malware set to allow other experiments and comparisons by the community. This dataset is representative of the Brazilian cyberspace and contains profiles of known-bad and known-good programs based on binaries’ static features. Our analysis leveraged machine learning algorithms (in particular, we evaluated four popular off-the-shelf classifiers: Support Vector Machines, Multilayer Perceptron, k-Nearest Neighbors (KNN), and Random Forest) to classify the programs of our dataset as malware or goodware (including experiments with thresholds) and to identify the potential concept drift that occurs when the subject of a classification scheme evolves over time. We also provide extensive details about our dataset, which is composed of 38,000 programs, 20,000 of which are labeled as known malware collected from malicious email attachments and infected users (triaged in both cases by a major Brazilian financial institution with a country-wide distributed network) between 2013 and early 2017. For the sake of reproducibility and unbiased comparison, we make the feature vectors produced from our database publicly available.
Finally, we discuss the results of the conducted experiments, whose analysis evidences the existence of concept drift in both goodware and malware and shows that seasonality cannot be established in our dataset.

Hardware-Assisted Malware Analysis
Master Dissertation by Marcus Botacin (UNICAMP)
Advisor: Paulo de Geus (UNICAMP)
Co-Advisor: André Grégio (UFPR)
Archived Here

Today’s world is driven by the use of computer systems, which are present in all aspects of everyday life. Therefore, the correct working of these systems is essential to maintain the possibilities brought about by technological developments. However, ensuring the correct working of such systems is not an easy task, as many people attempt to subvert them for their own benefit. The most common kind of subversion against computer systems is malware attacks, which can allow an attacker to gain complete control of a machine. The fight against this kind of threat is based on analyzing the collected malicious artifacts, which enables incident response and the development of future countermeasures. However, attackers have specialized in circumventing analysis systems and thus keeping their operations active. For this purpose, they employ a series of techniques, called anti-analysis, that prevent the inspection of their malicious code. Among these techniques, I highlight analysis evasion, that is, the use of samples able to detect the presence of an analysis solution and then hide their malicious behavior. Evasive samples have become popular, and their impact on systems security is considerable, since automatic analysis now requires human supervision to find evasion signs, which significantly raises the cost of maintaining a protected system. The most common ways of detecting an analysis environment are: i) injected code detection, since analysts use code injection to inspect applications; ii) virtual machine detection, since virtual machines are used in analysis environments for scalability reasons; and iii) detection of execution side effects, usually caused by emulators, which analysts also use. To handle evasive malware, analysts have relied on so-called transparent techniques, that is, those that neither require code injection nor cause execution side effects.
A way to achieve transparency in an analysis process is to rely on hardware support. Accordingly, this work covers the application of hardware support to the analysis of evasive threats. In the course of this text, I present an assessment of existing hardware support technologies, including hardware virtual machines, BIOS support, performance monitors, and PCI cards. My critical evaluation of such technologies provides a basis for comparing different use cases. In addition, I pinpoint development gaps that currently exist. Moreover, I fill one of these gaps by expanding the use of performance monitors for malware monitoring. More specifically, I propose using the Branch Trace Store (BTS) monitor to develop a tracer and a debugger. The proposed framework is also able to deal with Return-Oriented Programming (ROP) attacks, one of the most commonly used techniques for remote vulnerability exploitation. The evaluation shows that the framework introduces no side effects, thus allowing transparent analysis. Making use of this capability, I demonstrate how protected applications can be inspected and how evasion techniques can be identified.