The massive amount of data produced by security solutions has created a strong dependency on automated methods for knowledge discovery. Attacks against computer systems use several transmission channels and formats (e.g., network traffic, binary files, text, and chained system calls), which makes them difficult to spot among benign data. Machine learning techniques are a great aid for separating data into classes, but they need to be deployed correctly. In this course, we show how to properly apply machine learning algorithms to the security data science process. To do so, we discuss key concepts of the subject and present practical examples using free, open-source tools.


  • Python Version: make sure you are using Python 3.5 or higher.
  • Libraries: all the Python libraries used are listed in the file “requirements.txt”. To install them, run the following command (using pip):

pip install -r requirements.txt

  • Datasets: the datasets in the folder “./datasets/” are used throughout the course. They are all .zip archives. When extracting them, make sure the .csv files end up in that same folder. To extract an archive, run the following command from a terminal:

unzip <filename>.zip
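The setup steps above can also be scripted. The following is a minimal sketch, not part of the course material: it checks the interpreter version and extracts every archive in the datasets folder using only the standard library. The function names `python_is_supported` and `extract_datasets` are hypothetical, and the sketch assumes each archive stores its .csv files at the top level.

```python
import sys
import zipfile
from pathlib import Path

MIN_PYTHON = (3, 5)  # minimum version stated in the setup notes


def python_is_supported(version_info=None):
    """Return True when the interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def extract_datasets(folder="./datasets/"):
    """Extract every .zip archive in `folder` into that same folder.

    Returns the sorted names of the .csv files found there afterwards.
    Hypothetical helper: assumes the archives hold their .csv files
    at the top level, not inside subdirectories.
    """
    folder = Path(folder)
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(str(archive)) as zf:
            zf.extractall(str(folder))
    return sorted(p.name for p in folder.glob("*.csv"))


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("Python 3.5 or higher is required.")
    print(extract_datasets())
```

If an archive nests its files in a subdirectory, the .csv files will be extracted into that subdirectory and will need to be moved into “./datasets/” by hand.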

Cite our Work

If you want to cite this course in your work, please cite our paper, which motivated us to create this content: F. Ceschin, F. Pinage, M. Castilho, D. Menotti, L. S. Oliveira and A. Gregio, “The Need for Speed: An Analysis of Brazilian Malware Classifiers,” in IEEE Security & Privacy, vol. 16, no. 6, pp. 31-41, Nov.-Dec. 2018. doi: 10.1109/MSEC.2018.2875369. Here is the BibTeX entry for the paper:


@article{ceschin2018need,
author={F. {Ceschin} and F. {Pinage} and M. {Castilho} and D. {Menotti} and L. S. {Oliveira} and A. {Gregio}},
journal={IEEE Security \& Privacy},
title={The Need for Speed: An Analysis of Brazilian Malware Classifiers},
year={2018},
volume={16},
number={6},
pages={31-41},
month={Nov.-Dec.},
doi={10.1109/MSEC.2018.2875369},
keywords={invasive software;learning (artificial intelligence);pattern classification;Brazilian malware classifers;machine-learning systems;malware classification;Malware;Feature extraction;Support vector machines;Machine learning;Security;Security of data},
}