Malware Detection Using a Heterogeneous Distance Function

Authors

  • Martin Jureček, Faculty of Information Technology, Czech Technical University in Prague
  • Róbert Lórencz, Faculty of Information Technology, Czech Technical University in Prague

Keywords:

Malware detection system, feature selection, similarity measure, k-nearest neighbors classifier, partitioning around medoids

Abstract

Classification of automatically generated malware is an active research area. The amount of new malware is growing exponentially, and since manual investigation is not possible, automated malware classification is necessary. In this paper, we present a static malware detection system for the detection of unknown malicious programs, which is based on a combination of the weighted k-nearest neighbors classifier and the statistical scoring technique from [12]. We have extracted the most relevant features from the portable executable (PE) file format using gain ratio and have designed a heterogeneous distance function that can handle both linear and nominal features. Our proposed detection method was evaluated on a dataset with tens of thousands of malicious and benign samples, and the experimental results show that the accuracy of our classifier is 98.80 %. In addition, preliminary results indicate that the proposed similarity metric on our feature space could be used for clustering malware into families.
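
To illustrate the general idea of a distance function over mixed linear and nominal features feeding a distance-weighted k-nearest neighbors vote, the following is a minimal sketch. It is not the distance function or classifier from the paper, which the abstract does not specify. As a stand-in it uses an HEOM-style distance (overlap for nominal features, range-normalized absolute difference for linear ones) and inverse-distance vote weights. The three features (section count, entropy, subsystem), the toy samples, and all function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def linear_ranges(samples, nominal):
    """Value range (max - min) of every linear (non-nominal) feature column."""
    return {
        i: max(col) - min(col)
        for i, col in enumerate(zip(*samples))
        if i not in nominal
    }


def heom_distance(x, y, nominal, ranges):
    """HEOM-style distance (assumed stand-in, not the paper's function)."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in nominal:
            # Nominal feature: overlap distance, 0 if equal, 1 otherwise.
            d = 0.0 if a == b else 1.0
        else:
            # Linear feature: absolute difference normalized by the range.
            d = abs(a - b) / ranges[i] if ranges[i] else 0.0
        total += d * d
    return sqrt(total)


def weighted_knn(query, samples, labels, nominal, ranges, k=3):
    """Classify by an inverse-distance weighted vote of the k nearest samples."""
    neighbors = sorted(
        (heom_distance(query, s, nominal, ranges), label)
        for s, label in zip(samples, labels)
    )
    votes = defaultdict(float)
    for dist, label in neighbors[:k]:
        # Small constant avoids division by zero for exact matches.
        votes[label] += 1.0 / (dist + 1e-9)
    return max(votes, key=votes.get)


# Hypothetical toy data: (section count, entropy, subsystem).
samples = [
    (3, 7.8, "gui"),
    (4, 7.5, "gui"),
    (5, 5.1, "console"),
    (6, 4.9, "console"),
]
labels = ["malware", "malware", "benign", "benign"]
nominal = {2}  # index of the nominal feature
ranges = linear_ranges(samples, nominal)

prediction = weighted_knn((3, 7.6, "gui"), samples, labels, nominal, ranges)
```

Normalizing each linear feature by its range keeps it on the same 0 to 1 scale as the overlap distance, so no single feature type dominates the sum in this sketch.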

Published

2018-07-26

How to Cite

Jureček, M., & Lórencz, R. (2018). Malware Detection Using a Heterogeneous Distance Function. COMPUTING AND INFORMATICS, 37(3), 759–780. Retrieved from https://www.cai.sk/ojs/index.php/cai/article/view/2018_3_759