In recent years, cybercriminals have used new malware, or variants of existing malware, to evade inspection by security mechanisms. Honeypots can capture the malware that cybercriminals are using. As the number of samples captured by honeypots grows, however, IT security personnel must be able to distinguish old, variant, and new malware in order to direct further analysis; otherwise, government organizations and enterprises cannot respond quickly to new types of attack.
Although many researchers have proposed methods for analyzing malware, most of them focus on a single file type. Such methods are not well suited to honeypot malware, which usually consists of a mix of source code and binary files. An effective and fast analysis tool for honeypot malware is therefore still lacking.
We propose a honeypot malware analysis system that handles both source files and binary files. To measure malware similarity, we use the syntax structure of source code files, the image vectors of binary files, file names, and file structure as features. We adopt incremental clustering to quickly separate known malware from new types of malware. Our experimental evaluations show that the system clusters honeypot malware effectively and quickly. Finally, we compare its performance with VirusTotal and other research, and the results confirm that our system achieves better clustering efficiency.
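To make the incremental clustering step concrete, the following is a minimal sketch of a threshold-based (leader-style) incremental clusterer. It is an illustration under stated assumptions, not the system's actual implementation: the class name `IncrementalClusterer`, the cosine similarity measure, the running-mean centroid update, and the threshold value are all hypothetical choices, and each sample is assumed to have already been reduced to a fixed-length numeric feature vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncrementalClusterer:
    """Hypothetical leader-style clusterer: one pass, no re-clustering of old samples."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed similarity cut-off for joining a cluster
        self.centroids = []         # one running-mean vector per cluster
        self.counts = []            # number of samples absorbed by each cluster

    def add(self, vector):
        """Assign one sample; return (cluster_index, is_new_cluster)."""
        best_idx, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine_similarity(vector, centroid)
            if sim > best_sim:
                best_idx, best_sim = i, sim

        if best_idx >= 0 and best_sim >= self.threshold:
            # Known family or variant: fold the sample into the running mean.
            n = self.counts[best_idx]
            self.centroids[best_idx] = [
                (c * n + v) / (n + 1)
                for c, v in zip(self.centroids[best_idx], vector)
            ]
            self.counts[best_idx] = n + 1
            return best_idx, False

        # No sufficiently similar cluster: treat the sample as a new type.
        self.centroids.append(list(vector))
        self.counts.append(1)
        return len(self.centroids) - 1, True


if __name__ == "__main__":
    clusterer = IncrementalClusterer(threshold=0.9)
    for sample in ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]):
        print(clusterer.add(sample))
```

Because each new sample is compared only against the existing cluster centroids, newly captured malware can be assigned without re-clustering earlier samples, which is the property that makes an incremental approach attractive for a continuously growing honeypot collection.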
