Machine Learning for Cybersecurity

Automated Vulnerability Detection in Source Code Using Minimum Intermediate Representation

Software vulnerabilities are one of the root causes of network intrusion. An effective way to mitigate security threats is to discover and patch vulnerabilities before an attack occurs. Traditional vulnerability detection methods rely on manual effort and incur a high false positive rate. Intelligent vulnerability detection methods suffer from long-term dependencies, out-of-vocabulary tokens, coarse detection granularity, and a lack of vulnerable samples.
This paper proposes an automated, intelligent method for detecting vulnerabilities in source code, based on minimum intermediate representation learning.
First, each source code sample is transformed into a minimum intermediate representation, which excludes irrelevant items and shortens the dependency length. Next, the intermediate representation is transformed into a real-valued vector through pre-training on an extended corpus, so that structural and semantic information is retained. Then, the vector is fed to three concatenated convolutional neural networks to obtain high-level vulnerability features. Finally, a classifier is trained on the learned features.
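To make the first and third steps more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the "minimum intermediate representation" can be approximated by a toy slice (keeping only statements that mention a variable of interest) followed by renaming user-defined identifiers to placeholders, and it reads "three concatenated convolutional neural networks" as three parallel convolutions with different window widths whose pooled outputs are concatenated. The names C_KEYWORDS, LIBRARY_CALLS, slice_lines, normalize_tokens, conv_max_pool and concat_features, as well as the hand-written vectors and kernels, are hypothetical; in the paper the vectors come from pre-training and the weights are learned.

```python
import re

# Hypothetical, deliberately tiny vocabularies; the paper's actual lists are not given here.
C_KEYWORDS = {"char", "int", "if", "else", "return", "sizeof", "void", "for", "while"}
LIBRARY_CALLS = {"strcpy", "memcpy", "malloc", "free", "strlen"}

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def slice_lines(lines, variable):
    """Toy slice (assumption): keep only statements that mention the variable."""
    pattern = re.compile(r"\b%s\b" % re.escape(variable))
    return [ln for ln in lines if pattern.search(ln)]


def normalize_tokens(code):
    """Tokenize C-like code and rename user-defined identifiers to VAR1, VAR2, ...

    Renaming shrinks the vocabulary, which is one common way to limit
    out-of-vocabulary tokens.
    """
    names = {}
    out = []
    for tok in TOKEN_RE.findall(code):
        is_ident = IDENT_RE.fullmatch(tok) is not None
        if is_ident and tok not in C_KEYWORDS and tok not in LIBRARY_CALLS:
            # The same identifier always maps to the same placeholder.
            names.setdefault(tok, "VAR%d" % (len(names) + 1))
            out.append(names[tok])
        else:
            out.append(tok)
    return out


def conv_max_pool(vectors, kernel):
    """Slide one kernel over the token vectors, apply ReLU, then max-pool over time.

    vectors: list of embedding vectors (lists of floats).
    kernel:  list of weight vectors, one per position in the window.
    """
    width = len(kernel)
    scores = []
    for start in range(len(vectors) - width + 1):
        total = 0.0
        for k in range(width):
            total += sum(w * x for w, x in zip(kernel[k], vectors[start + k]))
        scores.append(max(total, 0.0))  # ReLU
    # A sequence shorter than the window yields no score; fall back to 0.0.
    return max(scores) if scores else 0.0


def concat_features(vectors, kernels):
    """Concatenate the pooled output of each convolution branch into one feature list."""
    return [conv_max_pool(vectors, kernel) for kernel in kernels]


if __name__ == "__main__":
    program = [
        "int n = 10;",
        "char buf[8];",
        "log_msg(n);",
        "strcpy(buf, input);",
    ]
    kept = slice_lines(program, "buf")
    print(normalize_tokens(" ".join(kept)))

    # Hand-written stand-ins for pre-trained token vectors and learned kernels.
    toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    toy_kernels = [
        [[1.0, 1.0]],                          # window width 1
        [[1.0, 0.0], [0.0, 1.0]],              # window width 2
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # window width 3
    ]
    print(concat_features(toy_vectors, toy_kernels))
```

In this sketch the slice drops the two statements that never touch buf, and both remaining occurrences of buf share one placeholder, so the data dependency between the declaration and the strcpy call survives the renaming. The concatenated feature list is what a downstream classifier would be trained on.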
To validate this vulnerability detection method, an experiment was performed. The empirical results confirmed that, compared with traditional methods and state-of-the-art intelligent methods, our method achieves better performance at a fine detection granularity.

Keywords: cyber security, vulnerability detection, program slice, transfer learning, representation learning
Read the full article by Xin Li et al. (Beijing University), published in Applied Sciences in March 2020: https://www.mdpi.com/2076-3417/10/5/1692