Microsoft is applying machine learning and deep neural networks to its software security approach. The company announced a new research project, neural fuzzing, designed to augment traditional fuzzing techniques and discover vulnerabilities by learning from past fuzzing runs.
The research builds on Microsoft’s Security Risk Detection tool, which incorporates artificial intelligence to find software bugs.
Fuzzing is a software security testing technique used to find vulnerabilities in complex software solutions. “Fuzzing involves presenting a target program with crafted malicious input designed to cause crashes, buffer overflows, memory errors, and exceptions,” Microsoft researchers wrote.
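At its simplest, this works by corrupting a valid seed input and watching whether the target misbehaves. The sketch below is a minimal, hypothetical illustration of that idea, not Microsoft’s tooling; the `target` parser and the seed file are stand-ins.

```python
import random

def mutate(data: bytes, n_flips: int = 8) -> bytes:
    """Randomly overwrite a few bytes of a seed input."""
    buf = bytearray(data)
    for _ in range(n_flips):
        pos = random.randrange(len(buf))
        buf[pos] = random.randrange(256)
    return bytes(buf)

def fuzz(target, seed: bytes, iterations: int = 10_000):
    """Feed mutated inputs to the target and record any that crash it."""
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed)
        try:
            target(candidate)            # hypothetical parser under test
        except Exception as exc:         # a crash or unhandled exception is a finding
            crashes.append((candidate, exc))
    return crashes
```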
Fuzz testing falls into several categories: blackbox fuzzers that rely on sample input files to generate new inputs, whitebox fuzzers that analyze the target program statically or dynamically to guide the search for new inputs, and greybox fuzzers that use a feedback loop to steer their search. Microsoft’s new category, neural fuzzing, uses machine learning models to learn from the feedback loop of a greybox fuzzer, according to the researchers. The team says the approach has improved code coverage, the number of code paths exercised, and the number of crashes found for four input formats: ELF, PDF, PNG, and XML.
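The greybox feedback loop being learned from can be sketched as follows: any mutated input that reaches code not seen before is kept as a new seed for further mutation. This is a generic coverage-guided loop, not Microsoft’s implementation; `get_coverage` is an assumed stand-in for an instrumented run of the target, and `mutate` is the helper from the sketch above.

```python
import random

def greybox_fuzz(target, get_coverage, seeds, rounds: int = 1000):
    """Coverage-guided loop: promote any mutant that reaches new code."""
    corpus = list(seeds)
    seen_coverage = set()
    for _ in range(rounds):
        parent = random.choice(corpus)
        child = mutate(parent)
        new_edges = get_coverage(target, child)   # edges hit by this run
        if not new_edges <= seen_coverage:        # reached previously unseen code
            seen_coverage |= new_edges
            corpus.append(child)                  # keep as a seed for future rounds
    return corpus
```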
“We present a learning technique that uses neural networks to learn patterns in the input files from past fuzzing explorations to guide future fuzzing explorations. In particular, the neural models learn a function to predict good (and bad) locations in input files to perform fuzzing mutations based on the past mutations and corresponding code coverage information,” the researchers wrote.
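Conceptually, such a model scores each byte offset of an input file by how promising it is to mutate there, and the fuzzer biases its mutations toward high-scoring positions. The following sketch only illustrates that idea under assumptions: `location_model` is a hypothetical learned function returning one score per byte, not the researchers’ actual neural architecture.

```python
import random

def neural_mutate(data: bytes, location_model, n_flips: int = 8) -> bytes:
    """Bias mutations toward byte offsets the model predicts are promising.

    location_model is assumed to map an input file to a per-byte score
    (higher means mutating here is more likely to yield new coverage),
    learned from past mutations and their coverage outcomes.
    """
    scores = location_model(data)                  # one non-negative score per byte
    total = sum(scores) or 1.0
    weights = [s / total for s in scores]
    buf = bytearray(data)
    positions = random.choices(range(len(buf)), weights=weights, k=n_flips)
    for pos in positions:
        buf[pos] = random.randrange(256)
    return bytes(buf)
```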
According to William Blum of the Microsoft Security Risk Detection engineering team, this is just the beginning of what can be done by applying deep neural networks to fuzz testing.
“We could also use it to learn other fuzzing parameters such as the type of mutation or strategy to apply. We are also considering online versions of our machine learning model, in which the fuzzer constantly learns from ongoing fuzzing iterations,” Blum wrote in a post.