Many developers turn to Stack Overflow to ask questions, share programming knowledge and learn from others, but the amount of information available on the online community can be overwhelming. To tackle this, a group of researchers has developed the Crowd Knowledge Answer Generator (CROKAGE), a new solution designed to help developers easily find relevant information and explanations on Stack Overflow.

“Developers often search for relevant code examples on the web for their programming tasks. Unfortunately, they face two major problems. First, the search is impaired due to a lexical gap between their query (task description) and the information associated with the solution. Second, the retrieved solution may not be comprehensive, i.e., the code segment might miss a succinct explanation. These problems make the developers browse dozens of documents in order to synthesize an appropriate solution,” the researchers wrote in a paper.

To address this, CROKAGE aims to take the description of a programming task as a query and then provide the relevant code snippets and explanations so that developers can easily use the code in their projects.  

To develop CROKAGE, the team trained a word-embedding model with FastText, using millions of Q&A threads from the website as the training corpus, and expanded the natural-language query to include unique open-source software library and function terms.
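
As a rough illustration of the query-expansion idea described above, the following Python sketch appends the most similar library and function terms to a task description. It is not the researchers' implementation: the tiny hand-written vectors stand in for embeddings that a model such as FastText would learn from a large corpus, and the names `TOY_EMBEDDINGS`, `API_EMBEDDINGS`, `query_vector` and `expand_query` are hypothetical.

```python
import math

# Made-up 3-dimensional vectors standing in for learned word embeddings.
TOY_EMBEDDINGS = {
    "read": [0.9, 0.1, 0.0],
    "file": [0.8, 0.2, 0.1],
    "sort": [0.0, 0.9, 0.1],
    "list": [0.1, 0.8, 0.2],
}

# Hypothetical library/function terms with made-up vectors in the same space.
API_EMBEDDINGS = {
    "BufferedReader": [0.85, 0.15, 0.05],
    "Files.readAllLines": [0.9, 0.1, 0.1],
    "Collections.sort": [0.05, 0.9, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_vector(query):
    """Average the vectors of the query words that have an embedding."""
    words = [w for w in query.lower().split() if w in TOY_EMBEDDINGS]
    if not words:
        return None
    dims = len(TOY_EMBEDDINGS[words[0]])
    return [sum(TOY_EMBEDDINGS[w][i] for w in words) / len(words) for i in range(dims)]


def expand_query(query, top_k=2):
    """Return the query words plus the top_k most similar API terms."""
    vec = query_vector(query)
    if vec is None:
        # No known words: leave the query unchanged.
        return query.split()
    ranked = sorted(
        API_EMBEDDINGS,
        key=lambda term: cosine(vec, API_EMBEDDINGS[term]),
        reverse=True,
    )
    return query.split() + ranked[:top_k]


if __name__ == "__main__":
    print(expand_query("read file"))
```

In this toy setup, a query such as "read file" picks up file-reading terms even though none of them appears in the query text, which is the kind of lexical gap the researchers describe.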

According to the team, CROKAGE outperformed six baselines, including the state-of-the-art research tool BIKER, and produced better results than BIKER in terms of relevance of the suggested code examples, benefit of the code explanations, and the overall solution quality (code + explanation).

“A combination of relevant code and corresponding explanation is very likely to help a developer understand both the solution to their problem and how best to implement that code in practice,” Ben Popper, director of content at Stack Overflow, wrote in a blog post.

However, Popper added that CROKAGE still has some limitations: if the query is poorly formulated, the tool will not suggest how to improve it.

“Like any other search tool, the results, though encouraging, are not perfect,” Popper wrote. “The team is still investigating other factors that could not only help find higher quality answers, but also improve the synthesized solution offered up as a final result.”

The solution is limited to Java queries for now, but the researchers are looking to have an expanded version open to the public soon. More information is available in the original paper.