In a study of its development teams, Meta noticed a strong correlation across several internal metrics: developer satisfaction dips whenever diff review times are slow. A diff refers to an individual set of changes made to the codebase.
Diff reviews, which can catch bugs, teach best practices, and ensure high code quality, are required at the company without exception, which led Meta to try to fix the problem.
One culprit of the dissatisfaction showed up in Meta’s “Time in Review” metric, which measures how long a diff spends waiting on review across all of its review cycles: in 2021, the median diff spent a reasonable few hours in review, but the slowest 25% of diff reviews took well over a day.
“Simply optimizing for the speed of review could lead to negative side effects, like encouraging rubber-stamp reviewing. We needed a guardrail metric to protect against negative unintended consequences. We settled on ‘Eyeball Time’ – the total amount of time reviewers spent looking at a diff. An increase in rubber-stamping would lead to a decrease in Eyeball Time,” Louise Huang, Seth Rogers, and James Saindon wrote in a Meta blog post.
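As a rough sketch of how the two metrics relate (this is illustrative only, not Meta’s implementation; the event schema and field names here are assumptions), Time in Review sums the periods a diff spends waiting across its review cycles, while the Eyeball Time guardrail sums reviewers’ active viewing time:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record of one review cycle: the diff waits on review from
# the moment review is requested until a reviewer acts on it.
@dataclass
class ReviewCycle:
    review_requested_at: datetime  # diff published or re-submitted for review
    reviewer_acted_at: datetime    # reviewer approved, commented, or rejected
    eyeball_seconds: float         # time the reviewer actively viewed the diff

def time_in_review(cycles: list[ReviewCycle]) -> timedelta:
    """Total time the diff spent waiting on review, across all its cycles."""
    return sum(
        (c.reviewer_acted_at - c.review_requested_at for c in cycles),
        timedelta(),
    )

def eyeball_time(cycles: list[ReviewCycle]) -> timedelta:
    """Guardrail: total time reviewers actually spent looking at the diff.
    Falling Eyeball Time alongside faster reviews would suggest rubber-stamping."""
    return sum((timedelta(seconds=c.eyeball_seconds) for c in cycles), timedelta())
```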
Meta then tested queuing up diffs the way streaming services transition smoothly into the next show, aiming to put reviewers into a diff review flow state. The result was the Next Reviewable Diff feature.
“We use machine learning to identify a diff that the current reviewer is highly likely to want to review. Then we surface that diff to the reviewer after they finish their current code review,” the blog post says. “We make it easy [for reviewers] to cycle through possible next diffs and quickly remove themselves as a reviewer if a diff is not relevant to them.”
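In outline, the idea could look something like the following sketch. Meta has not published this code; the model, diff objects, and function names below are hypothetical stand-ins:

```python
# Illustrative sketch only, not Meta's implementation.

def next_reviewable_diff(reviewer, pending_diffs, model, dismissed_ids):
    """Pick the diff this reviewer is most likely to want to review next,
    skipping any diffs they have already removed themselves from."""
    candidates = [d for d in pending_diffs if d.id not in dismissed_ids]
    if not candidates:
        return None
    # Rank candidates by the model's predicted probability that this
    # reviewer will actually want to review the diff.
    return max(candidates, key=lambda d: model.predict(reviewer, d))
```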
Meta found that the feature resulted in a 17% overall increase in review actions per day, and that engineers who use this flow perform 44% more review actions than the average reviewer.
The company also built a new reviewer recommendation system that prioritizes reviewers who are both available to review a diff and more likely to be great reviewers for it, which resulted in a 1.5% increase in diffs reviewed within 24 hours. The model also now supports backtesting and automatic retraining.
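A minimal sketch of that prioritization, assuming a hypothetical availability check and fit model standing in for Meta’s unpublished internals, might combine the two signals like this:

```python
# Illustrative sketch, not Meta's system: is_available() and fit_model
# are assumed stand-ins for internal availability and ranking signals.

def recommend_reviewers(diff, candidates, fit_model, k=3):
    """Prioritize reviewers who are both available to review the diff now
    and predicted to be a great reviewer for it."""
    available = [r for r in candidates if r.is_available()]
    ranked = sorted(available, key=lambda r: fit_model.predict(r, diff), reverse=True)
    return ranked[:k]
```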
Lastly, Meta built Nudgebot, which determines the subset of reviewers most likely to review a given diff and sends them a chat ping with the appropriate context, along with a set of quick actions that let recipients jump right into reviewing. This resulted in a 7% drop in Time in Review and a 12% drop in the proportion of diffs that waited longer than three days for review.
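Putting the pieces together, a Nudgebot-style flow could be sketched as below; the chat client, staleness threshold, and quick-action labels are all hypothetical stand-ins for Meta’s internal tooling:

```python
# Illustrative sketch only, not Meta's Nudgebot implementation.

def nudge_stale_diff(diff, likelihood_model, chat, stale_after_hours=24):
    """Ping the reviewers most likely to review a diff that has been
    waiting too long, with context and quick actions to start reviewing."""
    if diff.hours_waiting < stale_after_hours:
        return
    # Keep the ping targeted: only the few reviewers most likely to act.
    top_reviewers = sorted(
        diff.assigned_reviewers,
        key=lambda r: likelihood_model.predict(r, diff),
        reverse=True,
    )[:2]
    for reviewer in top_reviewers:
        chat.send(
            to=reviewer,
            text=f"Diff {diff.id} has been awaiting review for {diff.hours_waiting:.0f}h.",
            quick_actions=["Open diff", "Start review", "Decline review"],
        )
```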