Demystifying Differential and Incremental Analysis for Static Code Analysis within DevOps

Published: December 6th, 2021

- SD Times

The DevOps Movement has many recommended practices for automation of processes and testing. However, across different market verticals, the requirements, practices, and cadence of releases very widely. For example, in the more security or safety relevant software markets, development processes often also include compliance to coding standards, or other security and safety practices that must be “baked” into the process in order to achieve compliance.

Where security is involved, we often refer to these extended DevOps automation processes as DevSecOps. Unfortunately, when dealing with massive code bases, the more in-depth practices and development pipelines become quite complex, and, more importantly, time consuming.

Take, for example, the Android (C++ and Java) Android Open Source Project. The project is now used as the base platform for a range of embedded IoT (Internet of Things) devices, including within automotive systems and medical devices – which, on its own, now consists of over 35 million lines of code (MLOC). Or, the Unreal Engine (UE), which is widely used across a broad-spectrum of different tasks in game development, consists of over 3.5 MLOC — and very complex code at that.

With these figures in mind, how can teams that build these modern applications and massive code bases stay agile, while improving the safety, security, and reliability of their systems by making use of DevOps automation?

Introduction to CI/CD

The DevOps Movement is often associated with the principles of Continuous Integration, Delivery, and Deployment.

Continuous Integration (CI) is the practice of automating the integration of code changes from multiple contributors into a single software project. This allows software developers to frequently merge code changes into a central repository where builds and tests are then run.

Automated tools are used to assert the new code’s correctness and ensure no regressions have been introduced before integration. In the event of failures, these changes can be easily and quickly reversed, resulting in greater confidence in each change made to the project.

Continuous Delivery (CD) is an extension of Continuous Integration, whereupon successfully meeting all of the quality and security checks and requirements, the software is automatically packaged. This results in a deployment-ready build artifact that has passed through a standardized test process.

Continuous Deployment (also CD — confusingly) is a further extension of Continuous Delivery where the code changes are then — finally — prepared automatically for a release to the production systems.

A typical CI/CD pipeline will include various phases — or stages — as shown by the visualization below:

Source: Joseph Evans’ Blog (https://www.jo3-w3b-d3v.com/devops/tools-pipelines)

CI/CD Pipelines

As illustrated above, upon each code change that a developer makes, a CI/CD Job would be initiated through the CI platform (e.g. Jenkins). Based on the outcomes relative to the Build Acceptance Testing (BAT) along with other verification and validation steps, the code changes will either be permitted to merge into the main branch (Master) or be rejected and returned to the committer for further changes.

However, one of the common problems of CI automation is when dealing with large codebases, which causes the CI/CD pipeline execution times to grow — hence feedback times of the success or failure of the commit. As CI/CD pipelines are built out further with code analysis and tests, etc., the problem is only exasperated.

The most obvious solution to this problem is to throw more machine resources at the problem, so that build times — hence feedback times — are reduced to whatever it is that we deem acceptable. But, this is not scalable and — sooner or later — the machinery costs and energy consumption will become impractical. This is where more sophisticated build systems and more modular package structures can help – by building only what is absolutely necessary and ensuring the minimum possible feedback times.

Incremental builds and modular builds are nothing new – the idea of an “update build” is very common for most compiler toolchains. By employing such techniques, we are being far less wasteful in terms of time, computation of resources, and — ultimately — energy, which is very important with the need for us all to reduce our energy consumption and our carbon footprint.

Can we therefore apply the same approaches as we do in compiler technology to our test and analysis steps? For dynamic testing, the answer is usually straightforward; by making a connection (dependency tree) between the test cases and the source code that they are testing, we can be selective as to which test cases are rerun and when. This ensures only those test cases that apply to the changed files are executed in the test stage, and provides the maximum possible efficiency and lowest possible runtimes.

For static analysis tools, more sophisticated examples tend to follow and build-in a similar approach to compiler toolsets by building in dependency-based incremental analysis capabilities.

Incremental and Differential Analysis

Incremental Analysis

Incremental Analysis encompasses the idea of performing an “update analysis”, much like a compiler toolchain will perform an “update build,” as opposed to a full, clean, rebuild. “Analysis objects” contain the analysis data for a module and timestamps or checksums for the files that it depends upon. By maintaining “analysis objects” from a prior analysis build, incremental analysis can initiate a “dependency check” to determine which — if any — of the source files have changed and need to be re-analyzed.

In Incremental Analysis, the whole application is essentially re-analyzed with the minimum effort. Done correctly, Incremental Analysis is much faster than a full analysis with no impact on the accuracy of the results.

However, in a worst-case scenario – imagine that there are code changes for a badly designed system, in addition to an update to a central module or interface that is used (or included) — ultimately — by all other modules in the system. In this instance, the Incremental Analysis time would then be equivalent to the full analysis time.

Sadly, software systems suffering from poor modularity — with unwanted dependencies and couplings — are common and refactoring isn’t always possible. When combined with very large, complex codebases that have very long build and analysis times, this compounds the issues of runtimes for CI/CD pipelines.

Differential Analysis

Put simply, Differential Analysis is an enhanced form of Incremental Analysis, designed for use in CI/CD pipelines, where codebases are large and complex, and build and analysis times are otherwise impractical for “fast feedback” for development teams.

By using system context data from previous full analysis builds on remote build servers, the local static analyzer examines only the files that are new or that have changed as reported by the version control system (e.g. Git or P4). Using this system context data, shared from the servers, the local static code analyzer is able to provide analysis results “as if” your entire system had been analyzed, even though only a small fraction of the system has actually been analyzed locally. Thus, Differential Analysis provides you with the shortest possible analysis times while reporting new issues in the changed code, and can facilitate “fast feedback” to development teams, regardless of the overall scale of the codebase or the lack of modularity in its design.

For static code analysis tools like Klocwork, Differential Analysis will also only report the new issues that were detected since the last build, or with respect to the current main branch. This helps ensure that their focus is maintained on the “new issues” that will degrade the quality of the codebase and lead to issues with compliance.

Source: Perforce (Sample Differential Analysis output in the Jenkins console)

Use Cases for Differential and Incremental Analysis

When trying to understand where developers would get the most value out of Incremental or Differential Analysis, it is important to consider the feedback times required for the developers to remain productive. Then balance that with the constraints created by the code base size and complexity, and the numbers of changes and commits being made since the previous build.

Source: Feature and Master Branches: StackOverflow (https://stackoverflow.com/questions/20755434/what-is-the-master-branch-and-release-branch-for)

Feature Branch CI Pipeline

Within a feature branch, as illustrated above, a developer typically would commit several versions of their branch and a CI job would run with each commit. In this case, Differential Analysis will save considerable time, and provide fast feedback and focused results based on the changes.

Differential Analysis will report the new issues or vulnerabilities in a specific change set of files (a subset of the application). For smaller applications, where build and analysis times are less of an issue, the entire application could be analyzed with each commit. Either way, this approach will benefit the development teams by reducing the number of issues merged to the main branch.

Develop Branch CI Pipeline

Within development teams where there are multiple developers, each working on their own branch. Post successful commits of the Feature branches, a further CI pipeline execution may be initiated for the Dev branch, prior to merging all of the changes to the Release branch. An Incremental Analysis would be extremely beneficial here, since it will carry all of the changes made by the various Feature branches and analyse them “incrementally” compared to the previous Release branch revision code. In many cases, where codebases are large and complex, this would save a tremendous amount of time, whilst still providing analysis results that are consistent with a full, clean build.

Additionally, as developers make daily changes to the Feature branches that end up eventually merged into the Dev branch, it is critical to have a cadence of re-establishing the most up-to-date system context data available, so that the next differential analysis runs performed on new Feature branches are as relevant and correct as possible.

Release Branch CD Pipeline

Finally, with the creation of each release candidate upon the Release stream, it is usually recommended that a full, clean analysis is run to ensure that the most accurate analysis data is provided for the whole application, and with no interference from previous runs. This independence is often a requirement of safety and security standards in producing the final policy scan results and compliance reports.

The Bottom Line

Maturing software development can be accomplished through established DevOps principles of automation and optimization. As highlighted in this article, the DevOps pipeline consists of recurring stages.

By reducing the overhead and unnecessary rework within these stages through methods like Differential and Incremental Analysis, automation of code reviews, and dependency-based rerunning of dynamic test cases, developers can move much faster — without compromising the security, quality, and safety of their end product. In all cases, the use of CI techniques to encourage detection — and ideally remediation — of new issues as early as possible in the software development life cycle can only improve development productivity.

Content provided by SD Times and Perforce Software. About the co-authors: Eran Kinsbruner is chief evangelist, SAST & DevOps, and senior director of Product Marketing at Perforce; and Steve Howard is static analysis specialist and technical services lead at Perforce.

Article Tags

Perforce

About SD Times

View all posts by SD Times

Cookie	Duration	Description
cf_use_ob	past	Cloudflare sets this cookie to improve page load times and to disallow any security restrictions based on the visitor's IP address.
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
CookieLawInfoConsent	1 year	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
JSESSIONID	session	The JSESSIONID cookie is used by New Relic to store a session identifier so that New Relic can monitor session counts for an application.
PHPSESSID	session	This cookie is native to PHP applications. The cookie is used to store and identify a users' unique session ID for the purpose of managing user session on the website. The cookie is a session cookies and is deleted when all the browser windows are closed.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
__atuvc	1 year 1 month	AddThis sets this cookie to ensure that the updated count is seen when one shares a page and returns to it, before the share count cache is updated.
__atuvs	30 minutes	AddThis sets this cookie to ensure that the updated count is seen when one shares a page and returns to it, before the share count cache is updated.
__cf_bm	30 minutes	This cookie, set by Cloudflare, is used to support Cloudflare Bot Management.

Cookie	Duration	Description
__gads	1 year 24 days	The __gads cookie, set by Google, is stored under DoubleClick domain and tracks the number of times users see an advert, measures the success of the campaign and calculates its revenue. This cookie can only be read from the domain they are set on and will not track any data while browsing through other sites.
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_S6PB8V57DG	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_846073_1	1 minute	Set by Google to distinguish users.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
_jsuid	1 year	This cookie contains random number which is generated when a visitor visits the website for the first time. This cookie is used to identify the new visitors to the website.
at-rand	never	AddThis sets this cookie to track page visits, sources of traffic and share counts.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
iutk	5 months 27 days	This cookie is used by Issuu analytic system to gather information regarding visitor activity on Issuu products.
uvc	1 year 1 month	Set by addthis.com to determine the usage of addthis.com service.
vuid	2 years	Vimeo installs this cookie to collect tracking information by setting a unique ID to embed videos to the website.
WMF-Last-Access	1 month 14 hours 26 minutes	This cookie is used to calculate unique devices accessing the website.

Cookie	Duration	Description
__Host-GAPS	2 years	This cookie allows the website to identify a user and provide enhanced functionality and personalisation.
_pxhd	session	Used by Zoominfo to enhance customer data.
IDE	1 year 24 days	Google DoubleClick IDE cookies are used to store information about how the user uses the website to present them with relevant ads and according to the user profile.
loc	1 year 1 month	AddThis sets this geolocation cookie to help understand the location of users who share the information.
mc	1 year 1 month	Quantserve sets the mc cookie to anonymously track user behaviour on the website.
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
__gpi	1 year 24 days	No description
__Secure-YEC	1 year 1 month	No description
_heatmaps_g2g_100754890	10 minutes	No description
_techvalidate_session	session	No description
cf_7166_id	20 years	No description
cf_7166_person_last_update	session	No description
f5avraaaaaaaaaaaaaaaa_session_	session	No description available.
GoogleAdServingTest	session	No description
Gyazo_cfwoker	7 years 2 months 17 days 7 hours	No description
incap_ses_451_2783402	session	No description
incap_ses_769_2783402	session	No description
loglevel	never	No description available.
m	2 years	No description available.
nlbi_2783402	session	No description
prism_252377639	1 month	No description
TS011605d9	session	No description
ustream-guest	session	No description available.
visid_incap_2783402	1 year	No description
xtc	1 year 1 month	No description

AI

AI and Software Development

Observability

Guide to Observability

CI/CD

A guide to CI/CD

Cloud Native

Cloud Native Content

Data

A Guide to Data

Test

Security Testing

Mobile

Mobile Testing

API

Sponsored by Parasoft

Performance

Load & Performance Testing

DevSecOps

A Guide to DevSecOps

Enterprise Security

A Guide to Security

Supply Chain Security

Supply Chain Security

Dev Manager

Dev Managers Content

Agile

A Guide To Agile

Value Stream

A Guide To Value Stream

Productivity

A Guide To Productivity

DevOps

DevOps Content

API

Gravitee.io

AI

AI and Software Development

Value Stream Management