Making quality changes

Column

Published: March 24th, 2014

- Ran Levy

Almost every successful startup company reaches a stage where the “quick and dirty” way of coding becomes a quality liability and makes it difficult for the company to continue to grow and develop. At MyHeritage we encountered this issue, but because we were already running a successful website and supporting a user base of millions, we couldn’t just stop what we were doing and “refactor” the required code. Making a quality change in parallel with our company’s growth in product and complexity was an extreme challenge that required the right state of mind and the right set of people.

This, then, is the story of how MyHeritage created a real quality change in an extreme environment.

The beginning
A couple of days after I arrived at the company, I got some courage and decided to code something small by myself. My hands were a bit shaky, but I was thinking to myself, “What could possibly go wrong? If I really make a mess, the unit and integration tests will detect it.”

After I wrote a small piece of code, I planned to add a test and asked for a colleague’s help. “Unit what?” he answered. “Just open the website, browse to the page your code affects and see what it looks like there. Worst case, QA will let you know if it doesn’t work.”

The team understood very quickly that this was not the way to go, and that we needed to rethink some of our engineering practices.

The first step: Coding standards
The first thing we agreed on was adopting coding standards to be used by all developers. With the help of our CTO and feedback from all developers, we settled on coding standards for MyHeritage that also included guidelines for error handling and for dealing with legacy code.

Like any beginning, this one was not easy. In one code review after another, developers commented on each other’s adherence to the coding standards and asked why there were warnings in the code. But after a while, people forgot they had ever coded in a different manner.

Unit testing
Right after presenting the coding standards to R&D, we started working on unit testing guidelines and an education plan. Beyond the obvious benefit of catching regressions, unit tests contribute a lot to code quality and keep classes coherent, with well-defined roles and responsibilities.

We spent some time on initial research. As time was short and lots of tasks were in the pipeline, we needed to find a creative way to integrate our solution within the existing resources. So we decided to start with a pilot with only one team member. We presented the initial direction to the rest of the team and started with the pilot team member. We built on our knowledge and made ourselves familiar with advanced topics in how to implement unit testing.

After a few weeks, we presented our pilot experience and what we learned, and we extended the pilot. After a few more weeks everyone joined, and as time went by, writing unit tests as part of the coding tasks became the de facto habit of every team member.

The next phase: System testing
Some time after adopting the concept of unit testing, we started to dream about having a framework for automatic testing for QA, which was a top need for the company. So we did some research and decided to run a pilot with Cucumber. (Cucumber is a testing tool that supports behavior-driven development, with test scenarios written in plain language.) We started developing the infrastructure and built a pilot for one of the most important flows in our website. When this was done, we presented it to the QA manager, who was willing to give it a try and allocated one of her team members for the pilot.

There was a lot of passion about this automated testing system, and our belief in the benefits that automated testing would bring to the company really helped get the project off the ground. After the successful pilot phase, the QA team started building more test scenarios. Any skepticism that might have existed about the efficacy of automated testing has vanished, and all QA engineers have gone through training sessions. MyHeritage’s definition of “done” for each feature is now beginning to include automated tests.

Creating a risk-free staging environment
Previously, our staging environment worked against our production database, which was a huge risk to production.

Our CTO, one of my talented team members and I worked on a plan to build a true staging environment that cannot access the production environment, even by mistake. Our IT team also lent a hand, and we now have a real staging environment that lets us build new environments in one click, with a complete copy of the production data, so that we can apply DB schema changes and test them before they reach production, among other benefits.

Technical design phase
During some code reviews, we noticed that some basic questions were often left unanswered. How is QA going to test it? How is this solution going to be scaled up? We realized that major changes were required in the code to allow testing before it went live, and that beyond this, significant changes were required to make the solution as scalable as it needed to be.

Another thing we realized was that we did not have a proper design phase in which developers address the non-functional requirements in addition to the functional ones, get feedback from others, and add design-related documentation to the feature that is about to be developed.

So for one of the next features that we developed, we created a short document that presented the feature’s design. We discussed the design document with the team members and got feedback from them not only about the actual design, but also about instituting a process of designing features and getting feedback as a procedure for the team. Most team members were in favor of the idea of having such a template, and we decided to give it a shot.

Since we implemented and went through some rounds of this new design phase in our development cycle, we have managed to communicate clearly all aspects of the design to all stakeholders, reduce the overall development time of features, and deliver features with higher quality.

Increasing visibility
An important part of any quality process is collecting data on quality attributes and making it visible. So first the team suggested finding a way to collect all PHP errors from all Web servers, daemon servers, etc. into a single DB table, where all application errors were already reported. After we had these results, we exposed them as part of R&D metrics so everyone would be aware of the situation.

Then we added a mechanism for logging all long-running DB queries in the system so that we have complete information about them. Exposing this information not only helped us improve the response time of the website, but also made serious contention issues on the DB easy to spot and track.

Later on we decided to expose all important R&D metrics that we act on regularly, such as average home page load time, so that we could monitor them and improve our performance by looking at “small measurable” changes.
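As an illustration only (this is not MyHeritage's actual code, which the article does not show), a long-query logger can be as small as a timing wrapper around query execution. In the Python sketch below, the 0.5-second threshold, the function name run_logged and the in-memory slow_query_log list are all assumptions made for the example, and sqlite3 stands in for the real database.

```python
import sqlite3
import time

# Hypothetical threshold: the article does not say what counted as a "long" query.
SLOW_QUERY_SECONDS = 0.5

# Stand-in for the real destination (for example a DB table or a log file).
slow_query_log = []


def run_logged(conn, sql, params=(), threshold=SLOW_QUERY_SECONDS,
               clock=time.perf_counter):
    """Run a query and record it if it takes at least `threshold` seconds."""
    start = clock()
    rows = conn.execute(sql, params).fetchall()
    elapsed = clock() - start
    if elapsed >= threshold:
        slow_query_log.append({"sql": sql, "params": params, "seconds": elapsed})
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("ada",))
    # threshold=0.0 forces the query to be recorded, purely for demonstration.
    print(run_logged(conn, "SELECT name FROM users WHERE id = ?", (1,), threshold=0.0))
    print(slow_query_log)
```

Recording the SQL text together with the elapsed time is what makes it possible to aggregate the log later and see which queries are slow most often.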

Automatic monitoring tools
We were satisfied with the progress we had made so far, and we aimed to achieve more. Thus, the next issue we decided to focus on was the quality operations we performed manually. Because they were manual and took precious time away from other activities, we performed them only once a week. But MyHeritage could not afford to wait that long to find important quality issues.

So next we gave priority to monitoring our application and PHP errors. We decided to automate the error-monitoring process, run analyses twice per day, and send an e-mail that summarizes all new errors and all existing errors whose counts grew significantly. From then on, errors were spotted the same day they reached production.

Another quality measure that is under QA’s responsibility is going over all our statistics charts and looking for anomalies in the numbers. This might not sound like a big deal, but imagine going over 2,500 charts all by yourself. This is not fun, and definitely not efficient.
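To make the idea concrete, here is a minimal Python sketch of such a digest, assuming error counts are available as dictionaries keyed by an error signature. The function names, the data shape and the growth factor of 2.0 are assumptions for illustration; the article does not say how "grew significantly" was defined or how the e-mail was produced.

```python
# Hypothetical rule: an existing error "grew significantly" if its count
# at least doubled between two runs.
GROWTH_FACTOR = 2.0


def summarize_errors(previous, current, growth_factor=GROWTH_FACTOR):
    """Compare two {error_signature: count} snapshots.

    Returns (new_errors, grown): errors absent from the previous run, and
    existing errors whose count rose by at least `growth_factor`.
    """
    new_errors = {sig: n for sig, n in current.items() if sig not in previous}
    grown = {
        sig: (previous[sig], n)
        for sig, n in current.items()
        if sig in previous and previous[sig] > 0
        and n >= previous[sig] * growth_factor
    }
    return new_errors, grown


def format_digest(new_errors, grown):
    """Build the plain-text body of the summary e-mail."""
    lines = ["New errors:"]
    for sig, n in sorted(new_errors.items()):
        lines.append(f"  {sig}: {n} occurrences")
    lines.append("Existing errors that grew significantly:")
    for sig, (before, after) in sorted(grown.items()):
        lines.append(f"  {sig}: {before} -> {after}")
    return "\n".join(lines)


if __name__ == "__main__":
    morning = {"undefined index in Tree.php": 10, "timeout in Search.php": 5}
    evening = {"undefined index in Tree.php": 25, "timeout in Search.php": 6,
               "division by zero in Stats.php": 1}
    print(format_digest(*summarize_errors(morning, evening)))
```

Scheduling the comparison (twice a day in the article) and actually sending the e-mail are left out; a cron job and any mail library would do.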

So once again, we came up with the idea of building an infrastructure (coming soon as open source) to analyze all our charts’ data automatically. Our mathematically inclined developer built our system to be highly flexible and to allow the user to add charts to white lists, define thresholds, replace the analysis algorithm and much more.
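The article does not describe the analysis algorithm, so the Python sketch below is only one way the pieces named above (white lists, per-chart thresholds and a replaceable algorithm) could fit together. The z-score detector, the default threshold of 3.0 and all function names are assumptions chosen for the example, not the MyHeritage implementation.

```python
from statistics import mean, pstdev


def zscore_detector(history, latest, threshold):
    """Assumed default algorithm: flag `latest` if it lies more than
    `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False
    sigma = pstdev(history)
    if sigma == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > threshold


def find_anomalies(charts, whitelist=frozenset(), thresholds=None,
                   default_threshold=3.0, detector=zscore_detector):
    """Return the sorted names of charts whose latest value looks anomalous.

    charts     -- {chart_name: [values, oldest first]}
    whitelist  -- chart names that are never reported
    thresholds -- optional per-chart overrides of `default_threshold`
    detector   -- replaceable analysis function (history, latest, threshold) -> bool
    """
    thresholds = thresholds or {}
    flagged = []
    for name, series in charts.items():
        if name in whitelist or len(series) < 2:
            continue
        *history, latest = series
        if detector(history, latest, thresholds.get(name, default_threshold)):
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    charts = {
        "signups": [100, 102, 98, 101, 300],
        "logins": [50, 51, 49, 50, 50],
    }
    print(find_anomalies(charts))
```

Passing the detector in as a parameter is what keeps the analysis algorithm replaceable: a seasonal or percentage-change detector can be swapped in without touching the loop.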

But what is a true quality change?
So everything we’ve talked about so far definitely contributes to quality, but is this a quality change? Not to me. A quality change to me is what the organization is experiencing right now as I write this article.

A quality change to me is when my manager keeps asking me why the unit/integration tests are not running automatically and continuously.

A quality change to me is when one of my team members goes in his free time to a continuous integration conference, then comes back and launches a great system that runs all unit tests after each commit to source control.

A quality change to me is when team members complain that we are not doing enough code reviews, and when people come with their own initiatives regarding tests.

A quality change to me is when all teams in the company talk about unit testing, when people argue about mocks vs. stubs, when the client team has continuous build and automatic sanity tests that come right after the build, when all teams have a flavor of the design template used by the back end team, when refactoring happens all the time, and when my manager expects and demands FUC (Feature flags, Unit testing, Code reviews) from everyone.

A quality change to me is when management agrees to stop all R&D activities, despite business needs, for three weeks to reduce the number of errors and develop a quality supporting mechanism. When all of R&D is recruited to complete their tasks, and when management asks when the next quality sprint will be, that’s a mindset change around quality.

Putting it all together
So what have we achieved so far?
• Coding standards so everyone codes using the same style
• Unit testing that runs on every commit so we know we are not breaking anything
• System testing so we know end-to-end functionality still works
• Increasing visibility so we always know what’s going on
• Automatic monitoring tools so we can easily monitor production

Ran Levy leads the back-end team at MyHeritage, where he has worked with the management team to create a culture of “quality as a way of life” within the back-end team and R&D. Ran has 15 years’ experience in the technology industry as both a developer and an architect of complex large-scale systems.

Article Tags

MyHeritage, standards, testing, unit-testing
