Going ‘lights-out’ with DevOps

Published: July 29th, 2019

People sometimes describe DevOps as a factory. It’s a good analogy. As in a factory, raw material (in this case, code) goes in one end of the DevOps line, and finished software comes out the other.

I’d take the idea one step further. In its highest form, DevOps is not just any factory, but a ‘lights-out’ factory.

Also called a “dark factory,” a lights-out factory is one so automated it can perform most tasks in the dark, needing only a small team of supervisors to keep an eye on things in the control room. That’s the level of automation DevOps should strive for.

In a lights-out DevOps factory, submitted code is automatically checked against coding standards, run through static analysis, scanned for security vulnerabilities and measured for automated test coverage. After making it through that first pass, the code gets put through its paces with automated integration, performance, load and end-to-end tests. Only then, after completing all those tests, is it ready for deployment to an approved environment.
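One way to picture those two passes is as an ordered list of gates that stops at the first failure. The sketch below is only an illustration, not any CI tool's API: the `Submission` fields, the gate names and the 80 percent coverage threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Submission:
    """Hypothetical summary of one code submission (all fields are illustrative)."""
    lint_errors: int
    known_vulnerabilities: int
    test_coverage: float        # fraction of lines covered, 0.0 to 1.0
    integration_passed: bool
    load_test_passed: bool


# A gate is a (name, check) pair; the check returns True when the gate passes.
Gate = Tuple[str, Callable[[Submission], bool]]

# First pass: cheap, fast checks. The 0.80 threshold is an assumed policy.
FIRST_PASS: List[Gate] = [
    ("coding standards", lambda s: s.lint_errors == 0),
    ("security scan", lambda s: s.known_vulnerabilities == 0),
    ("test coverage", lambda s: s.test_coverage >= 0.80),
]

# Second pass: slower tests that only run once the first pass is clean.
SECOND_PASS: List[Gate] = [
    ("integration tests", lambda s: s.integration_passed),
    ("load tests", lambda s: s.load_test_passed),
]


def run_gates(submission: Submission) -> Tuple[bool, List[str]]:
    """Run every gate in order, stopping at the first failure.

    Returns (ready_for_deployment, names_of_gates_that_ran).
    """
    ran: List[str] = []
    for name, check in FIRST_PASS + SECOND_PASS:
        ran.append(name)
        if not check(submission):
            return False, ran
    return True, ran
```

Putting the cheap checks first means a submission with an obvious problem never spends time in the expensive test stages.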

As for those environments, the lights-out DevOps factory automatically sets them up, provisions them, deploys to them and tears them down as needed. All software configuration, secrets, certificates, networks and so forth spring into being at deploy time, requiring no manual fidgeting with the settings. Application health is monitored down to a fine-grained level, and the actual production runtime performance is visible through intuitive dashboards and queryable operator consoles (the DevOps version of the factory control room). When needed, the system can self-heal as issues are detected.
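Self-healing can start out very simple: check health, restart what is unhealthy, and call a human only when restarts stop helping. Here is a minimal sketch under stated assumptions. The `Service` class is a hypothetical stand-in for a deployed instance, and the limit of three restarts is an arbitrary choice; a real platform would replace the instance rather than flip a flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    """Hypothetical stand-in for a deployed application instance."""
    name: str
    healthy: bool = True
    restarts: int = 0

    def restart(self) -> None:
        # A real platform would replace the instance; here we just flip a flag.
        self.healthy = True
        self.restarts += 1


def heal(services: List[Service], max_restarts: int = 3) -> List[str]:
    """Restart unhealthy services and return the names that need a human.

    A service that has already used up max_restarts is escalated
    instead of being restarted again.
    """
    escalate: List[str] = []
    for svc in services:
        if svc.healthy:
            continue
        if svc.restarts >= max_restarts:
            escalate.append(svc.name)
        else:
            svc.restart()
    return escalate
```

The returned list is what would show up in the control room: everything else was handled in the dark.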

This might sound like something out of science fiction, but it’s as real as an actual, full-fledged lights-out factory. Which is to say, “real, but rare.” Many automated factories approach lights-out status, but few go all the way. The same could be said of DevOps.

The good news is that you can design a basic factory line that delivers most of the benefits of a “lights-out” operation and isn’t too hard to create. You’ll get most of the ROI just by creating a DevOps dark factory between production and test.

Here is a checklist for putting together your own “almost lights-out” DevOps solution. Don’t worry. None of these decisions are irreversible. You can always change your mind. It will just take some rework.

1. IaaS or PaaS or containers – I recommend PaaS or containers. Infrastructure as a Service has its place, but it has downsides in price and configuration management. A VM is always on, so you pay 100 percent of its cost even when utilization is low or the machine sits idle. The setup and configuration are also more complex, as you have to provision the bare instance and then deploy a configuration onto it. Lastly, with IaaS it’s all too easy to just run bespoke VMs and fall back into old habits. I’m a big fan of PaaS because you get a nice price point and just the right amount of configurability, without the added complexity of full specification. Containers are a nice middle ground. The spend for a container cluster is still there, but if you’re managing a large ecosystem, the orchestration capabilities of containers could become the deciding factor.

2. Public cloud or on-premises cloud – I recommend public cloud. Going back to our factory analogy, a hundred years ago factories generated their own power, but that meant they also had to own the power infrastructure and keep people on staff to manage it. Eventually centralized power production became the norm. Utility companies specialized in generating and distributing power, and companies went back to focusing on manufacturing. The same thing is happening with compute infrastructure and the cloud providers. The likes of Google, Amazon and Microsoft have taken the place of the power companies, having developed the specialized services and skills needed to run large data centers. I say let them own the problem while you pay for the service.

There are situations where a private cloud can make sense, but it’s largely a function of organizational size. If you’re already running a lot of large data centers, you may have enough core infrastructure and competency in place to make the shift to private cloud. If you decide to go that route, you absolutely must commit to a true DevOps approach. I’ve seen several organizations say they’re doing “private cloud” when in reality they’re doing business as usual and don’t understand why they’re not getting any of the temporal or financial benefits of DevOps. If you find yourself in this situation, do a quick value-stream analysis of your development process, compare it to a lights-out process, and you’ll see nothing’s changed from your old Ops model.

3. Durable storage for databases, queues, etc. – I recommend using a DB service from the cloud provider. Similar to the decision between IaaS and PaaS, I’d rather pay someone else to own the problem. Making any service resilient means having to worry about redundancy and disk management. With a database, queue, or messaging service, you’ll need a durable store for the runtime service. Then, over time, you’ll not only have to patch the service but take down and reattach the storage to the runtime system. This is largely a solved problem from a technological standpoint, but it’s just more complexity to manage. Add in the need for service and storage redundancy and backup and disaster recovery, and the equation gets even more complex. Again, the cloud providers are more than willing to own those problems, and offer cost-effective, scalable solutions for common distributed services that need high durability.

4. SQL vs. NoSQL – Many organizations are still relational database-centric, as they were in the ’90s and ’00s, with the RDBMS at the center of the enterprise universe. Relational still has its place, but cloud-native storage options like table, document, and blob provide super-cheap, high-performance alternatives. I’ve seen many organizations that basically applied their old standards to the cloud, and said, “Well, you can’t use blob storage because it’s not an approved technology,” or “You can’t use serverless because it’s an ‘unbounded’ resource.” That’s the wrong way to do it. You need to re-examine your application strategy to use the best approach for the price point.

I once had a client whose data changed fairly slowly (every few weeks) but had to be accessed much more frequently. First they tried querying the same static data with the same queries, over and over. The performance was OK, but execution time went down significantly when the DB cache was primed. Then there was a push to give the DB instance more RAM so it could hold more data in the cache. We offered an alternative where we just precomputed static read models and dumped them in blob storage. The cost of the additional storage was a couple dollars a month, where increasing the specs of the DB would have cost more than a hundred a month. We achieved faster performance for less cost, but it required re-evaluating our approach.

5. Mobile – Mobile builds are one of the things that can throw you for a loop. Android is easy; iOS is a little more complicated, because the builds have to run on macOS. You’ll either need a physical Mac for builds, or if you go with Azure DevOps, you can have them run on a Microsoft-hosted macOS agent. Some organizations still haven’t figured out that they need a Mac compute strategy. I once had a team so hamstrung by corporate policy, they were literally trying to figure out how to build a “hack-intosh” because the business wanted to build an iOS app but corporate IT shot down buying any Macs. Once we informed them we couldn’t legally develop on a “hack-intosh,” they just killed the project instead of trying to convince IT to use Mac infrastructure. Yes, they abandoned a project with a real business case and positive ROI because IT was too rigid.

6. DB versioning – Use a tool like Liquibase or Flyway. Your process can only run as fast as your rate-limiting step, and if you’re still versioning your database by hand, you’ll never go faster than your DBAs can execute scripts. Besides, they have more important things to do.

7. Artifact management, security scanning, log aggregation, monitoring – Don’t get hung up on this stuff. You can figure it out as you go. Get items in your backlog for each of these activities and have a more junior DevOps resource ripple each extension through to the process as it’s developed.

8. Code promotion – Lay out your strategy to go from Dev to Test to Stage to Prod, and replace any manual setup like networking, certificates and gateways with automated scripts.

9. Secrets – Decide on a basic toolchain for secrets management, even if it’s really basic. There’s just no excuse for storing plaintext secrets in source control. There are even tools like git-secret, BlackBox, and git-crypt that provide simple tooling and patterns for storing secrets encrypted.

10. CI – Set up and configure your CI tool, including a backup / restore process. When you get more sophisticated, you’ll actually want to apply DevOps to your DevOps, but for now just make sure you can stand up your CI tool in a reasonable amount of time, repeatedly, with backup.
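To make the DB versioning item above more concrete, here is a minimal sketch of the idea behind migration tools: keep an ordered list of versioned changes, record which ones have been applied, and apply only the rest. This is not how Liquibase or Flyway are implemented. The `schema_history` table name, the `customer` table and the migrations themselves are assumptions for illustration, and SQLite is used only so the example is self-contained.

```python
import sqlite3
from typing import List, Tuple

# Ordered, versioned migrations. In a real project these would live in
# version-controlled files; the table and column names are illustrative.
MIGRATIONS: List[Tuple[int, str]] = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection,
            migrations: List[Tuple[int, str]] = MIGRATIONS) -> List[int]:
    """Apply every migration not yet recorded and return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_history (version INTEGER PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT version FROM schema_history")}
    applied: List[int] = []
    for version, sql in sorted(migrations):
        if version in done:
            continue  # already applied on an earlier run
        conn.execute(sql)
        conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied
```

Because the function is safe to run repeatedly, the pipeline can call it on every deployment. For example, `migrate(sqlite3.connect(":memory:"))` applies both versions on the first call and nothing on the second.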
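The precomputed read models from the SQL vs. NoSQL item can be sketched in a few lines as well. This is a rough illustration, not the client's actual solution: the row shape (`customer`, `amount_cents`) is invented, and a local directory stands in for a blob container so the example runs without a cloud account.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def build_read_models(rows: List[dict]) -> Dict[str, dict]:
    """Aggregate raw rows into one precomputed read model per customer."""
    models: Dict[str, dict] = {}
    for row in rows:
        model = models.setdefault(
            row["customer"],
            {"customer": row["customer"], "orders": 0, "total_cents": 0},
        )
        model["orders"] += 1
        model["total_cents"] += row["amount_cents"]
    return models


def publish(models: Dict[str, dict], blob_root: Path) -> None:
    """Write each model as a JSON file; the directory stands in for blob storage."""
    blob_root.mkdir(parents=True, exist_ok=True)
    for key, model in models.items():
        (blob_root / f"{key}.json").write_text(json.dumps(model))


def read_model(blob_root: Path, key: str) -> dict:
    """Serve a read with a single file fetch and no database query."""
    return json.loads((blob_root / f"{key}.json").read_text())


if __name__ == "__main__":
    rows = [
        {"customer": "acme", "amount_cents": 1000},
        {"customer": "acme", "amount_cents": 550},
    ]
    root = Path(tempfile.mkdtemp()) / "blobs"
    publish(build_read_models(rows), root)
    print(read_model(root, "acme"))
```

The publish step only needs to rerun when the underlying data changes, which in the case described was every few weeks.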

Now that you’ve made some initial technology decisions and established your baseline infrastructure, make sure you have at least one solid reference project. This is a project you keep evergreen and use to develop new extensions and capabilities to your pipelines. You should have an example for each type of application in your ecosystem. This is the project people should refer to when they want to know how to do something. As you evolve your pipelines, update this project with the latest and greatest features and steps.

For each type of deployment — database, API, front end and mobile — you’ll want to start with a basic assembly line. The key elements to your line will be Build, Unit Testing, Reporting, Artifact Creation. Once you have those, you’ll need to design a process for deploying an artifact into an environment (i.e. deploying to Test, Stage, Prod) with its runtime configuration.
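The last step is worth a sketch: build the artifact once, fingerprint it, and pair that same artifact with a different runtime configuration in each environment. The code below is an illustration under assumptions. The environment names follow the article, but the configuration keys, the example.com URLs and the `Artifact` type are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Artifact:
    """An immutable build output identified by the hash of its content."""
    name: str
    content: bytes
    sha256: str


def build_artifact(name: str, content: bytes) -> Artifact:
    """Package the build output once and fingerprint it."""
    return Artifact(name, content, hashlib.sha256(content).hexdigest())


# Hypothetical per-environment runtime configuration, kept outside the artifact.
RUNTIME_CONFIG: Dict[str, Dict[str, str]] = {
    "test": {"api_url": "https://test.example.com", "log_level": "debug"},
    "stage": {"api_url": "https://stage.example.com", "log_level": "info"},
    "prod": {"api_url": "https://api.example.com", "log_level": "warn"},
}


def deploy(artifact: Artifact, environment: str) -> Dict[str, str]:
    """Pair the unchanged artifact with one environment's configuration."""
    if hashlib.sha256(artifact.content).hexdigest() != artifact.sha256:
        raise ValueError("artifact content does not match its recorded hash")
    config = RUNTIME_CONFIG[environment]  # raises KeyError for an unknown environment
    return {"artifact": artifact.sha256, "environment": environment, **config}
```

Because nothing is rebuilt between Test, Stage and Prod, what you tested is byte-for-byte what you ship; only the configuration differs.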

From there, keep adding components to your factory. Choose projects in the order that gets you the most ROI, either by eliminating a constraint or reducing wait time. At each stage, try to make “everything as code.” Always create both a deployment and a rollback process, and exercise the heck out of both all the time.
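Exercising the rollback can itself be automated. As a rough sketch, the rehearsal below deploys a candidate, rolls it back, and deploys it again, so both directions are proven on every release. The `ReleaseHistory` class is a hypothetical in-memory stand-in for whatever actually tracks your live version.

```python
from typing import List, Optional


class ReleaseHistory:
    """Tracks which version is live so a rollback is always one step away."""

    def __init__(self) -> None:
        self._versions: List[str] = []

    @property
    def live(self) -> Optional[str]:
        return self._versions[-1] if self._versions else None

    def deploy(self, version: str) -> None:
        self._versions.append(version)

    def rollback(self) -> str:
        """Return to the previous version; refuse if there is none."""
        if len(self._versions) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self._versions.pop()
        return self._versions[-1]


def rehearse(history: ReleaseHistory, candidate: str) -> bool:
    """Deploy, roll back, and redeploy to prove both directions work.

    Raises RuntimeError if nothing was live before the rehearsal.
    """
    before = history.live
    history.deploy(candidate)
    restored = history.rollback()
    history.deploy(candidate)
    return restored == before and history.live == candidate
```

Run in a Test or Stage environment on every release, a rehearsal like this keeps the rollback path from rotting until the day you need it.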

When it comes to tooling, there are more than enough good open-source options to get you started.

To sum up, going lights-out means committing to making everything code, automated, and tested. You may not get there with every part of your production line, but just by tackling the basics, you’ll be surprised how much you can get done in the dark.



About Nate Berent-Spillson

Nate Berent-Spillson is a technology principal at software services provider Nexient.
