Biotech firm turns to open source for speed

Published: March 4th, 2019

Founded 10 years ago by a group of MIT scientists, Massachusetts-based biotech firm Ginkgo Bioworks has found great success in leveraging a number of open-source technologies to speed up and automate a wide variety of synthetic biology laboratory tasks. The organization’s main focus is the genetic engineering of compound-producing bacteria for a range of industrial applications. Ginkgo senior software engineers Dan Cahoon and Chris Mitchell spoke with SD Times about how their combined computer and life science backgrounds have given them a unique opportunity to flex their skills and apply their specific educations outside of more traditional routes for programmers.

Cahoon, whose background is in chemical and physical biology, as well as computer science, is part of the ‘Decepticon’ automation sprint team at Ginkgo. Presently, they’re collaborating with automation company Transcriptic to begin incorporating robots into the laboratory pipeline. Cahoon works on the front and back end as well as architectural aspects of the robotics platforms.

“We’re working on onboarding what they call work cells, which are basically a collection of several robots that we’ve put together in the lab (with the big robot arms) to essentially automate all of these lab tasks that [the scientists] would normally have to do,” Cahoon said. Most of these tasks involve handling fluids — transporting, mixing and centrifuging chemicals.

With their varied technology stack, relying on plenty of well-known, open-source libraries, as well as some focused directly on biotech, Mitchell says that Ginkgo has seen huge growth in throughput and speed over the years.

“I think the number we’re hitting is a three-times increase year-over-year in our throughput for the past five or six years, and we’re continuing on at that scale, which is pretty staggering,” said Mitchell, a life science PhD in addition to his developer role at Ginkgo. “I have about nine years of benchwork, which is actually doing the physical experimentation. And I see a single individual at Ginkgo can carry out pretty much the entire operations of an entire academic lab in a week. Ginkgo is sort of like a full-stack engineering operation where you write the DNA and you stitch it together and you test it, you learn from it and you do the entire thing. So Ginkgo is very much completely vertically integrated into its space. In terms of the scaling and what our automation stack has enabled us to do is that a single individual can optimize thousands of organisms and have those organisms custom built and tested within a few weeks.”
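To put Mitchell’s figure in perspective, a threefold increase sustained every year compounds quickly. The short Python sketch below works out the cumulative multiple over five and six years. The only input taken from the article is the 3x yearly rate; the function name is illustrative.

```python
def cumulative_growth(yearly_factor: float, years: int) -> float:
    """Total multiple after compounding yearly_factor once per year."""
    return yearly_factor ** years


# "Five or six years" at three times per year, as quoted above.
for years in (5, 6):
    print(f"{years} years at 3x per year: {cumulative_growth(3, years):,.0f}x")
```

Taken at face value, that rate implies throughput somewhere between roughly 243 and 729 times the starting level.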

Cahoon compared the work of two or three of the aforementioned robotics platforms to that of around 100 human lab workers, and says their automation efforts mean that projects that use similar techniques on a similar scale can be pushed through rapidly, providing more time for what he calls “cool offshoot projects.” This includes a recent experiment in which scientists at Ginkgo sampled DNA from a museum-preserved flower that has been extinct for around 100 years.

“We took their scent-producing genes and put them in our yeast platform and it has produced these smells from a flower that no longer grows,” Cahoon said. “So you can now smell at Ginkgo these flowers that are actually extinct.”

Extant sources also provide fragrance profiles from DNA. In collaboration with a flavor and fragrance company, Ginkgo used the same yeast-based platform to produce, at scale, the compounds that make roses smell the way they do.

Mitchell broke down what software goes where in this long chain from idea to trial to completed experiment.

“Essentially, our whole infrastructure is running on Docker, so everything is containerized, largely,” Mitchell said. “The orchestration of that right now is done by Rancher and so we use GitLab for spinning things up and down and handling our development and deployment lifecycle. In terms of running the work, we use a variety of back-ends for web servers, the majority being Ruby-on-Rails and Django. For some small microservices, we’ll use Slack. There’s some other miscellaneous things written in Go and Node, and that’s largely just because we have some library that we wanted to use that integrates support in Node. I think GraphQL is one of the best examples of that. That ecosystem was developed in JavaScript, so it makes sense to use Node to run that instead of some other layer. For running tasks and analyzing data, we use Jupyter. For a lot of the ad hoc analysis by users, Celery runs a lot of our work. Celery uses RabbitMQ as its broker with Redis as its back-end. And Airflow is another tool that we utilize. On the machine learning side, we take advantage of TensorFlow and Keras for trying to learn from our data and make better predictions. Our front-ends are all React, with some Redux in there, usually for our state store. And Apollo for stitching together different GraphQL templates to sort of unify our data.”
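The broker and result back-end roles Mitchell describes for Celery can be illustrated without running any of those services. The sketch below is not Celery code and not Ginkgo’s implementation. It uses Python’s standard library as stand-ins (a `queue.Queue` in place of the RabbitMQ broker, a dict in place of the Redis result back-end) to show how a task message, a worker and a stored result relate. Every name in it, including the `mean_od` analysis task, is hypothetical.

```python
import queue
import uuid

broker = queue.Queue()   # stand-in for the message broker (RabbitMQ in the article)
result_backend = {}      # stand-in for the result store (Redis in the article)
TASKS = {}               # registry mapping task names to callables


def task(func):
    """Register a function so workers can look it up by name."""
    TASKS[func.__name__] = func
    return func


@task
def mean_od(readings):
    """Hypothetical analysis task: average a list of optical density readings."""
    return sum(readings) / len(readings)


def submit(name, *args):
    """Put a task message on the broker and return its ID for later lookup."""
    task_id = str(uuid.uuid4())
    broker.put((task_id, name, args))
    return task_id


def run_worker():
    """Drain the queue, run each task and store its result under the task ID."""
    while not broker.empty():
        task_id, name, args = broker.get()
        result_backend[task_id] = TASKS[name](*args)


task_id = submit("mean_od", [0.5, 1.0, 1.5])
run_worker()
print(result_backend[task_id])
```

The point of the split is that the code submitting work never calls the task directly. It only holds a task ID, which is what lets workers run elsewhere and results be fetched later.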

The most important aspect of their jobs developing in this full-stack synthetic biology operation, Mitchell said, is accessibility from varied classes of users throughout the organization.

“At Ginkgo, you have these two worlds, I like to think of. One is sort of the physical sample-handling,” Mitchell said. This world involves the robotics platforms that expedite the physical laboratory work such as mixing liquids and centrifuging. “There’s a lot of sample-lineage tracking with that, which is essentially a giant graph of what samples, what reagents and what molecules were in that sample and now comprise a new sample — the tracking of how much of something there was, how much it took, which robot did it. That lets you get insight into things like where is my systematic variation coming into my analysis.”
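The “giant graph” Mitchell describes can be pictured as a directed graph in which each liquid transfer is an edge from a source sample or reagent to the sample it helped create. The Python sketch below is a minimal illustration under that assumption, not Ginkgo’s data model. The class, the sample IDs and the robot names are all hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Toy sample-lineage graph: each transfer is an edge from source to product."""

    def __init__(self):
        # product -> list of (source, volume_ul, robot) transfers that fed it
        self._parents = defaultdict(list)

    def record_transfer(self, source, product, volume_ul, robot):
        self._parents[product].append((source, volume_ul, robot))

    def ancestors(self, sample):
        """Return every upstream sample or reagent that contributed to `sample`."""
        seen = set()
        stack = [sample]
        while stack:
            current = stack.pop()
            for source, _volume, _robot in self._parents.get(current, []):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen

    def robots_involved(self, sample):
        """Return every robot that performed a transfer in the sample's lineage."""
        robots = set()
        for node in self.ancestors(sample) | {sample}:
            for _source, _volume, robot in self._parents.get(node, []):
                robots.add(robot)
        return robots


graph = LineageGraph()
graph.record_transfer("yeast_culture_1", "mix_A", 50.0, "robot_1")
graph.record_transfer("reagent_X", "mix_A", 10.0, "robot_1")
graph.record_transfer("mix_A", "assay_well_7", 20.0, "robot_2")

print(sorted(graph.ancestors("assay_well_7")))
print(sorted(graph.robots_involved("assay_well_7")))
```

Walking the graph upstream from a result is what makes it possible to ask which reagents or which robot a suspicious measurement has in common with others.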

Mitchell says the second world involves how that data is used, queried, processed and referenced.

“A lot of that is building different automated pipelines as well as enabling ad hoc pipelines for users to perform additional analyses or refine other measurements,” Mitchell said. “So a lot of that is handling things like ‘What is the provenance of your data?’ so ‘How do you make these analyses reproducible and how do you make them scalable?’ ‘How do you make them automated so that when somebody comes to the lab tomorrow, their answers are already sitting in front of them?’ ‘How do you make that data accessible to a variety of classes of users?’ We have users who are designing organisms, so they’re interested in biological questions. But the model at Ginkgo is that we distribute the work between different silos. We have the silo that is the people who are running the machines, and they also have access to that data, but they ask different questions like ‘What is the health of my instrument?’ ‘Where is most of my time being spent?’ ‘How can I further optimize my pipeline and increase the throughput and scale of details?’ So a lot of what my team does is say ‘How do we expose this data to different users to make it interactive at the many levels of scale that our users encounter?’ The person submitting experiments for the biological side might have 10 samples they’re looking at. The person running it might have 10,000 samples.”

Cahoon says the next step for his ‘Decepticon’ team is bringing on even more robotics platforms and speeding up the existing ones, but he says the work he’s already done at Ginkgo, and the organization itself, has been a perfect fit and a unique experience from both a life science and a computer science perspective.

“Biology has so much potential for doing things, like when we brought back the extinct flowers for example,” Cahoon said. “We’ve done that on the platform that I’ve worked on. That’s, I think, incredibly cool. I’ve also been very hands-on with the scientists, talking with them, coming up with things to really solve day-to-day issues and figure out how we can scale up the science. There are so many smart people here, so it’s just constant learning. And I think that’s just super special.”

Article Tags

AI, airflow, celery, Django, Docker, Ginkgo Bioworks, GraphQL, JavaScript, Jupyter, machine learning, MIT, Node, open source, RabbitMQ, Rancher, Redis, robotics, Ruby, TensorFlow

About Ian C. Schafer

Ian C. Schafer is a multimedia reporter and undeniable nerd living and working in New York City and on Long Island.


