Founded 10 years ago by a group of MIT scientists, Massachusetts-based biotech firm Ginkgo Bioworks has found great success leveraging open-source technologies to speed up and automate a wide variety of synthetic biology laboratory tasks. The organization’s main focus is genetically engineering compound-producing bacteria for a range of industrial applications. Ginkgo senior software engineers Dan Cahoon and Chris Mitchell spoke with SD Times about how their combined computer and life science backgrounds have given them a unique opportunity to flex their skills outside of more traditional routes for programmers.
Cahoon, whose background is in chemical and physical biology, as well as computer science, is part of the ‘Decepticon’ automation sprint team at Ginkgo. Presently, they’re collaborating with automation company Transcriptic to begin incorporating robots into the laboratory pipeline. Cahoon works on the front and back end as well as architectural aspects of the robotics platforms.
“We’re working on onboarding what they call work cells, which are basically a collection of several robots that we’ve put together in the lab (with the big robot arms) to essentially automate all of these lab tasks that [the scientists] would normally have to do,” Cahoon said. Most of these tasks involve handling fluids — transporting, mixing and centrifuging chemicals.
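Transcriptic maintains an open format, Autoprotocol, for describing exactly these kinds of fluid-handling steps. As a hedged illustration of what one work-cell job might look like, the sketch below builds a minimal protocol loosely modeled on that format; the container names and instruction fields here are illustrative assumptions, not Ginkgo’s actual schema.

```python
# Illustrative sketch of a work-cell job, loosely modeled on Transcriptic's
# open Autoprotocol format. Container refs and field values are assumptions.

def make_protocol():
    """Build a minimal two-step protocol: transfer a liquid, then centrifuge."""
    return {
        "refs": {
            "source_plate": {"new": "96-flat", "discard": True},
            "dest_plate": {"new": "96-flat", "store": {"where": "cold_4"}},
        },
        "instructions": [
            {   # liquid handling: move 50 microliters from one well to another
                "op": "liquid_handle",
                "locations": [
                    {"location": "source_plate/A1",
                     "transports": [{"volume": "-50:microliter"}]},
                    {"location": "dest_plate/A1",
                     "transports": [{"volume": "50:microliter"}]},
                ],
            },
            {   # centrifuge the destination plate
                "op": "spin",
                "object": "dest_plate",
                "acceleration": "1000:g",
                "duration": "2:minute",
            },
        ],
    }

protocol = make_protocol()
print(len(protocol["instructions"]))  # two steps: transfer, then spin
```

A work cell would consume a declarative job like this and dispatch each instruction to the appropriate robot, which is what makes the same pipeline reusable across many experiments.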
Ginkgo’s technology stack is varied, relying on plenty of well-known open-source libraries as well as some focused directly on biotech, and Mitchell says it has driven huge growth in throughput and speed over the years.
“I think the number we’re hitting is a three-times increase year-over-year in our throughput for the past five or six years, and we’re continuing on at that scale, which is pretty staggering,” said Mitchell, a life science PhD in addition to his developer role at Ginkgo. “I have about nine years of benchwork, which is actually doing the physical experimentation, and a single individual at Ginkgo can carry out pretty much the entire operations of an entire academic lab in a week. Ginkgo is sort of like a full-stack engineering operation where you write the DNA and you stitch it together and you test it, you learn from it and you do the entire thing. So Ginkgo is very much completely vertically integrated into its space. In terms of scaling, what our automation stack has enabled us to do is that a single individual can optimize thousands of organisms and have those organisms custom built and tested within a few weeks.”
Cahoon compared the work of two or three of the aforementioned robotics platforms to that of around 100 human lab workers, and said their automation efforts mean that projects using similar techniques at a similar scale can be pushed through rapidly, leaving more time for what he calls “cool offshoot projects.” One recent example saw Ginkgo scientists sample DNA from a museum-preserved flower that has been extinct for around 100 years.
“We took their scent-producing genes and put them in our yeast platform and it has produced these smells from a flower that no longer grows,” Cahoon said. “So you can now smell at Ginkgo these flowers that are actually extinct.”
Extinct plants aren’t the only source of fragrance DNA, though. In collaboration with a flavor and fragrance company, Ginkgo used the same yeast-based platform to mass-produce the compounds that make roses smell the way they do.
Mitchell broke down what software goes where in this long chain from idea to trial to completed experiment.
“Essentially, our whole infrastructure is running on Docker, so everything is containerized, largely,” Mitchell said. “The orchestration of that right now is done by Rancher, and we use GitLab for spinning things up and down and handling our development and deployment lifecycle. In terms of running the work, we use a variety of back-ends for web servers, the majority being Ruby-on-Rails and Django. For some small microservices, we’ll use Flask. There are some other miscellaneous things written in Go and Node, and that’s largely just because we have some library that we wanted to use that’s best supported in Node. I think GraphQL is one of the best examples of that. That ecosystem was developed in JavaScript, so it makes sense to use Node to run it instead of some other layer. For running tasks and analyzing data, we use Jupyter for a lot of the ad hoc analysis by users, and Celery runs a lot of our work. Celery uses RabbitMQ as its broker with Redis as its back-end. Airflow is another tool that we utilize. On the machine learning side, we take advantage of TensorFlow and Keras for trying to learn from our data and make better predictions. Our front-ends are all React, with some Redux in there, usually for our state store. And Apollo for stitching together different GraphQL schemas to sort of unify our data.”
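To make the moving parts Mitchell lists concrete, here is a hedged sketch of how such a containerized stack is commonly wired together in a Docker Compose file; the service names, image tags and build paths are illustrative assumptions, not Ginkgo’s actual deployment (which Mitchell says is orchestrated by Rancher).

```yaml
version: "3.8"
services:
  web:
    build: ./rails_app        # Ruby-on-Rails web server (path is illustrative)
    ports: ["3000:3000"]
    depends_on: [rabbitmq, redis]
  analysis:
    build: ./django_app       # Django back-end for analysis endpoints
  worker:
    build: ./django_app
    command: celery -A tasks worker --loglevel=info
    environment:
      CELERY_BROKER_URL: amqp://guest:guest@rabbitmq:5672//
      CELERY_RESULT_BACKEND: redis://redis:6379/0
    depends_on: [rabbitmq, redis]
  rabbitmq:
    image: rabbitmq:3-management   # Celery's message broker
  redis:
    image: redis:7                 # Celery's result back-end
```

The pattern matches the quote: web servers in one set of containers, Celery workers in another, with RabbitMQ brokering the task queue and Redis storing results.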
The most important aspect of their jobs developing in this full-stack synthetic biology operation, Mitchell said, is accessibility for the varied classes of users throughout the organization.
“At Ginkgo, there are these two worlds, I like to think. One is the physical sample-handling,” Mitchell said. This world involves the robotics platforms that expedite physical laboratory work such as mixing liquids and centrifuging. “There’s a lot of sample-lineage tracking with that, which is essentially a giant graph of which samples, which reagents and which molecules went into a sample and now comprise a new sample — the tracking of how much of something there was, how much it took, which robot did it. That lets you get insight into things like where systematic variation is coming into my analysis.”
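The “giant graph” Mitchell describes can be sketched in a few lines of plain Python: each sample records which parents and reagents went into it, in what volumes, and which robot did the transfer, and lineage queries walk the graph upward. The class and field names below are hypothetical, not Ginkgo’s actual data model.

```python
# Minimal sketch of a sample-lineage graph: nodes are samples, edges record
# (parent sample or reagent, volume transferred, robot that did it).
from dataclasses import dataclass, field

@dataclass
class Sample:
    sample_id: str
    # each input: (parent_sample_id, volume_in_uL, robot_id)
    inputs: list = field(default_factory=list)

def lineage(samples: dict, sample_id: str) -> set:
    """Walk the graph upward to collect every ancestor of a sample."""
    ancestors = set()
    stack = [sample_id]
    while stack:
        current = stack.pop()
        for parent, _volume, _robot in samples.get(current, Sample(current)).inputs:
            if parent not in ancestors:
                ancestors.add(parent)
                stack.append(parent)
    return ancestors

samples = {
    "S1": Sample("S1"),                                  # raw reagent batch
    "S2": Sample("S2", [("S1", 50.0, "robot-arm-3")]),   # diluted from S1
    "S3": Sample("S3", [("S2", 10.0, "robot-arm-3"),
                        ("S1", 5.0, "robot-arm-7")]),    # mixed from S2 and S1
}
print(sorted(lineage(samples, "S3")))  # ['S1', 'S2']
```

Because each edge also carries the robot ID, grouping results by that field is exactly the kind of query that surfaces where systematic variation enters an analysis.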
Mitchell says the second world involves how that data is used, queried, processed and referenced.
“A lot of that is building different automated pipelines as well as enabling ad hoc pipelines for users to perform additional analyses or refine other measurements,” Mitchell said. “So a lot of that is handling things like ‘What is the provenance of your data?’ ‘How do you make these analyses reproducible, and how do you make them scalable?’ ‘How do you make them automated so that when somebody comes to the lab tomorrow, their answers are already sitting in front of them?’ ‘How do you make that data accessible to a variety of classes of users?’ We have users who are designing organisms, so they’re interested in biological questions. But the model at Ginkgo is that we distribute the work between different silos. We have the silo of people who are running the machines, and they also have access to that data, but they ask different questions, like ‘What is the health of my instrument?’ ‘Where is most of my time being spent?’ ‘How can I further optimize my pipeline and increase the throughput and scale?’ So a lot of what my team does is ask ‘How do we expose this data to different users to make it interactive at the many levels of scale that our users encounter?’ The person submitting experiments on the biological side might have 10 samples they’re looking at. The person running them might have 10,000 samples.”
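One common way to get the provenance and reproducibility Mitchell describes is to derive a key from an analysis’s inputs plus its code version, so re-running with identical inputs returns the cached, already-computed answer. The sketch below is a minimal illustration of that idea in plain Python; the function names are assumptions, and Ginkgo’s actual pipeline tooling (Celery, Airflow) is of course far richer.

```python
# Hedged sketch: content-addressed provenance for a reproducible analysis.
import hashlib
import json

ANALYSIS_VERSION = "v1"   # bump when the analysis logic changes
_results_cache = {}       # stands in for a real result store (e.g. Redis)

def provenance_key(inputs: dict) -> str:
    """Hash the inputs and analysis version into a stable provenance key."""
    payload = json.dumps({"inputs": inputs, "version": ANALYSIS_VERSION},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_analysis(inputs: dict) -> dict:
    key = provenance_key(inputs)
    if key not in _results_cache:   # only compute on a cache miss
        mean = sum(inputs["measurements"]) / len(inputs["measurements"])
        _results_cache[key] = {"mean": mean, "provenance": key}
    return _results_cache[key]

first = run_analysis({"measurements": [1.0, 2.0, 3.0]})
second = run_analysis({"measurements": [1.0, 2.0, 3.0]})
print(first["mean"], first is second)  # 2.0 True
```

Scheduling such keyed tasks overnight is what lets answers be “already sitting in front of” a scientist the next morning, while the stored key answers the provenance question for any result.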
Cahoon says the next step for his ‘Decepticon’ team is bringing on even more robotics platforms and speeding up the existing ones, but he says the work he’s already done at Ginkgo, and the organization itself, has been a perfect fit and a unique experience from both a life science and a computer science perspective.
“Biology has so much potential for doing things, like when we brought back the extinct flowers for example,” Cahoon said. “We’ve done that on the platform that I’ve worked on. That’s, I think, incredibly cool. I’ve also been very hands-on with the scientists, talking with them, coming up with things to really solve day-to-day issues and figure out how we can scale up the science. There are so many smart people here, so it’s just constant learning. And I think that’s just super special.”