Ted Dunning, the newly appointed vice president of the Apache Incubator, is a Big Data scientist in a world of coders.

Currently the chief application architect at Hadoop distribution company MapR, the longtime Apache Software Foundation contributor and project mentor took over as the ASF’s vice president of incubation in April. Tasked with keeping Apache Incubator projects in accordance with open-source standards and with fostering new communities, Dunning will play an important role in nurturing the software and Big Data technologies the nonprofit organization supports over the next several years.

Dunning’s varied 40-year career sprouted from what he called a “compulsion to compute,” driven by a lifelong fascination with data: processing it, analyzing it and drawing insights from it back before it became very “Big.”

As vice president of incubation, Dunning said he sees his role as that of an open-source cheerleader.

“Apache doesn’t produce software; it doesn’t select projects,” he said. “Software comes to Apache and projects self-select. Apache is about building community first—one of the mottos is building community over code. We need to foster good projects that can build into good communities. I want Apache to be a very open and welcoming place, and the Incubator is the gateway.”

Dunning has been involved with open source since the mid-1970s on projects such as the XPL0 programming language and the Apex operating system. Over the past several decades he also got a Ph.D. in computing science, worked on advanced computing research projects for DARPA (the U.S. government’s Defense Advanced Research Projects Agency), and joined or founded nearly half a dozen startups spanning behaviorally targeted ads, financial risk management insights, identity theft detection, and online streaming and recommendations for music, movies and TV.

In the late 2000s, Dunning began interacting with the Apache Software Foundation community, ultimately committing to and mentoring a plethora of projects along with joining MapR.

On its face Dunning’s career seems like a random assortment of research positions, business ventures and technologies. Yet underlying every professional decision, open-source contribution and new idea is the theme of identifying larger patterns. Whether in examining user behaviors or gleaning Big Data insights to optimize a larger process, Dunning comes at programming from an exploratory scientific perspective, always with a sense of wonder.

“I’m a geek who’s suddenly fashionable; never would’ve guessed,” he said. “The ability to go out and actually try to find these patterns is so exciting. A friend of mine used to talk about squeezing the brain of Mother Nature. Whether it’s astronomy, genomics, biology, commerce, how people speak and communicate, or how machines and networks communicate: These are all examples of how these patterns exhibit in the real world. I’m just stunned when people are unmoved by that.”

Photo credit: Ellen Friedman

Follow the data
The turning point toward Big Data for Dunning came in 1984 when he joined New Mexico State University’s Computing Research Laboratory to work on large-scale projects for DARPA. He worked on statistical symbol and genomic analysis and machine translation, and made forays into computer vision and robotics.

The lab started as one of five centers of excellence funded by the state, but “within a few years, we were one of the few human language technology (HLT) contractors for DARPA,” said Dunning. “That’s where a lot of the techniques came from that I’ve been able to apply in many different situations.”

In the mid-1990s, startup culture lured Dunning to California. He left New Mexico State in 1996 to work at Aptex, a startup spun off from HNC Software. There Dunning helped build the first behaviorally targeted advertising system, primarily for the company’s biggest customer, the InfoSeek search engine, using what he called context vector technology to transform raw user data into ad insights.

“The ability to target ads based on what people did and what they clicked on was a very interesting opportunity,” said Dunning. “That work was based quite literally on research I’d done on sequences in symbols. I’d previously thought of the sequences as language, either human or genomic, but it could be applied to sequences representing things you typed into a query engine, places you visit and the content of websites.”

When HNC Software bought back Aptex in late 1999, Dunning continued his symbol sequencing work at Musicmatch. He applied the same data principles to build some of the first commercially viable music recommendation engines around early Internet radio, integrating the recommendations into streaming protocols. Dunning’s name can be found on several of the first patents around the technology.

When Yahoo bought Musicmatch in 2004, Dunning cofounded Veoh Networks, a user-generated video content platform that served as a precursor of sorts to YouTube. Simon Ferrett, a systems administrator at Slacker Radio who worked with Dunning at Musicmatch, followed him there. Veoh built video recommendation engines and generated behavioral analytics around the modern notion of multi-modal recommendation: looking at multiple kinds of behavior integrated into a coherent view of what causes people to act.

“If you can retain the full nuance of [users’] actions, whether it’s scrolling on a website, looking at reviews or playing a video, you can make much better-informed recommendations because they are talking to you, telling you what they like and don’t like,” said Dunning. “We also used that same behavioral knowledge to predict what was going to be popular, allowing us to populate a peer-to-peer network that acted almost as a self-organized content-delivery network to substantially decrease streaming costs.”

Ferrett said Dunning brought the same data-informed problem-solving perspective to Veoh, and that the way Dunning approached code, then and now, makes him a good fit for his current role in the Apache Incubator.

“Whenever I had some issues with the code I was writing and wasn’t sure if I was attacking it the right way, Ted had a great way of looking at it,” said Ferrett. “Some of the code Ted writes hews a bit closer to the way a professor would write it—assuming an infinitely perfect computer with an infinite drive, etc.—but the concepts were sound. For him to be reviewing these sorts of startup projects seems like the best combination of applying that theoretical and analytical perspective to other folks’ code, mentoring to make sure it’s done in the correct manner.”

Dunning left Veoh in 2007, but between then and the beginning of his work with the ASF, he founded one more startup: ID Analytics. The company offered consumer risk-management software with real-time behavioral insights to identify credit and financial identity fraud. LifeLock, an identity theft protection company, bought ID Analytics in 2012.

“On the face of it, music recommendations, identity fraud, Internet advertising and genomics look very different, but at their deep heart they have a lot of similarities: The ways you find order and structure in these domains,” said Dunning. “We pioneered special kinds of database technologies around this idea of graph theoretical anomalies so we could find synthetic identities and run-of-the-mill identity fraudsters. I think we were the first to prove the existence of the synthetic identity industry.”

The open-source philosopher
Throughout his research days and his progression through a string of startups, Dunning stayed involved in the open-source community. His work in open source began as an undergrad in electrical engineering at the University of Colorado in 1975, when he joined the 6502 Interest Group, one of the oldest computer clubs in the United States. The group met every Tuesday night at the Colorado School of Mines, and those meetings birthed XPL0, Apex OS, the FOCAL language and other breakthroughs, all hand-assembled and coded into a mainframe.

Dunning later earned his M.S. in computer science from New Mexico State University, and graduated with a Ph.D. in computing science from the U.K.’s University of Sheffield in 1999.

“If we go back to the dark ages of sorts, I’ve been involved in open-source software for a very, very long time and open source has changed a lot. We have an Internet now,” said Dunning. “Open source used to be folks getting together and swapping floppies. Now worldwide you have these global communities, and the capabilities are just earth-shaking.”

In today’s world of GitHub and mainstream open source, the free software veteran brings a more measured open-source philosophy to mentoring ASF projects. Dunning began working with Hadoop in 2007 and 2008, participating at first on the mailing list and then as a committer for Apache Mahout, followed by committing to and mentoring projects like Apache Storm, Lucene, Flink, Kylin, Drill and Myriad.

Taylor Goetz, the project management committee (PMC) chair of the Apache Storm project and a technical staff member at Hadoop development company Hortonworks, said that as a mentor on the Storm project, Dunning helped steer debate about Storm’s initial incubation. According to Goetz, Dunning’s presence was important in guiding the Storm committee through what it meant to be an Apache project.

“When [Storm] first started in the Incubator, none of us [PMC members] had any experience operating as an Apache project,” said Goetz. “So when we were accepted into the Incubator, it was kind of like ‘Finding Nemo’ when all the fish escape into the ocean in bags and one fish just says ‘Now what?’ Ted was really instrumental in helping us navigate those waters, figuring out all those processes and procedures around release licenses that can be pretty daunting.”

Photo credit: Philip Kademan

Though Hortonworks and MapR are competing in the enterprise Hadoop market, Goetz drew attention to a recent YouTube video where, when talking about his new role, Dunning symbolically took off his MapR hat. Goetz said it speaks to Dunning’s personality that he’s approaching this role from a vendor-neutral stance.

“That meant a lot to me. You have to learn to be an Apache person and embrace the Apache way, because our contributor licensing agreements are with Apache, not our employer,” said Goetz. “Ted is very reasonable and empathetic, which are two extremely important traits when contributing to open-source communities. He understands the Apache philosophy and the organizational dynamics at play. He doesn’t look at projects through rose-colored glasses. He sees places where improvements can be made in helping fledgling projects become successful.”

Dunning talked about the interplay between the “great leaps” indicative of modern open-source development and the slow-and-steady progress of taking continuous steps to improve a piece of software. In ultimately getting an open-source project ready for enterprise adoption, he stressed an exacting emphasis on adherence to standards and licenses.

“Part of Apache’s core mission is making software that’s safe for restricted business environments, which means we as an organization pay huge attention to licensing hygiene,” Dunning said. “That’s often the furthest thing from an excited developer’s mind. It’s really important for building a community around commercial adoption. One of the key risks of open source is not knowing where that code came from. With Apache there’s traceability of every line of code and whose responsibility that piece of code was, and it’s a concern that needs to be met for projects that want to be in the big time.”

Incubating Big Data’s future
Dunning said his role in the Incubator is one of facilitation, not control. He stressed that the ASF serves primarily as an open-source charity rather than a corporation. The ASF’s leadership roles also rotate, so he knows his position heading up the Incubator is not a permanent gig.

“It’s an opportunity to contribute in a new kind of way,” said Dunning. “It’s been a long and interesting ride, and it’s exciting to see how [open source] has progressed. Admittedly it’s much, much easier now because of acceptance and the communication we have now to work on open source.”

Going forward, Dunning has many ideas about where the ASF can grow through the Apache Incubator. One of his most active goals is extending the open-source Big Data communities farther into Europe and Asia.

“Kylin is a project that started in China’s eBay facilities and it’s now in Apache. There’s a cultural gap to be sure, but there’s huge enthusiasm around embracing open source,” said Dunning.

“The SINGA project originally out of Singapore University that deals with neural networks and deep learning was pushed into Apache and is now a very competitive machine learning project. Tajo is another project out of an Asian development group showing the same trend. It makes the world a lot bigger.”

Dunning also drew attention to the Incubator’s growing focus on integration-oriented projects such as Apache Zeppelin, which he said is breaking ground in providing visualization across different modes of computation. Finally, he mentioned a collection of science-related U.S. government projects including research from NASA’s Jet Propulsion Laboratory (JPL) coming into Apache as open-source projects.

As Dunning explained his vision for the different paths the open-source community around Apache projects might take, that sense of wide-eyed excitement rang throughout. Dunning said his scientific fascination with patterns led him into Big Data, and it’s what motivates him to keep up.

“There are always funny contradictions in life,” said Dunning. “Because things change so quickly, no one really has more than five years of applicable experience in the Big Data world. But on the other hand some old fogeys, which I try not to be, complain about reinvention. Supercomputer guys complain about microprocessor people reinventing optimization techniques. Database people complain about Big Data people. I don’t see it so much as reinvention as this fascinating and joyful realization of patterns occurring over time and across domains. I find the fact that the world does exhibit order, exhibit patterns, just wondrous.”