As a software developer, reviewing peer-reviewed literature can seem like an alien activity. Many developers do not look sufficiently before they leap, and they end up repeating what’s already out there. Such a cavalier attitude will not be tolerated for your thesis. At the end of your Ph.D., in addition to your thesis being examined, you’ll be required to undergo a Thesis Defense. You must publicly present your research and be interrogated by your examiners and audience. Their job is to identify weaknesses in your work. It would be devastating if they pointed out your objective had already been achieved years earlier. You cannot make a claim to novelty if such knowledge already existed. To avoid this risk, your literature review must be exhaustive.

I found Google Scholar to be an excellent tool for searching peer-reviewed literature. It’s available for free, so you can start your literature review even before applying for your Ph.D. Some of the articles Google Scholar links to are also available for free. Downloading them will give you a taste for academic writing, which tends to be verbose: Every point must be expanded in detail, every argument supported by evidence and citations. Reading such articles will give you a good feel for the style of discourse.

Unfortunately, most articles Google Scholar links to require payment to download. For these, you should wait until you’ve enrolled in your course. Most university libraries pay annual subscriptions to literature databases and can absorb such payments for you.

You can also get a feeling for the existing body of work by searching open-source software on repositories such as SourceForge and GitHub. However, I was surprised how few research groups ran their projects as open source. It’s rare to find one that publishes its source code, even rarer to find one with a proper technical, social and political infrastructure. According to Jono Bacon’s book “The Art of Community,” that means it’s an organization that packages its project up into a distribution; performs regular releases; produces tutorials and examples; has a message forum; has a defect tracker; accepts contributions; engages with its community; and so on. This seems a missed opportunity, because it’s fertile ground for gathering feedback and observations.

Once you’ve completed your literature review, and are satisfied your objective is worthwhile, proper research can begin. But what constitutes “proper research”? The first step is to choose your epistemology and methodology.

Epistemology: Epistemology isn’t a word you hear very often outside of academic circles. It’s Greek for “theory of knowledge.” It’s a philosophical idea related to what we know and on what basis. The short of it is, everything you say in your thesis rests on certain base assumptions. Not everybody will agree with those assumptions. That’s perfectly defensible as long as you spell out what they are. But the problem is, some may be buried so deep in your subconscious that you overlook them.

To quote a leader in this field, Michael Crotty: “At every point in our research—in our observing, our interpreting, our reporting—we inject a host of assumptions… Such assumptions shape for us the meaning of research questions, the purposiveness of research methodologies, and the interpretability of research findings. Without unpacking these assumptions and clarifying them, no one (including ourselves!) can really divine what our research has been.” In his book, “The Foundations of Social Research: Meaning and perspective in the research process,” he puts it even more forcibly: “Without [clarifying our assumptions], research is not research.”
Action Research: There are several accepted, academically rigorous methodologies that can frame an open-source project as a Ph.D. thesis. The methodology I chose was called Action Research. The idea is to break your overall research into smaller cycles of building something, reflecting on what you’ve built, and then building some more. If you’re thinking this sounds a lot like the industry practice of Iterative Development, you’d be right. In fact the phases of a typical Action Research cycle fit neatly into the stages of Iterative Development, providing an elegant way to marry industrial practice with research methodology.

The main difference is that the “reflecting on what you’ve built” phase needs to be much longer and more rigorous for Action Research than you’d typically undertake for Iterative Development. But that turns out to be a great thing for industrial practice.

One of the most important factors in software development is scope: deciding what to include and what to leave out. Scope creep and feature bloat are recognized risks, impacting development costs and release schedules. Good developers carefully apply rules of thumb as Google’s Joshua Bloch explained in his presentation, “How to Design a Good API and Why It Matters”: every design decision should “minimize conceptual weight.” We should strive to “kill several birds with one stone.” But an implicit difficulty in evaluating this is knowing what the birds are.