How to get (almost) everything you ever wanted in one (not very) easy step

Published: September 26th, 2012

At some stage in their careers, many software developers experience a gnawing feeling of being on the outside looking in: They read blogs, but they don’t write them; they use open-source projects, but they don’t contribute to any; they’re good at what they do, but they know they could be more.

If you’re like me, this gnawing feeling extends to a laundry list of aspirations you’d like to achieve “one day”:
• be part of an open-source project
• write a blog
• be an expert in some area
• speak at a high-profile conference
• write a book
• be more qualified

The problem, of course, is that each of these requires a substantial investment of your time. There are many reasonable objections to doing any of them: What if nobody uses your open-source project? What if nobody reads your blog? What have you got to speak about at a conference? Is all that studying worth it just for a qualification? And so on.

This article presents a way to reduce those objections and lower the barrier to entry, so that doing one (in fact, all) of the above becomes more palatable. You can do this by combining everything on the list around a single project: get one, and you get them all. There’s still a lot of work involved, but it’s overlapping work. You get more bang for your buck.

Start with a Big Idea open-source thesis
Let’s consider how to combine the laundry list into one self-reinforcing project.

First, think of a topic for an open-source project. Then submit a proposal to your local university to do a doctoral thesis framed around it. Propose that your thesis focus on applicability to industry (i.e., to non-academics) so it will be developed in close collaboration with industry “practitioners” (developers). You’ll achieve this by gathering feedback through regular open-source releases, blog entries, message forums, case studies, and other channels. If you manage to generate sufficient interest, you’ll automatically be an expert in your particular project. In turn, this may lead to invitations to speak at conferences or to write a book.

The important point is that each of the laundry-list items reinforces the others. Even if some don’t pan out individually (say, your blog isn’t very popular), they’ll still have strengthened the whole. And it’s a great marriage: Commercial developers can find a lot of value in strengthening their theoretical foundations, and researchers are really interested in successful examples of applying research methodologies to industry problems. Win-win.

You may wonder why you should listen to me on any of this. All I can offer is that I’ve done it: I’ve gone from having none of the items on the list to having an open-source project, a blog, a Ph.D., and being a speaker at industry conferences—all in a little over four years. (I haven’t written the book yet, but hopefully I’ve set myself up for one.)

So this approach has definitely been a great experience for me. How can it work for you?

Less talking, more doing
There’s already good information available on how to write blogs, start open-source projects, complete a Ph.D., and tackle most of the other items on the list. What’s missing is information on how to combine them: how to write an open-source project framed as a Ph.D. thesis. This article will offer concrete guidance. But before we begin, it’s worth clearing up some misconceptions about what doing a Ph.D. actually entails.

First, the good news: Ph.D.’s don’t really have any exams or courses, at least in the undergraduate sense. A Ph.D. is more like one big piece of coursework. And if you have a master’s degree or even just a solid bachelor’s degree, you can probably enrol straight in.

The bad news: No exams and no courses means very little structure. If you have difficulty motivating yourself or working without supervision, a Ph.D. may not be right for you. But bear in mind, motivation and independence are also strong prerequisites for a successful open-source project, blog or book, so you’ll need to commit to this same skillset to accomplish pretty much any of the items on the laundry list.

Once you’re committed, a daunting question is: Where to start? It helps to understand your goal. The goal of any Ph.D. is to produce a thesis that, among other things, makes some “novel contribution to knowledge” and “demonstrates mastery of research.” Both of these are rather abstract phrases. You may be wondering: What does an open-source project in the form of a Ph.D. thesis actually look like?

Objective: Your thesis must have an objective. That may sound obvious, but it can be surprisingly hard after years of research to succinctly define what your objective was. Doing so is critical. A thesis is far more compelling if its objective is clearly defined at the start and, more importantly, regularly reinforced throughout.

You need to ensure your thesis weaves a coherent narrative. Its introduction should state its objective, and its conclusion should show how that objective was met. Your objective should be reflected in the title of your thesis. It should be referred to frequently throughout its pages as the justification for each section. Each section should begin by summarizing what has been accomplished so far, and end by stating what the next section will accomplish. Do everything you can to prevent the reader from becoming lost.

At the end of your Ph.D., your thesis must be examined. Examining a thesis is a brutal job. Your examiners are being asked to judge a piece of work written by somebody (i.e., you!) who has spent far more time studying a particular area than they have. They are unlikely to view your work favorably if they cannot understand it. And it helps their understanding enormously if your thesis is clear and consistent in its purpose.

Once you have clearly defined your objective, the next step is to orient it within the wider context of your chosen field. This is accomplished through a literature review.

Literature Review: An important goal of a Ph.D. is to make some “novel contribution to knowledge.” This implies an understanding of what the current knowledge is. Your thesis must spend considerable time reviewing the existing literature. It must research existing strengths and weaknesses, and identify shortcomings in the current body of work. It must define the “gap in knowledge” that it intends to fill.

As a software developer, reviewing peer-reviewed literature can seem like an alien activity. Many developers do not look sufficiently before they leap, and they end up repeating what’s already out there. Such a cavalier attitude will not be tolerated for your thesis. At the end of your Ph.D., in addition to your thesis being examined, you’ll be required to undergo a Thesis Defense. You must publicly present your research and be interrogated by your examiners and audience. Their job is to identify weaknesses in your work. It would be devastating if they pointed out your objective had already been achieved years earlier. You cannot make a claim to novelty if such knowledge already existed. To avoid this risk, your literature review must be exhaustive.

I found Google Scholar to be an excellent tool for searching peer-reviewed literature. It’s available for free, so you can start your literature review even before applying for your Ph.D. Some of the articles Google Scholar links to are also available for free. Downloading them will give you a taste for academic writing, which tends to be verbose: Every point must be expanded in detail, every argument supported by evidence and citations. Reading such articles will give you a good feel for the style of discourse.

Unfortunately, most articles Google Scholar links to require payment to download. For these, you should wait until you’ve enrolled in your course. Most university libraries pay annual subscriptions to literature databases and can absorb such payments for you.

You can also get a feeling for the existing body of work by searching open-source software on repositories such as SourceForge and GitHub. However, I was surprised how few research groups ran their projects as open source. It’s rare to find one that publishes its source code, even rarer to find one with a proper technical, social and political infrastructure. Going by Jono Bacon’s book “The Art of Community,” that means a group that packages its project up into a distribution; performs regular releases; produces tutorials and examples; has a message forum; has a defect tracker; accepts contributions; engages with its community; and so on. This seems a missed opportunity, because an active open-source community is fertile ground for gathering feedback and observations.

Once you’ve completed your literature review, and are satisfied your objective is worthwhile, proper research can begin. But what constitutes “proper research”? The first step is to choose your epistemology and methodology.

Epistemology: Epistemology isn’t a word you hear very often outside of academic circles. It’s Greek for “theory of knowledge.” It’s a philosophical idea related to what we know and on what basis. The short of it is, everything you say in your thesis rests on certain base assumptions. Not everybody will agree with those assumptions. That’s perfectly defensible as long as you spell out what they are. But the problem is, some may be buried so deep in your subconscious that you overlook them.

To quote a leader in this field, Michael Crotty: “At every point in our research—in our observing, our interpreting, our reporting—we inject a host of assumptions… Such assumptions shape for us the meaning of research questions, the purposiveness of research methodologies, and the interpretability of research findings. Without unpacking these assumptions and clarifying them, no one (including ourselves!) can really divine what our research has been.” In his book, “The Foundations of Social Research: Meaning and perspective in the research process,” he puts it even more forcibly: “Without [clarifying our assumptions], research is not research.”

Action Research: There are several accepted, academically rigorous methodologies that can frame an open-source project as a Ph.D. thesis. The methodology I chose is called Action Research. The idea is to break your overall research into smaller cycles of building something, reflecting on what you’ve built, and then building some more. If you’re thinking this sounds a lot like the industry practice of Iterative Development, you’d be right. In fact, the phases of a typical Action Research cycle fit neatly into the stages of Iterative Development, providing an elegant way to marry industrial practice with research methodology.

The main difference is that the “reflecting on what you’ve built” phase needs to be much longer and more rigorous for Action Research than you’d typically undertake for Iterative Development. But that turns out to be a great thing for industrial practice.

One of the most important factors in software development is scope: deciding what to include and what to leave out. Scope creep and feature bloat are recognized risks, impacting development costs and release schedules. Good developers carefully apply rules of thumb, as Google’s Joshua Bloch explained in his presentation “How to Design a Good API and Why It Matters”: Every design decision should “minimize conceptual weight.” We should strive to “kill several birds with one stone.” But an implicit difficulty in evaluating this is knowing what the birds are.

Once out of their initial planning phases, software projects have a tendency to lurch from immediate issue to immediate issue, dealing with each new requirement as it arises. Considering requirements in isolation invariably means the burden of large-scale redesign to satisfy any one requirement will seem onerous; a smaller-scale, less perfect but less disruptive alternative will always seem the better option. Over time, many such small, imperfect design decisions inevitably degrade the quality of the software.

Reflection, on the other hand, allows you to consider many weeks’ worth of problems in a holistic light: You can see all the birds at once, and an approach that once seemed over-engineered now appears justified. Surfacing all the issues at the same time clears a path forward that otherwise would have seemed prohibitive.

Another benefit of choosing Action Research is it intrinsically defines a structure for your thesis. You can do three or four Action Research cycles over the course of your Ph.D., and arrange them as chapters. In turn, each chapter can be broken down into the “planning,” “acting,” “observing” and “reflecting” phases of each cycle. This immediately helps focus your research.
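To make that structure concrete, here is a minimal sketch in Python that lays out one chapter per cycle and one section per phase. The chapter titles, the `Cycle` class, the `outline` function and the starting chapter number are all hypothetical placeholders of mine, not a format any university prescribes; only the four phase names come from the methodology described above.

```python
from dataclasses import dataclass

# The four phases of an Action Research cycle, in order.
PHASES = ("Planning", "Acting", "Observing", "Reflecting")


@dataclass
class Cycle:
    """One Action Research cycle, written up as one thesis chapter."""
    title: str  # hypothetical chapter title, chosen by the candidate


def outline(cycles, first_chapter=1):
    """Return outline lines: one chapter per cycle, one section per phase."""
    lines = []
    for offset, cycle in enumerate(cycles):
        chapter = first_chapter + offset
        lines.append(f"Chapter {chapter}: {cycle.title}")
        for section, phase in enumerate(PHASES, start=1):
            lines.append(f"  {chapter}.{section} {phase}")
    return lines


if __name__ == "__main__":
    # Placeholder titles; a real thesis would name the focus of each cycle.
    cycles = [Cycle("First cycle"), Cycle("Second cycle"), Cycle("Third cycle")]
    # Assumption: chapters 1-3 hold the introduction, literature review
    # and methodology, so the cycles start at chapter 4.
    print("\n".join(outline(cycles, first_chapter=4)))
```

Keeping the outline this mechanical is deliberate: each cycle gets the same four slots, so an unfinished “Reflecting” section is easy to spot while the cycle is still fresh.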

I did, however, find this structure to be a double-edged sword. A common problem a lot of Ph.D. candidates face is doing all their research—and little of their thesis writing—up front. Then after several years, they face the stressful task of trying to document all their work as a coherent narrative. Action Research can help here, because it permits you to document each cycle as a self-contained chapter as you go.

However, it can also hurt. I discovered this because, while I wrote up most of my research each year, I confess I left a few sections unfinished. Action Research makes it much more difficult to go back and complete those sections later, because so much of Action Research is concerned with reflections. Proper reflections are constrained by the context of their time. Attempting to cast one’s mind back not just to the particular thoughts, but also to the overall mindset of two or three years prior, is a formidable task. It requires you to remember what you knew, what you didn’t know, and what you thought you knew but have since learned differently. And you must disregard that more recent knowledge, lest you end up writing a revisionist history. It’s very difficult, and I would warn against trying to document Action Research retroactively.

Action Research was a hit with my examiners. One wrote, “I was very moved by the candidate’s choice of methodology… So many [candidates] in computing disciplines propose new architectural approaches and prototype their approach, but [ultimately] readers are [only] left with ideas, which might work if only someone in industry ever adopted them. [This candidate’s] approach… meant that his system is proven both to academic and to industry developer communities. As an examiner, on realising how widely used the system is and then reading the adoption study results, I breathed a sigh of relief. What a delight to see good ideas actually used!”

Having broken your thesis into a series of Action Research cycles, it remains to discuss what you should do within each of these cycles. Much of this comes down to sound software engineering: planning, developing, testing, documenting. An important part is the holistic reflection discussed earlier. Another important part is how to gather feedback and observations from industry.

Grounded Theory: If you’re going to be developing your thesis in close collaboration with industry developers, that collaboration needs to be framed in a research context. This can take the form of interviews, case studies, adoption studies, or other academically recognized approaches.

A useful methodology here is Grounded Theory. It’s well suited to interviewing because it helps you gather observations without biasing your interviewees. Grounded Theory proceeds by first gathering the data, then codifying the data into categories and themes, and finally constructing hypotheses. It stresses that outcomes should be explicitly emergent, rather than simply verification of existing hypotheses.

You compose lists of directed but unbiased, open-ended questions that give people room to talk openly, while at the same time guiding them into the gap defined in your literature review. You then document, compare and categorize what emerges. Ideally the resultant categories will include your own understanding of the problem, but should also expose perspectives you hadn’t considered.
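As a rough illustration of the “document, compare and categorize” step, the Python sketch below tallies the codes assigned to interview excerpts and groups the excerpts by category. Everything in it is an assumption made for illustration: the excerpts are invented placeholders rather than real interview data, and the code names, category names and helper functions are hypothetical. Real Grounded Theory coding is an interpretive, iterative exercise; this only shows the bookkeeping.

```python
from collections import Counter, defaultdict

# Each excerpt is tagged with one or more codes during analysis.
# These excerpts and codes are invented placeholders, not real data.
coded_excerpts = [
    ("placeholder excerpt 1", ["setup effort", "duplication"]),
    ("placeholder excerpt 2", ["duplication", "maintenance cost"]),
    ("placeholder excerpt 3", ["learning curve"]),
]

# Hypothetical grouping of codes into broader categories. In Grounded
# Theory this mapping is decided after coding, so it emerges from the data.
code_to_category = {
    "setup effort": "repetition",
    "duplication": "repetition",
    "maintenance cost": "cost",
    "learning curve": "adoption",
}


def tally_codes(excerpts):
    """Count how many excerpts mention each code (once per excerpt)."""
    counts = Counter()
    for _text, codes in excerpts:
        counts.update(set(codes))
    return counts


def group_by_category(excerpts, mapping):
    """Collect excerpt texts under every category their codes map to."""
    grouped = defaultdict(list)
    for text, codes in excerpts:
        categories = {mapping.get(code, "uncategorized") for code in codes}
        for category in sorted(categories):
            grouped[category].append(text)
    return dict(grouped)


if __name__ == "__main__":
    for code, count in tally_codes(coded_excerpts).most_common():
        print(f"{code}: {count}")
    for category, texts in group_by_category(coded_excerpts, code_to_category).items():
        print(f"{category}: {texts}")
```

Codes that land in “uncategorized” are worth a second look: they may be the perspectives you hadn’t considered.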

Like reviewing literature, interviewing is an alien activity for most developers. You may find your interview sessions start out stilted and take some time to warm up, both for the interviewer and the interviewee. This is understandable. Users of open-source software are conditioned to sporadic levels of support. Most community message forums, blogs and defect trackers are maintained on a voluntary basis. Responses are not guaranteed and can be minimal. It is therefore surprising (and highly regarded) for an open-source team to not only respond to a user’s immediate query, but also follow up to understand their broader use case. Users generally respond well to such diligence: “[this author] is a very driven developer who really puts effort in trying to match [his or her project] with the use cases of the users.” Those users who consent to interviews provide valuable input to the evolution of the project, something they have a vested interest in.

Similarly, many researchers have a deep interest in reading conversations with industry developers. Such conversations can be hard for researchers to come by for several reasons. First, interviewing and analyzing results is a time-consuming process. Second, it can be difficult for researchers to gain access to industry figures. Third, it can be expensive to fund such fieldwork.

All these factors mean that any work produced in this area is valuable and generally well received. As one of my early reviewers put it: “The authors include results from interviews conducted on senior software development practitioners, that actually justify their thoughts… The work is exceptionally well motivated.” My examiners agreed: “In addition to the [Action Research] methodology, the candidate skilfully uses a variety of techniques and tools (interviews, forums, blogs with special attention to industry practitioner’s feedback) for collecting information later used to design further research cycles. These [are] cleverly mixed, providing meaningful research results, difficult to dispute.”

This last point, that it makes your research difficult to dispute, is a decisive weapon in your arsenal. You’ll be on much stronger ground during your Thesis Defense if you have a battery of detailed quotes from industry developers vouching for the efficacy of your research. Another way to make your research difficult to dispute is to have portions of it peer-reviewed ahead of time. This can be accomplished through conference papers and journal articles.

Release early, release often
As mentioned earlier, one of the dangers Ph.D. candidates face is producing several years of research without ever trying to document it into a coherent narrative. They then face an uphill task at the end. One excellent way to mitigate this risk is to apply the software development practice of Release Early, Release Often to your thesis itself.

You can achieve this by periodically packaging up sections of your thesis and publishing them as “findings to date.” Such findings are typically published either as 10- to 15-page conference papers or 20- to 25-page journal articles. Given your overall thesis will run to approximately 100 pages, you should be able to extract four to six such publications over the course of your Ph.D.

There are different rankings of both conferences and journals, with different levels of difficulty in being published in them, and corresponding levels of prestige if you do. You should make sure you target realistic publications; you’re unlikely to be accepted by Science or Nature! Personally, I found journal articles easier than conference papers. Although they’re required to be longer and more detailed, you’re generally given multiple chances to revise and resubmit based on reviewer feedback.

I found this cycle helpful to improve the text and have the article accepted. Conversely, conference papers are generally a one-shot affair: You write the paper, submit it, and if it doesn’t get accepted, you have no recourse. You can submit to a different conference, but then you’re met with a different panel of reviewers who may have different judging criteria.

Many software developers begin their careers by graduating from university. On graduation day, they are surrounded by their peers. Some will go on to work in academia, most in industry. The way it’s supposed to work is that the academic community surveys the trail up ahead, cutting through the dense forest of possibilities and clearing a promising path. Industry then follows behind, laying down a proper road and bringing in the heavy equipment.

Unfortunately, in practice, academia and industry tend to fork shortly after graduation, pursuing tangential goals with limited communication between them. This is a huge shame. There is significant value in maintaining close relations between research and industry: Commercial developers can benefit from strengthening their theoretical foundations, and researchers can draw lessons from successful applications of research methodologies to industrial problems. Ultimately, this combination works to the benefit of all.

I offer my own thesis as a concrete example. The examiners summarized it as “a very satisfying, inspiring, carefully laid out, meticulously planned and executed thesis.” So something like this should stand you in good stead.

Richard Kennard is an independent consultant with more than 15 years of industry experience. He is an active member of the open-source community, the founder of Metawidget, and a speaker at conferences, including JavaOne and Red Hat Summit. He holds a Ph.D. in Software Engineering from the University of Technology, Sydney.

