Monday, October 1, 2012

Irreproducible Results

In my BBS days (probably in 1992) I stumbled upon an interesting text file. It claimed to contain the instructions to build a nuclear bomb! It was formatted in a sensible manner and written in good English. The content was highly entertaining, though I wasn't sure if smoke detectors' americium is really fissile material.

As it turned out, the text was from The Journal of Irreproducible Results - I think it was titled "Let's Build a Nuclear Bomb". At the end of the file, there were teasers for other articles, such as "Let's Build a Solar System" or "Let's Make Contact With an Extra-Terrestrial Civilization" (if I recall correctly). Entertaining, certainly, but not really reproducible.

Is research supposed to be reproducible? If you read a research paper and take a look at its fancy charts and other visualizations, can you actually reproduce the results? If the paper claims a discovery that enables you to build an atomic bomb out of smoke detectors, can you prove it right or wrong? (Sure, a paper like that would probably not pass peer review, but hypothetically...)

Today - 20 years after I found the text about building a nuclear bomb - research often involves huge amounts of data. Often, that data is proprietary. The results may be based entirely on secret data that is not available to anyone else, and cannot be (easily) regenerated by anyone other than its owner. Still, such research can be held in high regard - in fact, that is how many products are certified for use.

As astonished as I am that reproducing results is so difficult, I am equally astonished that some within the scientific community have actually acknowledged the problem and developed tools and approaches to alleviate it. Lately, I have been looking at knitr, an R package for dynamic report generation. Using knitr, there can be a direct link between the document and the original data set(!). Another interesting approach is lightweight markup languages such as Markdown, which I evaluated in my bachelor's thesis some years ago. In a way, this is "open source research", where you open up the methods and data to anyone who is interested in them.
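To make the knitr idea concrete, here is a minimal sketch of what an R Markdown source file might look like (the file name `measurements.csv` and its columns are hypothetical, purely for illustration): the code chunk reads the raw data and draws the chart, so the published figure is regenerated from the data every time the document is knitted and can never silently drift out of sync with it.

````markdown
# Results

The figure and summary below are produced directly from the
raw data set at knit time, not pasted in by hand.

```{r results-plot, echo=TRUE}
# Hypothetical data set with columns `time` and `value`
data <- read.csv("measurements.csv")

# The chart is rebuilt from the data on every knit
plot(data$time, data$value, type = "l",
     xlab = "Time", ylab = "Measured value")

# Summary statistics appear in the output document as well
summary(data$value)
```
````

Anyone with the `.Rmd` file and the data set can rerun `knitr::knit()` and obtain the same report - which is exactly the direct document-to-data link described above.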

As a side note: some years ago, the Finnish-language Master's thesis Kotileikkejä ja lastenkasvatuspuuhia was written by Juhapekka Tolvanen. An important aspect - though in no way related to the topic of the thesis, which is in the field of sociology - was that it was released under the GNU Free Documentation License, and the LaTeX source code was (and is) downloadable from the author's homepage, making the thesis printable and publishable in different formats. However, plain LaTeX source code is not very readable (though Mr. Tolvanen might disagree) - and most of the formatting of the actual text content could have been done with Markdown.

For smaller projects it is perhaps easier to incorporate transparency, distribute the data sets used in the research, and demonstrate exactly how the results were produced - but can this approach be more widely adopted by peer-reviewed journals? I don't know the answer to that; I only know the direction I would like them to take, and the direction I will take myself.