The glass box design philosophy
There is an interesting paradox in the context of developing data analysis software. On one side, there are clear benefits to designing tools that are easy to use, robust, and require as little manual intervention or user expertise as possible. Such a design philosophy allows more users to take advantage of the tools and apply them automatically to large heterogeneous datasets. On the other side, blindly applying tools that are not fully understood, or that do not provide useful information on whether the input data meets their assumptions, can raise serious concerns. Developers not only take great pride in the quality of their software but also feel responsible for how the software is being used. Inexperienced users can misuse a “black box” tool and obtain misleading results. Whether we like it or not, such situations can lead to a bad reputation being misattributed to the tool itself.
Ease of use seems to be at odds with avoiding misuse. Extending your user base to less experienced users can lead to mistakes. Is there a way of designing “black box” tools that minimizes misuse? I believe there is, and I call it the glass box philosophy.
The glass box principles
Write educational documentation
The documentation for a data analysis tool should aspire not only to describe how to run the tool but also to explain the theory and assumptions behind the analysis. In this, it should resemble an academic handbook more than an instruction manual. This does not mean that developers must write original content if suitable material already exists. The purpose of educational documentation is that a user who wants to understand how a tool works can learn about it from the documentation in a programming-language-agnostic way.
Verify or visualize assumptions
Most data analysis tools consist of several steps, each with its own distinct assumptions. If a step silently fails to produce the expected results, it can have catastrophic consequences down the road. Unfortunately, in many cases it is very hard or even impossible to programmatically verify whether the results of a particular step meet the quality requirements of the subsequent step. Thus, it is important that data analysis software provides an option to verify or visualize those assumptions. Such reporting capabilities might not necessarily be used by the user directly, but they provide an extra layer of transparency. For example, more experienced users or reviewers of a paper describing the results could audit the reports generated by the data analysis.
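As a rough illustration of this principle, here is a minimal Python sketch in which a pipeline step records the outcome of an assumption check in an audit log instead of failing silently. The names (CheckResult, AuditLog, check_missing_fraction) and the 10% threshold are hypothetical and chosen only for this example; a real tool would define checks that match its own assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    step: str        # name of the pipeline step that ran the check
    assumption: str  # human-readable statement of what was assumed
    passed: bool
    detail: str      # numbers a reviewer can inspect later


@dataclass
class AuditLog:
    results: list = field(default_factory=list)

    def record(self, result):
        self.results.append(result)

    def failures(self):
        return [r for r in self.results if not r.passed]


def check_missing_fraction(values, log, max_fraction=0.1):
    """Hypothetical check: the next step assumes few missing values."""
    missing = sum(1 for v in values if v is None)
    fraction = missing / len(values)
    result = CheckResult(
        step="missing_value_screen",
        assumption=f"at most {max_fraction:.0%} of values are missing",
        passed=fraction <= max_fraction,
        detail=f"{missing} of {len(values)} values missing ({fraction:.0%})",
    )
    log.record(result)
    return result


log = AuditLog()
check_missing_fraction([1.0, 2.0, 3.0, 4.0], log)
check_missing_fraction([1.0, 2.0, None, 4.0], log)

# Print every check, not only the failures, so the log can be audited.
for r in log.results:
    status = "ok" if r.passed else "CHECK"
    print(f"[{status}] {r.step}: {r.assumption} ({r.detail})")
```

Recording passed checks alongside failed ones is a deliberate choice here: a reviewer can see which assumptions were tested at all, not just which ones were violated.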
Guide dissemination of the results
The purpose of running a data analysis tool is to learn something about the data from the results. In the clear majority of cases, users will share the findings with people who were not involved in the analysis and might not know the details of the tool. Whether it is an internal report or an academic paper, the user who ran the tool and obtained the results bears the responsibility of explaining to others what the analysis entailed. Here the tool developers can also help by providing boilerplate language summarizing the inner workings of the tool (referencing relevant external materials when possible). Peers of the user who performed the analysis will appreciate such a feature because it allows them to better understand what exactly happened to the data. In the case of robust tools that use heuristics to adapt to the input data, the boilerplate language should also adapt automatically to accurately describe the analysis path that was taken.
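One way to make the boilerplate follow the analysis path is to build it from the list of steps that actually ran. The sketch below is only an illustration under that assumption; the tool name, step names, and sentences are all hypothetical.

```python
# Hypothetical mapping from step name to a sentence describing it.
STEP_SENTENCES = {
    "outlier_removal": "Outlying samples were removed.",
    "smoothing": "Signals were smoothed with a moving average.",
    "normalization": "Values were normalized to zero mean and unit variance.",
}


def build_boilerplate(tool_name, version, steps_run):
    """Compose a methods paragraph from the steps that were executed."""
    unknown = [s for s in steps_run if s not in STEP_SENTENCES]
    if unknown:
        # Refuse to produce text that would silently omit a step.
        raise ValueError(f"No boilerplate sentence for step(s): {unknown}")
    body = " ".join(STEP_SENTENCES[s] for s in steps_run)
    return f"Data were processed with {tool_name} {version}. {body}"


# Two runs that took different paths get different descriptions.
full_run = build_boilerplate(
    "exampletool", "1.0", ["outlier_removal", "smoothing", "normalization"]
)
short_run = build_boilerplate(
    "exampletool", "1.0", ["outlier_removal", "normalization"]
)
print(full_run)
print(short_run)
```

Raising an error for an undocumented step, rather than skipping it, keeps the generated text from drifting out of sync with what the tool did.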
FMRIPREP — an example of a glass box application
To better understand what a glass box application should look like, let’s have a look at an example. FMRIPREP is an MR data preprocessing tool that takes whatever comes out of a magnetic resonance scanner and prepares it for higher-level analysis. It was designed to adapt to a range of different scan types and to use heuristics to provide quality results on data produced by different scanners. Its robustness and ease of use make it appealing, but also susceptible to misuse. So how does FMRIPREP implement the glass box principles?
Educational documentation
Documentation provided by the developers of FMRIPREP goes beyond instructions on how to use it. It includes a detailed explanation of the data processing workflow, together with figures and references to relevant literature. This rich documentation allows interested users to understand what happens to their data. The documentation does not rely on any knowledge of Python (which is the language FMRIPREP is written in).
Reports
Preprocessing performed by FMRIPREP consists of many interdependent steps. Some of them cannot be validated automatically, which leads to a need for visual reports. For every processed piece of data, FMRIPREP produces an HTML report that includes figures and animations designed to highlight different data processing steps. Those reports enable users to quickly verify the validity of individual steps without the need to write any custom code or open intermediate results using specialized software.
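To give a feel for how little machinery a basic visual report needs, here is a minimal Python sketch that assembles a per-dataset HTML page from figures saved by each step. This is not FMRIPREP's actual reporting code; the function name, section titles, and file paths are hypothetical.

```python
from html import escape


def render_report(subject, sections):
    """Build an HTML report.

    sections is a list of (title, description, image_path) tuples,
    one per processing step that saved a figure.
    """
    parts = [
        f"<html><head><title>Report: {escape(subject)}</title></head><body>",
        f"<h1>Report: {escape(subject)}</h1>",
    ]
    for title, description, image_path in sections:
        # Escape all text so step names cannot break the markup.
        parts.append(f"<h2>{escape(title)}</h2>")
        parts.append(f"<p>{escape(description)}</p>")
        parts.append(f'<img src="{escape(image_path)}" alt="{escape(title)}">')
    parts.append("</body></html>")
    return "\n".join(parts)


report_html = render_report(
    "sub-01",
    [
        ("Step 1 & 2", "Check that the outlines match.", "figures/align.png"),
        ("Masking", "The contour should follow the edge.", "figures/mask.png"),
    ],
)
print(report_html)
```

Writing the returned string to a file next to the figures yields a page that opens in any browser, so no specialized software is needed to inspect the steps.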
Citation boilerplate
FMRIPREP is targeted at research use, and thus its use will most likely lead to scientific publications. One cannot assume that readers of those publications will be familiar with FMRIPREP, so there is a need to provide an abbreviated description of the processing performed by FMRIPREP. The documentation website provides such boilerplate text, ready to be reused in publications that used FMRIPREP. Because FMRIPREP runs slightly different processing depending on the inputs, the boilerplate text can be easily adapted via JavaScript controls listing different input options.
Cost
I hope I have made a convincing argument for building robust, easy-to-use software that also excels in transparency. It is worth noting that turning a black box analysis tool into a glass box analysis tool requires extra effort. After all, guides for interpreting results and educational documentation will not write themselves, and code for reporting tools will not appear out of nowhere. Nonetheless, I do feel that the glass box philosophy is worth pursuing and may reduce the amount of user support necessary for the analysis tool. Furthermore, some of the additions necessary to turn your app into a glass box (for example documentation) could be contributed by users themselves. This is a great opportunity to grow your open source contributor network.
PS: I by no means invented the term “glass box”; it has been used previously (for example in the context of software testing). However, because the term fits these design principles so well, I decided to hijack it.
Originally published at blog.chrisgorgolewski.org.