11.3. An example experiment: TFDvTLD

To help understand the Hackystat facilities for experimentation, let's consider the following hypothetical experiment. Suppose that we want to investigate whether differences exist between the test first design (TFD) method and the test last design (TLD) method. (To oversimplify, think of "test first design" as an approach where you write each unit test before writing the code that it tests, while "test last design" is the more traditional approach of writing the unit tests after the corresponding production code has been implemented.) Let's call this experiment TFDvTLD.

Our hypothesis has two parts: people who do TFD will take longer to finish a sample program than people who do TLD, but their unit tests will exhibit higher coverage of the system than the unit tests written by people using TLD.

Our TFDvTLD experiment will share a design property of many (but not all) experimental studies: the approach will be to split the set of subjects into groups, where members of different groups do different things. (By the way, in experimental jargon, the "different thing" that a group of subjects does is called a "treatment", and the factor that varies from one treatment to another is called an "independent variable".) Thus, in our TFDvTLD experiment, there will be two treatments: one treatment will consist of using the TFD method, while the other treatment will consist of using the TLD method. (The independent variable is the "development method", and it has two values, "TLD" and "TFD".)

To test the hypothesis of our TFDvTLD experiment, we will want to collect two kinds of data: the amount of time each subject spent developing their system, and the coverage resulting from the unit tests. (In experimental jargon, these are called the "dependent variables".)

Let's assume that we are just starting this research, so we decide to begin with a pilot study in which we try out the experiment with just two subjects whose emails are "smith@hawaii.edu" and "jones@hawaii.edu". They will develop a simple system called "bowling" that computes the scores associated with a bowling game. The goal of this pilot study is not to confirm or disconfirm the hypothesis, but rather to validate the experimental design. For example, we might want to make sure that when we ask "smith@hawaii.edu" to do TFD, we can verify that this subject is really doing TFD. We might also want to make sure that the way we measure development time for the subjects actually corresponds to the time it took them to do the work. Once one or more pilot studies have given us confidence in our experimental design, we can go ahead and unleash it on a larger number of subjects to actually test the hypothesis.
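
To make the pieces of the design concrete, here is a minimal sketch in Python that records the pilot study as plain data: one independent variable with two treatments, two dependent variables, and two subjects. This is not Hackystat's API. The class names, field names, and the assign() and groups() helpers are hypothetical and exist only for illustration. The text above implies that "smith@hawaii.edu" does TFD; assigning "jones@hawaii.edu" to TLD is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Subject:
    """One participant and the treatment they were asked to follow."""
    email: str
    treatment: str  # the value of the independent variable for this subject


@dataclass
class Experiment:
    """Hypothetical record of an experimental design (not a Hackystat class)."""
    name: str
    independent_variable: str
    treatments: tuple
    dependent_variables: tuple
    subjects: list = field(default_factory=list)

    def assign(self, email, treatment):
        """Add a subject, rejecting treatments the design does not define."""
        if treatment not in self.treatments:
            raise ValueError(f"unknown treatment: {treatment!r}")
        self.subjects.append(Subject(email, treatment))

    def groups(self):
        """Return a mapping from each treatment to its subjects' emails."""
        result = {t: [] for t in self.treatments}
        for subject in self.subjects:
            result[subject.treatment].append(subject.email)
        return result


pilot = Experiment(
    name="TFDvTLD",
    independent_variable="development method",
    treatments=("TFD", "TLD"),
    dependent_variables=("development time", "unit test coverage"),
)
pilot.assign("smith@hawaii.edu", "TFD")
pilot.assign("jones@hawaii.edu", "TLD")  # assumption: not stated in the text
```

Calling pilot.groups() returns one list of subject emails per treatment, which is the split into groups that the design calls for. The dependent variables appear only as names here, because measuring them is the part that Hackystat is meant to support.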

A final caveat: the experimental method described in this example arises from the "positivist" scientific tradition in which the research goal is to discover causal relationships (such as "if you use test first design, then you will obtain higher test case coverage"). However, this is just one of many valid approaches to software engineering research. There are many alternative methods that do not involve controlling the behavior of subjects by asking them to carry out a prescribed task in a prescribed way with the goal of discovering causal relationships. For example, an alternative method called "grounded theory" involves close monitoring of individuals and groups as they carry out their daily tasks, with the goal of constructing theories that explain aspects of their behavior. We believe that Hackystat has a role to play in these contexts as well, and we hope to provide more specialized support in Hackystat for them in the future.

Having introduced our example experiment, let's now see how we can use Hackystat to support its configuration, management, and analysis.