5.3. Concepts: Reports, Charts, Streams, Y-Axes, Reduction Functions, Filter Functions

The previous section provided an introductory tour of some of the telemetry facilities available in Hackystat. Before we get into the details of the telemetry language facilities, let's first introduce the basic building blocks of Software Project Telemetry (reports, charts, streams, y-axes, reduction functions, and filter functions) and see how they relate to each other. Figure 5.9, “The relationship between telemetry reports, charts, and streams”, illustrates three of these building blocks.

Figure 5.9. The relationship between telemetry reports, charts, and streams

5.3.1. Telemetry Reports

A Telemetry Report is a named set of Telemetry Charts that can be generated for a specified Project over a specified time interval. The goal of a Telemetry Report is to discover how the trajectories of different process and product metrics might influence each other over time, and whether these influences change depending upon context.

For example, Figure 5.9, “The relationship between telemetry reports, charts, and streams”, shows a Telemetry Report consisting of two Charts. Both Charts show "Unit Test Dynamics Telemetry", which is an analysis of trends in the percentage of Active Time allocated to testing, the percentage of source code devoted to testing, and the percentage of test coverage that results from this effort and code. The Charts share the same time interval (from the week of 20-February-2005 to the week of 05-June-2005) and project (hacky2004-all). The Charts differ in that one shows Unit Test Dynamics data for the hackyZorro module within the hacky2004-all project, while the other Chart shows Unit Test Dynamics data for the hackyCGQM module within the same project. Interestingly, the Unit Test Dynamics telemetry trends for the two modules have very different shapes, indicating differences in the underlying approach to development of these two modules.

5.3.2. Telemetry Charts

A Telemetry Chart is a named set of Telemetry Streams that can be generated for a specified Project over a specified time interval. The goal of a Telemetry Chart is to display the trajectory over time of one or more process or product metrics.

For example, Figure 5.9, “The relationship between telemetry reports, charts, and streams”, shows two instances of the same Telemetry Chart. This chart, entitled "Unit Test Dynamics Telemetry", contains three telemetry streams: ActiveTime-Percentage, JavaSLOC-Percentage, and JavaCoverage-Percentage. You can see references to these three streams in the legend accompanying each Chart. The legends associated with the two Charts illustrate that Streams can be parameterized: the top Chart contains streams that are parameterized for the hackyZorro module, while the bottom Chart contains streams that are parameterized for the hackyCGQM module.
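
One way to picture the containment described so far is as three nested record types. The following Python sketch is purely illustrative: it is not Hackystat code and it is not Telemetry Definition Language syntax. The class names, their fields, the "unit_test_dynamics" helper, and the report name are assumptions made for this example; only the chart title, stream names, module names, and project name come from Figure 5.9.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stream:
    """A named stream, parameterized here by the module it covers."""
    name: str
    module: str


@dataclass
class Chart:
    """A named set of streams."""
    title: str
    streams: List[Stream] = field(default_factory=list)


@dataclass
class Report:
    """A named set of charts generated for one project."""
    name: str
    project: str
    charts: List[Chart] = field(default_factory=list)


STREAM_NAMES = [
    "ActiveTime-Percentage",
    "JavaSLOC-Percentage",
    "JavaCoverage-Percentage",
]


def unit_test_dynamics(module: str) -> Chart:
    """Instantiate the same chart definition for one module (hypothetical helper)."""
    return Chart(
        "Unit Test Dynamics Telemetry",
        [Stream(name, module) for name in STREAM_NAMES],
    )


report = Report(
    name="Unit Test Dynamics Report",  # hypothetical report name
    project="hacky2004-all",
    charts=[unit_test_dynamics("hackyZorro"), unit_test_dynamics("hackyCGQM")],
)

for chart in report.charts:
    print(chart.title, [f"{s.name}({s.module})" for s in chart.streams])
```

The point of the sketch is that a single chart definition, instantiated with two different parameter values, yields the two Charts in the Report.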

5.3.3. Telemetry Streams

Telemetry Streams are sequences of a single type of process or product data for a single Project over a specified time interval. Telemetry Streams are best thought of as a kind of Abstract Data Type for representation of one or more series of metric data values of a similar type. In addition, Telemetry Streams support basic arithmetic operations.

Figure 5.9, “The relationship between telemetry reports, charts, and streams”, shows three kinds of telemetry streams, each with its own color. The red line in each chart is an ActiveTime-Percentage stream, the blue line in each chart is a JavaSLOC-Percentage stream, and the green line in each chart is a JavaCoverage-Percentage stream.

Figure 5.10, “Definition of ActiveTime-Percentage Telemetry Stream”, shows the definition of the ActiveTime-Percentage telemetry stream.

Figure 5.10. Definition of ActiveTime-Percentage Telemetry Stream

This definition shows that Telemetry Streams can be parameterized, and that the definition of a Telemetry Stream can consist of an arithmetic expression whose operands are invocations of Reduction Functions. Note also that when defining a Telemetry Stream, you specify the "visibility" of the definition, which can be either "private" (available to you only) or "global" (available to all users of this server).
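
To make the idea of stream arithmetic concrete, here is a minimal Python sketch of a stream type that supports point-by-point arithmetic. It does not reproduce the definition in Figure 5.10. The "Stream" class, the sample values, the handling of division by zero, and the way the percentage is formed (Active Time on one module divided by total Active Time, times 100) are all assumptions made for illustration.

```python
from typing import Callable, List, Union

Number = Union[int, float]


class Stream:
    """A sequence of metric values, one per interval (day, week, or month)."""

    def __init__(self, values: List[Number]):
        self.values = list(values)

    def _combine(self, other, op: Callable[[Number, Number], Number]) -> "Stream":
        # A stream operand is combined point by point;
        # a plain number is applied to every point.
        if isinstance(other, Stream):
            if len(other.values) != len(self.values):
                raise ValueError("streams must cover the same interval")
            return Stream([op(a, b) for a, b in zip(self.values, other.values)])
        return Stream([op(a, other) for a in self.values])

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._combine(other, lambda a, b: a - b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    def __truediv__(self, other):
        # Assumption: a zero denominator yields 0.0 instead of an error,
        # so an interval with no data does not break the whole stream.
        return self._combine(other, lambda a, b: a / b if b else 0.0)


# Hypothetical hours of Active Time over four weeks.
module_hours = Stream([2.0, 3.0, 0.0, 5.0])
total_hours = Stream([8.0, 12.0, 0.0, 10.0])

percentage = module_hours / total_hours * 100
print(percentage.values)  # [25.0, 25.0, 0.0, 50.0]
```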

5.3.4. Telemetry Y-Axis definitions

Telemetry information is typically represented as a chart displaying two-dimensional trend lines with a single X axis and one or more Y axes. In Telemetry, the X axis values are time values determined by what the user chooses with the "Interval" selector when invoking the analysis. Thus, the X axis always consists of a sequence of consecutive days, weeks, or months.

The Y axis (or axes) values in a telemetry chart depend upon the stream expressions that are provided in the chart definition. We have found that there are three parameters of the Y axis that telemetry chart designers prefer to control: (1) the label that appears with the axis, which indicates the units associated with the values on this Y axis; (2) whether the values that are generated along the Y axis should be integers or doubles (i.e., may contain a fractional part); and (3) whether the chart should be generated with a fixed upper and/or lower bound (as opposed to letting the chart layout mechanism decide automatically what the lower and upper bound values are, based upon the actual data being displayed).

For example, consider a chart that displays a stream expression that divides one stream returning successful tests by another stream returning the total number of tests and multiplies the result by 100. The designer of that chart intends the stream expression to represent "Test Success Percentage", and (perhaps to facilitate visual comparison with other charts) would like the Y axis to always start at zero and end with 100.

To enable this control over the Y axis, the Telemetry Definition Language includes a construct called y-axis. Figure 5.11, “Definition of a Y-Axis”, shows the definition of a Y axis conforming to the example scenario above.

Figure 5.11. Definition of a Y-Axis

As you will see later, every stream expression in a Chart definition must be associated with a defined Y Axis such as the "percentageYAxis" definition given above.
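
The three Y axis parameters discussed in this section can be sketched as a small record type. The Python below is an illustrative model only and does not show the syntax of the y-axis construct. The "YAxis" class, its field names, the "range_for" method, the "%" label, and the test counts are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class YAxis:
    label: str                           # (1) units shown next to the axis
    integer_values: bool = False         # (2) integers, or doubles when False
    lower_bound: Optional[float] = None  # (3) None means "let the layout decide"
    upper_bound: Optional[float] = None

    def range_for(self, data: List[float]) -> Tuple[float, float]:
        """Return the axis range: fixed bounds where given, else fitted to the data."""
        low = self.lower_bound if self.lower_bound is not None else min(data)
        high = self.upper_bound if self.upper_bound is not None else max(data)
        return low, high


# The "Test Success Percentage" scenario: always run from 0 to 100.
percentage_y_axis = YAxis("%", lower_bound=0.0, upper_bound=100.0)
auto_y_axis = YAxis("%")  # same label, bounds chosen from the data

# Hypothetical weekly counts of successful and total tests.
passed = [18, 20, 27]
total = [20, 25, 30]
success = [100 * p / t for p, t in zip(passed, total)]

print(success)                               # [90.0, 80.0, 90.0]
print(percentage_y_axis.range_for(success))  # (0.0, 100.0)
print(auto_y_axis.range_for(success))        # (80.0, 90.0)
```

With fixed bounds, two charts of the same percentage are drawn on the same scale and can be compared visually. With automatic bounds, each chart is scaled to its own data.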

5.3.5. Telemetry Reduction Functions

Telemetry Reduction Functions are Java classes that form the "atomic" building blocks for the definition of Telemetry Streams. While Hackystat users can interactively define, edit, and delete their own Streams, Charts, Y-Axes, and Reports using a web-based interface to a Hackystat server, Telemetry Reduction Functions are "hard-wired" into a server at the time it is built. You can think of Reduction Functions as the fixed "alphabet" from which any number of "words" (Telemetry Streams) can be created by users. These "words" can be composed into any number of "sentences" (Telemetry Charts), and the sentences can be composed into any number of "paragraphs" (Telemetry Reports).

Figure 5.10, “Definition of ActiveTime-Percentage Telemetry Stream”, illustrates that the ActiveTime-Percentage Telemetry Stream, which can be interactively defined by a user, is defined in terms of the ActiveTime Reduction Function.

Note

Just as "atoms" are not literally atomic, and can be further decomposed into protons, neutrons, and electrons, Telemetry Reduction Functions are not literally atomic. In reality, a Telemetry Reduction Function tends to be composed from operations on Hackystat internal data structures, such as DailyProjectData instances or even raw sensor data. However, this internal decomposition is not observable at the user level.
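
As a rough illustration of what "reduction" means here, the following Python sketch reduces a list of raw records to one value per day. It is not the ActiveTime Reduction Function itself, which is a Java class built into the server. The record layout, the "active_time" function, the "*" wildcard for "all modules", and the sample data are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical raw record: (day, module, minutes of Active Time).
Record = Tuple[str, str, int]


def active_time(records: List[Record], days: List[str], module: str) -> List[float]:
    """Reduce raw records to hours of Active Time per day for one module.

    Passing "*" as the module selects every module (a convention assumed here).
    Days with no matching records yield 0.0, so the result always has
    exactly one value per requested day.
    """
    minutes: Dict[str, int] = {day: 0 for day in days}
    for day, record_module, mins in records:
        if day in minutes and module in ("*", record_module):
            minutes[day] += mins
    return [minutes[day] / 60 for day in days]


records = [
    ("2005-02-21", "hackyZorro", 90),
    ("2005-02-21", "hackyCGQM", 30),
    ("2005-02-22", "hackyZorro", 60),
]
days = ["2005-02-21", "2005-02-22", "2005-02-23"]

print(active_time(records, days, "hackyZorro"))  # [1.5, 1.0, 0.0]
print(active_time(records, days, "*"))           # [2.0, 1.0, 0.0]
```

The caller sees only the resulting sequence of values; how the records were gathered and summed stays hidden, which mirrors the point made in the note above.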

5.3.6. Telemetry Filter Functions

As will be discussed further below, a potential problem can arise with the use of Telemetry expressions on large systems: the creation of Telemetry Charts with dozens or hundreds of different trend lines. When a Chart gets complex in this way, its usability decreases, and it becomes difficult or impossible to extract meaningful information about the process or products of development.

Telemetry Filter Functions are special Java classes that can be used to extend the "alphabet" of the Telemetry Definition Language, just like reduction functions. Unlike reduction functions, which provide a bridge between the raw sensor data and the user-level telemetry language, filter functions provide a way for users to control which of the potentially dozens or hundreds of individual streams returned by a reduction function actually appear on a Chart.

As a simple example, the user might observe that a telemetry chart contains many dozens of trend lines, but only a handful contain non-zero data values; the remainder are just "noise". This could happen, for example, in a Project containing many dozens of members where only a few of them are working on the Project at any one time. In this case, the user could add the "FilterZero" filter function to the stream definition in order to produce a Chart that contains only the trend lines for members with non-zero Active Time.
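
The effect of such a filter can be sketched in a few lines of Python. This is a guess at the behavior described above, not the implementation of the "FilterZero" filter function. The "filter_zero" function, the dictionary representation of streams, and the member names are assumptions made for this example.

```python
from typing import Dict, List


def filter_zero(streams: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only the streams that contain at least one non-zero value."""
    return {
        name: values
        for name, values in streams.items()
        if any(value != 0 for value in values)
    }


# Hypothetical Active Time (hours per week) for four project members.
active_time_by_member = {
    "member-a": [4.0, 0.0, 6.5],
    "member-b": [0.0, 0.0, 0.0],
    "member-c": [0.0, 2.0, 0.0],
    "member-d": [0.0, 0.0, 0.0],
}

visible = filter_zero(active_time_by_member)
print(sorted(visible))  # ['member-a', 'member-c']
```

Note that the filter selects whole streams: a stream with even one non-zero value is kept in full, including its zero points.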

5.3.7. Why is this language so complicated?

By now, some of you are undoubtedly wondering why the Telemetry Definition Language is so complicated. Surely there is a simpler way to show metrics without all of these strange language constructs? While there may well be a simpler way (and we have not stopped searching for it), the current form of the language is the simplest one that we have discovered so far that satisfies the underlying requirements for the language. In a nutshell, these requirements are: (1) Do not hard-code specific metrics or analyses into the telemetry language; (2) Avoid combinatorial explosions; and (3) Automate display and presentation.

Requirement (1) results from the fact that Hackystat is a "framework" that is configured for a given software engineering measurement context with specific sensors and sensor data types. For example, the Hackystat-UH configuration provides metrics and analyses geared toward Java-based software development in a student context. An example of a useful telemetry report in this configuration is one that helps students "load balance", or allocate approximately equal amounts of Active Time to their projects. On the other hand, the Hackystat-HPC configuration provides metrics and analyses geared toward high performance computing software development using C++. An example of a useful telemetry report in this context is one that helps understand how the proportion of "serial" vs. "parallel" code changes over the course of development. Because different software engineering contexts result in radically different kinds of measures and analysis goals, the telemetry language must be independent of these constructs. To provide this independence, the language provides the Reduction Function API to define the "alphabet" for the Telemetry Language, since this enables a given configuration of Hackystat to provide whatever alphabet is appropriate to the kinds of analyses useful to that context.

Requirement (2) results from our direct experiences with Telemetry. One thing we discovered was that telemetry-based inquiry often led to the wish to compare the same telemetry streams across different developers, or different modules, or even some combination of developers and modules. In the case of the Hackystat Project, with approximately 10 active developers, 70 modules, and dozens of different forms of sensor data, the set of potentially interesting telemetry charts and reports is very large. To combat this problem, we factored the language into four parameterizable levels (streams, charts, y-axes, and reports), which minimizes the total number of definitions required while maintaining flexibility in what can be displayed.
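
A back-of-the-envelope calculation shows the scale of the problem. The developer and module counts below come from the paragraph above; the figure of 24 kinds of sensor data is an assumed stand-in for "dozens", and counting one stream definition per combination is a simplification made for this sketch.

```python
developers = 10
modules = 70
metrics = 24  # assumption: a stand-in for "dozens of different forms of sensor data"

# Without parameters, every (developer, module, metric) combination
# would need its own hard-coded stream definition.
hard_coded_definitions = developers * modules * metrics

# With parameters, one definition per metric suffices; the developer and
# module are supplied as parameter values when the stream is used.
parameterized_definitions = metrics

print(hard_coded_definitions)     # 16800
print(parameterized_definitions)  # 24
```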

Another consequence of Requirement (2) on the Telemetry Language is the ability to create arithmetic expressions during the definition of telemetry streams. Many interesting streams involve additions of streams (such as the sum of active time on multiple modules) or divisions (such as calculating the percentage of code constituting test cases). By providing arithmetic expressions in telemetry streams, we eliminate the need to implement all of these variations in reduction functions.

Finally, we discovered that charts could easily become overly complex with dozens or hundreds of trend lines for Projects with many modules or developers. This led us to address this form of combinatorial explosion with Filter Functions.

Requirement (3) results from our belief that developers don't want to fiddle with the fonts, colors, and layouts associated with telemetry. Instead, we designed the language so that reasonable chart and stream layouts appear without requiring any explicit specification from the user. We learned through experience that developers do want a certain amount of control over the representation of Y-axis values, and introduced the y-axis language construct as a response.

While the above provides us with a soul-soothing rationalization for the complexity of the Telemetry Definition Language, it does not dispute the fact that the Telemetry Language is, indeed, complicated. We have already discovered one heuristic for Telemetry design that has proven helpful in simplifying the user-level experience of the language: make Report definitions parameterless. We have found that the hardest part of Telemetry for new users is the "Parameter Values" field. In our configurations, therefore, we typically define Reports to invoke Charts with all parameters specified. This means that new users can use the Telemetry Report command to "play around" without needing to understand the details of the language: they simply specify a project, a time interval, and a report, and that's enough to generate a valid request to the server.

The next sections document how to define Telemetry Streams, Charts, Reports, and Y-Axes. The Reference Guide contains chapters documenting the Telemetry Reduction Functions and Telemetry Filter Functions available in this configuration.