5.8. Managing Chart and Report complexity with Filter Functions

Now that you know how to define Streams, Y-Axes, Charts, and Reports, and how they relate to each other, we can motivate the final language construct in Software Project Telemetry: Filter Functions.

When Software Project Telemetry is applied in larger-scale software development settings, simple definitions can often produce Charts with dozens to hundreds of lines, creating a real usability problem. For example, the Hackystat software system has over 70 modules (i.e. top-level workspaces), which means that the Streams created by the "Workspace" Reduction Functions, when supplied with a wildcard parameter, will produce over 70 lines. Such Charts are almost always difficult to display and interpret. Example 5.1, “Top Level Workspace Coverage without Filtering” illustrates how this problem can occur with a simple set of Telemetry definitions, along with the Chart that results when they are invoked on the Hackystat project.

Example 5.1. Top Level Workspace Coverage without Filtering

y-axis yAxis(label) = {
  label, "double", 0, 100
};

streams TopLevelWorkspaceCoverageStreams() = {
  "Coverage by top level workspaces",
  WorkspaceCoverage("Percentage", "**", "line")
};

chart TopLevelWorkspaceCoverageChart() = {
  "Coverage by top level workspaces",
  (TopLevelWorkspaceCoverageStreams(), yAxis("Percent"))
};

One possible way to avoid the Chart complexity in the example above is to replace the wildcard specification of modules with an explicit list of the modules to include in the chart. But that creates problems of its own: how do you pick which modules to group together in a chart? Is it really much of a win to replace one chart containing 70 lines with 10 separate charts containing 7 lines each?

As another example of the problem, consider a Project with many Members, only a few of whom are active at a time. Displaying a Chart using the MemberActiveTime reduction function with the wildcard parameter would result in a Chart with only a few non-zero Streams; the remaining zero-valued Streams clutter the Chart and enlarge the legend without adding useful information. We could, of course, solve this by creating a new reduction function called "NonZeroMemberActiveTime", but that solution requires writing new code. Worse, we would have to create "NonZero" versions of every other reduction function. And, finally, removing zero-valued Streams is just one kind of "filtering" we might wish to perform.

The Telemetry Definition Language provides a more general solution to the problem of "chart complexity" with a language construct called "filter functions". Filter functions are an extensible part of the language "alphabet", just like reduction functions, but have a different syntactic usage and semantic meaning. While reduction functions accept String parameters and return a collection of Telemetry Streams, filter functions accept a collection of Telemetry Streams (in the form of a stream expression) and return a (typically smaller) collection of Telemetry Streams.

A simple example of a filter function is FilterZero. This filter function accepts a stream expression and returns the subset of the passed Streams that contain at least one non-zero data point. Returning to our example of MemberActiveTime above, instead of writing an additional reduction function called "NonZeroMemberActiveTime", we can simply wrap the call to MemberActiveTime in the FilterZero filter function. The resulting Chart contains Streams for only those members who actually had Active Time during the given interval, as illustrated below:

FilterZero(MemberActiveTime(filePattern, cumulative))
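The behaviour described for FilterZero can be sketched outside the Telemetry Definition Language. The Python below is only an illustration of the semantics stated above, not Hackystat's implementation: the dictionary representation of a stream collection, the function name filter_zero, and the member data are all assumptions invented for this sketch.

```python
from typing import Dict, List

# ASSUMPTION: a stream collection is modelled as a mapping from a stream
# name to its list of data points. This is an illustrative representation.
Streams = Dict[str, List[float]]


def filter_zero(streams: Streams) -> Streams:
    """Keep only the streams that contain at least one non-zero data point."""
    return {name: points for name, points in streams.items()
            if any(p != 0 for p in points)}


# Hypothetical active-time data (hours per day) for three project members.
active_time = {
    "alice": [0.0, 2.5, 4.0],
    "bob": [0.0, 0.0, 0.0],    # never active in the interval
    "carol": [1.0, 0.0, 0.0],  # one non-zero point is enough to be kept
}

print(sorted(filter_zero(active_time)))  # ['alice', 'carol']
```

Because the function takes a collection of streams and returns a collection of streams, it can wrap the result of any reduction function, which is why no "NonZero" variant of each reduction function is needed.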

A more sophisticated filter function is called "Filter". The Filter function takes four parameters. The first is the stream expression describing the set of Streams to be filtered. The second indicates the computation to perform over the data points in each Stream, and can have the values "Avg", "Max", "Min", "Last", "Delta", or "SimpleDelta". The third indicates how the set of Streams should be filtered based on the computed values, and can have the values "Above", "Below", "Top", "Bottom", "TopPercent", or "BottomPercent". The fourth provides the numeric threshold or cut-off value required by the third parameter.
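The two-stage shape of Filter (compute one value per Stream, then select Streams by that value) can be sketched as follows. This is a partial Python illustration under stated assumptions, not the actual implementation: "SimpleDelta" is assumed to mean the last data point minus the first, "Above" and "Below" are assumed to be strict comparisons, and "Delta", "TopPercent", and "BottomPercent" are left out because their exact definitions are not given here. The function name filter_streams and the sample data are hypothetical.

```python
from typing import Callable, Dict, List

Streams = Dict[str, List[float]]

# Second parameter: one computation per stream.
# ASSUMPTION: "SimpleDelta" is last point minus first point.
COMPUTATIONS: Dict[str, Callable[[List[float]], float]] = {
    "Avg": lambda pts: sum(pts) / len(pts),
    "Max": max,
    "Min": min,
    "Last": lambda pts: pts[-1],
    "SimpleDelta": lambda pts: pts[-1] - pts[0],
}


def filter_streams(streams: Streams, computation: str,
                   mode: str, cutoff: float) -> Streams:
    """Score each non-empty stream, then keep streams by mode and cutoff."""
    score = COMPUTATIONS[computation]
    scored = {name: score(pts) for name, pts in streams.items() if pts}
    if mode == "Above":      # ASSUMPTION: strict comparison
        keep = [n for n, s in scored.items() if s > cutoff]
    elif mode == "Below":    # ASSUMPTION: strict comparison
        keep = [n for n, s in scored.items() if s < cutoff]
    elif mode in ("Top", "Bottom"):
        # Rank by score; "Top" takes the highest, "Bottom" the lowest.
        ranked = sorted(scored, key=scored.get, reverse=(mode == "Top"))
        keep = ranked[:int(cutoff)]
    else:
        raise ValueError(f"mode not covered by this sketch: {mode}")
    return {name: streams[name] for name in keep}


# Hypothetical coverage percentages for three modules.
coverage = {"core": [80, 82, 85], "ui": [60, 55, 40], "db": [70, 70, 72]}

print(sorted(filter_streams(coverage, "Avg", "Above", 70)))       # ['core', 'db']
print(list(filter_streams(coverage, "SimpleDelta", "Bottom", 1)))  # ['ui']
```

Note how the fourth parameter changes meaning with the third: it is a threshold on the computed value for "Above" and "Below", but a count of Streams for "Top" and "Bottom".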

Example 5.2, “Filtered Workspace Coverage” shows how these filter functions can be used together to create a more interesting and useful Chart for Workspaces.

Example 5.2. Filtered Workspace Coverage

y-axis yAxis(label) = {
  label, "double", 0, 100
};

streams TopLevelWorkspaceCoverageStreams() = {
  "Filtered workspace coverage",
  Filter(FilterZero(WorkspaceCoverage("Percentage", "**", "line")), "SimpleDelta", "Bottom", 3)
};

chart TopLevelWorkspaceCoverageChart() = {
  "Workspaces with most significant decrease in coverage",
  (TopLevelWorkspaceCoverageStreams(), yAxis("Percent"))
};

This Chart is defined by first building a Stream expression containing the telemetry for unit test coverage for all modules. Next, two filter functions are invoked: FilterZero gets rid of all Streams with only zero values, and Filter further reduces the set of Streams to the three with the most significant decrease in value during the interval. The effect is to produce a Chart showing the modules that are decreasing the most in test coverage and thus potentially most in need of additional quality assurance resources.
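To see why the order of the two filters matters, the composition can be traced on a small invented data set. The Python sketch below is illustrative only: the module names and coverage values are made up, the helper names are hypothetical, and "SimpleDelta" is again assumed to mean the last data point minus the first.

```python
from typing import Dict, List

Streams = Dict[str, List[float]]


def filter_zero(streams: Streams) -> Streams:
    """Drop streams whose data points are all zero."""
    return {n: p for n, p in streams.items() if any(v != 0 for v in p)}


def bottom_by_simple_delta(streams: Streams, count: int) -> Streams:
    """Keep the `count` streams with the lowest (last - first) value."""
    # ASSUMPTION: this stands in for Filter(..., "SimpleDelta", "Bottom", count).
    ranked = sorted(streams, key=lambda n: streams[n][-1] - streams[n][0])
    return {n: streams[n] for n in ranked[:count]}


# Hypothetical per-module coverage percentages over three days.
coverage = {
    "core": [80.0, 78.0, 75.0],  # delta -5
    "ui":   [60.0, 62.0, 65.0],  # delta +5
    "docs": [0.0, 0.0, 0.0],     # delta  0, no coverage data at all
    "db":   [70.0, 70.0, 58.0],  # delta -12
    "util": [50.0, 55.0, 52.0],  # delta +2
}

without_zero_filter = bottom_by_simple_delta(coverage, 3)
with_zero_filter = bottom_by_simple_delta(filter_zero(coverage), 3)

print(list(without_zero_filter))  # ['db', 'core', 'docs']
print(list(with_zero_filter))     # ['db', 'core', 'util']
```

In this invented data only two modules lose coverage, so without the zero filter the all-zero "docs" stream takes the third slot even though it carries no information. Applying FilterZero first keeps such streams out of the ranking.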