Telemetry Report


One of the analysis systems we have developed with Hackystat is called "Software Project Telemetry". The basic idea is to take the process and product measures gathered for a project and display them as trends over time at a grain size of days, weeks, or months. While this idea is simple in theory, in practice it is quite challenging to design a usable, extensible, and expressive mechanism for displaying and interpreting process and product metric trends. Software Project Telemetry addresses this design problem through a sophisticated declarative language for defining telemetry charts, along with a "plugin" architecture that allows support for new process and product metrics to be added.

Software Project Telemetry can support project decision making in a variety of ways. First, it can help discover "baseline" values in your projects: typical values for process or product measures that can aid in estimation or help discover the impact of process/product/tool changes on your development practices. Second, telemetry can reveal the occurrence of "anomalous" conditions: a sudden spike (or valley) in a process or product measure can indicate that the development conditions have changed in some way. Third, telemetry can help reveal "co-variance" in process or product measures. For example, it might show that when code churn exceeds some threshold value, the frequency of daily build failures increases. For details on the principles and practices of Software Project Telemetry, consult the Hackystat User Guide chapter or our research publications linked to the home page of this site.

At the Collaborative Software Development Laboratory, we have implemented a Telemetry Wall: a PC driving nine displays, controlled by "Telemetry Control Center" (TCC) software that presents a rotating set of telemetry charts to developers.

While the TCC is useful for developers physically located in the lab, we have decided to make a subset of these charts available on this page as a service to external developers. In addition, the charts on this page are annotated to provide people unfamiliar with Hackystat and Software Project Telemetry some examples of how Software Project Telemetry is currently being used on an active development project.

The following sections show how Software Project Telemetry is applied to a variety of project management issues, including release planning and tracking, project quality, active module detection, and integration build failures. Each section provides both static "historical examples" of telemetry charts that illustrate an interesting phenomenon revealed by telemetry and "real time data": one or more charts, updated daily, that analyze our current project metrics.

1. Release Cycle Issue Tracking

We use Issue Tracking telemetry to track trends in the open and closed issues over a specific release. This helps us manage progress toward the delivery of a stable release of Hackystat and helps us to identify schedule slippage.

Historical Example:

The following example is taken from the Hackystat version 7.3 release cycle, which lasted from mid-January to mid-March.

Issue Tracking (Version 7.3 and 7.4):

This telemetry chart shows the total and remaining issues on the last day of each week during the 7.3 and 7.4 release cycles. The top line indicates the total number of issues scheduled for the release, while the bottom line indicates the number of issues still open. The telemetry shows that we did not schedule everything up front, but added new issues almost every week. It also shows that during the last half of the release cycle, we made consistent progress toward zero open issues, which signals delivery of the stable release.
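As a rough illustration of the data behind this chart, the sketch below computes two numbers for each week-ending date: how many issues had been scheduled for the release, and how many were still open. It assumes a hypothetical `Issue` record with a scheduled date and an optional closed date. This is not Hackystat's implementation, and the dates in the usage are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Issue:
    # Hypothetical record: day the issue was added to the release,
    # and the day it was closed (None if still open).
    scheduled: date
    closed: Optional[date] = None


def weekly_issue_counts(issues: List[Issue], week_ends: List[date]) -> List[Tuple[date, int, int]]:
    """For each week-ending date, return (date, total scheduled, remaining open)."""
    rows = []
    for end in week_ends:
        scheduled = [i for i in issues if i.scheduled <= end]
        remaining = [i for i in scheduled if i.closed is None or i.closed > end]
        rows.append((end, len(scheduled), len(remaining)))
    return rows


# Invented data: three issues, one of them added in the second week.
issues = [
    Issue(date(2024, 1, 15), date(2024, 1, 24)),
    Issue(date(2024, 1, 15)),
    Issue(date(2024, 1, 23), date(2024, 2, 2)),
]
week_ends = [date(2024, 1, 20), date(2024, 1, 27), date(2024, 2, 3)]
rows = weekly_issue_counts(issues, week_ends)
```

The second and third columns of `rows` correspond to the top and bottom lines of the chart.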

Issue Closure (Version 7.3 and 7.4):

This telemetry chart shows the cumulative number of closed issues and the cumulative amount of developer "active time" on a weekly basis during the 7.3 and 7.4 release cycles. Although the active time required for an individual issue varies significantly from issue to issue, over time these differences appear to "smooth out". This telemetry indicates that our issue closure rate was roughly constant over these release cycles, at 3.5 to 4.5 hours of active time per issue. The near-linear relationship is provocative: if the same relationship between cumulative active time and issue closure holds in future release cycles, it would provide evidence of a predictive relationship.
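The closure-rate figure can be derived from two weekly series: closed issues and active hours. The minimal sketch below accumulates both series and divides one by the other. The numbers are invented. If the ratio stays within a narrow band week after week, cumulative active time and cumulative closed issues are in a near-linear relationship.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def closure_rate(weekly_closed: List[int], weekly_hours: List[float]) -> Tuple[List[int], List[float], List[Optional[float]]]:
    """Return cumulative closed issues, cumulative active hours,
    and cumulative active hours per closed issue for each week."""
    cum_closed = list(accumulate(weekly_closed))
    cum_hours = list(accumulate(weekly_hours))
    # No ratio is defined until at least one issue has been closed.
    ratios = [h / c if c else None for h, c in zip(cum_hours, cum_closed)]
    return cum_closed, cum_hours, ratios


# Invented weekly data for a three-week stretch.
closed, hours, ratios = closure_rate([4, 6, 5], [16.0, 24.0, 22.5])
```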

Real Time Data:

This section shows the Issue Tracking and Issue Closure telemetry charts for the ongoing Version 7.5 release cycle. Do these charts exhibit the same trends as the Version 7.3 and 7.4 release cycles? When can we make the Version 7.5 release? Are we as productive as we were in the last release?

Charts: Issue Tracking (Version 7.5); Issue Closure (Version 7.5)

2. Project Level Product Metrics and Quality Indicators

While the Release Planning telemetry might help us schedule our releases more accurately, what about the quality of our system? Does the schedule sacrifice quality?

The Hackystat project collects a wide variety of process and product measures related to quality, including: the rate of integration build failure, the frequency of unit test invocation, the success rate of unit tests, the coverage of unit tests, the presence of code review on a module, the number of active developers in a given module, and the density of issues detected by static code analyzers such as Checkstyle, PMD, and FindBugs. Any single measure provides only a small slice of insight into project quality, but when taken together and tracked over time, more meaningful and useful insights begin to emerge.

This section focuses on software product metrics, while Section 5 focuses on process metrics. We use both types of metrics to guide our decision-making.

Historical Example:

For the historical example, we show three charts: one for system size and two that offer different perspectives on product quality. The charts are taken from the Hackystat version 7.3 release cycle.

Lines of Code (Version 7.3):

This chart shows source lines of code (SLOC) for each language in the system. It clearly shows that Hackystat is predominantly a Java system, but that it also includes code in several other languages, including C/C++, Perl, and even Lisp.

With each release, the system size increases. With the Version 7.3 release, our Java code base passed 200K SLOC.

Unit Test Coverage (Version 7.3):

This chart shows line level unit test coverage: the percentage of source code lines exercised by unit test cases. Only the coverage for Java code is shown.
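Line-level coverage is a simple ratio. The sketch below assumes that coverage data arrives as hypothetical (covered lines, total lines) pairs per module. It also shows why project-level coverage can hide module-level movement: pooling the line counts weights each module by its size, so a coverage swing in a small module barely moves the project figure. Which lines count toward the total (all lines, or only executable ones) depends on the coverage tool, and that choice is not specified here.

```python
from typing import Dict, Tuple


def line_coverage(covered: int, total: int) -> float:
    """Percentage of lines exercised by unit tests (0.0 for an empty module)."""
    return 100.0 * covered / total if total else 0.0


def project_coverage(modules: Dict[str, Tuple[int, int]]) -> float:
    """Pool (covered, total) line counts across modules before dividing."""
    covered = sum(c for c, _ in modules.values())
    total = sum(t for _, t in modules.values())
    return line_coverage(covered, total)


# Hypothetical modules: a large, well-tested one and a small, poorly tested one.
modules = {"LargeModule": (9000, 10000), "SmallModule": (100, 1000)}
```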

It's interesting to note that though coverage telemetry is largely stable, it exhibits a slight downward trend. Understanding what this trend means requires diving down into the module-level coverages, as will be illustrated in Section 4.

As a side note, the February 19 dip in coverage is an example of an "anomalous" data point. After investigation, we determined that the metrics were not collected correctly during that period, which caused the dip. This also shows the robustness of telemetry analysis: occasional incorrect metrics have little impact on the long-term value of telemetry charts. You can even use telemetry charts to check that your metrics are being collected correctly, as illustrated in Section 6 below.
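Anomalous points such as this dip can also be flagged mechanically. The sketch below shows one simple approach, not the one Hackystat uses: it compares each point with the median of its neighbors and flags the point when the relative difference exceeds a tolerance. The window size and the 20% tolerance are arbitrary assumptions, and the coverage values are invented.

```python
from statistics import median
from typing import List


def flag_anomalies(values: List[float], window: int = 3, tolerance: float = 0.2) -> List[int]:
    """Return indices whose value differs from the median of up to `window`
    neighbors on each side by more than `tolerance` (a fraction of that median)."""
    flagged = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        neighbors = values[lo:i] + values[i + 1:hi]
        if not neighbors:
            continue
        m = median(neighbors)
        if m and abs(v - m) / abs(m) > tolerance:
            flagged.append(i)
    return flagged


# Invented coverage percentages with one collection failure in the middle.
coverage = [71.0, 70.5, 70.8, 45.0, 70.2, 70.0, 69.8]
anomalies = flag_anomalies(coverage)
```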

Code Issues Reported by FindBugs (Version 7.3):

This chart shows the number of issues reported by FindBugs, one of the static code analyzers we use. One major drawback of static code analyzers is that they can have a fairly high false-positive rate. To overcome this problem, we are validating the FindBugs issues by assigning each one to one of three treatment groups: issues that should fail our build, issues to be monitored, and issues to be discarded.

Real Time Data:

The following telemetry charts show real time data for the Hackystat project over the past seven weeks. These charts usually show little variation. We have two hypotheses to explain this: (1) project-level analysis is too coarse for a relatively large system like Hackystat, so changes in individual modules tend to be smoothed out at the project level; (2) our development process itself is stable, and a stable process generally produces stable product measures. One way to test these hypotheses is to look at the module-level analysis results in the next section.

Charts: Lines of Code (change chart); Unit Test Coverage; Code Issue Density

3. Module Level Product Metrics and Quality Indicators

For a relatively large system like Hackystat, certain metrics are better measured for individual modules than for the system as a whole. For example, as you saw above, project-level quality metrics changed very little over time. Measured at the module level, the same metrics reveal more interesting differences between modules.

Historical Example:

This historical example shows telemetry charts from different quality perspectives at the level of individual modules. Although the Hackystat system consists of over 70 modules, only three of them are used as examples here. If you are interested in telemetry charts for other modules, you can run the telemetry analysis yourself on our public server.

The following charts clearly indicate that different modules have different traits with respect to quality measures.

Unit Test Coverage:

The charts are generated from 3 Hackystat modules: Core_Kernel, Core_Common, and Core_Installer.

Please refer to Section 2 for a discussion of the meaning of these charts.

Code Issues Reported by FindBugs:

The charts are generated from 3 Hackystat modules: Core_Kernel, Core_Common, and Core_Installer.

Please refer to Section 2 for a discussion of the meaning of these charts.

Real Time Data (Selected Modules):

The following charts show code issue trends in selected Hackystat modules over the past seven weeks. Do you see differences between these modules? Are there any modules exhibiting worrisome trends?

Charts: Module Core_Kernel; Module Core_Common; Module Core_Installer

4. Module Filtering

The Hackystat software system consists of over 250,000 lines of code, divided among more than 70 modules. As with any project of this size, it is hard to know what is going on in each individual module.

Some of these modules are under active development, while others are relatively dormant. As with most large, distributed software development projects, it is often not obvious where the "hotspots" of activity in the system are. The Software Project Telemetry language offers filter functions to help identify these "hotspot" modules.

The following charts show how different metrics can be used to identify "hotspots" of activity in the system.

Historical Example:

These historical examples are taken from the Version 7.3 release cycle.

Active Module Filtering:

The Active Module telemetry helps us understand the location of development effort, and whether a module that was "dormant" has suddenly gone "active", or vice versa. Such transitions can often help explain other development process changes, such as integration build failures or quality changes.

Three charts are provided, each from a different perspective: active time, file commits, and code churn. Ideally, all three would identify the same active modules.
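The telemetry language's own filter functions are not reproduced here. As a rough stand-in, the sketch below ranks modules by a single metric and keeps the top n. It then checks which modules appear in the top-n list of every perspective. The module names and metric values are hypothetical. A module that all three perspectives agree on is a more trustworthy "hotspot".

```python
from typing import Dict, List, Set


def top_modules(metric: Dict[str, float], n: int = 5) -> List[str]:
    """Names of the n modules with the largest metric values (ties broken by name)."""
    return sorted(metric, key=lambda m: (-metric[m], m))[:n]


def agreed_hotspots(perspectives: Dict[str, Dict[str, float]], n: int = 5) -> Set[str]:
    """Modules that appear in the top-n list of every perspective."""
    tops = [set(top_modules(values, n)) for values in perspectives.values()]
    return set.intersection(*tops) if tops else set()


# Hypothetical values for three perspectives over the same period.
perspectives = {
    "ActiveTime": {"ModuleA": 30.0, "ModuleB": 12.5, "ModuleC": 2.0},
    "FileCommit": {"ModuleA": 41, "ModuleB": 3, "ModuleC": 9},
    "CodeChurn": {"ModuleA": 2200, "ModuleB": 600, "ModuleC": 150},
}
hotspots = agreed_hotspots(perspectives, n=2)
```

Here only `ModuleA` survives, because FileCommit ranks `ModuleC` above `ModuleB` while the other two perspectives do not.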

Test Coverage Level Filtering:

These two charts show the most covered and least covered modules in the Hackystat system. Although modules differ in character (GUI code, for example, is inherently difficult to test), extremely low coverage in a module usually indicates a lack of quality assurance effort.

Test Coverage Change Filtering:

These two charts show the modules with the most significant increase and decrease in unit test coverage during the Version 7.3 release cycle. A significant drop in test coverage is a trend we wish to avoid.
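Both coverage filters reduce to sorting. The minimal sketch below assumes that per-module coverage percentages are available as plain dictionaries, which is a hypothetical input format. The first function picks the most and least covered modules. The second ranks modules by coverage change between two snapshots and ignores modules missing from either snapshot.

```python
from typing import Dict, List, Tuple


def coverage_extremes(coverage: Dict[str, float], n: int = 5) -> Tuple[List[str], List[str]]:
    """Return (most covered, least covered) module names, n of each."""
    most = sorted(coverage, key=lambda m: (-coverage[m], m))[:n]
    least = sorted(coverage, key=lambda m: (coverage[m], m))[:n]
    return most, least


def coverage_changes(before: Dict[str, float], after: Dict[str, float], n: int = 5) -> Tuple[List[str], List[str]]:
    """Return (largest increases, largest decreases) between two snapshots."""
    deltas = {m: after[m] - before[m] for m in before.keys() & after.keys()}
    increased = sorted((m for m in deltas if deltas[m] > 0), key=lambda m: (-deltas[m], m))[:n]
    decreased = sorted((m for m in deltas if deltas[m] < 0), key=lambda m: (deltas[m], m))[:n]
    return increased, decreased


# Hypothetical coverage percentages at the start and end of a release cycle.
start = {"A": 50.0, "B": 80.0, "C": 60.0}
end = {"A": 65.0, "B": 70.0, "C": 60.0}
```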

Real Time Data for Active Modules:

The real-time data shows the top 5 active modules during the past seven weeks. Are developers working in the same areas of the system as during Version 7.3, or has the focus of development changed?

Charts: Active Time Perspective; File Commit Perspective; Code Churn Perspective

Real Time Data for Most and Least Covered Modules:

The real-time data shows the 5 most covered and 5 least covered modules during the past seven weeks. Do we see any improvement in the least covered modules?

Charts: Most Covered Modules; Least Covered Modules

Real Time Data for Modules with the Largest Coverage Increase and Decrease:

The real-time data shows the modules with the most significant coverage change. Who is shirking on testing, and why?

Charts: Modules with Most Increased Coverage; Modules with Most Decreased Coverage

5. Development Process and Correlational Analysis

We use a nightly integration build system, CruiseControl, with customized tasks to automatically build and test the latest committed version of the entire Hackystat system and to run various static analysis tools over the code. Developers often do not test their changes against the entire code base before committing them, since a full build and test can take 15 to 20 minutes. On the other hand, a commit that breaks the nightly build slows down development as a whole. Our goal is for developers to learn to test "just the right amount" of the system before committing their changes. This is easier said than done.

What developer processes lead to integration build failure? How can integration build failures be kept to a minimum? Though telemetry charts are not designed to provide direct answers by themselves, you can begin to investigate this issue by comparing the integration build failure telemetry chart to other telemetry charts.

Historical Example:

The following data is taken from the weeks surrounding the Version 7.3 release cycle.

Integration Build Failures:

This chart shows the number of integration build failures per week during the Version 7.3 release cycle. Compare this chart to the ones below.

Since we run the integration build once per day, the maximum number of failures in a week is seven. As you can see, at one point in the middle of the project we had 7 nightly failures over a two-week period: a build failure every other day on average. This was too much, and we took corrective measures to drive the integration build failure rate down.
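The weekly counts in this chart come from nightly pass/fail results. The minimal sketch below assumes a hypothetical mapping from build date to a pass flag and groups failures by ISO week. With one build per day, no week can exceed seven failures. The usage reproduces the arithmetic above with invented data: failing every other day for two weeks yields 7 failures.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Tuple


def weekly_failures(nightly: Dict[date, bool]) -> Dict[Tuple[int, int], int]:
    """Count failed nightly builds per (ISO year, ISO week).
    Weeks that had builds but no failures appear with a count of 0."""
    counts: Counter = Counter()
    for day, passed in nightly.items():
        iso_year, iso_week, _ = day.isocalendar()
        counts[(iso_year, iso_week)] += 0 if passed else 1
    return dict(counts)


# Invented two-week run starting Monday, 1 January 2024:
# the build fails on even day offsets (0, 2, 4, ...).
start = date(2024, 1, 1)
nightly = {start + timedelta(days=i): i % 2 == 1 for i in range(14)}
per_week = weekly_failures(nightly)
```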

FileCommit:

The red line shows the project-level "FileCommit" metric on a weekly basis. A commit might contain multiple files. The project-level FileCommit measure aggregates the number of files in all commits from all developers of the project. In general, a higher measure indicates that more work is being done. This is not always the case, however, because the measure depends on developers' commit habits.
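To make the aggregation concrete, the sketch below sums file counts over commits, using a hypothetical `Commit` record. The usage shows the effect of commit habits mentioned above. A developer who commits the same file three times contributes three to FileCommit, while one who batches the same change into a single commit contributes one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    # Hypothetical record: who committed, and which files the commit touched.
    author: str
    files: List[str] = field(default_factory=list)


def file_commit(commits: List[Commit]) -> int:
    """Project-level FileCommit: files summed over all commits by all developers."""
    return sum(len(c.files) for c in commits)


# The same edit to Foo.java, committed with different habits (invented data).
frequent = [Commit("alice", ["Foo.java"]) for _ in range(3)]
batched = [Commit("bob", ["Foo.java"])]
```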

CodeChurn:

The "CodeChurn" metric is related to file commits. It computes the number of lines added and deleted in each revision. In general, a higher measure indicates that more work is being done, and CodeChurn is less susceptible than FileCommit to developers' commit habits. However, copy-and-paste edits and file renames can produce a lot of spurious churn.
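One way to compute churn for a single file revision is to diff the old and new contents and count added and deleted lines. The sketch below uses Python's `difflib.ndiff`. Hackystat's own churn data comes from its version-control sensor, and that sensor's exact counting rules are not described here. The usage shows why a rename can inflate churn: if a rename is seen as deleting the old file and adding a new one, every line is counted twice.

```python
import difflib
from typing import List


def churn_between(old_lines: List[str], new_lines: List[str]) -> int:
    """Lines added plus lines deleted between two revisions of a file.
    A modified line counts as one deletion and one addition."""
    added = deleted = 0
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            deleted += 1
    return added + deleted


# One line changed and one appended: churn of 3.
edit = churn_between(["a", "b", "c"], ["a", "B", "c", "d"])
# A two-line file "renamed" as delete-then-add: churn of 4.
rename = churn_between(["x", "y"], []) + churn_between([], ["x", "y"])
```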

ActiveTime:

The "ActiveTime" metric computes the amount of time a developer spent actively editing code inside an IDE. "Active" means that idle time (time with no editing activity) is excluded from the computation. The red line shows the total active time in hours for all developers of the project. Note that active time is only a proxy for a limited subset of development effort.
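The sketch below illustrates one way to exclude idle time. It rests on an assumption the text does not state: time is divided into fixed-size intervals (five minutes here), and an interval counts as active only if it contains at least one editing event. The event timestamps are invented.

```python
from datetime import datetime, timedelta
from typing import Iterable

EPOCH = datetime(1970, 1, 1)  # reference point so interval boundaries line up


def active_hours(events: Iterable[datetime], interval_minutes: int = 5) -> float:
    """Hours of active time: the number of distinct intervals containing
    at least one editing event, times the interval length."""
    interval = timedelta(minutes=interval_minutes)
    active = {(t - EPOCH) // interval for t in events}
    return len(active) * interval_minutes / 60


# Invented editing events: a burst from 9:00 to 9:06, then one edit at 11:30.
events = [
    datetime(2024, 3, 15, 9, 0),
    datetime(2024, 3, 15, 9, 1),
    datetime(2024, 3, 15, 9, 4),
    datetime(2024, 3, 15, 9, 6),
    datetime(2024, 3, 15, 11, 30),
]
```

The five events fall into three intervals (9:00, 9:05, and 11:30), so the idle gap between 9:06 and 11:30 contributes nothing.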

LocalBuild:

"LocalBuild" measures the number of times that developers invoke the build script on their workstations, typically to test modifications locally before committing them to the repository. Note that a build script generally consists of multiple tasks, and developers usually invoke only a subset of those tasks in each build.

FileCommit, CodeChurn, and ActiveTime all measure software development effort from different angles. Used together, they provide a means of data "triangulation" (validating metrics using data from different sources). The fact that all of them display the same trend for this period greatly increases our confidence that more work was done in the week of Feb. 19 than in other weeks.
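A simple numeric check for this kind of triangulation asks two questions: do the effort series peak in the same week, and how strongly do they correlate? The sketch below computes both using only the standard library. The weekly values are invented. The Pearson correlation is our illustrative choice, not a feature of the telemetry language.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def peak_weeks(series: Dict[str, List[float]]) -> Dict[str, int]:
    """Index of the week with the largest value, for each metric."""
    return {name: max(range(len(v)), key=v.__getitem__) for name, v in series.items()}


# Invented weekly values for four weeks.
effort = {
    "FileCommit": [120, 150, 310, 140],
    "CodeChurn": [4000, 5200, 11800, 4700],
    "ActiveTime": [35.0, 41.5, 72.0, 38.0],
}
```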

Comparing the integration build failure chart with the local build chart, the period can be divided into three stages.

Does that mean effective corrective measures were implemented during stage 2? Are the developers getting smarter, reducing local build overhead while maintaining a low integration build failure rate? What lessons can we learn so that we can stay in stage 3? Answering these questions requires contextual information. Telemetry charts provide support for empirically guided decision-making.

Real Time Data:

This section shows the integration build failure rate during the past seven weeks. Are we doing better or worse than we did during the Version 7.3 release cycle?

Charts: Integration Build Failures; FileCommit; CodeChurn; ActiveTime; LocalBuild

6. Metrics Validation

Though analysis based on telemetry has greater tolerance for incorrect metrics than model-based metrics approaches, we still strive to have correct metrics. Software telemetry turns out to have a useful self-validation property: we can use telemetry charts to help determine whether the underlying sensor data is being collected correctly. We accomplish this by monitoring charts for three kinds of situations, illustrated in the historical example below.

Historical Example:

This data was taken from a one week time period during the Version 7.3 release cycle. Unlike other telemetry charts, for which a grain size of weeks or months is often most revealing, the Metrics Validation telemetry is best viewed at the grain size of days.

Sudden Value Change: The red line (lines of code) suddenly increased by 3,500 from March 15 to March 16. This prompted us to investigate the reason for the increase.
Normal: This is a normal chart indicating all sensors are working. There is no data point dropout. All metrics vary within reasonable ranges, and they co-vary with each other.
Inconsistency: The chart shows development activity for a developer who never invoked any unit tests. This is unusual, because under normal circumstances all of these metrics should co-vary. The most likely cause is that the developer did not install the unit test sensor.
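These situations can be checked automatically. The sketch below is an illustration, not Hackystat's validation code. It takes hypothetical per-day metric dictionaries, where None means no data arrived, and reports three problems: dropouts, day-over-day jumps larger than a per-metric limit, and days with ActiveTime but no unit test invocations. The metric keys `LOC` and `UnitTest`, the limits, and the values are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Day = Tuple[str, Dict[str, Optional[float]]]


def validation_warnings(days: List[Day], jump_limits: Dict[str, float]) -> List[Tuple[str, str, str]]:
    """Return (day, metric, problem) tuples for dropouts, sudden changes,
    and ActiveTime recorded without any unit test invocations."""
    warnings = []
    prev: Dict[str, Optional[float]] = {}
    for label, metrics in days:
        for name, value in metrics.items():
            if value is None:
                warnings.append((label, name, "dropout"))
                continue
            before = prev.get(name)
            limit = jump_limits.get(name)
            if before is not None and limit is not None and abs(value - before) > limit:
                warnings.append((label, name, "sudden change"))
        # Development activity should normally co-vary with unit test runs.
        if (metrics.get("ActiveTime") or 0) > 0 and not metrics.get("UnitTest"):
            warnings.append((label, "UnitTest", "inconsistent with ActiveTime"))
        prev = metrics
    return warnings


# Hypothetical daily data: a 3,500-line jump, a day with no unit tests, a LOC dropout.
days = [
    ("Mar 15", {"LOC": 250000, "ActiveTime": 3.0, "UnitTest": 12}),
    ("Mar 16", {"LOC": 253500, "ActiveTime": 2.5, "UnitTest": 0}),
    ("Mar 17", {"LOC": None, "ActiveTime": 1.0, "UnitTest": 4}),
]
found = validation_warnings(days, jump_limits={"LOC": 2000})
```

A "normal" chart, in this sketch, is simply one for which the function returns no warnings.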

Real Time Data:

The following three charts show some of our validation telemetry for the past seven days. The first two charts show "server-side" validation: these show sensor data that is collected on the "server-side" through daily cron jobs. The third validation telemetry chart is for one of our developers, Philip Johnson, who has kindly agreed to let his data be shown to all the world. Can you detect any sensor drop-outs?

Charts: Server-side Validation (Commit, Coverage, FileMetrics); Server-side Validation (CodeIssue, Dependency, Issue); Client-side Validation (Philip Johnson's sensors)