One very useful "unit of analysis" in Hackystat is the Project, which aggregates the sensor data from a set of users and a set of Workspaces over a specific period of time. A second useful unit of analysis is the Day, which is, of course, a 24-hour period. Many analyses in Hackystat combine these two units to produce abstractions of the raw sensor data for a given Project on a given Day. Furthermore, Project analyses at the interval of weeks or months are typically produced by first computing a set of analyses at the Day grain size, then combining them in some fashion to obtain the Week or Month value.
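To make the Day-to-Week combination concrete, here is a minimal Java sketch. It assumes that the daily analysis value is a simple count that can be summed and that weeks start on Monday. The class `WeeklyRollup` and its method `sumByWeek` are hypothetical names used for illustration, not part of Hackystat.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: roll per-Day analysis values up into per-Week values.
 * Assumes the daily values are counts that can simply be summed.
 */
final class WeeklyRollup {
  private WeeklyRollup() {}

  /** Sums daily values into weeks, keyed by the Monday that starts each week. */
  static Map<LocalDate, Long> sumByWeek(Map<LocalDate, Long> dailyValues) {
    Map<LocalDate, Long> weekly = new TreeMap<>();
    for (Map.Entry<LocalDate, Long> entry : dailyValues.entrySet()) {
      // Move each Day back to the Monday of its week (or keep it if it is a Monday).
      LocalDate weekStart =
          entry.getKey().with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
      // Add this Day's value to the running total for its week.
      weekly.merge(weekStart, entry.getValue(), Long::sum);
    }
    return weekly;
  }
}
```

Summing is only one possible combination rule; an analysis such as an average or a maximum would need a different rule when combining Days into a Week or Month.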
Actually computing a Project analysis for a given Day can require nontrivial computational resources. For example, let's say that we want to compute the total number of Unit Tests invoked in a given Project on a given Day. First, we need to retrieve the set of Users in this Project. Then, for each User in the Project, we need to retrieve all of their Unit Test sensor data for that Day. Next, we need to compute the Workspace associated with each instance of sensor data and check whether it matches one of the Workspaces associated with the Project. If the sensor data instance turns out to be associated with the Project, then we can use it in the analysis (in this example, by incrementing the total number of Unit Test invocations).
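The steps above can be sketched in Java as follows. The `Project`, `SensorData`, and `SensorDataStore` types are hypothetical simplifications, not Hackystat's actual classes, and the `workspaceOf` rule (the Workspace is the first directory of the file path) is an assumption made only for illustration.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins for Hackystat concepts (not the real Hackystat API).
record SensorData(String owner, String sdt, LocalDate day, String file) {}

record Project(String name, Set<String> members, Set<String> workspaces) {}

interface SensorDataStore {
  /** Returns all sensor data of the given type sent by the given user on the given Day. */
  List<SensorData> getData(String user, String sdt, LocalDate day);
}

final class NaiveUnitTestCounter {
  private NaiveUnitTestCounter() {}

  /** Recomputes the Project's Unit Test count for one Day directly from raw sensor data. */
  static long countUnitTests(Project project, LocalDate day, SensorDataStore store) {
    long total = 0;
    for (String user : project.members()) {                          // 1. every Project member
      for (SensorData data : store.getData(user, "UnitTest", day)) { // 2. all their Unit Test data
        String workspace = workspaceOf(data.file());                 // 3. map the data to a Workspace
        if (project.workspaces().contains(workspace)) {              // 4. keep only Project data
          total++;                                                   // 5. the analysis itself
        }
      }
    }
    return total;
  }

  /** Hypothetical rule: the Workspace is the first directory of the file path. */
  static String workspaceOf(String file) {
    String trimmed = file.startsWith("/") ? file.substring(1) : file;
    int slash = trimmed.indexOf('/');
    return slash < 0 ? trimmed : trimmed.substring(0, slash);
  }
}
```

Every call repeats all four data-gathering steps, which is the cost that caching is meant to avoid.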
Now consider a Telemetry Unit Test chart that shows the total number of Unit Tests over the previous seven weeks. If we request this chart daily, then the total number of Unit Tests for this Project on the current Day will be recomputed on each of the following 48 days, until it finally falls outside the seven-week (49-day) "window" associated with this Telemetry chart. Multiply that by the many different kinds of analyses and situations in which a Project analysis for a given Day might be requested, and it becomes clear that caching analysis results, rather than recomputing them from the raw sensor data each time they are requested, would significantly improve system performance.
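The "48 more times" figure can be checked with a small simulation. This hypothetical `RecomputationCounter` assumes that the chart window is 49 days including the day of the request, and that, without caching, every Day in the window is recomputed from raw data on each daily request.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical simulation of daily chart requests with no caching. */
final class RecomputationCounter {
  static final int WINDOW_DAYS = 7 * 7; // seven weeks, including the request day

  private RecomputationCounter() {}

  /** Returns how many times each Day's value is computed over the given number of daily requests. */
  static Map<LocalDate, Integer> simulate(LocalDate firstRequest, int requestDays) {
    Map<LocalDate, Integer> computeCounts = new HashMap<>();
    for (int r = 0; r < requestDays; r++) {
      LocalDate today = firstRequest.plusDays(r);
      // Without caching, every Day in the window is recomputed from raw sensor data.
      for (int offset = 0; offset < WINDOW_DAYS; offset++) {
        computeCounts.merge(today.minusDays(offset), 1, Integer::sum);
      }
    }
    return computeCounts;
  }
}
```

For any Day whose entire 49-day stay in the window is covered by the simulated requests, the count is 49: once on the Day itself and 48 more times afterward.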
Hackystat provides a facility called "DailyProjectData" to support caching of analysis results that are computed for a particular Project on a particular Day. Although most DailyProjectData instances operate on sensor data of a single Sensor Data Type, the facility does not require this. Performance is a major motivation for DailyProjectData, but abstraction is an equally important benefit: a client analysis that wishes to know the total Unit Test invocations for a given Project on a given Day can simply query a DailyProjectData instance for the result, rather than having to provide the code to iterate through all Project members, retrieve the raw data, determine whether the raw data applies to this Project, and so forth.
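The following Java sketch illustrates the caching-plus-abstraction idea. It assumes that a Project can be identified by its name and that the underlying computation can be passed in as a function. `CachedDailyProjectValue` is a hypothetical name; this is not Hackystat's actual DailyProjectData API. A client calls `get(project, day)` without knowing whether the value came from the cache or from the raw sensor data.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch of the DailyProjectData idea (not Hackystat's actual class):
 * the first request for a Project/Day computes the value from raw data,
 * and later requests for the same Project/Day are served from an in-memory cache.
 */
final class CachedDailyProjectValue<V> {
  /** Cache key: one entry per Project per Day. */
  private record Key(String project, LocalDate day) {}

  private final Map<Key, V> cache = new ConcurrentHashMap<>();
  private final BiFunction<String, LocalDate, V> computeFromRawData;

  CachedDailyProjectValue(BiFunction<String, LocalDate, V> computeFromRawData) {
    this.computeFromRawData = Objects.requireNonNull(computeFromRawData);
  }

  /** Returns the value for this Project and Day, computing it only on a cache miss. */
  V get(String project, LocalDate day) {
    return cache.computeIfAbsent(
        new Key(project, day), key -> computeFromRawData.apply(key.project(), key.day()));
  }
}
```

This sketch never evicts or invalidates entries, so a value cached for the current Day would not reflect sensor data sent later that Day; a real implementation would need a policy for that case.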
Although DailyProjectData is intended to provide an abstract, high-performance interface that simplifies the implementation of higher-level analyses, we have found that some of these classes were implemented in ways that actually produce poor performance in certain contexts! The goal of this chapter is to introduce the basic concepts of the DailyProjectData abstraction, followed by a set of "best practices" that we hope will help you design useful and efficient DailyProjectData abstractions that, in turn, simplify the development of Hackystat analyses.