20.2. Architectural elements: DailyProjectDataCache and DailyProjectData

The architecture of the DailyProjectData facility consists of two components: the DailyProjectDataCache, and the subclasses of DailyProjectData whose instances the cache manages.

20.2.1. DailyProjectDataCache

The DailyProjectDataCache is a singleton ThreeKeyCache that maps [<Project>, <Day>, <AnalysisClass>] to an <AnalysisClassInstance> containing the associated analysis data. The DailyProjectDataCache is designed with two key properties: (1) When new data arrives for a given <Day> from a user who is a member of the associated <Project>, or when the definition of the <Project> changes, all of the cached analyses for that [<Project>, <Day>] pair are cleared. This is an aggressive, though safe, policy: it may discard analyses that the change did not affect, but it never leaves a stale one in the cache. (2) If there is no cached <AnalysisClassInstance> available for the given [<Project>, <Day>, <AnalysisClass>] tuple, then one is automatically instantiated and returned.

The combination of (1) and (2) means that clients need not perform any cache management themselves. They simply request <AnalysisClassInstances> from the cache, and these instances are constructed or reconstructed as necessary and reused when possible.
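The two properties can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the actual ThreeKeyCache or DailyProjectDataCache code: the class name SketchCache, the use of String keys for <Project> and <Day>, and the Supplier-based factory are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch of a three-key cache; not the Hackystat API. */
final class SketchCache {
  // project -> day -> analysis class -> analysis instance
  private final Map<String, Map<String, Map<Class<?>, Object>>> cache =
      new ConcurrentHashMap<>();

  /** Property (2): return the cached instance, or build and cache a new one. */
  <T> T get(String project, String day, Class<T> analysisClass, Supplier<T> factory) {
    Map<Class<?>, Object> analyses = cache
        .computeIfAbsent(project, p -> new ConcurrentHashMap<>())
        .computeIfAbsent(day, d -> new ConcurrentHashMap<>());
    return analysisClass.cast(
        analyses.computeIfAbsent(analysisClass, c -> factory.get()));
  }

  /** Property (1): drop every cached analysis for the [project, day] pair. */
  void invalidate(String project, String day) {
    Map<String, Map<Class<?>, Object>> days = cache.get(project);
    if (days != null) {
      days.remove(day);
    }
  }
}
```

A client only ever calls get(); whatever receives new sensor data or project changes calls invalidate(), and the next get() rebuilds the instance.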

20.2.2. DailyProjectData

The DailyProjectData class is an abstract class whose subclasses represent the set of possible <AnalysisClass> values that can be contained in the DailyProjectDataCache. Each subclass must implement a getInstance() method that returns the cached instance of the class for the given <Project> and <Day>, or else constructs a new one, adds it to the cache, and returns it.
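As a rough sketch of the getInstance() contract, the following assumes that getInstance() is a static factory method and that the cache can be reduced to a single map. Both are assumptions made for illustration, and the names DailyDataSketch, FileMetricSketch, and lookup() are hypothetical rather than taken from Hackystat.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for the abstract DailyProjectData class. */
abstract class DailyDataSketch {
  // Simplified stand-in for DailyProjectDataCache, keyed by "project|day|class".
  private static final Map<String, DailyDataSketch> CACHE = new ConcurrentHashMap<>();

  /** Returns the cached instance for the key, or builds, caches, and returns one. */
  protected static <T extends DailyDataSketch> T lookup(
      String project, String day, Class<T> type, Supplier<T> factory) {
    String key = project + "|" + day + "|" + type.getName();
    return type.cast(CACHE.computeIfAbsent(key, unused -> factory.get()));
  }
}

/** Hypothetical subclass representing one kind of analysis. */
final class FileMetricSketch extends DailyDataSketch {
  final String project;
  final String day;

  private FileMetricSketch(String project, String day) {
    this.project = project;
    this.day = day;
    // A real subclass would run its analysis of the raw data here.
  }

  /** Clients call this instead of the constructor. */
  static FileMetricSketch getInstance(String project, String day) {
    return lookup(project, day, FileMetricSketch.class,
        () -> new FileMetricSketch(project, day));
  }
}
```

Keeping the constructor private forces every client through getInstance(), so no instance can bypass the cache.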

As with anything interesting, but in this case in particular, the devil truly is in the details. Some of the possible complications in the design of a DailyProjectData subclass include the following.

First, the getInstance() method may take seconds or even minutes to complete. If it is invoked again during this interval, then a redundant analysis of the raw data can take place.
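One common way to avoid this kind of redundant work is to cache the in-progress computation rather than only its result, so that a second caller waits for the first analysis instead of starting its own. The sketch below uses java.util.concurrent.FutureTask for this. It is a general technique, not a description of how Hackystat handles the problem, and the class name InFlightCache is hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

/** Hypothetical sketch: at most one analysis runs per key at any time. */
final class InFlightCache<K, V> {
  private final ConcurrentMap<K, FutureTask<V>> tasks = new ConcurrentHashMap<>();

  V get(K key, Callable<V> analysis) {
    FutureTask<V> task = tasks.get(key);
    if (task == null) {
      FutureTask<V> created = new FutureTask<>(analysis);
      task = tasks.putIfAbsent(key, created);
      if (task == null) {
        // This thread won the race, so it alone runs the slow analysis.
        task = created;
        created.run();
      }
    }
    try {
      // Every other caller blocks here until the winner's result is ready.
      return task.get();
    } catch (ExecutionException e) {
      // Do not keep a failed analysis in the cache.
      tasks.remove(key, task);
      throw new IllegalStateException(e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```

The key would combine <Project>, <Day>, and <AnalysisClass>. Invalidation on new data is omitted here to keep the sketch short.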

The DailyProjectData subclasses generally read in SensorData instances and perform some computation on them. Once the analysis results have been computed, each instance must release any references to SensorData instances (or to the temporary data structures constructed to hold them) so that the SensorData instances become eligible for garbage collection. Otherwise, because the instance itself stays in the cache, the raw data stays reachable and the heap can be exhausted.
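A minimal sketch of this discipline: the raw entries are used only inside the constructor, and only the computed summary values are stored in fields. The RawEntry class is a hypothetical stand-in for SensorData, and SummarySketch is not an actual Hackystat class.

```java
import java.util.List;

/** Hypothetical analysis that keeps its results but not the raw data. */
final class SummarySketch {
  /** Hypothetical stand-in for a SensorData entry. */
  static final class RawEntry {
    final String file;
    final int sizeBytes;

    RawEntry(String file, int sizeBytes) {
      this.file = file;
      this.sizeBytes = sizeBytes;
    }
  }

  // Only small summary values are retained for the lifetime of the instance.
  private final long totalBytes;
  private final int entryCount;

  SummarySketch(List<RawEntry> rawEntries) {
    long bytes = 0;
    for (RawEntry entry : rawEntries) {
      bytes += entry.sizeBytes;
    }
    this.totalBytes = bytes;
    this.entryCount = rawEntries.size();
    // rawEntries is deliberately not stored in a field: once the constructor
    // returns, this instance holds no reference to the raw data.
  }

  long getTotalBytes() {
    return totalBytes;
  }

  int getEntryCount() {
    return entryCount;
  }
}
```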

DailyProjectData subclasses currently serve two clients: (1) the DailyProjectDetails command, which produces a summary of DailyProjectData as well as more detailed "drilldowns" into the data for a given day, and (2) the Telemetry analyses, which generally produce analyses on subsets of the DailyProjectDetails data, but over many days at a time. Care must be taken to minimize redundant computation for these different styles of usage.

Some DailyProjectData subclasses have implemented their analyses via N passes over the raw sensor data, one pass per top-level workspace. While this is not terribly bad when N is small, in the case of Hackystat-7 this design results in 70+ passes over the raw sensor data. When the amount of raw sensor data is large, this can significantly slow down the computation.
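The usual alternative is a single pass that accumulates a result for every workspace at once, so the cost no longer grows with the number of workspaces. The sketch below illustrates the idea with a hypothetical Entry class and a simple per-workspace sum; the real analyses and the SensorData API will differ.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-pass aggregation keyed by top-level workspace. */
final class SinglePassSketch {
  /** Hypothetical stand-in for one raw sensor data entry. */
  static final class Entry {
    final String workspace;
    final int value;

    Entry(String workspace, int value) {
      this.workspace = workspace;
      this.value = value;
    }
  }

  /** Reads each entry exactly once, however many workspaces exist. */
  static Map<String, Integer> totalsByWorkspace(List<Entry> entries) {
    Map<String, Integer> totals = new TreeMap<>();
    for (Entry entry : entries) {
      // merge() adds to the running total, or starts one for a new workspace.
      totals.merge(entry.workspace, entry.value, Integer::sum);
    }
    return totals;
  }
}
```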

Finally, DailyProjectData subclasses live in a highly multi-threaded environment. For example, the Telemetry Control Center can issue a dozen or more HTTP requests almost simultaneously, each of which will be handled by a separate thread in the web server, where each request could extract data from overlapping sets of hundreds of DailyProjectData subclass instances.

All of these details mean that while the DailyProjectData abstraction is intended to reduce the required amount of computation, an inappropriately designed DailyProjectData subclass can quite easily result in significantly more computation than, for example, a "custom" class developed for a single telemetry analysis that reads in the raw data each time. To avoid this outcome, the next sections present some fundamental design patterns for DailyProjectData subclasses, followed by example code illustrating these patterns taken from the DailyProjectFileMetric class.