20.6. Performance testing of DailyProjectData subclasses

A properly implemented DailyProjectData subclass should perform well at run time. In particular: (a) the subclass is not instantiated more often than necessary; (b) the subclass does not read the raw data more often than necessary; (c) the data structures that the subclass maintains in its instance variables are not excessively large; (d) the cached data structure containing the abstracted results is of an appropriate size and contains the kinds of data needed to satisfy normal queries; (e) the summary string and drilldowns can be calculated quickly and efficiently; and (f) the most common queries expected from reduction functions can be calculated quickly and efficiently.
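One way to check property (a) during a performance run is to count how often the factory really constructs an instance, as opposed to returning a cached one. The following is a minimal sketch under stated assumptions, not Hackystat code: the CountingCache class, its method names, and the string keys standing in for a (project, day) pair are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Small demonstration: repeated requests for the same key construct only once. */
final class CountingCacheDemo {
  public static void main(String[] args) {
    // The string keys are hypothetical stand-ins for a (project, day) pair.
    CountingCache<String, StringBuilder> cache =
        new CountingCache<>(k -> new StringBuilder(k));
    for (int i = 0; i < 3; i++) {
      cache.getInstance("Hackystat-7/01-March-2006");
    }
    cache.getInstance("Hackystat-7/02-March-2006");
    System.out.println("Constructions: " + cache.constructionCount());
  }
}

/** Hypothetical helper: caches one value per key and counts real constructions. */
final class CountingCache<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final AtomicInteger constructions = new AtomicInteger();
  private final Function<K, V> factory;

  CountingCache(Function<K, V> factory) {
    this.factory = factory;
  }

  /** Returns the cached value for the key, constructing it only on first request. */
  V getInstance(K key) {
    return cache.computeIfAbsent(key, k -> {
      constructions.incrementAndGet();
      return factory.apply(k);
    });
  }

  /** Number of times the factory was actually invoked. */
  int constructionCount() {
    return constructions.get();
  }
}
```

If the construction count reported at the end of a run is much larger than the number of distinct (project, day) pairs requested, instances are probably being created more often than necessary.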

Performance testing is somewhat more complicated to set up than functional testing, but in the case of DailyProjectData subclasses, it is easier than you might think to implement a class that is functionally correct but provides unacceptably poor performance in "real world" conditions. One reason for presenting the design patterns in this chapter is to help developers avoid some of the common potential causes of poor performance (such as a heavyweight constructor). Another reason for the design patterns is to help developers design a modular structure for their implementation that simplifies the assessment, diagnosis, and correction of performance problems in a functionally correct class.

Setting up performance testing of a DailyProjectData class involves two activities: (1) creating a "real world" (or even "beyond real world") test data set that places enough load on your implementation to reveal performance problems, and (2) creating a "PerfEval" test class that runs your DailyProjectData code over the test data set and that can be instrumented with performance analysis tools such as JProfiler in order to diagnose hot spots.

20.6.1. Creating a "real world" test data set

For performance evaluation of DailyProjectFileMetric, we used a snapshot of the Hackystat sensor data collected during a month in early 2006. At that point in time, Hackystat consisted of over 3,000 files and 300,000 LOC, and was collecting FileMetric data from two tools: SCLC and LOCC. This resulted in several thousand FileMetric entries per day, and thus almost 100,000 FileMetric entries over the course of the month. Such a data set was more than adequate to exercise the DailyProjectFileMetric implementation.

20.6.2. Creating a PerfEval test class

Once you have generated a test data set that can adequately exercise your DailyProjectData subclass, the next step is to create a class that runs your code over the test data and supports instrumentation for performance assessment. The simplest approach is to create a class with a main() method that runs the performance tests, as illustrated next.

Example 20.5, “PerfEvalDailyProjectFileMetric class (excerpt)” shows part of the PerfEvalDailyProjectFileMetric class used for performance evaluation.

Example 20.5. PerfEvalDailyProjectFileMetric class (excerpt)

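import java.util.Date;
import java.util.List;
import java.util.logging.Logger;
// Imports of the Hackystat classes used below are omitted from this excerpt.

public class PerfEvalDailyProjectFileMetric {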

  public static void main(String[] args) throws Exception {
    // Initialize variables needed for performance tests.
    // Note: the database in use must contain the Project "Hackystat-7".
    Project project = ProjectManager.getInstance().getProject("Hackystat-7");
    Logger logger = ServerProperties.getInstance().getLogger();
    Day startDay = Day.getInstance("01-March-2006");
    int numDays = 15;
    // The first call forces a load of sensor data types, so we do this to warm up the system
    Day warmupDay = startDay.inc(-1);
    DailyProjectFileMetric.getInstance(project, warmupDay).getSummaryStrings(null);

    printHeapStatus(logger);
    performanceTestInitial(project, logger, startDay, numDays, true, false);
    // printHeapStatus(logger);
    // performanceTestUpdate(project, logger, startDay, numDays, true, false);
    // printHeapStatus(logger);
    // performanceInMemory(project, logger, startDay, numDays, true, false);
    // printHeapStatus(logger);
  }
  
  private static void performanceTestInitial(Project project, Logger logger, Day startDay,
      int numDays, boolean doFirst, boolean doSecond) throws Exception {
    Date d1, d2;
    // First, time the current implementation (DailyProjectFileMetric).
    Day day = startDay;
    d1 = new Date();
    if (doFirst) {
      logger.warning("\n\nPerformance Test Initial: Starting time trial: " + d1);
      for (int i = 0; i < numDays; i++) {
        day = day.inc(1);
        DailyProjectFileMetric metric = DailyProjectFileMetric.getInstance(project, day);
        List results = metric.getSummaryStrings(null);
        logger.warning(metric.getName() + " " + day + " " + ((String[]) results.get(0))[1]);
      }
      d2 = new Date();
      logger.warning("Elapsed time: " + (d2.getTime() - d1.getTime()));
    }

    // Now time the other implementation (DailyProjectFileMetric2) for comparison.
    day = startDay;
    d1 = new Date();
    if (doSecond) {
      logger.warning("Starting time trial: " + d1);
      for (int i = 0; i < numDays; i++) {
        day = day.inc(1);
        DailyProjectFileMetric2 metric = DailyProjectFileMetric2.getInstance(project, day);
        List results = metric.getSummaryStrings(null);
        logger.warning(metric.getName() + " " + day + " " + ((String[]) results.get(0))[1]);
      }
      d2 = new Date();
      logger.warning("Elapsed time: " + (d2.getTime() - d1.getTime()));
    }
  }
}

This class illustrates several approaches to the performance evaluation of the DailyProjectFileMetric class. First, it shows how to iterate through a number of days, obtaining a DailyProjectFileMetric instance for each day and recording the total elapsed (wall clock) time for the run. Second, it shows how the performance of the DailyProjectFileMetric class was compared to that of an older implementation (DailyProjectFileMetric2). Finally, by selecting which internal methods to execute and by adjusting the parameters passed to them, this class can be used in conjunction with a performance analysis tool such as JProfiler to locate hot spots in the implementation: areas where a disproportionate amount of time is being spent, and which are thus candidates for redesign and optimization. In the case of DailyProjectFileMetric, we discovered that over half the time spent in the initial implementation on representative invocations was used up in comparing FilePatterns instances to top-level workspaces. By writing a method that optimized this comparison for a common case, we virtually eliminated this hot spot and doubled the performance of the code.
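The excerpt calls printHeapStatus(logger) without showing its body, and it measures elapsed time by subtracting two Date values. The following is a minimal sketch of how such helpers might look, not the Hackystat implementation: the PerfProbe class and every method name in it other than printHeapStatus are hypothetical, and the timing helper uses System.nanoTime(), which is monotonic, in place of Date.

```java
import java.util.logging.Logger;

/** Small demonstration of the hypothetical helpers around a dummy workload. */
final class PerfProbeDemo {
  public static void main(String[] args) {
    Logger logger = Logger.getLogger("perfeval");
    PerfProbe.printHeapStatus(logger);
    long elapsed = PerfProbe.elapsedMillis(() -> {
      // Dummy workload standing in for a loop over days.
      long sum = 0;
      for (int i = 0; i < 1_000_000; i++) {
        sum += i;
      }
      if (sum < 0) {
        throw new IllegalStateException("unexpected overflow");
      }
    });
    logger.info("Elapsed time: " + elapsed + " ms");
    PerfProbe.printHeapStatus(logger);
  }
}

/** Hypothetical helpers for heap reporting and wall-clock timing. */
final class PerfProbe {
  private PerfProbe() {
  }

  /** Bytes of heap currently in use (allocated to the JVM minus free). */
  static long usedHeapBytes() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  /** Logs used and maximum heap size in megabytes. */
  static void printHeapStatus(Logger logger) {
    long mb = 1024L * 1024L;
    long maxMb = Runtime.getRuntime().maxMemory() / mb;
    logger.info("Heap used: " + (usedHeapBytes() / mb) + " MB of " + maxMb + " MB max");
  }

  /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
  static long elapsedMillis(Runnable task) {
    long start = System.nanoTime();
    task.run();
    return (System.nanoTime() - start) / 1_000_000L;
  }
}
```

Heap figures taken this way are only approximate, because they depend on when the garbage collector last ran. They are best used to compare runs against each other rather than as absolute measurements.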