Hackystat collects sensor data about many different kinds of software engineering processes and products. One way to understand the diversity of information that Hackystat can collect is through the two-dimensional grid shown in Table 15.1, “Static and dynamic, software and human data”.
Table 15.1. Static and dynamic, software and human data
| | Static | Dynamic |
|---|---|---|
| Software | Size, Complexity, Defects, Test Coverage | Performance, Unit Test outcomes, Build results |
| Human | Experience, Background, Team makeup, other demographic information | Developer behavioral "events" while developing software |
The first dimension classifies sensor data as either "static" or "dynamic". "Static" data refers to properties that can be inferred from inspection of the object being measured while "at rest". "Dynamic" data refers to "behavioral" properties of the object being measured while "in action".
The second dimension classifies the object from which the data is collected. Again, there are two elements in this dimension: "software" or "human".
Given the great variety of process and product data that Hackystat can collect, the system needs a way to organize that information. In Hackystat, this organization is provided by sensor data types (SDTs), each of which specifies a set of required fields and a set of optional fields for every instance of sensor data of that type. This level of organization provides several benefits:
It clarifies what information must be collected by the sensor, and what information might be collected by the sensor.
Hackystat client-side code can check to make sure that sensors are collecting the required data before sending it to the server.
Hackystat server-side analyses can be assured of what information is available when processing sensor data of a given type.
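The idea of required and optional fields can be sketched as follows. This is a minimal illustration in Python, not Hackystat's actual implementation: the class name `SensorDataType`, its methods, and the split of the Activity fields into required and optional are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorDataType:
    """Hypothetical SDT: a name plus required and optional field names."""
    name: str
    required: frozenset
    optional: frozenset = frozenset()

    def missing_fields(self, entry: dict) -> set:
        """Return the required fields that the entry does not supply."""
        return set(self.required) - set(entry)

    def unknown_fields(self, entry: dict) -> set:
        """Return fields that are neither required nor optional."""
        return set(entry) - set(self.required) - set(self.optional)


# Field names here are illustrative assumptions, not the real Activity SDT.
ACTIVITY = SensorDataType(
    name="Activity",
    required=frozenset({"tstamp", "tool", "type"}),
    optional=frozenset({"data"}),
)

complete = {"tstamp": "1138723608", "tool": "Emacs", "type": "Open File"}
incomplete = {"tool": "Emacs", "data": "foo.java"}

print(ACTIVITY.missing_fields(complete))    # empty set: safe to send
print(ACTIVITY.missing_fields(incomplete))  # tstamp and type are missing
```

A client could run a check like `missing_fields` before sending data, and a server-side analysis could rely on the same definition to know which fields are guaranteed to be present.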
Some sort of data organization is clearly useful, but to motivate the specific way in which SDTs are implemented, it is also necessary to understand how data is created and processed in Hackystat. Figure 15.1, “The flow of data in Hackystat”, illustrates this flow.
In Hackystat, data is originally generated by the actions of developers as mediated through their interactions with tools in the development environment. Hackystat "sensors" are attached to these tools as "plugins". These sensors generate a stream of sensor data of potentially different types, depending upon the behaviors they observe.
Once an instance of sensor data is generated, it must be sent across the Internet to the Hackystat server. This involves encoding the sensor data into a String for transmission, and potentially caching the data temporarily on the client side if an Internet connection is not available. In Hackystat, a "middleware" application called "SensorShell" is responsible for relieving the individual sensor plugins from dealing with these issues: the sensors simply pass the sensor data instances to the SensorShell, and it takes care of the details of data transmission. The SensorShell "knows" about the Sensor Data Type definition associated with any instance of sensor data, and can check to make sure that the required fields are present and of the correct type before attempting to send the data to the server. (For more details, see Chapter 16, The SensorShell.)
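The division of labor described above (validate, encode to a string, send, and cache while offline) might look roughly like the sketch below. It is not the real SensorShell: the class `SensorShellSketch`, its methods, the use of JSON as the string encoding, and the `send` callable are all hypothetical, and the sketch checks only that required fields are present, not their types.

```python
import json


class SensorShellSketch:
    """Hypothetical stand-in for the SensorShell middleware."""

    def __init__(self, required_fields, send):
        self.required_fields = set(required_fields)
        self.send = send          # callable that transmits one encoded string
        self.offline_cache = []   # encoded entries waiting for a connection

    def add(self, entry: dict) -> bool:
        """Validate, encode, and send an entry; cache it if sending fails."""
        missing = self.required_fields - set(entry)
        if missing:
            raise ValueError(f"missing required fields: {sorted(missing)}")
        encoded = json.dumps(entry, sort_keys=True)  # assumed encoding
        try:
            self.send(encoded)
            return True
        except ConnectionError:
            self.offline_cache.append(encoded)
            return False

    def flush(self) -> int:
        """Retry cached entries in order; return how many were sent."""
        sent = 0
        while self.offline_cache:
            try:
                self.send(self.offline_cache[0])
            except ConnectionError:
                break
            self.offline_cache.pop(0)
            sent += 1
        return sent


# Simulated network: a flag that the fake transport consults.
network = {"online": False}
received = []


def fake_send(encoded):
    if not network["online"]:
        raise ConnectionError("server unreachable")
    received.append(encoded)


shell = SensorShellSketch({"tstamp", "tool", "type"}, fake_send)
entry = {"tstamp": "1138723608", "tool": "Emacs",
         "type": "Open File", "data": "foo.java"}

sent_now = shell.add(entry)    # offline, so the entry is cached
network["online"] = True
flushed = shell.flush()        # connection is back, so the cache drains
```

The sensor plugin in this sketch only ever calls `add`; everything about encoding, retrying, and caching stays inside the shell object.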
Once the sensor data has made it across the Internet to the Hackystat server, it must be reconstituted into a sensor data instance and persistently stored for later analysis. Currently, sensor data is persisted to disk in a standardized XML format.
For a variety of reasons (some good, some historical), Hackystat currently eschews the conventional relational database approach to persisting data. Instead, Hackystat stores data in XML format in a subdirectory hierarchy with a standardized layout. In Hackystat, all sensor data is timestamped, and this timestamp is used to organize sensor data into a set of files each containing all the sensor data of a given type for a given day.
For example, assume the developer uses Emacs to open the file foo.java on January 31, 2006, at 4:06pm. This might result in the Emacs tool plugin creating an Activity entry whose timestamp value is "1138723608" (the UTC long value for 1/31/2006 4:06pm); whose tool value is "Emacs"; whose ActivityType value is "Open File"; and whose data value is "foo.java". Once this data is sent to the server, it will be stored in an XML file that might look like the following:
<?xml version="1.0" encoding="UTF-8"?>
<sensor>
  <entry tstamp="1138723608" tool="Emacs" type="Open File" data="foo.java" pMap="0000"/>
</sensor>
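As an illustration, the example file above can be read back with a few lines of Python. This sketch uses the standard `xml.etree.ElementTree` module and assumes that the `tstamp` attribute holds seconds since the Unix epoch in UTC, which is how the example value decodes; the variable names are made up for this example, and the `pMap` attribute is carried along without interpretation.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The example file from the text, as bytes so the encoding declaration applies.
XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<sensor>
  <entry tstamp="1138723608" tool="Emacs" type="Open File" data="foo.java" pMap="0000"/>
</sensor>"""

root = ET.fromstring(XML_BYTES)

# Each <entry> element becomes a plain dict of its attributes.
entries = [dict(element.attrib) for element in root.findall("entry")]
first = entries[0]

# Assumption: tstamp is seconds since the epoch, interpreted as UTC.
when = datetime.fromtimestamp(int(first["tstamp"]), tz=timezone.utc)

print(first["tool"], first["type"], first["data"])
print(when.isoformat())  # 2006-01-31T16:06:48+00:00
```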
Note that the XML file does not contain any information about the user who generated the data, or even about the Sensor Data Type to which the data belongs. This information is encoded in the directory structure created to store and organize the data. Assume this developer's 12-character key is ytjhg34xyzzy, and that the Hackystat server administrator has defined the Hackystat server storage directory to be c:\hackystatdata. Then the Activity sensor data for January 31, 2006 would be stored in the following file:
c:\hackystatdata\users\ytjhg34xyzzy\data\Activity\2006-01-31.xml
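The layout above can be expressed as a small path-building function. This is a sketch rather than Hackystat code: the function name `sensor_data_file` is hypothetical, and it assumes that the timestamp is in seconds since the epoch and that the day in the file name is taken in UTC.

```python
from datetime import datetime, timezone
from pathlib import PureWindowsPath


def sensor_data_file(root, user_key, sdt_name, tstamp):
    """Build the per-user, per-type, per-day XML file path (assumed layout)."""
    # Assumption: the day bucket is derived from the timestamp in UTC.
    day = datetime.fromtimestamp(tstamp, tz=timezone.utc).strftime("%Y-%m-%d")
    return (PureWindowsPath(root) / "users" / user_key / "data"
            / sdt_name / f"{day}.xml")


path = sensor_data_file(r"c:\hackystatdata", "ytjhg34xyzzy",
                        "Activity", 1138723608)
print(path)  # c:\hackystatdata\users\ytjhg34xyzzy\data\Activity\2006-01-31.xml
```

Because the user key, the type name, and the day all come from the path, a reader of the file needs nothing inside the XML itself to know whose data it is or which type it belongs to.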
It is hard to get the structure of a sensor data type right the first time. As new insights emerge into what information should accompany sensor data, it is useful to be able to change that structure. To do so successfully, an evolutionary mechanism must solve two problems: (1) the data that has already been received and stored must be upgraded to the new format; and (2) one cannot, in general, assume that users will upgrade their sensors at the same time as the server is updated. Thus, the "evolved" sensor data type must continue to accept "old" data in prior formats.
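One possible way to address both problems is a chain of per-version upgrade functions that is applied both to stored data and to incoming data from older sensors. The sketch below is not a description of Hackystat's mechanism: the version numbers, the function names, and the two example changes (a new default for `data`, a rename of `type`) are invented for illustration, and it assumes that each entry's format version is known.

```python
def v1_to_v2(entry):
    """Hypothetical change: version 2 makes 'data' a required field."""
    upgraded = dict(entry)
    upgraded.setdefault("data", "")
    return upgraded


def v2_to_v3(entry):
    """Hypothetical change: version 3 renames 'type' to 'activityType'."""
    upgraded = dict(entry)
    upgraded["activityType"] = upgraded.pop("type")
    return upgraded


# Maps a version N to the function that upgrades an entry to version N + 1.
UPGRADES = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3


def upgrade(entry, version):
    """Apply each upgrade step in turn until the entry is current."""
    while version < CURRENT_VERSION:
        entry = UPGRADES[version](entry)
        version += 1
    return entry


old_entry = {"tstamp": "1138723608", "tool": "Emacs", "type": "Open File"}
new_entry = upgrade(old_entry, version=1)
print(new_entry)
```

Under this design, the same `upgrade` function serves both needs: it can be run once over files already on disk, and it can be applied to each entry that arrives from a sensor that has not yet been updated.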