15.12. Evolving the structure of sensor data types

One very useful feature of sensor data types is that their structure can change over time, as experience with the process and product data under study yields a better understanding of how the data should be structured. Without support for evolution, anyone wishing to change the structure of an SDT would face two problems: (1) all of the data already received by the server would have to be discarded; and (2) all users who had installed sensors collecting data of that type would need to install upgraded sensors before their data could be useful again.

The Hackystat SDT implementation supports three basic kinds of structural change that, in many situations, avoid this worst-case scenario. First, an SDT designer can add new entryattributes to an SDT at any time; as long as each new entryattribute declares a defaulter method, any sensor data lacking it will be implicitly supplied with the new field and its default value. Second, entryattributes can be deleted, and any occurrences of those fields in sensor data will simply be ignored. Finally, SDT designers can implement a "reorganizeData" method that provides a hook into the sensor data instantiation process and supports more complex kinds of evolution, such as moving data from the pMap into a new entryattribute, or fusing several prior fields into a new one.
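The first two kinds of change can be pictured as a normalization step applied when a stored record is read. The sketch below is a minimal model, not Hackystat's actual code: the class name EntryNormalizer, the "declared" map from attribute names to default suppliers (null meaning "no defaulter"), and the choice to drop undeclared keys are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model of how added and deleted entryattributes could be handled on read. */
public class EntryNormalizer {

  /**
   * Builds the in-memory attribute map for one stored record.
   * declared: the current entryattributes, each mapped to its defaulter (or null if none).
   */
  public static Map<String, String> normalize(Map<String, String> stored,
                                              Map<String, Supplier<String>> declared) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, Supplier<String>> attr : declared.entrySet()) {
      String name = attr.getKey();
      if (stored.containsKey(name)) {
        // Attribute present in the stored data: keep its value.
        result.put(name, stored.get(name));
      } else if (attr.getValue() != null) {
        // Newly added attribute missing from old data: supply its default.
        result.put(name, attr.getValue().get());
      }
    }
    // Stored keys that are no longer declared (deleted entryattributes) are never copied.
    return result;
  }

  public static void main(String[] args) {
    Map<String, Supplier<String>> declared = new LinkedHashMap<>();
    declared.put("fileName", null);
    declared.put("location", () -> "unknown");

    Map<String, String> oldRecord = new LinkedHashMap<>();
    oldRecord.put("fileName", "Foo.java");
    oldRecord.put("obsoleteField", "42");

    // prints {fileName=Foo.java, location=unknown}
    System.out.println(normalize(oldRecord, declared));
  }
}
```

Note that the stored map is only read, never modified, which matches the non-destructive behavior described later in this section.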

All sensor data types have a default implementation of the reorganizeData method, which does nothing. If an SDT designer implements the reorganizeData method, the code is invoked as part of the following chain of events during sensor data instantiation:

  1. The constructor for the SDT wrapper class associated with this sensor data type is invoked. This returns a typed instance for the sensor data.

  2. The reorganizeData() method associated with the wrapper class is invoked on the newly constructed instance. If the wrapper class does not include an implementation of reorganizeData(), then the default implementation provided in SensorDataEntry is used (which does nothing).

  3. The getErrors() method associated with the wrapper class is invoked on the newly constructed instance. All wrapper classes must implement this method. It returns null if the instance is semantically valid, or otherwise a String describing the errors found. After an evolution, it should also check that the "evolved" structure conforms to the new expectations.
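The three-step chain above can be sketched in a self-contained form. This is a hypothetical model rather than the Hackystat kernel: Entry stands in for SensorDataEntry, RenamingEntry is an invented wrapper whose reorganizeData() moves a legacy "file" value into "fileName", and throwing IllegalArgumentException when getErrors() reports a problem is an assumption about how errors might be surfaced.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the construct / reorganize / validate chain. */
public class InstantiationChainDemo {

  /** Stand-in for SensorDataEntry: reorganizeData() does nothing by default. */
  abstract static class Entry {
    protected final Map<String, String> attributes;

    Entry(Map<String, String> keyValStringMap) {
      // Work on a copy so that evolution never touches the stored map.
      this.attributes = new HashMap<>(keyValStringMap);
    }

    public void reorganizeData() {
      // Default implementation: no reorganization.
    }

    /** Returns null if the instance is valid, otherwise a description of its errors. */
    public abstract String getErrors();
  }

  /** Invented wrapper class that moves a legacy "file" attribute into "fileName". */
  static class RenamingEntry extends Entry {
    RenamingEntry(Map<String, String> keyValStringMap) {
      super(keyValStringMap);
    }

    @Override
    public void reorganizeData() {
      if (!attributes.containsKey("fileName") && attributes.containsKey("file")) {
        attributes.put("fileName", attributes.remove("file"));
      }
    }

    @Override
    public String getErrors() {
      return attributes.containsKey("fileName") ? null : "Invalid fileName; ";
    }
  }

  /** Steps 1-3: construct, reorganize, then validate the evolved structure. */
  static Entry instantiate(Map<String, String> stored) {
    Entry entry = new RenamingEntry(stored);  // 1. constructor
    entry.reorganizeData();                   // 2. evolution hook
    String errors = entry.getErrors();        // 3. validation of the evolved structure
    if (errors != null) {
      throw new IllegalArgumentException(errors);
    }
    return entry;
  }
}
```

Because getErrors() runs after reorganizeData(), it validates the evolved structure rather than the raw stored one, which is why the ordering of steps 2 and 3 matters.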

It is important to note that SDT evolution is non-destructive: any sensor data already persisted at the server in an old format will not be changed. Instead, it will be evolved into the new format as part of the process of reading the data into memory. While this incurs a certain amount of additional overhead, Hackystat provides a variety of caching mechanisms for sensor data that attempt to minimize the overhead. Of course, a substantial benefit of not changing old data is that mistakes in the implementation of the reorganizeData method or the incorrect deletion of entryattributes will not result in the loss of old data.
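Non-destructive evolution with caching can be modeled as "evolve a copy on read, then remember the result". The sketch below is only an illustration of that idea under stated assumptions: EvolvingReader, its in-memory "persisted" map, and the evolve function are hypothetical names, and Hackystat's real persistence and caching layers are not shown here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical evolve-on-read store: persisted records are never rewritten. */
public class EvolvingReader {
  private final Map<String, Map<String, String>> persisted = new HashMap<>();
  private final Map<String, Map<String, String>> cache = new HashMap<>();
  private final UnaryOperator<Map<String, String>> evolve;

  public EvolvingReader(UnaryOperator<Map<String, String>> evolve) {
    this.evolve = evolve;
  }

  /** Stores a record in its original format and invalidates any cached evolved copy. */
  public void store(String id, Map<String, String> record) {
    persisted.put(id, new HashMap<>(record));
    cache.remove(id);
  }

  /** Evolves a copy of the persisted record once, then serves it from the cache. */
  public Map<String, String> read(String id) {
    return cache.computeIfAbsent(id, key -> {
      Map<String, String> record = persisted.get(key);
      if (record == null) {
        return null;  // unknown id: nothing is cached
      }
      return Collections.unmodifiableMap(evolve.apply(new HashMap<>(record)));
    });
  }

  /** The record exactly as persisted, in its old format. */
  public Map<String, String> rawRecord(String id) {
    Map<String, String> record = persisted.get(id);
    return (record == null) ? null : Collections.unmodifiableMap(record);
  }
}
```

If an evolve function turns out to be buggy, fixing it and clearing the cache recovers the correct view, since the persisted records still hold the original data.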

15.12.1. Example evolution: Adding a new entryattribute

Let's assume that we want to evolve our SimpleSdt SDT by adding a new entryattribute called "location". Carrying out this evolution involves the following steps: (1) updating sdt.simplesdt.xml to declare the new field, (2) updating the wrapper class to provide accessor and defaulter methods for this new field, (3) updating the ShellCommand class to include this new field in the getHelpString method, and (4) updating the test class to ensure that the SDT functions correctly in the presence of sensor data that does contain this field, as well as sensor data that does not. Let's look at each of these in turn.

15.12.1.1. Update sdt.simplesdt.xml to include the location entryattribute

Adding the new entryattribute is quite simple. We just need to make sure we include the defaulter method so that old sensor data have a value for this field.

<sensordatatypes>
  <sensordatatype name="SimpleSdt"
                  enabled="true"
                  wrapper="org.hackystat.doc.simplesdt.SimpleSdt"
                  shellcommand="org.hackystat.doc.simplesdt.SimpleSdtShellCommand"
                  docstring="Simple sensor data type."
                  contact="Philip Johnson (johnson@hawaii.edu)">
    <entryattribute name="fileName" />
    <entryattribute name="location" 
                    defaulter="org.hackystat.doc.simplesdt.SimpleSdt.getDefaultLocation" />
    <entryattribute name="elapsedTime" 
                    type="java.lang.Integer"
                    converter="org.hackystat.doc.simplesdt.SimpleSdt.getInteger"
                    defaulter="org.hackystat.doc.simplesdt.SimpleSdt.getDefaultElapsedTime" />
  </sensordatatype>
</sensordatatypes>
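The defaulter attribute names a static method by its fully qualified name. One plausible way such a name could be turned into a call is plain Java reflection, sketched below. This is an assumption about the mechanism, not a description of Hackystat's loader; DefaulterResolver and invokeDefaulter are hypothetical names. Note that the defaulters in this example return their defaults as Strings (even "100" for the Integer-typed elapsedTime), presumably so that the value can pass through the declared converter like any other raw attribute value.

```java
import java.lang.reflect.Method;

/** Hypothetical sketch: resolving a "package.Class.method" defaulter string by reflection. */
public class DefaulterResolver {

  /** Example defaulter, analogous to SimpleSdt.getDefaultLocation. */
  public static String getDefaultLocation() {
    return "unknown";
  }

  /** Splits at the last dot and invokes the named static, no-argument method. */
  public static Object invokeDefaulter(String qualifiedMethod) throws ReflectiveOperationException {
    int lastDot = qualifiedMethod.lastIndexOf('.');
    if (lastDot <= 0) {
      throw new IllegalArgumentException("Not a qualified method name: " + qualifiedMethod);
    }
    Class<?> owner = Class.forName(qualifiedMethod.substring(0, lastDot));
    Method method = owner.getMethod(qualifiedMethod.substring(lastDot + 1));
    return method.invoke(null);  // static method: no receiver, no arguments
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    // Build the qualified name from the class itself so it works in any package.
    String name = DefaulterResolver.class.getName() + ".getDefaultLocation";
    System.out.println(invokeDefaulter(name));
  }
}
```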

15.12.1.2. Update the wrapper class to provide accessor and defaulter methods

Again, this is quite straightforward. Here is the modified wrapper class implementation with two additional methods: getLocation and getDefaultLocation:

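package org.hackystat.doc.simplesdt;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
// SensorDataEntry, SensorDataType, and SensorDataException must also be
// imported from the Hackystat kernel packages.

public class SimpleSdt extends SensorDataEntry {

  public SimpleSdt(Map keyValStringMap, SensorDataType sdt) throws SensorDataException {
    super(keyValStringMap, sdt);
  }

  /** Accessor for the fileName entryattribute. */
  public String getFileName() {
    return this.getAttributeValue("fileName");
  }

  /** Accessor for the new location entryattribute. */
  public String getLocation() {
    return this.getAttributeValue("location");
  }

  /** Accessor for elapsedTime, converted to an Integer by getInteger. */
  public int getElapsedTime() {
    return ((Integer) (this.getConvertedAttributeValue("elapsedTime"))).intValue();
  }

  /** Returns null if this entry is valid, otherwise a description of its errors. */
  public String getErrors() {
    String errorMessage = "";
    if (getFileName() == null) { errorMessage += "Invalid fileName; "; }
    if (getElapsedTime() <= 0) { errorMessage += "Invalid elapsed time; "; }
    // Test the contents, not the reference: == on Strings compares identity.
    return (errorMessage.length() == 0) ? null : errorMessage;
  }

  /** Converter for elapsedTime. */
  public static Integer getInteger(String intString) {
    return Integer.valueOf(intString);
  }

  /** Defaulter for elapsedTime. */
  public static String getDefaultElapsedTime() {
    return "100";
  }

  /** Defaulter for location: supplies "unknown" for data sent before this field existed. */
  public static String getDefaultLocation() {
    return "unknown";
  }

  public List getFilePaths() {
    List filePaths = new ArrayList();
    filePaths.add(getFileName());
    return filePaths;
  }
}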

15.12.1.3. Update the ShellCommand class to include the new field in its getHelpString method

Here is what the new getHelpString method would look like, with the location field added as an optional field:

  public String getHelpString() {
    return
    "SimpleSdt#set#tool=<tool>" + cr +
    "  Sets the Tool attribute value used in subsequent SimpleSdt sensor shell commands." + cr +
    "SimpleSdt#add#<key1>=<value1>#<key2>=<value2>#..." + cr +
    "  <keyN> is: fileName, elapsedTime, [location], [tool], [pMap]." + cr;
  }

15.12.1.4. Update the test case to check for the location field

Finally, we must update the test case to check that our SDT functions both with and without the location field. The first change is to the testSDT method, which should now check for the default value of location:

  public void testSDT() throws Exception {
    keyValStringMap.clear();
    keyValStringMap.put("tool", tool);
    keyValStringMap.put("tstamp", String.valueOf(timestamp.getTime()));
    keyValStringMap.put("fileName", fileName);
    keyValStringMap.put("elapsedTime", elapsedTimeString);
    pMap.put("environment", macOSX);
    keyValStringMap.put("pMap", pMap.encode());

    SimpleSdt entry = (SimpleSdt) SensorDataEntryFactory.getEntry(sensorType, keyValStringMap);

    assertEquals("Checking timestamp", timestamp, entry.getTimestamp());
    assertEquals("Checking tool", tool, entry.getTool());
    assertEquals("Checking fileName", fileName, entry.getFileName());
    assertEquals("Checking elapsedTime", 1000, entry.getElapsedTime());
    assertEquals("Checking environment", macOSX, entry.getProperty("environment"));
    assertEquals("Checking default location", "unknown", entry.getLocation());
  }

One should also implement a new test method that sets the location explicitly and tests that it is retrieved correctly, as illustrated next.

  public void testSDTWithLocation() throws Exception {
    String location = "Honolulu, HI";
    keyValStringMap.clear();
    keyValStringMap.put("tool", tool);
    keyValStringMap.put("tstamp", String.valueOf(timestamp.getTime()));
    keyValStringMap.put("fileName", fileName);
    keyValStringMap.put("elapsedTime", elapsedTimeString);
    keyValStringMap.put("location", location);
    pMap.put("environment", macOSX);
    keyValStringMap.put("pMap", pMap.encode());

    SimpleSdt entry = (SimpleSdt) SensorDataEntryFactory.getEntry(sensorType, keyValStringMap);

    assertEquals("Checking timestamp", timestamp, entry.getTimestamp());
    assertEquals("Checking tool", tool, entry.getTool());
    assertEquals("Checking fileName", fileName, entry.getFileName());
    assertEquals("Checking elapsedTime", 1000, entry.getElapsedTime());
    assertEquals("Checking environment", macOSX, entry.getProperty("environment"));
    assertEquals("Checking location", location, entry.getLocation());
  }

15.12.2. Example evolution: Deleting an entry attribute

Deleting an entryattribute is, in one sense, extremely easy: just remove the entryattribute from the SDT XML definition file, remove the accessors and any defaulters from the wrapper class, and remove the calls to the accessor from the test cases.

In another sense, deleting an entryattribute can be very hard, since the removal of an entryattribute and its associated accessor can break any higher level analyses that depend upon this accessor. You must be careful to address this ripple effect when deleting entryattributes.

15.12.3. Example evolution: Reorganizing the structure of sensor data

The most flexible form of sensor data type evolution is called data reorganization. The idea of data reorganization is to give the developer a "hook" into the sensor data entry instantiation process in order to move data from one field to another. This hook is the reorganizeData() method, whose default implementation in the SensorDataEntry class does nothing at all. If the wrapper class defines its own version of this method, it is called after the wrapper class constructor and before the getErrors() method, and it can alter the structure of the sensor data in whatever way the intended evolution requires.

As a concrete example, suppose that the developer of SimpleSdt eventually realizes that "location" should not be a required field for the SimpleSdt sensor data type, because not all tools can be expected to provide a value for it. Instead, it should be an optional property that may or may not be present in the property map (pMap) associated with each SimpleSdt sensor data instance. This kind of evolution requires two changes: (1) delete the location entryattribute from the SimpleSdt XML definition and wrapper class, and (2) provide a reorganizeData() method that moves any existing location data into the property map.

15.12.3.1. Reorganization: Delete the location field from the XML and wrapper class

This part is easy: it simply returns the sdt.simplesdt.xml file and the wrapper class to their original state, before the location entryattribute was added.

15.12.3.2. Reorganization: Provide a reorganizeData method to move existing location data to the pMap

Here is what such a reorganizeData method would look like:

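  public void reorganizeData() {
    // Sensor data sent before this evolution may still carry a location value;
    // copy it into the pMap so it survives as an optional property.
    if (hasAttributeValue("location")) {
      this.putProperty("location", this.getAttributeValue("location"));
    }
  }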
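To see the reorganization in isolation, here is a self-contained stand-in. LocationReorganizer and its two plain maps are hypothetical substitutes for SensorDataEntry's attribute storage and pMap; the real hasAttributeValue, getAttributeValue, and putProperty methods are not reproduced. Unlike the method above, this sketch also removes the value from its in-memory attribute copy, a design choice that is safe here only because it works on a copy of the stored data.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an evolved SimpleSdt entry; not the Hackystat API. */
public class LocationReorganizer {
  private final Map<String, String> attributes;              // in-memory copy of stored values
  private final Map<String, String> pMap = new HashMap<>();  // optional properties

  public LocationReorganizer(Map<String, String> stored) {
    this.attributes = new HashMap<>(stored);
  }

  /** Moves a legacy "location" value, if present, into the property map. */
  public void reorganizeData() {
    String location = attributes.remove("location");
    if (location != null) {
      pMap.put("location", location);
    }
  }

  public boolean hasProperty(String key) {
    return pMap.containsKey(key);
  }

  public String getProperty(String key) {
    return pMap.get(key);
  }
}
```

Old data that carried a location ends up with a location property; data sent after the evolution, which never had the entryattribute, passes through unchanged.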

15.12.3.3. Design issues in SDT evolution

Properly evolving an SDT in a way that minimizes the ripple effect can be quite challenging. Consider the above example where the location field has moved from a "required" entryattribute to an "optional" property on the pMap. There are a number of implications of this change.

First, analysis code that depended upon the location entryattribute may no longer compile. Previously, the analysis code could access the location field using the getLocation method, as illustrated below.

SimpleSdt entry = ...;  // get a SimpleSdt instance somehow
String location = entry.getLocation();

After the change, the location field (if it exists) must be accessed by analysis code using the getProperty method:

String location = entry.getProperty("location");

However, note a more subtle difference: before the evolution, the getLocation() method would always return a String value, which would default to "unknown". After the evolution, the location variable could be null if the property is not present. This could result in run-time null pointer exceptions in analysis code at unpredictable times.

There are a number of ways to deal with this kind of issue. One approach is to keep the getLocation accessor in the wrapper class, but rewrite it to mimic the prior behavior:

  public String getLocation() {
    return this.getProperty("location", "unknown");
  }
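The contract-preserving accessor can be checked with a small stand-in. LocationAccessor is hypothetical, and its two-argument getProperty only imitates the default-returning form mentioned in the text; it is not the Hackystat implementation. The point it demonstrates is that callers of getLocation() see the same "unknown" value they saw before the evolution, so existing analysis code needs no null checks.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in: getLocation() keeps its pre-evolution contract. */
public class LocationAccessor {
  private final Map<String, String> pMap;

  public LocationAccessor(Map<String, String> pMap) {
    this.pMap = new HashMap<>(pMap);
  }

  /** Imitates getProperty(key, defaultValue): returns the default instead of null. */
  public String getProperty(String key, String defaultValue) {
    String value = pMap.get(key);
    return (value == null) ? defaultValue : value;
  }

  /** Same result as the old entryattribute accessor: "unknown" when no location was sent. */
  public String getLocation() {
    return getProperty("location", "unknown");
  }
}
```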

Note that this code uses an alternative form of the getProperty method that allows a default value to be returned. A more dramatic approach might be to throw an exception if the analysis code invokes getLocation on an entry that does not have location data:

  public String getLocation() throws Exception {
    String location = this.getProperty("location");
    if (location == null) {
      throw new Exception("Attempt to access missing location data.");
    }
    return location;
  }

In this case, analysis code would need to explicitly check for the presence of the location field:

if (entry.hasProperty("location")) {
  String location = entry.getLocation();  // declares throws Exception, so catch or declare it
  // continue processing
}
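Because getLocation() declares a checked Exception, the Java compiler still requires the calling code to catch or declare it, even after a successful hasProperty check. The self-contained sketch below shows one way analysis code might look under this approach. StrictLocationEntry and describe() are hypothetical names, and the property map is a plain Map standing in for the entry's pMap.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an entry whose getLocation() throws on missing data. */
public class StrictLocationEntry {
  private final Map<String, String> pMap;

  public StrictLocationEntry(Map<String, String> pMap) {
    this.pMap = new HashMap<>(pMap);
  }

  public boolean hasProperty(String key) {
    return pMap.containsKey(key);
  }

  public String getLocation() throws Exception {
    String location = pMap.get("location");
    if (location == null) {
      throw new Exception("Attempt to access missing location data.");
    }
    return location;
  }

  /** Analysis-side pattern: check for the property first, then handle the checked exception. */
  public static String describe(StrictLocationEntry entry) {
    if (entry.hasProperty("location")) {
      try {
        return "located at " + entry.getLocation();
      } catch (Exception e) {
        // Required by the compiler because getLocation() declares a checked exception.
        return "location unavailable: " + e.getMessage();
      }
    }
    return "no location data";
  }
}
```

The trade-off compared with the defaulting accessor is that missing data is reported loudly instead of being silently replaced, at the cost of extra handling code at every call site.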