17.2. Sensor requirements and design dimensions

The ultimate goal of a Hackystat sensor is to gather data about the process and/or products of development that can be processed by one or more Hackystat analyses in order to provide useful information. When designing a Hackystat sensor, you must consider at least the following set of issues: (1) the sensor data types and analyses related to your sensor, (2) whether your sensor is client-side or server-side, (3) the use of the SensorShell middleware, and (4) whether the implementation language will be Java or not.

17.2.1. Sensors, Sensor Data Types, and Analyses

The design of a sensor must take into account the way the data will be represented (in terms of one or more Sensor Data Types) and the way in which the data will be used (in terms of one or more server-side analyses).

A first scenario for the design of a new sensor is one in which you want to extend Hackystat to support a new tool that is similar to a tool already supported by the system. For example, you might want to add support for a new editor (such as JEdit) in a manner similar to the support for existing editors (such as Eclipse), or support for a new unit testing framework (such as NUnit) in a manner similar to an existing unit test framework (such as JUnit), and so forth. In these situations, you will probably design your sensor to use one or more existing sensor data types, and produce data that can be processed by one or more existing analyses.

When designing a sensor in these circumstances, the important issue is compatibility in the way your new sensor generates and represents its sensor data. The goal is to ensure that analyses will not break (or produce incorrect results) when they begin to process data generated by your new sensor.

To understand the importance of compatibility, consider the "StateChange" sensor data represented by the DevEvent sensor data type, and used by various analyses to generate the "Dev Time" metric. StateChange sensor data provides a measure of the time spent by developers actively modifying files using an editor. StateChange sensor data must be generated in a specific way. There must be a timer-based subprocess within the editor that "wakes up" at a regular interval: every 30 seconds by default, or at whatever "State Change Interval" is specified in the client's sensor properties file. This process then checks to see whether the active buffer has changed in size or identity, and if so generates a StateChange event. Any sensor that generates "StateChange" DevEvent data should take care to use the same algorithm, or else the "Dev Time" metric will not produce the intended results.
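The StateChange algorithm described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class names StateChangeDetector and StateChangeTimer, the buffer accessors, and the onStateChange callback are hypothetical and are not part of any Hackystat API. A real sensor would obtain the buffer identity and size from the editor's own plugin interface and would pass the resulting event on to Hackystat.

```java
import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: remembers the buffer seen at the previous wake-up. */
final class StateChangeDetector {
    private String lastBufferId;        // e.g. the path of the file being edited
    private long lastBufferSize = -1;   // size of that buffer at the previous check

    /**
     * Called once per wake-up. Returns true if the active buffer differs in
     * identity or in size from the one seen at the previous check. The first
     * check with an open buffer therefore also reports a change.
     */
    boolean check(String activeBufferId, long activeBufferSize) {
        boolean changed = !Objects.equals(activeBufferId, lastBufferId)
                || activeBufferSize != lastBufferSize;
        lastBufferId = activeBufferId;
        lastBufferSize = activeBufferSize;
        return changed;
    }
}

/** Hypothetical timer that wakes up at a fixed interval and runs the check. */
final class StateChangeTimer {
    /** Default interval from the text; a sensor property may override it. */
    static final long DEFAULT_INTERVAL_SECONDS = 30;

    static ScheduledExecutorService start(long intervalSeconds,
                                          Supplier<String> activeBufferId,
                                          LongSupplier activeBufferSize,
                                          Runnable onStateChange) {
        StateChangeDetector detector = new StateChangeDetector();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            // Generate a StateChange event only when size or identity changed.
            if (detector.check(activeBufferId.get(), activeBufferSize.getAsLong())) {
                onStateChange.run();
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return timer; // the caller shuts this down when the editor exits
    }
}
```

Keeping the comparison separate from the timer makes the rule easy to check against an existing sensor: two sensors are compatible in this respect if they report a change under the same conditions and at the same interval.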

A second scenario for the design of a new sensor is one in which you are implementing a new sensor for a tool with the intent of also generating a new kind of analysis on that data. In this case, you will have to determine whether the existing set of Sensor Data Types is satisfactory for representing the data you wish to analyze. If so, then your task is limited to designing the sensor as well as the analysis that operates on its data. For example, consider a tool that implements a special kind of complexity metric where negative numbers indicate bad designs and positive numbers indicate good designs. In this case, you might find that the FileMetric SDT can be used without change to represent these data values, but that you would need to implement specialized analyses to make this data easily 'actionable'. For example, you might want an analysis that uses this complexity data along with

A third scenario, which is the most interesting one from an intellectual standpoint (as well as the worst-case scenario from an implementation standpoint), is when you realize during the course of your sensor design that you will also need to design new Sensor Data Types as well as new Analyses to achieve your goals. This generally indicates the expansion of Hackystat capabilities into entirely new domains, something we are always glad to see.

In all of the above cases, it is usually helpful to get feedback on your design and goals from the Hackystat developer community through the mailing lists. For more information on the mailing lists, see Chapter 12, New developer orientation.

17.2.2. Designing sensors for client-server tools

Certain kinds of software engineering tools, such as configuration management (CVS, Subversion) or issue tracking systems (Bugzilla, Jira), are typically implemented as client-server systems. When you want to design a sensor for these kinds of systems, you are confronted with a choice: do you want to develop the sensor for the client-side or the server-side of the system?

There is no unambiguously correct answer for this question: either choice involves trade-offs. To see this, consider the case of a configuration management system like CVS. The CVS server has a public protocol, which means there are many different CVS client implementations, from stand-alone command line clients, to stand-alone GUI clients, to plug-ins for editors and interactive development environments like Eclipse.

The problem with a client-side sensor solution for a system like CVS is that you will need to implement a separate sensor for each CVS client of interest. Worse, the ability to implement the sensor at all depends upon the ability of the client to support a "plugin" (or generate an XML data file). Some CVS clients, such as the Eclipse CVS plugin, would support the development of a Hackystat sensor quite easily, while others, such as WinCVS, do not have an extensible architecture, and thus developing a sensor for them would be more difficult. In general, a client-side sensor solution is a "partial" solution: you will typically not be able to implement a sensor for every client, and so if a user chooses a client for which you do not have a sensor, their data will not be collected.

One way around this problem is to design the sensor to plug in to the server, not the client(s). The nice thing about this solution is that there is generally only one server implementation, and servers tend to have an extensible architecture that simplifies the implementation of a sensor. By developing a sensor for the server side, you can collect data regardless of what client is invoked by the user, and you generally only have to implement one version of the sensor. Finally, a server-side solution often means you can collect retrospective data from the server's repository.

Unfortunately, the server-side solution has its own problems. First, it is also a partial solution! For example, in the case of CVS, there are various kinds of configuration management user behaviors (such as merge conflict resolution) that might be a source of very useful sensor data, but which occur only on the client side and are not visible to a server-side sensor. Second, the server-side solution requires "administrator" level capabilities. If you implement the sensor for a multi-user server, you need to set up a "usermaps.xml" file, which maps the server's user accounts to their corresponding Hackystat user accounts. This is explained in more detail in Section 2.7, “Sensors for multi-user tools and the usermaps.xml file”. This means you need to have access to the Hackystat account keys associated with all of the users whose data you wish to collect from this server. Furthermore, depending upon the server and the way you design the sensor, you might even need administrator-level access to the system running your server tool.
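To make the mapping requirement concrete, the following Java sketch shows the kind of lookup a server-side sensor performs for each piece of data it collects. It does not parse or reproduce the actual usermaps.xml format (see Section 2.7 for that); the UserMap class, the account names, and the key strings are all hypothetical and serve only to illustrate the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory view of a server-account to Hackystat-key mapping. */
final class UserMap {
    private final Map<String, String> accountToKey = new HashMap<>();

    /** Registers one mapping, as an administrator would for each user. */
    void add(String serverAccount, String hackystatKey) {
        accountToKey.put(serverAccount, hackystatKey);
    }

    /**
     * Returns the Hackystat key for a server account, or empty if the
     * account has no mapping. Data for unmapped accounts cannot be
     * attributed to a Hackystat user and would have to be skipped.
     */
    Optional<String> keyFor(String serverAccount) {
        return Optional.ofNullable(accountToKey.get(serverAccount));
    }
}
```

The sketch also shows why administrator involvement is unavoidable: whoever populates the mapping necessarily sees every user's Hackystat key.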

Thus, in the case of client-server tools, the sensor designer must make some hard choices. If you can be very sure that your user community (now and in the future) will only be using one (or very few) of the possible clients, then you may want to go ahead and implement a client-side sensor. The benefits of this approach are that you do not need an administrator who manages a usermaps file and therefore has access to the users' Hackystat keys, and you may be able to collect interesting kinds of sensor data not accessible to a server-side solution. If, on the other hand, the tool clients are not extensible, or your users might be using several different clients and you do not want the overhead of maintaining multiple sensor implementations, then a server-side implementation might be better.

17.2.3. The SensorShell middleware

Hackystat sensors tend to face many similar requirements, including SOAP-based sensor data transmission to the public server, buffering of sensor data, periodic background data transmission, local caching of data when the developer is working offline, logging, and sensor data validation.

To simplify the implementation of sensors, facilities for these shared requirements have been refactored into a kind of "middleware" system called SensorShell. SensorShell provides a high-level interface to sensors that can greatly reduce the size, complexity, and time associated with new sensor development.

Although it is not strictly required that sensors use SensorShell, we have not yet found a situation in which it is advantageous to "roll your own" version of the facilities provided by SensorShell. Chapter 16, The SensorShell, provides details on the SensorShell requirements and API.

17.2.4. Java-based vs. non-Java based sensor implementations

While the Hackystat server can accept data from any software system observing the Hackystat SOAP protocol, in practice you will typically use the SensorShell rather than rolling your own communication protocol. If your sensor can be implemented in Java, then it is extremely straightforward to use the SensorShell through its Java API. If your sensor is implemented in another language, then you must determine a way to communicate with the SensorShell. This normally consists of running the SensorShell as a subprocess and sending sensor data to it through string-based process I/O.
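The subprocess approach can be sketched as follows. The example is written in Java only for consistency with the rest of this guide; a non-Java sensor would use the equivalent process and stream facilities of its own language. The LineCommandWriter class, the tab separator, and the one-command-per-line framing are assumptions made for illustration. The actual command syntax accepted by the SensorShell is documented in Chapter 16 and is not reproduced here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that sends one string command per line to a subprocess. */
final class LineCommandWriter {
    /** Placeholder field separator; the real SensorShell syntax may differ. */
    static final char SEPARATOR = '\t';

    private final Writer out;

    /** @param processStdin the stream connected to the subprocess's standard input */
    LineCommandWriter(OutputStream processStdin) {
        this.out = new BufferedWriter(
                new OutputStreamWriter(processStdin, StandardCharsets.UTF_8));
    }

    /** Writes the command and its arguments as a single line, then flushes. */
    void send(String command, String... args) {
        StringBuilder line = new StringBuilder(command);
        for (String arg : args) {
            line.append(SEPARATOR).append(arg);
        }
        try {
            out.write(line.toString());
            out.write('\n');
            out.flush(); // flush so the subprocess sees the command immediately
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real sensor, the stream passed to the constructor would come from the subprocess itself, for example the result of getOutputStream() on a Process started with ProcessBuilder. The command used to launch the SensorShell depends on the local installation and is deliberately left out of the sketch.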

Later versions of this chapter will illustrate both a Java-based sensor implementation and the implementation of a sensor in Lisp using subprocess communication. We have also implemented a C# wrapper around the SensorShell to facilitate implementation of sensors for .NET.