As part of my duties as chair of the W3C Web Services Description Working Group I maintain a list of the Last Call issues against our spec. In an attempt to bring some software development methodology to the task of resolving those issues, I keep a graph of our status: how many issues are open, which are editorial, which have been closed, and which resolutions have been communicated back to the commenter.
![Graph of issue status over time](http://jonathanmarsh.net/wp-content/uploads/2009/04/y1pdydsobt9m0ns2ex6g8x5gqftlt39hlxw-n3y3sgubcgdwfiq-dfe6r0wmadlivntwsdkknb8hu1-thumb1.jpg)
I keep this data in Microsoft Excel, which makes it simple to create the graph, but keeping data in the spreadsheet in sync with the data in the issues list sometimes proves challenging. Yesterday I added a new column to my data (which resolutions have been communicated back to the commenter) to help me judge whether this task will converge on the appropriate schedule. But I didn’t have historical data to work with, as I didn’t see a need to track this data in the spreadsheet until, well, yesterday.
I found the simplest way to extract time-based data was to use the CVS archive, which is specifically designed to track changes to a document (and the data within it) over time. CVS gives me a good way to look back at each revision of the issues list, extract whatever data I need, and associate a date with that data. All I needed to do was walk the CVS history to get a huge pile of dated data.
As an XSLT junkie, I of course found a way to do this within a single transformation. I modified my stylesheet to iterate through a list of specified versions of the issues list XML, extract each version individually from the CVS archive, calculate the totals of each of the classes of issues, and display them in an HTML table that can be copied directly into Excel for graphing.
The data I run the stylesheet over has a list of CVS versions I want to analyze:
<z:data xmlns:z="http://tempuri.org/microsoft.com/jmarsh">
<z:revision>1.69</z:revision>
<z:revision>1.68</z:revision>
</z:data>
The transformation takes a while to run (esp. without broadband), so I didn’t do the whole history in one go, though I don’t see why that wouldn’t work too. The XSLT that performs the extraction, run over each <z:revision>, looks like this:
<xsl:for-each select="//z:data/z:revision">
<xsl:variable name="issues" select="document(concat
('http://jigedit.w3.org/…/CVS/issues.xml/', .))"/>
</xsl:for-each>
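For anyone who would rather script this step than run it in a browser, the same loop can be sketched in Python with only the standard library. This is an illustration under assumptions, not the stylesheet itself: `BASE_URL` is a made-up placeholder (the real archive path is elided above), and `read_revisions`, `fetch`, and `load_snapshots` are hypothetical names. Only the `z:` namespace and the revision numbers come from the data shown earlier.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace taken from the z:data document shown earlier.
Z_NS = {"z": "http://tempuri.org/microsoft.com/jmarsh"}

# Hypothetical placeholder: the real archive path is elided in this post.
BASE_URL = "http://example.org/cvs/issues.xml/"

DATA = """<z:data xmlns:z="http://tempuri.org/microsoft.com/jmarsh">
<z:revision>1.69</z:revision>
<z:revision>1.68</z:revision>
</z:data>"""


def read_revisions(xml_text):
    """Return the text of each z:revision child of the root, in document order."""
    root = ET.fromstring(xml_text)
    return [rev.text.strip() for rev in root.findall("z:revision", Z_NS)]


def fetch(url):
    """Download one snapshot and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def load_snapshots(revisions, base_url=BASE_URL, load=fetch):
    """Map each revision to its snapshot, in the spirit of document(concat(base, .))."""
    return {rev: load(base_url + rev) for rev in revisions}


# Stand-in loader so this demonstration never touches the network.
demo = load_snapshots(read_revisions(DATA), load=lambda url: "snapshot from " + url)
for revision, snapshot in demo.items():
    print(revision, "->", snapshot)
```

Passing the loader in as a parameter keeps the URL-building logic checkable without a connection; dropping the `load=` argument would fall back to a real HTTP fetch.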
After appropriate totaling, I get a table with all the data I need to generate the graph.
| Revision | Date | Total | Active | Active (editorial) | Closed | Responded To |
|---|---|---|---|---|---|---|
| 1.69 | 2005/04/26 | 214 | 25 | 15 | 115 | 59 |
| 1.68 | 2005/04/18 | 212 | 53 | 15 | 85 | 59 |
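In each row above, the four status columns add up to the total (25 + 15 + 115 + 59 = 214), so the classes appear to be disjoint. Here is a rough Python sketch of that kind of totaling. The element name `issue` and the `status`, `editorial`, and `responded` attributes are invented for illustration and are not the real issues-list schema; only the `i:` namespace URI comes from the stylesheet. The snapshot, revision number, and date in the demonstration are made up as well.

```python
import xml.etree.ElementTree as ET

# Namespace from the stylesheet's xmlns:i declaration.
I_NS = {"i": "http://www.w3.org/2003/10/issues"}

# Hypothetical snapshot: element and attribute names are assumptions,
# not the real issues-list schema.
SNAPSHOT = """<i:issues xmlns:i="http://www.w3.org/2003/10/issues">
  <i:issue status="active" editorial="false" responded="false"/>
  <i:issue status="active" editorial="true" responded="false"/>
  <i:issue status="closed" editorial="false" responded="true"/>
  <i:issue status="closed" editorial="false" responded="false"/>
</i:issues>"""


def classify(issue):
    """Assign one issue to exactly one of the four classes in the table."""
    if issue.get("status") == "active":
        return "active_editorial" if issue.get("editorial") == "true" else "active"
    return "responded" if issue.get("responded") == "true" else "closed"


def totals(xml_text):
    """Count the issues in each class, plus the overall total."""
    counts = {"active": 0, "active_editorial": 0, "closed": 0, "responded": 0}
    for issue in ET.fromstring(xml_text).findall("i:issue", I_NS):
        counts[classify(issue)] += 1
    counts["total"] = sum(counts.values())
    return counts


def row(revision, date, xml_text):
    """Format one tab-separated line that can be pasted into a spreadsheet."""
    c = totals(xml_text)
    cells = [revision, date, c["total"], c["active"],
             c["active_editorial"], c["closed"], c["responded"]]
    return "\t".join(str(cell) for cell in cells)


# Made-up revision and date, purely for demonstration.
print(row("0.1", "2005/01/01", SNAPSHOT))
```

Tab-separated output pastes into Excel much as the HTML table does, one issue class per column.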
Not even wanting to run some sort of a build script to accomplish this (my Apple Computer days give me a pathological dislike of command lines), I just use the XML/XSLT browsing features in Microsoft Internet Explorer to kick off the transformation. I even did a self-styling trick that allows me to view the data by dropping the stylesheet itself into a browser window so I don’t have to keep two files together.
<?xml-stylesheet type="text/xsl" href="issues-totals.xsl"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0" xmlns:z="http://tempuri.org/microsoft.com/jmarsh"
xmlns:i="http://www.w3.org/2003/10/issues">
<z:data>
<z:revision>1.69</z:revision>
<z:revision>1.68</z:revision>
</z:data>
<xsl:template match="/">
<xsl:apply-templates select="*/z:data"/>
…
The bigger picture: it is becoming clear to me that one workable strategy for avoiding information overload is simply to capture as much data as possible; once you know what you want to look for, you can run queries on it. Hindsight is a wonderful thing! If you end up not needing the data, you only waste a few pennies of hard disk space. In this case, it’s not even my hard drive.
I’m sure I’ll expand on this theme over time as it can be applied to a lot of interesting product and user interface ideas.
