Globalization and information conditions

I am very interested in how globalization is shaping the world today (outsourcing rocks - more on that later).  Over lunch I read a great commentary (even though it’s from a graduation speech) by Stephen T. Gray called "Where the media end and you begin" in the Christian Science Monitor [I think at some point soon the full text of this article will no longer be free].  It ties the pace of change in our world to the availability of information (media of all kinds).  Some nuggets that distill the gist of the article:

"People will go right on doing what they’re doing - until they get new information."

"New information drives choices, and choices drive change."

"With so much more information - so many more choices - change will come faster.  Add the rest of the world and it’s even more dramatic.  Half of humanity lives in information conditions like those of [America] 100 years ago.  Roughly a quarter lives in information conditions like those of [America] in the ’50s or ’60s.  All of these people are trying desperately to catch up, realizing that better information is the path to better lives.  As they do, change will accelerate in their world - and ours."

I’d add to the last bit that the three quarters of the world mentioned above don’t need to desperately seek out better information conditions - those conditions will find them soon enough.

"Billions of people can’t get enough information to develop their native abilities.  In this century, those reserves of ability will be tapped as never before. … This will spur the fastest advance in human freedoms and quality of life in history."

Awesome.

Data-mining CVS

As part of my duties as chair of the W3C Web Services Description Working Group I maintain a list of the Last Call issues against our spec.  In an attempt to bring some software development methodology to the task of resolving those issues, I keep a graph of our status: how many issues are open, which are editorial, which have been closed, and which resolutions have been communicated back to the commenter.

I keep this data in Microsoft Excel, which makes it simple to create the graph, but keeping data in the spreadsheet in sync with the data in the issues list sometimes proves challenging.  Yesterday I added a new column to my data (which resolutions have been communicated back to the commenter) to help me judge whether this task will converge on the appropriate schedule.  But I didn’t have historical data to work with, as I didn’t see a need to track this data in the spreadsheet until, well, yesterday.

I found the simplest way to extract time-based data was to use the CVS archive, which is specifically designed to track changes to a document (and the data within it) over time.  CVS gives me a good way to look back at each revision of the issues list, extract whatever data I need, and associate a date with that data.  All I needed to do was walk the CVS history to get a huge pile of dated data.

As an XSLT junkie, I of course found a way to do this within a single transformation.  I modified my stylesheet to iterate through a list of specified versions of the issues list XML, extract each version individually from the CVS archive, calculate the totals of each of the classes of issues, and display them in an HTML table that can be copied directly into Excel for graphing.

The data I run the stylesheet over has a list of CVS versions I want to analyze:

<z:data xmlns:z="http://tempuri.org/microsoft.com/jmarsh">
  <z:revision>1.69</z:revision>
  <z:revision>1.68</z:revision>
</z:data>

It takes a while to run (esp. without broadband), so I didn’t do the whole history in one go, though I don’t see why that wouldn’t work too.  The XSLT that performs the extraction, run over each <z:revision>, looks like this:

<xsl:for-each select="//z:data/z:revision">
  <xsl:variable name="issues" select="document(concat
                           ('http://jigedit.w3.org/…/CVS/issues.xml/', .))"/>
</xsl:for-each>

After appropriate totaling, I get a table with all the data I need to generate the graph.

Revision  Date        Total  Active  Active (editorial)  Closed  Responded To
1.69      2005/04/26  214    25      15                  115     59
1.68      2005/04/18  212    53      15                  85      59
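
The totaling step behind a table like the one above might be sketched as follows.  This is only an illustration, not the actual stylesheet: the i:issue element and its status and editorial attributes are hypothetical names for whatever the issues list really uses, the base parameter is introduced here for convenience and keeps the elided URL from the post as-is, and the Date column is left out because the post doesn’t say where the date comes from.  Since the four classes in the table add up to the total, the sketch treats them as mutually exclusive.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0" xmlns:z="http://tempuri.org/microsoft.com/jmarsh"
        xmlns:i="http://www.w3.org/2003/10/issues"
        exclude-result-prefixes="z i">

  <!-- Base URL copied from the post, elision included; supply the real path before running. -->
  <xsl:param name="base" select="'http://jigedit.w3.org/…/CVS/issues.xml/'"/>

  <!-- Emit one HTML table with a row per requested revision. -->
  <xsl:template match="/">
    <table>
      <tr>
        <th>Revision</th><th>Total</th><th>Active</th>
        <th>Active (editorial)</th><th>Closed</th><th>Responded To</th>
      </tr>
      <xsl:apply-templates select="//z:data/z:revision"/>
    </table>
  </xsl:template>

  <xsl:template match="z:revision">
    <!-- Fetch this revision of the issues list from the CVS archive. -->
    <xsl:variable name="issues" select="document(concat($base, .))"/>
    <!-- Hypothetical schema: i:issue elements with status and editorial attributes. -->
    <xsl:variable name="all" select="$issues//i:issue"/>
    <tr>
      <td><xsl:value-of select="."/></td>
      <td><xsl:value-of select="count($all)"/></td>
      <td><xsl:value-of select="count($all[@status='active'][not(@editorial='true')])"/></td>
      <td><xsl:value-of select="count($all[@status='active'][@editorial='true'])"/></td>
      <td><xsl:value-of select="count($all[@status='closed'])"/></td>
      <td><xsl:value-of select="count($all[@status='responded'])"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

Each count() is just a filter over the same node-set, so tracking a new class of issue later means adding one more cell with a different predicate.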

Not even wanting to run some sort of a build script to accomplish this (my Apple Computer days give me a pathological dislike of command lines), I just use the XML/XSLT browsing features in Microsoft Internet Explorer to kick off the transformation.  I even did a self-styling trick that allows me to view the data by dropping the stylesheet itself into a browser window so I don’t have to keep two files together.

<?xml-stylesheet type="text/xsl" href="issues-totals.xsl"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0" xmlns:z="http://tempuri.org/microsoft.com/jmarsh"
        xmlns:i="http://www.w3.org/2003/10/issues">
  <z:data>
    <z:revision>1.69</z:revision>
    <z:revision>1.68</z:revision>
  </z:data>
  <xsl:template match="/">
    <xsl:apply-templates select="*/z:data"/>
    …

The bigger picture: It is becoming clear to me that one workable strategy to avoid information overload is simply to capture as much data as possible; once you know what you want to look for, you can run queries on the data.  Hindsight is a wonderful thing!  If you end up not needing the data, you only waste a few pennies of hard disk space.  In this case, it’s not even my hard drive.

I’m sure I’ll expand on this theme over time as it can be applied to a lot of interesting product and user interface ideas.