Reporting Stats

A few years ago, when I was the integration architect in charge of my Global 500 customer's EAI projects, the product that we used did not really offer any way to track the number of messages travelling through the system.

To get around this, we added a little library that recorded a small amount of information about each message. The idea was that every application (message flow) within the EAI system would be instrumented with this code, and we would gain an overview of which interfaces were the important ones and which ones were delivering a return on their investment.

The sort of data we collected was:

  • Interface / Application / Process Name
  • Message Type
  • Message Size
  • Timestamp
  • Status (e.g. IN, OUT, SUCCESS, FAIL)
  • Correlation Token (in case several messages belonged together)
  • Error (i.e. details of an error)
  • Custom Data (specific to each interface / app / process)

There was probably more… it was a few years ago now 🙂
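As a rough illustration of the fields listed above, here is a minimal sketch of what one such record could look like. It is not the library we actually used. The class and field names (`StatRecord`, `Status` and so on), the use of bytes for the size, and the example values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Status(Enum):
    """The example status values from the list above."""
    IN = "IN"
    OUT = "OUT"
    SUCCESS = "SUCCESS"
    FAIL = "FAIL"


@dataclass
class StatRecord:
    """One stats entry per message (hypothetical names throughout)."""
    interface_name: str                      # interface / application / process name
    message_type: str
    message_size: int                        # assumed to be in bytes
    status: Status
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    correlation_token: Optional[str] = None  # ties related messages together
    error: Optional[str] = None              # details of an error, if any
    custom_data: dict = field(default_factory=dict)  # interface-specific extras


# Hypothetical usage: an inbound message on a made-up interface.
record = StatRecord("OrderSync", "PurchaseOrder", 2048, Status.IN,
                    correlation_token="abc-123")
```

Keeping the optional fields at the end means a flow only has to supply the four mandatory values to report something useful.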

Anyway, we soon discovered that there were applications that were not even being used, even though our customers had paid many tens of thousands of dollars for them (ok, I admit that isn’t a lot of money in the IT world).

Since then, Google has created its “Analytics” offering, which generates some excellent reports for web site statistics. Indeed, maxant has always collected web stats for the websites it runs. It is very important to know your users' / customers' usage patterns, in order to understand their real requirements and to improve the software over time.

Traditionally, stats might be gathered using data warehousing techniques, pulling the data together from several existing data sources. In my experience, it is very simple to add a little library that reports stats data to a table or two. This has the advantage over traditional data warehousing that you log exactly what you want, and not what happens to be available elsewhere. If you need the unique key of a row, log that, as opposed to the friendly name that might be logged elsewhere. You can use join statements when generating your reports to make them user friendly. Alternatively, you can log both the foreign keys and the friendly names, which makes the database human readable and provides accuracy in the reports that need it. In one project we logged over 60 columns, allowing us to create around 30 reports (letting users drill down in detail where they needed it). Of course, if you log that much information, you need to estimate your data growth rate (about 20 MB a day in our case) and define a suitable strategy for deleting or archiving old data before you run out of disk space or, worse, start paying the data center for additional space that you don’t really need.
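To make that a little more concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table names, column names, the sample interfaces and the 90 day retention period are all assumptions for illustration; a real project would use its own schema and database. The stats table logs the unique key, the report joins to get the friendly name, and a purge function stands in for the archiving strategy.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interface (
        id INTEGER PRIMARY KEY,
        friendly_name TEXT NOT NULL
    );
    CREATE TABLE stats (
        id INTEGER PRIMARY KEY,
        interface_id INTEGER NOT NULL REFERENCES interface(id),
        status TEXT NOT NULL,
        message_size INTEGER NOT NULL,
        logged_at REAL NOT NULL  -- seconds since the Unix epoch (UTC)
    );
""")


def log_stat(interface_id, status, message_size, logged_at=None):
    """Log the unique key, not the friendly name."""
    logged_at = logged_at or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO stats (interface_id, status, message_size, logged_at) "
        "VALUES (?, ?, ?, ?)",
        (interface_id, status, message_size, logged_at.timestamp()))


def messages_per_interface():
    """Join to the friendly name; unused interfaces show up with a count of 0."""
    return conn.execute("""
        SELECT i.friendly_name, COUNT(s.id)
        FROM interface i
        LEFT JOIN stats s ON s.interface_id = i.id
        GROUP BY i.id, i.friendly_name
        ORDER BY i.friendly_name
    """).fetchall()


def purge_older_than(days):
    """Delete stats older than the retention period; returns rows removed."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    cur = conn.execute("DELETE FROM stats WHERE logged_at < ?",
                       (cutoff.timestamp(),))
    return cur.rowcount


# Made-up sample data: two interfaces, only one of which is ever used.
conn.executemany("INSERT INTO interface (id, friendly_name) VALUES (?, ?)",
                 [(1, "Order Sync"), (2, "Invoice Export")])
log_stat(1, "SUCCESS", 2048)
log_stat(1, "FAIL", 512,
         logged_at=datetime.now(timezone.utc) - timedelta(days=100))
```

Because the report uses a left join from the interface table, an interface that nobody uses still appears, with a count of zero. As for sizing, at roughly 20 MB a day a 90 day retention window works out to about 1.8 GB of stats data.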

Also useful is a switch that allows you to shut off your reporting in case something goes wrong. Reporting should never interfere with the day-to-day running of your systems. If it turns out there is a bug in your reporting library, you don’t want to have to wait for a hotfix to be deployed to your production environment before you can continue processing data. Imagine your application is part of a sales system. The customer doesn’t want to miss out on buying products just because the reporting might be out of whack for this month. The company CERTAINLY doesn’t want to lose sales in favour of having a report that tells them they sold nothing for a week while waiting for the hotfix.
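One way such a switch could look is sketched below. This is only an illustration: the environment variable name `STATS_REPORTING_ENABLED` and the function names are made up for this example, and in a long-running production system you would more likely read the flag from a config table or file that operations staff can change without a restart.

```python
import logging
import os

logger = logging.getLogger("stats")


def reporting_enabled():
    """Hypothetical switch: reporting is on unless the flag is set to 'false'."""
    flag = os.environ.get("STATS_REPORTING_ENABLED", "true")
    return flag.strip().lower() != "false"


def report_safely(write_stat, record):
    """Try to report; never let a reporting problem reach the caller.

    write_stat is whatever function actually stores the record.
    Returns True if the record was written, False otherwise.
    """
    if not reporting_enabled():
        return False
    try:
        write_stat(record)
        return True
    except Exception:
        # Log the problem and carry on: the business process matters more.
        logger.exception("Stats reporting failed; continuing without it")
        return False
```

The business code calls `report_safely(...)` and ignores the result, so a broken or disabled reporting library costs you a gap in the reports rather than a lost sale.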

Statistics reporting isn’t limited to web or integration applications either. I have used it in rich client applications to track how the application is used, as well as on the server side to track, for example, sales being made. Tracking data is also useful if it includes the inputs that the users provided in order to generate the outputs. This can help in bug fixing and problem solving, since knowing what the inputs were lets you recreate the issues.
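As a small sketch of the idea of logging inputs so that an issue can be recreated later, the example below stores each call's inputs and output as JSON and can replay them. The helper names and the `quote_price` function are invented for this illustration, and a plain list stands in for wherever the stats would really be stored.

```python
import json


def record_call(log, func, **inputs):
    """Run func and log its inputs and output as one JSON entry."""
    result = func(**inputs)
    log.append(json.dumps(
        {"function": func.__name__, "inputs": inputs, "output": result},
        sort_keys=True))
    return result


def replay(entry, func):
    """Re-run func with the logged inputs.

    Returns (output now, output that was logged at the time).
    """
    data = json.loads(entry)
    return func(**data["inputs"]), data["output"]


# Made-up business function, standing in for real application logic.
def quote_price(quantity, unit_price, discount_percent):
    return round(quantity * unit_price * (1 - discount_percent / 100), 2)


log = []
record_call(log, quote_price, quantity=3, unit_price=9.99, discount_percent=10)
actual, logged = replay(log[0], quote_price)
```

If `actual` and `logged` ever differ after a code change, you have a reproducible case built from a real user's inputs.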

So, start reporting today! Then you can have a fun little project to generate dynamic reports – your managers and clients will love the pretty colourful graphs you can create! And that will help to keep the budget healthy for future work in your department 🙂

It reminds me of when I worked in the aeronautical engineering world. There is a field called Computational Fluid Dynamics (CFD for short), which provides pretty pictures showing airflow over, say, a wing. It is often nicknamed Colours for Directors (also CFD), because directors love all the pretty colours and pictures, even though they might not fully appreciate the information lurking within the report.