Usability - Productivity - Business - The web - Singapore & Twins

Excel in Continuous Integration

Business Users like Excel. Besides its original use case of calculating, lists are a favourite use case. They also serve as poor man's requirement and bug tracker, so they siep into software development too.

While Excel sheets are great for interaction, they are a beast for anything automation. The irony of it: Modern Excel files (the xslx flavour) are nothing less than zip files with XML content. However that format is, let's say, [insert expletives here].

From an XML representation I would expect something like <cell row="23" col="44">Some value</cell>. However that's not what Excel does. Rename an xlsx to zip and see for yourself. Also (which makes sense for Excel itself) empty cells are not represented in XML.

Cutting a long story short, an Excel file or its XML representation poses some challenges:

  • XML format is not very suitable for automation, like generating reports using XSLT
  • Excel automation only runs on Windows (and when you run headless, you trade the head for a headache). That makes it a no-go for most automation server environments
  • Empty cells are absent from the XML (a variation of "not suitable")
  • Cross reports with other files (e.g. logs in XML format) is hard

To overcome these limitations I wrote Excel2XML. It is a little Java command line utility that converts Excel into a more digestible XML format. I used Microsoft's contribution to the Apache POI project to read the file. It has the following functions:

  • Extract workbooks in one or separate files per worksheet
  • Ignore all formatting
  • Computed cells return their last result values, unless it is a formula error, then the formula is returned
  • The first line of each sheet is treated as column headers, which are extracted as columns/column elements
  • Each cell has a column, a row and a title attribute. The title reflects the value from the first row. This allows in XSLT to query the title instead of relying on the column number. Reordering, adding or removing columns won't kill your XSLT stylesheet that way
  • Optional empty cells can be generated with an attribute of empty="true"
  • Runs on Java8 completely from command line
  • Calling it without parameters outputs the exact syntax of options

The full syntax: java -jar excel2xml.jar -i somefile.xslt [-o somefile.xml [ ]-e] [-s] [-w3,4]

  • -i the input file in xslx format
  • -o the output file. If missing same name as input, but extension xml
  • -e generate empty cells. If missing: cells without data are skipped
  • -s generate a single file for the whole workbook. If missing: creates one file per sheet
  • -w comma separated list of sheets to export. Starts at 0. If missing: exports all sheets

Head over to the git repository and grab a release. Let me know what use you found. As usual YMMV

Posted by on 02 October 2017 | Comments (1) | categories: Continuous integration XML


  1. posted by sean cull on Tuesday 03 October 2017 AD:

    Wow - less than 500 lines of code - impressive