Athena's blog

The pain of XLSX

Athena Lilith Martin

Today at work I had the misfortune of attempting to work with the Microsoft XLSX format. I hope never to repeat this.

XLSX is a terrible format. Microsoft provides a .NET library for dealing with the Office ZIPped-XML formats, which I used. However, this library does not actually provide any help with tasks such as "getting the value of a cell in a spreadsheet"; you essentially have to process a slightly-streamlined version of the XML tree yourself.

And the XLSX XML schema is terrible.

Instead of putting text in the spreadsheet XML itself, Excel puts it all in a separate file called the "shared strings table", which you have to indirect through to get any actual text. But the format also allows text to be inline, so you need a second case for that, even though I haven't seen Excel do this so far. Meanwhile, numbers are always inline, and booleans and dates are their own different special situations. And Microsoft's library doesn't provide code to straighten this out; you just have to write it yourself.
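The case analysis described above can be sketched roughly as follows. This is Python working directly on the raw XML with the standard library, not the .NET library from this post, and the function names `load_shared_strings` and `cell_value` are made up for illustration. It assumes the usual SpreadsheetML layout: a cell's `t` attribute says which kind of value it holds, and `<v>` holds either the value or an index into the shared strings table. Dates are not handled; as far as I know they normally arrive as plain numbers whose style marks them as dates.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace, used by both the sheet and the strings table.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def _text(container):
    """Join the text of a string item: either one <t>, or several <r><t> runs."""
    runs = container.findall("m:t", NS) + container.findall("m:r/m:t", NS)
    return "".join(t.text or "" for t in runs)


def load_shared_strings(xml_text):
    """Return the shared strings table as a list, in index order."""
    root = ET.fromstring(xml_text)
    return [_text(si) for si in root.iterfind("m:si", NS)]


def cell_value(cell, shared):
    """Resolve one <c> element to a Python value (hypothetical helper)."""
    kind = cell.get("t", "n")  # a missing "t" attribute means "number"

    # Case 1: text stored inline in the cell, under <is>.
    if kind == "inlineStr":
        inline = cell.find("m:is", NS)
        return _text(inline) if inline is not None else ""

    raw = cell.findtext("m:v", default=None, namespaces=NS)
    if raw is None:
        return None  # empty cell, e.g. a placeholder

    # Case 2: text stored in the shared strings table; <v> is the index.
    if kind == "s":
        return shared[int(raw)]
    # Case 3: booleans are stored as 0 or 1.
    if kind == "b":
        return raw == "1"
    # Case 4: numbers (and, in practice, most dates) are stored inline.
    if kind == "n":
        return float(raw)
    # Anything else (formula strings, errors) is returned as raw text.
    return raw
```

In a file written by Excel, the two XML parts usually live at `xl/sharedStrings.xml` and `xl/worksheets/sheet1.xml` inside the ZIP, so `zipfile.ZipFile(path).read(...)` is enough to get at them, though strictly speaking the package's relationship files decide the actual paths.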

Even worse than that, if some misguided person has made the terrible misjudgement of merging cells, there is no markup on the cells themselves, as far as I can tell, that indicates which cell a given cell has been merged with. The data goes in the top-left cell, and all the others get empty elements as placeholders. You just have to remember what was at the top left for yourself. And, of course, Microsoft's library doesn't give you any help.
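One place worth checking is the worksheet's `mergeCells` element: the SpreadsheetML schema defines it as a list of merged ranges such as `A1:B2`, kept separately from the cells. Assuming the file has one, the "remember the top left" bookkeeping can be sketched like this, again in Python on the raw XML rather than through the .NET library. The name `merged_cell_anchors` and its helpers are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def col_to_index(letters):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def index_to_col(n):
    """Convert a 1-based column index back to letters: 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def split_ref(ref):
    """Split a cell reference like 'B12' into (column index, row number)."""
    m = re.fullmatch(r"([A-Z]+)([0-9]+)", ref)
    if m is None:
        raise ValueError(f"unexpected cell reference: {ref!r}")
    return col_to_index(m.group(1)), int(m.group(2))


def merged_cell_anchors(sheet_xml):
    """Map every cell in a merged range to that range's top-left cell."""
    root = ET.fromstring(sheet_xml)
    anchors = {}
    for mc in root.iterfind("m:mergeCells/m:mergeCell", NS):
        # A range reads "A1:B2"; the first half is the top-left cell.
        first, _, last = mc.get("ref").partition(":")
        c1, r1 = split_ref(first)
        c2, r2 = split_ref(last or first)
        for row in range(r1, r2 + 1):
            for col in range(c1, c2 + 1):
                anchors[f"{index_to_col(col)}{row}"] = first
    return anchors
```

With that mapping in hand, reading a placeholder cell becomes a lookup: if its reference is in `anchors`, take the value from the anchor cell instead.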
