Friday, October 18, 2024

Parse Excel XML using power query in excel

I wanted to load an Excel XML file (a.k.a. Microsoft Office Excel 2002 and Excel 2003 XML Format — SpreadsheetML) in another Excel file using Power Query M formula language. It turns out that it's trickier than I expected so I decided to blog my solution.

Many SAP related tools export this Microsoft 2003 XML format. You can save an excel file in that format by selecting the "XML Spreadsheet 2003 (*.xml)" option in the save as dialog:


The file looks like the following:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>Andreas Botsikas</Author>
<LastAuthor>Andreas Botsikas</LastAuthor>
<Created>2024-10-17T20:45:19Z</Created>
<Version>16.00</Version>
</DocumentProperties>
<OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
<AllowPNG/>
</OfficeDocumentSettings>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7524</WindowHeight>
<WindowWidth>23040</WindowWidth>
<WindowTopX>32767</WindowTopX>
<WindowTopY>32767</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font ss:FontName="Aptos Narrow" x:CharSet="161" x:Family="Swiss" ss:Size="11"
ss:Color="#000000"/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="3" x:FullColumns="1"
x:FullRows="1" ss:DefaultRowHeight="14.4">
<Column ss:Width="55.2"/>
<Column ss:Width="54"/>
<Row>
<Cell><Data ss:Type="String">First Name</Data></Cell>
<Cell><Data ss:Type="String">Last Name</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">A</Data></Cell>
<Cell><Data ss:Type="String">B</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">C</Data></Cell>
<Cell><Data ss:Type="String">D</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.3"/>
<Footer x:Margin="0.3"/>
<PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
</PageSetup>
<Selected/>
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>3</ActiveRow>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>

Notice the "<?mso-application progid="Excel.Sheet"?>" line which gives out the type of this xml file.

Once you have this file, let's assume you place it in C:\tmp\ExcelXmlExport.xml and you need to load it in an excel file. Open up Excel and navigate to the Data tab and then select Get Data -> From Other Sources -> Blank Query.


From the ribbon select the advance editor:


In the editor paste the following M script:

let
// Load the XML file contents
Source = Xml.Tables(File.Contents("C:\tmp\ExcelXmlExport.xml")),
// Filter rows to find one named "Worksheet"
WorksheetRows = Table.SelectRows(Source, each Text.Contains([Name], "Worksheet")),
// Assuming you find one, select the Table
WorksheetTable = WorksheetRows{0}[Table],
// Filter rows within the "Worksheet" table to find the row named "Table"
TableRows = Table.SelectRows(WorksheetTable, each Text.Contains([Name], "Table")),
// Assuming you find one, select the Table
TableTable = TableRows{0}[Table],
// Filter rows within the "Table" table to find the row named "Row"
RowRows = Table.SelectRows(TableTable, each Text.Contains([Name], "Row")),
// Assuming you find one, select the Table
RowTable = RowRows{0}[Table],
// Transform the rows by adding a new column "RowData"
// The transformation takes each "Cell" and extracts the "Element:Text" element
// If the extraction fails (e.g., due to missing data), it defaults to an empty string
TransformRows = Table.AddColumn(RowTable, "RowData", each List.Transform(
[Cell][Data],
(cellData) => try cellData[#"Element:Text"]{0} otherwise ""
)),
// Convert the list of transformed rows into a table
ExpandedRows = Table.FromRows(TransformRows[RowData]),
// Promote the first row of the table to headers
TableWithHeaders = Table.PromoteHeaders(ExpandedRows, [PromoteAllScalars=true])
in
// Return the final table with headers
TableWithHeaders


This will load the Excel XML file and you should have all data loaded as strings:


From there on, you can Close & Load and start building your dashboard using the data you just loaded from an external file.

No comments: