When reviewing a worksheet the ACBA Mapping software uses VBA’s ‘UsedRange’ function to identify the full scope of the sheet. This function is excellent for identifying separate ranges employed in a worksheet, but has some drawbacks. One of these drawbacks has been identified while reviewing the Enron corpus of files (https://figshare.com/articles/dataset/Enron_Spreadsheets_and_Emails/1221767)
The Enron corpus of spreadsheets and other correspondence is
huge. Felienne Hermans however has whittled the number of workbooks down to 16,189
unique items. https://www.felienne.com/archives/3634
The workbook that clearly displayed the
UsedRange issue is named ‘benjamin_rogers__938__historical outages.xlsx’.
Clearly the files have been renamed to prevent duplicates. In this blog though
I will refer to it as ‘Historical Outages’.
In the original Mapping software, users were advised when
the ‘UsedRange’ exceeded half a million cells. This allows the user to escape
from the process and turn off the computer’s sleep mode. Sleep mode
automatically halts behind the scenes processing, effectively preventing the
code from completing its task.
The first worksheet ‘Sheet1’ in the Historical Outages
workbook contained over 10 million cells in its ‘UsedRange’. I turned off sleep
mode accordingly. It took over 5 days to complete the processing.
This was a large worksheet, but structurally fairly simple. The list of generic formulae shows only 3 types.
Generic formulae list |
Even type F2 has a count less than 200K. What was happening
in the remainder of the 9.75 million worksheet cells and why did it take so
long?
Inspecting the map of Sheet1, shows that the active part of
the worksheet stops at row 972.
End of the active portion of worksheet |
From row 972 to row 49,813 of the map sheet appear to be
empty cells. But I am not able to assert that with any certainty unless I can
prove by other means that there are no other active cells.
End of 'UsedRange' |
In order to resolved the conundrum of whether there were
active cells in the almost 49,000 rows of apparently empty space, I used some
of Excel’s other review functions. These are summarised in the ‘Data List and
Mapping Index’ of the map book.
I note the total number of formulae for the workbook is the
sum of the formulae contained in Sheet1 and Sheet2. This is as expected, but
the list of 7 names in the workbook may prove interesting especially if they
are associated with Sheet1.
Only one name refers to Sheet1 |
In this case on one name refers to Sheet1. FilterDatabase is
a standard name created when you filter a list/database. The filter occurs
entirely within the active formula area.
One of the standard approaches for avoiding these
difficulties with UsedRange, is to employ the functions .End(xlDown) and
.End(xlToRight) functions. These work well for single database type tables, but
are likely to cause errors when employing multiple tables in single work sheet.
My review of the Enron files so far, suggests that employing multiple tables is
a very common practice. Indeed one could argue that the possibility multiple
tables within a single worksheet is one of its primary advantages over a
database.
Within ACBA Mapping, I have adopted a different approach.
Within the code for the worksheet review the mapping process monitors how many consecutive
rows are completely empty of any functional cells. If the count of consecutive
rows reaches 100, if offers the user the opportunity to skip the rest of the
mapping process and proceed to report on mapping reviewed so far.
Skip the remainder of the mapping process. |
It turns out the estimate of the time for completion is
grossly inaccurate, but the first mapping of the nearly 50K spreadsheet rows
took a little over 5 days. Should readers wish to view a copy of the original
file and its associated map file, they can be downloaded as a compressed zip
file – HistOutage.zip.
Downloading ACBA Mapping
The ACBA-Mapping software
is distributed as an Excel Add-In and can be downloaded from the link.
For those not familiar with handling Excel Add-Ins,
instructions on what to expect and where to load your Add-Ins can be found
at Generating
the "ACBA-Mapping" Add-In.
Access to Help
There is a discussion group dedicated to providing a forum
for help and information - Excel- Cell Mapping. I recommend you join the group.
Alternatively, you can email me directly from the Stephen
Allen link within the form.
Acknowledgements
Experts in Excel have offered their advice and I am very
grateful for this. My thanks go to Patrick O'Beirne (SystemsModelling) and Jan Karel Pieterse (JKP
Application Development Services).
Spreadsheet Risks
Spreadsheet errors and inconsistencies is a subject that is
of concern by professional developers as well as amateurs. The European
Spreadsheet Risk Interest Group (EuSpRIG) considers the problems from an
academic and professional perspective. The following links provide access to
the EuSpRIG website and discussion forum.
Comments
Post a Comment