If that weren't bad enough, other research indicates that 33% of all bad business decisions are traceable to spreadsheet errors. That's right, 33% of all bad business decisions. That's a business case looking for a solution. Yes?

88% of all spreadsheets have errors in them, while 50% of spreadsheets used by large companies have material defects, resulting in loss of time and money, damaged reputations, lost jobs and disrupted careers.
| Statistic | Value |
| --- | --- |
| Number of spreadsheets analyzed | 15,770 |
| Number of spreadsheets with formulas | 9,120 |
| Number of worksheets | 79,983 |
| Maximum number of worksheets | 175 |
| Number of non-empty cells | 97,636,511 |
| Average number of non-empty cells per spreadsheet | 6,191 |
| Number of formulas | 20,277,835 |
| Average number of formulas per spreadsheet with formulas | 2,223 |
| Number of unique formulas | 913,472 |
| Number of unique formulas per spreadsheet with formulas | 100 |
| Erroneous formulas with dependents | 29,324 |
| Formulas depending on erroneous cells | 9.6 (on average) |
| Maximum number of errors in one file | 83,273 |
| Calculation chains of length five or longer | 41,367 |
| Calculation chains of length greater than seven (seven being the capacity of short-term memory) | 9,471 |
| Emails with spreadsheets attached | 44,214 |
| Emails sending or talking about spreadsheets | 68,970 |
| Emails discussing errors | 4,140 |
| Emails describing modifications of spreadsheets | 14,084 |
| Emails as documentation | Observed, but not counted |
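As a quick sanity check on the table, the reported averages follow from the raw counts. This is a sketch of the arithmetic only; the variable names are mine, not the study's:

```python
# Raw counts from the corpus statistics above.
spreadsheets = 15_770
with_formulas = 9_120
non_empty_cells = 97_636_511
formulas = 20_277_835
unique_formulas = 913_472

# Derived averages, rounded to whole numbers as in the table.
print(round(non_empty_cells / spreadsheets))   # ~6,191 non-empty cells per spreadsheet
print(round(formulas / with_formulas))         # ~2,223 formulas per spreadsheet with formulas
print(round(unique_formulas / with_formulas))  # ~100 unique formulas per such spreadsheet
```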
There is no, repeat no, intersection between that list of spreadsheet risks and Underwood's list of needed product improvements for Excel:

- Too many unknown and unorganized spreadsheet files or tabs
- Poor naming conventions for files and tabs
- Excessive and undocumented data linkages between files and sheets
- Poorly designed and presented spreadsheets
- Spreadsheet evolution (spreadsheets start off simple and become increasingly too complicated)
- Spreadsheet ownership (ownership changes without the transfer of knowledge)
- Manual data input (including cell-by-cell typing and copying & pasting large datasets)
- Too much data in a spreadsheet (remember the 64,000-row limit in Excel??)
- No versioning or change tracking
- No double-checking processes
- Overly and needlessly complicated formulas
- Plain old human error
It is impossible to determine what cells in the set are erroneous, because we cannot possibly know what was the intention of the formula. This rules out finding semantic errors....
A taxonomy of the errors [6] shows that a significant portion of errors (87%, as calculated in [4]) are semantic. In this research, a semantic error is one that is committed when users have a wrong concept that may be correctly or incorrectly put into practice. These arise from misunderstanding the real-world, wrong translation of the real-world to the spreadsheet representation, or a misunderstanding of the spreadsheet’s internal logic [4]. As semantic errors are made on an individual document base, there is neither hope for a best-practice guide to train avoiding them nor for a general software update to help out. Semantic errors pose a more serious threat for wide-impact spreadsheets since more and more individual communication errors might aggregate over the span of distribution. It has been proposed [4], [7] that a key reason in committing semantic errors is a missing higher-level abstraction of the data. Tables, with their grid framework, expose details and allow manipulation of underlying data. Therefore, spreadsheets, as a computer-supported realization of tables, turn one’s attention to data on a micro-level, failing to provide the big picture. Generally, schematic diagrams or pictures abstract away and integrate the data, presenting it holistically....
Context isn't an unfamiliar idea in topic map circles.

Note that crossing a community boundary leaves the entire context - the circumstances and settings in which a document is created and obtains its specific meaning - behind. Researchers in the field of Human-Computer Interaction (HCI) have focused in recent years on the context-of-use of software systems: user experience issues often only arise in the concrete context in which a product is used. Our approach for tackling the readability issue of spreadsheets is motivated by this insight. Therefore we ask: what is the context of a spreadsheet document and which role does it play for comprehension of spreadsheets?

For an answer, consider the following distinct contexts: the context of the data itself; the information context (implicit knowledge) of the author or the reader; the event context of the author (the intention of the document as communication tool) or the reader (the expectation towards the usefulness of the document); the effect context (e.g. decision making based on the document).

Note that the clear distinction between authors and readers is only an analytical one. We are well aware that authors turn into readers after a short while even for their own documents and, vice versa, that the motivation of readers might consist in searching for copy-able parts to author their very own spreadsheets. Nevertheless, the context can be clearly distinguished where wide-impact, local boundary-leaving spreadsheets are concerned.
5.3.3 Scope

All statements have a scope. The scope represents the context within which a statement is valid. Outside the context represented by the scope the statement is not known to be valid. Formally, a scope is composed of a set of topics that together define the context. That is, the statement is known to be valid only in contexts where all the subjects in the scope apply.
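The "all the subjects in the scope apply" rule amounts to a subset test. This is a hedged illustration of that rule only, not the TMDM API; the subject names are hypothetical:

```python
def valid_in_context(scope: set, context: set) -> bool:
    """A scoped statement is known to be valid only in contexts where
    every subject in its scope applies; an empty (unconstrained) scope
    is valid everywhere."""
    return scope <= context  # subset test

# Hypothetical subjects, purely for illustration.
scope = {"english", "legal-usage"}
print(valid_in_context(scope, {"english", "legal-usage", "contracts"}))  # True
print(valid_in_context(scope, {"english"}))                              # False
print(valid_in_context(set(), {"anything"}))                             # True
```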
The authors ultimately conclude:

- Statement Rephrasing
- Definition By-Example
- Evaluation
- Significance
- Purpose
- Organization
- Provenance
- Formula History
- Other
Everyone appears to agree that:

In general, the readers missed out on a lot of context dimensions, therefore making the case for assistance systems for spreadsheet comprehension.
...assistive systems for spreadsheet comprehension, but questions remain about how to fashion such a system.
As semantic errors are made on an individual document base [sic], there is neither hope for a best-practice guide to train avoiding them nor for a general software update to help out. Semantic errors pose a more serious threat for wide-impact spreadsheets since more and more individual communication errors might aggregate over the span of distribution.
A subject can be anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever. In particular, it is anything about which the creator of a topic map chooses to discourse.... A topic is a symbol used within a topic map to represent one, and only one, subject, in order to allow statements to be made about the subject. A statement is a claim or assertion about a subject (where the subject may be a topic map construct). Topic names, variant names, occurrences, and associations are statements, whereas assignments of identifying locators to topics are not considered statements.
A subject identifier is a locator that refers to a subject indicator....

A subject indicator is an information resource that is referred to from a topic map in an attempt to unambiguously identify the subject represented by a topic to a human being. Any information resource can become a subject indicator by being referred to as such from within some topic map, whether or not it was intended by its publisher to be a subject indicator.

A subject locator is a locator that refers to the information resource that is the subject of a topic. The topic thus represents that particular information resource; i.e., the information resource is the subject of the topic.

[The order of the first two paragraphs is switched in this presentation. Why subject indicator preceded subject identifier in the original isn't recalled.]

5.3.2 Identifying Subjects
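A minimal sketch of these identity constructs may help. The class, field names, and example.org URLs below are mine for illustration, not code from any topic map toolkit:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Topic:
    # Locators that refer to subject indicators (resources identifying
    # the subject to a human being).
    subject_identifiers: frozenset = frozenset()
    # Locators of resources that *are* the subject itself.
    subject_locators: frozenset = frozenset()

def same_subject(a: Topic, b: Topic) -> bool:
    """Two topics represent the same subject if they share a subject
    identifier or a subject locator."""
    return bool(a.subject_identifiers & b.subject_identifiers or
                a.subject_locators & b.subject_locators)

puccini_1 = Topic(subject_identifiers=frozenset({"http://example.org/puccini"}))
puccini_2 = Topic(subject_identifiers=frozenset({"http://example.org/puccini"}))
page = Topic(subject_locators=frozenset({"http://example.org/puccini.html"}))

print(same_subject(puccini_1, puccini_2))  # True: shared subject identifier
print(same_subject(puccini_1, page))       # False: identifier vs. locator
```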
merging is the act of clustering topics on some criteria and then processing that cluster for presentation. Clustering analysis is widely used in exploratory data mining and should have a place in exploring the semantics of spreadsheets.
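The "cluster, then process for presentation" idea can be sketched as follows. Topics here are plain dicts with hypothetical fields; this is a toy illustration, not any real topic map engine:

```python
def merge_topics(topics):
    """Cluster topics that share any subject identifier, then fold each
    cluster into a single merged topic for presentation."""
    clusters = [{"ids": set(t["ids"]), "names": set(t["names"])} for t in topics]
    changed = True
    while changed:  # repeat until no two clusters share an identifier
        changed = False
        for i, a in enumerate(clusters):
            for b in clusters[i + 1:]:
                if a["ids"] & b["ids"]:        # shared identifier: merge b into a
                    a["ids"] |= b["ids"]
                    a["names"] |= b["names"]
                    clusters.remove(b)
                    changed = True
                    break
            if changed:
                break
    return clusters

topics = [
    {"ids": {"http://example.org/tosca"}, "names": {"Tosca"}},
    {"ids": {"http://example.org/tosca", "http://example.org/opera-5"},
     "names": {"La Tosca"}},
    {"ids": {"http://example.org/butterfly"}, "names": {"Madama Butterfly"}},
]
print(len(merge_topics(topics)))  # 2: the two Tosca topics merge
```

The fixpoint loop handles transitive merges (A shares with B, B with C) at the cost of quadratic time, which is fine for a sketch.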
correctly at every stage of writing. Every start at a schema would quickly devolve into making decisions that users should be making, at least if the schema is to reflect their semantics and not those of your authors.
For our present context, I would rephrase the question to be: "is it useful for understanding a particular spreadsheet?"

I think you probably also want to start thinking about a distinction I talk about a lot when we’re talking about document modeling: what is true versus what is useful. When you start looking at a set of documents, you can find a lot of things that are true about them and that you could identify and spend an awful lot of money on. The question is how many of those things are useful.

It would be possible, for example, when marking up business documents for a subject retrieval system to identify the parts of the document on your subject taxonomy and to identify the documents themselves and the chapters of them and the sections of them and the paragraphs and the sentences and the words and what was the language of origin of each of the verbs in each of the sentences. Is this likely to be useful? Is it conceivable that there might be somebody someplace who would find that useful? Yes. If your goal is to make a corporate procedure library easily available to the telephone help desk, is it likely that knowing which word is a verb is going to be helpful? Probably not. So, if you’re supporting a telephone help desk, maybe you don’t need to get into the linguistic analysis of the sentences. That’s what I’m talking about: about not supplying, not spending money to do something that it is possible that some unknown future person might want. Stick to what you’re supposed to be doing.

There may be a text markup standard that specifies how to mark the parts of speech of each word in documents and their language of origins. If there is, and knowing or manipulating this information is related to one of your project goals, then and only then, is that standard relevant to your project.