
Spreadsheets - 90+ million End User Programmers With No Comment Tracking or Version Control

Patrick Durusau

Independent Consultant

Sam Hunting

Independent Consultant; blogger; gardener.

Balisage: The Markup Conference 2015
August 11 - 14, 2015

Copyright © 2015 by the authors. Used with permission.

How to cite this paper

Durusau, Patrick, and Sam Hunting. “Spreadsheets - 90+ million End User Programmers With No Comment Tracking or Version Control.” Presented at Balisage: The Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference 2015. Balisage Series on Markup Technologies, vol. 15 (2015). DOI: 10.4242/BalisageVol15.Durusau01.

Abstract

Stephen Gandel's Damn Excel! How the 'most important software application of all time' is ruining the world is NOT an indictment of Excel. It is an indictment of the inability to track comments on spreadsheets or to control their versions.

Without those capabilities, spreadsheets are dangerous to their authors and others.

How dangerous, you ask? A short list of horror stories would include: 2013, the "London Whale," JPMorgan Chase, lost £250 million; 2013, an error in the calculation of international government debt-to-GDP ratios; 2012, JPMorgan Chase loses $6.2 billion due to a spreadsheet formula error; 2011, MF Global collapses, in part due to the use of spreadsheets to track assets and liabilities; 2010, US Federal Reserve, a spreadsheet error in the calculation of $4 billion in Consumer Revolving Credit. The EuSpRIG Horror Stories page has a generous sampling of more spreadsheet horror stories.

Statistically speaking, F1F9 estimates:

88% of all spreadsheets have errors in them, while 50% of spreadsheets used by large companies have material defects, resulting in loss of time and money, damaged reputations, lost jobs and disrupted careers.

If that weren't bad enough, other research indicates that 33% of all bad business decisions are traceable to spreadsheet errors. That's right, 33% of all bad business decisions. That's a business case looking for a solution. Yes?

Disclaimer: Topic maps, even a legend based on the Topic Maps Reference Model, ISO/IEC 13250-5 (2015), cannot magically prevent fraud, stupidity or human error. Topic maps can enable the modeling of relationships within spreadsheets, support comment tracking on errors/spreadsheets, even when the comments are in emails, and explore the subject identities and merging practices required for content-level versioning of spreadsheets. Our goal is to empower you to detect fraud, stupidity and human error.

On the technical side, we will analyze real world spreadsheets, determine subjects to be represented and how to identify them, create a legend that will constrain the representatives of subjects (using ordinary XML tools), create a topic map of an actual spreadsheet and review our results against the known requirements to improve auditing of spreadsheets. The auditing process itself will be shown to be auditable.

Table of Contents

The Spreadsheet Use Case
Evidence-Based Requirements
How To Capture Spreadsheet Semantics
Interlude on Designing an Assistive System for Spreadsheets
Introduction
First Requirement: Capture (Not Impose) User Semantics
Second Requirement: Immediate ROI
Third Requirement: Simplicity
Fourth Requirement: Subject Identifier and Subject Locator
Fifth Requirement: Merging/Sharing, A User Choice
Sixth Requirement: No Default Data Model, No Merging with Side Effects
Enron Topic Map?
Issues and Future Research
Conclusion

The Spreadsheet Use Case

If the thought of over 90 million end user programmers with no comment tracking or version control is scary, bear in mind that estimate is for the United States alone. End user programmers include bankers, lawyers, clerks, mid-level managers, small and large business owners, people with and without spreadsheet training and others. Given the universal presence of spreadsheet programs, it is easy to credit claims that spreadsheets control the spending of trillions of dollars every year. All while being used poorly.

Beyond the spreadsheet horror stories in the abstract, consider that in 2003, a spreadsheet error at Fannie Mae made the mortgage guarantor look $1.3 billion more profitable than it actually was (Gandel 2013). Did that cause the subprime mortgage crisis? Hard to say it was the sole cause, but it certainly didn't help.

F1F9 (UK) uses this imagery in a publication on spreadsheet errors:

[Figure: Durusau01-001.png, imagery from an F1F9 publication on spreadsheet errors]

In case you don't catch the reference, see The Dirty Dozen (movie), which was set in England just prior to D-Day. (They need a younger advertising firm.)

Evidence-Based Requirements

Horror stories about spreadsheets abound, but each is too specific to support requirements for a general solution: the Fidelity Magellan Fund overstating capital gains by USD 2.6 billion due to the omission of a minus sign, or JP Morgan Chase's loss of £250 million from a model understood by only one person, for example (F1F9 2015). Several spreadsheet corpora have been developed in recent years to support an evidence-based approach to spreadsheet usage and errors.

Felienne Hermans has played an important role in developing methods for the analysis of spreadsheets, including adapting the notion of code smells into spreadsheet smells. She points out in her dissertation (Hermans Dissertation) that while electronic spreadsheets are recent, the spreadsheet-like form is not (the tablet below contains a copy-paste error as well):

[Figure: Durusau01-002.png, Babylonian tablet showing a spreadsheet-like form]

In case your Babylonian is rusty, Felienne highlights the error in What Archimedes has to say about spreadsheets:

[Figure: Durusau01-003.png, the error highlighted in What Archimedes has to say about spreadsheets]

If that still doesn't ring a bell, you can review The Babylonian tablet Plimpton 322 (Bill Casselman, University of British Columbia), Plimpton 322 (David E. Joyce, Clark University), and Words and Pictures: New Light on Plimpton 322, Eleanor Robson (MAA publication). The professional literature on Plimpton 322 is extensive, but those resources will get you started.

Felienne has published and spoken widely on spreadsheets and leads the Spreadsheet Lab, which organizes the yearly conference Software Engineering Methods in Spreadsheets.

Felienne Hermans and Emerson Murphy-Hill (Hermans 2014) extracted spreadsheets, and emails that reference them, from the Enron data set. A quick overview of this spreadsheet corpus reveals:

Table I

Enron Spreadsheet Summary

Number of spreadsheets analyzed: 15,770
Number of spreadsheets with formulas: 9,120
Number of worksheets: 79,983
Maximum number of worksheets: 175
Number of non-empty cells: 97,636,511
Average number of non-empty cells per spreadsheet: 6,191
Number of formulas: 20,277,835
Average number of formulas per spreadsheet with formulas: 2,223
Number of unique formulas: 913,472
Number of unique formulas per spreadsheet with formulas: 100

Table II

Summary of Enron Excel Spreadsheet Defects

Erroneous formulas with dependents: 29,324
Formulas depending on erroneous cells: 9.6 (on average)
Maximum number of errors in one file: 83,273
Calculation chain length of five or longer: 41,367
Calculation chain length greater than seven: 9,471 (seven being the capacity of short-term memory)

Table III

Summary of Enron Emails Related to Spreadsheets

Emails with spreadsheets attached: 44,214
Emails sending or talking about spreadsheets: 68,970
Emails discussing errors: 4,140
Emails describing modifications of spreadsheets: 14,084
Emails as documentation: observed, but no count

The first requirement that comes to mind is the need to track comments (sender, receiver, subject) per spreadsheet, and more particularly to tie them to any portion of the spreadsheet that is the subject of the email. That sounds obvious, but when Jen Underwood wrote of thirty-four (34) needed improvements in Excel in Is Excel the Next Killer BI App?, linking emails at the spreadsheet level wasn't mentioned.
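A hedged sketch of what such tracking might look like, in the style of the topic examples later in this paper (the element names are our invention, not any standard vocabulary):

<association>
   <type>email-comments-on-spreadsheet</type>
   <role>
      <type>comment</type>
      <player>topicref-to-email</player>
   </role>
   <role>
      <type>commented-on</type>
      <player>topicref-to-cell-range</player>
   </role>
</association>

With a topic standing in for the cell range, comments from any number of emails can accumulate against the same portion of the spreadsheet.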

In fact, What is Spreadsheet Risk? examines causes of spreadsheet errors (beyond formula miscalculations) and has this non-exclusive list:

  • Too many unknown and unorganized spreadsheet files or tabs

  • Poor naming conventions of files and tabs

  • Excessive and undocumented data linkages between files and sheets

  • Poorly designed and presented spreadsheets

  • Spreadsheet evolution (start off simple and become increasingly too complicated)

  • Spreadsheet ownership (ownership changes without the transfer of knowledge)

  • Manual data input (including cell by cell typing and copy & pasting large datasets)

  • Too much data in a spreadsheet (remember the 64,000 row limit in Excel??)

  • No versioning or change tracking

  • No double-checking processes

  • Overly and needlessly complicated formulas

  • Plain old human error

There is no, repeat no, intersection between that list of spreadsheet risks and Underwood's list of needed product improvements for Excel.

Out of those twelve (12) spreadsheet risks, nine (9) involve, or could be addressed by, identifying subjects and their relationships to other subjects:

  • Too many unknown and unorganized spreadsheet files or tabs

  • Poor naming conventions of files and tabs

  • Excessive and undocumented data linkages between files and sheets

  • Poorly designed and presented spreadsheets

  • Spreadsheet evolution (start off simple and become increasingly too complicated)

  • Spreadsheet ownership (ownership changes without the transfer of knowledge)

  • No versioning or change tracking

  • No double-checking processes

  • Overly and needlessly complicated formulas

Those nine (9) spreadsheet risks have a common underlying issue. In order to correct any or all of those problems, the intended semantics of a spreadsheet would have to be known. Unfortunately, the semantics of spreadsheets are not discoverable on the face of the spreadsheets.

In her discussion of Excel errors in the Enron dataset, Hermans points out that semantic errors are beyond her reach, saying:

It is impossible to determine what cells in the set are erroneous, because we cannot possibly know what was the intention of the formula. This rules out finding semantic errors....

Even though as outside observers we can't capture the semantics of spreadsheets, that doesn't mean no one understands (or thinks they understand) the semantics of spreadsheets. Perhaps we should ask the users of spreadsheets about their semantics.

User Semantics

Bear in mind that "user of spreadsheet" does not equate to "manager of users of spreadsheets." Managers have their own semantics, which can be captured, but should not be substituted for the semantics of their staff who use spreadsheets. Unless you want to repeat the $170 million Virtual Case File debacle at the FBI. Inappropriate gathering of requirements was only one of the issues, but an important one.

Fortunately, research does exist on the importance of and techniques for capturing user semantics for spreadsheets.

How To Capture Spreadsheet Semantics

In a recent summary and extension of work on capturing user semantics, Kohlhase, Kohlhase, and Guseva (Kohlhase, Andrea 2015) write:

A taxonomy of the errors [6] shows that a significant portion of errors (87%, as calculated in [4]) are semantic. In this research, a semantic error is one that is committed when users have a wrong concept that may be correctly or incorrectly put into practice. These arise from misunderstanding the real-world, wrong translation of the real-world to the spreadsheet representation, or a misunderstanding of the spreadsheet’s internal logic [4]. As semantic errors are made on an individual document base, there is neither hope for a best-practice guide to train avoiding them nor for a general software update to help out. Semantic errors pose a more serious threat for wide-impact spreadsheets since more and more individual communication errors might aggregate over the span of distribution.

It has been proposed [4], [7] that a key reason in committing semantic errors is a missing higher-level abstraction of the data. Tables, with their grid framework, expose details and allow manipulation of underlying data. Therefore, spreadsheets, as a computer-supported realization of tables, turn one’s attention to data on a micro-level, failing to provide the big picture. Generally, schematic diagrams or pictures abstract away and integrate the data, presenting it holistically....

Ouch! Eighty-seven percent (87%) of spreadsheet errors are semantic! That signals that topic maps have a role to play in a solution for spreadsheets, but it becomes even clearer when you read:

Note that crossing a community boundary leaves the entire context - the circumstances and settings in which a document is created and obtains its specific meaning - behind. Researchers in the field of Human-Computer Interaction (HCI) have focused in recent years on the context-of-use of software systems: user experience issues often only arise in the concrete context in which a product is used. Our approach for tackling the readability issue of spreadsheets is motivated by this insight. Therefore we ask: what is the context of a spreadsheet document and which role does it play for comprehension of spreadsheets? For an answer, consider the following distinct contexts:

  • the context of the data itself,

  • the information context (implicit knowledge) of the author or the reader,

  • the event context of the author (the intention of the document as communication tool) or the reader (the expectation towards the usefulness of the document),

  • the effect context (e.g. decision making based on the document).

Note that the clear distinction between authors and readers is only an analytical one. We are well aware that authors turn into readers after a short while even for their own documents and, vice versa, that the motivation of readers might consist in searching for copy-able parts to author their very own spreadsheets. Nevertheless, the context can be clearly distinguished where wide-impact, local boundary-leaving spreadsheets are concerned.

Context isn't an unfamiliar idea in topic map circles. In the Topic Maps Data Model (Topic Maps Data Model), the term "scope" was used to convey the context in which a statement was valid. In relevant part it reads:

5.3.3 Scope

All statements have a scope. The scope represents the context within which a statement is valid. Outside the context represented by the scope the statement is not known to be valid. Formally, a scope is composed of a set of topics that together define the context. That is, the statement is known to be valid only in contexts where all the subjects in the scope apply.

Note on Scope

The statement "Formally, a scope is composed of a set of topics that together define the context." is true for topic maps governed by the Topic Maps Data Model. Under the Topic Maps Reference Model you can redefine scope to be disjoint if that is appropriate for your use case.
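As a hedged sketch, using hypothetical markup in the style of the examples later in this paper, a name statement might carry a scope naming the context in which it is valid:

<topic>
   <type>spreadsheet-heading</type>
   <name scope="finance-department">Q3 Revenue</name>
   <name scope="audit-team">Unaudited Q3 Revenue</name>
</topic>

Under the Topic Maps Data Model a scope is a set of topics; the attribute form here is shorthand for a reference to such a topic.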

The authors find there are twelve dimensions to be explored for context:

  • Statement

  • Rephrasing

  • Definition

  • By-Example

  • Evaluation

  • Significance

  • Purpose

  • Organization

  • Provenance

  • Formula

  • History

  • Other

The authors ultimately conclude:

In general, the readers missed out on a lot of context dimensions, therefore making the case for assistance systems for spreadsheet comprehension.

Everyone appears to agree that:

Spreadsheet Woes

  1. Spreadsheets have numerous errors

  2. Spreadsheets have semantic errors

  3. Spreadsheets lack documentation

  4. Spreadsheets have complex relationships

  5. Spreadsheets have complex semantics

  6. Spreadsheet errors can damage enterprises and economies

From the examples and literature, a case has been made for "assistive systems for spreadsheet comprehension," but questions remain about how to fashion such a system.

Interlude on Designing an Assistive System for Spreadsheets

Introduction

Beyond the specific problems of spreadsheets, there are general principles that should inform a topic map based solution. The principles discussed below are not particularly original nor are they the only principles you should take into account in your design work. Where they prove useful, use them. Where they prove unhelpful, ignore them. The measure of being useful or not is in the satisfaction of your current customer. With semantics, what other measure would you use?

First Requirement: Capture (Not Impose) User Semantics

You may have missed these lines in an earlier block quote from Kohlhase, Andrea 2015:

As semantic errors are made on an individual document base [sic], there is neither hope for a best-practice guide to train avoiding them nor for a general software update to help out. Semantic errors pose a more serious threat for wide-impact spreadsheets since more and more individual communication errors might aggregate over the span of distribution.

If you think about it, selling a uniform means of identification (for universal merging), or any other single approach to modeling, in an environment defined by diversity, is a demonstration of madness. If you doubt that claim, consider the long dying process now underway for the Semantic Web. You have seen the working group products on CSV as the W3C attempts to find a popular data format. After fifteen years of puff pieces in Scientific American and elsewhere, and billions of dollars spent, there remains no Semantic Web, at least not as promised.

Second Requirement: Immediate ROI

In addition to the issue of replacing diversity with uniformity, the Semantic Web also offered delayed gratification, as in long delayed gratification. That was because in large part, the value-add of the Semantic Web could not be realized until a critical mass of users was obtained. No critical mass = no value add.

Take as a starting point a hypothetical topic to represent the column headers in a spreadsheet:

<topic>
   <!-- "topicref" stands for a reference to another topic;
        "name" and "value" stand for literal data -->
   <type>spreadsheet-heading</type>
   <spreadsheetoforigin>topicref</spreadsheetoforigin>
   <name>name</name>
   <column>value</column>
   <datatype>topicref</datatype>
   <author>topicref</author>
   <origin>topicref</origin>
   ...
</topic>

As each topic is created, the user's work product is of immediate value to the user and others. For the user, it is documentation of a column header in their spreadsheet. There could be different or additional properties, represented as elements and their values under each topic. For the office manager, a collection of topic maps with such topics provides documentation on column headers across spreadsheets used in their office.

Third Requirement: Simplicity

Many of us were in the car together when the Topic Maps Data Model declared:

A subject can be anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever. In particular, it is anything about which the creator of a topic map chooses to discourse....

A topic is a symbol used within a topic map to represent one, and only one, subject, in order to allow statements to be made about the subject. A statement is a claim or assertion about a subject (where the subject may be a topic map construct). Topic names, variant names, occurrences, and associations are statements, whereas assignments of identifying locators to topics are not considered statements.

Which quickly leads to the observation: a topic can represent anything, but if you want to talk about an occurrence or an association, you have to create the occurrence or association first and then reify it with a topic. Say that quickly five or more times.

A simpler representation would be to create topic elements that enforce the use of <type> with default values for occurrence or association and which have models for those subjects.

As a consequence, users would have this rule: If you want to talk about a subject of any kind, it MUST be represented by a topic.
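Under that rule, an occurrence is simply authored as a topic whose <type> says what it is; no separate creation-then-reification step is needed. A hypothetical sketch (element names are illustrative only):

<topic>
   <type>occurrence</type>
   <occurrencetype>documentation</occurrencetype>
   <of>topicref-to-column-header</of>
   <value>Column B holds year-to-date totals, not quarterly figures.</value>
</topic>

Because the occurrence is already a topic, statements about it (who wrote it, when, in reply to which email) require no further machinery.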

Fourth Requirement: Subject Identifier and Subject Locator

One of the more useful ideas that came out of the Topic Maps Data Model was the distinction between Internationalized Resource Identifiers (IRIs) as subject identifiers and as subject locators. As you know:

A subject identifier is a locator that refers to a subject indicator....

A subject indicator is an information resource that is referred to from a topic map in an attempt to unambiguously identify the subject represented by a topic to a human being. Any information resource can become a subject indicator by being referred to as such from within some topic map, whether or not it was intended by its publisher to be a subject indicator.

A subject locator is a locator that refers to the information resource that is the subject of a topic. The topic thus represents that particular information resource; i.e., the information resource is the subject of the topic.

Topic Maps Data Model 5.3.2 Identifying Subjects [The order of the first two paragraphs is switched in this presentation. Why subject indicator preceded subject identifier in the original isn't recalled.]

The distinction between subject identifiers and subject locators is too useful to be limited to IRIs. IRIs were, after all, part of the topic maps effort to be "universal" and therefore mergeable across the universe of topic maps. As pointed out earlier, we were all in the car together, so that is a statement of fact, not an assignment of blame.
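Carrying the distinction over to spreadsheets, without insisting on IRIs, might look like this (hypothetical markup; the addresses are illustrative only):

<topic>
   <type>spreadsheet-heading</type>
   <!-- the subject IS this resource: the cell range itself -->
   <subjectlocator>quarterly.xlsx#Sheet1!B1</subjectlocator>
   <!-- this resource merely describes the subject to a human reader -->
   <subjectidentifier>http://example.com/docs/revenue-column.html</subjectidentifier>
</topic>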

Fifth Requirement: Merging/Sharing, A User Choice

As mentioned in Immediate ROI, there should be no default merging rules unless and until you understand the semantics of your spreadsheet data, especially across spreadsheets, where the context has changed.

Depending upon your role (auditor, end-user, author, etc.), you can choose whether to share the merging rules for your topic map. Or you can process your topic map to suppress information it contains that might enable merging, or to limit merging in some way.

That isn't to say topics cannot have merging rules, or even multiple merging rules applied under different circumstances. It may help to remember that merging is the act of clustering topics on some criteria and then processing that cluster for presentation. Clustering analysis is widely used in exploratory data mining and should have a place in exploring the semantics of spreadsheets.
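A merging rule, then, is just a user-declared clustering criterion. A hypothetical sketch (element names are our invention, not any standard vocabulary):

<mergingrule name="same-column-header">
   <appliesto>spreadsheet-heading</appliesto>
   <clusteron>
      <property>name</property>
      <property>datatype</property>
   </clusteron>
</mergingrule>

A different user, or the same user in a different role, could declare a different rule, or none, over the same topics.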

Rather than defaulting to a presumed set of common goals, the better way to promote topic maps for spreadsheets is to enable users to choose their own goals.

Sixth Requirement: No Default Data Model, No Merging with Side Effects

Once you forsake a default data model and merging with side effects, a world of tools becomes available for processing your spreadsheet topic map.

The language of choice for spreadsheet topic map legends (at least in XML) should be RELAX-NG. There are other languages in which subject identity, relationship models and even merging rules could be written.

RELAX-NG offers a compact syntax, so you have no reason to develop and debug one of your own for topic map authoring. In addition to RELAX-NG constraints, further constraints can be imposed via Schematron, which saves you from developing a constraint language as well.
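As a sketch, a RELAX-NG compact syntax grammar for the hypothetical <topic> element shown earlier might begin:

# constrain the hypothetical spreadsheet-heading topic
start = topic
topic = element topic {
   element type { "spreadsheet-heading" },
   element spreadsheetoforigin { text },
   element name { text },
   element column { text },
   element datatype { text },
   element author { text },
   element origin { text }
}

A Schematron rule could then add cross-field constraints, for example that every topicref value actually resolves to a topic elsewhere in the map.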

Work by the XQuery Working Group and XSLT Working Group continues to advance processing capabilities for XML.

Enron Topic Map?

When we first started this paper, semantic analysis of the Enron data loomed large. After reading the current literature on spreadsheet semantics, we discovered it isn't possible to obtain user semantics in the absence of users. That seems obvious in hindsight, but only in hindsight.

In lieu of a topic map of the semantics of the Enron data, the demonstration portion of this presentation will include a series of "what if" examples of how speculated semantics and relationships could be modeled, along with a basic RELAX-NG schema, to be released at the conference.

Issues and Future Research

There are a number of issues, only alluded to or not mentioned at all in this paper, that must be addressed to make customized topic maps for spreadsheets a successful approach. For example:

Future Research Issues

  1. Association types with fixed role models defined by functions in Recalculated Formula (OpenFormula)

  2. Exploration practices for Spreadsheets

  3. Modeling Complex Formulas as Associations of Associations

  4. Detection of topics playing multiple roles in Complex Formulas

  5. Modeling Patterns in Spreadsheet Formulas

  6. Best Practices for Capturing User Semantics

  7. and others

Your assistance with these issues, and suggestions of others we have overlooked, is greatly appreciated.

Conclusion

In many ways this has been a difficult paper to write. Very difficult. One of the primary reasons was fighting the urge, at every stage of writing, to prescribe how users should model semantics "correctly." Every start at a schema would quickly devolve into making decisions that users should be making, at least if the schema is to reflect their semantics and not those of the present authors.

It has taken six (6) years but I now have a deeper appreciation for Tommie Usdin's Standards considered Harmful. If you missed Balisage 2009, then you missed the presentation. There is one passage that captures the essence of the presentation:

I think you probably also want to start thinking about a distinction I talk about a lot when we’re talking about document modeling: what is true versus what is useful. When you start looking at a set of documents, you can find a lot of things that are true about them and that you could identify and spend an awful lot of money on. The question is how many of those things are useful. It would be possible, for example, when marking up business documents for a subject retrieval system to identify the parts of the document on your subject taxonomy and to identify the documents themselves and the chapters of them and the sections of them and the paragraphs and the sentences and the words and what was the language of origin of each of the verbs in each of the sentences. Is this likely to be useful? Is it conceivable that there might be somebody someplace who would find that useful? Yes. If your goal is to make a corporate procedure library easily available to the telephone help desk, is it likely that knowing which word is a verb is going to be helpful? Probably not. So, if you’re supporting a telephone help desk, maybe you don’t need to get into the linguistic analysis of the sentences. That’s what I’m talking about: about not supplying, not spending money to do something that it is possible that some unknown future person might want. Stick to what you’re supposed to be doing. There may be a text markup standard that specifies how to mark the parts of speech of each word in documents and their language of origins. If there is, and knowing or manipulating this information is related to one of your project goals, then and only then, is that standard relevant to your project.

For our present context, I would rephrase the question to be: "is it useful for understanding a particular spreadsheet?"

If you ask your next client if they prefer you to capture your semantics for their spreadsheets, their semantics for their spreadsheets, or the semantics of others for their spreadsheets, which one do you think they would choose?

References

[Abreu 2014] Abreu, Rui et al. FaultySheet Detective: When Smells Meet Fault Localization http://conferences.computer.org/icsme/2014/papers/6146a625.pdf, doi:10.1109/ICSME.2014.111

[Asavametha 2013] Asavametha, Atipol, Detecting bad smells in spreadsheets http://hdl.handle.net/1957/30672

[Badame 2012] Badame, Sandro, Danny Dig, Refactoring meets Spreadsheet Formulas http://dig.cs.illinois.edu/papers/refactoringSpreadsheets.pdf, doi:10.1109/ICSM.2012.6405299

[Barik 2015] Barik, Titus, et al., FUSE: A Reproducible, Extendable, Internet-scale Corpus of Spreadsheets http://people.engr.ncsu.edu/ermurph3/papers/icse-msr-15.pdf, doi:10.1109/MSR.2015.70

[Clermont 2003] Clermont, Markus, A Scalable Approach to Spreadsheet Visualization http://www.isys.uni-klu.ac.at/PDF/2003-0175-MC.pdf

[Cunha 2012] Cunha, Jácome, et al., Towards a Catalog of Spreadsheet Smells http://haslab.uminho.pt/jacome/files/iccsa12.pdf, doi:10.1007/978-3-642-31128-4_15

[Cunha 2013] Cunha, Jácome, et al., Spreadsheet Engineering http://dsl2013.math.ubbcluj.ro/files/saraiva_lecture.pdf

[Delft Spreadsheet Lab] Delft Spreadsheet Lab http://spreadsheetlab.org/

[Dou 2014] Dou, Wensheng, Cheung, Shing-Chi, Wei, Jun, Is Spreadsheet Ambiguity Harmful? Detecting and Repairing Spreadsheet Smells due to Ambiguous Computation (ACM digital library) http://hdl.handle.net/1783.1/66737, doi:10.1145/2568225.2568316

[F1F9 2015] Capitalism's Dirty Secret: A Research Report into the Uses and Abuses of Spreadsheets http://www.f1f9.com/blog/spreadsheets-play-a-critical-role-in-business-decision-making-globally

[Fowler 1999] Fowler, Martin, Refactoring http://martinfowler.com/tags/refactoring.html

[Gandel 2013] Gandel, Stephen, Damn Excel! How the 'most important software application of all time' is ruining the world http://fortune.com/2013/04/17/damn-excel-how-the-most-important-software-application-of-all-time-is-ruining-the-world/

[Hermans 2011a] Hermans, Felienne, What Archimedes has to say about spreadsheets https://www.youtube.com/watch?v=yda2cm7D_cQ

[Hermans 2011b] Hermans, Felienne, Infotron Analyzer Screencast https://www.youtube.com/watch?v=GkGr9UoebEk

[Hermans 2011c] Hermans, Felienne, Martin Pinzger, Arie van Deursen, Breviz: Visualizing Spreadsheets using Dataflow Diagrams http://arxiv.org/pdf/1111.6895.pdf

[Hermans 2012] Hermans, Felienne, Detecting and Visualizing Inter-worksheet Smells in Spreadsheets http://www.slideshare.net/Felienne/detecting-and-visualizing-interworksheet-smells-in-spreadsheets

[Hermans 2013] Hermans, Felienne, Analyzing & visualizing spreadsheets (overview of dissertation) http://www.slideshare.net/Felienne/an-overview-of-my-phd-research

[Hermans 2013b] Hermans, Felienne, Software Engineering [spreadsheets] http://www.slideshare.net/devnology/slides-felienne-hermans-symposium-ewi

[Hermans 2013c] Hermans, Felienne, Spreadsheets: The Ununderstood Dark Matter Of IT (video) https://www.youtube.com/watch?v=wbiVK6HKHHg

[BumbleBee, blog post] Hermans, Felienne, BumbleBee, a tool for spreadsheet formula transformations (blog) http://www.felienne.com/archives/2964

[BumbleBee, paper] BumbleBee, a tool for spreadsheet formula transformations (http://dl.acm.org/citation.cfm?id=2661673) http://files.figshare.com/1222815/paper.pdf

[Hermans Dissertation] Hermans, Felienne, Analyzing & visualizing spreadsheets (dissertation), doi:10.6084/m9.figshare.658936

[Hermans 2014] Hermans, Felienne and Emerson Murphy-Hill. Enron’s Spreadsheets and Related Emails: A Dataset and Analysis http://people.engr.ncsu.edu/ermurph3/papers/icse-seip-15.pdf, doi:10.6084/m9.figshare.1222882

[Hermans 2014b(video)] Hermans, Felienne, Spreadsheets for Developers (video) https://www.youtube.com/watch?v=0CKru5d4GPk

[Hermans 2014b] Hermans, Felienne, Spreadsheets for Developers (slides) http://www.slideshare.net/Felienne/spreadsheets-for-developers

[Hermans 2014c] Hermans, Felienne, Spreadsheets are graphs too: Using Neo4j as backend to store spreadsheet information http://www.slideshare.net/Felienne/felienne-neo-online

[Hermans 2014d] Hermans, Felienne, Improving Spreadsheet Test Practices http://www.slideshare.net/Felienne/improving-spreadsheet-test-practices

[Hermans 2014e] Hermans, Felienne, Testing and Refactoring Spreadsheets http://www.slideshare.net/eusprig/felienne-hermans-at-eusprig-2014

[SEMS 2014] Hermans, Felienne, ed., Software Engineering Methods in Spreadsheets http://ceur-ws.org/Vol-1209/

[SEMS 2015] Hermans, Felienne, ed., Software Engineering Methods in Spreadsheets http://ceur-ws.org/Vol-1355/

[Hermans Blog] Hermans, Felienne, Felienne Hermans blog http://www.felienne.com/

[Topic Maps Data Model] Topic Maps Data Model, ISO/IEC 13250-2 http://www.isotopicmaps.org/sam/sam-model/

[Topic Maps Reference Model] Topic Maps Reference Model, ISO/IEC 13250-5 http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=40757

[Hofer 2014] Hofer, Birgit, et al., Tool-supported fault localization in spreadsheets: Limitations of current evaluation practice http://ceur-ws.org/Vol-1209/paper_1.pdf

[Hofer] Hofer, Birgit, Code Smells by Birgit Hofer http://www.ist.tugraz.at/_attach/Publish/SelectedTopics3/CodeSmells.pdf

[Infotron] Infotron http://www.infotron.nl/

[Jannach 2014] Jannach, Dietmar, et al., Avoiding, Finding and Fixing Spreadsheet Errors - A Survey of Automated Approaches for Spreadsheet QA http://ls13-www.cs.tu-dortmund.de/homepage/publications/jannach/Journal_JSS_2014.pdf, doi:10.1016/j.jss.2014.03.058

[Khokhar 2014] Khokhar, Tariq, Three things I learned at the 2014 Open Knowledge Festival http://blogs.worldbank.org/opendata/three-things-i-learned-2014-open-knowledge-festival

[Kohlhase, Andrea 2008] Kohlhase, Andrea, and Michael Kohlhase, Compensating the Semantic Bias of Spreadsheets http://omdoc.org/pubs/kohkoh-lwa08.pdf

[Kohlhase, Andrea 2015] Kohlhase, Andrea, Michael Kohlhase, Ana Guseva, Context in Spreadsheet Comprehension http://ceur-ws.org/Vol-1355/paper8.pdf

[Kohlhase, Michael 2009] Kohlhase, Michael, An Open Markup Format for Mathematical Documents http://omdoc.org/pubs/omdoc1.2.pdf

[Kulesz 2013] Kulesz, Daniel, Jan-Peter Ostberg, Practical Challenges with Spreadsheet Auditing Tools http://www.iste.uni-stuttgart.de/fileadmin/user_upload/iste/se/research/publications/download/dk_jpo_EuSpRiG2013_preprint.pdf

[Recalculated Formula (OpenFormula)] OpenFormula http://standards.iso.org/ittf/PubliclyAvailableStandards/c066375_ISO_IEC_26300-2_2015.zip

[Rajalingham 2008] Rajalingham, Kamalasen, David R. Chadwick, Brian Knight, Classification of Spreadsheet Errors http://arxiv.org/pdf/0805.4224.pdf

[RELAX-NG] RELAX-NG http://standards.iso.org/ittf/PubliclyAvailableStandards/c052348_ISO_IEC_19757-2_2008(E).zip

[Schematron] Schematron http://standards.iso.org/ittf/PubliclyAvailableStandards/c040833_ISO_IEC_19757-3_2006(E).zip

[Shallcross 2015] Shallcross, Mike, Sniff out spreadsheet “smells” with new Rainbow 9.0 http://www.themodelanswer.com/news/sniff-out-spreadsheet-smells-rainbow-9-0/

[Shueh 2014] Shueh, Jason, Why Spreadsheets Stink — and 4 Ways to Improve Them http://www.govtech.com/data/Why-Spreadsheets-Stink-4-Ways-to-Improve-Them.html

[Sohon Blog] Roy, Sohon, All things soft and some hard too (Sohon Roy) https://sohonroy.wordpress.com/

[Stiel 2014] Stiel, Bjoern, Version Control for Spreadsheets: A fresh take on an old problem http://www.slideshare.net/eusprig/spread-git

[Stolee 2011] Stolee, Kathryn T., Sebastian Elbaum, Anita Sarma, End-User Programmers and their Communities: An Artifact-based Analysis http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1215&context=cseconfwork, doi:10.1109/ESEM.2011.23

[Usdin 2009] Usdin, Tommie, Standards considered harmful http://www.balisage.net/Proceedings/vol3/html/Usdin01/BalisageVol3-Usdin01.html, doi:10.4242/BalisageVol3.Usdin01

[Walkenbach] Walkenbach, John, The Spreadsheet Page (Excel) http://spreadsheetpage.com/

Author's keywords for this paper: Auditing; Spreadsheets; Topic Maps; Topic Map Reference Model (TMRM); Versioning

Patrick Durusau

Independent Consultant

Standards editor, data skeptic, topic map advocate. Blog: Another Word for It.

Sam Hunting

Independent Consultant; blogger; gardener.

Long-time contributor to the topic map standards process and topic maps fan.