Errors During Spreadsheet Development Experiments

Ever since the earliest days of spreadsheet development, there have been reported incidents of spreadsheet errors and the problems they caused. More importantly, a number of systematic field audits have reported errors in 24% of the spreadsheets audited (86% in studies since 1997), and these rates may be low because of limits in inspection methods. Overall, the evidence that spreadsheet errors are fairly common is quite strong.

Experiments take us to the next step: understanding the types of errors that people make and why they make them. In addition, experiments allow us to test the effectiveness of techniques to reduce spreadsheet errors. Experiments use large numbers of subjects under controlled conditions. This allows us to single out a few variables for study. It also allows us to do statistical analysis.

One brief piece of terminology. The cell error rate (CER) is the percentage of formula and (usually) numerical cells that containing an error. It is similar to the error rate measure uses in programming: faults per thousand lines of noncomment source code.

Spreadsheet Development Experiments

Study SSs % with Errors Cell Error Rate (CER) Basis* Remarks
Brown & Gould (1987) 27 63%     9 experienced spreadsheet developers each built 3 SSs. Each developer made at least one error.
Olson & Nilsen (1987-1988) 14   21% CERF Measured errors as they occurred, even if the subject corrected them.

CER only reflects formula cells

Lerch (1988) 21       Measured errors as they occurred, even if the subject corrected them

CER only reflects formula cells

All formula cells

    9.3% CERF CER only reflects formula cells

References to cells in the same row

    6.4% CERF CER only reflects formula cells

References to cells in the same column

    4.7% CERF CER only reflects formula cells

References to cells in different rows and columns

    14.1% CERF CER only reflects formula cells

Novice programmers each developed several spreadsheets

Hassinen (1988) 92 55% 4.3% CERF CER only reflects formula cells
Panko & Halverson (1997)         Galumpke task.
Undergraduates
Based on all text and number cells
Working alone
42 79% 5.6% CERV  
Dyads (Groups of 2)
46 78% 3.8% CERV  
Tetrads (Groups of 4)
44 64%  1.9% CERV  
Teo & Tan (1997) 168 42% 2.1% CERV Wall task.
Panko & Sprague (1998) 26 35% 2.1% CERV Wall Task. MBA students with little or no SS development experience
Panko & Sprague (1998) 17 24% 1.1% CERV Wall Task. MBA students with 250 hours or more SS development experience
Janvrin & Morrison 1st (2000a) 61       Task with 50 links between spreadsheets.

CER only reflects some formula cells of high risk--links between worksheets

Differences were not significant

Working alone, no design training

    14% CERF  

Working alone, design training

    7% CERF  

Working in pairs, no design training

    7% CERF  

Working in pairs, design training

    7% CERF  
Janvrin & Morrison 2nd (2000b)         Task with 66 links

CER only reflects some formula cells of high risk--links between worksheets

No training in design

30   16.8% CERF Differences were not significant

Training in design

58   8.4% CERF Differences were not significant
Kreie, et al. (post test) (2000) 73 42% 2.5%    
Panko & Halverson (2001) 35 86% 4.6% CERV Undergraduates
Irons (2003)          

Wall task

11 18% 1.67% CERV Undergraduates and academic staff

Ball task

7 71% 11.9% CERV  

*CERF: Based on formula cells only.
*CERV: Based on all value cells (numbers and formulas).

Brown, P. S. & Gould, J. D. "An Experimental Study of People Creating Spreadsheets," ACM Transactions on Office Information Systems (5:3) July 1987, pp. 258-272.

9 highly experienced SS developers each developed 3 models. All made at least one error in one of their three models. Using a minimum definition of errors that ignored omission errors, 44% of the spreadsheets had errors. Using a full definition that included omission errors, 63% of the spreadsheets contained errors.


Chadwick, D.; Knight, J. & Clipsham, P., "Information Integrity in End-User Systems," in Integrity and Internal Control in Information Systems, Volume 1: Increasing the Confidence in Information Systems, S. Jajodia, W. List, G. McGregor, and L. Strous, Eds., Chapman & Hall: London, 1997, pp. 273-292.

Students and experienced spreadsheet developers in industry were asked a series of multiple choice questions to assess their spreadsheet development and planning skills. Both groups tended to make the same types of errors.

The paper notes that a major problem is that users are trained how to do things right but not how to avoid doing things wrong.


Floyd, B.D. & Pyun, J. Errors in Spreadsheet Use, Working Paper 167, Center for Research on Information Systems, Information Systems Department, New York University, New York, October, 1987.

Cell entry stage experiment. Studied text entries in the Olson & Nilsen [1985, 1987-1988] data. Cell error rate text for cells with typographical errors was 12.5%.


Hassinen, Kari. An Experimental Study of Spreadsheet Errors Made by Novice Spreadsheet Users, Department of Computer Science, University of Joensuu, P. O. Box 111, SF-80101 Joensuu, Finland, 1988.

Development stage experiment. Novice students developed several spreadsheets apiece. (The number of spreadsheets completed during the time interval varied by students.) There was a template with missing formulas to be filled in. With paper and pencil development (N=92), the CER for the formula cells was 4.3%. 55% of the spreadsheets contained errors. As a check, ten more subjects used computers. 48% of their spreadsheets contained errors.


Irons, Richard J (2003). The Wall and the Ball: A Study of Domain Referent Spreadsheet Errors, EuSpRIG Conference Proceedings 2003, Trinity College, Dublin, July 24-25 2003, 33-43.

The paper's subjects attempted to complete two spreadsheets. One was the classic Wall task. The other was the Ball task, which proved to be impossible for 4 of 11 subjects to complete. Most of the completed Ball spreadsheets contained errors. Quite simply, the ball task was beyond the capabilities of the subjects.

Janvrin, D., & Morrison, J. (2000a). Factors Influencing Risks and Outcomes in End-User Development. Proceedings of the Twenty-Ninth Hawaii International Conference on System Sciences, Kihei, Maui, Hawaii.

Janvrin, D., & Morrison, J. (2000b). Using a Structured Design Approach to Reduce Risks in End User Spreadsheet Development, Information & Management, (37), 1-12.

Two experiments were conducted to reduce the number of errors during development. Although the results were promising, they generally did not reach statistical significance.



Kreie, Jennifer; Cronan, Timothy Paul; Pendley, John; & Renwick, Janet S. (2000) "Application Development by End-Users: Can Quality be Improved?" Decision Support Systems, 29, pp. 143-152.

73 subjects were taught a design methodology. Design quality improved, but errors did not fall significantly. In the pretest before the training, 52% of the 73 spreadsheets were incorrect and the cell error rate was 2.8%. In the posttest after training, these figures were 42% and 2.5%.


Lerch, F.J. Computerized Financial Planning: Discovering Cognitive Difficulties in Knowledge Building. Unpublished doctoral dissertation, University of Michigan, Ann Arbor, Michigan, 1988.

Cell entry stage experiment. Data are for errors before corrections. The study observed 21 Lotus 1-2-3 users from industry as they entered formulas in a template. For all formula cells, the CER was 9.3%. For formulas that referenced cells in the same row, the CER was 6.4%. For formulas referencing cells in the same column, it was 4.7%. For formulas referencing cells in different rows and different columns, the CER was 14.4%. For conceptual (logic) errors, the CER was 5.0%. For overload (mechanical) errors, the error rate was 4.3%. The study also examined users of a financial modeling program, IFPS. IFPS error rates were lower overall and were lower in some (but not all) error categories.


Olson, J. R. & Nilsen, E. "Analysis of the Cognition Involved in Spreadsheet Interaction," Human-Computer Interaction (3:4) 1987-1988, pp. 309-349.

Cell entry stage experiment, in which error rates were measured before any corrections. Observed 14 subjects entering a spreadsheet model from printout. Measured errors in 4 formula entries each. CER was 21%.


Panko, Raymond R., “Two Experiments in Reducing Overconfidence in Spreadsheet Development,” Journal of Organizational and End User Computing, 19(1), January–March 2007, 1-23.

The article suggests that overconfidence may be the reason for poor spreadsheet development and security practices. The paper reports on a technique to reduce overconfidence and improve spreadsheet accuracy. In this experiments, subjects in the treatment group were told the percentage of other students who had made errors developing a spreadsheet from a certain word problem. The control group was not given this information. Overconfidence in the treatment group was significantly low than in the control group, and the treatment group had significantly fewer incorrect spreadsheet models. However, both effects were small in size.


Panko, Raymond R. and Halverson, Richard P., Jr., “An Experiment in Collaborative Spreadsheet Development” 2(4) Journal of the Association for Information Systems, July 2001.

Replication of the 1997 experiment with individuals and groups of three. Fixed some methodological problems.


Panko, R.R. & Halverson, R.P, Jr., "Are Two Heads Better than One? (At Reducing Spreadsheet Errors in Spreadsheet Modeling?" Office Systems Research Journal (15:1) Spring 1997, pp. 21-32.

Subjects were undergraduate general business students, who worked alone, in groups of two (dyads), or in groups of four (tetrads). Students who were accounting or finance majors were excluded. Subjects built a spreadsheet based on the Galumpke word problem. This problem required them to build a pro forma income statement.

Variable

Alone

Dyads

Tetrads

Subjects

42

46

44

Groups

42

23

11

Incorrect spreadsheets

79%

78%

64%

Cell Error Rate

5.6%

3.8%

1.9%

Overall, groups of four reduced errors considerably compared to individuals (65%), while dyads reduced errors only slightly (32%). However tetrads were most successful in reducing Eureka logic errors (100%) and omission errors (88%), while they were less successful at reducing Cassandra logic errors (31%) and mechanical errors (67%). Cassandra logic errors are those that are difficult to prove to be errors even if the error is identified by a group member, while eureka logic errors are easily proven to be errors if discovered by a group member. Mechanical errors happened very quickly, and partners did not look at the screen every moment.


Panko, R.R. & Sprague, R.H., Jr. "Hitting the Wall: Errors in Developing and Code Inspecting a 'Simple' Spreadsheet Model, Decision Support Systems, (22,4), April 1998, 337-353.

Subjects developed a spreadsheet model based on the "Wall" task, which was designed to be simple and relatively free of domain knowledge. Subjects took the task home to avoid the concern that being forced to do development in laboratories was the cause of high error rates in earlier studies. The problem was given to 102 undergraduate MIS majors and to 50 MBA students, 17 of whom had more than 250 hours of spreadsheet development experience. Overall, subjects had errors in 35% of their spreadsheets and a cell error rate of 2.0%. Undergraduates and inexperienced MBAs has very similar error rates. Experienced MBAs had a lower CER of 1.1%, but even they had errors in 24% of their spreadsheets. None of the differences were statistically significant. More importantly, none of the differences were of practical importance. Even with a simple, almost domain-free task and no requirement to do the work in a laboratory, error rates were unacceptably high, even for experienced developers among the MBA students.


Panko, Raymond R. & Sprague, Ralph H. Jr., "Experiments in Spreadsheet Development: Task Difficulty, Level of Expertise, and Error Rate," Working Paper, Information Systems Department, College of Business Administration, University of Hawaii, 2404 Maile Way, Honolulu, HI, 96822, September, 1997.

One potential criticism of spreadsheet development experiments is that many have used undergraduate student subjects. In this study, both undergraduate students and MBAs solved the Galumpke and Wall tasks. MBAs were further subdivided into those with 100 hours or less of spreadsheet development experience and those with 250 hours or more. For the simple and relatively domain free wall task, there were no differences in error rates across groups. For the more difficult Galumpke task that required accounting knowledge, MBAs did significantly better than undergraduates, but experience made little difference among MBAs. In neither task did development experience make a significant difference. Even for experienced spreadsheet developers, error rates were much too high for safety.



Panko, R. R., & Halverson, R. P., Jr., "An Experiment in Collaborative Spreadsheet Development," Working Paper, College of Business Administration, University of Hawaii, 2404 Maile Way, Honolulu, HI 96822, August, 1998.

This is a follow-on to the "Are Two Heads Better than One" paper noted above, using tighter experimental controls and exploring additional research topics. Subjects worked alone or in groups of three (triads).

35 students working alone made errors in 86% of their spreadsheets and had a cell error rate of 4.6%. 45 students working in 15 triads made errors in 27% of their spreadsheets and had a cell error rate of 1.0%. The reduction in errors was statistically significant and practically important.


Teo, T.S.H. & Tan, M., "Quantitative and Qualitative Errors in Spreadsheet Development," Proceedings of the Thirtieth Hawaii International Conference on System Sciences, Kihei, Hawaii, January 1997.

168 first- second-year undergraduate students were given the Wall task developed by Panko. In the first task, they build the model. In a second task, given days later, they performed a what-if analysis. In the first task, they made errors in 42% of the spreadsheet and had a cell error rate of 2.05%. Of the errors, 27% were mechanical errors, 27% were logic errors, and 47% were omission errors.

In the second task, 50% of the spreadsheets had errors, and the cell error rate was 3.05%. While doing the second exercise, the subjects corrected 29 of their 86 errors but made 71 new errors. Mechanical errors had a higher correction rate than logic or omission errors, but the numbers were too small for statistical testing. Among the new errors, 72% were omission errors.

Some subjects jammed (did not put each input number in its own labeled cell). There was a significant positive correlation between the number of jamming errors and the number of logic errors for both tasks, which is reasonable because jamming makes formulas and logic difficult to read.

Some subjects duplicated, meaning that they typed the same input number twice in different cells. This actually had a negative correlation with the number of mechanical errors, presumably because duplication placed numbers in the correct column in a two-year model. However the same effect could be done more safely, without duplication, by placing a formula in the second cell referring to the first cell.


Copyright Panko 1997-2005