Errors in Spreadsheet Auditing Experiments

Errors appear to be inevitable in spreadsheet development, just as they are in programming. In fact, cell error rates in spreadsheet development are consistent with rates of faults per hundred lines of code in programming. In programming, achieving reliability requires an intensive testing program accounting for about a third of all development costs. One testing technique commonly used is code inspection, in which a team examines a program module line by line, looking for errors. Typically, inspectors work in two phases: an individual phase followed by a group meeting.

The Human Error website presents data from code inspection studies in software development. This research indicates that code inspection in programming is very difficult: individual inspectors tend to find only about half of all errors, and sometimes far fewer. As a result, software code inspection is done in teams. Even so, teams still catch only about 80% of all errors.

Research on code inspection in spreadsheeting yields results very similar to those in programming. Subjects working alone catch about half of all errors, and sometimes far fewer. Even team code inspection does not catch all errors.

There are two ways to present code inspection results. One is the detection rate: the percentage of all seeded errors that the subject discovers. The other is the error rate: the percentage of seeded errors that the subject fails to discover. The following table shows detection rates.
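To make the distinction concrete, here is a minimal Python sketch of the two measures; the counts are hypothetical rather than taken from any particular study:

```python
# Detection rate versus error rate for one inspection session.
# The counts below are hypothetical, for illustration only.

seeded_errors = 8   # errors deliberately planted in the model
errors_found = 4    # errors the inspector caught

detection_rate = errors_found / seeded_errors                  # fraction caught
error_rate = (seeded_errors - errors_found) / seeded_errors    # fraction missed

print(f"Detection rate: {detection_rate:.0%}")   # Detection rate: 50%
print(f"Error rate: {error_rate:.0%}")           # Error rate: 50%
```

The two measures are complements: for a given set of seeded errors, they always sum to 100%.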

| Study | Subjects | Sample Size | % of Errors Detected | Remarks |
|-------|----------|-------------|----------------------|---------|
| Galletta et al. (1993) | MBA students & CPAs taking a continuing education course | | | Budgeting task containing seeded errors |
| | Total sample | 60 | 56% | |
| | CPA novices (<100 hours of spreadsheet work experience) | 15 | 57% | |
| | CPA experts (>250 hours of spreadsheet work experience) | 15 | 66% | |
| | MBA students, novices (<100 hours of spreadsheet work experience) | 15 | 52% | |
| | MBA students, experienced (>250 hours of spreadsheet work experience) | 15 | 48% | |
| | Domain (logic) errors corrected | | 46% | |
| | Device (mechanical) errors corrected | | 65% | |
| Galletta et al. (1997) | MBA students | | 51% | Same task as the 1993 study |
| | Overall | 113 | 51% | |
| | On-screen | 45 | 45% | |
| | On paper | 68 | 55% | |
| Panko (1999) | | | | Modified version of the Galletta Wall task |
| | Undergraduates working alone | 60 | 63% | |
| | Undergraduates working in groups of three | 60 | 83% | |
| Panko & Sprague (1998) | Undergraduates | 23 | 16% | Students who made errors in the Wall task and then inspected their own spreadsheets |
| Clermont & Mittermeir, "Auditing Large Spreadsheet Programs" | | | | Uses the Galletta/Panko model. Found 5 of 7 counted errors; did not count or find the omission errors (5 of 9 when omissions are included) |


Galletta, D.F.; Abraham, D.; El Louadi, M.; Lekse, W.; Pollalis, Y.A.; & Sampler, J.L., "An Empirical Study of Spreadsheet Error-Finding Performance," Journal of Accounting, Management, and Information Technology, 3(2), April-June 1993, pp. 79-95.

30 CPAs taking a professional education course and 30 MBA students. Each subject debugged six models, each seeded with one device (mechanical) error and one domain (logic) error. Subjects were subdivided into spreadsheet experts, with more than 250 hours of experience, and novices, with fewer hours. The authors argued that this would have produced approximately 2,000 feedback-related errors. Subjects missed 54% of domain errors and 35% of device errors. Accountants missed significantly fewer errors, owing mainly to better detection of domain errors. Expertise increased speed but did not reduce errors: experts caught 57% of the errors, while novices caught 55%.
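As a rough consistency check on these figures (my arithmetic, not a calculation from the paper), each model seeded one error of each type, so the overall detection rate should sit near the simple average of the two per-type rates:

```python
# Consistency check on the Galletta et al. (1993) figures above.
# Each model seeded one domain and one device error, so the two
# error types carry equal weight in the overall detection rate.

domain_detected = 0.46   # 100% - 54% of domain (logic) errors missed
device_detected = 0.65   # 100% - 35% of device (mechanical) errors missed

overall = (domain_detected + device_detected) / 2
print(f"Implied overall detection rate: {overall:.1%}")  # 55.5%, near the reported 56%
```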


Galletta, D.F.; Hartzel, K.S.; Johnson, S.; & Joseph, J.L., "Spreadsheet Presentation and Error Detection: An Experimental Study," Journal of Management Information Systems, 13(3), Winter 1996-1997, pp. 45-63.

113 MBA students debugged a single model seeded with eight errors. Subjects examined the model either on-screen or on paper. Overall, subjects caught 51% of the seeded errors. With the on-screen presentation, subjects caught fewer errors than they caught with the paper presentation. This paper-versus-screen difference is consistent with past research cited in the paper.
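Again as a consistency check (my arithmetic, not the paper's), weighting the on-screen and paper subgroup rates by their sample sizes reproduces the overall figure:

```python
# Weighted-average check on the Galletta et al. (1997) figures above.
on_screen_n, on_screen_rate = 45, 0.45
on_paper_n, on_paper_rate = 68, 0.55

total_n = on_screen_n + on_paper_n   # 113 subjects in all
overall = (on_screen_n * on_screen_rate + on_paper_n * on_paper_rate) / total_n
print(f"Implied overall detection rate: {overall:.0%}")  # 51%, as reported
```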


Panko, R.R. & Sprague, R.H., Jr., "Hitting the Wall: Errors in Developing and Code Inspecting a 'Simple' Spreadsheet Model," Decision Support Systems, 22, 1998, pp. 337-353.

Undergraduate MIS majors who had built a spreadsheet for the Wall task were given the opportunity to correct their errors. No subject with a correct model changed it. Of the 23 subjects who had originally made an error in the spreadsheet, only four (17%) completely corrected the spreadsheet. They corrected 18% of the individual errors.


Panko, Raymond R., "Applying Code Inspection to Spreadsheet Testing," Journal of Management Information Systems, 16(2), Fall 1999, pp. 159-176.

Spreadsheet errors appear to be about as frequent as errors in programming. Programming reliability requires an extensive testing phase that may include code inspection. A similar testing phase may be needed in spreadsheeting.

We conducted an experiment using the full two-phase programming code inspection methodology. In addition, subjects were required to spend a minimum amount of time on both the individual and group inspections, to prevent hasty inspection.

Individual code inspection, consistent with past studies, caught only 63% of all errors. Detection jumped to 83% in the three-person group code inspection phase. However, the group meetings did not surface any errors beyond those members had already found individually (the gain came from pooling individual findings), and one group even lost an error found by one of its members in the individual phase. This raises the question of whether a group code inspection phase is really necessary.
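One way to see why the meetings added so little is to compare the real groups against a nominal-group benchmark in which three members' individual findings are simply pooled. The sketch below is illustrative only: it assumes members catch errors independently at the 63% individual rate, an assumption the experiment itself does not test.

```python
# Nominal-group benchmark for three-person code inspection.
# Illustrative assumption: each member independently catches a given
# error with probability equal to the 63% individual detection rate.

individual_rate = 0.63
group_size = 3

# A nominal group misses an error only if all three members miss it.
nominal_rate = 1 - (1 - individual_rate) ** group_size
print(f"Nominal three-person group: {nominal_rate:.0%}")  # about 95%
```

The real groups' 83% lies above the 63% individual rate but well below this independence benchmark, consistent with members tending to miss the same errors.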

Subjects were somewhat overconfident in their ability to detect errors when working alone.

Omission errors and errors in long formulas were difficult to detect.

Group code inspection produced the largest gains for the types of errors that individuals found most difficult to detect.


Reithel, Brian J.; Nichols, Dave L.; & Robinson, Robert K., "An Experimental Investigation of the Effects of Size, Format, and Errors on Spreadsheet Reliability Perception," Journal of Computer Information Systems, 36(3), Spring 1996, pp. 54-64.

Subjects were shown spreadsheets in a two-by-two design: large or small, and poorly formatted or well formatted. Paradoxically, subjects had much more confidence in the accuracy of large, well-formatted spreadsheets than in the other three types. This is paradoxical because large spreadsheets should have more errors than small spreadsheets.

Copyright Panko 1997-2006.