Errors appear to be
inevitable in spreadsheet development, just as they are in programming. In
fact, cell error rates in spreadsheeting are consistent with the rates of faults
per hundred lines of code in programming. In programming, safety requires an
intense testing program accounting for about a third of all development costs.
One testing technique commonly used is code inspection, in which a team
examines a program module line by line, looking for errors. Typically, they
work in two phases--an individual phase followed by a group meeting.
The Human Error website presents data from code
inspection studies in software development. The research indicates that code
inspection in programming is very difficult. Experiments indicate that
inspectors tend to find only half of all errors and sometimes much less than
half. As a result, software code inspection is done in teams. Even so, teams
still only catch about 80% of all errors.
Research on code
inspection in spreadsheeting yields results very similar to those in
programming. Subjects working alone catch about half of all errors or sometimes
much less. Even team code inspection does not catch all errors.
There are two ways of
presenting code inspection results. One is the detection rate: the percentage
of all seeded errors that the subject discovers. The other is the error
rate: the percentage of seeded errors that the subject does not discover.
The following table shows the percentage of errors detected.
| Study | Subjects | Sample Size | % of Errors Detected | Remarks |
|---|---|---|---|---|
| Galletta et al. (1993) | MBA students & CPAs taking a continuing education course | | | Budgeting task containing seeded errors |
| Total sample | | 60 | 56% | |
| CPA novices | <100 hours of work experience with SSs | 15 | 57% | |
| CPA experts | <100 hours of work experience with SSs | 15 | 66% | |
| MBA students, novices | >250 hours of work experience with SSs | 15 | 52% | |
| MBA students, experienced | >250 hours of work experience with SSs | 15 | 48% | |
| Domain (logic) errors corrected | | | 46% | |
| Device (mechanical) errors corrected | | | 65% | |
| Galletta et al. (1997) | MBA students | | 51% | Same task used in the 1993 study |
| Overall | | 113 | 51% | |
| On-Screen | | 45 | 45% | |
| On Paper | | 68 | 55% | |
| Panko (1999) | | | | Modified version of Galletta wall task. |
| Undergrads working alone | | 60 | 63% | |
| Undergrads working in groups of three | | 60 | 83% | |
| Panko & Sprague (1998) | Undergrads | 23 | 16% | Students who made errors in the Wall task who then went on to inspect their own spreadsheets. |
Clermont, Markus & Mittermeir, R. "Auditing Large Spreadsheet Programs."
Uses Galletta/My model
Found 5/7 errors.
Did not count or find omission errors.
Caught 5/9 then
Galletta, D.F.; Abraham, D.; El
Louadi, M.; Lekse, W.; Pollailis, Y.A.; & Sampler, J.L. "An Empirical
Study of Spreadsheet Error-Finding Performance," Journal of Accounting,
Management, and Information Technology (3:2) April-June 1993, pp. 79-95.
The subjects were 30 CPAs taking a
professional education course and 30 MBA students. Each subject debugged six
models. Each model had one seeded device (mechanical) error and one seeded domain
(logic) error. Subjects were subdivided into spreadsheet (SS) experts with more
than 250 hours of experience and novices with fewer hours. The authors argued
that this would have produced approximately 2,000 feedback-related errors.
Subjects missed 54% of domain errors and 35% of device errors. Accountants made
significantly fewer errors, due to fewer domain errors. Expertise increased
speed but did not reduce errors. Experts caught 57% of the errors while novices
caught 55%.
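The miss rates quoted here are the complements of the detection rates in the results table. A minimal Python sketch of that conversion follows; the function name `miss_rate` and the dictionary layout are illustrative assumptions, not anything taken from the study itself.

```python
def miss_rate(detection_rate_pct):
    """Return the percentage of seeded errors missed, given the percentage detected."""
    if not 0 <= detection_rate_pct <= 100:
        raise ValueError("rate must be a percentage between 0 and 100")
    return 100 - detection_rate_pct


# Detection rates for Galletta et al. (1993), as listed in the results table.
detected = {"domain (logic)": 46, "device (mechanical)": 65}

# Convert each detection rate into the corresponding miss rate.
missed = {kind: miss_rate(rate) for kind, rate in detected.items()}

for kind, rate in missed.items():
    print(f"{kind}: {rate}% of seeded errors missed")
```

Under these figures, the sketch reproduces the 54% and 35% miss rates reported for domain and device errors.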
Galletta, D.F.; Hartzel, K.S.; Johnson, S.;
& Joseph, J.L., "Spreadsheet Presentation and Error Detection: An
Experimental Study," Journal of Management Information Systems, 13(3)
Winter 1996-1997, pp. 45-63.
113 MBA students
debugged a single model seeded with eight errors. Subjects either looked at the
model on-screen or on paper. Overall, subjects caught 51% of the seeded errors.
With the on-screen presentation, subjects caught fewer errors than they caught
with paper presentations. This paper-versus-screen difference is consistent
with past research cited in the paper.
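As a rough consistency check, the overall detection rate can be recomputed as a sample-size-weighted average of the two presentation conditions. The sketch below uses the sample sizes and rates listed in the results table and assumes that every subject carries equal weight; the variable names are illustrative only.

```python
# (sample size, % of seeded errors detected) for each presentation condition,
# taken from the results table for the Galletta et al. MBA study.
groups = {
    "on-screen": (45, 45.0),
    "on paper": (68, 55.0),
}

# Total number of subjects across both conditions.
total_n = sum(n for n, _ in groups.values())

# Weight each condition's detection rate by its number of subjects.
overall = sum(n * rate for n, rate in groups.values()) / total_n

print(f"subjects: {total_n}, weighted detection rate: {overall:.1f}%")
```

Under that assumption, the weighted figure agrees with the 51% overall rate reported for the 113 subjects.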
Panko, R.R. & Sprague, R.H., Jr. "Hitting the Wall: Errors in
Developing and Code Inspecting a 'Simple' Spreadsheet Model," Decision
Support Systems, 22, 1998, 337-353.
Undergraduate MIS
majors who had built a spreadsheet for the Wall task were given the opportunity
to correct their errors. No subject with a correct model changed it. Of 23
subjects who had made an error in the spreadsheet originally, only four (13%)
completely corrected the spreadsheet. They corrected 18% of the individual
errors.
Panko, Raymond R. "Applying Code Inspection to Spreadsheet Testing,"
Journal of Management Information Systems,
16(2), Fall 1999, 159-176.
Spreadsheet errors
appear to be about as frequent as errors in programming. Programming
reliability requires an extensive testing phase that may include code
inspection. A similar testing phase may be needed in spreadsheeting.
We conducted an
experiment using the full two-phase programming code inspection methodology. In
addition, subjects had a minimum completion time for the individual and group
inspections, to prevent hasty inspection.
Individual code
inspection, consistent with past studies, caught only 63% of all errors. This
jumped to 83% during the three-person group code inspection phase. However,
groups did not find any new errors, and one group even lost an error found by
one of its members in the individual phase. This raises the question of whether
a group code inspection phase is really necessary.
Subjects were somewhat
overconfident in their ability to detect errors when working alone.
Omission errors and errors
in long formulas were difficult to detect.
Group code inspection
produced the largest gains for the types of errors that individuals found most
difficult to detect.
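One way to put the 83% group figure in context is to compare it with a simple pooling benchmark. The sketch below assumes, purely for illustration, that every error is equally hard to find and that the three inspectors detect errors independently at the 63% individual rate. Neither assumption comes from the study, and the function name `pooled_detection` is hypothetical.

```python
def pooled_detection(p_individual, team_size):
    """Chance that at least one of team_size independent inspectors finds an error.

    Assumes every inspector detects any given error with the same
    probability p_individual, independently of the others.
    """
    return 1 - (1 - p_individual) ** team_size


# Individual detection rate reported for the individual phase.
benchmark = pooled_detection(0.63, 3)

# Group detection rate reported for the three-person phase.
observed = 0.83

print(f"independence benchmark: {benchmark:.1%}, observed: {observed:.0%}")
```

The benchmark comes out near 95%, well above the observed 83%. That gap is what one would expect if some errors, such as omission errors and errors in long formulas, are hard for every inspector, so that misses are not independent.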
Reithel, Brian J.; Nichols, Dave L.; &
Robinson, Robert K., "An Experimental Investigation of the Effects of
Size, Format, and Errors on Spreadsheet Reliability Perception," Journal
of Computer Information Systems, 36(3), Spring 1996, pp. 54-64.
Subjects were shown
spreadsheets that were large and small and were poorly formatted or
well-formatted. Paradoxically, subjects had much more confidence in the
accuracy of large, well-formatted spreadsheets than in the other three types.
This is paradoxical because large spreadsheets should have more errors than
small spreadsheets.
Copyright Panko
1997-2006.