Performance evaluation of two OCR systems
TL;DRAbstract
An experimental protocol for the performance evaluation of Optical Character Recognition (OCR) algorithms is described. The protocol is intended to serve as a model for using the University of Washington English Document Image Database-I to evaluate OCR systems. The plain text zones (without special symbols) in this database have over 2,300,000 characters. The performances of two UNIX-based OCR systems, namely Caere OCR v109a and Xerox ScanWorX v2.0, are measured. The results suggest that Caere OCR outperforms ScanWorX in terms of recognition accuracy; however, ScanWorX is more robust in the presence of image flaws.
Chat with Paper
AI Agents for this Paper
An experimental protocol for the performance evaluation of Optical Character Recognition (OCR) algorithms is described. The protocol is intended to serve as a model for using the University of Washington English Document Image Database-I to evaluate OCR systems. The plain text zones (without special symbols) in this database have over 2,300,000 characters. The performances of two UNIX-based OCR systems, namely Caere OCR v109a and Xerox ScanWorX v2.0, are measured. The results suggest that Caere OCR outperforms ScanWorX in terms of recognition accuracy; however, ScanWorX is more robust in the presence of image flaws.
Keywords
Chat
Click to start Chat