Performance evaluation of two OCR systems

Sibo Chen, S. Subramaniam, R.M. Haralick, Ihsin T. Phillips. 1994-12-31. OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information)
Abstract

An experimental protocol for the performance evaluation of Optical Character Recognition (OCR) algorithms is described. The protocol is intended to serve as a model for using the University of Washington English Document Image Database-I to evaluate OCR systems. The plain text zones (without special symbols) in this database have over 2,300,000 characters. The performances of two UNIX-based OCR systems, namely Caere OCR v109a and Xerox ScanWorX v2.0, are measured. The results suggest that Caere OCR outperforms ScanWorX in terms of recognition accuracy; however, ScanWorX is more robust in the presence of image flaws.
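The abstract does not say how recognition accuracy is computed. As an illustration only, one common measure is character accuracy derived from the Levenshtein edit distance between the ground-truth text and the OCR output. The sketch below assumes that measure; the function names `edit_distance` and `character_accuracy` are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions needed to turn hyp into ref."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Assumed metric: (n - errors) / n, where n is the number of
    ground-truth characters and errors is the edit distance."""
    if not ref:
        raise ValueError("ground-truth text must not be empty")
    return (len(ref) - edit_distance(ref, hyp)) / len(ref)
```

For example, `character_accuracy("recognition", "recogniton")` counts one missing character against eleven ground-truth characters. Under this definition the value can fall below zero when the OCR output contains many spurious insertions.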

Keywords

Optical character recognition, Computer science, Unix, Artificial intelligence, Protocol (science), Information retrieval, Natural language processing, Character (mathematics)
