eScholarship
Open Access Publications from the University of California

Using Nature Language Processing to Improve Optical Character Recognition

Abstract

OCR (Optical Character Recognition) has been under development for over 100 years. However, it does not work well when the document or picture is stained. Considering that the words in a text are connected by logical relationships, and borrowing from license plate recognition the idea of reducing the size of the word stock, this paper establishes an N-gram model and uses Google search engine results to alleviate the model's data sparsity. Using the residual features of the original stained characters together with a smaller word stock successfully improves the recognition rate and accuracy.
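
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a bigram (N = 2) model with add-one smoothing, a tiny in-memory corpus standing in for the Google search counts, and a pattern string in which `?` marks a stained character and the remaining letters are the residual features. All function names and the sample corpus are hypothetical.

```python
from collections import Counter


def build_bigram_counts(sentences):
    """Count unigrams and bigrams in a small corpus (stand-in for search-engine counts)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def matches_residual(word, pattern):
    """True if the word agrees with every still-legible character; '?' marks a stained one."""
    return len(word) == len(pattern) and all(
        p == "?" or p == c for c, p in zip(word, pattern)
    )


def restore_word(prev_word, pattern, unigrams, bigrams):
    """Pick the most probable word after prev_word among the residual-compatible candidates."""
    # Residual features shrink the word stock to a handful of candidates.
    candidates = [w for w in unigrams if matches_residual(w, pattern)]
    if not candidates:
        return None
    vocab_size = len(unigrams)

    def score(word):
        # Add-one smoothed bigram probability P(word | prev_word).
        return (bigrams[(prev_word, word)] + 1) / (unigrams[prev_word] + vocab_size)

    return max(candidates, key=score)


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the rat",
    "a bat flew over the hat",
]
unigrams, bigrams = build_bigram_counts(corpus)
restored = restore_word("the", "?at", unigrams, bigrams)
```

Here the pattern `?at` narrows the vocabulary to a few three-letter words, and the preceding word decides among them. In a real system the counts would come from a much larger source, which is the role the abstract assigns to search engine results.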
