Recognize Chinese Simplified From Image Using pytesseract and Tesseract-OCR – Tesseract-OCR Tutorial

By | October 21, 2020

In this tutorial, we will show how to recognize Simplified Chinese text in an image using pytesseract and Tesseract-OCR.

Download chi_sim.traineddata

To recognize Simplified Chinese text in an image, Tesseract needs the chi_sim.traineddata language file. Download it and place it in Tesseract's tessdata directory.
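Before running any OCR, it can help to confirm that the language file is where Tesseract will look for it. The following is a minimal sketch, not part of pytesseract: has_traineddata is a hypothetical helper, the fallback path is only an example of a Linux install location, and it assumes the TESSDATA_PREFIX environment variable (if set) points at the tessdata directory itself.

```python
import os
from pathlib import Path


def has_traineddata(tessdata_dir, lang="chi_sim"):
    """Return True if <lang>.traineddata exists in the given tessdata directory."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


# Hypothetical fallback location; adjust it to match your own installation.
tessdata_dir = os.environ.get(
    "TESSDATA_PREFIX", "/usr/share/tesseract-ocr/4.00/tessdata"
)
print(tessdata_dir, has_traineddata(tessdata_dir))
```

If this prints False, the file is either missing or stored in a different directory than the one checked.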

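If pytesseract reports that it failed to load the language, see: Fix Python Tesseract Failed loading language 'chi_sim' Error

Once the language file is in place, we can start recognizing text.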

Recognize Simplified Chinese text from an image

Here is an example that shows how to do it.

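from PIL import Image
import pytesseract

img_path = 'test.png'
im = Image.open(img_path)
# Convert to greyscale ('L' mode) before running OCR.
imgrey = im.convert('L')
# Open the greyscale image in the default viewer to check it (optional).
imgrey.show()

# lang='chi_sim' selects the Simplified Chinese language data.
text = pytesseract.image_to_string(imgrey, lang='chi_sim')

print(text)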

Here test.png is an image that contains some Simplified Chinese text.


To improve recognition accuracy, we can convert the image to greyscale:

imgrey = im.convert('L')
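To make the conversion less of a black box: Pillow documents the 'L' mode conversion of an RGB image as the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below applies that formula to single pixels in plain Python. rgb_to_grey is a hypothetical helper written for illustration, and Pillow's own rounding may differ slightly from the integer division used here.

```python
def rgb_to_grey(r, g, b):
    """Approximate the 'L' value of one RGB pixel using ITU-R 601-2 luma weights."""
    return (r * 299 + g * 587 + b * 114) // 1000


# Green contributes most to the perceived brightness, blue the least.
for name, pixel in [("white", (255, 255, 255)),
                    ("black", (0, 0, 0)),
                    ("red", (255, 0, 0)),
                    ("green", (0, 255, 0)),
                    ("blue", (0, 0, 255))]:
    print(name, rgb_to_grey(*pixel))
```

Each pixel ends up as a single brightness value from 0 to 255, so the OCR engine sees only light and dark, not color.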

Finally, we pass lang='chi_sim' to pytesseract.image_to_string() to recognize the Simplified Chinese text in this image.
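Tesseract also accepts several language codes joined with '+', which may help when an image mixes Chinese with English words. The following is a minimal sketch under the assumption that eng.traineddata is installed alongside chi_sim.traineddata. build_lang and recognize are hypothetical helper names, not part of pytesseract.

```python
def build_lang(*codes):
    """Join Tesseract language codes with '+', e.g. 'chi_sim+eng'."""
    return "+".join(codes)


def recognize(img_path, *codes):
    """Run OCR on a greyscale copy of the image with the given language codes."""
    # Imported here so build_lang can be used without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(img_path) as im:
        imgrey = im.convert('L')
        return pytesseract.image_to_string(imgrey, lang=build_lang(*codes))


# Example call (requires Tesseract-OCR plus both language files):
# text = recognize('test.png', 'chi_sim', 'eng')
```

Whether the combined setting improves results depends on the image, so it is worth comparing its output against plain lang='chi_sim'.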
