Recognize Simplified Chinese Text from an Image Using pytesseract and Tesseract-OCR

In this tutorial, we will introduce how to recognize Simplified Chinese text in an image using pytesseract and Tesseract-OCR. Just follow the steps below.

Download chi_sim.traineddata

To recognize Simplified Chinese text in an image, Tesseract needs the Simplified Chinese language data file, chi_sim.traineddata. Download it and place it in Tesseract's tessdata directory.

If Tesseract then reports that it failed to load the language, read our tutorial: Fix Python Tesseract Failed loading language 'chi_sim' Error.
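Tesseract only finds chi_sim.traineddata if the file sits in the tessdata directory it searches. The sketch below is a hypothetical helper, `has_traineddata`, that checks a directory you supply for `<lang>.traineddata`, falling back to the `TESSDATA_PREFIX` environment variable. It assumes Tesseract 4 or later, where `TESSDATA_PREFIX` points at the tessdata directory itself; the default install location differs by platform, so the path in the example is only a placeholder.

```python
import os
from pathlib import Path
from typing import Optional


def has_traineddata(lang: str, tessdata_dir: Optional[str] = None) -> bool:
    """Return True if <lang>.traineddata exists in tessdata_dir.

    Hypothetical helper: if tessdata_dir is None, fall back to the
    TESSDATA_PREFIX environment variable (assumed to point directly at
    the tessdata directory, as in Tesseract 4+).
    """
    directory = tessdata_dir or os.environ.get("TESSDATA_PREFIX")
    if not directory:
        # No directory to inspect, so we cannot confirm the file is installed
        return False
    return (Path(directory) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    # Placeholder path: replace it with your own tessdata directory
    print(has_traineddata("chi_sim", "/path/to/tessdata"))
```

If the file lives somewhere Tesseract does not look by default, the pytesseract README shows passing the directory explicitly through the `config` argument, e.g. `config=r'--tessdata-dir "/path/to/tessdata"'` in `image_to_string`.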

Once the language data is in place, we can start recognizing text.

Recognize Simplified Chinese text from an image

Here is an example showing how to do it.

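from PIL import Image
import pytesseract

img_path = 'test.png'
im = Image.open(img_path)
# Convert to greyscale ("L" mode) before OCR
imgrey = im.convert('L')
# Optional: display the greyscale image for a visual check
imgrey.show()

# Run Tesseract with the Simplified Chinese model (chi_sim.traineddata)
text = pytesseract.image_to_string(imgrey, lang='chi_sim')

print(text)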

Here, test.png is an image that contains some Simplified Chinese text.

To help improve recognition accuracy, we convert it to greyscale first:

imgrey = im.convert('L')
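Pillow's documentation describes the conversion to mode "L" as the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below applies that formula to single RGB pixels in plain Python to show what `im.convert('L')` computes. Rounding to the nearest integer is our own choice here, and Pillow's internal fixed-point arithmetic may differ from it by one level for some colours.

```python
def luma(r: int, g: int, b: int) -> int:
    """Greyscale value of an RGB pixel using ITU-R 601-2 luma weights.

    Integer arithmetic; adding 500 before dividing by 1000 rounds
    to the nearest whole level (an assumption, not Pillow's exact code).
    """
    return (r * 299 + g * 587 + b * 114 + 500) // 1000


if __name__ == "__main__":
    pixels = [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
    print([luma(*p) for p in pixels])  # [255, 0, 76, 150, 29]
```

Each pixel collapses to a single value from 0 to 255, so the image handed to Tesseract carries brightness only.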

Finally, we pass lang='chi_sim' to pytesseract.image_to_string() so that Tesseract recognizes the Simplified Chinese text in the image.
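The `lang` value names one or more traineddata files, and Tesseract accepts several joined with `+`, for example `chi_sim+eng` for an image that mixes Simplified Chinese and English. The hypothetical helper below, `build_lang`, builds such a string and rejects codes that are not in a list of installed languages. In practice that list could come from `pytesseract.get_languages()`, which we assume is available in your pytesseract version.

```python
from typing import Iterable


def build_lang(codes: Iterable[str], available: Iterable[str]) -> str:
    """Join language codes with '+' for pytesseract's lang parameter.

    Hypothetical helper: raises ValueError if no code is given or if a
    code is not among the installed languages.
    """
    codes = list(codes)
    if not codes:
        raise ValueError("at least one language code is required")
    installed = set(available)
    missing = [c for c in codes if c not in installed]
    if missing:
        raise ValueError(f"language data not installed: {', '.join(missing)}")
    return "+".join(codes)


if __name__ == "__main__":
    installed = ["chi_sim", "eng", "osd"]  # e.g. from pytesseract.get_languages()
    print(build_lang(["chi_sim", "eng"], installed))  # chi_sim+eng
```

The result can go straight into the call from the example above, e.g. `pytesseract.image_to_string(imgrey, lang=build_lang(['chi_sim', 'eng'], pytesseract.get_languages()))`.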
