In this tutorial, we will show how to recognize Simplified Chinese text in an image using pytesseract and Tesseract-OCR.
Download chi_sim.traineddata
To recognize Simplified Chinese text in an image, Tesseract needs the chi_sim.traineddata language file. Download it and place it in Tesseract's tessdata directory.
If pytesseract then reports a "Failed loading language 'chi_sim'" error, see our tutorial: Fix Python Tesseract Failed loading language ‘chi_sim’ Error.
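Before running OCR, it can help to confirm that the language file is where Tesseract expects it. The sketch below is a minimal check under a few assumptions: `has_traineddata` and `tessdata_dir_from_env` are hypothetical helper names, and `TESSDATA_PREFIX` (if set) is assumed to point at the tessdata folder itself, as in Tesseract 4 and later. `pytesseract.get_languages()` asks the installed Tesseract binary directly, so it needs both pytesseract and Tesseract.

```python
import os
from typing import List, Optional


def tessdata_dir_from_env() -> Optional[str]:
    """Hypothetical helper: read TESSDATA_PREFIX (assumed to be the tessdata folder)."""
    return os.environ.get("TESSDATA_PREFIX")


def has_traineddata(tessdata_dir: str, lang: str = "chi_sim") -> bool:
    """Hypothetical helper: True if <lang>.traineddata exists in tessdata_dir."""
    return os.path.isfile(os.path.join(tessdata_dir, f"{lang}.traineddata"))


def installed_languages() -> List[str]:
    """Ask the Tesseract binary which languages it can load.

    pytesseract is imported lazily because this call needs
    pytesseract and a working Tesseract install.
    """
    import pytesseract
    return pytesseract.get_languages(config="")


# Usage (requires Tesseract to be installed):
# print("chi_sim" in installed_languages())
```

Checking the file on disk only tells you the data exists; `installed_languages()` is the more direct test, because it reports what Tesseract itself can actually load.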
Once chi_sim.traineddata is installed, we can start recognizing text.
Recognize Simplified Chinese text from an image
Here is example code that shows how to do it:
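```python
from PIL import Image
import pytesseract

img_path = 'test.png'
im = Image.open(img_path)

# Convert to 8-bit greyscale ("L" mode) before OCR
imgrey = im.convert('L')
imgrey.show()  # optional: preview the greyscale image

# Recognize Simplified Chinese using chi_sim.traineddata
text = pytesseract.image_to_string(imgrey, lang='chi_sim')
print(text)
```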
Here, test.png is an image that contains some Simplified Chinese text.
Converting the image to greyscale before OCR can help improve recognition accuracy:
```python
imgrey = im.convert('L')
```
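For an RGB input, Pillow documents the "L" conversion as the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below builds a tiny in-memory image (standing in for test.png) and compares Pillow's result with that formula; `luma_601` is a hypothetical helper name. Pillow uses integer arithmetic internally, so its values should agree with the float formula to within about one grey level.

```python
from PIL import Image


def luma_601(r: int, g: int, b: int) -> float:
    """ITU-R 601-2 luma, the formula Pillow documents for RGB -> "L"."""
    return r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000


# A tiny 2x1 RGB image stands in for test.png
rgb = Image.new("RGB", (2, 1))
rgb.putpixel((0, 0), (255, 0, 0))      # pure red
rgb.putpixel((1, 0), (200, 150, 100))  # a brownish pixel

grey = rgb.convert("L")  # same call as in the tutorial

for x in range(rgb.width):
    r, g, b = rgb.getpixel((x, 0))
    # Pillow's integer grey value next to the float formula
    print(grey.getpixel((x, 0)), round(luma_601(r, g, b), 2))
```

Each pixel ends up as a single brightness value, which removes colour variation that Tesseract does not need for reading the characters.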
Finally, passing lang='chi_sim' to pytesseract.image_to_string() tells Tesseract to recognize the Simplified Chinese text in the image.
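Tesseract also accepts several language codes joined with `+`, which can help when a Chinese image also contains English words. The sketch below wraps the tutorial's steps in a function; `build_lang` and `recognize` are hypothetical helper names, using `eng` assumes eng.traineddata is installed as well, and calling `recognize` requires Pillow, pytesseract and Tesseract.

```python
def build_lang(*codes: str) -> str:
    """Join Tesseract language codes with '+', e.g. 'chi_sim+eng'."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def recognize(img_path: str, *codes: str) -> str:
    """Greyscale the image and OCR it with the given languages (default: chi_sim)."""
    # Imported here so build_lang() works without these packages installed
    from PIL import Image
    import pytesseract

    imgrey = Image.open(img_path).convert("L")
    lang = build_lang(*(codes or ("chi_sim",)))
    return pytesseract.image_to_string(imgrey, lang=lang)


# Usage (requires Tesseract with chi_sim and eng data installed):
# print(recognize("test.png", "chi_sim", "eng"))
```

With no codes given, `recognize` falls back to `chi_sim`, matching the call used in this tutorial.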