python - Improving OCR accuracy with Pytesseract

Asked 2020-09-28T09:14:52.040

Viewed 2299 times

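I want to extract text from an image in Python, and I chose pytesseract for the job. When I tried to extract the text from the image, the results were not satisfactory. I also went through this and implemented all the techniques listed there. Even so, it does not seem to work well.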
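Because "not satisfactory" is hard to compare from one attempt to the next, one option is to score each OCR run against a hand-typed transcription of the image. The following is a minimal sketch in plain Python, assuming such a reference string is available; `edit_distance`, `word_error_rate` and `char_error_rate` are hypothetical helper names, not part of pytesseract:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists),
    where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        # Undefined for an empty reference; treat any output as fully wrong.
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

A word error rate of 0.0 means every word matches. Splitting "those" into "t hose" costs two word edits but only one character edit, so for a task where a single stray space matters, the word-level score is the stricter measure.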

Image:

Code:

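import pytesseract
import cv2
import numpy as np

# Load the screenshot as a BGR color image
img = cv2.imread('D:\\wordsimg.png')

# Upscale by 20% with bicubic interpolation
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)

# Convert to grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Dilate, then erode (note: a 1x1 kernel leaves the image unchanged,
# so these two calls have no effect)
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)

# Median-blur to remove speckle noise, then binarize with Otsu's threshold
img = cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Tell pytesseract where the Tesseract executable is installed
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

# Run OCR with the English model
txt = pytesseract.image_to_string(img, lang='eng')

# Drop the last character of the output (often a trailing form-feed page separator)
txt = txt[:-1]

# Join the lines into one space-separated string
txt = txt.replace('\n', ' ')

print(txt)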
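The last two post-processing steps depend on the exact shape of Tesseract's output: `txt[:-1]` removes exactly one character, and `replace('\n', ' ')` leaves double spaces wherever the output contains blank lines. A sketch of a more forgiving normalization (the name `normalize_whitespace` is hypothetical) collapses every whitespace run, including newlines and form feeds, into a single space:

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, newlines, tabs, form feeds)
    into a single space and trim both ends."""
    # str.split() with no argument splits on any whitespace run and drops
    # empty strings, so leading and trailing whitespace disappear as well.
    return " ".join(text.split())
```

In the code above it would take the place of the `txt[:-1]` and `replace` lines: `txt = normalize_whitespace(pytesseract.image_to_string(img, lang='eng'))`. It only tidies spacing between lines; it cannot rejoin a word that Tesseract itself split, such as "t hose".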

Output:

t hose he large form might light another us should took mountai house n story important went own own thought girl over family look some much ask the under why miss point make mile grow do own school was
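The output contains split words ("t hose") and a fragment that appears after the wrong word ("mountai house n"), which may point at page segmentation rather than image preprocessing. The code calls `image_to_string` without a `config`, so Tesseract falls back to its default page segmentation mode 3 (fully automatic). A small helper (the name `tesseract_config` is hypothetical) can build a config string with an explicit `--psm`, `--oem` and, optionally, a character whitelist; the whitelist example assumes the image contains only lowercase ASCII words, as the output suggests:

```python
import string


def tesseract_config(psm=6, oem=3, whitelist=None):
    """Build a Tesseract command-line option string for the `config=`
    argument of pytesseract.image_to_string."""
    # --oem 3: default engine; --psm 6: assume one uniform block of text
    parts = [f"--oem {oem}", f"--psm {psm}"]
    if whitelist:
        # Restrict recognition to these characters (check that your
        # Tesseract version honors this variable with the LSTM engine)
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


LOWERCASE_ONLY = tesseract_config(psm=6, whitelist=string.ascii_lowercase)
```

Usage would look like `txt = pytesseract.image_to_string(img, lang='eng', config=LOWERCASE_ONLY)`. Besides 6, modes 4 (a single column of text of variable sizes) and 11 (sparse text) are worth comparing on the same image. As far as I know, the whitelist is ignored by the LSTM engine in Tesseract 4.0 and honored again from 4.1 onward, so it is worth checking the installed version.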

Even a single unwanted space can cost me a lot, so I need the result to be 100% accurate. Any help would be appreciated. Thanks!
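If the words in the image come from a known list (the output looks like common English words, but that is an assumption), a dictionary-based post-pass can rejoin some fragments. This sketch merges a token with its right neighbour when the token is not in `vocabulary` but the concatenation is; `merge_split_words` and `vocabulary` are hypothetical, and the word list has to be supplied by the caller:

```python
def merge_split_words(text, vocabulary):
    """Join adjacent tokens when the left token is not a known word but its
    concatenation with the right token is (e.g. "t hose" -> "those")."""
    tokens = text.split()
    merged = []
    i = 0
    while i < len(tokens):
        current = tokens[i]
        if (
            i + 1 < len(tokens)
            and current not in vocabulary
            and current + tokens[i + 1] in vocabulary
        ):
            merged.append(current + tokens[i + 1])
            i += 2  # both fragments consumed
        else:
            merged.append(current)
            i += 1
    return " ".join(merged)


vocabulary = {"those", "he", "large", "form", "house", "story"}
fixed = merge_split_words("t hose he large form", vocabulary)
```

Here "t hose" becomes "those". It does not repair "mountai house n", because the stray "n" is separated from its word by another token; that kind of error has to be fixed at the recognition stage. It can also mis-merge when a correctly read word is missing from the vocabulary, so on its own it cannot guarantee 100% accuracy.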
