python - Image-to-text recognition with Tesseract-OCR works better when the image is preprocessed manually in Gimp than with my Python code

Question

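I am trying to write Python code that reproduces my manual image preprocessing and recognition with Tesseract-OCR.

Manual process:
To recognize the text in a single image manually, I preprocess the image in Gimp and create a TIF image. I then feed it to Tesseract-OCR, which recognizes it correctly.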

To preprocess the image using Gimp, I do the following -

1. Change the mode to RGB / Grayscale
   Menu -- Image -- Mode -- RGB
2. Thresholding
   Menu -- Tools -- Color Tools -- Threshold -- Auto
3. Change the mode to Indexed
   Menu -- Image -- Mode -- Indexed
4. Resize / Scale to Width > 300px
   Menu -- Image -- Scale image -- Width=300
5. Save as Tif

Then I feed it to tesseract -

$ tesseract captcha.tif output -psm 6

And I get accurate results every time.

Python code:
I have tried to replicate the above procedure using OpenCV and Tesseract -

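import cv2
from PIL import Image
# Assumption: image_to_string comes from pytesseract; the original snippet omits its imports.
from pytesseract import image_to_string

def binarize_image_using_opencv(captcha_path, binary_image_path='input-black-n-white.jpg'):
    # cv2.IMREAD_GRAYSCALE replaces cv2.CV_LOAD_IMAGE_GRAYSCALE, which was removed in OpenCV 3
    im_gray = cv2.imread(captcha_path, cv2.IMREAD_GRAYSCALE)
    # With THRESH_OTSU the 128 is ignored; the Otsu threshold is returned as thresh
    (thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # Re-apply the Otsu value as a fixed threshold; replace thresh here to hand-tune it
    im_bw = cv2.threshold(im_gray, thresh, 255, cv2.THRESH_BINARY)[1]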
    cv2.imwrite(binary_image_path, im_bw)

    return binary_image_path

def preprocess_image_using_opencv(captcha_path):
    bin_image_path = binarize_image_using_opencv(captcha_path)

    im_bin = Image.open(bin_image_path)
    basewidth = 300  # in pixels
    wpercent = (basewidth/float(im_bin.size[0]))
    hsize = int((float(im_bin.size[1])*float(wpercent)))
    big = im_bin.resize((basewidth, hsize), Image.NEAREST)

    # tesseract-ocr only works with TIF so save the bigger image in that format
    tif_file = "input-NEAREST.tif"
    big.save(tif_file)

    return tif_file

def get_captcha_text_from_captcha_image(captcha_path):

    # Preprocess the image before OCR
    tif_file = preprocess_image_using_opencv(captcha_path)

    # Perform OCR (Optical Character Recognition) using the tesseract-ocr library
    image = Image.open(tif_file)
    ocr_text = image_to_string(image, config="-psm 6")
    alphanumeric_text = ''.join(e for e in ocr_text)

    return alphanumeric_text

But I am not getting the same accuracy. What am I missing?

Update 1:

Original image
Tif image created using Gimp
Tif image created by my Python code

Update 2:

This code is available at https://github.com/hussaintamboli/python-image-to-text.

score 1 · Accepted Answer

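If the output deviates only slightly from the expected output (i.e. extra characters such as ' or ", as suggested in the comments), try restricting character recognition to the character set you expect (e.g. alphanumeric).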
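
One way this restriction might look in Python is sketched below. It assumes the captcha contains only ASCII letters and digits and that OCR is done through pytesseract. The Tesseract variable tessedit_char_whitelist is real, but the helper names (build_tesseract_config, keep_allowed, ocr_restricted) and the legacy_flag parameter are hypothetical and only for illustration. The output is also filtered afterwards, in case the engine does not honor the whitelist.

```python
import string

# Assumption: the captcha only contains ASCII letters and digits.
ALLOWED_CHARS = string.ascii_letters + string.digits


def build_tesseract_config(allowed=ALLOWED_CHARS, psm=6, legacy_flag=True):
    """Build a config string that whitelists `allowed` characters.

    legacy_flag=True emits "-psm" as used in the question; newer Tesseract
    releases expect "--psm" instead.
    """
    psm_flag = "-psm" if legacy_flag else "--psm"
    return "{} {} -c tessedit_char_whitelist={}".format(psm_flag, psm, allowed)


def keep_allowed(text, allowed=ALLOWED_CHARS):
    """Drop every character of `text` that is not in `allowed`."""
    allowed_set = set(allowed)
    return "".join(ch for ch in text if ch in allowed_set)


def ocr_restricted(tif_path, legacy_flag=True):
    """Run OCR with the whitelist, then filter the result as a safety net."""
    # Imported lazily so the two helpers above work without these packages.
    from PIL import Image
    import pytesseract

    config = build_tesseract_config(legacy_flag=legacy_flag)
    raw = pytesseract.image_to_string(Image.open(tif_path), config=config)
    return keep_allowed(raw)
```

With the TIF produced by the question's code, ocr_restricted("input-NEAREST.tif") would return only letters and digits. The post-filter alone (keep_allowed) is already enough to strip stray quotes and commas if changing the Tesseract configuration is not an option.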
