visual-studio-2010 - How do I extract multiple lines of text from an image using Tesseract OCR?

Question

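When I pass in an image containing a single line with the text "HelloWorld", Tesseract OCR returns "HelloWorld" perfectly.
But when I pass in an image containing multiple lines of text, such as

Helloworld
How are you

nothing is displayed.

Here is our code: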

#include "stdafx.h"
#include <iostream>
#include <baseapi.h>
#include <allheaders.h>
#include <fstream>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    tesseract::TessBaseAPI api;

    api.Init("", "eng", tesseract::OEM_DEFAULT);
    api.SetPageSegMode(static_cast<tesseract::PageSegMode>(7));
    api.SetOutputName("out");

    cout<<"File name:";
    char image[256];
    cin>>image;
    PIX   *pixs = pixRead(image);

    STRING text_out;
    api.ProcessPages(image, NULL, 0, &text_out);

    cout<<text_out.string();

    ofstream files;
    files.open("out.txt");
    files << text_out.string()<<endl;
    files.close();

    cin>> image;
    return 0;
}

[Screenshots in the original post: input with one line, output with one line, input with two lines, output with two lines]

score 0 · Accepted Answer

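Page segmentation mode 7 treats the image as a single text line, which is why the two-line image produces no output. Try 3 instead: fully automatic page segmentation, but no OSD (the default).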
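To make the numeric modes less opaque, here is a minimal sketch that maps the integers 0 to 10 to the names of the `tesseract::PageSegMode` constants. The helpers `pageSegModeName` and `assumesAtMostOneLine` are hypothetical names introduced only for illustration and are not part of the Tesseract API. The table covers only values 0 to 10; newer Tesseract releases define additional modes that are not listed here.

```cpp
#include <string>

// Hypothetical helper (not part of Tesseract): returns the name of the
// tesseract::PageSegMode constant for a numeric mode in the range 0..10.
inline std::string pageSegModeName(int mode) {
    switch (mode) {
        case 0:  return "PSM_OSD_ONLY";               // orientation and script detection only
        case 1:  return "PSM_AUTO_OSD";               // automatic segmentation with OSD
        case 2:  return "PSM_AUTO_ONLY";              // automatic segmentation, no OSD, no OCR
        case 3:  return "PSM_AUTO";                   // fully automatic segmentation, no OSD
        case 4:  return "PSM_SINGLE_COLUMN";          // one column of text
        case 5:  return "PSM_SINGLE_BLOCK_VERT_TEXT"; // one block of vertical text
        case 6:  return "PSM_SINGLE_BLOCK";           // one uniform block of text
        case 7:  return "PSM_SINGLE_LINE";            // one text line (the mode used in the question)
        case 8:  return "PSM_SINGLE_WORD";            // one word
        case 9:  return "PSM_CIRCLE_WORD";            // one word in a circle
        case 10: return "PSM_SINGLE_CHAR";            // one character
        default: return "unknown";                    // outside the range covered by this sketch
    }
}

// Hypothetical helper: true for modes 7..10, which tell Tesseract to expect
// a single line, word, or character rather than several lines of text.
inline bool assumesAtMostOneLine(int mode) {
    return mode >= 7 && mode <= 10;
}
```

Here `pageSegModeName(7)` gives `PSM_SINGLE_LINE` and `assumesAtMostOneLine(7)` is true, which matches the behaviour described in the question. In the question's code, the corresponding change would be to replace `static_cast<tesseract::PageSegMode>(7)` with the named constant `tesseract::PSM_AUTO` (value 3).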
