ios - OCR: image to text?

Question

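Before marking this as a duplicate or repeated question, please read the whole question first.

What I am able to do at present is the following:

Get the image and crop the desired part for OCR.
Process the image using tesseract and leptonica.
When the document is cropped into chunks, i.e. one character per image, I get 96% accuracy.
Otherwise, if the document has a white background and black text, I get almost the same accuracy.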

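For example, if the input is this photo:

Photo start

[image]

Photo end

What I want is to be able to get the same accuracy for this photo [image]
without generating the blocks.

The code I used to initialize tesseract and extract text from the image is as follows:

For the initialization of tesseract

In the .h file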

tesseract::TessBaseAPI *tesseract;
uint32_t *pixels;

In the .m file

tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
tesseract->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
tesseract->SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
tesseract->SetVariable("language_model_penalty_non_dict_word", "1");
tesseract->SetVariable("tessedit_flip_0O", "1");
tesseract->SetVariable("tessedit_single_match", "0");
tesseract->SetVariable("textord_noise_normratio", "5");
tesseract->SetVariable("matcher_avg_noise_size", "22");
tesseract->SetVariable("image_default_resolution", "450");
tesseract->SetVariable("editor_image_text_color", "40");
tesseract->SetVariable("textord_projection_scale", "0.25");
tesseract->SetVariable("tessedit_minimal_rejection", "1");
tesseract->SetVariable("tessedit_zero_kelvin_rejection", "1");

To get text from the image

- (void)processOcrAt:(UIImage *)image
{
    [self setTesseractImage:image];

    tesseract->Recognize(NULL);
    char* utf8Text = tesseract->GetUTF8Text();
    int conf = tesseract->MeanTextConf();

    NSArray *arr = [[NSArray alloc]initWithObjects:[NSString stringWithUTF8String:utf8Text],[NSString stringWithFormat:@"%d%@",conf,@"%"], nil];

    [self performSelectorOnMainThread:@selector(ocrProcessingFinished:)
                           withObject:arr
                        waitUntilDone:YES];
    delete [] utf8Text; // GetUTF8Text() allocates with new[], so release it with delete []
}

- (void)ocrProcessingFinished:(NSArray *)result
{
    UIAlertView *alt = [[UIAlertView alloc]initWithTitle:@"Data" message:[result objectAtIndex:0] delegate:self cancelButtonTitle:nil otherButtonTitles:@"OK", nil];
   [alt show];
}

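But I don't get proper output for the number plate image: the result is either null or garbage data.

And if I use the first image, i.e. a white background with black text, the output is 89 to 95% accurate.

Please help me.

Any suggestion will be appreciated.

Update

Thanks to @jcesar for providing the link, and to @konstantinpribluda for providing valuable information and guidance.

I am now able to convert images into proper black-and-white form (almost), so recognition is better for all images :)

I still need help with proper binarization of the images. Any ideas will be appreciated.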

score 6 · Accepted Answer

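Hi all, thanks for your replies. From all of them I was able to reach the following conclusion:

I need to get only the one cropped image block that contains the number plate.
From that plate, I need to find the portion with the numbers, using the data I got with the method provided here.
Then I convert the image data to almost black and white, using the RGB data found through the above method.
Then the data is converted back to an image using the method provided here.

The above 4 steps are combined into one method, as follows: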

-(void)getRGBAsFromImage:(UIImage*)image
{
    // First get the image into your data buffer
    CGImageRef imageRef = [image CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    // Count pixels from the CGImage; image.size is in points and differs for scaled (Retina) images
    NSInteger count = (NSInteger)(width * height);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char*) calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);

    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);

    // Now your rawData contains the image data in the RGBA8888 pixel format.
    int byteIndex = 0;
    for (int ii = 0 ; ii < count ; ++ii)
    {
        CGFloat red   = (rawData[byteIndex]     * 1.0) ;
        CGFloat green = (rawData[byteIndex + 1] * 1.0) ;
        CGFloat blue  = (rawData[byteIndex + 2] * 1.0) ;
        CGFloat alpha = (rawData[byteIndex + 3] * 1.0) ;

        NSLog(@"red %f \t green %f \t blue %f \t alpha %f rawData [%d] %d",red,green,blue,alpha,ii,rawData[ii]);
        if(red > Required_Value_of_red || green > Required_Value_of_green || blue > Required_Value_of_blue)//all values are between 0 to 255
        {
            red = 255.0;
            green = 255.0;
            blue = 255.0;
            alpha = 255.0;
            // all value set to 255 to get white background.
        }
        rawData[byteIndex] = red;
        rawData[byteIndex + 1] = green;
        rawData[byteIndex + 2] = blue;
        rawData[byteIndex + 3] = alpha;

        byteIndex += 4;
    }

    colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef bitmapContext = CGBitmapContextCreate(
                                                       rawData,
                                                       width,
                                                       height,
                                                       8, // bitsPerComponent
                                                       4*width, // bytesPerRow
                                                       colorSpace,
                                                       kCGImageAlphaNoneSkipLast);

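    CGColorSpaceRelease(colorSpace);

    CGImageRef cgImage = CGBitmapContextCreateImage(bitmapContext);
    CGContextRelease(bitmapContext); // the context is no longer needed once the image exists

    UIImage *img = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage); // UIImage retains the CGImage it wraps

    // Use img for the subsequent OCR step.

    free(rawData);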
}

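Note:

The only drawbacks of this method are the time it consumes and having to choose which RGB values are converted to white, with the rest going to black.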
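
For reference, the thresholding loop on its own, without Core Graphics, might look like the C++ sketch below. The function name and the three threshold parameters are made up for illustration (they stand in for the Required_Value_of_* placeholders in the method above), and unlike that method this version also forces every remaining pixel to black, so the result is strictly two-tone.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the answer): binarize an RGBA8888 buffer in place.
// A pixel becomes white if any colour channel exceeds its threshold, otherwise black.
void binarizeRGBA(std::vector<std::uint8_t>& rgba,
                  std::uint8_t redMin, std::uint8_t greenMin, std::uint8_t blueMin) {
    for (std::size_t i = 0; i + 3 < rgba.size(); i += 4) {
        const bool bright = rgba[i] > redMin || rgba[i + 1] > greenMin || rgba[i + 2] > blueMin;
        const std::uint8_t v = bright ? 255 : 0;
        rgba[i] = v;       // red
        rgba[i + 1] = v;   // green
        rgba[i + 2] = v;   // blue
        rgba[i + 3] = 255; // fully opaque
    }
}
```

The thresholds still have to be picked per image set, which is the drawback mentioned in the note.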

Update:

    CGImageRef imageRef = [plate CGImage];
    CIContext *context = [CIContext contextWithOptions:nil]; // 1
    CIImage *ciImage = [CIImage imageWithCGImage:imageRef]; // 2
    CIFilter *filter = [CIFilter filterWithName:@"CIColorMonochrome" keysAndValues:@"inputImage", ciImage, @"inputColor", [CIColor colorWithRed:1.f green:1.f blue:1.f alpha:1.0f], @"inputIntensity", [NSNumber numberWithFloat:1.f], nil]; // 3
    CIImage *ciResult = [filter valueForKey:kCIOutputImageKey]; // 4
    CGImageRef cgImage = [context createCGImage:ciResult fromRect:[ciResult extent]];
    UIImage *img = [UIImage imageWithCGImage:cgImage];

Just replace the code of the above method (getRGBAsFromImage:) with this one; the result is the same, but it takes only 0.1 to 0.3 seconds.

score 4 · Accepted Answer

I was able to get near-instant results with the demo photo provided, and it generated the correct letters.

I pre-processed the image using GPUImage

// Pre-processing for OCR
GPUImageLuminanceThresholdFilter * adaptiveThreshold = [[GPUImageLuminanceThresholdFilter alloc] init];
[adaptiveThreshold setThreshold:0.3f];
[self setProcessedImage:[adaptiveThreshold imageByFilteringImage:_image]];
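
For readers without GPUImage at hand, a luminance threshold like the one above boils down to roughly the following on the CPU. This is only a sketch: the function name is hypothetical, and the Rec. 709 luma weights are an assumption, since the filter's exact weights and comparison may differ slightly.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU version of a luminance threshold (GPUImage does this on the GPU).
// threshold is in [0, 1]; pixels brighter than it become white (255), the rest black (0).
// Input is RGBA8888; output is one byte per pixel.
std::vector<std::uint8_t> luminanceThreshold(const std::vector<std::uint8_t>& rgba,
                                             float threshold) {
    std::vector<std::uint8_t> mask;
    mask.reserve(rgba.size() / 4);
    for (std::size_t i = 0; i + 3 < rgba.size(); i += 4) {
        // Rec. 709 luma weights (assumed; not taken from the GPUImage source)
        const float luma =
            (0.2126f * rgba[i] + 0.7152f * rgba[i + 1] + 0.0722f * rgba[i + 2]) / 255.0f;
        mask.push_back(luma > threshold ? 255 : 0);
    }
    return mask;
}
```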

And then I sent that processed image to TESS

- (NSArray *)processOcrAt:(UIImage *)image {
    [self setTesseractImage:image];

    _tesseract->Recognize(NULL);
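    char* utf8Text = _tesseract->GetUTF8Text();
    NSString *text = [NSString stringWithUTF8String:utf8Text];
    delete [] utf8Text; // GetUTF8Text() returns a new[]-allocated buffer that the caller owns

    return [self ocrProcessingFinished:text];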
}

- (NSArray *)ocrProcessingFinished:(NSString *)result {
    // Strip extra characters, whitespace/newlines
    NSString * results_noNewLine = [result stringByReplacingOccurrencesOfString:@"\n" withString:@""];
    NSArray * results_noWhitespace = [results_noNewLine componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSString * results_final = [results_noWhitespace componentsJoinedByString:@""];
    results_final = [results_final lowercaseString];

    // Separate out individual letters
    NSMutableArray * letters = [[NSMutableArray alloc] initWithCapacity:results_final.length];
    for (int i = 0; i < [results_final length]; i++) {
        NSString * newTile = [results_final substringWithRange:NSMakeRange(i, 1)];
        [letters addObject:newTile];
    }

    return [NSArray arrayWithArray:letters];
}

- (void)setTesseractImage:(UIImage *)image {
    free(_pixels);
    _pixels = NULL; // avoid a dangling pointer if we return early below

    CGSize size = [image size];
    int width = size.width;
    int height = size.height;

    if (width <= 0 || height <= 0)
        return;

    // the pixels will be painted to this array
    _pixels = (uint32_t *) malloc(width * height * sizeof(uint32_t));
    // clear the pixels so any transparency is preserved
    memset(_pixels, 0, width * height * sizeof(uint32_t));

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();

    // create a context with RGBA pixels
    CGContextRef context = CGBitmapContextCreate(_pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);

    // paint the bitmap to our context which will fill in the pixels array
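    CGContextDrawImage(context, CGRectMake(0, 0, width, height), [image CGImage]);

    // release the Core Graphics objects; the pixel buffer itself stays alive for tesseract
    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);

    _tesseract->SetImage((const unsigned char *) _pixels, width, height, sizeof(uint32_t), width * sizeof(uint32_t));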
}

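This leaves ' marks in place of the -, but those are also easy to remove. Depending on the image set you have, you may need to fine-tune it a bit, but it should get you moving in the right direction.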
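
One way to drop stray marks such as ' from the recognized text is to keep only letters and digits. The helper below is a hypothetical sketch in plain C++ (its name is not from the project above), assuming the plate text is ASCII.

```cpp
#include <cctype>
#include <string>

// Hypothetical clean-up step: keep only ASCII letters and digits from the OCR output.
// Quotes, dashes, spaces and newlines are dropped; case is left untouched.
std::string keepPlateCharacters(const std::string& ocrText) {
    std::string out;
    for (unsigned char c : ocrText) {
        if (std::isalnum(c)) {
            out.push_back(static_cast<char>(c));
        }
    }
    return out;
}
```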

Let me know if you have problems using it. It comes from a project I'm working on, and I didn't want to have to strip everything out or create a project from scratch for it.

score 1 · Accepted Answer

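I daresay tesseract will be overkill for your purpose. You do not need dictionary matching to improve recognition quality (you do not have such a dictionary, though there may be a way to compute a checksum on the license number), and you have a font optimised for OCR. Best of all, you have markers (the orange and blue areas nearby are good ones) for finding the region in the image.

In my OCR apps I use human-assisted area-of-interest retrieval (just aiming a help overlay on the camera preview). Usually one uses something like Haar cascades to locate interesting features such as faces. You can also compute the centroid of the orange area, or simply the bounding box of the orange pixels, by traversing the whole image and storing the leftmost / rightmost / topmost / bottommost pixels of a suitable colour.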
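
The bounding-box idea can be sketched in a few lines of C++. Everything named here is an assumption for illustration: the Box struct, the function names, and especially the colour test in looksOrange, which would need tuning against real plate photos.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Box {
    int left, top, right, bottom; // inclusive pixel coordinates
    bool found;                   // false when no pixel matched
};

// Rough, assumed definition of "orange": strong red, medium green, weak blue.
inline bool looksOrange(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return r > 200 && g > 80 && g < 190 && b < 90;
}

// Traverse an RGBA8888 image once and track the extreme matching pixels.
Box orangeBoundingBox(const std::vector<std::uint8_t>& rgba, int width, int height) {
    Box box{width, height, -1, -1, false};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = (static_cast<std::size_t>(y) * width + x) * 4;
            if (!looksOrange(rgba[i], rgba[i + 1], rgba[i + 2])) continue;
            box.left = std::min(box.left, x);
            box.top = std::min(box.top, y);
            box.right = std::max(box.right, x);
            box.bottom = std::max(box.bottom, y);
            box.found = true;
        }
    }
    return box;
}
```

The plate region could then be located relative to this box before any recognition is attempted.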

As for recognition itself, I would recommend using invariant moments (I am not sure whether they are implemented in tesseract, but they can easily be ported from the Java project: http://sourceforge.net/projects/javaocr/).
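
To give an idea of what invariant moments are, here is a small C++ sketch that computes the first two Hu moments of a binary glyph. It is written from the textbook definition and is not the javaocr implementation; the struct and function names are invented for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct HuPair { double phi1, phi2; };

// First two Hu invariant moments of a binary glyph (non-zero = ink), row-major.
// Returns zeros for an empty image. Illustrative only, not javaocr's code.
HuPair huMoments(const std::vector<std::uint8_t>& ink, int width, int height) {
    double m00 = 0, m10 = 0, m01 = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (ink[static_cast<std::size_t>(y) * width + x]) { m00 += 1; m10 += x; m01 += y; }
    if (m00 == 0) return {0.0, 0.0};
    const double cx = m10 / m00, cy = m01 / m00; // centroid

    // Central moments of order two (translation invariant).
    double mu20 = 0, mu02 = 0, mu11 = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (ink[static_cast<std::size_t>(y) * width + x]) {
                const double dx = x - cx, dy = y - cy;
                mu20 += dx * dx; mu02 += dy * dy; mu11 += dx * dy;
            }

    // Normalized central moments: eta_pq = mu_pq / m00^(1 + (p + q) / 2), with p + q = 2.
    const double norm = m00 * m00;
    const double n20 = mu20 / norm, n02 = mu02 / norm, n11 = mu11 / norm;
    return {n20 + n02, (n20 - n02) * (n20 - n02) + 4.0 * n11 * n11};
}
```

A classifier would compare such feature values for an unknown glyph against those of reference characters; the full Hu set has seven moments, of which only two are shown.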

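I tried my demo app on the monitor image and it recognized the digits on the spot (it is not trained for characters).

As for binarization (separating black from white), I would recommend the Sauvola method, as it gives the best tolerance to luminance changes (it is also implemented in the OCR project).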
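
As a rough illustration of the Sauvola method, the sketch below computes a local threshold T = mean * (1 + k * (stddev / R - 1)) over a square window around each pixel. The function name, the window radius and the defaults k = 0.5 and R = 128 are common choices assumed here, not values from the answer, and the naive window loop is slow compared with an integral-image version.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sauvola binarization of an 8-bit grayscale image (row-major).
// Each pixel is compared with a threshold derived from the local mean and
// standard deviation, which is what gives tolerance to uneven lighting.
std::vector<std::uint8_t> sauvola(const std::vector<std::uint8_t>& gray, int width, int height,
                                  int radius = 7, double k = 0.5, double R = 128.0) {
    std::vector<std::uint8_t> out(gray.size(), 255);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Window clipped to the image borders.
            const int x0 = std::max(0, x - radius), x1 = std::min(width - 1, x + radius);
            const int y0 = std::max(0, y - radius), y1 = std::min(height - 1, y + radius);
            double sum = 0, sumSq = 0;
            for (int yy = y0; yy <= y1; ++yy)
                for (int xx = x0; xx <= x1; ++xx) {
                    const double v = gray[static_cast<std::size_t>(yy) * width + xx];
                    sum += v;
                    sumSq += v * v;
                }
            const double n = static_cast<double>((x1 - x0 + 1) * (y1 - y0 + 1));
            const double mean = sum / n;
            const double stddev = std::sqrt(std::max(0.0, sumSq / n - mean * mean));
            const double threshold = mean * (1.0 + k * (stddev / R - 1.0));
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            out[i] = gray[i] > threshold ? 255 : 0; // white background, black ink
        }
    }
    return out;
}
```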
