java - How do I convert a long string into an array or database fields?

Question

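I'm busy extending OpenBravoPOS so that it can read the invoices from the company we order our products from.

These invoices are produced as PDFs. I used the iText library to read the specific order lines. The problem is that I can only read the page I need as one big string. That string looks like this: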

LEVERINGSBON 30/06/2012 27828/2012/NL/WebShop   Distributeur ID nummer: 15099191 Uw distributeur: Klant Naam: FM Point Marcel Snoeck Adres: Zonnedauw 17 5953MS Reuver Telefoon: +31654317017 E-MAIL: yvonneenmarcel@home.nl Opmerking: -  Lp. Rekening Totaal FV/39525/2012/NL     vd Wal Sandra 72.00 1 3 x 354 - Luxury Collection 50ml NEW! 72.00 FV/39526/2012/NL     Slaats Tim 6.00 2 1 x KR01 - Eye Pencil DECADENCE BLACK 6.00 FV/39527/2012/NL     Nabben Britt 44.95 3 3 x E013 - Krachtreiniger 1000ml 24.75 4 2 x E016 -Tapijtreiniger 1000ml 9.20 5 1 x 3 Step Mascara PERFECT BLACK 11.00 FV/39528/2012/NL     Nabben Lieke 32.00 6 1 x 192 - Luxury Collection 50ml 21.00 7 1 x 3 Step Mascara PERFECT BLACK 11.00 FV/39529/2012/NL     Claessens Patrick 12.40 8 1 x P101 - Peeling VERBENA 12.40 FV/39530/2012/NL     Smits Yolanda 56.00 9 1 x E006 - Wasmiddel VIVID COLOURS 1000ml 7.00 10 2 x B023 - Body Lotion 200ml NEW 18.40 11 2 x 023 - Classic Collection 30ml 30.60 FV/39531/2012/NL     van Pol-Thijssen Silvia 34.70 12 1 x 110 - Classic Collection 50ml 15.30 13 1 x N003 - Nagellak HOT RED 7.00 14 1 x P103 - Peeling CHERRY BLOSSOM 12.40 Aantal: 21 Totaal: 258.05 € 1.17.4564.29482 1/1        "

What I tried to do was read each line and determine whether it is an order line. If it is, it needs to go into the database.

One order line looks like this:

2 1 x KR01 - Eye Pencil DECADENCE BLACK 6.00

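This can be read as: order line number 2, quantity 1, product KR01, description Eye Pencil DECADENCE BLACK, price 6.00.

Is there a simple way to read this long string and split it into the correct order lines?

Thank you for your replies.

My code so far is as follows: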

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */
package part4.chapter15;

import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;

public class ExtractPageContent {

    /** The original PDF that will be parsed. */
    public static final String PREFACE = "C:/Users/marcel/Documents/FM/NL/FMPoint/Kassa_voorraad_software/PDF-Itext/PDF_Results_Import_Files/small.pdf";
    /** The resulting text file. */
    public static final String RESULT = "C:/Users/marcel/Documents/FM/NL/FMPoint/Kassa_voorraad_software/PDF-Itext/PDF_Results_Import_Files/sample-result.txt";

    /**
     * Parses a PDF to a plain text file.
     * @param pdf the original PDF
     * @param txt the resulting text
     * @throws IOException
     */
    public void parsePdf(String pdf, String txt) throws IOException {

        /** Putting result in Array, to be able extract to Table */
        PdfArray array;

        PdfReader reader = new PdfReader(pdf);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        PrintWriter out = new PrintWriter(new FileOutputStream(txt));
        TextExtractionStrategy strategy;
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
            String str = strategy.getResultantText();
            CharSequence findPage = "Lp. Rekening Totaal";
            // Only write pages that contain the order-table header.
            if (str.contains(findPage)) {
                out.println(str);
            }
        }
        out.flush();
        out.close();
    }

    /**
     * Main method.
     * @param    args    no arguments needed
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        new ExtractPageContent().parsePdf(PREFACE, RESULT);
    }

}

score 3 · Accepted Answer

You can design a regex to solve this in several ways. Here is one:

    String pdf = "LEVERINGSBON 30/06/2012 27828/2012/NL/WebShop   Distributeur ID nummer: 15099191 Uw distributeur: Klant Naam: FM Point Marcel Snoeck Adres: Zonnedauw 17 5953MS Reuver Telefoon: +31654317017 E-MAIL: yvonneenmarcel@home.nl Opmerking: - Lp. Rekening Totaal FV/39525/2012/NL     vd Wal Sandra 72.00 1 3 x 354 - Luxury Collection 50ml NEW! 72.00 FV/39526/2012/NL     Slaats Tim 6.00 2 1 x KR01 - Eye Pencil DECADENCE BLACK 6.00 FV/39527/2012/NL     Nabben Britt 44.95 3 3 x E013 - Krachtreiniger 1000ml 24.75 4 2 x E016 -Tapijtreiniger 1000ml 9.20 5 1 x 3 Step Mascara PERFECT BLACK 11.00 FV/39528/2012/NL     Nabben Lieke 32.00 6 1 x 192 - Luxury Collection 50ml 21.00 7 1 x 3 Step Mascara PERFECT BLACK 11.00 FV/39529/2012/NL     Claessens Patrick 12.40 8 1 x P101 - Peeling VERBENA 12.40 FV/39530/2012/NL     Smits Yolanda 56.00 9 1 x E006 - Wasmiddel VIVID COLOURS 1000ml 7.00 10 2 x B023 - Body Lotion 200ml NEW 18.40 11 2 x 023 - Classic Collection 30ml 30.60 FV/39531/2012/NL     van Pol-Thijssen Silvia 34.70 12 1 x 110 - Classic Collection 50ml 15.30 13 1 x N003 - Nagellak HOT RED 7.00 14 1 x P103 - Peeling CHERRY BLOSSOM 12.40 Aantal: 21 Totaal: 258.05 € 1.17.4564.29482 1/1        ";
    String patternString = "\\d\\s\\d\\sx.*?\\d\\.\\d\\d";
    Matcher matcher = Pattern.compile(patternString).matcher(pdf);
    List<String> dataRows = new ArrayList<String>();
    while (matcher.find()) {
        dataRows.add(matcher.group());
    }
    System.out.println(dataRows);

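Regex explanation:
\\d\\s\\d\\sx: matches a digit, a space, a digit, a space, and 'x'.
.*?: matches any number of any characters, but non-greedily. This matters because a greedy .* would run on to the last price in the string instead of stopping at the first one after the 'x'.
\\d\\.\\d\\d: matches the final number with two decimal places.
This will need tweaking depending on how the data varies, but it is a good starting point.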
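One such tweak: because \\d matches exactly one digit, order lines numbered 10 and up lose their leading digit (line 10 comes back as "0 2 x B023 ..."). The following is a minimal sketch of one way around that, using \\d+ together with a word boundary. The class and method names are made up for illustration, and the sample input is a shortened piece of the string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderLineFinder {

    // \b stops a match from starting in the middle of a number;
    // \d+ allows line numbers, quantities and prices with more than one digit.
    private static final Pattern ORDER_LINE =
            Pattern.compile("\\b\\d+\\s\\d+\\sx.*?\\d+\\.\\d\\d");

    /** Returns every substring of text that looks like an order line. */
    static List<String> findOrderLines(String text) {
        List<String> rows = new ArrayList<String>();
        Matcher matcher = ORDER_LINE.matcher(text);
        while (matcher.find()) {
            rows.add(matcher.group());
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = "Smits Yolanda 56.00 9 1 x E006 - Wasmiddel VIVID COLOURS 1000ml 7.00 "
                + "10 2 x B023 - Body Lotion 200ml NEW 18.40";
        for (String row : findOrderLines(sample)) {
            System.out.println(row);
        }
    }
}
```

The word boundary is what keeps the trailing "00" of a preceding price such as 56.00 from being picked up as a line number.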

If you want a list of custom data structures instead of strings, you can grab the individual parts of each match like this:

...
String patternString = "(\\d)\\s(\\d)\\sx.*?\\d\\.\\d\\d";
...
while (matcher.find()) {
    MyDataObj m = new MyDataObj();
    // group(1) is the line number, group(2) is the quantity
    m.setSomeField(matcher.group(1));
    m.setAnotherField(matcher.group(2));
}

Just wrap every value you want to keep in parentheses in the pattern, then retrieve them with matcher.group(1), matcher.group(2), and so on (matcher.group(0) gives you the entire match).
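As a fuller illustration of the capture-group approach, here is a sketch that turns each match into a small value object with typed fields, which could then be mapped onto database columns. OrderLineParser and OrderLine are hypothetical names, not part of OpenBravoPOS or iText. The pattern assumes plain spaces between tokens, as in the string shown in the question, and uses \\d+ so that multi-digit line numbers are captured whole.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderLineParser {

    /** Hypothetical value object holding one parsed order line. */
    static final class OrderLine {
        final int lineNumber;
        final int quantity;
        final String product;   // code and description, e.g. "KR01 - Eye Pencil DECADENCE BLACK"
        final BigDecimal price;

        OrderLine(int lineNumber, int quantity, String product, BigDecimal price) {
            this.lineNumber = lineNumber;
            this.quantity = quantity;
            this.product = product;
            this.price = price;
        }
    }

    // Groups: 1 = line number, 2 = quantity, 3 = product text, 4 = price.
    private static final Pattern ORDER_LINE =
            Pattern.compile("\\b(\\d+)\\s(\\d+)\\sx\\s(.*?)\\s(\\d+\\.\\d\\d)");

    /** Parses every order line found in text into an OrderLine. */
    static List<OrderLine> parse(String text) {
        List<OrderLine> lines = new ArrayList<OrderLine>();
        Matcher m = ORDER_LINE.matcher(text);
        while (m.find()) {
            lines.add(new OrderLine(
                    Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)),
                    m.group(3),
                    new BigDecimal(m.group(4))));
        }
        return lines;
    }

    public static void main(String[] args) {
        String sample = "Slaats Tim 6.00 2 1 x KR01 - Eye Pencil DECADENCE BLACK 6.00";
        for (OrderLine line : parse(sample)) {
            System.out.println(line.lineNumber + " | " + line.quantity + " | "
                    + line.product + " | " + line.price);
        }
    }
}
```

BigDecimal is used for the price so that amounts like 6.00 are kept exactly rather than as binary floating-point values.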

score 0 · Accepted Answer

The answers were great. The resulting code is as follows:

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */
package part4.chapter15;

import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractPageContent {

    /** The original PDF that will be parsed. */
    public static final String PREFACE = "C:/Users/marcel/Documents/FM/NL/FMPoint/Kassa_voorraad_software/PDF-Itext/PDF_Results_Import_Files/big.pdf" ;
    /** The resulting text file. */
    public static final String RESULT = "C:/Users/marcel/Documents/FM/NL/FMPoint/Kassa_voorraad_software/PDF-Itext/PDF_Results_Import_Files/sample-result.txt" ;

    /**
     * Parses a PDF to a plain text file.
     * @param pdf the original PDF
     * @param txt the resulting text
     * @throws IOException
     */
    public void parsePdf(String pdf, String txt) throws IOException {

        /** Putting result in Array, to be able extract to Table */
        PdfArray array;

        PdfReader reader = new PdfReader(pdf);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        PrintWriter out = new PrintWriter(new FileOutputStream(txt));
        TextExtractionStrategy strategy;
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
            String str = strategy.getResultantText();
            CharSequence FindPage = "Lp. Rekening Totaal"; 
            if (str.contains(FindPage)) {
                /* Earlier attempt without a product-code group:
                   Pattern re = Pattern.compile("(\\d+)\\s(\\d+)(\\xA0)x(\\xA0)(.*?)(\\d+\\.\\d{2})"); */
                /* Pattern for order lines of articles that have a product code */
                Pattern re2 = Pattern.compile("(\\d+)\\s(\\d+)(\\xA0)x(\\xA0)(\\w+)(\\xA0)-\\s(.*?)(\\d+\\.\\d{2})");
                Matcher m = re2.matcher(str);
                int mIdx = 0;
                while (m.find()) {
                    // Group 0 is the whole match; groups 1..groupCount() are the captured fields.
                    for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
                    }
                    mIdx++;
                }

                out.println(str);
            }
        }
        out.flush();
        out.close();
    }


    /**
     * Main method.
     * @param    args    no arguments needed
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        new ExtractPageContent().parsePdf(PREFACE, RESULT);
    }

}

The output looks like this:

Complete order line [0][0] = 4 3 x 023 - Classic Collection 30ml 45.90
Line number [0][1] = 4
Quantity [0][2] = 3
Empty [0][3] =
Empty [0][4] =
Product code [0][5] = 023
Empty [0][6] =
Product description [0][7] = Classic Collection 30ml
Price [0][8] = 45.90

[1][0] = 5 2 x C052 - Hand Cream and Nail Cream 100ml NEW 15.20
[1][1] = 5
[1][2] = 2
[1][3] =
[1][4] =
[1][5] = C052
[1][6] =
[1][7] = Hand Cream and Nail Cream 100ml NEW
[1][8] = 15.20

Thank you for this great support.
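Groups 3, 4 and 6 look empty in that output because each (\\xA0) group captures a single non-breaking space, which Java's \\s does not match by default. A possible refinement, sketched below under the assumption that the extracted text mixes ordinary spaces and non-breaking spaces as the pattern above suggests, is to match both kinds of whitespace without capturing them, so that only the five meaningful fields come back. NbspOrderLine and parseFirst are illustrative names only.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NbspOrderLine {

    // One or more ordinary whitespace characters or non-breaking spaces (0xA0).
    private static final String WS = "[\\s\\xA0]+";

    // Only the meaningful fields are captured:
    // 1 = line number, 2 = quantity, 3 = product code, 4 = description, 5 = price.
    // Whitespace after the dash is optional, since the question's data
    // contains "E016 -Tapijtreiniger".
    private static final Pattern ORDER_LINE = Pattern.compile(
            "\\b(\\d+)" + WS + "(\\d+)" + WS + "x" + WS + "(\\w+)" + WS + "-[\\s\\xA0]*"
            + "(.*?)" + WS + "(\\d+\\.\\d{2})");

    /** Returns the five fields of the first order line in text, or null if none is found. */
    static String[] parseFirst(String text) {
        Matcher m = ORDER_LINE.matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Non-breaking spaces placed where the original pattern expects them.
        String sample = "4 3\u00A0x\u00A0023\u00A0- Classic Collection 30ml 45.90";
        System.out.println(Arrays.toString(parseFirst(sample)));
    }
}
```

Like the pattern above, this only matches order lines that carry a product code; lines such as "1 x 3 Step Mascara PERFECT BLACK" would need a second pattern.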
