java - Will a regex like this work on these lines of text?

Question

Regex:

String regexp = "([0-9.]{1,15})[ \t]*([0-9]{1,15})[ \t]*([0-9.]{1,15})[ \t]*(\"(.*?)\"\\s+\\((\\d{4})\\)\\s+\\{(.*?)\\})";

Text:

1000000103 50 4.5 #1 Single (2006)
2...1.2.12 8 2.7 $1,000,000 Chance of a Lifetime (1986)
11..2.2..2 8 5.0 $100 Taxi (2001)
....13.311 9 7.1 $100,000 Name That Tune (1984)
3..21...22 10 4.6 $2 Bill (2002)
30010....3 18 2.7 $25 Million Dollar Hoax (2004)
2000010002 111 5.6 $40 a Day (2002)
2000000..4 26 1.6 $5 Cover (2009)
.0..2.0122 15 7.8 $9.99 (2003)
..2...1113 8 7.5 $weepstake$ (1979)
0000000125 3238 8.7 'Allo 'Allo! (1982)
1....22.12 8 6.5 'Allo 'Allo! (1982) {A Barrel Full of Airmen (#7.7)

I'm trying to use Java and MySQL together, and I'm learning both for a project I'm planning. I want the output to look like this:

distribution = first column
rank = second column
votes = third column
title = fourth column

The first three work fine. I'm stuck on the fourth.

It looks like the first few entries didn't paste well, so a few more entries may make it easier to see what I'm trying to show. Here they are:

0...001122 16 7.8 "'Allo 'Allo!" (1982) {Gruber Does Some Mincing (#3.2)}
100..01103 21 7.4 "'Allo 'Allo!" (1982) {Hans Goes Over the Top (#4.1)}
....022100 11 6.9 "'Allo 'Allo!" (1982) {Hello Hans (#7.4)}
0....03022 21 8.4 "'Allo 'Allo!" (1982) {Herr Flick's Revenge (#2.6)}
......8..1 6 7.0 "'Allo 'Allo!" (1982) {Hitler's Last Heil (#8.3)}
.....442.. 5 6.5 "'Allo 'Allo!" (1982) {Intelligence Officers (#6.5)}
....1123.2 9 6.9 "'Allo 'Allo!" (1982) {It's Raining Italians (#6.2)}
....1.33.3 10 7.8 "'Allo 'Allo!" (1982) {Leclerc Against the Wall (#5.18)}
....22211. 8 6.4 "'Allo 'Allo!" (1982) {Lines of Communication (#7.5)}

The code I'm using:

  stmt.executeUpdate("CREATE TABLE mytable(distribution char(20)," +
      "votes integer," + "rank float," + "title char(250));");
  String regexp ="([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)";
  Pattern pattern = Pattern.compile(regexp);
  String line;
  String data= "";
  while ((line = bf.readLine()) != null) {
    data = line.replaceAll("'", " ");
    String data2 = data.replaceAll("\"", "");
    //System.out.println(data2);
    Matcher matcher = pattern.matcher(data2);
    if (matcher.find()) {
        String distribution = matcher.group(1);
        String votes = matcher.group(2);
        String rank = matcher.group(3);
        String title = matcher.group(4);
        //System.out.println(distribution + " " + votes + " " + rank + " " + title);
        String todo = ("INSERT into mytable " +
            "(Distribution, Votes, Rank, Title) "+
            "values ('"+distribution+"', '"+votes+"', '"+rank+"', '"+title+"')");
        stmt = con.createStatement();
        int r = stmt.executeUpdate(todo);
    }
  }

score 3 · Accepted Answer

/Allo Allo! \(1982\) \{A Barrel Full of Airmen \(\#7\.7\)\}/

answered 2010-03-02T03:31:45.870

score 2 · Accepted Answer

Could you use split instead and split on tabs? Or get the opencsv library and use that.

Perhaps something like this:

....

String[] temp;
String the_line;
BufferedReader in = new BufferedReader(new FileReader("file.txt")); 

while ((the_line = in.readLine()) != null)
{
    temp = the_line.split("\t");
    ....
}

....

score 1 · Accepted Answer

Try this:

        BufferedReader reader = new BufferedReader(new FileReader("yourFile"));

        Pattern p = Pattern.compile("([0-9\\.]+)[\\s]+([0-9]+)[\\s]+([0-9]\\.[0-9])[\\s]+([^\\s].*$)");

        String line;
        while( (line = reader.readLine()) != null ) {
            Matcher m = p.matcher(line);
            if ( m.matches() ) {
                 System.out.println(m.group(1));
                 System.out.println(m.group(2));
                 System.out.println(m.group(3));
                 System.out.println(m.group(4));
            }

        }

This assumes the third group is exactly one digit, a dot, and one more digit.

score 1 · Accepted Answer

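Remember the first rule of programming: keep it simple! Why do you really need a regex?

It looks to me like you have a well-defined tabular format... is it TSV?

If not, read the file line by line and split the first three columns on whitespace; then you only need a regex to parse the last column.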
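
A minimal sketch of that split-first approach, assuming the columns are separated by runs of spaces or tabs as in the sample rows. The class name SplitColumns, the helper splitRow, and the TITLE pattern are illustrative names, not part of the question's code, and the title pattern is only one guess at the layout (title, four-digit year in parentheses, optional {episode} part).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitColumns {
    // Assumed layout of the last column: title text, a four-digit year in
    // parentheses, then an optional {episode} part.
    static final Pattern TITLE =
            Pattern.compile("(.*?)\\s*\\((\\d{4})\\)(?:\\s*\\{(.*)\\})?");

    // Splits on runs of whitespace, but at most three times, so the fourth
    // element keeps the whole title with its inner spaces intact.
    static String[] splitRow(String line) {
        return line.trim().split("\\s+", 4);
    }

    public static void main(String[] args) {
        String line = "0....03022 21 8.4 \"'Allo 'Allo!\" (1982) {Herr Flick's Revenge (#2.6)}";
        String[] cols = splitRow(line);
        String distribution = cols[0];
        int votes = Integer.parseInt(cols[1]);
        double rank = Double.parseDouble(cols[2]);

        // Only the last column needs a regex.
        Matcher m = TITLE.matcher(cols[3]);
        if (m.matches()) {
            System.out.println(distribution + " | " + votes + " | " + rank);
            System.out.println("title=" + m.group(1) + ", year=" + m.group(2)
                    + ", episode=" + m.group(3)); // episode is null when absent
        }
    }
}
```

The limit argument of 4 is what keeps the title in one piece; without it, split would also break the title at every space.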

score 0 · Accepted Answer

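Maybe: [a-zA-Z ]+\!\(\d{4}\) \{[a-zA-Z0-9 \(\)\#\.]+\}

I'm not sure what you're trying to achieve, so this is a bit of a guess...

To get better help, you need to provide more detail: some example lines, and what kind of data this is. Do you just need a match, or do you need specific capture groups?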

score 0 · Accepted Answer

No, it won't.

[ \t] needs to be [ \t]+ or \s+. The numbers in your sample input are right-aligned using spaces (in addition to tabs, if any).
Backslashes need to be escaped (doubled) inside string literals.

If you want the title for "'Allo 'Allo" to come out as Title = Allo Allo! (1982) {Lines of Communication (#7.5)}, try:

pattern = "([0-9\\.]+)[ \\t]+([0-9]+)[ \\t]+([0-9\\.]+)[ \\t]+(.*?[ \\t]+\\([0-9]{4}\\).*)";

Or (simplified, as Fadrian suggested):

pattern = "([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)";

For more on backslashes, escapes, and quoting, see the section of that name on the Pattern javadoc page.
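
To make the escaping point concrete, here is a small sketch that compiles the simplified pattern and runs it against one of the sample rows (with the quotes already stripped, as the question's code does). In the regex itself the whitespace class is \s; inside a Java string literal the backslash has to be doubled, so it is written \\s. The class name EscapeDemo and the helper parse are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Regex as the engine sees it:
    //   ([\d\.]+)\s+(\d+)\s+([\d\.]+)\s+(.*?\s+\(\d{4}\).*)
    // Every backslash is doubled because this is a Java string literal.
    static final Pattern ROW = Pattern.compile(
            "([\\d\\.]+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(.*?\\s+\\(\\d{4}\\).*)");

    // Returns the four captured columns, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] cols = parse(
                "....22211. 8 6.4 Allo Allo! (1982) {Lines of Communication (#7.5)}");
        if (cols != null) {
            System.out.println("Distribution = " + cols[0]);
            System.out.println("Votes = " + cols[1]);
            System.out.println("Rank = " + cols[2]);
            System.out.println("Title = " + cols[3]);
        }
    }
}
```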

score 0 · Accepted Answer

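Don't use regex to parse text. Regex is meant for matching patterns in text, not for parsing text into parts/components.

If the text file example in your question is a real, unmodified example, then the following basic kickoff example of a "parser" should work (as a bonus, it also takes care of the necessary JDBC code right away). I copy-pasted your data unchanged into c:\test.txt.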

public static void main(String... args) throws Exception {
    final String SQL = "INSERT INTO movie (distribution, votes, rank, title) VALUES (?, ?, ?, ?)";
    Connection connection = null;
    PreparedStatement statement = null;
    BufferedReader reader = null;        

    try {
        connection = database.getConnection();
        statement = connection.prepareStatement(SQL);
        reader = new BufferedReader(new InputStreamReader(new FileInputStream("/test.txt")));

        // Loop through file.
        for (String line; (line = reader.readLine()) != null;) {
            if (line.isEmpty()) continue; // I am not sure if those odd empty lines belongs in your file, else this if-check can be removed.

            // Gather data from lines.
            String distribution = line.substring(0, 10);
            int votes = Integer.parseInt(line.substring(12, 18).trim());
            double rank = Double.parseDouble(line.substring(20, 24).trim());
            String title = line.substring(26).trim().replace("\"", ""); // You also want to get rid of those double quotes, huh? I am however not sure why, maybe you initially had problems with it in your non-prepared SQL string...

            // Just to show what you've gathered.
            System.out.printf("%s, %5d, %.1f, %s%n", distribution, votes, rank, title);

            // Now add batch to statement.
            statement.setString(1, distribution);
            statement.setInt(2, votes);
            statement.setDouble(3, rank);
            statement.setString(4, title);
            statement.addBatch();
        }

        // Execute batch insert!
        statement.executeBatch();
    } finally {
        // Gently close expensive resources, you don't want to leak them!
        if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
        if (statement != null) try { statement.close(); } catch (SQLException logOrIgnore) {}
        if (connection != null) try { connection.close(); } catch (SQLException logOrIgnore) {}
    }
}

See, it works. No overly complicated regex needed.

score 0 · Accepted Answer

Here is a much simpler regex that does what you want:

([\d\.]*)\s*([\d\.]*)\s*([\d\.]*)\s*(.*)

If you also need to allow for whitespace at the end of the line, append \s*:

([\d\.]*)\s*([\d\.]*)\s*([\d\.]*)\s*(.*)\s*

Edit: fixed a small mistake where I had used \S instead of [\d.].
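
One thing worth checking before relying on these patterns: because every quantifier is *, each part is optional, so the pattern also accepts lines that are not data rows at all, just with empty groups. The sketch below compares it with a hypothetical stricter variant that uses + instead; the class, field, and method names are illustrative.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // The pattern from this answer, with backslashes doubled for Java.
    static final Pattern LOOSE =
            Pattern.compile("([\\d\\.]*)\\s*([\\d\\.]*)\\s*([\\d\\.]*)\\s*(.*)");

    // Hypothetical stricter variant: every column must be present.
    static final Pattern STRICT =
            Pattern.compile("([\\d\\.]+)\\s+([\\d\\.]+)\\s+([\\d\\.]+)\\s+(.+)");

    static boolean looseMatches(String line) {
        return LOOSE.matcher(line).matches();
    }

    static boolean strictMatches(String line) {
        return STRICT.matcher(line).matches();
    }

    public static void main(String[] args) {
        String row = ".0..2.0122 15 7.8 $9.99 (2003)";
        String junk = "not a data row";
        // Both patterns accept a well-formed row: prints "true true"
        System.out.println(looseMatches(row) + " " + strictMatches(row));
        // Only the loose pattern accepts junk: prints "true false"
        System.out.println(looseMatches(junk) + " " + strictMatches(junk));
    }
}
```

Whether the loose behaviour matters depends on the input file; if it may contain header or blank lines, the stricter form makes them easier to skip.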
