java - Split a comma-separated string, ignoring commas inside quotes, but allowing strings that contain a single double quote

Question

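I have searched several Stack Overflow posts on how to split a string on a comma delimiter without splitting on commas inside quotes (see: How do I split a string into an array by comma but ignore commas inside double quotes?). I am trying to get similar results, but I also need to allow for a string that contains a single double quote.

For example, "test05, \"test, 05\", test\", test 05" should split into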

test05
"test, 05"
test"
test 05

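I tried a method similar to the one described here:

Regex for splitting a string using space when not surrounded by single or double quotes

It uses Matcher instead of split(). However, that specific example splits on spaces rather than on commas. I tried to adjust the pattern to account for commas instead, but had no luck.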

String str = "test05, \"test, 05\", test\", test 05";
str = str + " "; // add trailing space
int len = str.length();
Matcher m = Pattern.compile("((\"[^\"]+?\")|([^,]+?)),++").matcher(str);

for (int i = 0; i < len; i++)
{
    m.region(i, len);

    if (m.lookingAt())
    {
        String s = m.group(1);

        if ((s.startsWith("\"") && s.endsWith("\"")))
        {
            s = s.substring(1, s.length() - 1);
        }

        System.out.println(i + ": \"" + s + "\"");
        i += (m.group(0).length() - 1);
    }
}

score 1 · Accepted Answer

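I had a similar problem, and since I could not find a good .NET solution I went DIY.

In my application I am parsing a CSV, so my delimiter is ",". I suppose this method only works when the delimiter is a single character.

So I wrote a function that ignores commas inside double quotes. It does this by converting the input string into a character array and parsing it character by character.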

public static string[] Splitter_IgnoreQuotes(string stringToSplit)
    {   
        char[] CharsOfData = stringToSplit.ToCharArray();
        //enter your expected array size here or alloc.
        string[] dataArray = new string[37];
        int arrayIndex = 0;
        bool DoubleQuotesJustSeen = false;          
        foreach (char theChar in CharsOfData)
        {
            // Are we inside double quotes (a quote seen, with no closing quote yet)? Then don't split on the comma.
            // You could make ',' a parameter; it is hardcoded here because I'm working with a CSV.
            if ((theChar != ',' || DoubleQuotesJustSeen) && theChar != '"')
            {
                dataArray[arrayIndex] = dataArray[arrayIndex] + theChar;
            }
            else if (theChar == '"')
            {
                if (DoubleQuotesJustSeen)
                {
                    DoubleQuotesJustSeen = false;
                }
                else
                {
                    DoubleQuotesJustSeen = true;
                }
            }
            else if (theChar == ',' && !DoubleQuotesJustSeen)
            {
                arrayIndex++;
            }
        }
        return dataArray;
    }

Note that this function also drops the double quotes (") from the input: in my application they are present in the input but not needed.
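For readers working in Java, as in the question, the same character-by-character approach might be sketched as follows. This is a rough port rather than the answer's own code: the class and method names are made up for illustration, and it collects fields in a growable list instead of a fixed-size array. Like the C# version it drops the quote characters, and it does not treat a lone double quote specially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical name, chosen for this sketch only.
class QuoteAwareSplitter {

    // Splits on ',' unless the comma sits between a pair of double quotes.
    // The quote characters themselves are dropped from the output.
    static List<String> splitIgnoringQuotedCommas(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;

        for (char c : input.toCharArray()) {
            if (c == '"') {
                insideQuotes = !insideQuotes;   // toggle the state and drop the quote
            } else if (c == ',' && !insideQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);              // ordinary character, or a comma inside quotes
            }
        }
        fields.add(current.toString());         // last field
        return fields;
    }

    public static void main(String[] args) {
        for (String field : splitIgnoringQuotedCommas("a,\"b,c\",d")) {
            System.out.println(field);
        }
    }
}
```

For the input a,"b,c",d this yields three fields: a, b,c and d.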

score 1 · Accepted Answer

You have reached the point where regular expressions break down.

I would recommend that you instead write a simple splitter that handles your special cases the way you want. Test-driven development is a great fit for this.
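A hand-written splitter along those lines might look like the sketch below. It assumes one particular rule for the special case, which the question does not spell out: a double quote opens a quoted section only when it is the first non-space character of a field and another double quote follows somewhere later in the string; any other double quote is kept as an ordinary character. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical name, chosen for this sketch only.
class LoneQuoteTolerantSplitter {

    // Splits on commas outside quoted sections, trims each field,
    // and keeps the quote characters in the output.
    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                if (insideQuotes) {
                    insideQuotes = false;           // closing quote
                } else if (current.toString().trim().isEmpty()
                        && input.indexOf('"', i + 1) >= 0) {
                    insideQuotes = true;            // opening quote at the start of a field
                }                                   // otherwise: a lone quote, kept as text
                current.append(c);
            } else if (c == ',' && !insideQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());      // last field
        return fields;
    }

    public static void main(String[] args) {
        for (String field : split("test05, \"test, 05\", test\", test 05")) {
            System.out.println(field);
        }
    }
}
```

With the sample string from the question, this returns the four fields the question asks for. Other inputs, such as a lone quote at the start of a field that is followed by a quoted field, may need a different rule, which is where a handful of unit tests would help.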

It looks, however, like you are trying to parse CSV lines. Have you considered using a CSV library for this?

score 0 · Accepted Answer

Try this:

import java.util.regex.*;

public class Main {
  public static void main(String[] args) throws Exception {

    String text = "test05, \"test, 05\", test\", test 05";

    Pattern p = Pattern.compile(
        "(?x)          # enable comments                                      \n" +
        "(\"[^\"]*\")  # quoted data, and store in group #1                   \n" +
        "|             # OR                                                   \n" +
        "([^,]+)       # one or more chars other than ',', and store it in #2 \n" +
        "|             # OR                                                   \n" +
        "\\s*,\\s*     # a ',' optionally surrounded by space-chars           \n"
    );

    Matcher m = p.matcher(text);

    while (m.find()) {
      // get the match
      String matched = m.group().trim();

      // only print the match if it's group #1 or #2
      if(m.group(1) != null || m.group(2) != null) {
        System.out.println(matched);
      }
    }
  }
}

For test05, "test, 05", test", test 05 it produces:

test05
"test, 05"
test"
test 05

And for test05, "test 05", test", test 05 it produces:

test05
"test 05"
test"
test 05

score 0 · Accepted Answer

Unless you really need to do it yourself, you should consider the Apache Commons class org.apache.commons.csv.CSVParser.

http://commons.apache.org/sandbox/csv/apidocs/org/apache/commons/csv/CSVParser.html

score 0 · Accepted Answer

Split on this pattern:

(?<=\"?),(?!\")|(?<!\"),(?=\")

So it would look like this:

String[] splitArray = subjectString.split("(?<=\"?),(?!\")|(?<!\"),(?=\")");

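UPD: Given the recent changes to the logic of the question, it is better not to use a naked split. You should first separate the text that is inside quotes from the text that is not, and then do a simple split(",") on the latter. Use a simple for loop that counts the quotes you have met while saving the characters you read into a StringBuffer. First, save characters into the StringBuffer until you meet a quote, then put the StringBuffer's content into an array that holds the strings that were not in quotes. Then create a new StringBuffer and save the next characters you read into it; when you meet the second quote, stop and put the new StringBuffer's content into an array that holds the strings that were in quotes. Repeat until the end of the string. You end up with two arrays: one with the strings that were in quotes and one with the strings that were not. Finally, split every element of the second array on commas.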
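The loop described in the update could be sketched roughly as below. The class and method names are hypothetical, StringBuilder stands in for StringBuffer, and lists stand in for the arrays. It also makes two assumptions the update does not state: empty fields are dropped, and whatever follows the last quote is treated as unquoted text, so an unmatched quote is not handled specially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical name, chosen for this sketch only.
class QuotedSegmentSeparator {

    // Fills 'quoted' with the text found between pairs of double quotes and
    // 'unquoted' with the comma-separated fields found outside them.
    static void separate(String input, List<String> quoted, List<String> unquoted) {
        StringBuilder buffer = new StringBuilder();
        boolean insideQuotes = false;

        for (char c : input.toCharArray()) {
            if (c != '"') {
                buffer.append(c);
                continue;
            }
            // A quote ends the current segment: store it in the matching list.
            if (insideQuotes) {
                quoted.add(buffer.toString());
            } else {
                addUnquotedFields(buffer.toString(), unquoted);
            }
            buffer.setLength(0);
            insideQuotes = !insideQuotes;
        }
        // Assumption: text left over after the last quote counts as unquoted.
        addUnquotedFields(buffer.toString(), unquoted);
    }

    // Splits an unquoted segment on commas, trimming and skipping empty pieces.
    private static void addUnquotedFields(String segment, List<String> unquoted) {
        for (String field : segment.split(",")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                unquoted.add(trimmed);
            }
        }
    }

    public static void main(String[] args) {
        List<String> quoted = new ArrayList<>();
        List<String> unquoted = new ArrayList<>();
        separate("a, \"b, c\", d", quoted, unquoted);
        System.out.println("quoted:   " + quoted);
        System.out.println("unquoted: " + unquoted);
    }
}
```

Note that keeping two separate lists loses the original order of the fields; if the order matters, a single list filled in the same loop may be the simpler choice.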
