java - Equivalent of g_ascii_strcasecmp in Java

Question

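I have a list of words that was sorted using the g_ascii_strcasecmp function, and I need to process this list in Java. What is the equivalent sort function in Java? To implement a binary search I need a correct comparison function. So far I have the function below, but it does not always produce correct results.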

public int compareStrings(String str) {
    Collator collator = Collator.getInstance();//TODO: implement locale?
    return collator.compare(this.wordString, str);
}

Update. Example of the list: "T, t, T'ai Chi Ch'uan, t'other, T-, T-bone, T-bone steak, T-junction, tabasco, Tabassaran, tabby".

score 1 · Accepted Answer

After reading its Javadoc, I would not use Collator, because you cannot control how it compares strings. You can choose a locale, but how that locale tells Collator to compare strings is out of your hands.
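To illustrate the point, here is a minimal sketch showing that a Collator result depends on the locale's rules and the configured strength rather than on character codes. The class and method names (CollatorStrengthDemo, compareWithStrength) are made up for this example, and Locale.US is an arbitrary choice.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {
    // Hypothetical helper: compares two strings with a US-English Collator
    // at the given strength (Collator.PRIMARY, SECONDARY, TERTIARY, ...).
    static int compareWithStrength(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // At PRIMARY strength, case differences are ignored.
        System.out.println(compareWithStrength("abc", "ABC", Collator.PRIMARY));
        // At TERTIARY strength, case differences are significant.
        System.out.println(compareWithStrength("abc", "ABC", Collator.TERTIARY));
    }
}
```

Neither setting is designed to reproduce a byte-wise ASCII comparison, which is why the rest of this answer avoids Collator.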

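If you know that all characters in your strings are ASCII, I would use String.compareTo(), which sorts lexicographically based on Unicode character values. If every character in the strings is ASCII, its Unicode value is its ASCII value, so sorting lexicographically by Unicode value is the same as sorting lexicographically by ASCII value, which appears to be what g_ascii_strcasecmp does. And if you need case-insensitivity, you can use String.compareToIgnoreCase().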
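As a quick check of this suggestion, the sketch below tests whether the sample list from the question is already in order under String.CASE_INSENSITIVE_ORDER (the comparator behind compareToIgnoreCase) and under plain compareTo. The class name AsciiOrderCheck and the helper isSorted are invented for this example. Note that compareToIgnoreCase also folds the case of non-ASCII letters, so it only matches g_ascii_strcasecmp when the data is pure ASCII.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class AsciiOrderCheck {
    // The sample list from the question, in its original order.
    static final List<String> WORDS = Arrays.asList(
        "T", "t", "T'ai Chi Ch'uan", "t'other", "T-", "T-bone",
        "T-bone steak", "T-junction", "tabasco", "Tabassaran", "tabby");

    // Hypothetical helper: true if no element compares greater than its successor.
    static boolean isSorted(List<String> words, Comparator<String> order) {
        for (int i = 1; i < words.size(); i++) {
            if (order.compare(words.get(i - 1), words.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Case-insensitive order: consistent with the list as given.
        System.out.println(isSorted(WORDS, String.CASE_INSENSITIVE_ORDER));
        // Case-sensitive order: "t" sorts after "T'ai Chi Ch'uan", so this fails.
        System.out.println(isSorted(WORDS, Comparator.naturalOrder()));
    }
}
```

For this all-ASCII sample the case-insensitive comparator agrees with the existing order, while the case-sensitive one does not.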

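As mentioned in the comments, I think you will need to write your own comparison function. You would loop over the characters in the strings, skipping any that are outside the ASCII range. Something like the following, which is a simple, naive implementation that would need to be hardened to cover the corner cases I imagine g_ascii_strcasecmp handles.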

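public int compareStrings(String str) {
    List<Character> myAsciiChars = onlyAsciiChars(this.wordString);
    List<Character> theirAsciiChars = onlyAsciiChars(str);

    // Compare position by position over the common prefix, so that the
    // result is lexicographic rather than decided by length alone.
    int min = Math.min(myAsciiChars.size(), theirAsciiChars.size());
    for (int i = 0; i < min; i++) {
        // Only ASCII characters remain, so toLowerCase folds just A-Z.
        char mine = Character.toLowerCase(myAsciiChars.get(i).charValue());
        char theirs = Character.toLowerCase(theirAsciiChars.get(i).charValue());
        if (mine != theirs) {
            return mine < theirs ? -1 : 1;
        }
    }

    // The common prefix is equal: the shorter string sorts first.
    return Integer.compare(myAsciiChars.size(), theirAsciiChars.size());
}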

private final static char MAX_ASCII_VALUE = 127; // (Or 255 if using extended ASCII)

private List<Character> onlyAsciiChars(String s) {
    List<Character> asciiChars = new ArrayList<>();
    for (char c : s.toCharArray()) {
        if (c <= MAX_ASCII_VALUE) {
            asciiChars.add(c);
        }
    }
    return asciiChars;
}

score 1 · Accepted Answer

I decided to share the method I came up with:

    /**
     * Compares two strings, ignoring the case of ASCII characters only.
     * Non-ASCII characters are compared as-is, so their case differences
     * are significant. This is an
     * attempt to mimic glib's string utility function 
     * <a href="http://developer.gnome.org/glib/2.28/glib-String-Utility-Functions.html#g-ascii-strcasecmp">g_ascii_strcasecmp ()</a>.
     *
     * This is a slightly modified version of java.lang.String.CASE_INSENSITIVE_ORDER.compare(String s1, String s2) method.
     * 
     * @param str1  string to compare with str2
     * @param str2  string to compare with str1
     * @return      0 if the strings match, a negative value if str1 < str2, or a positive value if str1 > str2
     */
    private static int compareToIgnoreCaseASCIIOnly(String str1, String str2) {
        int n1 = str1.length();
        int n2 = str2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = str1.charAt(i);
            char c2 = str2.charAt(i);
            if (c1 != c2) {
                if ((int) c1 > 127 || (int) c2 > 127) { //if non-ASCII char
                    return c1 - c2;
                } else {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if(c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if(c1 != c2) {
                            return c1 - c2;
                        }
                    }
                }
            }
        }
        return n1 - n2;
    }
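Since the goal in the question is a binary search over the pre-sorted list, here is a minimal sketch of how a comparator of this kind could be plugged into Collections.binarySearch. The class name AsciiCaseInsensitiveSearch, the field ASCII_CASE_INSENSITIVE, and the helper asciiLower are invented for this example, and the comparator is a simplified variant that folds only A-Z, not the method above verbatim. It compares UTF-16 code units, whereas glib compares bytes, so the ordering of some non-ASCII characters may differ from the original list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class AsciiCaseInsensitiveSearch {
    // Hypothetical helper: lowercases A-Z only, leaves every other char as-is.
    static int asciiLower(char c) {
        return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
    }

    // Compares char by char after ASCII-only lowercasing; shorter string first on ties.
    static final Comparator<String> ASCII_CASE_INSENSITIVE = (s1, s2) -> {
        int min = Math.min(s1.length(), s2.length());
        for (int i = 0; i < min; i++) {
            int c1 = asciiLower(s1.charAt(i));
            int c2 = asciiLower(s2.charAt(i));
            if (c1 != c2) {
                return c1 - c2;
            }
        }
        return s1.length() - s2.length();
    };

    // The sample list from the question, assumed to be sorted already.
    static final List<String> WORDS = Arrays.asList(
        "T", "t", "T'ai Chi Ch'uan", "t'other", "T-", "T-bone",
        "T-bone steak", "T-junction", "tabasco", "Tabassaran", "tabby");

    // Returns the index of the word, or a negative value if it is absent.
    static int find(String word) {
        return Collections.binarySearch(WORDS, word, ASCII_CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        System.out.println(find("T-bone"));
        System.out.println(find("tabby"));
        System.out.println(find("taco"));
    }
}
```

The search is only reliable if the comparator agrees with the order the list was sorted in, which is the reason for mimicking g_ascii_strcasecmp in the first place. Because "T" and "t" compare equal, a search for either may land on either entry.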
