algorithm - 誰もが良いProper Caseアルゴリズムを持っていますか

Question

信頼できる Proper Case または PCase アルゴリズム (UCase または Upper に類似) を持っている人はいますか? "GEORGE BURDELL"orなどの値を取り、"george burdell"それをに変換するものを探しています"George Burdell"。

単純なケースを処理する単純なものがあります。"O'REILLY"などを処理してに変換できるものが理想ですが"O'Reilly"、それは難しいと思います。

それが物事を単純化するのであれば、私は主に英語に焦点を当てています.

更新:私は言語として C# を使用していますが、ほぼすべてのものから変換できます (同様の機能が存在すると仮定します)。

マクドナルドのシナリオが難しいものであることには同意します。私の O'Reilly の例と一緒にそれについて言及するつもりでしたが、元の投稿では言及しませんでした。

score 17 · Accepted Answer

私があなたの質問を誤解していない限り、自分で作成する必要はないと思います.TextInfoクラスがあなたのためにそれを行うことができます.

using System.Globalization;

CultureInfo.InvariantCulture.TextInfo.ToTitleCase("GeOrGE bUrdEll")

「ジョージ・バーデルが戻ってきます。特別なルールが関係する場合は、独自の文化を使用できます。

更新: マイケル(この回答へのコメント) は、メソッドが頭字語であると想定するため、入力がすべて大文字の場合、これは機能しないと指摘しました。これに対する単純な回避策は、ToTitleCase に送信する前にテキストを .ToLower() することです。

score 11 · Accepted Answer

@zwol : 別の返信として投稿します。

ljsの投稿に基づく例を次に示します。

void Main()
{
    List<string> names = new List<string>() {
        "bill o'reilly", 
        "johannes diderik van der waals", 
        "mr. moseley-williams", 
        "Joe VanWyck", 
        "mcdonald's", 
        "william the third", 
        "hrh prince charles", 
        "h.r.m. queen elizabeth the third",
        "william gates, iii", 
        "pope leo xii",
        "a.k. jennings"
    };
    
    names.Select(name => name.ToProperCase()).Dump();
}

// http://stackoverflow.com/questions/32149/does-anyone-have-a-good-proper-case-algorithm
public static class ProperCaseHelper
{
    public static string ToProperCase(this string input)
    {
        if (IsAllUpperOrAllLower(input))
        {
            // fix the ALL UPPERCASE or all lowercase names
            return string.Join(" ", input.Split(' ').Select(word => wordToProperCase(word)));
        }
        else
        {
            // leave the CamelCase or Propercase names alone
            return input;
        }
    }

    public static bool IsAllUpperOrAllLower(this string input)
    {
        return (input.ToLower().Equals(input) || input.ToUpper().Equals(input));
    }

    private static string wordToProperCase(string word)
    {
        if (string.IsNullOrEmpty(word)) return word;

        // Standard case
        string ret = capitaliseFirstLetter(word);

        // Special cases:
        ret = properSuffix(ret, "'");   // D'Artagnon, D'Silva
        ret = properSuffix(ret, ".");   // ???
        ret = properSuffix(ret, "-");       // Oscar-Meyer-Weiner
        ret = properSuffix(ret, "Mc", t => t.Length > 4);      // Scots
        ret = properSuffix(ret, "Mac", t => t.Length > 5);     // Scots except Macey

        // Special words:
        ret = specialWords(ret, "van");     // Dick van Dyke
        ret = specialWords(ret, "von");     // Baron von Bruin-Valt
        ret = specialWords(ret, "de");
        ret = specialWords(ret, "di");
        ret = specialWords(ret, "da");      // Leonardo da Vinci, Eduardo da Silva
        ret = specialWords(ret, "of");      // The Grand Old Duke of York
        ret = specialWords(ret, "the");     // William the Conqueror
        ret = specialWords(ret, "HRH");     // His/Her Royal Highness
        ret = specialWords(ret, "HRM");     // His/Her Royal Majesty
        ret = specialWords(ret, "H.R.H.");  // His/Her Royal Highness
        ret = specialWords(ret, "H.R.M.");  // His/Her Royal Majesty

        ret = dealWithRomanNumerals(ret);   // William Gates, III

        return ret;
    }

    private static string properSuffix(string word, string prefix, Func<string, bool> condition = null)
    {
        if (string.IsNullOrEmpty(word)) return word;
        if (condition != null && ! condition(word)) return word;
        
        string lowerWord = word.ToLower();
        string lowerPrefix = prefix.ToLower();

        if (!lowerWord.Contains(lowerPrefix)) return word;

        int index = lowerWord.IndexOf(lowerPrefix);

        // If the search string is at the end of the word ignore.
        if (index + prefix.Length == word.Length) return word;

        return word.Substring(0, index) + prefix +
            capitaliseFirstLetter(word.Substring(index + prefix.Length));
    }

    private static string specialWords(string word, string specialWord)
    {
        if (word.Equals(specialWord, StringComparison.InvariantCultureIgnoreCase))
        {
            return specialWord;
        }
        else
        {
            return word;
        }
    }

    private static string dealWithRomanNumerals(string word)
    {
        // Roman Numeral parser thanks to [djk](https://stackoverflow.com/users/785111/djk)
        // Note that it excludes the Chinese last name Xi
        return new Regex(@"\b(?!Xi\b)(X|XX|XXX|XL|L|LX|LXX|LXXX|XC|C)?(I|II|III|IV|V|VI|VII|VIII|IX)?\b", RegexOptions.IgnoreCase).Replace(word, match => match.Value.ToUpperInvariant());
    }

    private static string capitaliseFirstLetter(string word)
    {
        return char.ToUpper(word[0]) + word.Substring(1).ToLower();
    }

}

score 4 · Accepted Answer

タイトルの大文字と小文字を区別するためのこのきちんとした Perl スクリプトもあります。

http://daringfireball.net/2008/08/title_case_update

#!/usr/bin/perl

#     This filter changes all words to Title Caps, and attempts to be clever
# about *un*capitalizing small words like a/an/the in the input.
#
# The list of "small words" which are not capped comes from
# the New York Times Manual of Style, plus 'vs' and 'v'. 
#
# 10 May 2008
# Original version by John Gruber:
# http://daringfireball.net/2008/05/title_case
#
# 28 July 2008
# Re-written and much improved by Aristotle Pagaltzis:
# http://plasmasturm.org/code/titlecase/
#
#   Full change log at __END__.
#
# License: http://www.opensource.org/licenses/mit-license.php
#


use strict;
use warnings;
use utf8;
use open qw( :encoding(UTF-8) :std );


my @small_words = qw( (?<!q&)a an and as at(?!&t) but by en for if in of on or the to v[.]? via vs[.]? );
my $small_re = join '|', @small_words;

my $apos = qr/ (?: ['’] [[:lower:]]* )? /x;

while ( <> ) {
  s{\A\s+}{}, s{\s+\z}{};

  $_ = lc $_ if not /[[:lower:]]/;

  s{
      \b (_*) (?:
          ( (?<=[ ][/\\]) [[:alpha:]]+ [-_[:alpha:]/\\]+ |   # file path or
            [-_[:alpha:]]+ [@.:] [-_[:alpha:]@.:/]+ $apos )  # URL, domain, or email
          |
          ( (?i: $small_re ) $apos )                         # or small word (case-insensitive)
          |
          ( [[:alpha:]] [[:lower:]'’()\[\]{}]* $apos )       # or word w/o internal caps
          |
          ( [[:alpha:]] [[:alpha:]'’()\[\]{}]* $apos )       # or some other word
      ) (_*) \b
  }{
      $1 . (
        defined $2 ? $2         # preserve URL, domain, or email
      : defined $3 ? "\L$3"     # lowercase small word
      : defined $4 ? "\u\L$4"   # capitalize word w/o internal caps
      : $5                      # preserve other kinds of word
      ) . $6
  }xeg;


  # Exceptions for small words: capitalize at start and end of title
  s{
      (  \A [[:punct:]]*         # start of title...
      |  [:.;?!][ ]+             # or of subsentence...
      |  [ ]['"“‘(\[][ ]*     )  # or of inserted subphrase...
      ( $small_re ) \b           # ... followed by small word
  }{$1\u\L$2}xig;

  s{
      \b ( $small_re )      # small word...
      (?= [[:punct:]]* \Z   # ... at the end of the title...
      |   ['"’”)\]] [ ] )   # ... or of an inserted subphrase?
  }{\u\L$1}xig;

  # Exceptions for small words in hyphenated compound words
  ## e.g. "in-flight" -> In-Flight
  s{
      \b
      (?<! -)                 # Negative lookbehind for a hyphen; we don't want to match man-in-the-middle but do want (in-flight)
      ( $small_re )
      (?= -[[:alpha:]]+)      # lookahead for "-someword"
  }{\u\L$1}xig;

  ## # e.g. "Stand-in" -> "Stand-In" (Stand is already capped at this point)
  s{
      \b
      (?<!…)                  # Negative lookbehind for a hyphen; we don't want to match man-in-the-middle but do want (stand-in)
      ( [[:alpha:]]+- )       # $1 = first word and hyphen, should already be properly capped
      ( $small_re )           # ... followed by small word
      (?! - )                 # Negative lookahead for another '-'
  }{$1\u$2}xig;

  print "$_";
}

__END__

しかし、適切なケースでは、人の名前のみを意味するように聞こえます。

score 2 · Accepted Answer

私が取り組んでいるアプリに実装するために、今日これを書きました。このコードはコメント付きでかなり自明だと思います。すべてのケースで 100% 正確というわけではありませんが、西洋の名前のほとんどを簡単に処理できます。

例:

mary-jane => Mary-Jane

o'brien => O'Brien

Joël VON WINTEREGG => Joël von Winteregg

jose de la acosta => Jose de la Acosta

このコードは、必要に応じて上部の配列に任意の文字列値を追加できるという点で拡張可能です。それを調べて、必要な特別な機能を追加してください。

function name_title_case($str)
{
  // name parts that should be lowercase in most cases
  $ok_to_be_lower = array('av','af','da','dal','de','del','der','di','la','le','van','der','den','vel','von');
  // name parts that should be lower even if at the beginning of a name
  $always_lower   = array('van', 'der');

  // Create an array from the parts of the string passed in
  $parts = explode(" ", mb_strtolower($str));

  foreach ($parts as $part)
  {
    (in_array($part, $ok_to_be_lower)) ? $rules[$part] = 'nocaps' : $rules[$part] = 'caps';
  }

  // Determine the first part in the string
  reset($rules);
  $first_part = key($rules);

  // Loop through and cap-or-dont-cap
  foreach ($rules as $part => $rule)
  {
    if ($rule == 'caps')
    {
      // ucfirst() words and also takes into account apostrophes and hyphens like this:
      // O'brien -> O'Brien || mary-kaye -> Mary-Kaye
      $part = str_replace('- ','-',ucwords(str_replace('-','- ', $part)));
      $c13n[] = str_replace('\' ', '\'', ucwords(str_replace('\'', '\' ', $part)));
    }
    else if ($part == $first_part && !in_array($part, $always_lower))
    {
      // If the first part of the string is ok_to_be_lower, cap it anyway
      $c13n[] = ucfirst($part);
    }
    else
    {
      $c13n[] = $part;
    }
  }

  $titleized = implode(' ', $c13n);

  return trim($titleized);
}

score 1 · Accepted Answer

どのプログラミング言語を使用していますか? 多くの言語では、正規表現の一致に対してコールバック関数を使用できます。これらは、一致を簡単に適切にケース化するために使用できます。使用される正規表現は非常に単純です。次のように、すべての単語文字に一致する必要があります。

/\w+/

または、最初の文字を余分な一致として抽出することもできます。

/(\w)(\w*)/

これで、マッチの最初の文字と後続の文字に別々にアクセスできるようになりました。コールバック関数は、ヒットの連結を単純に返すことができます。疑似 Python (私は実際には Python を知りません):

def make_proper(match):
    return match[1].to_upper + match[2]

ちなみに、これは「O'Reilly」のケースも処理します。これは、「O」と「Reilly」が別々に一致し、両方が適切に区別されるためです。ただし、「McDonald's」や一般的にアポストロフィ付きの単語など、アルゴリズムで適切に処理されない他の特殊なケースがあります。後者の場合、アルゴリズムは「Mcdonald'S」を生成します。アポストロフィの特別な処理を実装できますが、それは最初のケースに干渉します。理論的な完全な解決策を見つけることは不可能です。実際には、アポストロフィの後の部分の長さを考慮すると役立つ場合があります。

score 1 · Accepted Answer

これはおそらく単純な C# 実装です:-

public class ProperCaseHelper {
  public string ToProperCase(string input) {
    string ret = string.Empty;

    var words = input.Split(' ');

    for (int i = 0; i < words.Length; ++i) {
      ret += wordToProperCase(words[i]);
      if (i < words.Length - 1) ret += " ";
    }

    return ret;
  }

  private string wordToProperCase(string word) {
    if (string.IsNullOrEmpty(word)) return word;

    // Standard case
    string ret = capitaliseFirstLetter(word);

    // Special cases:
    ret = properSuffix(ret, "'");
    ret = properSuffix(ret, ".");
    ret = properSuffix(ret, "Mc");
    ret = properSuffix(ret, "Mac");

    return ret;
  }

  private string properSuffix(string word, string prefix) {
    if(string.IsNullOrEmpty(word)) return word;

    string lowerWord = word.ToLower(), lowerPrefix = prefix.ToLower();
    if (!lowerWord.Contains(lowerPrefix)) return word;

    int index = lowerWord.IndexOf(lowerPrefix);

    // If the search string is at the end of the word ignore.
    if (index + prefix.Length == word.Length) return word;

    return word.Substring(0, index) + prefix +
      capitaliseFirstLetter(word.Substring(index + prefix.Length));
  }

  private string capitaliseFirstLetter(string word) {
    return char.ToUpper(word[0]) + word.Substring(1).ToLower();
  }
}

score 0 · Accepted Answer

クロノス、ありがとう。私はあなたの関数で次の行を見つけました：

`if (!lowerWord.Contains(lowerPrefix)) return word`;

言わなければならない

if (!lowerWord.StartsWith(lowerPrefix)) return word;

したがって、「información」は「InforMacIón」に変更されません

一番、

エンリケ

score 0 · Accepted Answer

各単語の最初の文字を大文字にする簡単な方法 (スペースで区切る)

$words = explode(” “, $string);
for ($i=0; $i<count($words); $i++) {
$s = strtolower($words[$i]);
$s = substr_replace($s, strtoupper(substr($s, 0, 1)), 0, 1);
$result .= “$s “;
}
$string = trim($result);

あなたが与えた「O'REILLY」の例をキャッチするという点では、両方のスペースで文字列を分割し、 ' は、アポストロフィの後に現れる文字、つまりフレッドの s を大文字にするため、機能しません

だから私はおそらく次のようなことを試すだろう

$words = explode(” “, $string);
for ($i=0; $i<count($words); $i++) {

$s = strtolower($words[$i]);

if (substr($s, 0, 2) === "o'"){
$s = substr_replace($s, strtoupper(substr($s, 0, 3)), 0, 3);
}else{
$s = substr_replace($s, strtoupper(substr($s, 0, 1)), 0, 1);
}
$result .= “$s “;
}
$string = trim($result);

これは、O'Reilly、O'Clock、O'Donnell などをキャッチするはずです。

このコードはテストされていないことに注意してください。

score 0 · Accepted Answer

ここにはたくさんの良い答えがあります。私のものは非常に単純で、組織内の名前のみを考慮に入れています。必要に応じて拡張できます。これは完全な解決策ではなく、vancouver を VanCouver に変更しますが、これは誤りです。ですので、使用する場合は微調整してください。

これがC＃での私の解決策です。これにより名前がプログラムにハードコードされますが、少し手を加えれば、テキストファイルをプログラムの外部に保持し、名前の例外 (つまり、Van、Mc、Mac) を読み込んでそれらをループすることができます。

public static String toProperName(String name)
{
    if (name != null)
    {
        if (name.Length >= 2 && name.ToLower().Substring(0, 2) == "mc")  // Changes mcdonald to "McDonald"
            return "Mc" + Regex.Replace(name.ToLower().Substring(2), @"\b[a-z]", m => m.Value.ToUpper());

        if (name.Length >= 3 && name.ToLower().Substring(0, 3) == "van")  // Changes vanwinkle to "VanWinkle"
            return "Van" + Regex.Replace(name.ToLower().Substring(3), @"\b[a-z]", m => m.Value.ToUpper());

        return Regex.Replace(name.ToLower(), @"\b[a-z]", m => m.Value.ToUpper());  // Changes to title case but also fixes 
                                                                                   // appostrophes like O'HARE or o'hare to O'Hare
    }

    return "";
}

score 0 · Accepted Answer

これをテキストボックスの textchanged イベントハンドラとして使用します。「マクドナルド」の参入支援

Public Shared Function DoProperCaseConvert(ByVal str As String, Optional ByVal allowCapital As Boolean = True) As String
    Dim strCon As String = ""
    Dim wordbreak As String = " ,.1234567890;/\-()#$%^&*€!~+=@"
    Dim nextShouldBeCapital As Boolean = True

    'Improve to recognize all caps input
    'If str.Equals(str.ToUpper) Then
    '    str = str.ToLower
    'End If

    For Each s As Char In str.ToCharArray

        If allowCapital Then
            strCon = strCon & If(nextShouldBeCapital, s.ToString.ToUpper, s)
        Else
            strCon = strCon & If(nextShouldBeCapital, s.ToString.ToUpper, s.ToLower)
        End If

        If wordbreak.Contains(s.ToString) Then
            nextShouldBeCapital = True
        Else
            nextShouldBeCapital = False
        End If
    Next

    Return strCon
End Function

score -1 · Accepted Answer

ソリューションを希望する言語については言及していないため、ここにいくつかの疑似コードを示します。

Loop through each character
    If the previous character was an alphabet letter
        Make the character lower case
    Otherwise
        Make the character upper case
End loop

algorithm - 誰もが良いProper Caseアルゴリズムを持っていますか

13 に答える 13

Related

Reference