c# - How to remove a duplicated set of characters in a string

Question

For example, a string contains the following (the string is a variable):

http://www.google.comhttp://www.google.com

What would be the most efficient way to remove the duplicate URL here, so that, for example, the output is:

http://www.google.com

score 2 · Accepted Answer

I am assuming the input contains only URLs.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

string input = "http://www.google.comhttp://www.google.com";

// this will get you distinct URLs but without "http://" at the beginning
IEnumerable<string> distinctAddresses = input
   .Split(new[] {"http://"}, StringSplitOptions.RemoveEmptyEntries)
   .Distinct();

StringBuilder output = new StringBuilder();
foreach (string distinctAddress in distinctAddresses)
{
   // when building the output, insert "http://" before each address so 
   // that it resembles the original
   output.Append("http://");
   output.Append(distinctAddress);
}

Console.WriteLine(output);

score 1 · Accepted Answer

Collect the strings into a list and use Distinct. If the string contains http addresses, you can apply the regular expression http:.+?(?=((http:)|($))) with RegexOptions.Singleline.

var distinctList = list.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();
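The two steps above (extract the addresses with the regular expression, then call Distinct) could be combined roughly as follows. This is only a sketch: the class name UrlSplitter and the method name DistinctUrls are made up for illustration, and it assumes every address starts with "http:", so https addresses would not be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class UrlSplitter
{
    public static List<string> DistinctUrls(string input)
    {
        // Lazily match from "http:" up to (but not including) the next
        // "http:" or the end of the input.
        MatchCollection matches = Regex.Matches(
            input, @"http:.+?(?=((http:)|($)))", RegexOptions.Singleline);

        return matches
            .Cast<Match>()
            .Select(m => m.Value)
            // Case-insensitive comparison, as in the line above.
            .Distinct(StringComparer.CurrentCultureIgnoreCase)
            .ToList();
    }
}
```

Joining the result with string.Concat(UrlSplitter.DistinctUrls(input)) should give back a single string with each address appearing once.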

score 1 · Accepted Answer

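There are various definitions of efficiency: code size, total execution time, CPU usage, space usage, time to write the code, and so on. If you want to be "efficient", you need to know which of these you are aiming for.

I would do something like this: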

string url = "http://www.google.comhttp://www.google.com";
if (url.Length % 2 == 0)
{
    string secondHalf = url.Substring(url.Length / 2);
    if (url.StartsWith(secondHalf))
    {
        url = secondHalf;
    }
}

Depending on the kind of duplication you need to remove, this may or may not work.
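If the string might be repeated more than twice, the same idea can be extended by looking for the shortest prefix that, repeated, rebuilds the whole string. The following is a sketch under that assumption; RepeatedString and SmallestRepeatingUnit are hypothetical names. Note that it also collapses strings such as "aaaa" to "a", which may not be what you want.

```csharp
using System;

// Hypothetical helper, not part of the original answer.
public static class RepeatedString
{
    public static string SmallestRepeatingUnit(string text)
    {
        for (int unitLength = 1; unitLength <= text.Length / 2; unitLength++)
        {
            // The unit must divide the string evenly.
            if (text.Length % unitLength != 0) continue;

            // Every character must equal the one a full unit earlier.
            bool repeats = true;
            for (int i = unitLength; i < text.Length; i++)
            {
                if (text[i] != text[i - unitLength])
                {
                    repeats = false;
                    break;
                }
            }

            if (repeats) return text.Substring(0, unitLength);
        }

        // No repetition found: return the input unchanged.
        return text;
    }
}
```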

score 0 · Accepted Answer

If you don't know the length of the string, whether anything is duplicated, or what is duplicated:

using System;

string yourPrimaryString = "http://www.google.comhttp://www.google.com";
for (int i = 0; i < yourPrimaryString.Length; i++)
{
  for (int j = i + 1; j <= yourPrimaryString.Length; j++)
  {
    // C#'s Substring takes (startIndex, length), not (start, end).
    string search = yourPrimaryString.Substring(i, j - i);
    // Look for another occurrence after the current one.
    int duplicateIndex = yourPrimaryString.IndexOf(search, j, StringComparison.Ordinal);
    if (duplicateIndex != -1)
    {
      // Cut the later occurrence out of the string.
      yourPrimaryString = yourPrimaryString.Substring(0, duplicateIndex)
          + yourPrimaryString.Substring(duplicateIndex + (j - i));
    }
  }
}

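This loops over every position, takes every substring from that character up to the last character, and searches for it, like this:

ABCDA - searching for A finds the second A and removes it. That is the problem. If you want this to be adjustable, you need to specify how long a duplicate has to be, but my code might help.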
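One way to avoid the single-character problem described above is to require a minimum length for a duplicate and to try longer blocks first. The sketch below assumes that approach; DuplicateRemover, RemoveRepeatedBlocks, and the minLength parameter are hypothetical names introduced for illustration, and the nested search is not meant to be fast on long inputs.

```csharp
using System;

// Hypothetical helper, not part of the original answer.
public static class DuplicateRemover
{
    // Removes later occurrences of any block that is at least minLength
    // characters long, keeping the first occurrence in place.
    public static string RemoveRepeatedBlocks(string text, int minLength)
    {
        if (minLength < 1) throw new ArgumentOutOfRangeException(nameof(minLength));

        // Longest candidates first, so a whole repeated URL is removed
        // before any of its shorter pieces are considered.
        for (int length = text.Length / 2; length >= minLength; length--)
        {
            for (int start = 0; start + 2 * length <= text.Length; start++)
            {
                string block = text.Substring(start, length);
                int next = text.IndexOf(block, start + length, StringComparison.Ordinal);
                while (next != -1)
                {
                    text = text.Remove(next, length);
                    next = text.IndexOf(block, start + length, StringComparison.Ordinal);
                }
            }
        }
        return text;
    }
}
```

With a minimum length of 2, "ABCDA" is left alone because the only repeated block is the single character A.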
