string - 文字が繰り返されない最長の部分文字列を見つける方法は?

Question

繰り返し文字を含まない特定の文字列内の文字の最長部分文字列を見つけるアルゴリズムが必要です。特定の文字列のすべての部分文字列を考慮して、繰り返されない文字の数を計算する O(n*n) アルゴリズムを考えることができます。たとえば、一意の文字の最長部分文字列が BGAKG に対応する 5 文字である文字列 " AABGAKG " を考えてみましょう。

誰でもそれを行うためのより良い方法を提案できますか?

ありがとう

編集：自分の質問を他の人に適切に説明できないと思います。部分文字列に繰り返し文字を含めることができます (geeksforgeeks ソリューションが行うように、部分文字列にすべての個別の文字が必要というわけではありません)。私が見つけなければならないことは、部分文字列内の非繰り返し文字の最大数です（一部の文字が繰り返される場合があります）。

たとえば、文字列がAABGAKGIMNの場合、BGAKGIMNが解決策です。

score 1 · Accepted Answer

これはどう：

public static String getLongestSubstringNoRepeats( String string ){
    int iLongestSoFar = 0;
    int posLongestSoFar = 0;
    char charPrevious = 0;
    int xCharacter = 0;
    int iCurrentLength = 0;
    while( xCharacter < string.length() ){
        char charCurrent = string.charAt( xCharacter );
        iCurrentLength++;
        if( charCurrent == charPrevious ){
            if( iCurrentLength > iLongestSoFar ){
                iLongestSoFar = iCurrentLength;
                posLongestSoFar = xCharacter;
            }
            iCurrentLength = 1;
        }
        charPrevious = charCurrent;
        xCharacter++;
    }
    if( iCurrentLength > iLongestSoFar ){
        return string.substring( posLongestSoFar );
    } else {
        return string.substring( posLongestSoFar, posLongestSoFar + iLongestSoFar );
    }
}

score 1 · Accepted Answer

かなりトリッキーな質問ですが、C# に基づく O(n) ソリューションを紹介します。

public string MaxSubStringKUniqueChars(string source, int k) {

if (string.IsNullOrEmpty(source) || k > source.Length) return string.Empty;

        var start = 0;
        var ret = string.Empty;
        IDictionary<char, int> dict = new Dictionary<char, int>();

        for (var i = 0; i < source.Length; i++)
        {
            if (dict.ContainsKey(source[i]))
            {
                dict[source[i]] = 1 + dict[source[i]];
            }
            else
            {
                dict[source[i]] = 1;
            }

            if (dict.Count == k + 1)
            {
                if (i - start > ret.Length)
                {
                    ret = source.Substring(start, i - start);
                }

                while (dict.Count > k)
                {
                    int count = dict[source[start]];
                    if (count == 1)
                    {
                        dict.Remove(source[start]);
                    }
                    else
                    {
                        dict[source[start]] = dict[source[start]] - 1;
                    }

                    start++;
                }
            }

        }
        //just for edge case like "aabbcceee", should return "cceee"
        if (dict.Count == k && source.Length - start > ret.Length)
        {
            return source.Substring(start, source.Length - start);
        }

        return ret;
    }

`

//これはテストケースです。

    public void TestMethod1()
    {
        var ret = Item001.MaxSubStringKUniqueChars("aabcd", 2);
        Assert.AreEqual("aab", ret);

        ret = Item001.MaxSubStringKUniqueChars("aabbccddeee", 2);
        Assert.AreEqual("ddeee", ret);

        ret = Item001.MaxSubStringKUniqueChars("abccccccccaaddddeeee", 3);
        Assert.AreEqual("ccccccccaadddd", ret);

        ret = Item001.MaxSubStringKUniqueChars("ababcdcdedddde", 2);
        Assert.AreEqual("dedddde", ret);
    }

score 0 · Accepted Answer

  //Given a string ,find the longest sub-string with all distinct characters in it.If there are multiple such strings,print them all.
  #include<iostream>
  #include<cstring>
  #include<array>

  using namespace std;

 //for a string with all small letters
 //for capital letters use 65 instead of 97

 int main()
 {

  array<int ,26> count ;

  array<string,26>largest;

  for(int i = 0 ;i <26;i++)
  count[i]=0;

  string s = "abcdefghijrrstqrstuvwxyzprr";
  string out = "";

  int k = 0,max=0;

  for(int i = 0 ; i < s.size() ; i++)
     { 
         if(count[s[i] - 97]==1)
           {  
              int loc = out.find(s[i]);

              for(int j=0;j<=loc;j++)   count[out[j] - 97]=0;

              if(out.size() > max) 
                {   
                   max = out.size();
                   k=1;
                   largest[0] = out;
                 }
             else if(out.size()==max) largest[k++]=out;

             out.assign(out,loc+1,out.size()-loc-1);
           }
         out = out + s[i];
         count[s[i] - 97]++;
      }

   for(int i=0;i<k;i++)  cout<<largest[i] << endl;
   //output will be
   // abcdefghijr
   // qrstuvwxyzp

  }

score 0 · Accepted Answer

与えられた文字列を s とし、その長さを n とします。

f(i) を、s[i] で終わる、異なる文字を持つ s の最長の [連続した] 部分文字列と定義します。それはユニークで明確に定義されています。

各 i について f(i) を計算します。f(i-1) と s[i] から簡単に推測できます。

文字 s[i] が f(i-1) にある場合、j を s[j] = s[i] となる j < i の最大位置とする。f(i) は s[j+1 .. i] (Python 表記)
それ以外の場合、f(i) は f(i-1) に s[i] が追加されます。

問題の解決策は、最大長の任意の f(i) (必ずしも一意であるとは限りません) です。

このアルゴリズムを実装して、O(n * 26) 時間で実行できます。ここで、26 はアルファベットの文字数です。

score -1 · Accepted Answer

string MaximumSubstringNonRepeating(string text)
{
    string max = null;
    bool isCapture = false;
    foreach (string s in Regex.Split(text, @"(.)\1+"))
    {
        if (!isCapture && (max == null || s.Length > max.Length))
        {
            max = s;
        }
        isCapture = !isCapture;
    }
    return max;
}

.任意の文字に一致します。( )そのキャラクターを捉えます。\1キャプチャされた文字を再度一致させます。+その文字を繰り返します。パターン全体は、任意の 1 文字の 2 回以上の繰り返しに一致します。"AA"または",,,,"。

Regex.Split()パターンに一致するたびに文字列を分割し、その間にあるピースの配列を返します。(1 つの注意点: キャプチャされた部分文字列も含まれます。この場合、繰り返されている 1 文字です。キャプチャはピースの間に表示されます。これがisCaptureフラグを追加した方法です。)

この関数は、繰り返されるすべての文字を切り取り、繰り返される文字の各セットの間にある最長の部分を返します。

>>> MaximumSubstringNonRepeating("AABGAKG") // "AA" is repeated
"BGAKG"

>>> MaximumSubstringNonRepeating("AABGAKGIMNZZZD") // "AA" and "ZZZ" are repeated.
"BGAKGIMN"

string - 文字が繰り返されない最長の部分文字列を見つける方法は?

9 に答える 9

Related

Reference