delphi - Case-insensitive Pos

Question

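Is there an equivalent of Pos in D2010 (Unicode) that is not case-sensitive?

I know I can use Pos(AnsiUpperCase(FindString), AnsiUpperCase(SourceString)), but that adds a lot of processing time because the strings are converted to uppercase every time the function is called.

For example, over 1000000 loops, Pos takes 78 ms, while converting to uppercase as well takes 764 ms.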

str1 := 'dfkfkL%&/s"#<.676505';
  for i := 0 to 1000000 do
    PosEx('#<.', str1, 1); // Takes 78ms

  for i := 0 to 1000000 do
    PosEx(AnsiUpperCase('#<.'), AnsiUpperCase(str1), 1); // Takes 764ms

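I know that for this specific example I could improve performance by converting the strings to uppercase once, before the loop. But the reason I am looking for a case-insensitive Pos-like function is to replace the one from FastStrings. All the strings I will be using Pos on are different, so I would have to convert every one of them to uppercase.

Is there any other function that is faster than Pos plus converting the strings to uppercase?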

score 25 · Accepted Answer

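The built-in Delphi function for this is ContainsText: AnsiStrings.ContainsText for AnsiStrings and StrUtils.ContainsText for Unicode strings.

In the background, however, they use logic very similar to yours.

No matter which library they come from, functions like that will always be slow. Being as compatible with Unicode as possible in particular requires quite a lot of overhead, and because the calls sit inside a loop, that overhead adds up.

The only way to avoid that overhead is to do those conversions outside the loop as much as possible.

So: follow your own suggestion, and you have a really good solution.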

--jeroen
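
For illustration, here is a minimal sketch of what "do the conversions outside the loop" can look like when only the haystack changes per iteration. CountMatches is a hypothetical helper, not something from the RTL or from the question; the search string is uppercased once and only the varying string is converted inside the loop. The last line shows StrUtils.ContainsText for comparison.

```pascal
program HoistUpperCase;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Hypothetical helper: counts how many entries of Lines contain Needle,
// ignoring case.
function CountMatches(const Needle: string; const Lines: array of string): Integer;
var
  UpperNeedle: string;
  i: Integer;
begin
  Result := 0;
  // The needle is converted once, outside the loop
  UpperNeedle := AnsiUpperCase(Needle);
  for i := Low(Lines) to High(Lines) do
    // Only the string that differs per iteration is converted here
    if Pos(UpperNeedle, AnsiUpperCase(Lines[i])) > 0 then
      Inc(Result);
end;

begin
  Writeln(CountMatches('#<.', ['dfkfkL%&/s"#<.676505', 'abc', 'x#<.y']));
  // Built-in alternative mentioned above (text first, then sub-text)
  Writeln(ContainsText('The Needle in the haystack', 'needle'));
end.
```

This still pays for one conversion per haystack, so it only helps when the same search string is reused many times.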

score 11 · Accepted Answer

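This version of my previous answer works in both D2007 and D2010.

In Delphi 2007 the CharUpCaseTable is 256 bytes.
In Delphi 2010 it is 128 KB (65536 * 2).

The reason is the Char size. In the older version of Delphi, my original code only supported the current locale's character set at initialization. My InsensPosEx is about 4 times faster than your code. It is certainly possible to go even faster, but at the cost of simplicity.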

type
  TCharUpCaseTable = array [Char] of Char;

var
  CharUpCaseTable: TCharUpCaseTable;

procedure InitCharUpCaseTable(var Table: TCharUpCaseTable);
var
  n: cardinal;
begin
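  // Start with the identity mapping: every character maps to itself
  for n := 0 to Length(Table) - 1 do
    Table[Char(n)] := Char(n);
  // CharUpperBuff (Windows unit) then uppercases the whole table in place
  CharUpperBuff(@Table, Length(Table));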
end;

function InsensPosEx(const SubStr, S: string; Offset: Integer = 1): Integer;
var
  n:            Integer;
  SubStrLength: Integer;
  SLength:      Integer;
label
  Fail;
begin
  Result := 0;
  if S = '' then Exit;
  if SubStr = '' then Exit; // an empty SubStr would otherwise "match" at Offset
  if Offset <= 0 then Exit;

  SubStrLength := Length(SubStr);
  SLength := Length(s);

  if SubStrLength > SLength then Exit;

  Result := Offset;
  while SubStrLength <= (SLength-Result+1) do 
  begin
    for n := 1 to SubStrLength do
      if CharUpCaseTable[SubStr[n]] <> CharUpCaseTable[s[Result+n-1]] then
        goto Fail;
    Exit; // every character matched: Result holds the match position
Fail:
    Inc(Result);
  end;
  Result := 0;
end;

//...

initialization
  InitCharUpCaseTable({var}CharUpCaseTable);

score 5 · Accepted Answer

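I have also faced the problem of converting FastStrings, which used a Boyer-Moore (BM) search to gain some speed, to D2009 and D2010. Many of my searches look for a single character only, and most of those look for non-alphabetic characters, so my D2010 version of SmartPos has an overloaded version that takes a WideChar as its first argument and simply loops through the string to find it. To handle the few case-insensitive cases, I uppercase both arguments. For my applications, I believe the speed of this solution is comparable to FastStrings.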
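
A rough sketch of what such a single-character search could look like follows. CharPos is a hypothetical name, not the actual SmartPos overload, whose code is not shown here; it is just a plain loop that compares one character at a time, starting at Offset.

```pascal
program CharPosSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical stand-in for a single-character search overload.
// Returns the 1-based position of Ch in S at or after Offset, or 0.
function CharPos(Ch: Char; const S: string; Offset: Integer = 1): Integer;
var
  i: Integer;
begin
  Result := 0;
  if Offset < 1 then
    Exit;
  for i := Offset to Length(S) do
    if S[i] = Ch then
    begin
      Result := i;
      Exit;
    end;
end;

begin
  Writeln(CharPos('#', 'dfkfkL%&/s"#<.676505'));
end.
```

For non-alphabetic characters no case handling is needed at all, which is why this kind of loop can skip the uppercase conversion entirely.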

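For the 'string find' case, my first pass was to use SearchBuf, do the uppercasing and accept the penalty, but I have recently been looking into the possibility of using a Unicode BM implementation. As you may be aware, BM does not scale well or easily to character sets of Unicode proportions, but there is a Unicode BM implementation at Soft Gems. It pre-dates D2009 and D2010, but looks as if it would convert fairly easily. The author, Mike Lischke, solves the uppercasing issue by including a 67 KB Unicode uppercase table, which may be a step too far for my modest requirements. Since my search strings are usually short (though not as short as your single three-character example), the overhead of Unicode BM may also be a price not worth paying: the BM advantage increases with the length of the string being searched for.

This is definitely a situation where benchmarking with some real-world, application-specific examples will be needed before incorporating Unicode BM into my own applications.

Edit: some basic benchmarking shows that I was right to be wary of the "Unicode Tuned Boyer-Moore" solution. In my environment, UTBM results in bigger code and longer run times. I might consider using it if I needed some of the extras this implementation provides (handling surrogates and whole-word searches).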
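
For anyone wanting to run that kind of comparison, here is a minimal timing sketch. It reuses the test string from the question; CountHits is a hypothetical helper, and the iteration count is only a placeholder to be replaced with application-specific data. It assumes Now has enough resolution for the workload, which on Windows means runs of well over a few tens of milliseconds.

```pascal
program BenchSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, DateUtils, StrUtils;

// Hypothetical helper: runs Count searches and returns the number of hits,
// so the work being timed has an observable result.
function CountHits(const Needle, Haystack: string; Count: Integer;
  UpperEachTime: Boolean): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Count do
    if UpperEachTime then
    begin
      // Case-insensitive variant: both strings are converted on every call
      if PosEx(AnsiUpperCase(Needle), AnsiUpperCase(Haystack), 1) > 0 then
        Inc(Result);
    end
    else if PosEx(Needle, Haystack, 1) > 0 then
      Inc(Result);
end;

var
  Started: TDateTime;
  Hits: Integer;
begin
  Started := Now;
  Hits := CountHits('#<.', 'dfkfkL%&/s"#<.676505', 1000000, False);
  Writeln('PosEx only:          ', MilliSecondsBetween(Now, Started), ' ms, hits = ', Hits);

  Started := Now;
  Hits := CountHits('#<.', 'dfkfkL%&/s"#<.676505', 1000000, True);
  Writeln('PosEx + AnsiUpperCase: ', MilliSecondsBetween(Now, Started), ' ms, hits = ', Hits);
end.
```

The absolute numbers will differ by machine and compiler; only the ratio between the two lines is meaningful.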

score 4 · Accepted Answer

Here is one that I wrote and have been using for years:

function XPos( const cSubStr, cString :string ) :integer;
var
  nLen0, nLen1, nCnt, nCnt2 :integer;
  cFirst :Char;
begin
  nLen0 := Length(cSubStr);
  nLen1 := Length(cString);

  if nLen0 > nLen1 then
    begin
      // the substr is longer than the cString
      result := 0;
    end

  else if nLen0 = 0 then
    begin
      // null substr not allowed
      result := 0;
    end

  else

    begin

      // the outer loop finds the first matching character....
      // (UpCase only converts 'a'..'z'; other characters are compared as-is)
      cFirst := UpCase( cSubStr[1] );
      result := 0;

      for nCnt := 1 to nLen1 - nLen0 + 1 do
        begin

          if UpCase( cString[nCnt] ) = cFirst then
            begin
              // this might be the start of the substring...at least the first
              // character matches....
              result := nCnt;

              for nCnt2 := 2 to nLen0 do
                begin

                  if UpCase( cString[nCnt + nCnt2 - 1] ) <> UpCase( cSubStr[nCnt2] ) then
                    begin
                      // failed
                      result := 0;
                      break;
                    end;

                end;

            end;


          if result > 0 then
            break;
        end;


    end;
end;

score 2 · Accepted Answer

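Why not just convert both the substring and the source string to lowercase or uppercase within the regular Pos statement? Because both arguments are then in a single case, the result is effectively case-insensitive. Simple and light.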

score 1 · Accepted Answer

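The Jedi Code Library has StrIPos and thousands of other useful functions that complement Delphi's RTL. When I was still doing a lot of work in Delphi, JCL and its visual brother JVCL were among the first things I added to a freshly installed Delphi.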

score 0 · Accepted Answer

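On this occasion I could not find any approach that was even as good as, let alone better than, Pos() plus some form of string normalisation (upper/lowercase conversion).

When I benchmarked Unicode string handling in Delphi 2009, I found that the Pos() RTL routine has improved significantly since Delphi 7. This is explained in part by the fact that aspects of the FastCode libraries have been incorporated into the RTL for some time now.

The FastStrings library, on the other hand, has not (iirc) been significantly updated for a long time. In my tests I found that many FastStrings routines have in fact been overtaken by the equivalent RTL functions (with a few exceptions, explained by the unavoidable overhead incurred by the additional complexity of Unicode).

The "Char-Wise" processing of the solution presented by Steve is the best so far.

Any approach that normalises the entire strings (both the string and the substring) risks introducing errors in character-based positions in the results, because with Unicode strings a case conversion may change the length of the string (some characters convert to more or fewer characters in a case conversion).

These may be rare cases, but Steve's routine avoids them and is only about 10% slower than the already quite fast Pos + Uppercase (your benchmark results don't tally with mine on that score).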

score 0 · Accepted Answer

Instead of 'AnsiUpperCase' you can use a table, which is much faster. I have reshaped my old code. It is very simple and also very fast. Check it out:

type
  TAnsiUpCaseTable = array [AnsiChar] of AnsiChar;

var
  AnsiTable: TAnsiUpCaseTable;

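procedure InitAnsiUpCaseTable(var Table: TAnsiUpCaseTable);
var
  n: cardinal;
begin
  for n := 0 to SizeOf(TAnsiUpCaseTable) -1 do
  begin
    // Fill through the Table parameter rather than the global variable
    Table[AnsiChar(n)] := AnsiChar(n);
    // CharUpperBuffA is the explicit ANSI variant; in Unicode Delphi the
    // plain CharUpperBuff maps to the wide version
    CharUpperBuffA(@Table[AnsiChar(n)], 1);
  end;
end;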

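// AnsiString parameters: the table is indexed by AnsiChar, so a Unicode string (D2009+) cannot be used here
function UpCasePosEx(const SubStr, S: AnsiString; Offset: Integer = 1): Integer;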
var
  n              :integer;
  SubStrLength   :integer;
  SLength        :integer;
label
  Fail;
begin
  SLength := length(s);
  if (SLength > 0) and (Offset > 0) then begin
    SubStrLength := length(SubStr);
    result := Offset;
    while SubStrLength <= SLength - result + 1 do begin
      for n := 1 to SubStrLength do
        if AnsiTable[SubStr[n]] <> AnsiTable[s[result + n -1]] then
          goto Fail;
      exit;
Fail:
      inc(result);
    end;
  end;
  result := 0;
end;

initialization
  InitAnsiUpCaseTable(AnsiTable);
end.

score 0 · Accepted Answer

I think converting to upper or lower case before Pos is the best way, but you should call the AnsiUpperCase/AnsiLowerCase functions as little as possible.

score 0 · Accepted Answer

Often the simple solution is the one you want to use:

if AnsiPos(AnsiUpperCase('needle'), AnsiUpperCase('The Needle in the haystack')) <> 0 then
    DoSomething;

score 0 · Accepted Answer

Any program on Windows can call a shell API function, which keeps your code size down. As usual, read the program from the bottom up. This has been tested with ASCII strings only, not with wide strings.

program PrgDmoPosIns; {$AppType Console} // demo case-insensitive Pos function for Windows

// Free Pascal 3.2.2 [2022/01/02], Win32 for i386
// FPC.EXE -vq -CoOr -Twin32 -oPrgStrPosDmo.EXE PrgStrPosDmo.LPR
//         -vq Verbose: Show message numbers
//             -C Code generation:
//               o Check overflow of integer operations
//                O Check for possible overflow of integer operations - Integer Overflow checking turns on Warning 4048
//                 r Range checking
//                   -Twin32 Target 32 bit Windows operating systems
// 29600 bytes code, 1316 bytes data, 35,840 bytes file

function StrStrIA( pszHaystack, pszNeedle : PChar ) : PChar; stdcall; external 'shlwapi.dll'; // dynamic link to Windows API's case-INsensitive search
// https://docs.microsoft.com/en-us/windows/win32/api/shlwapi/nf-shlwapi-strstria
// "FPC\3.2.2\Source\Packages\winunits-base\src\shlwapi.pp" line 557

function StrPos(        strNeedle, strHaystk : string ) : SizeInt; // return the position of Needle within Haystack, or zero if not found
var
  intRtn       : SizeInt; // function result
  ptrHayStk             , // pointers to
  ptrNeedle             , //   search strings
  strMchFnd    : PChar  ; // pointer to match-found string, or null-pointer/empty-string when not found
  bolFnd       : boolean; // whether Needle was found within Haystack
  intLenHaystk          , // length of haystack (including the appended #0)
  intLenMchFnd : SizeInt; // length of the haystack tail that starts at the match
begin
  strHayStk :=       strHayStk + #0            ; // strings passed to API must be
  strNeedle :=       strNeedle + #0            ; //   null-terminated

  ptrHayStk := Addr( strHayStk[ 1 ] )          ; // set pointers to point at first characters of
  ptrNeedle := Addr( strNeedle[ 1 ] )          ; //   null-terminated strings, so API gets C-style strings

  strMchFnd := StrStrIA( ptrHayStk, ptrNeedle ); // call Windows to perform search; match-found-string now points inside the Haystack
  bolFnd    := ( strMchFnd <> '' )             ; // variable is True when           match-found-string is not null/empty

  if bolFnd then begin                         ; // when Needle was yes found in Haystack
    intLenMchFnd := Length( strMchFnd )        ; // get length of the haystack tail starting at the match (up to the #0)
    intLenHaystk := Length( strHayStk )        ; // get length of haystack, which includes the appended #0
    intRtn       := intLenHaystk - intLenMchFnd; // set  function result to the 1-based position of needle; the extra #0 accounts for the +1
  end       else                                 // when Needle was not found in Haystack
    intRtn       := 0                          ; // set  function result to tell caller needle does not appear within haystack
  StrPos := intRtn                             ; // pass function result back to caller
end; // StrPos

procedure TstOne( const strNeedle, strHayStk : string ); // run one test with this Needle
var
  intPos : SizeInt; // found-match location of Needle within Haystack, or zero if none
begin
  write  ( 'Searching for : [', strNeedle, ']' ); // bgn output row for this test
  intPos := StrPos(  strNeedle, strHaystk      ); // get Needle position
  writeln(' StrPos is '       , intPos         ); // end output row for this test
end; // TstOne

procedure TstAll(                                     ); // run all tests with various Needles
const
  strHayStk = 'Needle in a Haystack'; // all tests will search in this string
begin
  writeln( 'Searching in  : [', strHayStk, ']' ); // emit header row
  TstOne ( 'Noodle'           , strHayStk      ); // test not-found
  TstOne ( 'Needle'           , strHayStk      ); // test found at yes-first character
  TstOne ( 'Haystack'         , strHayStk      ); // test found at not-first character
end; // TstAll

begin // ***** MAIN *****
  TstAll( ); // run all tests
end.
