php - Finding matching lines in a text file with PHP

Question

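I have a function that reads a text file and cross-matches it against a directory listing, using the directory index of files to look up each file's description in the text file. I use the levenshtein function to add some fuzzy logic, so the names do not have to be 100% identical, but I have run into a problem. As it is currently set up, uncommenting the commented-out lines makes me hit a memory wall. With those lines active, it searches the entire txt file and compares every line against the directory file name. With over 700 files each being checked 700 times, it runs out of memory quickly. I need a way to break out of the while (!feof($file_handle)) loop once a match is found, and then a way to start the next pass at the line position where it stopped, so that it does not loop from 0 to 700 every time.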

function GenerateList($titleB, $descB, $thumbB, $dirB, $patternB){
$outputB = "<CATEGORY name=\"$titleB\" desc=\"$descB\" thumb=\"$thumbB\">";
$open_error = 0;

if (is_dir($dirB)){
$myDirectory = opendir($dirB);
// get each entry
while($entryName = readdir($myDirectory)) {
    $dirArray[] = $entryName;
}

// close directory
closedir($myDirectory);

//  count elements in array
$indexCount = count($dirArray);

// sort em
sort($dirArray);
// loop through the array of files and print them all
if (!($text = file_get_contents("Scripts/descriptions.txt"))){$open_error = 1;}
$results = array();
for($index=0; $index < $indexCount; $index++) {
    $ext = explode(".", $dirArray[$index]);
    $parsed_title = preg_replace ($patternB, "", $ext[0]);
    if ((substr("$dirArray[$index]", 0, 1) != ".")&&($ext[1] == "flv")){ // don't list hidden files

//if ($open_error == 0){
//  $file_handle = fopen("Scripts/descriptions.txt", "rb");

//while (!feof($file_handle) ) {
//$line_of_text = fgets($file_handle);
//$parts = explode('|', $line_of_text);
/*
echo "<PRE>";
echo strtolower($parts[0]);
echo "</br>";
echo strtolower($parsed_title);
echo "</br>";
echo "</PRE>";
*/
//if ((wordMatch(strtolower($parts[0]), strtolower($parsed_title), 2)) > 0){
        $outputB .= "<ITEM>";
        $outputB .= "<file_path>/Sources/Power Rangers/$dirB".$dirArray[$index]."</file_path>";
        $outputB .= "<file_width>500</file_width>";
        $outputB .= "<file_height>375</file_height>";
        $outputB .= "<file_title>".$parsed_title."</file_title>";
//      $outputB .= "<file_desc>".$parts[1]."</file_desc>";
        $outputB .= "<file_desc>test</file_desc>";
//      $outputB .= "<file_image>".$match_result[2]."</file_image>";
        $outputB .= "<file_image>$thumbB</file_image>";
//      $outputB .= "<featured_image>".$match_result[3]."</featured_image>";
        $outputB .= "<featured_image>$thumbB</featured_image>";
//      $outputB .= "<featured_or_not>".$parts[4]."</featured_or_not>";
        $outputB .= "<featured_or_not>true</featured_or_not>";
        $outputB .= "</ITEM>";
//};//if ((wordMatch($parts[0], strtolower($word), 2) > 0)
//};//while
//fclose($file_handle);

//};//if ($open_error == 0)
    };//if ((substr("$dirArray[$index]", 0, 1) != ".")&&($ext[1] == "flv"))
};//for($index=0; $index < $indexCount; $index++) 
};//if (file_exists($dirB))
$outputB .= "</CATEGORY>";
return $outputB;
};//function

    function wordMatch($words, $input, $sensitivity){ 
        $shortest = -1; 
        foreach ($words as $word) { 
            $lev = levenshtein($input, $word); 
            if ($lev == 0) { 
                $closest = $word; 
                $shortest = 0; 
                break; 
            } //if
            if ($lev <= $shortest || $shortest < 0) { 
                $closest  = $word; 
                $shortest = $lev; 
            } //if
        } //foreach
        if($shortest <= $sensitivity){ 
            return $closest; 
        } else { 
            return 0; 
        } //if/else
    } // function, http://php.net/manual/en/function.levenshtein.php

score 1 · Accepted Answer

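Instead of a regular expression, you can compute the edit distance between the two items. An 80% heuristic is the same as saying that (length - edit_distance) / length >= .8, where length is the length of the string you are trying to match.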

So if the string is 20 characters long and its edit distance from the target is 2, then (20 - 2) / 20 == .9, and the item counts as a 90% match to the target. That is greater than 0.8, so we accept it as a match.

"Edit distance" is also known as Levenshtein distance, so all you need to do is:

$len = (float) strlen($target);  // Length of the target; assumed non-empty, otherwise the division below fails.
$match = ($len-levenshtein($input, $target))/$len;

if ($match >= 0.8) {
  // The $input matches our $target
}
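
To connect this back to the question, here is a minimal sketch of how the ratio check might be combined with reading the descriptions only once and stopping at the first acceptable match. The helper names (matchRatio, parseDescriptions, findDescription), the "title|description" line layout, and the sample titles are assumptions made for illustration; they are not taken from the asker's actual file.

```php
<?php
// Hypothetical helper: how closely $input matches $target, as a fraction of
// the target's length. Clamped to 0.0 and guarded against an empty target.
function matchRatio(string $input, string $target): float
{
    $len = strlen($target);
    if ($len === 0) {
        return $input === '' ? 1.0 : 0.0;
    }
    $ratio = ($len - levenshtein($input, $target)) / $len;
    return max(0.0, $ratio);
}

// Hypothetical helper: turn "title|description" lines into a
// lowercase-title => description map, so the file is parsed only once.
function parseDescriptions(array $lines): array
{
    $descriptions = [];
    foreach ($lines as $line) {
        $parts = explode('|', trim($line));
        if (count($parts) >= 2 && $parts[0] !== '') {
            $descriptions[strtolower($parts[0])] = $parts[1];
        }
    }
    return $descriptions;
}

// Returns the description of the first title that scores at or above
// $threshold, or null when nothing is close enough.
function findDescription(string $title, array $descriptions, float $threshold = 0.8): ?string
{
    $needle = strtolower($title);
    foreach ($descriptions as $name => $desc) {
        if (matchRatio($needle, (string) $name) >= $threshold) {
            return $desc; // stop scanning as soon as a match is accepted
        }
    }
    return null;
}

// Made-up sample data standing in for the contents of the descriptions file.
$lines = [
    "Day of the Dumpster|First sample description.",
    "High Five|Second sample description.",
];
$descriptions = parseDescriptions($lines);

var_dump(findDescription('Day of the Dumpstr', $descriptions)); // one edit away from a known title
var_dump(findDescription('Unrelated Title', $descriptions));    // no title is within 80%
```

In a real script the $lines array could come from file("Scripts/descriptions.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES), called once before the loop over the directory entries. The early return in findDescription plays the role of the break the question asks for, and because the descriptions are parsed up front, the file is not reopened for every directory entry.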
