php - How to check the partial similarity of two strings in PHP

Question

Is there a PHP function that checks how similar two strings are?

For example, I have:

$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

function($string1, $string2) should return true, because both lines contain the words "how", "are" and "you".

Or, even better, it should return 60% similarity, because "how", "are" and "you" are 3 of the 5 words in $string1.

Is there a function in PHP that does this?

score 36 · Accepted Answer

That's a good question, so I put some effort into it:

<?php
$string1="Hello how are you doing";
$string2= " hi, how are you";

echo 'Compare result: ' . compareStrings($string1, $string2) . '%';
//60%


function compareStrings($s1, $s2) {
    //one is empty, so no result
    if (strlen($s1)==0 || strlen($s2)==0) {
        return 0;
    }

    //replace non-alphanumeric characters with spaces
    //'-' is kept in case it is used to join words
    $s1clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s2);

    //collapse repeated spaces and trim both ends,
    //so that explode() below never produces empty "words"
    $s1clean = trim(preg_replace("/ +/", ' ', $s1clean));
    $s2clean = trim(preg_replace("/ +/", ' ', $s2clean));

    //create arrays
    $ar1 = explode(" ",$s1clean);
    $ar2 = explode(" ",$s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);

    //flip the arrays if needed so ar1 is always largest.
    if ($l2>$l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }

    //flip array 2, to make the words the keys
    $ar2 = array_flip($ar2);


    $maxwords = max($l1, $l2);
    $matches = 0;

    //find matching words
    foreach($ar1 as $word) {
        if (array_key_exists($word, $ar2))
            $matches++;
    }

    return ($matches / $maxwords) * 100;    
}
?>
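A more compact variant of the same idea, as a sketch: it lowercases both strings (so "Hello" and "hello" count as the same word), splits on anything that is not a letter, digit or '-', and counts each shared word once with array_intersect(). The name wordOverlapPercent() is hypothetical, and dividing by the longer word list simply follows compareStrings() above; because duplicates are removed first, the result can differ from compareStrings() when a word repeats.

```php
<?php
// Hypothetical helper: percentage of words shared by two strings.
// Assumptions: case-insensitive, words = runs of [a-z0-9-], each word counted once.
function wordOverlapPercent(string $s1, string $s2): float
{
    $w1 = array_unique(preg_split('/[^a-z0-9-]+/', strtolower($s1), -1, PREG_SPLIT_NO_EMPTY));
    $w2 = array_unique(preg_split('/[^a-z0-9-]+/', strtolower($s2), -1, PREG_SPLIT_NO_EMPTY));

    // divide by the longer word list, as compareStrings() does
    $max = max(count($w1), count($w2));
    if ($max === 0) {
        return 0.0;
    }

    // array_intersect() keeps the entries of $w1 that also appear in $w2
    $common = count(array_intersect($w1, $w2));
    return $common / $max * 100;
}

echo wordOverlapPercent("Hello how are you doing", " hi, how are you"), "%\n"; // 60%
```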

score 9 · Answer

As the other answers have already mentioned, you can use similar_text. Here is a demonstration:

$string1="Hello how are you doing" ;
$string2= " hi, how are you";

echo similar_text($string1, $string2, $perc); //12

echo $perc; //61.538461538462

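It returns 12 (the number of matching characters) and stores the similarity percentage you asked for in $perc. Note that this percentage is computed over characters, not words.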
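Where 61.54 comes from: the PHP manual describes the percentage as the result of similar_text() divided by the average length of the two strings, times 100, i.e. 2 × 12 / (23 + 16) × 100. A small check of that arithmetic, recomputing it by hand next to the value similar_text() writes into $perc:

```php
<?php
$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

// similar_text() returns the number of matching characters and writes
// the percentage into its third argument
$matches = similar_text($string1, $string2, $perc);

// the same percentage by hand: matches / average length * 100
$manual = $matches * 2 / (strlen($string1) + strlen($string2)) * 100;

printf("%d matching chars, %.2f%% vs %.2f%% by hand\n", $matches, $perc, $manual);
// prints: 12 matching chars, 61.54% vs 61.54% by hand
```

Because it counts characters, the 61.54% is only close to the 60% from the question by coincidence.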

score 9 · Answer

In addition to Alex Siri's answer, according to the following article:

http://docstore.mik.ua/orelly/webprog/php/ch04_06.htm

PHP provides several functions that let you test whether two strings are approximately equal:

$string1="Hello how are you doing" ;
$string2= " hi, how are you";

Soundex

if (soundex($string1) == soundex($string2)) {
  echo "similar";
} else {
  echo "not similar";
}

Metaphone

if (metaphone($string1) == metaphone($string2)) {
  echo "similar";
} else {
  echo "not similar";
}
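soundex() and metaphone() encode how a single English word sounds. On a whole sentence, soundex() keeps only a four-character code built from the first letter and the next few consonants, and metaphone() produces one key for the entire string, so comparing sentence keys is close to all-or-nothing. A sketch that applies metaphone() word by word instead; phoneticWordOverlap() is a hypothetical name, and scoring against the string with more distinct keys is an assumption borrowed from the word-counting answer above:

```php
<?php
// Hypothetical helper: share of phonetic word keys two strings have in common.
// Assumptions: words are runs of ASCII letters; each distinct metaphone()
// key counts once; the score is relative to the string with more distinct keys.
function phoneticWordOverlap(string $s1, string $s2): float
{
    $keys = function (string $s): array {
        $words = preg_split('/[^A-Za-z]+/', $s, -1, PREG_SPLIT_NO_EMPTY);
        // metaphone() is meant for single words, so apply it word by word
        $codes = array_map(fn(string $w): string => metaphone($w), $words);
        return array_unique(array_filter($codes, fn(string $c): bool => $c !== ''));
    };

    $k1 = $keys($s1);
    $k2 = $keys($s2);
    $max = max(count($k1), count($k2));
    if ($max === 0) {
        return 0.0;
    }
    return count(array_intersect($k1, $k2)) / $max * 100;
}

$string1 = "Hello how are you doing";
$string2 = " hi, how are you";
printf("%.0f%%\n", phoneticWordOverlap($string1, $string2));
```

Different-sounding-alike words can share a key (for example "hi" and "how" may both encode to the same consonant), which is why the keys are deduplicated before counting.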

similar_text

$similarity = similar_text($string1, $string2); // number of matching characters

Levenshtein

$distance = levenshtein($string1, $string2); // minimum number of single-character insertions, replacements and deletions
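levenshtein() returns a distance (0 means identical), not a percentage. One common way to turn it into a 0-100 score, which is an assumption here rather than something PHP provides, is to divide the distance by the length of the longer string. levenshteinPercent() is a hypothetical name; like levenshtein() itself, it is case-sensitive and works on bytes, so a multibyte UTF-8 character counts as several edits.

```php
<?php
// Hypothetical helper: edit distance normalised to a 0-100 similarity score.
// Assumption: normalise by the length of the longer string.
function levenshteinPercent(string $a, string $b): float
{
    $max = max(strlen($a), strlen($b));
    if ($max === 0) {
        return 100.0; // two empty strings are identical
    }
    return (1 - levenshtein($a, $b) / $max) * 100;
}

$string1 = "Hello how are you doing";
$string2 = " hi, how are you";
printf("%.2f%%\n", levenshteinPercent($string1, $string2)); // 52.17% (distance 11, longer length 23)
```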

score 0 · Answer

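Here is my function; I find it quite interesting.

It checks whether two strings are approximately similar.

These are the criteria I use:

Word order matters.
Two words count as similar when their similarity is at least 85%.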

Example:

$string1 = "How much will it cost to me";   // string in the vocabulary
$string2 = "How much does costs it ";       // user input ("costs" instead of "cost" is a deliberate mistake)

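Algorithm:
1) Check word similarity and build a clean string from the "right" words, in the order they appear in the vocabulary string. OUTPUT: "how how much it cost"
2) Build a clean string from the "right" words, in the order they appear in the user input. OUTPUT: "how how much it it"
3) Compare the two outputs: if they differ, return no; if they are the same, return yes.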

error_reporting(E_ALL);
ini_set('display_errors', true);

$string1="сколько это стоит ваще" ;
$string2= "сколько будет стоить это будет мне";

if (compareStrings($string1, $string2)) {
    echo "yes";
} else {
    echo 'no';
}
//echo compareStrings($string1, $string2);

function compareStrings($s1, $s2) {

    if (strlen($s1)==0 || strlen($s2)==0) {
        return 0;
    }

    while (strpos($s1, "  ")!==false) {
        $s1 = str_replace("  ", " ", $s1);
    }
    while (strpos($s2, "  ")!==false) {
        $s2 = str_replace("  ", " ", $s2);
    }

    $ar1 = explode(" ",$s1);
    $ar2 = explode(" ",$s2);
  //  $array1 = array_flip($ar1);
  //  $array2 = array_flip($ar2);
    $l1 = count($ar1);
    $l2 = count($ar2);

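    $meaning = "";
    $rightorder = "";
    $compare = 0;
    for ($i = 0; $i < $l1; $i++) {
        for ($j = 0; $j < $l2; $j++) {
            $compare = similar_text($ar1[$i], $ar2[$j], $percent);
            //  echo $compare;
            if ($percent >= 85) {
                $meaning = $meaning . " " . $ar1[$i];
                // note: $j is an index into $ar2 but is used on $ar1 here, so when the
                // input has more words than the vocabulary string this can read an
                // undefined key (a notice or warning, depending on the PHP version)
                $rightorder = $rightorder . " " . $ar1[$j];
                $compare = 0;
            }
        }
    }
    //print_r($rightorder);
    return $rightorder == $meaning;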

}

I'd like to hear your opinions and suggestions on how to improve it.
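One possible direction for the order check in steps 1-3: instead of building two strings, remember the position in the user input where each vocabulary word matched, and require those positions to increase. This is a sketch under stated assumptions, not the author's code: fuzzyOrderedMatch() is a hypothetical name; it keeps the 85% similar_text() threshold; each input word can be matched only once; an input with no matching words counts as "no"; and strtolower() is not UTF-8 aware, so for Cyrillic input like the example above case is not normalised.

```php
<?php
// Hypothetical helper: do the vocabulary words found in the input (fuzzy,
// via similar_text()) appear there in the same order as in the vocabulary?
function fuzzyOrderedMatch(string $vocab, string $input, float $threshold = 85.0): bool
{
    $v = preg_split('/\s+/', strtolower(trim($vocab)), -1, PREG_SPLIT_NO_EMPTY);
    $u = preg_split('/\s+/', strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY);

    $positions = []; // input position of each matched vocabulary word, in vocabulary order
    $used = [];      // input positions that are already taken
    foreach ($v as $word) {
        foreach ($u as $j => $candidate) {
            if (isset($used[$j])) {
                continue;
            }
            similar_text($word, $candidate, $percent);
            if ($percent >= $threshold) {
                $positions[] = $j;
                $used[$j] = true;
                break; // the first sufficiently similar input word wins
            }
        }
    }

    if ($positions === []) {
        return false;
    }
    // same order <=> the matched input positions are strictly increasing
    for ($k = 1; $k < count($positions); $k++) {
        if ($positions[$k] <= $positions[$k - 1]) {
            return false;
        }
    }
    return true;
}

echo fuzzyOrderedMatch("How much will it cost to me", "How much does costs it") ? "yes" : "no", "\n"; // no
echo fuzzyOrderedMatch("How much will it cost to me", "How much does it costs") ? "yes" : "no", "\n"; // yes
```

In the first call "it" matches input position 4 and "cost" matches "costs" at position 3, so the order check fails; in the second the positions are 0, 1, 3, 4.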
