php - Get a word index from text using PHP

Question

This question is a continuation of my previous question:

Check a tag and get the value inside the tag using PHP

I have a text like this:

<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.

Using the code from the answer to my previous question, I added PREG_OFFSET_CAPTURE like this:

function get_text_between_tags($string, $tagname) {
    $pattern = "/<$tagname\b[^>]*>(.*?)<\/$tagname>/is";
    preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
    if(!empty($matches[1]))
        return $matches[1];
    return array();
}

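I get this output:

Array (
[0] => Array ( [0] => Head of Pekalongan Regency [1] => 14 )
[1] => Array ( [0] => Rector of IPB [1] => 131 )
[2] => Array ( [0] => officials of IPB [1] => 222 ) )

14, 131 and 222 are the character offsets at which the pattern matched. Can I get the word index instead? I mean output like this:

Array (
[0] => Array ( [0] => Head of Pekalongan Regency [1] => 0 )
[1] => Array ( [0] => Rector of IPB [1] => 15 )
[2] => Array ( [0] => officials of IPB [1] => 27 ) )

Does this need more code on top of PREG_OFFSET_CAPTURE, or is there another way? I have no idea. Thanks for the help. :)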

score 1 · Accepted Answer

This works, but it needs a little polishing.

<?php

$raw = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';

$result = getExploded($raw,'<ORGANIZATION>','</ORGANIZATION>');

echo '<pre>';
print_r($result);
echo '</pre>';

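function getExploded($data, $tagStart, $tagEnd) {
    // Each piece after the first begins with the text of one tagged phrase.
    $tmpData = explode($tagStart, $data);
    $result = array();
    $wordCount = 0; // words seen before the current piece
    foreach ($tmpData as $k => $v) {
        // Text up to the closing tag is the phrase itself.
        $tmp = explode($tagEnd, $v);
        $result[$k][0] = $tmp[0];
        $result[$k][1] = $wordCount;
        // Assumes the piece ends with a space before the next tag,
        // so explode(' ') yields one trailing empty item.
        $wordCount = $wordCount + (count(explode(' ', $v)) - 1);
    }
    return $result;
}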

?>

The result is:

Array
(
    [0] => Array
        (
            [0] => 
            [1] => 0
        )

    [1] => Array
        (
            [0] => Head of Pekalongan Regency
            [1] => 0
        )

    [2] => Array
        (
            [0] => Rector of IPB
            [1] => 16
        )

    [3] => Array
        (
            [0] => officials of IPB
            [1] => 28
        )

)
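
One possible way to do the polishing, sketched here as an assumption rather than as part of the original answer: keep the preg_match_all call with PREG_OFFSET_CAPTURE from the question and turn each character offset into a word index by counting the whitespace-separated tokens that come before the match. The function name get_word_indexes_between_tags and the variable $text are hypothetical. This avoids the empty first element, but it still assumes every tag is separated from the preceding word by whitespace.

```php
<?php

// Hypothetical helper (not from the original answer): returns each tagged
// phrase together with the zero-based index of its first word.
function get_word_indexes_between_tags($string, $tagname) {
    $tag = preg_quote($tagname, '/');
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/is";
    preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);

    $result = array();
    foreach ($matches[0] as $i => $fullMatch) {
        // Everything before the opening tag of this match (byte offset).
        $prefix = substr($string, 0, $fullMatch[1]);
        // Drop earlier opening/closing tags so they are not counted as words.
        $prefix = preg_replace("/<\/?$tag\b[^>]*>/i", '', $prefix);
        // Count whitespace-separated tokens; empty pieces are ignored.
        $words = preg_split('/\s+/', $prefix, -1, PREG_SPLIT_NO_EMPTY);
        $result[] = array($matches[1][$i][0], count($words));
    }
    return $result;
}

// Sample text from the question.
$text = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';

print_r(get_word_indexes_between_tags($text, 'ORGANIZATION'));
```

For the sample text this gives word indexes 0, 16 and 28, the same as the explode version above. A standalone "," counts as a word because the split is on whitespace only; filter such tokens out of $words if punctuation should not be counted.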
