php - Symfony2 DomCrawler and an FB2 book format parser

Question

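Hi all!

How can I parse a well-formed XML file with the Symfony2 DomCrawler component?

I need to split out every section and, for each one, collect the inner tags (epigraph, p, poem, etc.) that belong only to that section.

Below is the standard FB2 book XML format I am working with.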

<?xml version="1.0" encoding="utf-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0" xmlns:l="http://www.w3.org/1999/xlink">
<description></description>
<body>
<section>
    <title><p><strong>Level 1, section 1</strong></p></title>
    <section>
        <title><p><strong>Level 2, section 2</strong></p></title>
        <section>
            <title><p><strong>Level 3, section 3</strong></p></title>
            <p>Level 3, section 3, paragraph 1</p>
            <poem>
                <stanza>
                    <v>bla-bla-bla 1</v>
                    <v>bla-bla-bla 2</v>
                    <v>bla-bla-bla 3</v>
                </stanza>
            </poem>
            <p>Level3, section 3, paragraph 2</p>
            <subtitle><strong>x x x</strong></subtitle>
        </section>
        <section>
            <title><p><strong>Level 3, section 4</strong></p></title>
            <p>Level 3, section 4, paragraph 1</p>
            <p>Level 3, section 4, paragraph 2</p>
            <subtitle><strong>x x x</strong></subtitle>
        </section>
        <section>
            <title><p><strong>Level 3, section 5</strong></p></title>
            <p>Level 3, section 5, paragraph 1</p>
            <p>Level 3, section 5, paragraph 2</p>
            <p>Level 3, section 5, paragraph 3</p>
            <empty-line/>
            <subtitle>This file was created</subtitle>
            <subtitle>with BookDesigner program</subtitle>
            <subtitle>bookdesigner@the-ebook.org</subtitle>
            <subtitle>22.04.2004</subtitle>
        </section>
    </section>
</section>
</body>
</FictionBook>

The code below doesn't work. Can anyone help me fix it? The titles are parsed correctly, but the tags inside each section are not.

private function loadBookSections(Crawler $crawler)
{
    $sections = $crawler->filter('section')->each(function(Crawler $node) {
        $c = $node->filter('section')->reduce(function(Crawler $node, $i) {
            return ($i == 0);
        });

        return array(
            'title' => $node->filter('title')->text(),
            'inner' => $c->html(),
        );
    });

    echo "*******************************************\n";

    foreach($sections as $section ) {
        echo ">>> ".$section['title']."\n";
        echo "!!! ".$section['inner']."\n";
    }
}

Thanks for your help!

score -1 · Accepted Answer

If you strip your XML file down considerably, it looks like this:

<section>
    <section>
        <!-- ... -->
    </section>
    <section>
        <!-- ... -->
    </section>
    <section>
        <!-- ... -->
    </section>
</section>

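You want to capture the child section elements, not the parent section.

Right now you are iterating over the list of parent section elements only, which means you only get the HTML of the parent section.

To iterate over the child sections instead, you need to select "section section".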
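What each selector matches can be checked outside DomCrawler. The sketch below uses PHP's built-in DOMDocument and DOMXPath instead of DomCrawler, with the CSS selectors translated by hand into XPath. The prefix fb is an arbitrary name bound to the FB2 namespace (XPath 1.0 needs a prefix because the file uses a default namespace), and the XML is a shortened copy of the question's file. Note that "section section" is a CSS descendant combinator, so on the full file it also matches the level-3 sections, not only direct children ("section > section" would be the child combinator).

```php
<?php
// Sketch: compare what the CSS selectors "section" and "section section"
// match, using the built-in DOM extension instead of DomCrawler.

$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
<body>
<section>
    <title><p>Level 1, section 1</p></title>
    <section>
        <title><p>Level 2, section 2</p></title>
        <section><title><p>Level 3, section 3</p></title></section>
        <section><title><p>Level 3, section 4</p></title></section>
        <section><title><p>Level 3, section 5</p></title></section>
    </section>
</section>
</body>
</FictionBook>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// XPath 1.0 has no default namespace, so bind a prefix ("fb" is arbitrary).
$xpath->registerNamespace('fb', 'http://www.gribuser.ru/xml/fictionbook/2.0');

// CSS "section": every section element at any depth.
$all = $xpath->query('//fb:section');
// CSS "section section": sections that have a section ancestor (any depth).
$nested = $xpath->query('//fb:section//fb:section');

// Return the text of each section's own <title> child.
function titles(DOMXPath $xpath, DOMNodeList $sections): array
{
    $out = [];
    foreach ($sections as $section) {
        $out[] = trim($xpath->evaluate('string(fb:title)', $section));
    }
    return $out;
}

echo $all->length, ' ', $nested->length, "\n";
echo implode("\n", titles($xpath, $nested)), "\n";
```

On this input the first selector matches all five sections, while the second skips only the outermost one.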

A side note to improve your code further: instead of the ugly reduce call, use ->first() to get the first element of the node list.

Putting it all together, the code looks like this:

$sections = $crawler->filter('section section')->each(function(Crawler $node) {
    $c = $node->filter('section')->first();

    return array(
        'title' => $node->filter('title')->text(),
        'inner' => $c->html(),
    );
});
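The question also asks for the tags that belong only to each section (p, poem, subtitle, and so on), without the content of nested sections. One way to express that is to take each section's direct child elements and skip the title and any nested section. The sketch below does this with the built-in DOMXPath rather than DomCrawler; the helper name collectOwnContent and the array shape it returns are hypothetical, chosen only for illustration, and the XML is a shortened copy of the question's file.

```php
<?php
// Sketch: for every <section>, collect its own title and the names of its
// direct child elements, leaving out <title> and nested <section> elements.
// collectOwnContent() is a hypothetical helper built on the DOM extension.

const FB_NS = 'http://www.gribuser.ru/xml/fictionbook/2.0';

function collectOwnContent(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('fb', FB_NS);

    $result = [];
    foreach ($xpath->query('//fb:section') as $section) {
        // The section's own heading is its direct <title> child.
        $title = trim($xpath->evaluate('string(fb:title)', $section));

        // Direct children that are neither the title nor a nested section.
        $tags = [];
        $own = $xpath->query('fb:*[not(self::fb:title or self::fb:section)]', $section);
        foreach ($own as $child) {
            $tags[] = $child->localName;
        }

        $result[] = ['title' => $title, 'tags' => $tags];
    }
    return $result;
}

$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
<body>
<section>
    <title><p><strong>Level 1, section 1</strong></p></title>
    <section>
        <title><p><strong>Level 2, section 2</strong></p></title>
        <section>
            <title><p><strong>Level 3, section 3</strong></p></title>
            <p>Level 3, section 3, paragraph 1</p>
            <poem><stanza><v>bla-bla-bla 1</v></stanza></poem>
            <p>Level3, section 3, paragraph 2</p>
            <subtitle><strong>x x x</strong></subtitle>
        </section>
        <section>
            <title><p><strong>Level 3, section 4</strong></p></title>
            <p>Level 3, section 4, paragraph 1</p>
            <empty-line/>
        </section>
    </section>
</section>
</body>
</FictionBook>
XML;

foreach (collectOwnContent($xml) as $s) {
    echo $s['title'], ': ', implode(', ', $s['tags']), "\n";
}
```

Because only direct children are selected, the outer sections report no tags of their own, and each level-3 section lists just its own p, poem, subtitle and empty-line elements. If you need the markup rather than the tag names, $doc->saveXML($child) on each selected child is one option.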
