php - Reading an HTML file fetched with curl, using xpath or ->nextSibling

Question

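Can anyone show me how to use the DOMDocument functions to extract data such as 15, Teaching Periods 1 and 2., Max 150. and so on from the sample HTML below? I have tried to work around it, but because there are several tags before the text I want, it is hard to pull out all the content in one pass, and I need to save all of the extracted data to a MySQL database.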

<P><SPAN STYLE="font-size: 16pt; font-weight: bold"><a name="CS1050"class="modtitle">CS1050 Fundamentals of Internet Computing</a></SPAN></p>
<P><B>Credit Weighting: </B>15<BR><BR>
<B>Teaching Period(s): </B>Teaching Periods 1 and 2.<BR><BR>
<B>No. of Students: </B>Max 150.<BR><BR>
<B>Pre-requisite(s): </B>None<BR><BR>
<B>Co-requisite(s): </B>None<BR><BR>
<B>Teaching Methods: </B>72 x 1hr(s) Lectures; 18 x 2hr(s) Practicals.<BR><BR>
<B>Module Co-ordinator: </B>Professor Gregory Provan, Department of Computer Science.     <BR><BR>
<B>Lecturer(s): </B> Mr Gavin Russell, Department of Computer Science.<BR><BR>
<B>Module Objective: </B>To introduce students to Internet computer systems, web design, and<BR>client-side programming.<BR><BR>
<B>Module Content: </B>This module provides an introduction to the key concepts of Internet computing. Starting with the fundamentals of computer systems and the Internet, students progress to learn how to design web sites and how to utilize simple client-side programming. Issues related to user interface design and human-computer interfacing (HCI) are covered. Broader issues related to the use of the Internet for Blogging and Social Networks are discussed. The practical element of the module allows students to develop skills necessary for web site design using simple client side programming.<BR><BR>
<B>Learning Outcomes: </B>On successful completion of this module, students should be able to:<BR>&middot; Understand the fundamental principles of computer systems and the Internet;<BR>&middot; Design web sites;<BR>&middot; Use simple client-side programming;<BR>&middot; Understand the principles of user interface design and human-computer interfaces.<BR><BR>
<B>Assessment: </B>Total Marks 300: End of Year Written Examination 240 marks; Continuous Assessment 60 marks (Departmental Tests; Assignments).<BR><BR>
<B>Compulsory Elements: </B>End of Year Written Examination; Continuous Assessment.<BR><BR>
<B>Penalties (for late submission of Course/Project Work etc.): </B>Work which is submitted late shall be assigned a mark of zero (or a Fail Judgement in the case of Pass/Fail modules).<BR><BR>
<B>Pass Standard and any Special Requirements for Passing Module: </B>40%.<BR><BR>
<B>End of Year Written Examination Profile: </B>1 x 3 hr(s) paper(s).<BR><BR>
<B>Requirements for Supplemental Examination: </B>1 x 3 hr(s) paper(s) to be taken in Autumn. The mark for Continuous Assessment is carried forward.</P>



MY SAMPLE CURL CODE

// $dom is assumed to be a DOMDocument already loaded with the HTML fetched via curl
$content3 = $dom->getElementsByTagName('p');
$content4 = $dom->getElementsByTagName('b');
$contentnew = array();

//===========================================
//=====  EXTRACT P STUFF ====================
//===========================================
foreach ($content3 as $value) {
    $contentnew[] = $value;
    print_r($value);

    echo "Attribute Value = ";
    echo $value->getAttribute('value');
    echo "<br />";

    // let's get hold of the text value from the node
    $mytempvariable = $value->nodeValue;
    print "CONTENT OF P NODE: \n\n$mytempvariable <br /> <br />\n\n\n";
}
echo "<br /> <br /> <br />";

//===========================================
//===== EXTRACT B STUFF =====================
//===========================================
foreach ($content4 as $value) {
    $contentnew[] = $value;

    echo "Attribute Value = ";
    echo $value->getAttribute('value');
    echo "<br />";

    print_r($value);
    // let's get hold of the text value from the node
    $mytempvariable = $value->nodeValue;
    print "CONTENT OF B NODE: \n\n$mytempvariable <br /> <br />\n\n\n";
}
echo "<br /> <br /> <br />";

I have heard that I can use ->nextSibling or xpath to extract the data that follows each b node, but I can't seem to get all the relevant data I need with xpath.

score 0 · Accepted Answer

You were very close:

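$result = array();
foreach ($dom->getElementsByTagName('b') as $node) {
    // Use the label text, minus its trailing colon, as the array key
    $key = preg_replace('/:\s*$/', '', $node->textContent);
    // The value is the text node directly after the <b> element, if there is one
    $result[$key] = $node->nextSibling ? trim($node->nextSibling->textContent) : '';
}
var_dump($result);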
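
One thing to watch for: ->nextSibling returns only the first node after each b element, so a value that is split by a single BR (such as "Module Objective" or "Learning Outcomes" in your sample) will probably be cut off at the first line. Below is a rough sketch of one way around that, assuming the page is already in a string: it selects the b nodes with DOMXPath and then walks the siblings until the next b. The function name extractFields and the shortened $html sample are my own placeholders, not part of your code.

```php
<?php
// Hypothetical input: a shortened copy of the module markup from the question.
$html = <<<'HTML'
<P><B>Credit Weighting: </B>15<BR><BR>
<B>Teaching Period(s): </B>Teaching Periods 1 and 2.<BR><BR>
<B>Module Objective: </B>To introduce students to Internet computer systems, web design, and<BR>client-side programming.<BR><BR>
<B>Pass Standard: </B>40%.</P>
HTML;

/**
 * Hypothetical helper: maps each <b> label to all the text that follows it,
 * up to (but not including) the next <b> element.
 */
function extractFields(string $html): array
{
    $dom = new DOMDocument();
    // Scraped pages are rarely valid HTML, so keep parser warnings quiet.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $result = [];

    foreach ($xpath->query('//b') as $label) {
        // Label text without surrounding whitespace or the trailing colon.
        $key = preg_replace('/:\s*$/', '', trim($label->textContent));

        $parts = [];
        for ($n = $label->nextSibling; $n !== null; $n = $n->nextSibling) {
            if ($n->nodeType === XML_ELEMENT_NODE && strtolower($n->nodeName) === 'b') {
                break; // the next label starts here
            }
            if ($n->nodeType === XML_TEXT_NODE) {
                $parts[] = $n->textContent; // <br> elements are skipped
            }
        }

        // Join the pieces and collapse the whitespace left around the <br> breaks.
        $result[$key] = trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
    }

    return $result;
}

$fields = extractFields($html);
print_r($fields);
```

The resulting array is keyed by label, so it should be straightforward to bind the values to a prepared INSERT statement for your MySQL table. If you would rather stay in pure XPath, an expression like following-sibling::text()[1] evaluated relative to each b node should behave like ->nextSibling in this markup, with the same single-line limitation.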
