Hello, I'm trying to parse professor names and comments from the ratemyprofessor website and convert each div to plain text. Here is the div class structure I'm working with.
<div id="ratingTable">
<div class="ratingTableHeader"></div>
<div class="entry odd"><a name="18947089"></a>
<div class="date">
8/24/11 // the date which I want to parse
</div><div class="class"><p>
ENGL2323 // the class which I want to parse
</p></div><div class="rating"></div><div class="comment" style="width:350px;">
<!-- comment section -->
<p class="commentText"> // this is what I want to parse as plaintext for each entry
I have had Altimont for 4 classes. He is absolutely one of my favorite professors at St. Ed's. He's generous with his time, extremely knowledgeable, and such an all around great guy to know. Having class with him he would always have insightful comments on what we were reading, and he speaks with a lot of passion about literature. Just the best!
</p><div class="flagsIcons"></div></div>
<!-- closes comment -->
</div>
<!-- closes even or odd -->
<div class="entry even"></div> // these divs are the entries for each professor
<!-- closes even or odd -->
<div class="entry odd"></div>
<!-- closes even or odd -->
</div>
<!-- closes rating table -->
So every entry is encapsulated under this "ratingTable" div, and each entry is either an "entry odd" or an "entry even" div.
Here is my attempt so far, but it only produces a huge garbled array full of junk.
<?php
header('Content-type: text/html; charset=utf-8'); // this just makes sure encoding is right
include('simple_html_dom.php'); // the parser library
$html = file_get_html('http://www.ratemyprofessors.com/SelectTeacher.jsp?sid=834'); // the url for the teacher rating profile
// first attempt, rendered nothing though
foreach($html->find("div[class=commentText]") as $content){
    echo $content.'<hr />';
}
// second attempt
foreach($html->find("div[class=commentText]") as $content){
    // $content = <div class="commentText">; first_child() should be the <p>
    echo $content->first_child().'<hr />';
    // Get the <p>'s following the <div class="commentText">
    $next = $content->next_sibling();
    while ($next && $next->tag == 'p') { // stop once there is no further sibling
        echo $next.'<hr />';
        $next = $next->next_sibling();
    }
}
?>
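One thing the markup above suggests: the commentText class sits on a <p>, not on a <div>, so a selector such as div[class=commentText] has nothing to match. Below is a minimal sketch of walking each entry and pulling out the date, class and comment as plain text. It swaps in PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom, and it runs against an inline copy of the snippet rather than the live page. The function name parseRatings and the $sample string are hypothetical, made up for illustration, and the real page markup may differ from the excerpt.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): extracts date, class and
// comment text from every "entry" div under #ratingTable.
function parseRatings(string $html): array
{
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match divs whose class list contains the token "entry"
    // (covers both "entry odd" and "entry even").
    $entries = $xpath->query(
        '//div[@id="ratingTable"]'
        . '/div[contains(concat(" ", normalize-space(@class), " "), " entry ")]'
    );

    $rows = [];
    foreach ($entries as $entry) {
        // The class is on the <p>, so select p, not div.
        $comment = $xpath->query('.//p[@class="commentText"]', $entry)->item(0);
        if ($comment === null) {
            continue; // skip entries without a comment paragraph
        }
        $date  = $xpath->query('.//div[@class="date"]', $entry)->item(0);
        $class = $xpath->query('.//div[@class="class"]/p', $entry)->item(0);

        $rows[] = [
            'date'    => $date ? trim($date->textContent) : '',
            'class'   => $class ? trim($class->textContent) : '',
            'comment' => trim($comment->textContent),
        ];
    }
    return $rows;
}

// Assumed sample input: a shortened copy of the structure shown in the question.
$sample = <<<'HTML'
<div id="ratingTable">
<div class="ratingTableHeader"></div>
<div class="entry odd"><a name="18947089"></a>
<div class="date">
8/24/11
</div><div class="class"><p>
ENGL2323
</p></div><div class="rating"></div><div class="comment" style="width:350px;">
<p class="commentText">
He is absolutely one of my favorite professors.
</p><div class="flagsIcons"></div></div>
</div>
<div class="entry even"></div>
</div>
HTML;

foreach (parseRatings($sample) as $row) {
    echo $row['date'], ' | ', $row['class'], ' | ', $row['comment'], PHP_EOL;
}
```

For the live page, the same function could be fed the string returned by file_get_contents() on the profile URL, assuming the page still uses this structure. With simple_html_dom, the equivalent change would probably be to look for p[class=commentText] inside each entry div.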