perl - Extracting content from an HTML table

Question

Possible duplicate:
Extracting table content using Perl

I am trying to extract table content from an HTML file using HTML::TableExtract. My problem is that the HTML file is structured like this:

<!DOCTYPE html>
<html>
<body>

    <h4>One row and three columns:</h4>

    <table border="1">
      <tr>
        <td>
        <p> 100 </p></td>
        <td>
        <p> 200 </p></td>
        <td>
        <p> 300 </p></td>
        </tr>
      <tr>
        <td>
        <p> 100 </p></td>
        <td>
        <p> 200 </p></td>
        <td>
        <p> 300 </p></td>
        </tr>
    </table>
</body>

Because of this structure, my output looks like this:

Instead of what I want:

   100|200|300|
   400|500|600|

Can you help? Here is my Perl code:

use strict;
use warnings;
use HTML::TableExtract;

my $te = HTML::TableExtract->new();
$te->parse_file('Table_One.html');

open (DATA2, ">TableOutput.txt")
    or die "Can't open file";

foreach my $ts ($te->tables()) {

    foreach my $row ($ts->rows()) {

        my $Final = join('|', @$row );
    print DATA2 "$Final";
    }
}
close (DATA2);

score 1 · Accepted Answer

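# Strip leading and trailing whitespace; the (_) prototype makes trim default to $_.
sub trim(_) { my ($s) = @_; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }

foreach my $ts ($te->tables()) {
    foreach my $row ($ts->rows()) {
        # Trim every cell, then end each row with a newline.
        print DATA2 join('|', map trim, @$row), "\n";
    }
}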

Or, if you really do want the trailing "|":

sub trim(_) { my ($s) = @_; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }

foreach my $ts ($te->tables()) {
    foreach my $row ($ts->rows()) {
        print DATA2 (map { trim($_).'|' } @$row), "\n";
    }
}
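
To try the trimming step on its own, without HTML::TableExtract or an input file, here is a minimal sketch. The @rows data is a hypothetical stand-in for what rows() might hand back for a table like the one in the question (the exact whitespace in each cell is an assumption), and format_row is a helper name introduced only for this illustration. It uses the variant without the trailing separator.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the rows of one extracted table: each cell
# is assumed to still carry the whitespace that surrounded the <p> text.
my @rows = (
    [ "\n         100 ", "\n         200 ", "\n         300 " ],
    [ "\n         400 ", "\n         500 ", "\n         600 " ],
);

# Strip leading and trailing whitespace from a copy of the argument.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

# Build one output line per row, with no trailing separator.
sub format_row {
    my ($row) = @_;
    return join('|', map { trim($_) } @$row);
}

# One line per row, each terminated by a newline.
print format_row($_), "\n" for @rows;
```

With this sample data the script prints 100|200|300 and 400|500|600 on separate lines. For the trailing "|" shown in the question, append '|' to each trimmed cell as in the second variant above.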
