html - How do I extract an HTML title with Perl?

Question

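Is there a way to extract the title of an HTML page using Perl? I know it can be passed as a hidden variable during a form submit and then retrieved in Perl that way, but I was wondering whether there is a way to do this without the submit.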

For example, say I have an HTML page like this:

<html><head><title>TEST</title></head></html>

And then in Perl I want to do:

$q -> h1('something');

How can I dynamically replace 'something' with whatever is contained in the <title> tags?

score 8 · Accepted Answer

I would use pQuery. It works just like jQuery.

You can say:

use feature 'say';
use pQuery;
my $page = pQuery("http://google.com/");
my $title = $page->find('title');
say "The title is: ", $title->html;

Replacing things is similar:

$title->html('New Title');
say "The entirety of google.com with my new title is: ", $page->html;

You can also pass a string of HTML to the pQuery constructor.

Finally, if you want to use arbitrary HTML as a "template" and then "refine" it with Perl commands, use Template::Refine.

score 3 · Accepted Answer

HTML::HeadParser does this.

Answered 2009-02-23T18:41:21.303
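As a minimal sketch of that suggestion, the following parses the sample page from the question and reads the title back through the parser's header interface. The variable names are illustrative, and HTML::HeadParser (part of the HTML-Parser distribution) is assumed to be installed.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Sample markup taken from the question.
my $html = '<html><head><title>TEST</title></head></html>';

my $parser = HTML::HeadParser->new;
$parser->parse($html);

# The parsed <title> is exposed as the pseudo-header "Title".
my $title = $parser->header('Title');
print "Title: $title\n";
```

The resulting string could then be passed to something like $q->h1($title) from the question.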

score 1 · Accepted Answer

The previous answer is wrong. If the HTML title tag appears more than once, that is easily overcome by checking that the title is valid (no tags inside it).

my ($title) = $test_content =~ m/<title>([a-zA-Z\/][^>]+)<\/title>/si;

score 1 · Accepted Answer

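use strict;
use warnings;
use LWP::Simple;

# Take the URL from the command line.
my $url = shift || die "Specify URL on the cmd line\n";
my $html = get($url);
die "Could not fetch $url\n" unless defined $html;

# /i ignores case; /s lets "." match newlines inside the title.
if ($html =~ m{<TITLE>(.*?)</TITLE>}is) {
    print "$1\n";
}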

score 1 · Accepted Answer

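It is not clear to me what you are asking for. You seem to be talking about something that could run in the user's browser, or at least something that already has an HTML page loaded.

If that is not the case, the answer is URI::Title.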

score 0 · Accepted Answer

Get the title name from a file.

# Assumes $absPath, $startstring and $endstring (presumably the patterns for
# <title> and </title>) and the output filehandle WFL are set up elsewhere.
my $spool = 0;

open my $fh, "<", $absPath or die $!;
# write the opening bracket
print WFL "[";
while (<$fh>) {
    # remove the newline from the line just read
    chomp;
    # remove leading and trailing whitespace
    s/^\s+|\s+$//g;

    # case where <title> and </title> occur on the same line:
    # print the title and stop
    if (/$startstring/i && /$endstring/i) {
        my ($title) = m/$startstring(.+)$endstring/si;
        print WFL "'$title',";
        last;
    }
    # case where <title> is on one line and </title> is on a later line:
    # the opening <title> is found on this line
    elsif (/$startstring/i) {
        # take everything after <title> and nothing before it
        my ($title) = m/$startstring(.*)/si;
        print WFL "'$title";
        $spool = 1;
    }
    # the closing </title> is found
    elsif (/$endstring/i) {
        # take everything before </title> and nothing after it
        my ($title) = m/(.*)$endstring/si;
        print WFL " $title',";
        $spool = 0;
        last;
    }
    # lines between <title> and </title> are printed as they are
    elsif ($spool == 1) {
        print WFL " $_";
    }
}
close $fh;
# end of getting the title name

score -2 · Accepted Answer

If you only want to extract the page title, you can use a regular expression. I believe it would be something like:

my ($title) = $html =~ m/<title>(.+)<\/title>/si;

Here the HTML page is stored in the string $html. In si, the s stands for single-line mode (i.e., the dot also matches a newline) and the i for ignoring case.
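To see what the two modifiers do, here is a small sketch against a made-up page whose tags are upper-case and whose title spans two lines. The sample string is hypothetical, and the non-greedy .+? is a variation on the regex above (not the answer's exact pattern) so that the match stops at the first closing tag.

```perl
use strict;
use warnings;

# Hypothetical sample page: upper-case tags and a title split across lines.
my $html = "<HTML><HEAD><TITLE>First\nSecond</TITLE></HEAD>"
         . "<BODY><p>Not a </title> worth matching</p></BODY></HTML>";

# /s lets "." match newlines; /i ignores case; ".+?" stops at the first </title>.
my ($title) = $html =~ m/<title>(.+?)<\/title>/si;

# Collapse internal whitespace so the title prints on one line.
$title =~ s/\s+/ /g if defined $title;

print defined $title ? "Title: $title\n" : "No title found\n";
```

With the greedy .+ the capture would run on to the stray </title> in the body, which is one reason the other answers recommend a real parser for anything beyond simple pages.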
