php - Replace URLs in text with HTML links

Question

Here is the design, though. For example, I put a link such as

http://example.com

in a textarea. How do I get PHP to detect that it is an http:// link and then print it as

print "<a href='http://www.example.com'>http://www.example.com</a>";

I remember doing something like this before, but it was not foolproof and kept breaking on complex links.

Another good idea would be, if you have a link such as

http://example.com/test.php?val1=bla&val2blablabla%20bla%20bla.bl

to fix it so that it does

print "<a href='http://example.com/test.php?val1=bla&val2=bla%20bla%20bla.bla'>";
print "http://example.com/test.php";
print "</a>";

This one is just an afterthought... Stack Overflow could probably use this as well :D

Any ideas?

score 122 · Accepted Answer

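Let's look at the requirements. You have some user-supplied plain text, which you want to display with hyperlinked URLs.

The "http://" protocol prefix should be optional.
Both domains and IP addresses should be accepted.
Any valid top-level domain should be accepted, e.g. .aero and .xn--jxalpdlp.
Port numbers should be allowed.
URLs must be allowed in normal sentence contexts. For instance, in "Visit stackoverflow.com.", the final period is not part of the URL.
You probably want to allow "https://" URLs as well, and perhaps others too.
As always when displaying user-supplied text in HTML, you want to prevent cross-site scripting (XSS). You also want ampersands in URLs to be correctly escaped as &amp;.
You probably don't need support for IPv6 addresses.
Edit: As noted in the comments, support for email addresses is definitely a plus.
Edit: Only plain text input is supported; HTML tags in the input should not be honoured. (The Bitbucket version supports HTML input.)

Edit: Check out GitHub for the latest version, with support for email addresses, authenticated URLs, URLs in quotes and parentheses, HTML input, as well as an updated TLD list.

Here's my take: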

<?php
$text = <<<EOD
Here are some URLs:
stackoverflow.com/questions/1188129/pregreplace-to-detect-html-php
Here's the answer: http://www.google.com/search?rls=en&q=42&ie=utf-8&oe=utf-8&hl=en. What was the question?
A quick look at http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax is helpful.
There is no place like 127.0.0.1! Except maybe http://news.bbc.co.uk/1/hi/england/surrey/8168892.stm?
Ports: 192.168.0.1:8080, https://example.net:1234/.
Beware of Greeks bringing internationalized top-level domains: xn--hxajbheg2az3al.xn--jxalpdlp.
And remember.Nobody is perfect.

<script>alert('Remember kids: Say no to XSS-attacks! Always HTML escape untrusted input!');</script>
EOD;

$rexProtocol = '(https?://)?';
$rexDomain   = '((?:[-a-zA-Z0-9]{1,63}\.)+[-a-zA-Z0-9]{2,63}|(?:[0-9]{1,3}\.){3}[0-9]{1,3})';
$rexPort     = '(:[0-9]{1,5})?';
$rexPath     = '(/[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]*?)?';
$rexQuery    = '(\?[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';
$rexFragment = '(#[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';

// Solution 1:

function callback($match)
{
    // Prepend http:// if no protocol specified
    $completeUrl = $match[1] ? $match[0] : "http://{$match[0]}";

    return '<a href="' . $completeUrl . '">'
        . $match[2] . $match[3] . $match[4] . '</a>';
}

print "<pre>";
print preg_replace_callback("&\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))&",
    'callback', htmlspecialchars($text));
print "</pre>";

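To properly escape < and & characters, I throw the whole text through htmlspecialchars before processing. This is not ideal, as the HTML escaping can cause misdetection of URL boundaries.
As demonstrated by the "And remember.Nobody is perfect." line (in which remember.Nobody is treated as a URL because of the missing space), further checking for valid top-level domains might be in order.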
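
The boundary problem can be reproduced with a small test. The sketch below copies the regex fragments from Solution 1 and runs them over a hypothetical input in which a URL is wrapped in double quotes. Because the text is escaped first, the closing quote becomes &quot;, and its letters are valid path characters, so they end up inside the link. The variable names $rexUrl and $out and the sample text are assumptions made for this illustration.

```php
<?php
// Regex fragments copied from Solution 1.
$rexProtocol = '(https?://)?';
$rexDomain   = '((?:[-a-zA-Z0-9]{1,63}\.)+[-a-zA-Z0-9]{2,63}|(?:[0-9]{1,3}\.){3}[0-9]{1,3})';
$rexPort     = '(:[0-9]{1,5})?';
$rexPath     = '(/[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]*?)?';
$rexQuery    = '(\?[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';
$rexFragment = '(#[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';
$rexUrl = "&\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))&";

// Hypothetical input: a URL wrapped in double quotes.
$text = 'See "http://example.com/a" now';

// Same steps as Solution 1: escape the whole text, then link what matches.
$out = preg_replace_callback($rexUrl, function (array $m): string {
    $completeUrl = $m[1] ? $m[0] : "http://{$m[0]}";
    return '<a href="' . $completeUrl . '">'
        . $m[2] . ($m[3] ?? '') . ($m[4] ?? '') . '</a>';
}, htmlspecialchars($text));

echo $out, "\n";
// See &quot;<a href="http://example.com/a&quot">example.com/a&quot</a>; now
```

The escaped closing quote is split in two: "&quot" lands inside the href and the link label, and only the ";" is left outside.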

Edit: The following code fixes the above two problems, but it is quite a bit more verbose, since I am more or less re-implementing preg_replace_callback using preg_match.

// Solution 2:

$validTlds = array_fill_keys(explode(" ", ".aero .asia .biz .cat .com .coop .edu .gov .info .int .jobs .mil .mobi .museum .name .net .org .pro .tel .travel .ac .ad .ae .af .ag .ai .al .am .an .ao .aq .ar .as .at .au .aw .ax .az .ba .bb .bd .be .bf .bg .bh .bi .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gn .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .io .iq .ir .is .it .je .jm .jo .jp .ke .kg .kh .ki .km .kn .kp .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .me .mg .mh .mk .ml .mm .mn .mo .mp .mq .mr .ms .mt .mu .mv .mw .mx .my .mz .na .nc .ne .nf .ng .ni .nl .no .np .nr .nu .nz .om .pa .pe .pf .pg .ph .pk .pl .pm .pn .pr .ps .pt .pw .py .qa .re .ro .rs .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .tt .tv .tw .tz .ua .ug .uk .us .uy .uz .va .vc .ve .vg .vi .vn .vu .wf .ws .ye .yt .yu .za .zm .zw .xn--0zwm56d .xn--11b5bs3a9aj6g .xn--80akhbyknj4f .xn--9t4b11yi5a .xn--deba0ad .xn--g6w251d .xn--hgbk6aj7f53bba .xn--hlcj6aya9esc7a .xn--jxalpdlp .xn--kgbechtv .xn--zckzah .arpa"), true);

$position = 0;
while (preg_match("{\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))}", $text, $match, PREG_OFFSET_CAPTURE, $position))
{
    list($url, $urlPosition) = $match[0];

    // Print the text leading up to the URL.
    print(htmlspecialchars(substr($text, $position, $urlPosition - $position)));

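    $domain = $match[2][0];
    // Trailing optional groups are absent from $match when they did not take part in the match.
    $port   = $match[3][0] ?? '';
    $path   = $match[4][0] ?? '';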

    // Check if the TLD is valid - or that $domain is an IP address.
    $tld = strtolower(strrchr($domain, '.'));
    if (preg_match('{\.[0-9]{1,3}}', $tld) || isset($validTlds[$tld]))
    {
        // Prepend http:// if no protocol specified
        $completeUrl = $match[1][0] ? $url : "http://$url";

        // Print the hyperlink.
        printf('<a href="%s">%s</a>', htmlspecialchars($completeUrl), htmlspecialchars("$domain$port$path"));
    }
    else
    {
        // Not a valid URL.
        print(htmlspecialchars($url));
    }

    // Continue text parsing from after the URL.
    $position = $urlPosition + strlen($url);
}

// Print the remainder of the text.
print(htmlspecialchars(substr($text, $position)));

score 15 · Accepted Answer

Here is something that I have tried and tested:

function make_links_blank($text)
{
  return  preg_replace(
     array(
       '/(?(?=<a[^>]*>.+<\/a>)
             (?:<a[^>]*>.+<\/a>)
             |
             ([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+)
         )/iex',
       '/<a([^>]*)target="?[^"\']+"?/i',
       '/<a([^>]+)>/i',
       '/(^|\s)(www.[^<> \n\r]+)/iex',
       '/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+)
       (\\.[A-Za-z0-9-]+)*)/iex'
       ),
     array(
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
       '<a\\1',
       '<a\\1 target="_blank">',
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",
       "stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"
       ),
       $text
   );
}

It works for me, and it works for both emails and URLs. Sorry for answering my own question. :(

But this is the only one that works.

Here is the link where I found it: http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_21878567.html

Sorry in advance that it is Experts Exchange.

score 15 · Accepted Answer

You are all talking about advanced and complex stuff, which is good for some situations, but mostly we need a simple, careless solution. How about simply this?

preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1</a> ', $text_msg);

Just try it and let me know which crazy URL it does not handle.
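
One thing the one-liner does not do is HTML-escape the text or the URL. As a minimal sketch of how a simple pattern can be combined with escaping, the hypothetical helper below (simple_autolink is not part of the answer) splits the raw text on http(s) URLs, escapes every chunk separately, and keeps trailing sentence punctuation outside the link. It assumes plain-text input and only handles URLs that start with http:// or https://.

```php
<?php
// Hypothetical helper, not from the answer above: link http(s) URLs in plain text.
function simple_autolink(string $text): string
{
    // With one capture group, preg_split returns alternating chunks:
    // even indexes are ordinary text, odd indexes are the matched URLs.
    $parts = preg_split('~(https?://\S+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            continue;
        }
        // Keep trailing sentence punctuation out of the link.
        $url  = rtrim($part, '.,;:!?"\')');
        $tail = substr($part, strlen($url));
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $out .= '<a href="' . $safe . '" target="_blank">' . $safe . '</a>'
              . htmlspecialchars($tail, ENT_QUOTES, 'UTF-8');
    }
    return $out;
}

echo simple_autolink('Visit http://example.com/page?a=1&b=2. <b>Thanks!</b>'), "\n";
```

Escaping after splitting, rather than before, avoids the problem of entities such as &amp; or &quot; being mistaken for part of a URL.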

score 4 · Accepted Answer

I have been using this function, and it works for me:

function AutoLinkUrls($str,$popup = FALSE){
    if (preg_match_all("#(^|\s|\()((http(s?)://)|(www\.))(\w+[^\s\)\<]+)#i", $str, $matches)){
        $pop = ($popup == TRUE) ? " target=\"_blank\" " : "";
        for ($i = 0; $i < count($matches['0']); $i++){
            $period = '';
            if (preg_match("|\.$|", $matches['6'][$i])){
                $period = '.';
                $matches['6'][$i] = substr($matches['6'][$i], 0, -1);
            }
            $str = str_replace($matches['0'][$i],
                    $matches['1'][$i].'<a href="http'.
                    $matches['4'][$i].'://'.
                    $matches['5'][$i].
                    $matches['6'][$i].'"'.$pop.'>http'.
                    $matches['4'][$i].'://'.
                    $matches['5'][$i].
                    $matches['6'][$i].'</a>'.
                    $period, $str);
        }//end for
    }//end if
    return $str;
}//end AutoLinkUrls

All credit goes to http://snipplr.com/view/68586/

Enjoy!

score 4 · Accepted Answer

Here is the code, using regular expressions in a function:

<?php
//Function definitions
function MakeUrls($str)
{
$find=array('`((?:https?|ftp)://\S+[[:alnum:]]/?)`si','`((?<!//)(www\.\S+[[:alnum:]]/?))`si');

$replace=array('<a href="$1" target="_blank">$1</a>', '<a href="http://$1" target="_blank">$1</a>');

return preg_replace($find,$replace,$str);
}
//Function testing
$str="www.cloudlibz.com";
$str=MakeUrls($str);
echo $str;
?>

score 1 · Accepted Answer

This regex should match any link except for those new top-level domains with more than three characters...

{
  \\b
  # Match the leading part (proto://hostname, or just hostname)
  (
    # http://, or https:// leading part
    (https?)://[-\\w]+(\\.\\w[-\\w]*)+
  |
    # or, try to find a hostname with a more specific sub-expression
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \\. )+ # sub domains
    # Now ending .com, etc. For these, require lowercase
    (?-i: com\\b
        | edu\\b
        | biz\\b
        | gov\\b
        | in(?:t|fo)\\b # .int or .info
        | mil\\b
        | net\\b
        | org\\b
        | [a-z][a-z]\\.[a-z][a-z]\\b # two-letter country code
    )
  )

  # Allow an optional port number
  ( : \\d+ )?

  # The rest of the URL is optional, and begins with /
  (
    /
    # The rest are heuristics for what seems to work well
    [^.!,?;"\\'()\[\]\{\}\s\x7F-\\xFF]*
    (
      [.!,?]+ [^.!,?;"\\'()\\[\\]\{\\}\s\\x7F-\\xFF]+
    )*
  )?
}ix

I did not write it, and I am not quite sure where I got it from. Sorry that I cannot give credit...

score 1 · Accepted Answer

This should get you the email addresses:

$string = "bah bah steve@gmail.com foo";
$match = preg_match('/[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)*\@[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)+/', $string, $array);
print_r($array);

// outputs:
Array
(
    [0] => steve@gmail.com
)

score 1 · Accepted Answer

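As I mentioned in one of the comments above, my VPS, which is running PHP 7, started emitting the warning "Warning: preg_replace(): The /e modifier is no longer supported, use preg_replace_callback instead". The buffer after the replacement was empty/false.

I have rewritten the code and made some improvements. If you think that you should be in the author section, feel free to edit the comment above the make_links_blank function. I am intentionally not using the closing PHP tag ?> to avoid inserting whitespace into the output.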

<?php

class App_Updater_String_Util {
    public static function get_default_link_attribs( $regex_matches = [] ) {
        $t = ' target="_blank" ';
        return $t;
    }

    /**
     * App_Updater_String_Util::set_protocol();
     * @param string $link
     * @return string
     */
    public static function set_protocol( $link ) {
        if ( ! preg_match( '#^https?#si', $link ) ) {
            $link = 'http://' . $link;
        }
        return $link;
    }

    /**
     * Goes through text and makes whatever text that look like a link an html link
     * which opens in a new tab/window (by adding target attribute).
     * 
     * Usage: App_Updater_String_Util::make_links_blank( $text );
     * 
     * @param str $text
     * @return str
     * @see http://stackoverflow.com/questions/1188129/replace-urls-in-text-with-html-links
     * @author Angel.King.47 | http://dashee.co.uk
     * @author Svetoslav Marinov (Slavi) | http://orbisius.com
     */
    public static function make_links_blank( $text ) {
        $patterns = [
            '#(?(?=<a[^>]*>.+?<\/a>)
                 (?:<a[^>]*>.+<\/a>)
                 |
                 ([^="\']?)((?:https?|ftp):\/\/[^<> \n\r]+)
             )#six' => function ( $matches ) {
                $r1 = empty( $matches[1] ) ? '' : $matches[1];
                $r2 = empty( $matches[2] ) ? '' : $matches[2];
                $r3 = empty( $matches[3] ) ? '' : $matches[3];

                $r2 = empty( $r2 ) ? '' : App_Updater_String_Util::set_protocol( $r2 );
                $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                $res = stripslashes( $res );

                return $res;
             },

            '#(^|\s)((?:https?://|www\.|https?://www\.)[^<>\ \n\r]+)#six' => function ( $matches ) {
                $r1 = empty( $matches[1] ) ? '' : $matches[1];
                $r2 = empty( $matches[2] ) ? '' : $matches[2];
                $r3 = empty( $matches[3] ) ? '' : $matches[3];

                $r2 = ! empty( $r2 ) ? App_Updater_String_Util::set_protocol( $r2 ) : '';
                $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                $res = stripslashes( $res );

                return $res;
            },

            // Remove any target attribs (if any)
            '#<a([^>]*)target="?[^"\']+"?#si' => '<a\\1',

            // Put the target attrib
            '#<a([^>]+)>#si' => '<a\\1 target="_blank">',

            // Make emails clickable Mailto links
            '/(([\w\-]+)(\\.[\w\-]+)*@([\w\-]+)
                (\\.[\w\-]+)*)/six' => function ( $matches ) {

                $r = $matches[0];
                $res = ! empty( $r ) ? "<a href=\"mailto:$r\">$r</a>" : $r;
                $res = stripslashes( $res );

                return $res;
            },
        ];

        foreach ( $patterns as $regex => $callback_or_replace ) {
            if ( is_callable( $callback_or_replace ) ) {
                $text = preg_replace_callback( $regex, $callback_or_replace, $text );
            } else {
                $text = preg_replace( $regex, $callback_or_replace, $text );
            }
        }

        return $text;
    }
}

score 1 · Accepted Answer

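I know an answer has already been accepted and that this question is quite old, but this may be useful for other people looking for other implementations.

This is a modified version of the code posted by Angel.King.47 on July 27, 2009: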

$text = preg_replace(
 array(
   '/(^|\s|>)(www.[^<> \n\r]+)/iex',
   '/(^|\s|>)([_A-Za-z0-9-]+(\\.[A-Za-z]{2,3})?\\.[A-Za-z]{2,4}\\/[^<> \n\r]+)/iex',
   '/(?(?=<a[^>]*>.+<\/a>)(?:<a[^>]*>.+<\/a>)|([^="\']?)((?:https?):\/\/([^<> \n\r]+)))/iex'
 ),  
 array(
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\3':'\\0'))",
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\4':'\\0'))",
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\" target=\"_blank\">\\3</a>&nbsp;':'\\0'))",
 ),  
 $text
);

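Changes:

I removed rules #2 and #3 (I am not sure in which situations they are useful).
I removed email parsing, as I really don't need it.
I added one more rule that allows recognition of URLs in the form [domain]/* (without www). For example: "example.com/faq/" (multiple TLDs: domain.{2-3}.{2-4}/).
When parsing strings starting with "http://", it removes that prefix from the link label.
I added "target='_blank'" to all links.
URLs can be specified just after any(?) tag. For example: <b>www.example.com</b>

As "Søren Løvborg" has stated, this function does not escape the URLs. I tried his/her class, but it did not work as I expected (if you do not trust your users, try his/her code first).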

score 0 · Accepted Answer

Something along the lines of:

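<?php
// Capture everything after a leading http:// up to the first whitespace or the end of the string.
// (PHP's preg functions have no "g" modifier, and the alternation needs its own group.)
if (preg_match('@^http://(.*?)(?:\s|$)@', $textarea_url, $matches)) {
    $url = htmlspecialchars($matches[1], ENT_QUOTES);
    echo '<a href="http://', $url, '">', $url, '</a>';
}
?>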

score 0 · Accepted Answer

If you want to trust the IANA, you can fetch the current list of officially supported TLDs in use like this:

  // Get the official list of valid TLDs.
  $validTLDs = explode("\n", file_get_contents('http://data.iana.org/TLD/tlds-alpha-by-domain.txt'));
  array_shift($validTLDs); // throw away the first line, which contains metadata
  array_pop($validTLDs);   // throw away the last element, which is empty

This makes Søren Løvborg's solution #2 a little less verbose and spares you the hassle of updating the list. Nowadays new TLDs are thrown out so carelessly ;)
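
Solution #2 looks TLDs up as lowercase array keys with a leading dot (for example ".com"), while the IANA file lists one uppercase name per line under a "#" header line. A minimal sketch of bridging the two, assuming that file format, could look like this. The helper name parse_tld_list and the sample data are made up for illustration; in real use the input string would come from file_get_contents as shown above.

```php
<?php
// Hypothetical helper: convert the text of the IANA list into a lookup table
// keyed by lowercase TLD with a leading dot, the shape solution #2 expects.
function parse_tld_list(string $contents): array
{
    $tlds = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and the "#" header line
        }
        $tlds['.' . strtolower($line)] = true;
    }
    return $tlds;
}

// Sample data in the assumed format of the IANA file (not the real contents).
$sample = "# Version 0000000000, sample header\nAERO\nCOM\nXN--JXALPDLP\n";
$validTlds = parse_tld_list($sample);

var_dump(isset($validTlds['.com']));     // bool(true)
var_dump(isset($validTlds['.nobody']));  // bool(false)
```

Skipping comment and blank lines by content, rather than by position with array_shift and array_pop, keeps the result stable if the file ever gains or loses a header line.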

score 0 · Accepted Answer

This worked for me (I turned one of the answers into a PHP function):

function make_urls_from_text ($text){
   return preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1 </a>', $text);
}

score 0 · Accepted Answer

This class turns the URLs into plain text while keeping the home URL as it is. I hope this helps and saves you some time. Enjoy.

class RegClass 
{ 

     function preg_callback_url($matches) 
     { 
        //var_dump($matches); 
        //Get the matched URL  text <a>text</a>
        $text = $matches[2];
        //Get the matched URL link <a href ="http://www.test.com">text</a>
        $url = $matches[1];

        if($url=='href ="http://www.test.com"'){
         //replace all a tag as it is
         return '<a href='.$url.' rel="nofollow"> '.$text.' </a>'; 

         }else{
         //replace all a tag to text
         return " $text " ;
         }
} 
function ParseText($text){ 

    $text = preg_replace( "/www\./", "http://www.", $text );
        $regex ="/http:\/\/http:\/\/www\./";
    $text = preg_replace( $regex, "http://www.", $text );
        $regex2 = "/https:\/\/http:\/\/www\./";
    $text = preg_replace( $regex2, "https://www.", $text );

        return preg_replace_callback('/<a\s(.+?)>(.+?)<\/a>/is',
                array( &$this,        'preg_callback_url'), $text); 
      } 

} 
$regexp = new RegClass();
echo $regexp->ParseText($text);

score -2 · Accepted Answer

Matching the full URL spec is difficult, but here is a regular expression that generally does a good job:

([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

To use this in preg_replace, however, you need to escape it, like so:

$pattern = "/([\\w-]+(\\.[\\w-]+)*@([a-z0-9-]+(\\.[a-z0-9-]+)*?\\.[a-z]{2,6}|(\\d{1,3}\\.){3}\\d{1,3})(:\\d{4})?)/";
$replaced_texttext = preg_replace($pattern, '<a href="$0" title="$0">$0</a>', $text);
