php - Manipulate the content of an HTML string without changing the HTML

Question

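If I have a string of HTML, maybe like this...

<h2>Header</h2><p>all the <span class="bright">content</span> here</p>

And I want to manipulate the string so that, for example, all the words are reversed...

<h2>redaeH</h2><p>lla eht <span class="bright">tnetnoc</span> ereh</p>

I know how to extract the string from the HTML and pass it to a function to get a modified result, but how would I do so while retaining the HTML?

I would prefer a non-language-specific solution, but if it has to be language specific, PHP or JavaScript would be the most useful.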

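Edit

I also want to be able to manipulate text that spans several DOM elements...

Quick<em>Draw</em>McGraw

warGcM<em>warD</em>kciuQ

Another edit

Currently, I am thinking of somehow replacing every HTML node with a unique token while storing the originals in an array, doing a manipulation that ignores the tokens, and then replacing the tokens with the values from the array.

This approach seems overly complicated, and I am not sure how to replace all the HTML without using REGEX, which I have learned can send you to the Stack Overflow prison island.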
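
A minimal JavaScript sketch of the token idea described above, for illustration only. It assumes the private-use character U+E000 never occurs in the input, and it still finds tags with a simple regex, so it is not robust against arbitrary HTML. The names TOKEN, reverseWords and manipulateWithTokens are hypothetical, and text is handled one chunk at a time, so words that span several elements are not treated as a whole.

```javascript
// Hypothetical placeholder: a private-use character assumed absent from the input.
var TOKEN = '\uE000';

// Example manipulation: reverse each word, leaving spaces and punctuation alone.
function reverseWords(text) {
  return text.replace(/\w+/g, function (word) {
    return word.split('').reverse().join('');
  });
}

function manipulateWithTokens(html, manipulator) {
  var tags = [];
  // Swap every tag for the placeholder, remembering the originals in order.
  var stripped = html.replace(/<[^>]+>/g, function (tag) {
    tags.push(tag);
    return TOKEN;
  });
  // Manipulate only the text between placeholders; the result may change length.
  var changed = stripped.split(TOKEN).map(function (chunk) {
    return manipulator(chunk);
  }).join(TOKEN);
  // Restore the original tags in the order they were removed.
  var next = 0;
  return changed.replace(new RegExp(TOKEN, 'g'), function () {
    return tags[next++];
  });
}

var result = manipulateWithTokens(
  '<h2>Header</h2><p>all the <span class="bright">content</span> here</p>',
  reverseWords
);
console.log(result);
```

Because each chunk of text is passed to the manipulator separately, the returned text can be longer or shorter than the original without disturbing the tags.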

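Yet another edit

I want to clarify the issue here. I want the text manipulation to happen across any number of DOM elements - so, for example, if my formula randomly moves the letters in the middle of a word, leaving the start and end the same, I want to be able to do this...

<em>going</em><i>home</i>

Converted to

<em>goonh</em><i>gmie</i>

So the HTML elements remain untouched, but the string content inside them is manipulated as a whole (in this example, goinghome is passed to the manipulation formula) in whatever way the manipulation formula chooses.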

score 1 · Accepted Answer

Hi, I ran into this situation a long time ago and used the following code. Here is a rough version of it.

<?php
// Keep the case of the first letter: if $word starts with an uppercase
// letter, uppercase the first letter of $replace as well.
function keepcase($word, $replace) {
   $replace[0] = (ctype_upper($word[0]) ? strtoupper($replace[0]) : $replace[0]);
   return $replace;
}

// regex - match the contents grouping into HTMLTAG and non-HTMLTAG chunks
$re = '%(</?\w++[^<>]*+>)                 # grab HTML open or close TAG into group 1
|                                         # or...
([^<]*+(?:(?!</?\w++[^<>]*+>)<[^<]*+)*+)  # grab non-HTMLTAG text into group 2
%x';

$contents = '<h2>Header</h2><p>the <span class="bright">content</span> here</p>';

// walk through the content, chunk by chunk, replacing words in non-HTMLTAG chunks only
$contents = preg_replace_callback($re, 'callback_func', $contents);

function callback_func($matches) { // here's the callback function
    if ($matches[1]) {             // Case 1: this is an HTMLTAG
        return $matches[1];        // return HTMLTAG unmodified
    }
    elseif (isset($matches[2])) {  // Case 2: a non-HTMLTAG chunk.
        // The /e modifier was removed in PHP 7, so use a callback instead
        // and reverse each word of the chunk separately.
        return preg_replace_callback('/\b\w+\b/', function ($m) {
            return keepcase($m[0], strrev($m[0]));
        }, $matches[2]);
    }
    exit("Error!");                // never get here
}
echo ($contents);
?>

score 1 · Accepted Answer

If you want to achieve a similar visual effect without changing the text, you could cheat with CSS.

h2, p {
  direction: rtl;
  unicode-bidi: bidi-override;
}

This will reverse the text visually.

Example fiddle: http://jsfiddle.net/pn6Ga/

score 0 · Accepted Answer

I have implemented a version that seems to work quite well, although I still use a (rather general and shoddy) regex to extract the HTML tags from the text. Here it is in commented JavaScript.

Method

/**
* Manipulate text inside HTML according to passed function
* @param html the html string to manipulate
* @param manipulator the function to manipulate with (will be passed single word)
* @returns manipulated string including unmodified HTML
*
* Currently limited in that manipulator operates on words determined by regex
* word boundaries, and must return same length manipulated word
*
*/

var manipulate = function(html, manipulator) {

  var block, tag, words, i,
    final = '', // used to prepare return value
    tags = [], // used to store tags as they are stripped from the html string
    x = 0; // used to track the number of characters the html string is reduced by during stripping

  // remove tags from html string, and use callback to store them with their index
  // then split by word boundaries to get plain words from original html
  words = html.replace(/<.+?>/g, function(match, index) {
    tags.unshift({
      match: match,
      index: index - x
    });
    x += match.length;
    return '';
  }).split(/\b/);

  // loop through each word and build the final string
  // appending the word, or manipulated word if not a boundary
  for (i = 0; i < words.length; i++) {
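    // test the piece itself rather than its index parity, so input that
    // starts with a non-word character is still handled correctly
    final += /\w/.test(words[i]) ? manipulator(words[i]) : words[i];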
  }

  // loop through each stored tag, and insert into final string
  for (i = 0; i < tags.length; i++) {
    final = final.slice(0, tags[i].index) + tags[i].match + final.slice(tags[i].index);
  }

  // ready to go!
  return final;

};

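The function defined above accepts a string of HTML and a manipulation function that acts on the words within the string, regardless of whether they are split by HTML elements.

It works by first removing all the HTML tags, storing each tag together with the index it was taken from, then manipulating the text, and finally adding the tags back into their original positions in reverse order.

Test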

/**
 * Test our function with various input
 */

var reverse, rutherford, shuffle, text, titleCase;

// set our test html string
text = "<h2>Header</h2><p>all the <span class=\"bright\">content</span> here</p>\nQuick<em>Draw</em>McGraw\n<em>going</em><i>home</i>";

// function used to reverse words
reverse = function(s) {
  return s.split('').reverse().join('');
};

// function used by rutherford to return a shuffled array
shuffle = function(a) {
  return a.sort(function() {
    return Math.round(Math.random()) - 0.5;
  });
};

// function used to shuffle the middle of words, leaving each end undisturbed
rutherford = function(inc) {
  var m = inc.match(/^(.?)(.*?)(.)$/);
  return m[1] + shuffle(m[2].split('')).join('') + m[3];
};

// function to make word Title Cased
titleCase = function(s) {
  return s.replace(/./, function(w) {
    return w.toUpperCase();
  });
};

console.log(manipulate(text, reverse));
console.log(manipulate(text, rutherford));
console.log(manipulate(text, titleCase));

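It still has a few quirks, such as the header and paragraph text not being recognized as separate words (because they are in separate block-level tags rather than inline tags).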
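
A small illustration of that quirk, using the same tag-stripping regex as the method above. The variable names are only for this example. Once the tags are gone, nothing separates the header text from the paragraph text, so the word-boundary split sees them as one word.

```javascript
// Same tag-stripping regex as in the method above.
var html = '<h2>Header</h2><p>all the content</p>';
var stripped = html.replace(/<.+?>/g, '');

// Split on word boundaries, exactly as the method does.
var pieces = stripped.split(/\b/);

// The header and the first paragraph word are fused into a single piece.
console.log(pieces[0]); // "Headerall"
```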

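I would also like the string manipulation formula to be able to actually add and remove text, rather than only replace or move it (so the string length after manipulation would be variable).

I have now added some comments to the code and put it up as a JavaScript gist. I hope someone will improve it - especially if someone can remove the regex part and replace it with something better!

Gist: https://gist.github.com/3309906

Demo: http://jsfiddle.net/gh/gist/underscore/1/3309906/

(outputs to the console)

And finally, using an HTML parser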

(http://ejohn.org/files/htmlparser.js)

Demo: http://jsfiddle.net/EDJyU/

score 0 · Accepted Answer

Can you use jQuery?

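$('div *').each(function(){
    // note: .text(value) replaces the element's whole content,
    // so any nested child elements are flattened to plain text
    var text = $(this).text();
    text = text.split('');
    text = text.reverse();
    text = text.join('');
    $(this).text(text);
});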

See here - http://jsfiddle.net/GCAvb/

score 0 · Accepted Answer

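Parse the HTML with something that will give you a DOM API to it.

Write a function that loops over the child nodes of an element.

If a node is a text node, get the data as a string, split it into words, reverse each one, then assign the result back.

If a node is an element, recurse into your function.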
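
The steps above could be sketched in JavaScript roughly as follows. This is only an illustration: the walk function relies on the standard DOM properties nodeType, childNodes and data, while the text, element and serialize helpers are hypothetical stand-ins for a parsed document so that the sketch runs without a browser or a parser library (attributes are omitted for brevity).

```javascript
// Node type constants as defined by the DOM standard.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Reverse every word in a string, leaving whitespace and punctuation in place.
function reverseWords(s) {
  return s.replace(/\w+/g, function (word) {
    return word.split('').reverse().join('');
  });
}

// Loop over the child nodes: rewrite text nodes, recurse into elements.
function walk(node, manipulator) {
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === TEXT_NODE) {
      child.data = manipulator(child.data);
    } else if (child.nodeType === ELEMENT_NODE) {
      walk(child, manipulator);
    }
  }
}

// Hypothetical stand-ins for parsed DOM nodes.
function text(data) {
  return { nodeType: TEXT_NODE, data: data, childNodes: [] };
}
function element(tag, children) {
  return { nodeType: ELEMENT_NODE, tagName: tag, childNodes: children };
}

// Turn the stand-in tree back into a string.
function serialize(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.data;
  }
  var inner = node.childNodes.map(serialize).join('');
  return '<' + node.tagName + '>' + inner + '</' + node.tagName + '>';
}

var root = element('div', [
  element('h2', [text('Header')]),
  element('p', [text('all the '), element('span', [text('content')]), text(' here')])
]);

walk(root, reverseWords);
var output = serialize(root);
console.log(output);
```

Since each text node is processed on its own, the element structure is never touched, but a word split across several elements is reversed piece by piece rather than as a whole.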
