html - Ignoring part of a string in MATLAB

Question

I want to extract the text contained inside HTML tags. For example:

<html><body>this is a warning message. wrongs values</body></html>

The result should be just the message, with all HTML tags ignored.

Does anyone have a suggestion?

score 1 · Accepted Answer

You can strip the HTML tags with a regular expression that replaces every <...> sequence with an empty string:

str = '<html><body>this is a warning message. wrongs values</body></html>';
str2 = regexprep(str, '<[^>]*>', '')
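One possible refinement, sketched under the assumption that the input may contain several elements (for example <p>first</p><p>second</p>): replacing each tag with an empty string would glue adjacent words together, so this version replaces tags with a space, collapses runs of whitespace, and trims the ends. The name stripTags is hypothetical. Like the original pattern, it does not handle a > inside attribute values or comments, and it does not decode entities such as &amp;.

```matlab
% Hypothetical helper: remove tags, then tidy up whitespace.
% 1) Replace each <...> tag with a space so text from adjacent elements stays separated.
% 2) Collapse any run of whitespace characters into a single space.
% 3) Trim leading and trailing whitespace.
stripTags = @(s) strtrim(regexprep(regexprep(s, '<[^>]*>', ' '), '\s+', ' '));

msg = stripTags('<html><body>this is a warning message. wrongs values</body></html>')
```

With nested or adjacent elements, the text from each element is kept as a separate word instead of being concatenated.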

score 1 · Accepted Answer

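You need something like the following. Note that the %[a-zA-Z., ] set accepts only letters, periods, commas, and spaces, so this works only when the tags are exactly <html><body> and </body></html> and the message contains no other characters: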

 a = sscanf('<html><body>this is a warning message. wrongs values</body></html>','<html><body>%[a-zA-Z., ]</body></html>')
