java - Regex: remove all tags except those containing the keyword "univ"

Question

[introduction][position]Lead Researcher and Research Manager[/position] in the [affiliation]Web Search and Mining Group, Microsoft Research[/affiliation]</b>.

I am a [position]lead researcher[/position] at [affiliation]Microsoft Research[/affiliation]. I am also [position]adjunct professor[/position] of [affiliation]Peking University[/affiliation], [affiliation]Xian Jiaotong University[/affiliation] and [affiliation]Nankai University[/affiliation].

I joined [affiliation]Microsoft Research[/affiliation] in June 2001. Prior to that, I worked at the Research Laboratories of NEC Corporation.

I obtained a [bsdegree]B.S.[/bsdegree] in [bsmajor]Electrical Engineering[/bsmajor] from [bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate] and a [msdegree]M.S.[/msdegree] in [msmajor]Computer Science[/msmajor] from [msuniv]Kyoto University[/msuniv] in [msdate]1990[/msdate]. I earned my [phddegree]Ph.D.[/phddegree] in [phdmajor]Computer Science[/phdmajor] from the [phduniv]University of Tokyo[/phduniv] in [phddate]1998[/phddate].

I am interested in [interests]statistical learning[/interests], [interests]natural language processing[/interests], [interests]data mining, and information retrieval[/interests].[/introduction]

I can remove all tags from the paragraph above with:

String stripped = html.replaceAll("\\[.*?\\]", "");

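However, I want to keep three pairs of tags in the paragraph ([bsuniv][/bsuniv], [msuniv][/msuniv] and [phduniv][/phduniv]). In other words, I don't want to remove tags that contain the keyword "univ". I can't find a convenient way to rewrite the regex. Can anyone help?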

score 1 · Accepted Answer

You can use a negative look-ahead assertion here:

str = str.replaceAll("\\[(.(?!univ))*?\\]", "");

Or:

str = str.replaceAll("\\[((?!univ).)*?\\]", "");
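As a quick check, here is a minimal sketch that runs both patterns over one sentence taken from the question's paragraph. The class name UnivTagDemo, the constant names, and the strip helper are made up for illustration and are not part of the original answer.

```java
class UnivTagDemo {
    // First variant: every consumed character must not be followed by "univ".
    static final String AFTER_CHAR = "\\[(.(?!univ))*?\\]";
    // Second variant: the look-ahead is checked before each character is consumed.
    static final String BEFORE_CHAR = "\\[((?!univ).)*?\\]";

    // Hypothetical helper: removes every match of the given regex from the text.
    static String strip(String text, String regex) {
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String sample = "I obtained a [bsdegree]B.S.[/bsdegree] in "
                + "[bsmajor]Electrical Engineering[/bsmajor] from "
                + "[bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate]";

        // Both lines print:
        // I obtained a B.S. in Electrical Engineering from [bsuniv]Kyoto University[/bsuniv] in 1988
        System.out.println(strip(sample, AFTER_CHAR));
        System.out.println(strip(sample, BEFORE_CHAR));
    }
}
```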

Both of them will give you the desired output. There is just one difference:

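The first one applies the negative look-ahead after the current character: it matches a character and, if that character is not followed by univ, moves on to the next character.
The second one applies the negative look-ahead at the empty string before each character: if that position is not followed by univ, it goes on to match one character.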
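The placement of the look-ahead only shows up when "univ" comes immediately after the opening bracket. The sketch below uses a hypothetical [univ] tag, which does not occur in the question's data, to make that visible. The class and helper names are again made up for illustration.

```java
class LookaheadPlacementDemo {
    static final String AFTER_CHAR = "\\[(.(?!univ))*?\\]";
    static final String BEFORE_CHAR = "\\[((?!univ).)*?\\]";

    // Hypothetical helper: removes every match of the given regex from the text.
    static String strip(String text, String regex) {
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Assumed input: a tag whose name starts with "univ" (not in the question).
        String text = "from [univ]Kyoto University[/univ]";

        // First variant: nothing is checked at the position right after "[",
        // so "[univ]" is removed, while "[/univ]" is kept because "/" is followed by "univ".
        System.out.println(strip(text, AFTER_CHAR));
        // prints: from Kyoto University[/univ]

        // Second variant: the check runs before every character, including the first,
        // so both tags are kept.
        System.out.println(strip(text, BEFORE_CHAR));
        // prints: from [univ]Kyoto University[/univ]
    }
}
```

For the tag names in the question (bsuniv, msuniv, phduniv) the two patterns behave the same. If tag names might begin with "univ", the second form appears to be the safer choice.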
