Taking these two texts as an example:
my $line = "[cytokine]<ADJVNT-PROP-0> signaling, which have not [to]<PREP> date been shown [to]<PREP> be [[regulat]<EXP-V-0>ed]<EXP-PP-V-0>";
my $line2 = "[Human [papillomavirus]<VACC-PROP-0>]<VACC-PROP-0> genotype [31]<NUM> does not [express]<EXP-V-0> detectable [microRNA]<MIR-0> levels [during]<PREP> latent or productive virus replication.";
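I want to extract every string enclosed between <VAC or <ADJ and <EXP, but when there are multiple matches on the left, the string should be extracted from the innermost one to the rightmost.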
For example, for the lines above I need a single regular expression that returns these:
Output1: signaling, which have not [to]<PREP> date been shown [to]<PREP> be [[regulat]<EXP-V-0>ed]
Output2: genotype [31]<NUM> does not [express]
Why does this code not work?
my @lines = ("[cytokine]<ADJVNT-PROP-0> signaling, which have not [to]<PREP> date been shown [to]<PREP> be [[regulat]<EXP-V-0>ed]<EXP-PP-V-0>",
"[Human [papillomavirus]<VACC-PROP-0>]<VACC-PROP-0> genotype [31]<NUM> does not [express]<EXP-V-0> detectable [microRNA]<MIR-0> levels [during]<PREP> latent or productive virus replication.");
my $count = 0;
foreach $line (@lines) {
    $count++;
    my ($sel) = $line =~ /<VAC|<ADJ.*>(.*)<EXP.*>/;
    print "Output $count: $sel\n";
}
Runnable here: https://eval.in/50772
What would be the correct way to do it?
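For reference, the rule described above can be sketched as follows. This is only one possible reading of the requirement, not a confirmed solution: it assumes the left boundary is the last <VAC...> or <ADJ...> tag that still has an <EXP after it, and the right boundary is the last <EXP on the line. The helper name extract_span is hypothetical. The sketch groups the alternation with (?:...), because a bare | in a pattern splits the whole pattern into two alternatives.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): returns the text
# between the last <VAC...> or <ADJ...> tag and the last "<EXP" after it,
# or undef when the line contains no such pair.
sub extract_span {
    my ($line) = @_;
    # Leading greedy .*  : moves the start to the last <VAC/<ADJ tag that
    #                      is still followed by "<EXP".
    # (?:VAC|ADJ)        : groups the alternation so it covers only the tag prefix.
    # [^>]*>             : consumes the rest of the tag up to its closing ">".
    # (.*)<EXP           : greedy capture, so it runs to the last "<EXP".
    my ($sel) = $line =~ /.*<(?:VAC|ADJ)[^>]*>(.*)<EXP/s;
    return $sel;
}

my @lines = (
    "[cytokine]<ADJVNT-PROP-0> signaling, which have not [to]<PREP> date been shown [to]<PREP> be [[regulat]<EXP-V-0>ed]<EXP-PP-V-0>",
    "[Human [papillomavirus]<VACC-PROP-0>]<VACC-PROP-0> genotype [31]<NUM> does not [express]<EXP-V-0> detectable [microRNA]<MIR-0> levels [during]<PREP> latent or productive virus replication.",
);

my $count = 0;
foreach my $line (@lines) {
    $count++;
    my $sel = extract_span($line);
    print "Output $count: ", (defined $sel ? $sel : '(no match)'), "\n";
}
```

With the two sample lines, the capture should correspond to Output1 and Output2, each with the leading space that follows the opening tag. Adding \s* right after the > in the pattern would drop that space.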