r - Extract multiple repeated text patterns

Question

I have a string like this:

txt <- "|M  CHG  6  44  -1  48  -1  53  -1  63   1  64   1  65   1|"

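The first number (6) means that the pattern \\s+\\d+\\s+[\\+-]?\\d+ is repeated 6 times. I am actually only interested in the second (possibly signed) number of each repetition, so I am looking for a function or regex that gives me the following result: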

[1] "-1" "-1" "-1" "1" "1" "1"

I tried it with

gsub( "^\\|M\\s+CHG\\s+\\d+(\\s+\\d+\\s+([\\+-]?\\d+))+\\|$", replacement="\\2", x=txt, perl=TRUE )

as well as

str_replace_all( x, perl( "^\\|M\\s+CHG\\s+\\d+(\\s+\\d+\\s+([\\+-]?\\d+))+\\|$" ), "\\2" )

But in both cases only the last occurrence was returned.

score 1 · Accepted Answer

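One solution is to strip off the leading characters (I did this with a regex, but you may want to use substr or similar). Then put the rest into a matrix with the required dimensions and return the column you need: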

#  regex to strip superfluous characters
#  but `substring( txt , 10 )` would work just as well in this example
pat <- "^\\|M\\s+CHG\\s+\\d+\\s+(.*)\\|$"
x <- gsub( pat , "\\1" , txt )

#  Get result
matrix( unlist( strsplit( x , "\\s+" ) ) , ncol = 2 , byrow = TRUE )[,2]
# [1] "-1" "-1" "-1" "1"  "1"  "1"

The intermediate matrix looks like this:

#     [,1] [,2]
#[1,] "44" "-1"
#[2,] "48" "-1"
#[3,] "53" "-1"
#[4,] "63" "1" 
#[5,] "64" "1" 
#[6,] "65" "1"
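
As a side note on why the attempts in the question return only the last value: a capture group inside a repeated group is overwritten on every repetition, so \\2 only ever holds the final match. A minimal sketch of a regex-only alternative, assuming the payload always consists of pairs of an unsigned index followed by an optionally signed integer, is to strip the header first and then collect every match with gregexpr() and regmatches() from base R. The variable names body, pairs and values are made up for this illustration.

```r
txt <- "|M  CHG  6  44  -1  48  -1  53  -1  63   1  64   1  65   1|"

# Drop the "|M  CHG  <count>" header and the trailing "|"
body <- sub("^\\|M\\s+CHG\\s+\\d+", "", txt)
body <- sub("\\|$", "", body)

# gregexpr() finds every "<index> <value>" pair, not just the last one
pairs <- regmatches(body, gregexpr("\\d+\\s+[+-]?\\d+", body))[[1]]

# Keep only the second number of each pair
values <- sub("^\\d+\\s+", "", pairs)
values
```

This ignores the declared count (6) entirely, so it would not notice a line whose count and payload disagree.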

score 1 · Accepted Answer

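Just remove the trailing | and split on whitespace. Then keep only every second element after the first four tokens (the odd-numbered ones when counting from 1).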

var txt, txtArray, result;

txt = "|M  CHG  6  44  -1  48  -1  53  -1  63   1  64   1  65   1|";

// Remove the end '|';
txt = txt.slice(0, -1);

// Split on one or more space...
txtArray = txt.split(/\s+/);


// Grab the odd ones only after the third element...
result = txtArray.filter(function(n, i){
  return i > 3 && i % 2 === 0;
});

console.log( result );

score 1 · Accepted Answer

Another one:

txt <- "|M  CHG  6  44  -1  48  -1  53  -1  63   1  64   1  65   1|"    


#original
#txtsplit<-unlist(strsplit(txt, "\\s+"))
#n=as.numeric(txtsplit[3])
#o<-txtsplit[4+seq(from=1, by=2, length.out=n)]

#fixed
txtsplit<-unlist(strsplit(txt, "\\||\\s+"))
n=as.numeric(txtsplit[4])
o<-txtsplit[5+seq(from=1, by=2, length.out=n)]

o
# [1] "-1" "-1" "-1" "1"  "1"  "1"
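
The count-based approach above could be wrapped in a small function so it can be applied to several lines. This is only a sketch: the name extract_chg_values is hypothetical, the check that the line really contains as many pairs as it declares is an added assumption, and the result is converted to integers rather than left as strings.

```r
# Hypothetical helper: return the second number of each pair in an "M  CHG" line
extract_chg_values <- function(line) {
  # Splitting on "|" or whitespace leaves an empty first token before "M"
  tokens <- unlist(strsplit(line, "\\||\\s+"))
  n <- as.integer(tokens[4])  # declared number of pairs
  idx <- 5 + seq(from = 1, by = 2, length.out = n)
  # Added sanity check (an assumption, not part of the answer above)
  if (length(idx) > 0 && max(idx) > length(tokens)) {
    stop("line declares ", n, " pairs but contains fewer tokens")
  }
  as.integer(tokens[idx])
}

txt <- "|M  CHG  6  44  -1  48  -1  53  -1  63   1  64   1  65   1|"
extract_chg_values(txt)
```

For a character vector of such lines, lapply(lines, extract_chg_values) would return one integer vector per line.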
