r - Counting lines of text in R

Question

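I am doing some basic text analysis in R and want to count the number of lines in a transcript that I load into R from a .txt file. For the example below, I would like to generate a count of the lines attributed to each speaker, e.g. Mr Smith = 4, Mr Gordon = 6, Mr Catalano = 3.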

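[71] "\"511\"\t\"
MR Smith: Mr Speaker, I like the spirit in which we are agreeing on this. The administration of FUFA is present here. FUFA could be used as a conduit, but the intention of what hon. Beti Kamya brought up and what hon. Rose Namayanja has said was okufuwa - just giving a token of appreciation to the players who achieved this.\""
[72] "\"513\"\t\"
MR Gordon: Thank you very much, Mr Speaker. FUFA is an organisation and the players are the ones who got the cup for us. To promote motivation in all activities, not only football, you should remunerate people who have done well. In this case, we have heard about FUFA with their problems. They have not paid water bills and they can take this money to pay the water bills. If we agree that this money is supposed to go to the players and the coaches, then when it goes there they would know the amount and they will sit among themselves and distribute according to what we will have given. (Applause) I thank you.\""
[73] "\"515\"\t\"
MR Catalano: Mr Speaker, I want to give information to my dear colleagues. The spirit is very good but you must be mindful that the administration of FUFA is what has made this happen. The money to the players. That indicates to you that FUFA is very trustworthy. This is not the old FUFA we are talking about.\""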

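The function countLines() will not work because it requires a connection, and these are just .txt files imported into R. I realize that the line count depends on the formatting of whatever program the text is opened in, but I would like to know whether this is feasible. Thanks.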

score 3 · Accepted Answer

I didn't think your example was reproducible, so I edited it to contain what you posted, although I'm not sure the names will match:

txtvec <-   structure(list(`'511'   ` = "MR Smith: Mr Speaker, I like the spirit in which we are agreeing on this. The administration of FUFA is present here. FUFA could be used as a conduit, but the intention of what hon. Beti Kamya brought up and what hon. Rose Namayanja has said was okufuwa - just giving a token of appreciation to the players who achieved this.\"", 
    `'513'  ` = "MR Gordon: Thank you very much, Mr Speaker. FUFA is an organisation and the players are the ones who got the cup for us. To promote motivation in all activities, not only football, you should remunerate people who have done well. In this case, we have heard about FUFA with their problems. They have not paid water bills and they can take this money to pay the water bills. If we agree that this money is supposed to go to the players and the coaches, then when it goes there they would know the amount and they will sit among themselves and distribute according to what we will have given. (Applause) I thank you.\"", 
    `'515'  ` = "MR Catalano: Mr Speaker, I want to give information to my dear colleagues. The spirit is very good but you must be mindful that the administration of FUFA is what has made this happen. The money to the players. That indicates to you that FUFA is very trustworthy. This is not the old FUFA we are talking about.\""), .Names = c("'511'\t", 
"'513'\t", "'515'\t"))

So it's just a matter of running a regex and tabulating the results:

table( sapply(txtvec, function(x) sub("(^MR.+)\\:.+", "\\1", x) ) )
#MR Catalano   MR Gordon    MR Smith 
#          1           1           1

A concern was raised that the names were not part of the original structure. Here is another version, with an unnamed vector and a slightly modified regex:

txtvec <-  c("\"511\"\t\"\nMR Smith: Mr Speaker, I like the spirit in which we are agreeing on this. The administration of FUFA is present here. FUFA could be used as a conduit, but the intention of what hon. Beti Kamya brought up and what hon. Rose Namayanja has said was okufuwa - just giving a token of appreciation to the players who achieved this.\"", 
"\"513\"\t\"\nMR Gordon: Thank you very much, Mr Speaker. FUFA is an organisation and the players are the ones who got the cup for us. To promote motivation in all activities, not only football, you should remunerate people who have done well. In this case, we have heard about FUFA with their problems. They have not paid water bills and they can take this money to pay the water bills. If we agree that this money is supposed to go to the players and the coaches, then when it goes there they would know the amount and they will sit among themselves and distribute according to what we will have given. (Applause) I thank you.\"", 
"\"515\"\t\"\nMR Catalano: Mr Speaker, I want to give information to my dear colleagues. The spirit is very good but you must be mindful that the administration of FUFA is what has made this happen. The money to the players. That indicates to you that FUFA is very trustworthy. This is not the old FUFA we are talking about.\""
)

 table( sapply(txtvec, function(x) sub(".+\\n(MR.+)\\:.+", "\\1", x) ) )

#MR Catalano   MR Gordon    MR Smith 
#          1           1           1

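To count the number of "lines" these would occupy on a device that wraps at 80 characters per line, you could use this code (which could easily be turned into a function):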

 sapply(txtvec, function(tt) 1+nchar(tt) %/% 80)
#[1] 5 8 4
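As a minimal sketch of that conversion, the snippet below wraps the same arithmetic in a function and sums the result per speaker with `tapply()`. The names `count_wrapped_lines` and `get_speaker` are hypothetical, not from the original answer, and the toy strings stand in for the transcript. It assumes each element holds exactly one speaker turn in the `MR Name: text` form.

```r
# Hypothetical helper: number of display lines at a fixed wrap width.
# Same arithmetic as above: 1 + (number of characters %/% width).
count_wrapped_lines <- function(x, width = 80) {
  1 + nchar(x) %/% width
}

# Hypothetical helper: extract the "MR Name" label before the first colon.
# (?s) lets "." match newlines, so a prefix such as "\"511\"\t\"\n" is skipped.
get_speaker <- function(x) {
  sub("(?s)^.*?(MR [^:]+):.*$", "\\1", x, perl = TRUE)
}

# Toy data, invented for illustration.
toy <- c(
  paste0("MR Smith: ", strrep("a", 90)),   # 100 characters
  paste0("MR Gordon: ", strrep("b", 29)),  # 40 characters
  paste0("MR Smith: ", strrep("c", 10))    # 20 characters
)

# Sum the per-turn line counts within each speaker.
lines_by_speaker <- tapply(count_wrapped_lines(toy), get_speaker(toy), sum)
```

Note that this arithmetic reports 2 lines for a string of exactly 80 characters, whereas `ceiling(nchar(x) / width)` would report 1, so pick whichever convention matches your definition of a line.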

score 2 · Accepted Answer

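This has been brought up in the comments, but it really bears being its own answer:

You cannot "count lines" without defining what a "line" is. A line is a very vague concept, and it can vary depending on the program being used.

Unless, of course, the data contains some indicator of a line break, such as \n. But even then you would not be counting lines, you would be counting line breaks. You would then have to ask yourself whether the hard-coded line breaks match what you are hoping to analyze.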
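To make the distinction concrete, here is a small sketch that counts the embedded `\n` characters in each string with base R's `gregexpr()` and `regmatches()`. The vector `x` is toy data invented for illustration.

```r
# Toy data: the first element has two hard-coded line breaks, the second has none.
x <- c("first line\nsecond line\nthird line", "no breaks here")

# Count the "\n" characters in each element.
n_breaks <- lengths(regmatches(x, gregexpr("\n", x, fixed = TRUE)))

# A string with k line breaks displays as k + 1 lines,
# provided it does not end with a trailing "\n".
n_lines <- n_breaks + 1
```

Whether `n_breaks` or `n_lines` is the number you want depends on how the breaks got into the data in the first place.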

--

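If your data does not contain line breaks but you still want to count lines, then we are back to the question of "how do you define a line". The most basic method is, as @flodel suggests, to use character length. For example, you can define a line as 76 characters long and then take

ceiling(nchar(X) / 76)

This of course assumes that words can be cut. (If you need words to remain whole, then you have to get craftier.)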
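One possible way to keep words whole, offered as a sketch and not as a definitive method, is base R's `strwrap()`, which breaks text at whitespace. The toy strings and the width of 20 are made up for illustration. Be aware that `strwrap()` collapses existing whitespace (including any `\n`) and treats `width` as a target column, so its counts can differ from the character-division estimate.

```r
# Toy data, invented for illustration.
x <- c("the quick brown fox jumps over the lazy dog", "short")

# strwrap() returns one element per wrapped line; wrap each string separately.
wrapped <- lapply(x, strwrap, width = 20)

# Number of wrapped lines per input string.
n_lines <- lengths(wrapped)

# Length of the longest wrapped line, to check nothing overflows the width.
max_len <- max(nchar(unlist(wrapped)))
```

Here the first string wraps into "the quick brown fox", "jumps over the lazy" and "dog", so no word is split across lines.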
