string - Parsing a Perl string into a hash

Question

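So, let's say I have a string.

 my $str = "Hello how are you today. Oh thats good I'm glad you are happy. Thats wonderful; thats fantastic.";

I want to build a hash table where each key is a unique word and the value is the number of times that word occurs in the string. In other words, I want this to be an automated process.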

my %words = (
  "Hello" => 1,
  "are" => 2,
  "thats" => 2,
  "Thats" => 1
);

Honestly, I'm new to Perl and have no clue how to do this, how to handle the punctuation, and so on.

Update:

Also, is it possible to use

   split('.!?;',$mystring)

Not with this exact syntax, but basically splitting on . or ! or ? and so on. Oh, and on ' ' (whitespace) too.

score 4 · Accepted Answer

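One simple way to do this is to split the string on any character that is not, in your view, a valid word character. Note that this is by no means a complete solution. I have simply picked a limited set of characters.

As you find edge cases, you can add more valid word characters inside the brackets [ ... ]. You might also search http://search.cpan.org for modules designed for this purpose.

The regex [^ ... ] matches any character that is not inside the brackets. \pL is a larger subset of letters, and the other characters are literal. The dash is a metacharacter inside a character class, so it must be escaped: \-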

use strict;
use warnings;
use Data::Dumper;

my $str = "Hello how are you today. Oh thats good I'm glad you are happy.
           Thats wonderful; thats fantastic.";
my %hash;
$hash{$_}++                      # increase count for each field
    for                          # in the loop
    split /[^\pL'\-!?]+/, $str;  # over the list from splitting the string 
print Dumper \%hash;

Output:

$VAR1 = {
          'wonderful' => 1,
          'glad' => 1,
          'I\'m' => 1,
          'you' => 2,
          'how' => 1,
          'are' => 2,
          'fantastic' => 1,
          'good' => 1,
          'today' => 1,
          'Hello' => 1,
          'happy' => 1,
          'Oh' => 1,
          'Thats' => 1,
          'thats' => 2
        };
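
As a minimal sketch of extending the character class when an edge case turns up: assuming digits should also count as word characters (an assumption, not something the question states), \pN can be added inside the brackets. The sample string and the %count name are made up for illustration, and ! and ? are left out of the class here so that they act as separators.

```perl
use strict;
use warnings;

# Hypothetical sample text; the digits are the edge case being handled.
my $str = "Route 66 is long. Take route 66, not highway 9!";

my %count;
# \pN (Unicode numbers) joins \pL in the allowed set, so "66" survives as a word.
# grep { length } drops the empty leading field that split returns
# when the string starts with a separator.
$count{$_}++ for grep { length } split /[^\pL\pN'\-]+/, $str;

printf "%-8s => %d\n", $_, $count{$_} for sort keys %count;
```

The same pattern applies to any other character that turns out to belong inside words in your data: add it to the class and rerun.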

score 1 · Accepted Answer

This uses whitespace to separate the words.

#!/usr/bin/env perl
use strict;
use warnings;

my $str = "Hello how are you today."
        . " Oh thats good I'm glad you are happy."
        . " Thats wonderful. thats fantastic.";

# Use whitespace to split the string into single "words".
my @words = split /\s+/, $str;

# Store each word in the hash and count its occurrence.
my %hash;
for my $word ( @words ) {
    $hash{ $word }++;
}

# Show each word and its count. Using printf to align output.
for my $key ( sort keys %hash ) {
    printf "%-10s => %d\n", $key, $hash{ $key };
}

Some fine-tuning is needed to get "real" words; note the trailing periods in the output:

Hello      => 1
I'm        => 1
Oh         => 1
Thats      => 1
are        => 2
fantastic. => 1
glad       => 1
good       => 1
happy.     => 1
how        => 1
thats      => 2
today.     => 1
wonderful. => 1
you        => 2
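
One possible form of that fine-tuning, sketched under the assumption that only punctuation at the start or end of a token should be discarded: trim non-word characters from both ends of each whitespace-separated token before counting. The $token variable is introduced here for illustration.

```perl
use strict;
use warnings;

my $str = "Hello how are you today."
        . " Oh thats good I'm glad you are happy."
        . " Thats wonderful. thats fantastic.";

my %hash;
for my $token ( split ' ', $str ) {
    # Work on a copy and strip non-word characters from both ends.
    # Inner apostrophes, as in "I'm", are not at an end and so are kept.
    ( my $word = $token ) =~ s/^\W+|\W+$//g;
    next unless length $word;    # skip tokens that were pure punctuation
    $hash{$word}++;
}

printf "%-10s => %d\n", $_, $hash{$_} for sort keys %hash;
```

With this change "today." and "happy." are counted as "today" and "happy", while "Thats" and "thats" remain separate keys because the comparison is still case-sensitive.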

score 1 · Accepted Answer

Try this:

use strict;
use warnings;

my $str = "Hello, how are you today. Oh thats good I'm glad you are happy. 
           Thats wonderful.";
my @strAry = split /[:,\.\s\/]+/, $str;
my %strHash;

foreach my $word(@strAry) 
{
    print "\nFOUND WORD: ".$word;
    my $exstCnt = $strHash{$word};

    if(defined($exstCnt)) 
    {
        $exstCnt++;
    } 
    else 
    {
        $exstCnt = 1;
    }

    $strHash{$word} = $exstCnt;
}

print "\n\nNOW REPORTING UNIQUE WORDS:\n";

foreach my $unqWord(sort(keys(%strHash))) 
{
    my $cnt = $strHash{$unqWord};
    print "\n".$unqWord." - ".$cnt." instances";
}

score 0 · Accepted Answer

 use YAML qw(Dump);
 use 5.010;

 my $str = "Hello how are you today. Oh thats good I'm glad you are happy. Thats wonderful; thats fantastic.";
 my @match_words = $str =~ /(\w+)/g;
 my $word_hash = {};
 foreach my $word (sort @match_words) {
     $word_hash->{$word}++;
 }
 say Dump($word_hash);
 # -------output----------
 Hello: 1
 I: 1
 Oh: 1
 Thats: 1
 are: 2
 fantastic: 1
 glad: 1
 good: 1
 happy: 1
 how: 1
 m: 1
 thats: 2
 today: 1
 wonderful: 1
 you: 2
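
In the output above, \w+ splits "I'm" into "I" and "m" because the apostrophe is not a word character. A minimal sketch of one way around that, assuming an apostrophe between word characters should be treated as part of the word (the %word_count name is chosen here for illustration):

```perl
use strict;
use warnings;

my $str = "Hello how are you today. Oh thats good I'm glad you are happy. Thats wonderful; thats fantastic.";

my %word_count;
# \w+ followed by optional groups of apostrophe plus \w+ keeps "I'm" whole
# without picking up a stray leading or trailing apostrophe.
$word_count{$_}++ for $str =~ /(\w+(?:'\w+)*)/g;

printf "%s: %d\n", $_, $word_count{$_} for sort keys %word_count;
```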
