linux - Shell program: determine the average word length in a file

Question

I'm trying to write a shell program that determines the average word length in a file. I assume I need to use wc and expr somehow. Guidance in the right direction would be great!

score 4 · Accepted Answer

Assuming your file is ASCII and wc can actually read it...

chars=$(wc -c < inputfile)
words=$(wc -w < inputfile)

Then a simple

avg_word_size=$(( ${chars} / ${words} ))

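computes a (truncated) integer. However, it will be "wrong" by more than just rounding error: the average word size will also include all the whitespace characters. And I assume you want to be more precise...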
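
To see how much the whitespace skews the result, here is a small check on a known input. The sample text and the function name `naive_avg` are made up for illustration; this is only a sketch of the calculation above, not a replacement for it.

```shell
# naive_avg TEXT: integer "all bytes / words" average for one line of text.
naive_avg() {
    local chars words
    chars=$(printf '%s\n' "$1" | wc -c)   # bytes, including blanks and the newline
    words=$(printf '%s\n' "$1" | wc -w)   # whitespace-delimited words
    echo $(( chars / words ))             # shell integer division truncates
}

# "one two three" has 11 letters in 3 words (true mean about 3.67),
# but wc -c reports 14 bytes: 11 letters + 2 spaces + 1 newline.
naive_avg "one two three"    # prints 4
```

The separators push the result from 3 (the truncated true mean) up to 4.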

The following improves precision somewhat by computing the integer from a value multiplied by 100:

_100x_avg_word_size=$(( $((${chars} * 100)) / ${words} ))

Now we can use it to tell the world:

 echo "Average word size is: ${avg_word_size}.${_100x_avg_word_size: -2:2}"
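
The `${var: -2:2}` slice is a bash extension. As a possible alternative, the same two-decimal output can be produced with `/`, `%` and `printf`, which also zero-pads the fraction. The helper name `fixed2` is hypothetical.

```shell
# fixed2 NUM DEN: print NUM/DEN truncated to two decimals, integers only.
fixed2() {
    scaled=$(( $1 * 100 / $2 ))    # e.g. 14 * 100 / 3 = 466
    printf '%d.%02d\n' $(( scaled / 100 )) $(( scaled % 100 ))
}

fixed2 14 3     # prints 4.66
fixed2 81 20    # prints 4.05
```

Note that the value is truncated, not rounded: 14/3 is about 4.667 but prints as 4.66.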

To refine this further, we can assume that exactly one whitespace character separates words:

 chars=$(wc -c < inputfile)
 words=$(wc -w < inputfile)

 # Subtract the (words - 1) separators so only word characters are counted.
 letters=$(( chars - (words - 1) ))
 avg_word_size=$(( letters / words ))
 _100x_avg_word_size=$(( letters * 100 / words ))

 echo "Average word size is: ${avg_word_size}.${_100x_avg_word_size: -2:2}"

Now it's your job to include the concept of "lines" in the calculation... :-)
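
One possible way to take up that exercise, assuming "letters" means every non-whitespace byte: delete all whitespace (spaces, tabs, newlines) with `tr` before counting, so line breaks and runs of blanks no longer matter. The function name `avg_word_len` is hypothetical, and this is a sketch rather than the answer's own solution.

```shell
# avg_word_len FILE: mean word length to two decimals, ignoring all whitespace.
avg_word_len() {
    local letters words scaled
    letters=$(( $(tr -d '[:space:]' < "$1" | wc -c) ))   # bytes left once whitespace is gone
    words=$(( $(wc -w < "$1") ))                         # $(( )) strips any padding wc adds
    if [ "$words" -eq 0 ]; then
        echo "no words in $1" >&2
        return 1
    fi
    scaled=$(( letters * 100 / words ))                  # fixed point, two decimals
    printf '%d.%02d\n' $(( scaled / 100 )) $(( scaled % 100 ))
}
```

Called as `avg_word_len inputfile`. Because `wc -c` counts bytes, this is only accurate for single-byte text such as ASCII, which matches the assumption made at the top of this answer.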

score 1 · Accepted Answer

Update: to show clearly (hopefully) the difference between wc and this method. Fixed the "too many newlines" bug. Also added finer control over word-ending apostrophes.

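If you want to consider a word to be a bash word, then using wc alone is fine. However, if you want to consider a word to be a word in a spoken/written language, you cannot use wc for the word parsing.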

For example, wc considers the following to contain 1 word (average size = 112.00), whereas the script below shows it to contain 19 words (average size = 4.58):

"/home/axiom/zap_notes/apps/eng-hin-devnag-itrans/Platt's_Urdu_and_classical_Hindi_to_English_-_preface5.doc't"

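The effect is easy to reproduce on a shorter string (a made-up path, not the one above): `wc -w` splits only on whitespace, so punctuation never ends a word. The helper names `ws_words` and `alpha_words` are hypothetical, and `alpha_words` is a simpler rule than the sed script in this answer (it has no special handling of 's and 't).

```shell
# ws_words: count whitespace-delimited tokens on stdin (what wc -w reports).
ws_words() { echo $(( $(wc -w) )); }

# alpha_words: count maximal runs of letters on stdin instead.
# tr -cs turns every run of non-letters into a single newline.
alpha_words() { tr -cs '[:alpha:]' '\n' | grep -c .; }

sample="/tmp/my_notes/draft-v2.txt"      # hypothetical path
printf '%s\n' "$sample" | ws_words       # prints 1
printf '%s\n' "$sample" | alpha_words    # prints 6: tmp my notes draft v txt
```
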
Using Kurt's script, the following line is shown to contain 7 words (average size = 8.14), whereas the script below shows it to contain 7 words (average size = 4.43)... बे = 2 characters

"बे  = {Platts} ... —be-ḵẖẉabī, s.f. Sleeplessness:"

So, if wc is to your taste, well and good; if not, something like this may suit:

# Cater for special situation words: eg 's and 't   
# Convert each group of anything which isn't a "character" (including '_') into a newline.  
# Then, convert each CHARACTER which isn't a newline into a BYTE (not character!).  
# This leaves one 'word' per line, each 'word' being made up of the same BYTE ('x').  
# 
# Without any options, wc prints  newline, word, and byte counts (in that order),
#  so we can capture all 3 values in a bash array
#  
# Use `awk` as a floating point calculator (bash can only do integer arithmetic)

count=($(sed "s/\>'s\([[:punct:]]\|$\)/\1/g      # ignore apostrophe-s ('s) word endings 
              s/'t\>/xt/g      # consider words ending in apostrophe-t ('t) as base word + 2 characters   
              s/[_[:digit:][:blank:][:punct:][:cntrl:]]\+/\n/g 
              s/^\n*//; s/\n*$//; s/[^\n]/x/g" "$file" | wc))
echo "chars / word average:" \
      $(awk -vnl=${count[0]} -vch=${count[2]} 'BEGIN{ printf( "%.2f\n", (ch-nl)/nl ) }')
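
The last two steps (capturing the three wc counts and dividing in awk) can be tried in isolation. A minimal sketch, assuming the input has already been reduced to one run of x's per line as the sed step leaves it; it uses `read` instead of a bash array so it also runs in plain sh, and the function name `avg_from_x_lines` is hypothetical.

```shell
# avg_from_x_lines: stdin holds one "word" per line; print mean bytes per word.
avg_from_x_lines() {
    wc | {
        read -r nl nw nb      # wc prints: newlines, words, bytes
        # bytes minus newlines = letters; number of lines = number of words
        awk -v nl="$nl" -v ch="$nb" 'BEGIN { printf("%.2f\n", (ch - nl) / nl) }'
    }
}

printf 'xxx\nxx\nxxxx\n' | avg_from_x_lines    # (12 - 3) / 3, prints 3.00
```

Empty input gives nl = 0 and awk aborts with a division-by-zero error, so a real script would want to check for that first.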
