language-agnostic - Build an ASCII chart of the most commonly used words in a given text

Question

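The challenge:

Build an ASCII chart of the most commonly used words in a given text.

The rules:

Only accept a-z and A-Z (alphabetic characters) as part of a word.
Ignore casing (She == she for our purposes).
Ignore the following words (quite arbitrary, I know): the, and, of, to, a, i, it, in, or, is
Clarification: considering don't: this would be taken as 2 different "words" in the ranges a-z and A-Z: (don and t).
Optionally (it is too late to formally change the specification now), you may choose to drop all single-letter "words" (this could potentially shorten the ignore list too).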
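To make the word rules concrete, here is a minimal Python 3 sketch of the tokenizing and counting step. The names IGNORED and count_words are made up for this illustration, and the input is assumed to be us-ascii as the challenge presumes; it is one possible reading of the rules, not a reference implementation.

```python
import re
from collections import Counter

# The ignore list from the rules above.
IGNORED = {"the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}

def count_words(text):
    """Count runs of a-z (after lowercasing), skipping the ignored words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in IGNORED)

# "don't" splits into the two "words" don and t, as the clarification says.
counts = count_words("She said: don't! It is the END, she said.")
print(counts.most_common(3))
```

Dropping single-letter words (the optional rule) would only need an extra `len(w) > 1` condition in the generator.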

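Parse a given text (read a file specified via command-line arguments or piped in; presume us-ascii) and build a word frequency chart with the following characteristics:

Display the chart (also see the example below) for the 22 most common words (ordered by descending frequency).
The bar width represents the number of occurrences (frequency) of the word (proportionally). Append one space and print the word.
Make sure these bars (plus space-word-space) always fit: bar + [space] + word + [space] should always be <= 80 characters (make sure you account for possibly differing bar and word lengths: e.g. the second most common word could be a lot longer than the first while not differing much in frequency). Maximize bar width within these constraints and scale the bars appropriately (according to the frequencies they represent).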
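The fitting rule can be sketched as follows, assuming each printed line is "|" + bar + "| " + word + " " (so the underscores may take at most 76 - len(word) characters). The helper names bar_widths and render are invented for this sketch, and truncation is just one of several rounding choices the answers below make.

```python
from fractions import Fraction

def bar_widths(top, line_limit=80):
    """Return (word, width) pairs for a list of (word, count) pairs.

    A line is "|" + bar + "| " + word + " ": the underscores plus
    4 fixed characters plus the word must fit into line_limit.
    """
    # Largest number of underscores per occurrence that every line can afford.
    scale = min(Fraction(line_limit - 4 - len(word), count)
                for word, count in top)
    # Exact rational arithmetic, truncated to whole underscores.
    return [(word, int(count * scale)) for word, count in top]

def render(top):
    widths = bar_widths(top)
    lines = [" " + "_" * widths[0][1]]   # closes the first bar on top
    lines += ["|" + "_" * n + "| " + word + " " for word, n in widths]
    return "\n".join(lines)

print(render([("she", 553), ("you", 481), ("said", 462)]))
```

Taking the minimum over all words, rather than looking only at the most frequent one, is what handles a long word that is nearly as frequent as the first.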

An example:

The text for the example can be found here (Alice's Adventures in Wonderland, by Lewis Carroll).

This specific text would yield the following chart:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|__________________________________________| that
|___________________________________| as
|_______________________________| her
|____________________________| with
|____________________________| at
|___________________________| s
|___________________________| t
|__________________________| on
|__________________________| all
|______________________| this
|______________________| for
|______________________| had
|_____________________| but
|____________________| be
|____________________| not
|___________________| they
|__________________| so

For your reference, these are the frequencies the above chart is based upon:

[('she', 553), ('you', 481), ('said', 462), ('alice', 403), ('was', 358), ('that
', 330), ('as', 274), ('her', 248), ('with', 227), ('at', 227), ('s', 219), ('t'
, 218), ('on', 204), ('all', 200), ('this', 181), ('for', 179), ('had', 178), ('
but', 175), ('be', 167), ('not', 166), ('they', 155), ('so', 152)]

A second example (to check whether you implemented the complete spec): replace every occurrence of you in the linked Alice in Wonderland file with superlongstringstring:

 ________________________________________________________________
|________________________________________________________________| she
|_______________________________________________________| superlongstringstring
|_____________________________________________________| said
|______________________________________________| alice
|________________________________________| was
|_____________________________________| that
|______________________________| as
|___________________________| her
|_________________________| with
|_________________________| at
|________________________| s
|________________________| t
|______________________| on
|_____________________| all
|___________________| this
|___________________| for
|___________________| had
|__________________| but
|_________________| be
|_________________| not
|________________| they
|________________| so

The winner:

Shortest solution (by character count, per language). Have fun!

Edit: Table summarizing the results so far (2012-02-15) (originally added by user Nas Banov):

Language            Relaxed  Strict
=========           =======  ======
GolfScript          130      143
Perl                         185
Windows PowerShell  148      199
Mathematica                  199
Ruby                185      205
Unix Toolchain      194      228
Python              183      243
Clojure                      282
Scala                        311
Haskell                      333
Awk                          336
R                   298
JavaScript          304      354
Groovy              321
Matlab                       404
C#                           422
Smalltalk           386
PHP                 450
F#                           452
TSQL                483      507

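The numbers represent the length of the shortest solution in a specific language. "Strict" refers to a solution that implements the spec completely (draws |____| bars, closes the first bar on top with a ____ line, accounts for the possibility of long words with high frequency, etc.). "Relaxed" means some liberties were taken to shorten the solution.

Only solutions shorter than 500 characters are included. The list of languages is sorted by the length of the "strict" solution. "Unix Toolchain" is used to signify various solutions that use a traditional *nix shell plus a mix of tools (such as grep, tr, sort, uniq, head, perl, awk).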

score 122 · Accepted Answer

LabVIEW 51 nodes, 5 structures, 10 diagrams

Teaching the elephant to tap-dance is never pretty. I'll, ah, skip the character count.

The program flows from left to right:

score 42 · Accepted Answer

Ruby 1.9, 185 chars

(heavily based on the other Ruby solutions)

w=($<.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort[0,22]
k,l=w[0]
puts [?\s+?_*m=76-l.size,w.map{|f,x|?|+?_*(f*m/k)+"| "+x}]

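Instead of using any command-line switches like the other solutions, you can simply pass the filename as an argument (i.e. ruby1.9 wordfrequency.rb Alice.txt).

Since I'm using character literals here, this solution only works in Ruby 1.9.

Edit: Replaced semicolons by line breaks for "readability". :P

Edit 2: Shtééf pointed out I forgot the trailing space - fixed that.

Edit 3: Removed the trailing space again ;)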

score 39 · Accepted Answer

GolfScript, 177 175 173 167 164 163 144 131 130 chars

Slow - 3 minutes for the sample text (130)

{32|.123%97<n@if}%]''*n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<.0=~:2;,76\-:1'_':0*' '\@{"
|"\~1*2/0*'| '@}/

Explanation:

{           #loop through all characters
 32|.       #convert to lowercase (set bit 32) and duplicate
 123%97<    #determine if is a letter
 n@if       #return either the letter or a newline
}%          #return an array (of ints)
]''*        #convert array to a string with magic
n%          #split on newline, removing blanks (stack is an array of words now)
"oftoitinorisa"   #push this string
2/          #split into groups of two, i.e. ["of" "to" "it" "in" "or" "is" "a"]
-           #remove any occurrences from the text
"theandi"3/-#remove "the", "and", and "i"
$           #sort the array of words
(1@         #takes the first word in the array, pushes a 1, reorders stack
            #the 1 is the current number of occurrences of the first word
{           #loop through the array
 .3$>1{;)}if#increment the count or push the next word and a 1
}/
]2/         #gather stack into an array and split into groups of 2
{~~\;}$     #sort by the latter element - the count of occurrences of each word
22<         #take the first 22 elements
.0=~:2;     #store the highest count
,76\-:1     #store the length of the first line
'_':0*' '\@ #make the first line
{           #loop through each word
"
|"\~        #start drawing the bar
1*2/0       #divide by zero
*'| '@      #finish drawing the bar
}/
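The counting step in the explanation above (sort the words, then walk the sorted list while either bumping the current count or starting a new one) can be sketched in Python as follows. The function name count_sorted_runs is made up here, and this is only an illustration of the idea, not a translation of the GolfScript.

```python
def count_sorted_runs(words):
    """Count occurrences by sorting and measuring runs of equal words."""
    pairs = []                        # [count, word] pairs, like the stack above
    for word in sorted(words):
        if pairs and pairs[-1][1] == word:
            pairs[-1][0] += 1         # same word as before: bump its count
        else:
            pairs.append([1, word])   # new word: start a fresh count of 1
    # Most frequent first; the sort is stable, so ties stay alphabetical.
    pairs.sort(key=lambda p: -p[0])
    return [(word, count) for count, word in pairs[:22]]

print(count_sorted_runs("b a b c b a".split()))
```

No hash table is needed: after sorting, equal words are adjacent, so a single pass is enough.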

"Correct" (hopefully). (143)

{32|.123%97<n@if}%]''*n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<..0=1=:^;{~76@,-^*\/}%$0=:1'_':0*' '\@{"
|"\~1*^/0*'| '@}/

Less slow - half a minute. (162)

'"'/' ':S*n/S*'"#{%q
'\+"
.downcase.tr('^a-z','
')}\""+~n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<.0=~:2;,76\-:1'_':0*S\@{"
|"\~1*2/0*'| '@}/

Output visible in revision logs.

score 35 · Accepted Answer

206

shell, grep, tr, grep, sort, uniq, sort, head, perl

~ % wc -c wfg
209 wfg
~ % cat wfg
egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|of|to|a|i|it|in|or|is'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)=@F;$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'
~ % # usage:
~ % sh wfg < 11.txt

~~hm, just seen above: sort -nr -> sort -n and then head -> tail => 208 :)~~
update2: erm, of course the above is silly, as the order would then be reversed. So, 209.
update3: optimized the exclusion regexp -> 206

egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|o[fr]|to|a|i[tns]?'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)=@F;$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'
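A quick way to convince yourself that the shortened exclusion pattern still covers the whole ignore list is to test it word by word. The sketch below uses Python's re.fullmatch as a stand-in for egrep -w on one-word-per-line input; the names STOP_WORDS and OPTIMIZED are invented for the check.

```python
import re

STOP_WORDS = "the and of to a i it in or is".split()
OPTIMIZED = re.compile(r"the|and|o[fr]|to|a|i[tns]?")

# Every ignored word must match in full ...
assert all(OPTIMIZED.fullmatch(w) for w in STOP_WORDS)
# ... and a few ordinary words from the chart must not.
assert not any(OPTIMIZED.fullmatch(w) for w in ["she", "on", "at", "as", "that"])
print("pattern covers exactly the tested ignore list")
```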

For fun, here's a perl-only version (much faster):

~ % wc -c pgolf
204 pgolf
~ % cat pgolf
perl -lne'$1=~/^(the|and|o[fr]|to|.|i[tns])$/i||$f{lc$1}++while/\b([a-z]+)/gi}{@w=(sort{$f{$b}<=>$f{$a}}keys%f)[0..21];$Q=$f{$_=$w[0]};$B=76-y///c;print" "."_"x$B;print"|"."_"x($B*$f{$_}/$Q)."| $_"for@w'
~ % # usage:
~ % sh pgolf < 11.txt

score 35 · Accepted Answer

Transact SQL set-based solution (SQL Server 2005) 1063 892 873 853 827 820 783 683 647 644 630 chars

Thanks to Gabe for some useful suggestions to reduce the character count.

NB: Line breaks added to avoid scrollbars; only the last line break is required.

DECLARE @ VARCHAR(MAX),@F REAL SELECT @=BulkColumn FROM OPENROWSET(BULK'A',
SINGLE_BLOB)x;WITH N AS(SELECT 1 i,LEFT(@,1)L UNION ALL SELECT i+1,SUBSTRING
(@,i+1,1)FROM N WHERE i<LEN(@))SELECT i,L,i-RANK()OVER(ORDER BY i)R INTO #D
FROM N WHERE L LIKE'[A-Z]'OPTION(MAXRECURSION 0)SELECT TOP 22 W,-COUNT(*)C
INTO # FROM(SELECT DISTINCT R,(SELECT''+L FROM #D WHERE R=b.R FOR XML PATH
(''))W FROM #D b)t WHERE LEN(W)>1 AND W NOT IN('the','and','of','to','it',
'in','or','is')GROUP BY W ORDER BY C SELECT @F=MIN(($76-LEN(W))/-C),@=' '+
REPLICATE('_',-MIN(C)*@F)+' 'FROM # SELECT @=@+' 
|'+REPLICATE('_',-C*@F)+'| '+W FROM # ORDER BY C PRINT @

Readable version

DECLARE @  VARCHAR(MAX),
        @F REAL
SELECT @=BulkColumn
FROM   OPENROWSET(BULK'A',SINGLE_BLOB)x; /*  Loads text file from path
                                             C:\WINDOWS\system32\A  */

/*Recursive common table expression to
generate a table of numbers from 1 to string length
(and associated characters)*/
WITH N AS
     (SELECT 1 i,
             LEFT(@,1)L

     UNION ALL

     SELECT i+1,
            SUBSTRING(@,i+1,1)
     FROM   N
     WHERE  i<LEN(@)
     )
  SELECT   i,
           L,
           i-RANK()OVER(ORDER BY i)R
           /*Will group characters
           from the same word together*/
  INTO     #D
  FROM     N
  WHERE    L LIKE'[A-Z]'OPTION(MAXRECURSION 0)
             /*Assuming case insensitive accent sensitive collation*/

SELECT   TOP 22 W,
         -COUNT(*)C
INTO     #
FROM     (SELECT DISTINCT R,
                          (SELECT ''+L
                          FROM    #D
                          WHERE   R=b.R FOR XML PATH('')
                          )W
                          /*Reconstitute the word from the characters*/
         FROM             #D b
         )
         T
WHERE    LEN(W)>1
AND      W NOT IN('the',
                  'and',
                  'of' ,
                  'to' ,
                  'it' ,
                  'in' ,
                  'or' ,
                  'is')
GROUP BY W
ORDER BY C

/*Just noticed this looks risky as it relies on the order of evaluation of the 
 variables. I'm not sure that's guaranteed but it works on my machine :-) */
SELECT @F=MIN(($76-LEN(W))/-C),
       @ =' '      +REPLICATE('_',-MIN(C)*@F)+' '
FROM   #

SELECT @=@+' 
|'+REPLICATE('_',-C*@F)+'| '+W
             FROM     #
             ORDER BY C

PRINT @
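The i-RANK()OVER(ORDER BY i) expression is what glues letters back into words: among the rows that survive the letter filter, position minus rank stays constant inside a run of consecutive positions and changes after every gap. A small Python sketch of the same idea, with a made-up function name and assuming plain ASCII input:

```python
import string
from itertools import groupby

def words_by_position_minus_rank(text):
    """Rebuild words from letters using the position-minus-rank key."""
    # Keep (position, letter) for letters only, like the filtered rows.
    letters = [(i, ch) for i, ch in enumerate(text, start=1)
               if ch in string.ascii_letters]
    # rank is the 1-based index among the kept rows; i - rank is constant
    # within a run of consecutive positions and grows after each gap.
    keyed = [(i - rank, ch) for rank, (i, ch) in enumerate(letters, start=1)]
    return ["".join(ch for _, ch in group)
            for _, group in groupby(keyed, key=lambda pair: pair[0])]

print(words_by_position_minus_rank("Don't stop"))
```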

Output

 _________________________________________________________________________ 
|_________________________________________________________________________| she
|_______________________________________________________________| You
|____________________________________________________________| said
|_____________________________________________________| Alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|__________________________| on
|__________________________| all
|_______________________| This
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| So
|___________________| very
|__________________| what

And with the long string

 _______________________________________________________________ 
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| Alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| at
|_________________________| with
|_______________________| on
|______________________| all
|____________________| This
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| So
|________________| very
|________________| what

score 34 · Accepted Answer

Ruby 207 213 211 210 207 203 201 200 chars

An improvement on Anurag's solution, incorporating a suggestion from rfusca. Also removes the argument to sort and applies a few other minor golfings.

w=(STDIN.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort.take 22;k,l=w[0];m=76.0-l.size;puts' '+'_'*m;w.map{|f,x|puts"|#{'_'*(m*f/k)}| #{x} "}

Execute as:

ruby GolfedWordFrequencies.rb < Alice.txt

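Edit: Put "puts" back in; it needs to be there to avoid having quotes in the output.
Edit2: Changed File -> IO
Edit3: Removed /i
Edit4: Removed parentheses around (f*1.0), recounted
Edit5: Use string addition for the first line; expand s in place.
Edit6: Made m a float, removed 1.0. EDIT: Doesn't work, changes lengths. EDIT: No worse than before
Edit7: Use STDIN.read.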

score 28 · Accepted Answer

Mathematica (297 284 248 244 242 199 chars) Pure Functional

and Zipf's Law Testing

Look Mom ... no vars, no hands, .. no head

Edit 1> some shorthands defined (284 chars)

f[x_, y_] := Flatten[Take[x, All, y]]; 

BarChart[f[{##}, -1], 
         BarOrigin -> Left, 
         ChartLabels -> Placed[f[{##}, 1], After], 
         Axes -> None
] 
& @@
Take[
  SortBy[
     Tally[
       Select[
        StringSplit[ToLowerCase[Import[i]], RegularExpression["\\W+"]], 
       !MemberQ[{"the", "and", "of", "to", "a", "i", "it", "in", "or","is"}, #]&]
     ], 
  Last], 
-22]

Some explanations

Import[] 
   # Get The File

ToLowerCase []
   # To Lower Case :)

StringSplit[ STRING , RegularExpression["\\W+"]]
   # Split By Words, getting a LIST

Select[ LIST, !MemberQ[{LIST_TO_AVOID}, #]&]
   #  Select from LIST except those words in LIST_TO_AVOID
   #  Note that !MemberQ[{LIST_TO_AVOID}, #]& is a FUNCTION for the test

Tally[LIST]
   # Get the LIST {word,word,..} 
     and produce another  {{word,counter},{word,counter}...}

SortBy[ LIST ,Last]
   # Get the list produced by Tally and sort by counters
     Note that counters are the LAST element of {word,counter}

Take[ LIST ,-22]
   # Once sorted, get the biggest 22 counters

BarChart[f[{##}, -1], ChartLabels -> Placed[f[{##}, 1], After]] &@@ LIST
   # Get the list produced by Take as input and produce a bar chart

f[x_, y_] := Flatten[Take[x, All, y]]
   # Auxiliary to get the list of the first or second element of lists of lists x_
     depending upon y
   # So f[{##}, -1] is the list of counters
   # and f[{##}, 1] is the list of words (labels for the chart)

Output

alt text http://i49.tinypic.com/2n8mrer.jpg

Mathematica is not well suited for golfing, simply because of its long, descriptive function names. Functions like "RegularExpression[]" or "StringSplit[]" just make me sob :(.

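Zipf's Law Testing

Zipf's law predicts that, for a natural-language text, the Log (Rank) vs Log (occurrences) plot follows a linear relationship.

The law is used in developing algorithms for cryptography and data compression. (But it is NOT the "Z" in the LZW algorithm.)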
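For readers without Mathematica, the linearity claim can also be checked numerically by fitting a straight line to log(count) against log(rank). The sketch below is an ordinary least-squares fit written for this illustration (loglog_slope is a made-up name); the sample data is synthetic, built so that count is exactly proportional to 1/rank, and is not taken from the Alice text.

```python
import math

def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank)."""
    ordered = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(c) for c in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic counts that follow count = 1000 / rank exactly.
ideal = [1000 / rank for rank in range(1, 51)]
print(round(loglog_slope(ideal), 3))
```

On real word counts the points scatter around the line, so the fitted slope is only a rough summary of how Zipf-like the text is.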

In our text, we can test it with the following

 f[x_, y_] := Flatten[Take[x, All, y]]; 
 ListLogLogPlot[
     Reverse[f[{##}, -1]], 
     AxesLabel -> {"Log (Rank)", "Log Counter"}, 
     PlotLabel -> "Testing Zipf's Law"]
 & @@
 Take[
  SortBy[
    Tally[
       StringSplit[ToLowerCase[b], RegularExpression["\\W+"]]
    ], 
   Last],
 -1000]

The result is (pretty well linear)

alt text http://i46.tinypic.com/33fcmdk.jpg

Edit 6 > (242 chars)

Refactoring the regex (no Select function anymore)
Dropping 1-character words
More efficient definition for function "f"

f = Flatten[Take[#1, All, #2]]&; 
BarChart[
     f[{##}, -1], 
     BarOrigin -> Left, 
     ChartLabels -> Placed[f[{##}, 1], After], 
     Axes -> None] 
& @@
  Take[
    SortBy[
       Tally[
         StringSplit[ToLowerCase[Import[i]], 
          RegularExpression["(\\W|\\b(.|the|and|of|to|i[tns]|or)\\b)+"]]
       ],
    Last],
  -22]

Edit 7 → 199 chars

BarChart[#2, BarOrigin->Left, ChartLabels->Placed[#1, After], Axes->None]&@@ 
  Transpose@Take[SortBy[Tally@StringSplit[ToLowerCase@Import@i, 
    RegularExpression@"(\\W|\\b(.|the|and|of|to|i[tns]|or)\\b)+"],Last], -22]

Replaced f with Transpose and Slot (#1/#2) arguments.
We don't need no stinking brackets (use f@x instead of f[x] where possible)

score 26 · Accepted Answer

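C# - 510 451 436 446 434 426 422 chars (minified)

Not that short, but now probably correct! Note that the previous version did not show the first line of the bars, did not scale the bars correctly, downloaded the file instead of getting it from stdin, and did not include all the required C# verbosity. You could easily shave many strokes if C# didn't need so much extra crap. Maybe PowerShell could do better.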

using C=System.Console;   // alias for Console
using System.Linq;  // for Split, GroupBy, Select, OrderBy, etc.

class Class // must define a class
{
    static void Main()  // must define a Main
    {
        // split into words
        var allwords = System.Text.RegularExpressions.Regex.Split(
                // convert stdin to lowercase
                C.In.ReadToEnd().ToLower(),
                // eliminate stopwords and non-letters
                @"(?:\b(?:the|and|of|to|a|i[tns]?|or)\b|\W)+")
            .GroupBy(x => x)    // group by words
            .OrderBy(x => -x.Count()) // sort descending by count
            .Take(22);   // take first 22 words

        // compute length of longest bar + word
        var lendivisor = allwords.Max(y => y.Count() / (76.0 - y.Key.Length));

        // prepare text to print
        var toPrint = allwords.Select(x=> 
            new { 
                // remember bar pseudographics (will be used in two places)
                Bar = new string('_',(int)(x.Count()/lendivisor)), 
                Word=x.Key 
            })
            .ToList();  // convert to list so we can index into it

        // print top of first bar
        C.WriteLine(" " + toPrint[0].Bar);
        toPrint.ForEach(x =>  // for each word, print its bar and the word
            C.WriteLine("|" + x.Bar + "| " + x.Word));
    }
}

422 chars with lendivisor inlined (which makes it 22 times slower) in the form below (newlines used for select spaces):

using System.Linq;using C=System.Console;class M{static void Main(){var
a=System.Text.RegularExpressions.Regex.Split(C.In.ReadToEnd().ToLower(),@"(?:\b(?:the|and|of|to|a|i[tns]?|or)\b|\W)+").GroupBy(x=>x).OrderBy(x=>-x.Count()).Take(22);var
b=a.Select(x=>new{p=new string('_',(int)(x.Count()/a.Max(y=>y.Count()/(76d-y.Key.Length)))),t=x.Key}).ToList();C.WriteLine(" "+b[0].p);b.ForEach(x=>C.WriteLine("|"+x.p+"| "+x.t));}}

score 25 · Accepted Answer

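Perl, 237 229 209 chars

(Updated again to beat the Ruby version with more dirty golf tricks, replacing split/[^a-z/,lc with lc=~/[a-z]+/g, and eliminating a check for the empty string in another place. These were inspired by the Ruby version, so credit where credit is due.)

Update: now with Perl 5.10! Replace print with say, and use ~~ to avoid a map. This has to be invoked on the command line as perl -E '<one-liner>' alice.txt. Since the entire script is on one line, writing it as a one-liner shouldn't present any difficulty :).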

 @s=qw/the and of to a i it in or is/;$c{$_}++foreach grep{!($_~~@s)}map{lc=~/[a-z]+/g}<>;@s=sort{$c{$b}<=>$c{$a}}keys%c;$f=76-length$s[0];say" "."_"x$f;say"|"."_"x($c{$_}/$c{$s[0]}*$f)."| $_ "foreach@s[0..21];

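Note that this version normalizes for case. This doesn't shorten the solution any, since removing ,lc (for lowercasing) requires you to add A-Z to the split regex, so it's a wash.

If you're on a system where a newline is one character and not two, you can shorten this by another two chars by using a literal newline in place of \n. However, I haven't written the above sample that way.

Here is a mostly correct, but not remotely short enough, perl solution: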

use strict;
use warnings;

my %short = map { $_ => 1 } qw/the and of to a i it in or is/;
my %count = ();

$count{$_}++ foreach grep { $_ && !$short{$_} } map { split /[^a-zA-Z]/ } (<>);
my @sorted = (sort { $count{$b} <=> $count{$a} } keys %count)[0..21];
my $widest = 76 - (length $sorted[0]);

print " " . ("_" x $widest) . "\n";
foreach (@sorted)
{
    my $width = int(($count{$_} / $count{$sorted[0]}) * $widest);
    print "|" . ("_" x $width) . "| $_ \n";
}

The following is about as short as it can get while remaining relatively readable. (392 chars).

%short = map { $_ => 1 } qw/the and of to a i it in or is/;
%count;

$count{$_}++ foreach grep { $_ && !$short{$_} } map { split /[^a-z]/, lc } (<>);
@sorted = (sort { $count{$b} <=> $count{$a} } keys %count)[0..21];
$widest = 76 - (length $sorted[0]);

print " " . "_" x $widest . "\n";
print"|" . "_" x int(($count{$_} / $count{$sorted[0]}) * $widest) . "| $_ \n" foreach @sorted;

score 20 · Accepted Answer

Windows PowerShell, 199 chars

$x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *
filter f($w){' '+'_'*$w
$x[-1..-22]|%{"|$('_'*($w*$_.Count/$x[-1].Count))| "+$_.Name}}
f(76..1|?{!((f $_)-match'.'*80)})[0]

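(The last line break isn't necessary, but is included here for readability.)

(Current code and my test files are available in my SVN repository. I hope my test cases catch the most common errors (bar length, problems with regex matching and a few others).)

Assumptions:

US ASCII as input. It probably gets weird with Unicode.
At least two non-stop words in the text

History

Relaxed version (137):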

($x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *)[-1..-22]|%{"|$('_'*(76*$_.Count/$x[-1].Count))| "+$_.Name}

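does not close the first bar
does not account for the word length of any word other than the first

Variations of one character in bar length compared with other solutions are due to PowerShell using rounding instead of truncation when converting floating-point numbers to integers. Since the task only required proportional bar lengths, this should be fine, though.

Compared with other solutions, I took a slightly different approach to determining the longest bar length: I simply try out lengths and take the highest one for which no line is longer than 80 characters.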
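The try-it-and-see search can be sketched in Python like this. The names draw and widest_fit are invented; round() stands in for PowerShell's rounding conversion, and a line counts as fitting when it has fewer than 80 characters, mirroring the '.'*80 match in the code above. Treat it as an illustration of the search, not a port.

```python
def draw(top, width):
    """Chart lines for (word, count) pairs when the top bar has `width` underscores."""
    peak = top[0][1]
    lines = [" " + "_" * width]
    for word, count in top:
        # round() stands in for the rounding float-to-int conversion.
        lines.append("|" + "_" * round(width * count / peak) + "| " + word)
    return lines

def widest_fit(top, limit=80):
    """Try widths from 76 down to 1; keep the first whose lines all stay under limit."""
    for width in range(76, 0, -1):
        lines = draw(top, width)
        if all(len(line) < limit for line in lines):
            return width, lines
    raise ValueError("no width fits")

width, lines = widest_fit([("she", 553), ("superlongstringstring", 481), ("said", 462)])
print(width)
print("\n".join(lines))
```

The search redraws the whole chart for every candidate width, which is wasteful but keeps the golfed code short because the drawing function is reused as the fitness test.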

An older version, with explanation, can be found here.

score 19 · Accepted Answer

Python 2.x, latitudinarian approach = 227 183 chars

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)if w not in'andithetoforinis')[:22]
for l,w in r:print(78-len(r[0][1]))*l/r[0][0]*'=',w

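Allowing for freedom in the implementation, I constructed a string concatenation that contains all the words requested for exclusion (the, and, of, to, a, i, it, in, or, is). It also excludes the two infamous "words" s and t from the example, and I threw in exclusion of an, for, he for free. I tried all concatenations of those words against a corpus of words from Alice, the King James Bible and the Jargon File to see whether any words would be mis-excluded by the string. And that is how I ended up with two exclusion strings: itheandtoforinis and andithetoforinis.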
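The trick relies on Python's substring test: a word is dropped when it occurs anywhere inside the exclusion string. A small check of which words that catches (the names EXCLUSION, STOP_WORDS and excluded are made up for this sketch):

```python
EXCLUSION = "andithetoforinis"
STOP_WORDS = "the and of to a i it in or is".split()

def excluded(word):
    # Substring test, as in `w not in'andithetoforinis'` above.
    return word in EXCLUSION

assert all(excluded(w) for w in STOP_WORDS)
# Side effects mentioned in the text: "s", "t", "an", "for", "he" go too,
# and so does the empty string that re.split can produce at the edges.
assert all(excluded(w) for w in ["s", "t", "an", "for", "he", ""])
assert not any(excluded(w) for w in ["she", "that", "on", "at", "her"])
print("exclusion string behaves as described")
```

Any other substring (for example "nd" or "tof") is excluded as well, which is why the author tested the string against several word corpora.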

PS. Borrowed from other solutions to shorten the code.

=========================================================================== she 
================================================================= you
============================================================== said
====================================================== alice
================================================ was
============================================ that
===================================== as
================================= her
============================== at
============================== with
=========================== on
=========================== all
======================== this
======================== had
======================= but
====================== be
====================== not
===================== they
==================== so
=================== very
=================== what
================= little

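Rant

Regarding the words to ignore, one would think they would be taken from a list of the most used words in English. That list depends on the text corpus used. Per some of the most popular lists (http://en.wikipedia.org/wiki/Most_common_words_in_English, http://www.english-for-students.com/Frequently-Used-Words.html, http://www.sporcle.com/games/common_english_words.php), the top 10 words are: the be(am/are/is/was/were) to of and a in that have I

The top 10 words from the Alice in Wonderland text are: the and to a of it she i you said
The top 10 words from the Jargon File (v4.4.7) are: the a of to and in is that or for

So the question is why or was included in the problem's ignore list, when it is only about 30th in popularity, while the word that (8th most used) is not, and so on. Hence I believe the ignore list should be provided dynamically (or could be omitted).

An alternative idea would be simply to skip the top 10 words from the result, which would actually shorten the solution (elementary: only the 11th to 32nd entries have to be shown).

Python 2.x, punctilious approach = 277 243 chars

The chart drawn by the above code is simplified (it uses only one character for the bars). If one wants to reproduce exactly the chart from the problem description (which was not required), this code will do it: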

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)-set(sys.argv))[:22]
h=min(9*l/(77-len(w))for l,w in r)
print'',9*r[0][0]/h*'_'
for l,w in r:print'|'+9*l/h*'_'+'|',w

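I take issue with the somewhat random choice of the 10 words to exclude (the, and, of, to, a, i, it, in, or, is), so those are passed as command-line parameters, like so:
python WordFrequencyChart.py the and of to a i it in or is <"Alice's Adventures in Wonderland.txt"

This is 213 chars + 30 if we account for the "original" ignore list passed on the command line = 243.

PS. The second code also "adjusts" for the lengths of all top words, so none of them will overflow in the degenerate case.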

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|_____________________________________________________| said
|______________________________________________| alice
|_________________________________________| was
|______________________________________| that
|_______________________________| as
|____________________________| her
|__________________________| at
|__________________________| with
|_________________________| s
|_________________________| t
|_______________________| on
|_______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|___________________| not
|_________________| they
|_________________| so

score 19 · Accepted Answer

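Ruby, 215, 216, 218, 221, 224, 236, 237 chars

update 1: Hurray! It's a tie with JS Bangs' solution. Can't think of a way to cut down any more :)

update 2: Played a dirty golf trick. Changed each to map to save 1 character :)

update 3: Changed File.read to IO.read, +2. Array.group_by wasn't very fruitful, changed to reduce, +6. The case-insensitive check is not needed after lowercasing with downcase, +1 in the regex. Sorting in descending order is easily done by negating the value, +6. Total savings: +15

update 4: [0] rather than .first, +3. (@Shtééf)

update 5: Expand variable l in place, +1. Expand variable s in place, +2. (@Shtééf)

update 6: Use string addition rather than interpolation for the first line, +2. (@Shtééf)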

w=(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take 22;m=76-w[0][0].size;puts' '+'_'*m;w.map{|x,f|puts"|#{'_'*(f*1.0/w[0][1]*m)}| #{x} "}

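update 7: I went through a whole lot of hoopla to detect the first iteration inside the loop, using instance variables. All I got is +1, though perhaps there is potential. Preserving the previous version, because I believe this one is black magic. (@Shtééf)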

(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take(22).map{|x,f|@f||(@f=f;puts' '+'_'*(@m=76-x.size));puts"|#{'_'*(f*1.0/@f*@m)}| #{x} "}

Readable version

string = File.read($_).downcase

words = string.scan(/[a-z]+/i)
allowed_words = words - %w{the and of to a i it in or is}
sorted_words = allowed_words.group_by{ |x| x }.map{ |x,y| [x, y.size] }.sort{ |a,b| b[1] <=> a[1] }.take(22)
highest_frequency = sorted_words.first
highest_frequency_count = highest_frequency[1]
highest_frequency_word = highest_frequency[0]

word_length = highest_frequency_word.size
widest = 76 - word_length

puts " #{'_' * widest}"    
sorted_words.each do |word, freq|
  width = (freq * 1.0 / highest_frequency_count) * widest
  puts "|#{'_' * width}| #{word} "
end

To use:

echo "Alice.txt" | ruby -ln GolfedWordFrequencies.rb

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|_____________________________________________________| alice 
|_______________________________________________| was 
|___________________________________________| that 
|____________________________________| as 
|________________________________| her 
|_____________________________| with 
|_____________________________| at 
|____________________________| s 
|____________________________| t 
|__________________________| on 
|__________________________| all 
|_______________________| this 
|_______________________| for 
|_______________________| had 
|_______________________| but 
|______________________| be 
|_____________________| not 
|____________________| they 
|____________________| so

score 12 · Accepted Answer

Haskell - 366 351 344 337 333 chars

(One line break in main added for readability, and no line break needed at the end of the last line.)

import Data.List
import Data.Char
l=length
t=filter
m=map
f c|isAlpha c=toLower c|0<1=' '
h w=(-l w,head w)
x!(q,w)='|':replicate(minimum$m(q?)x)'_'++"| "++w
q?(g,w)=q*(77-l w)`div`g
b x=m(x!)x
a(l:r)=(' ':t(=='_')l):l:r
main=interact$unlines.a.b.take 22.sort.m h.group.sort
  .t(`notElem`words"the and of to a i it in or is").words.m f

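How it works is best seen by reading the argument to interact backwards:

map f lowercases alphabetic characters and replaces everything else with spaces.
words produces a list of words, dropping the separating whitespace.
filter (`notElem` words "the and of to a i it in or is") discards all entries that are forbidden words.
group . sort sorts the words and groups identical ones into lists.
map h maps each list of identical words to a tuple of the form (-frequency, word).
take 22 . sort sorts the tuples by descending frequency (the first tuple entry) and keeps only the first 22 tuples.
b maps tuples to bars (see below).
a prepends the first line of underscores to complete the topmost bar.
unlines joins all these lines together with newlines.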

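The tricky bit is getting the bar length right. I assumed that only underscores counted towards the length of the bar, so || would be a bar of zero length. The function b maps (x!) over x, where x is the list of histograms. The entire list is passed to !, so that each invocation of ! can compute the scale factor for itself by calling ?. In this way, I avoid using floating-point math or rationals, whose conversion functions and imports would eat many characters.

Note the trick of using -frequency. This removes the need to reverse the sort, since sorting (ascending) -frequency places the words with the largest frequency first. Later, in the function ?, one -frequency value is divided by another, which cancels the negation out.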
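Both tricks (negated counts so that an ascending sort puts the most frequent word first, and integer-only scaling) can be mirrored in Python. This is a sketch of the idea rather than a translation: bars and width are made-up names, 77 is taken from the Haskell code above, and Python's // floors like Haskell's div.

```python
def bars(entries):
    """entries: sorted list of (-count, word); returns (word, width) pairs."""
    def width(q):
        # q * (77 - len(w)) // g for every entry, keeping the tightest fit.
        # q and g are both negative, so each quotient is positive.
        return min(q * (77 - len(w)) // g for g, w in entries)
    return [(w, width(q)) for q, w in entries]

counts = {"she": 553, "superlongstringstring": 481}
# Ascending sort on the negated count puts the biggest count first.
entries = sorted((-c, w) for w, c in counts.items())
print(bars(entries))
```

Each bar is limited by whichever word leaves the least room, so no floating-point scale factor is ever stored.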

score 11 · Accepted Answer

Python 3.1 - 245 229 chars

I guess using Counter is kind of cheating :) I just read about it a week or so ago, so this was the perfect chance to see how it works.

import re,collections
o=collections.Counter([w for w in re.findall("[a-z]+",open("!").read().lower())if w not in"a and i in is it of or the to".split()]).most_common(22)
print('\n'.join('|'+76*v//o[0][1]*'_'+'| '+k for k,v in o))

Prints out:

|____________________________________________________________________________| she
|__________________________________________________________________| you
|_______________________________________________________________| said
|_______________________________________________________| alice
|_________________________________________________| was
|_____________________________________________| that
|_____________________________________| as
|__________________________________| her
|_______________________________| with
|_______________________________| at
|______________________________| s
|_____________________________| t
|____________________________| on
|___________________________| all
|________________________| this
|________________________| for
|________________________| had
|________________________| but
|______________________| be
|______________________| not
|_____________________| they
|____________________| so

Part of the code was "borrowed" from AKX's solution.

score 11 · Accepted Answer

PHP CLI version (450 chars)

This solution takes into account the last requirement, which most purists have conveniently chosen to ignore. That cost 170 characters!

Usage: php.exe <this.php> <file.txt>

Minified:

<?php $a=array_count_values(array_filter(preg_split('/[^a-z]/',strtolower(file_get_contents($argv[1])),-1,1),function($x){return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);}));arsort($a);$a=array_slice($a,0,22);function R($a,$F,$B){$r=array();foreach($a as$x=>$f){$l=strlen($x);$r[$x]=$b=$f*$B/$F;if($l+$b>76)return R($a,$f,76-$l);}return$r;}$c=R($a,max($a),76-strlen(key($a)));foreach($a as$x=>$f)echo '|',str_repeat('-',$c[$x]),"| $x\n";?>

Human readable:

<?php

// Read:
$s = strtolower(file_get_contents($argv[1]));

// Split:
$a = preg_split('/[^a-z]/', $s, -1, PREG_SPLIT_NO_EMPTY);

// Remove unwanted words:
$a = array_filter($a, function($x){
       return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);
     });

// Count:
$a = array_count_values($a);

// Sort:
arsort($a);

// Pick top 22:
$a=array_slice($a,0,22);


// Recursive function to adjust bar widths
// according to the last requirement:
function R($a,$F,$B){
    $r = array();
    foreach($a as $x=>$f){
        $l = strlen($x);
        $r[$x] = $b = $f * $B / $F;
        if ( $l + $b > 76 )
            return R($a,$f,76-$l);
    }
    return $r;
}

// Apply the function:
$c = R($a,max($a),76-strlen(key($a)));


// Output:
foreach ($a as $x => $f)
    echo '|',str_repeat('-',$c[$x]),"| $x\n";

?>
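The recursive adjustment in R can be easier to follow outside PHP. Below is a rough Python rendering (fit_widths is a made-up name): bars are scaled against a reference word, and as soon as some word plus its bar would exceed 76 characters, the computation restarts with that word as the new reference.

```python
def fit_widths(freqs, ref_count, ref_width):
    """freqs: dict of word -> count in descending count order."""
    widths = {}
    for word, count in freqs.items():
        bar = count * ref_width / ref_count
        widths[word] = bar
        if len(word) + bar > 76:
            # This word overflows: rescale everything so that it fits exactly.
            return fit_widths(freqs, count, 76 - len(word))
    return widths

freqs = {"she": 553, "superlongstringstring": 481, "said": 462}
first = next(iter(freqs))
widths = fit_widths(freqs, max(freqs.values()), 76 - len(first))
print({word: int(bar) for word, bar in widths.items()})
```

Every restart strictly lowers the scale, so the recursion ends after at most one pass per word.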

Output:

|-------------------------------------------------------------------------| she
|---------------------------------------------------------------| you
|------------------------------------------------------------| said
|-----------------------------------------------------| alice
|-----------------------------------------------| was
|-------------------------------------------| that
|------------------------------------| as
|--------------------------------| her
|-----------------------------| at
|-----------------------------| with
|--------------------------| on
|--------------------------| all
|-----------------------| this
|-----------------------| for
|-----------------------| had
|-----------------------| but
|----------------------| be
|---------------------| not
|--------------------| they
|--------------------| so
|-------------------| very
|------------------| what

When there is a long word, the bars are adjusted properly:

|--------------------------------------------------------| she
|---------------------------------------------------| thisisareallylongwordhere
|-------------------------------------------------| you
|-----------------------------------------------| said
|-----------------------------------------| alice
|------------------------------------| was
|---------------------------------| that
|---------------------------| as
|-------------------------| her
|-----------------------| with
|-----------------------| at
|--------------------| on
|--------------------| all
|------------------| this
|------------------| for
|------------------| had
|-----------------| but
|-----------------| be
|----------------| not
|---------------| they
|---------------| so
|--------------| very
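
The recursive R() function above can also be read as a loop: scale every bar against a reference word, and whenever some word plus its bar would overflow, make that word the new reference and start over. What follows is a minimal Python sketch of that idea, not a port of the PHP. The name bar_widths and the limit parameter are made up for illustration, and the sketch assumes the list is already sorted by descending count and that every word is shorter than the limit.

```python
def bar_widths(top, limit=76):
    """top: list of (word, count) pairs, highest count first.

    Returns a dict mapping each word to its bar width so that
    len(word) + width <= limit holds for every row.
    """
    ref_word, ref_count = top[0]
    ref_width = limit - len(ref_word)
    while True:
        for word, count in top:
            # Integer form of: count * ref_width / ref_count > limit - len(word)
            if count * ref_width > (limit - len(word)) * ref_count:
                # This row would overflow: re-anchor the scale on it and retry.
                ref_count, ref_width = count, limit - len(word)
                break
        else:
            # No row overflowed, so the current reference is the binding one.
            return {w: c * ref_width // ref_count for w, c in top}


if __name__ == "__main__":
    sample = [("she", 553), ("thisisareallylongwordhere", 500), ("you", 481)]
    for word, width in bar_widths(sample).items():
        print("|" + "-" * width + "| " + word)
```

Each re-anchoring strictly lowers the scale, so the loop stops after at most one pass per distinct word.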

score 11 · Accepted Answer

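perl, 205 191 189 characters / 205 characters (fully implemented)

Some parts were inspired by the earlier perl/ruby submissions, a couple of similar ideas were arrived at independently, and the rest are original. The shorter version also incorporates a few things I saw and learned from other submissions.

Original: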

$k{$_}++for grep{$_!~/^(the|and|of|to|a|i|it|in|or|is)$/}map{lc=~/[a-z]+/g}<>;@t=sort{$k{$b}<=>$k{$a}}keys%k;$l=76-length$t[0];printf" %s
",'_'x$l;printf"|%s| $_
",'_'x int$k{$_}/$k{$t[0]}*$l for@t[0..21];

~~Latest version down to~~ 191 characters:

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;@e=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";$r=(76-y///c)/$k{$_=$e[0]};map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
"}@e[0,0..21]

Latest version down to 189 characters:

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;@_=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";$r=(76-m//)/$k{$_=$_[0]};map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
"}@_[0,0..21]

This version (205 characters) accounts for lines whose words are longer than the first one, even when they are found later in the list.

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;($r)=sort{$a<=>$b}map{(76-y///c)/$k{$_}}@e=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
";}@e[0,0..21]

score 11 · Accepted Answer

JavaScript 1.8 (SpiderMonkey) - 354

x={};p='|';e=' ';z=[];c=77
while(l=readline())l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y)x[y]?x[y].c++:z.push(x[y]={w:y,c:1}))
z=z.sort(function(a,b)b.c-a.c).slice(0,22)
for each(v in z){v.r=v.c/z[0].c
c=c>(l=(77-v.w.length)/v.r)?l:c}for(k in z){v=z[k]
s=Array(v.r*c|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}

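Sadly, the for([k,v]in z) from the Rhino version doesn't seem to want to work in SpiderMonkey, and readFile() is a little easier to use than readline(), but moving up to 1.8 lets us use function closures to cut a few more lines....

Adding whitespace for readability: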

x={};p='|';e=' ';z=[];c=77
while(l=readline())
  l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,
   function(y) x[y] ? x[y].c++ : z.push( x[y] = {w: y, c: 1} )
  )
z=z.sort(function(a,b) b.c - a.c).slice(0,22)
for each(v in z){
  v.r=v.c/z[0].c
  c=c>(l=(77-v.w.length)/v.r)?l:c
}
for(k in z){
  v=z[k]
  s=Array(v.r*c|0).join('_')
  if(!+k)print(e+s+e)
  print(p+s+p+e+v.w)
}

Usage: js golf.js < input.txt

Output:

 _________________________________________________________________________
|__________________________________________________________________________| she
|____________________________________________________________________________________________| you
|__________________________________________________________________________| said
|____________________________________________________| alice
|____________________________________________________________| was
|__________________________________________________________| that
|___________________________________| as
|_________________| her
|________________| at
|________________| with
|____________________________| s
|____________________________| t
|__________________________| on
|__________________________| all
|_______________________| this
|______________________| for
|______________________| had
|______________________| but
|_____________________| be
|_____________________| not
|___________________| they
|___________________| so

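(base version - doesn't handle bar widths correctly)

JavaScript (Rhino) - 405 395 387 377 368 343 304 characters

~~I think my sorting logic is off, but... I don't know.~~ Brain fart fixed.

Minified (abusing the fact that \n is sometimes interpreted as a ;):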

x={};p='|';e=' ';z=[]
readFile(arguments[0]).toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y){x[y]?x[y].c++:z.push(x[y]={w:y,c:1})})
z=z.sort(function(a,b){return b.c-a.c}).slice(0,22)
for([k,v]in z){s=Array((v.c/z[0].c)*70|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}

score 10 · Accepted Answer

Perl: 203 202 201 198 195 208 203 / 231 characters

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;map{$z=$x{$_};$y||{$y=(76-y///c)/$z}&&warn" "."_"x($z*$y)."\n";printf"|%.78s\n","_"x($z*$y)."| $_"}(sort{$x{$b}<=>$x{$a}}keys%x)[0..21]

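Alternate, full implementation including the specified behavior (global bar-squishing) for the pathological case in which a secondary word is both popular and long enough that word and bar together exceed 80 characters (this implementation is 231 characters):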

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;@e=(sort{$x{$b}<=>$x{$a}}keys%x)[0..21];for(@e){$p=(76-y///c)/$x{$_};($y&&$p>$y)||($y=$p)}warn" "."_"x($x{$e[0]}*$y)."\n";for(@e){warn"|"."_"x($x{$_}*$y)."| $_\n"}

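The specification didn't state anywhere that this had to go to STDOUT, so I used perl's warn() instead of print - four characters saved there. Used map instead of foreach, but I feel there could still be some more savings in the split(join()). Still, got it down to 203 - might sleep on it. At least Perl is now under the "shell, grep, tr, grep, sort, uniq, sort, head, perl" character count for now ;)

PS: Reddit says "Hi" ;)

Update: Removed join() in favor of joining assignment and implicit scalar conversion. Down to 202. Also please note that I have taken advantage of the optional "ignore 1-letter words" rule to shave 2 characters off, so bear in mind that the frequency count will reflect this.

Update 2: Swapped out the assignment and implicit join for killing $/ to get the file in one gulp using <> in the first place. Same size, but nastier. Swapped out if(!$y){} for $y||{}&&, saved 1 more character => 201.

Update 3: Took control of lowercasing early (lc<>) by moving lc out of the map block - swapped out both regexes to no longer use the /i option, as it is no longer needed. Swapped the explicit conditional x?y:z construct for the traditional perlgolf || implicit conditional construct - /^...$/i?1:$x{$_}++ for /^...$/||$x{$_}++. Saved three characters! => 198, broke the 200 barrier. Might sleep soon... perhaps.

Update 4: Sleep deprivation has made me insane. Well. More insane. Figuring that this only has to parse normal happy text files, I made it give up if it hits a null. Saved two characters. Replaced "length" with the 1-character shorter (and much more golfish) y///c - you hear me, GolfScript?? I'm coming for you!!! sob

Update 5: Sleep deprivation made me forget about the 22-row limit and the subsequent-line limiting. Back up to 208 with those handled. Not too bad, 13 characters to handle it isn't the end of the world. Played around with perl's regex inline eval, but had trouble getting it to both work and save characters... lol. Updated the example to match the current output.

Update 6: Removed the unneeded braces protecting (...)for, since the syntactic candy ++ allows shoving it up against the for happily. Thanks to input from Chas. Owens (reminding my tired brain), got the character class i[tns] solution in there. Back down to 203.

Update 7: Added a second piece of work, a full implementation of the spec (including the full bar-squishing behavior for secondary long words, instead of the truncation that most people are doing, based on the original spec without the pathological example case).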
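
The updates above rely on two stop-list tricks: the optional one-letter rule, and the i[tns] character class. Here is a small Python sketch of why the golfed pattern still covers the original ignore list. The names ORIGINAL_STOP_WORDS, GOLFED and is_ignored are invented for this illustration, and Python's re.fullmatch stands in for the ^...$ anchors of the Perl regex.

```python
import re

# The ignore list from the challenge statement.
ORIGINAL_STOP_WORDS = {"the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}

# Golfed form used in the answer: "." swallows every one-letter word
# (including "a" and "i"), and i[tns] covers "it", "in" and "is".
GOLFED = re.compile(r"the|and|of|to|.|i[tns]|or")


def is_ignored(word):
    """True if the whole word matches the golfed stop pattern."""
    return GOLFED.fullmatch(word) is not None


if __name__ == "__main__":
    for w in ["the", "it", "s", "she", "its", "on"]:
        print(w, is_ignored(w))
```

Note that the pattern also drops one-letter tokens such as the s and t left over from contractions, which is why frequency tables differ between answers that use the one-letter rule and answers that do not.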

Examples:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

Alternative implementation on the pathological-case example:

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| with
|_________________________| at
|_______________________| on
|______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| so
|________________| very
|________________| what

score 9 · Accepted Answer

F#, 452 characters

Straightforward: get a sequence a of word-count pairs, find the best word-count-per-column multiplier k, then print the results.

let a=
 stdin.ReadToEnd().Split(" .?!,\":;'\r\n".ToCharArray(),enum 1)
 |>Seq.map(fun s->s.ToLower())|>Seq.countBy id
 |>Seq.filter(fun(w,n)->not(set["the";"and";"of";"to";"a";"i";"it";"in";"or";"is"].Contains w))
 |>Seq.sortBy(fun(w,n)-> -n)|>Seq.take 22
let k=a|>Seq.map(fun(w,n)->float(78-w.Length)/float n)|>Seq.min
let u n=String.replicate(int(float(n)*k)-2)"_"
printfn" %s "(u(snd(Seq.nth 0 a)))
for(w,n)in a do printfn"|%s| %s "(u n)w
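
The multiplier k described above is the smallest (78 - word length) / count over the chosen words, and each bar is then int(count * k) - 2 characters wide. Below is a rough Python sketch of that calculation. The function names are hypothetical, and exact fractions are used instead of floats (an assumption of this sketch, not something the F# code does) so that truncation never lands one character short.

```python
from fractions import Fraction


def multiplier(top, budget=78):
    """Smallest columns-per-occurrence ratio over all (word, count) pairs."""
    return min(Fraction(budget - len(word), count) for word, count in top)


def bar(count, k):
    """Bar width for one row, mirroring int(float(n)*k)-2 from the answer."""
    return int(count * k) - 2


if __name__ == "__main__":
    top = [("she", 553), ("superlongstringstring", 481)]
    k = multiplier(top)
    print(" " + "_" * bar(top[0][1], k))
    for word, count in top:
        print("|" + "_" * bar(count, k) + "| " + word + " ")
```

With the trailing space that the F# code prints, each row is bar + word + 4 characters wide, so the word that determines k fills the 80 columns exactly.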

Example (my frequency counts differ from yours, and I am unsure why):

% app.exe < Alice.txt

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|___________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|____________________________| t
|____________________________| s
|__________________________| on
|_________________________| all
|_______________________| this
|______________________| had
|______________________| for
|_____________________| but
|_____________________| be
|____________________| not
|___________________| they
|__________________| so
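
One plausible cause of the differing counts mentioned above, offered as a guess rather than a diagnosis: the F# code splits on a fixed set of separator characters, whereas the challenge treats every non-letter as a separator. The Python sketch below compares the two tokenisations on a made-up sentence; the function names and the sample text are invented for illustration.

```python
import re
from collections import Counter

# Same separator characters as the Split(...) call in the F# answer.
FIXED_SEPARATORS = re.compile("[ .?!,\":;'\\r\\n]+")


def split_on_fixed_set(text):
    """Tokens obtained by splitting only on the fixed separator set."""
    return [w for w in FIXED_SEPARATORS.split(text.lower()) if w]


def letters_only(text):
    """Tokens as the challenge defines them: maximal runs of a-z."""
    return re.findall(r"[a-z]+", text.lower())


if __name__ == "__main__":
    sample = "Alice (she said) -- 'Don't!' said Alice-in-Wonderland."
    print(Counter(split_on_fixed_set(sample)))
    print(Counter(letters_only(sample)))
```

Parentheses, dashes and hyphens stay glued to the neighbouring letters in the first tokenisation, so the same word can be counted under several different keys.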

score 8 · Accepted Answer

Python 2.6, 347 characters

import re
W,x={},"a and i in is it of or the to".split()
[W.__setitem__(w,W.get(w,0)-1)for w in re.findall("[a-z]+",file("11.txt").read().lower())if w not in x]
W=sorted(W.items(),key=lambda p:p[1])[:22]
bm=(76.-len(W[0][0]))/W[0][1]
U=lambda n:"_"*int(n*bm)
print "".join(("%s\n|%s| %s "%((""if i else" "+U(n)),U(n),w))for i,(w,n)in enumerate(W))

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|_____________________________________________________| alice 
|_______________________________________________| was 
|___________________________________________| that 
|____________________________________| as 
|________________________________| her 
|_____________________________| with 
|_____________________________| at 
|____________________________| s 
|____________________________| t 
|__________________________| on 
|__________________________| all 
|_______________________| this 
|_______________________| for 
|_______________________| had 
|_______________________| but 
|______________________| be 
|_____________________| not 
|____________________| they 
|____________________| so

score 7 · Accepted Answer

Common LISP, 670 characters

I'm a LISP newbie, and this is an attempt that uses a hash table for counting (so probably not the most compact method).

(flet((r()(let((x(read-char t nil)))(and x(char-downcase x)))))(do((c(
make-hash-table :test 'equal))(w NIL)(x(r)(r))y)((not x)(maphash(lambda
(k v)(if(not(find k '("""the""and""of""to""a""i""it""in""or""is"):test
'equal))(push(cons k v)y)))c)(setf y(sort y #'> :key #'cdr))(setf y
(subseq y 0(min(length y)22)))(let((f(apply #'min(mapcar(lambda(x)(/(-
76.0(length(car x)))(cdr x)))y))))(flet((o(n)(dotimes(i(floor(* n f)))
(write-char #\_))))(write-char #\Space)(o(cdar y))(write-char #\Newline)
(dolist(x y)(write-char #\|)(o(cdr x))(format t "| ~a~%"(car x))))))
(cond((char<= #\a x #\z)(push x w))(t(incf(gethash(concatenate 'string(
reverse w))c 0))(setf w nil)))))

It can be run, for example, with cat alice.txt | clisp -C golf.lisp.

In readable form:

(flet ((r () (let ((x (read-char t nil)))
               (and x (char-downcase x)))))
  (do ((c (make-hash-table :test 'equal))  ; the word count map
       w y                                 ; current word and final word list
       (x (r) (r)))  ; iteration over all chars
       ((not x)

        ; make a list with (word . count) pairs removing stopwords
        (maphash (lambda (k v)
                   (if (not (find k '("" "the" "and" "of" "to"
                                      "a" "i" "it" "in" "or" "is")
                                  :test 'equal))
                       (push (cons k v) y)))
                 c)

        ; sort and truncate the list
        (setf y (sort y #'> :key #'cdr))
        (setf y (subseq y 0 (min (length y) 22)))

        ; find the scaling factor
        (let ((f (apply #'min
                        (mapcar (lambda (x) (/ (- 76.0 (length (car x)))
                                               (cdr x)))
                                y))))
          ; output
          (flet ((outx (n) (dotimes (i (floor (* n f))) (write-char #\_))))
             (write-char #\Space)
             (outx (cdar y))
             (write-char #\Newline)
             (dolist (x y)
               (write-char #\|)
               (outx (cdr x))
               (format t "| ~a~%" (car x))))))

       ; add alphabetic to current word, and bump word counter
       ; on non-alphabetic
       (cond
        ((char<= #\a x #\z)
         (push x w))
        (t
         (incf (gethash (concatenate 'string (reverse w)) c 0))
         (setf w nil)))))

score 7 · Accepted Answer

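Gawk -- 336 (originally 507) characters

(after fixing the output formatting; fixing the contractions thing; tweaking; tweaking again; removing a wholly unnecessary sorting step; tweaking yet again; and again (oops, this one broke the formatting); tweaking some more; taking up Matt's challenge I desperately tweaked even more; found another place to save a few, but gave two back to fix the bar-length bug)

Heh heh! I am momentarily ahead of Matt's JavaScript solution (counter-challenge! ;) and AKX's python.

The problem seems to call for a language that implements native associative arrays, so of course I chose one with a horribly deficient set of operators on them. In particular, you cannot control the order in which awk offers up the elements of a hash map, so I repeatedly scan the whole map to find the currently most numerous item, print it, and delete it from the array.

It is all terribly inefficient, and with all the golfing I have done it has become pretty awful as well.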
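
The scan-print-delete loop described above can be sketched in Python as follows. This is an illustration of the idea only, with a made-up function name; a Python dict could of course simply be sorted instead.

```python
def top_by_repeated_scan(counts, rows=22):
    """Pick the `rows` most frequent words without sorting.

    Each round scans the whole map for the current maximum,
    records it, and deletes it, as the awk END block does.
    """
    remaining = dict(counts)  # work on a copy; the caller's map is untouched
    picked = []
    for _ in range(min(rows, len(remaining))):
        best = max(remaining, key=remaining.get)
        picked.append((best, remaining.pop(best)))
    return picked


if __name__ == "__main__":
    counts = {"alice": 403, "she": 553, "said": 462, "you": 481}
    print(top_by_repeated_scan(counts, rows=3))
```

The cost is one full scan per output row, which is the inefficiency the answer admits to: roughly 22 passes over every distinct word in the text.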

Minified:

{gsub("[^a-zA-Z]"," ");for(;NF;NF--)a[tolower($NF)]++}
END{split("the and of to a i it in or is",b," ");
for(w in b)delete a[b[w]];d=1;for(w in a){e=a[w]/(78-length(w));if(e>d)d=e}
for(i=22;i;--i){e=0;for(w in a)if(a[w]>e)e=a[x=w];l=a[x]/d-2;
t=sprintf(sprintf("%%%dc",l)," ");gsub(" ","_",t);if(i==22)print" "t;
print"|"t"| "x;delete a[x]}}

The line breaks are there for clarity only: they are not necessary and should not be counted.

Output:

$ gawk -f wordfreq.awk.min < 11.txt 
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|__________________________________________| that
|___________________________________| as
|_______________________________| her
|____________________________| with
|____________________________| at
|___________________________| s
|___________________________| t
|_________________________| on
|_________________________| all
|______________________| this
|______________________| for
|______________________| had
|_____________________| but
|____________________| be
|____________________| not
|___________________| they
|__________________| so
$ sed 's/you/superlongstring/gI' 11.txt | gawk -f wordfreq.awk.min
 ______________________________________________________________________
|______________________________________________________________________| she
|_____________________________________________________________| superlongstring
|__________________________________________________________| said
|__________________________________________________| alice
|____________________________________________| was
|_________________________________________| that
|_________________________________| as
|______________________________| her
|___________________________| with
|___________________________| at
|__________________________| s
|__________________________| t
|________________________| on
|________________________| all
|_____________________| this
|_____________________| for
|_____________________| had
|____________________| but
|___________________| be
|___________________| not
|__________________| they
|_________________| so

Readable; 633 characters (originally 949):

{
    gsub("[^a-zA-Z]"," ");
    for(;NF;NF--)
    a[tolower($NF)]++
}
END{
    # remove "short" words
    split("the and of to a i it in or is",b," ");
    for (w in b) 
    delete a[b[w]];
    # Find the bar ratio
    d=1;
    for (w in a) {
    e=a[w]/(78-length(w));
    if (e>d)
        d=e
    }
    # Print the entries highest count first
    for (i=22; i; --i){               
    # find the highest count
    e=0;
    for (w in a) 
        if (a[w]>e)
        e=a[x=w];
        # Print the bar
    l=a[x]/d-2;
    # make a string of "_" the right length
    t=sprintf(sprintf("%%%dc",l)," ");
    gsub(" ","_",t);
    if (i==22) print" "t;
    print"|"t"| "x;
    delete a[x]
    }
}

score 7 · Accepted Answer

*sh (+curl), partial solution

This is incomplete, but for the hell of it, here is the word-frequency counting half of the problem in 192 bytes:

curl -s http://www.gutenberg.org/files/11/11.txt|sed -e 's@[^a-z]@\n@gi'|tr '[:upper:]' '[:lower:]'|egrep -v '(^[^a-z]*$|\b(the|and|of|to|a|i|it|in|or|is)\b)' |sort|uniq -c|sort -n|tail -n 22

score 6 · Accepted Answer

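C (828)

It looks a lot like obfuscated code, and uses glib for strings, lists and hashes. wc -m says the character count is 828. It does not consider single-character words. To calculate the maximum bar length, it considers the longest word among all of them, not only the first 22. Is this a deviation from the spec?

It does not handle failures and it does not release the memory it uses.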

#include <glib.h>
#define S(X)g_string_##X
#define H(X)g_hash_table_##X
GHashTable*h;int m,w=0,z=0;y(const void*a,const void*b){int*A,*B;A=H(lookup)(h,a);B=H(lookup)(h,b);return*B-*A;}void p(void*d,void*u){int *v=H(lookup)(h,d);if(w<22){g_printf("|");*v=*v*(77-z)/m;while(--*v>=0)g_printf("=");g_printf("| %s\n",d);w++;}}main(c){int*v;GList*l;GString*s=S(new)(NULL);h=H(new)(g_str_hash,g_str_equal);char*n[]={"the","and","of","to","it","in","or","is"};while((c=getchar())!=-1){if(isalpha(c))S(append_c)(s,tolower(c));else{if(s->len>1){for(c=0;c<8;c++)if(!strcmp(s->str,n[c]))goto x;if((v=H(lookup)(h,s->str))!=NULL)++*v;else{z=MAX(z,s->len);v=g_malloc(sizeof(int));*v=1;H(insert)(h,g_strdup(s->str),v);}}x:S(truncate)(s,0);}}l=g_list_sort(H(get_keys)(h),y);m=*(int*)H(lookup)(h,g_list_first(l)->data);g_list_foreach(l,p,NULL);}
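
To make the question raised above concrete, here is a hedged Python sketch comparing the two budgeting rules. It is loosely modelled on the description, not on the C source: the function names, the limit of 76 and the sample counts are all assumptions made for illustration.

```python
from fractions import Fraction


def top_rows(counts, rows):
    """The `rows` most frequent (word, count) pairs, highest first."""
    return sorted(counts.items(), key=lambda kv: -kv[1])[:rows]


def widths_longest_of_all(counts, rows=22, limit=76):
    """Reserve room for the longest word in the whole text."""
    top = top_rows(counts, rows)
    longest = max(len(w) for w in counts)
    peak = top[0][1]
    return {w: c * (limit - longest) // peak for w, c in top}


def widths_top_rows_only(counts, rows=22, limit=76):
    """Reserve room only for the words that are actually printed."""
    top = top_rows(counts, rows)
    scale = min(Fraction(limit - len(w), c) for w, c in top)
    return {w: int(c * scale) for w, c in top}


if __name__ == "__main__":
    counts = {"she": 553, "you": 481, "antidisestablishment": 1}
    print(widths_longest_of_all(counts, rows=2))
    print(widths_top_rows_only(counts, rows=2))
```

Both rules keep every row within the limit, but a long word that never makes the chart shrinks all bars under the first rule, so the bars are no longer as wide as the constraints allow.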

score 6 · Accepted Answer

Perl, 185 characters

~~200 (slightly broken)~~ ~~199~~ ~~197~~ ~~195~~ ~~193~~ ~~187~~ 185 characters. The last two newlines are significant. Complies with the spec.

map$X{+lc}+=!/^(.|the|and|to|i[nst]|o[rf])$/i,/[a-z]+/gfor<>;
$n=$n>($:=$X{$_}/(76-y+++c))?$n:$:for@w=(sort{$X{$b}-$X{$a}}%X)[0..21];
die map{$U='_'x($X{$_}/$n);" $U
"x!$z++,"|$U| $_
"}@w

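The first line loads the counts of valid words into %X.

The second line computes the minimum scaling factor, so that all output lines fit within 80 characters.

The third line (which contains two newline characters) produces the output.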
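
The same three steps (count, scale, print) can be laid out in Python as a readable sketch. This is not a translation of the Perl: the chart function, its parameters and the use of exact fractions are choices made for this example, and it applies the optional one-letter rule like the answer does.

```python
import re
from collections import Counter
from fractions import Fraction

# Stop list in its golfed form; "." drops every one-letter word.
STOP = re.compile(r"the|and|of|to|.|i[tns]|or")


def chart(text, rows=22, limit=76):
    """Return the chart for `text` as one string (empty if no words)."""
    # Step 1: count the valid words.
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if not STOP.fullmatch(w)]
    top = Counter(words).most_common(rows)
    if not top:
        return ""
    # Step 2: the smallest (limit - length) / count keeps every row in bounds.
    scale = min(Fraction(limit - len(w), c) for w, c in top)
    # Step 3: top rule first, then one row per word.
    lines = [" " + "_" * int(top[0][1] * scale)]
    lines += ["|" + "_" * int(c * scale) + "| " + w for w, c in top]
    return "\n".join(lines)


if __name__ == "__main__":
    print(chart("She said she saw you. You said nothing; she left."))
```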

score 5 · Accepted Answer

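Java - 886 865 756 744 742 744 752 742 714 680 characters

Updates before the first 742: improved the regex, removed superfluous parameterized types, removed superfluous whitespace.
Update 742 > 744 characters: fixed the fixed-length hack. It depends only on the first word, not on the other words (yet). Found several places to shorten the code (\\s in the regex was replaced by a space, and ArrayList was replaced by Vector). I am now looking for a short way to remove the Commons IO dependency and read from stdin.
Update 744 > 752 characters: removed the Commons dependency. It now reads from stdin. Paste the text into stdin and press Ctrl+Z to get the result.
Update 752 > 742 characters: removed public and a space, made the class name 1 character instead of 2, and it now ignores one-letter words.
Update 742 > 714 characters: updated as per Carl's comments: removed a redundant assignment (742 > 730), replaced m.containsKey(k) by m.get(k)!=null (730 > 728), introduced substringing of the line (728 > 714).
Update 714 > 680 characters: updated as per Rotsor's comments: improved the bar size calculation to remove unnecessary casting, and improved split() to remove an unnecessary replaceAll().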

import java.util.*;class F{public static void main(String[]a)throws Exception{StringBuffer b=new StringBuffer();for(int c;(c=System.in.read())>0;b.append((char)c));final Map<String,Integer>m=new HashMap();for(String w:b.toString().toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(w,m.get(w)!=null?m.get(w)+1:1);List<String>l=new Vector(m.keySet());Collections.sort(l,new Comparator(){public int compare(Object l,Object r){return m.get(r)-m.get(l);}});int c=76-l.get(0).length();String s=new String(new char[c]).replace('\0','_');System.out.println(" "+s);for(String w:l.subList(0,22))System.out.println("|"+s.substring(0,m.get(w)*c/m.get(l.get(0)))+"| "+w);}}

More readable version:

import java.util.*;
class F{
 public static void main(String[]a)throws Exception{
  StringBuffer b=new StringBuffer();for(int c;(c=System.in.read())>0;b.append((char)c));
  final Map<String,Integer>m=new HashMap();for(String w:b.toString().toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(w,m.get(w)!=null?m.get(w)+1:1);
  List<String>l=new Vector(m.keySet());Collections.sort(l,new Comparator(){public int compare(Object l,Object r){return m.get(r)-m.get(l);}});
  int c=76-l.get(0).length();String s=new String(new char[c]).replace('\0','_');System.out.println(" "+s);
  for(String w:l.subList(0,22))System.out.println("|"+s.substring(0,m.get(w)*c/m.get(l.get(0)))+"| "+w);
 }
}

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

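It is a real pity that Java doesn't have String#join() and closures (yet).

Edit by Rotsor:

I have made several changes to your solution:

Replaced List with a String[]
Reused the 'args' argument instead of declaring my own String array. Also used it as the argument to .toArray()
Replaced StringBuffer with a String (yes, yes, terrible performance)
Replaced Java sorting with a selection sort with early halting (only the first 22 elements have to be found)
Aggregated some int declarations into a single statement
Implemented the non-cheating algorithm that finds the most limiting line of output. Implemented it without FP.
Fixed the problem of the program crashing when there were fewer than 22 distinct words in the text
Implemented a new algorithm for reading the input, which is fast and only 9 characters longer than the slow one.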
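
The "selection sort with early halting" from the list above can be sketched in Python like this. It mirrors the exchange-style loops of the condensed Java code, but the function name and signature are invented for illustration.

```python
def partial_selection_sort(items, key, k=22):
    """Return the k items with the largest key, in descending order.

    Only the first k positions are settled; the rest of the list is
    left in whatever order the swaps happen to produce.
    """
    items = list(items)          # do not reorder the caller's list
    k = min(k, len(items))       # fewer than k items must not crash
    for i in range(k):
        for j in range(i + 1, len(items)):
            if key(items[i]) < key(items[j]):
                items[i], items[j] = items[j], items[i]
    return items[:k]


if __name__ == "__main__":
    counts = {"was": 358, "alice": 403, "you": 481, "she": 553, "said": 462}
    print(partial_selection_sort(counts, counts.get, k=3))
```

Stopping after k outer iterations costs about k passes over the data instead of a full sort, and clamping k to the list length covers the crash with fewer than 22 distinct words that the list mentions.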

The condensed code is 688 711 684 characters long:

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,x,y,g=22;for(;(j=System.in.read())>0;w+=(char)j);for(String W:w.toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(W,m.get(W)!=null?m.get(W)+1:1);l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}}

Fast version (~~720~~ 693 characters)

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,x,y,g=22;for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";}}l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}}

More readable version:

import java.util.*;class F{public static void main(String[]l)throws Exception{
    Map<String,Integer>m=new HashMap();String w="";
    int i=0,k=0,j=8,x,y,g=22;
    for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{
        if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";
    }}
    l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;
    for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}
    for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}
    String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');
    System.out.println(" "+s);
    for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}
}
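
The test if(k<1||y*i>x*j) in the code above finds the most limiting row without floating point: it compares the ratios room/count of two rows by cross-multiplying. A small Python sketch of the same trick follows; the function names are made up, and the limit of 76 is taken from the Java code.

```python
def tightest_ratio(top, limit=76):
    """Return (room, count) of the row with the smallest room/count.

    room is limit - len(word). Ratios are compared by cross-multiplication,
    so only integer arithmetic is needed.
    """
    best_room, best_count = None, None
    for word, count in top:
        room = limit - len(word)
        # room / count < best_room / best_count, rewritten without division
        if best_room is None or count * best_room > room * best_count:
            best_room, best_count = room, count
    return best_room, best_count


def bar_width(count, ratio):
    """Integer bar width, like m.get(w)*i/j in the Java version."""
    room, ref_count = ratio
    return count * room // ref_count


if __name__ == "__main__":
    top = [("she", 553), ("superlongstringstring", 481)]
    ratio = tightest_ratio(top)
    for word, count in top:
        print("|" + "_" * bar_width(count, ratio) + "| " + word)
```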

The version without the behavior improvements is 615 characters:

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,g=22;for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";}}l=m.keySet().toArray(l);for(;i<g;++i)for(j=i;++j<l.length;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}i=76-l[0].length();String s=new String(new char[i]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/m.get(l[0]))+"| "+w);}}}

score 4 · Accepted Answer

Scala 2.8, 311 314 320 330 332 336 341 375 characters

Includes the long-word adjustment. Ideas borrowed from the other solutions.

As a script (a.scala):

val t="\\w+\\b(?<!\\bthe|and|of|to|a|i[tns]?|or)".r.findAllIn(io.Source.fromFile(argv(0)).mkString.toLowerCase).toSeq.groupBy(w=>w).mapValues(_.size).toSeq.sortBy(-_._2)take 22
def b(p:Int)="_"*(p*(for((w,c)<-t)yield(76.0-w.size)/c).min).toInt
println(" "+b(t(0)._2))
for(p<-t)printf("|%s| %s \n",b(p._2),p._1)

Run with

scala -howtorun:script a.scala alice.txt

By the way, the edit from 314 to 311 characters actually removes only 1 character. Someone got the count wrong before (Windows CRs?).

score 4 · Accepted Answer

Scala, 368 characters

First, a legible version in 592 characters:

object Alice {
  def main(args:Array[String]) {
    val s = io.Source.fromFile(args(0))
    val words = s.getLines.flatMap("(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(_)).map(_.toLowerCase)
    val freqs = words.foldLeft(Map[String, Int]())((countmap, word)  => countmap + (word -> (countmap.getOrElse(word, 0)+1)))
    val sortedFreqs = freqs.toList.sort((a, b)  => a._2 > b._2)
    val top22 = sortedFreqs.take(22)
    val highestWord = top22.head._1
    val highestCount = top22.head._2
    val widest = 76 - highestWord.length
    println(" " + "_" * widest)
    top22.foreach(t => {
      val width = Math.round((t._2 * 1.0 / highestCount) * widest).toInt
      println("|" + "_" * width + "| " + t._1)
    })
  }
}

The console output looks like this:

$ scalac alice.scala 
$ scala Alice aliceinwonderland.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| at
|______________________________| with
|_____________________________| s
|_____________________________| t
|___________________________| on
|__________________________| all
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

With some aggressive minifying we can get it down to 415 characters:

object A{def main(args:Array[String]){val l=io.Source.fromFile(args(0)).getLines.flatMap("(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(_)).map(_.toLowerCase).foldLeft(Map[String, Int]())((c,w)=>c+(w->(c.getOrElse(w,0)+1))).toList.sort((a,b)=>a._2>b._2).take(22);println(" "+"_"*(76-l.head._1.length));l.foreach(t=>println("|"+"_"*Math.round((t._2*1.0/l.head._2)*(76-l.head._1.length)).toInt+"| "+t._1))}}

The console session looks like this:

$ scalac a.scala 
$ scala A aliceinwonderland.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| at
|______________________________| with
|_____________________________| s
|_____________________________| t
|___________________________| on
|__________________________| all
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

I'm sure a Scala expert could do even better.

Update: In the comments, Thomas gave an even shorter version, at 368 characters:

object A{def main(a:Array[String]){val t=(Map[String, Int]()/:(for(x<-io.Source.fromFile(a(0)).getLines;y<-"(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r findAllIn x) yield y.toLowerCase).toList)((c,x)=>c+(x->(c.getOrElse(x,0)+1))).toList.sortBy(_._2).reverse.take(22);val w=76-t.head._1.length;print(" "+"_"*w);t map (s=>"\n|"+"_"*(s._2*w/t.head._2)+"| "+s._1) foreach print}}

More legibly, at 375 characters:

object Alice {
  def main(a:Array[String]) {
    val t = (Map[String, Int]() /: (
      for (
        x <- io.Source.fromFile(a(0)).getLines
        y <- "(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(x)
      ) yield y.toLowerCase
    ).toList)((c, x) => c + (x -> (c.getOrElse(x, 0) + 1))).toList.sortBy(_._2).reverse.take(22)
    val w = 76 - t.head._1.length
    print (" "+"_"*w)
    t.map(s => "\n|" + "_" * (s._2 * w / t.head._2) + "| " + s._1).foreach(print)
  }
}

score 4 · Accepted Answer

Clojure 282 strict

(let[[[_ m]:as s](->>(slurp *in*).toLowerCase(re-seq #"\w+\b(?<!\bthe|and|of|to|a|i[tns]?|or)")frequencies(sort-by val >)(take 22))[b](sort(map #(/(- 76(count(key %)))(val %))s))p #(do(print %1)(dotimes[_(* b %2)](print \_))(apply println %&))](p " " m)(doseq[[k v]s](p \| v \| k)))

Somewhat more legibly:

(let[[[_ m]:as s](->> (slurp *in*)
                   .toLowerCase
                   (re-seq #"\w+\b(?<!\bthe|and|of|to|a|i[tns]?|or)")
                   frequencies
                   (sort-by val >)
                   (take 22))
     [b] (sort (map #(/ (- 76 (count (key %)))(val %)) s))
     p #(do
          (print %1)
          (dotimes[_(* b %2)] (print \_))
          (apply println %&))]
  (p " " m)
  (doseq[[k v] s] (p \| v \| k)))

score 3 · Accepted Answer

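C++, 647 characters

I don't expect to score highly by using C++, but never mind. I'm pretty sure it meets all the requirements. Note that I used the C++0x auto keyword for variable declarations, so adjust your compiler settings accordingly if you decide to test my code.

Minimized version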

#include <iostream>
#include <cstring>
#include <map>
using namespace std;
#define C string
#define S(x)v=F/a,cout<<#x<<C(v,'_')
#define F t->first
#define G t->second
#define O &&F!=
#define L for(i=22;i-->0;--t)
int main(){map<C,int>f;char d[230];int i=1,v;for(;i<256;i++)d[i<123?i-1:i-27]=i;d[229]=0;char w[99];while(cin>>w){for(i=0;w[i];i++)w[i]=tolower(w[i]);char*p=strtok(w,d);while(p)++f[p],p=strtok(0,d);}multimap<int,C>c;for(auto t=f.end();--t!=f.begin();)if(F!="the"O"and"O"of"O"to"O"a"O"i"O"it"O"in"O"or"O"is")c.insert(pair<int,C>(G,F));auto t=--c.end();float a=0,A;L A=F/(76.0-G.length()),a=a>A?a:A;t=--c.end();S( );L S(\n|)<<"| "<<G;}

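Here is a second version that is more "C++", using string rather than char[] and strtok. It is a bit larger, at 669 (+22 vs the above), but I can't get it any smaller at the moment, so I thought I'd post it anyway.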

#include <iostream>
#include <map>
using namespace std;
#define C string
#define S(x)v=F/a,cout<<#x<<C(v,'_')
#define F t->first
#define G t->second
#define O &&F!=
#define L for(i=22;i-->0;--t)
#define E e=w.find_first_of(d,g);g=w.find_first_not_of(d,e);
int main(){map<C,int>f;int i,v;C w,x,d="abcdefghijklmnopqrstuvwxyz";while(cin>>w){for(i=w.size();i-->0;)w[i]=tolower(w[i]);unsigned g=0,E while(g-e>0){x=w.substr(e,g-e),++f[x],E}}multimap<int,C>c;for(auto t=f.end();--t!=f.begin();)if(F!="the"O"and"O"of"O"to"O"a"O"i"O"it"O"in"O"or"O"is")c.insert(pair<int,C>(G,F));auto t=--c.end();float a=0,A;L A=F/(76.0-G.length()),a=a>A?a:A;t=--c.end();S( );L S(\n|)<<"| "<<G;}

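I removed the full version because I can't be bothered to keep updating it with my tweaks to the minimized version. See the edit history if you are interested in the (possibly outdated) long version.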

score 3 · Accepted Answer

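Java - 896 characters

931 characters

1233 characters, made unreadable

1977 characters "uncompressed"

Update: Aggressively reduced the character count. Single-letter words are omitted, per the updated spec.

I envy C# and LINQ so much.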

import java.util.*;import java.io.*;import static java.util.regex.Pattern.*;class g{public static void main(String[] a)throws Exception{PrintStream o=System.out;Map<String,Integer> w=new HashMap();Scanner s=new Scanner(new File(a[0])).useDelimiter(compile("[^a-z]+|\\b(the|and|of|to|.|it|in|or|is)\\b",2));while(s.hasNext()){String z=s.next().trim().toLowerCase();if(z.equals(""))continue;w.put(z,(w.get(z)==null?0:w.get(z))+1);}List<Integer> v=new Vector(w.values());Collections.sort(v);List<String> q=new Vector();int i,m;i=m=v.size()-1;while(q.size()<22){for(String t:w.keySet())if(!q.contains(t)&&w.get(t).equals(v.get(i)))q.add(t);i--;}int r=80-q.get(0).length()-4;String l=String.format("%1$0"+r+"d",0).replace("0","_");o.println(" "+l);o.println("|"+l+"| "+q.get(0)+" ");for(i=m-1;i>m-22;i--){o.println("|"+l.substring(0,(int)Math.round(r*(v.get(i)*1.0)/v.get(m)))+"| "+q.get(m-i)+" ");}}}

"Readable":

import java.util.*;
import java.io.*;
import static java.util.regex.Pattern.*;
class g
{
   public static void main(String[] a)throws Exception
      {
      PrintStream o = System.out;
      Map<String,Integer> w = new HashMap();
      Scanner s = new Scanner(new File(a[0]))
         .useDelimiter(compile("[^a-z]+|\\b(the|and|of|to|.|it|in|or|is)\\b",2));
      while(s.hasNext())
      {
         String z = s.next().trim().toLowerCase();
         if(z.equals(""))
            continue;
         w.put(z,(w.get(z) == null?0:w.get(z))+1);
      }
      List<Integer> v = new Vector(w.values());
      Collections.sort(v);
      List<String> q = new Vector();
      int i,m;
      i = m = v.size()-1;
      while(q.size()<22)
      {
         for(String t:w.keySet())
            if(!q.contains(t)&&w.get(t).equals(v.get(i)))
               q.add(t);
         i--;
      }
      int r = 80-q.get(0).length()-4;
      String l = String.format("%1$0"+r+"d",0).replace("0","_");
      o.println(" "+l);
      o.println("|"+l+"| "+q.get(0)+" ");
      for(i = m-1; i > m-22; i--)
      {
         o.println("|"+l.substring(0,(int)Math.round(r*(v.get(i)*1.0)/v.get(m)))+"| "+q.get(m-i)+" ");
      }
   }
}

Output for Alice:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| with
|______________________________| at
|___________________________| on
|__________________________| all
|________________________| this
|________________________| for
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

Output for Don Quixote (also from Gutenberg):

 ________________________________________________________________________
|________________________________________________________________________| that
|________________________________________________________| he
|______________________________________________| for
|__________________________________________| his
|________________________________________| as
|__________________________________| with
|_________________________________| not
|_________________________________| was
|________________________________| him
|______________________________| be
|___________________________| don
|_________________________| my
|_________________________| this
|_________________________| all
|_________________________| they
|________________________| said
|_______________________| have
|_______________________| me
|______________________| on
|______________________| so
|_____________________| you
|_____________________| quixote

score 3 · Accepted Answer

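Yet another Python 2.x - 206 characters (or 232 with the "width bar")

I believe this one is fully compliant with the question. The ignore list is here, and line length is fully checked (see the example where I replaced `Alice` with `Aliceinwonderlandbylewiscarroll` throughout the text, making the fifth item the longest line). Even the file name is provided from the command line rather than hardcoded (hardcoding it would save about 10 characters). It has one drawback, which I believe is acceptable for the question: because it computes an integer divider to keep lines under 80 characters, the longest line ends up shorter than 80 characters rather than exactly 80. The Python 3.x version does not have this defect (but is much longer).

I also think it is not too hard to read.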

import sys,re
t=re.split("\W+(?:(?:the|and|o[fr]|to|a|i[tns]?)\W+)*",sys.stdin.read().lower())
b=sorted((-t.count(x),x)for x in set(t))[:22]
for l,w in b:print"|"+l/min(z/(78-len(e))for z,e in b)*'-'+"|",w

|----------------------------------------------------------------| she
|--------------------------------------------------------| you
|-----------------------------------------------------| said
|----------------------------------------------| aliceinwonderlandbylewiscarroll
|-----------------------------------------| was
|--------------------------------------| that
|-------------------------------| as
|----------------------------| her
|--------------------------| at
|--------------------------| with
|-------------------------| s
|-------------------------| t
|-----------------------| on
|-----------------------| all
|---------------------| this
|--------------------| for
|--------------------| had
|--------------------| but
|-------------------| be
|-------------------| not
|------------------| they
|-----------------| so
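
The integer-divider idea described above can be sketched in Python 3 roughly as follows. This is only an illustration, not the answer's code: the function name `integer_divider_bars` and the `budget` parameter are made up for this sketch, and the budget of 76 assumes the `|bar| word ` row layout from the question rather than this answer's slightly different row format.

```python
def integer_divider_bars(pairs, budget=76):
    """pairs: list of (count, word). Returns one bar length per pair.

    Every count is divided by the same integer divider, chosen as the
    smallest integer that keeps each bar within budget - len(word).
    """
    # -(-a // b) is ceiling division for positive integers.
    divider = max(-(-count // (budget - len(word))) for count, word in pairs)
    return [count // divider for count, _ in pairs]


bars = integer_divider_bars([(553, "she"), (403, "a" * 31)])
print(bars)
```

Because the divider is rounded up to a whole number, the longest row usually falls short of the full width, which is the drawback the answer mentions.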

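Since it isn't clear whether the largest bar should also be printed alone on its own line (as in the sample output), here is another version that does so, at 232 characters.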

import sys,re
t=re.split("\W+(?:(?:the|and|o[fr]|to|a|i[tns]?)\W+)*",sys.stdin.read().lower())
b=sorted((-t.count(x),x)for x in set(t))[:22]
f=min(z/(78-len(e))for z,e in b)
print"",b[0][0]/f*'-'
for y,w in b:print"|"+y/f*'-'+"|",w

Python 3.x - 256 characters

With the Counter class available in Python 3.x, I had high hopes of making it shorter (Counter does everything needed here). It turns out not to be any better. Below is my attempt, at 266 characters:

import sys,re,collections as c
b=c.Counter(re.split("\W+(?:(?:the|and|o[fr]|to|a|i[tns]?)\W+)*",
sys.stdin.read().lower())).most_common(22)
F=lambda p,x,w:print(p+'-'*int(x/max(z/(77.-len(e))for e,z in b))+w)
F(" ",b[0][1],"")
for w,y in b:F("|",y,"| "+w)

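The problem is that `collections` and `most_common` are very long words, and even `Counter` is not short... in fact, not using `Counter` makes the code only 2 characters longer ;-(

Python 3.x also introduces other constraints: dividing two integers no longer yields an integer (so the result has to be cast to int), and print is now a function (so parentheses must be added). As a result it is longer than the Python 2.x version, but much faster. Perhaps a more experienced Python 3.x coder will have ideas for shortening the code.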
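
For readers who want to see those Python 3 constraints outside of golfed code, here is an ungolfed sketch. It is not the answer's code: the names `IGNORED` and `chart` are invented for illustration, and the row layout (`|bar| word ` with a trailing space, 80 columns) is taken from the question's rules.

```python
import re
from collections import Counter

IGNORED = {"the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}


def chart(text, top=22, width=80):
    """Return the chart rows for the `top` most frequent words in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    common = Counter(w for w in words if w not in IGNORED).most_common(top)
    if not common:
        return []
    # Each row is "|" + bar + "| " + word + " ": four characters of overhead.
    # "/" yields a float in Python 3, hence the int() when building the bar.
    scale = min((width - 4 - len(word)) / count for word, count in common)

    def bar(count):
        return "_" * int(count * scale)

    rows = [" " + bar(common[0][1])]
    rows += ["|" + bar(count) + "| " + word + " " for word, count in common]
    return rows


if __name__ == "__main__":
    print("\n".join(chart("She said she would; you said she did.")))
```

Taking the minimum ratio over all 22 words, rather than using only the top word, is what keeps a long but less frequent word from pushing its row past the width limit.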

score 2 · Accepted Answer

Java - 991 characters (including newlines and indentation)

I took @seanizer's code, fixed a bug (he omitted the first output line), and made a few improvements to make the code more "golfy".

import java.util.*;
import java.util.regex.*;
import org.apache.commons.io.IOUtils;
public class WF{
 public static void main(String[] a)throws Exception{
  String t=IOUtils.toString(new java.net.URL(a[0]).openStream());
  class W implements Comparable<W> {
   String w;int f=1;W(String W){w=W;}public int compareTo(W o){return o.f-f;}
   String d(float r){char[]c=new char[(int)(f/r)];Arrays.fill(c,'_');return "|"+new String(c)+"| "+w;}
  }
  Map<String,W>M=new HashMap<String,W>();
  Matcher m=Pattern.compile("\\b\\w+\\b").matcher(t.toLowerCase());
  while(m.find()){String w=m.group();W W=M.get(w);if(W==null)M.put(w,new W(w));else W.f++;}
  M.keySet().removeAll(Arrays.asList("the,and,of,to,a,i,it,in,or,is".split(",")));
  List<W>L=new ArrayList<W>(M.values());Collections.sort(L);int l=76-L.get(0).w.length();
  System.out.println(" "+new String(new char[l]).replace('\0','_'));
  for(W w:L.subList(0,22))System.out.println(w.d((float)L.get(0).f/(float)l));
 }
}

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|____________________________| s
|____________________________| t
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so

score 2 · Accepted Answer

Python 2.6, 273 269 267 266 characters.

(Edit: Props to ChristopheD for the character-shaving suggestions.)

import sys,re
t=re.findall('[a-z]+',"".join(sys.stdin).lower())
d=sorted((t.count(w),w)for w in set(t)-set("the and of to a i it in or is".split()))[:-23:-1]
r=min((78.-len(m[1]))/m[0]for m in d)
print'','_'*(int(d[0][0]*r-2))
for(a,b)in d:print"|"+"_"*(int(a*r-2))+"|",b

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|__________________________________________| that
|___________________________________| as
|_______________________________| her
|____________________________| with
|____________________________| at
|___________________________| s
|___________________________| t
|_________________________| on
|_________________________| all
|______________________| this
|______________________| for
|______________________| had
|_____________________| but
|____________________| be
|____________________| not
|___________________| they
|__________________| so

score 2 · Accepted Answer

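MATLAB 335 ~~404~~ ~~410 bytes~~ ~~357 bytes.~~ ~~390 bytes.~~

The updated code is now 335 characters instead of 404, and seems to work well for both examples.

Original message (for the 404-character code)

This version is a little longer, but it scales the bar lengths properly so that no row exceeds 80 columns when there is a ridiculously long word.

So my code is 357 bytes without rescaling, and 410 bytes with rescaling.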

A=textscan(fopen('11.txt'),'%s','delimiter',' 0123456789,.!?-_*^:;=+\\/(){}[]@&#$%~`|"''');
s=lower(A{1});s(cellfun('length', s)<2)=[];s(ismember(s,{'the','and','of','to','it','in','or','is'}))=[];
[w,~,i]=unique(s);N=hist(i,max(i)); [j,k]=sort(N,'descend'); b=k(1:22); n=cellfun('length',w(b));
q=80*N(b)'/N(k(1))+n; q=floor(q*78/max(q)-n); for i=1:22, fprintf('%s| %s\n',repmat('_',1,q(i)),w{k(i)});end

Result:

___________________________________________________________________________| she
_________________________________________________________________| you
______________________________________________________________| said
_______________________________________________________| alice
________________________________________________| was
____________________________________________| that
_____________________________________| as
_________________________________| her
______________________________| at
______________________________| with
____________________________| on
___________________________| all
_________________________| this
________________________| for
________________________| had
________________________| but
_______________________| be
_______________________| not
_____________________| they
____________________| so
___________________| very
___________________| what

For example, if I replace every instance of "you" in the Alice in Wonderland text with "superlongstringofridiculousness", my code scales the result correctly:

____________________________________________________________________| she
_________________________________________________________| superlongstringstring
________________________________________________________| said
_________________________________________________| alice
____________________________________________| was
________________________________________| that
_________________________________| as
______________________________| her
___________________________| with
___________________________| at
_________________________| on
________________________| all
_____________________| this
_____________________| for
_____________________| had
_____________________| but
____________________| be
____________________| not
__________________| they
__________________| so
_________________| very
_________________| what

Here is the updated code, written a bit more legibly:

A=textscan(fopen('t'),'%s','delimiter','':'@');
s=lower(A{1});
s(cellfun('length', s)<2|ismember(s,{'the','and','of','to','it','in','or','is'}))=[];
[w,~,i]=unique(s);
N=hist(i,max(i)); 
[j,k]=sort(N,'descend'); 
n=cellfun('length',w(k));
q=80*N(k)'/N(k(1))+n; 
q=floor(q*78/max(q)-n); 
for i=1:22, 
    fprintf('%s| %s\n',repmat('_',1,q(i)),w{k(i)});
end
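
The two-step rescaling in the last lines of the MATLAB code can be mirrored in Python as a rough sketch. The function name `matlab_style_widths` and the `budget` parameter are illustrative names, not part of the answer; the arithmetic follows the `q=80*N/N(1)+n` and `q=floor(q*78/max(q)-n)` lines.

```python
import math


def matlab_style_widths(counts, lengths, budget=78):
    """counts: word counts, most frequent first; lengths: matching word lengths."""
    top = counts[0]
    # Step 1: provisional row length = bar scaled so the top word gets 80,
    # plus the length of the word printed after the bar.
    q = [80 * c / top + n for c, n in zip(counts, lengths)]
    longest = max(q)
    # Step 2: shrink all provisional rows so the longest one equals `budget`,
    # then subtract the word length again to get the bar width.
    return [math.floor(x * budget / longest - n) for x, n in zip(q, lengths)]


print(matlab_style_widths([553, 481, 462], [3, 3, 4]))
```

Note that step 2 rescales the word length along with the bar, so the resulting bars are only approximately proportional to the counts; that appears to be a trade-off of this particular formula rather than something the challenge requires.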

score 2 · Accepted Answer

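Clojure - 611 characters (not minimized)

Late into the night, I tried to write the code in Clojure that is as idiomatic as I could make it. I'm not too proud of the `draw-chart` function, but I think the code speaks volumes about Clojure's succinctness.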

(ns word-freq
(:require [clojure.contrib.io :as io]))

(defn word-freq
  [f]
  (take 22 (->> f
                io/read-lines ;;; slurp should work too, but I love map/red
                (mapcat (fn [l] (map #(.toLowerCase %) (re-seq #"\w+" l))))
                (remove #{"the" "and" "of" "to" "a" "i" "it" "in" "or" "is"})
                (reduce #(assoc %1 %2 (inc (%1 %2 0))) {})
                (sort-by (comp - val)))))

(defn draw-chart
  [fs]
  (let [[[w f] & _] fs]
    (apply str
           (interpose \newline
                      (map (fn [[k v]] (apply str (concat "|" (repeat (int (* (- 76 (count w)) (/ v f 1))) "_") "| " k " ")) ) fs)))))

;;; (println (draw-chart (word-freq "/Users/ghoseb/Desktop/alice.txt")))

Output:

|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|____________________________________________________| alice 
|_______________________________________________| was 
|___________________________________________| that 
|____________________________________| as 
|________________________________| her 
|_____________________________| with 
|_____________________________| at 
|____________________________| t 
|____________________________| s 
|__________________________| on 
|__________________________| all 
|_______________________| for 
|_______________________| had 
|_______________________| this 
|_______________________| but 
|______________________| be 
|_____________________| not 
|____________________| they 
|____________________| so

I know, this doesn't follow the spec, but hey, this is some very clean Clojure code that is already quite small :)

score 2 · Accepted Answer

Shell, 228 characters, with the 80-character constraint working

tr A-Z a-z|tr -Cs a-z "\n"|sort|egrep -v "^(the|and|of|to|a|i|it|in|or|is)$" |uniq -c|sort -r|head -22>g
n=1
while :
do
awk '{printf "|%0*s| %s\n",$1*'$n'/1e3,"",$2;}' g|tr 0 _>o 
egrep -q .{80} o&&break
n=$((n+1))
done
cat o

I'm surprised nobody else seems to have used the amazing * feature of printf.

cat 11-very.txt > golf.sh

|__________________________________________________________________________| she
|________________________________________________________________| you
|_____________________________________________________________| said
|______________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| with
|______________________________| at
|_____________________________| s
|_____________________________| t
|___________________________| on
|__________________________| all
|________________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so

cat 11 | golf.sh

|_________________________________________________________________| she
|_________________________________________________________| verylongstringstring
|______________________________________________________| said
|_______________________________________________| alice
|__________________________________________| was
|_______________________________________| that
|________________________________| as
|_____________________________| her
|___________________________| with
|___________________________| at
|__________________________| s
|_________________________| t
|________________________| on
|_______________________| all
|_____________________| this
|_____________________| for
|_____________________| had
|____________________| but
|___________________| be
|___________________| not
|__________________| they
|__________________| so
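
The printf `*` feature this answer relies on also exists in Python's `%` formatting, so the trick can be sketched there as well. The helper name `bar_line` and its parameters are made up for this illustration.

```python
def bar_line(count, word, scale):
    """Build one chart row using a '*' width taken from the argument list."""
    width = int(count * scale)
    # "%0*d" reads the field width from the arguments, so formatting 0 with
    # zero padding yields `width` zeros, which are then turned into underscores.
    # Caveat: a width of 0 still prints a single digit, hence one underscore.
    return ("|%0*d| %s" % (width, 0, word)).replace("0", "_")


print(bar_line(5, "she", 1.0))
```

The replacement of "0" by "_" is safe here only because the words consist of letters; a word containing a digit would be altered too.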

score 2 · Accepted Answer

Scala, 327 characters

This was adapted from mkneissl's answer, which was inspired by a Python version, though it is bigger. I'm leaving it here in case someone can make it shorter.

val f="\\w+\\b(?<!\\bthe|and|of|to|a|i[tns]?|or)".r.findAllIn(io.Source.fromFile("11.txt").mkString.toLowerCase).toSeq
val t=f.toSet[String].map(x=> -f.count(x==)->x).toSeq.sorted take 22
def b(p:Int)="_"*(-p/(for((c,w)<-t)yield-c/(76.0-w.size)).max).toInt
println(" "+b(t(0)._1))
for(p<-t)printf("|%s| %s \n",b(p._1),p._2)

score 2 · Accepted Answer

R, 449 characters

It could probably be shorter...

bar <- function(w, l)
    {
    b <- rep("-", l)
    s <- rep(" ", l)
    cat(" ", b, "\n|", s, "| ", w, "\n ", b, "\n", sep="")
    }

f <- "alice.txt"
e <- c("the", "and", "of", "to", "a", "i", "it", "in", "or", "is", "")
w <- unlist(lapply(readLines(file(f)), strsplit, s=" "))
w <- tolower(w)
w <- unlist(lapply(w, gsub, pa="[^a-z]", r=""))
u <- unique(w[!w %in% e])
n <- unlist(lapply(u, function(x){length(w[w==x])}))
o <- rev(order(n))
n <- n[o]
m <- 77 - max(unlist(lapply(u[1:22], nchar)))
n <- floor(m*n/n[1])
u <- u[o]

for (i in 1:22)
    bar(u[i], n[i])

score 2 · Accepted Answer

Groovy, 424 389 378 321 characters

Replaced `b=map.get(a)` with `b=map[a]`, and replaced split with a matcher/iterator.

def r,s,m=[:],n=0;def p={println it};def w={"_".multiply it};(new URL(this.args[0]).text.toLowerCase()=~/\b\w+\b/).each{s=it;if(!(s==~/(the|and|of|to|a|i[tns]?|or)/))m[s]=m[s]==null?1:m[s]+1};m.keySet().sort{a,b->m[b]<=>m[a]}.subList(0,22).each{k->if(n++<1){r=(m[k]/(76-k.length()));p" "+w(m[k]/r)};p"|"+w(m[k]/r)+"|"+k}

(Run as a Groovy script with a URL as the command-line argument. No imports required!)

Readable version here:

def r,s,m=[:],n=0;
def p={println it};
def w={"_".multiply it};
(new URL(this.args[0]).text.toLowerCase()
        =~ /\b\w+\b/
        ).each{
        s=it;
        if (!(s ==~/(the|and|of|to|a|i[tns]?|or)/))
            m[s] = m[s] == null ? 1 : m[s] + 1
        };
    m.keySet()
        .sort{
            a,b -> m[b] <=> m[a]
        }
        .subList(0,22).each{
            k ->
                if( n++ < 1 ){
                    r=(m[k]/(76-k.length()));
                    p " " + w(m[k]/r)
                };
                p "|" + w(m[k]/r) + "|" + k
}

score 1 · Accepted Answer

Lua solution: 478 characters.

t,u={},{}for l in io.lines()do
for w in l:gmatch("%a+")do
w=w:lower()if not(" the and of to a i it in or is "):find(" "..w.." ")then
t[w]=1+(t[w]or 0)end
end
end
for k,v in next,t do
u[#u+1]={k,v}end
table.sort(u,function(a,b)return a[2]>b[2]end)m,n=u[1][2],math.min(#u,22)for w=80,1,-1 do
s=""for i=1,n do
a,b=u[i][1],w*u[i][2]/m
if b+#a>=78 then s=nil break end
s2=("_"):rep(b)if i==1 then
s=s.." " ..s2.."\n"end
s=s.."|"..s2.."| "..a.."\n"end
if s then print(s)break end end

Readable version:

t,u={},{}
for line in io.lines() do
    for w in line:gmatch("%a+") do
        w = w:lower()
        if not (" the and of to a i it in or is "):find(" "..w.." ") then
            t[w] = 1 + (t[w] or 0)
        end
    end
end
for k, v in pairs(t) do
    u[#u+1]={k, v}
end

table.sort(u, function(a, b)
    return a[2] > b[2]
end)

local max = u[1][2]
local n = math.min(#u, 22)

for w = 80, 1, -1 do
    s=""
    for i = 1, n do
        f = u[i][2]
        word = u[i][1]
        width = w * f / max
        if width + #word >= 78 then
            s=nil
            break
        end
        s2=("_"):rep(width)
        if i==1 then
            s=s.." " .. s2 .."\n"
        end
        s=s.."|" .. s2 .. "| " .. word.."\n"
    end
    if s then
        print(s)
        break
    end
end

score 1 · Accepted Answer

Go, 613 characters; it could probably be much smaller:

package main
import(r "regexp";. "bytes";. "io/ioutil";"os";st "strings";s "sort";. "container/vector")
type z struct{c int;w string}
func(e z)Less(o interface{})bool{return o.(z).c<e.c}
func main(){b,_:=ReadAll(os.Stdin);g:=r.MustCompile
c,m,x:=g("[A-Za-z]+").AllMatchesIter(b,0),map[string]int{},g("the|and|of|it|in|or|is|to")
for w:=range c{w=ToLower(w);if len(w)>1&&!x.Match(w){m[string(w)]++}}
o,y:=&Vector{},0
for k,v:=range m{o.Push(z{v,k});if v>y{y=v}}
s.Sort(o)
for i,v:=range *o{if i>21{break};x:=v.(z);c:=int(float(x.c)/float(y)*80)
u:=st.Repeat("_",c);if i<1{println(" "+u)};println("|"+u+"| "+x.w)}}

It's very dirty.

score 1 · Accepted Answer

Gotta love the big ones... Objective-C (~~1070~~ ~~931~~ 905 characters)

#define S NSString
#define C countForObject
#define O objectAtIndex
#define U stringWithCString
main(int g,char**b){id c=[NSCountedSet set];S*d=[S stringWithContentsOfFile:[S U:b[1]]];id p=[NSPredicate predicateWithFormat:@"SELF MATCHES[cd]'(the|and|of|to|a|i[tns]?|or)|[^a-z]'"];[d enumerateSubstringsInRange:NSMakeRange(0,[d length])options:NSStringEnumerationByWords usingBlock:^(S*s,NSRange x,NSRange y,BOOL*z){if(![p evaluateWithObject:s])[c addObject:[s lowercaseString]];}];id s=[[c allObjects]sortedArrayUsingComparator:^(id a,id b){return(NSComparisonResult)([c C:b]-[c C:a]);}];g=[c C:[s O:0]];int j=76-[[s O:0]length];char*k=malloc(80);memset(k,'_',80);S*l=[S U:k length:80];printf(" %s\n",[[l substringToIndex:j]cString]),[[s subarrayWithRange:NSMakeRange(0,22)]enumerateObjectsUsingBlock:^(id a,NSUInteger x,BOOL*y){printf("|%s| %s\n",[[l substringToIndex:[c C:a]*j/g]cString],[a cString]);}];}

Switched to using a lot of deprecated APIs, removed some unneeded memory management, and stripped whitespace more aggressively.

 _________________________________________________________________________
|_________________________________________________________________________| she
|______________________________________________________________| said
|__________________________________________________________| you
|____________________________________________________| alice
|________________________________________________| was
|_______________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| with
|______________________________| at
|___________________________| on
|__________________________| all
|________________________| this
|________________________| for
|________________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| so
|___________________| very
|__________________| what
|_________________| they

score 1 · Accepted Answer

Bourne shell, 213/240 characters

Improving on the shell version posted earlier, I can get it down to 213 characters:

tr A-Z a-z|tr -Cs a-z \\n|sort|egrep -v '^(the|and|of|to|a|i|it|in|or|is)$'|uniq -c|sort -rn|sed 22q>g
n=1
>o
until egrep -q .{80} o
do
awk '{printf "|%0*d| %s\n",$1*'$n'/1e3,0,$2}' g|tr 0 _>o 
((n++))
done
cat o

To get the upper outline on the top bar, I had to expand it to 240 characters:

tr A-Z a-z|tr -Cs a-z \\n|sort|egrep -v "^(the|and|of|to|a|i|it|in|or|is)$"|uniq -c|sort -r|sed 1p\;22q>g
n=1
>o
until egrep -q .{80} o
do
awk '{printf "|%0*d| %s\n",$1*'$n'/1e3,0,NR==1?"":$2}' g|sed '1s,|, ,g'|tr 0 _>o 
((n++))
done
cat o
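
The outline trick in the 240-character version (duplicate the first row, drop its word, and turn its pipes into spaces) can be sketched in Python like this. The function name `with_top_outline` is hypothetical, and the sketch assumes rows of the form `|bar| word` with at least one row present.

```python
def with_top_outline(rows):
    """Prepend an outline row derived from the first (longest) bar row."""
    first = rows[0]
    bar_end = first.index("|", 1)  # position of the bar's closing pipe
    # Keep only "|bar|", then blank out the pipes so just the bar remains.
    outline = first[:bar_end + 1].replace("|", " ").rstrip()
    return [outline] + rows


print("\n".join(with_top_outline(["|___| she", "|__| you"])))
```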

score 1 · Accepted Answer

shell, grep, tr, grep, sort, uniq, sort, head, perl - 194 characters

Adding a few -i flags lets us drop the overly long `tr A-Z a-z|` step; the spec says nothing about the case of the displayed words, and `uniq -ci` removes the case differences.

egrep -oi [a-z]+|egrep -wiv 'the|and|o[fr]|to|a|i[tns]?'|sort|uniq -ci|sort -nr|head -22|perl -lape'($f,$w)=@F;$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'

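Compared with the original 206 characters, that is minus 11 for the `tr` and plus 2 for the `-i` flags.

Edit: minus 3 for the \\b, which can be left out because pattern matching starts on a boundary anyway.

Since sort gives lowercase first and `uniq -ci` takes the first occurrence, the only real change in the output is that Alice keeps her uppercase initial.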
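
The effect of `sort | uniq -ci` (count case-insensitively, display the first spelling in each group) can be approximated in Python as below. This is a sketch under an assumption: within a group it keeps the spelling that came first in the input, whereas in the shell pipeline the winning spelling depends on how `sort` orders upper and lower case in the current locale. The function name `count_ignoring_case` is invented here.

```python
from itertools import groupby


def count_ignoring_case(words):
    """Return (count, representative spelling) per case-insensitive word."""
    # sorted() is stable, so spellings that differ only in case keep input order.
    ordered = sorted(words, key=str.lower)
    result = []
    for _, group in groupby(ordered, key=str.lower):
        group = list(group)
        result.append((len(group), group[0]))
    return result


print(count_ignoring_case(["Alice", "said", "alice", "Alice"]))
```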

score 1 · Accepted Answer

Tcl, 554, strict

foreach w [regexp -all -inline {[a-z]+} [string tolower [read stdin]]] {if {[lsearch {the and of to it in or is a i} $w]>=0} {continue};if {[catch {incr Ws($w)}]} {set Ws($w) 1}}
set T [lrange [lsort -decreasing -stride 2 -index 1 -integer [array get Ws]] 0 43]
foreach {w c} $T {lappend L [string length $w];lappend C $c}
set N [tcl::mathfunc::max {*}$L]
set C [lsort -integer $C]
set M [lindex $C end]
puts " [string repeat _ [expr {int((76-$N) * [lindex $T 1] / $M)}]] "
foreach {w c} $T {puts "|[string repeat _ [expr {int((76-$N) * $c / $M)}]]| $w"}

Or, more legibly:

foreach w [regexp -all -inline {[a-z]+} [string tolower [read stdin]]] {
    if {[lsearch {the and of to a i it in or is} $w] >= 0} { continue }
    if {[catch {incr words($w)}]} {
        set words($w) 1
    }
}
set topwords [lrange [lsort -decreasing -stride 2 -index 1 -integer [array get words]] 0 43]
foreach {word count} $topwords {
    lappend lengths [string length $word]
    lappend counts $count
}
set maxlength [lindex [lsort -integer $lengths] end]
set counts [lsort -integer $counts]
set mincount [lindex $counts 0].0
set maxcount [lindex $counts end].0
puts " [string repeat _ [expr {int((76-$maxlength) * [lindex $topwords 1] / $maxcount)}]] "
foreach {word count} $topwords {
    set barlength [expr {int((76-$maxlength) * $count / $maxcount)}]
    puts "|[string repeat _ $barlength]| $word"
}

score 1 · Accepted Answer

GNU Smalltalk (386)

I think it could be made a little shorter, but I don't yet see how.

|q s f m|q:=Bag new. f:=FileStream stdin. m:=0.[f atEnd]whileFalse:[s:=f nextLine.(s notNil)ifTrue:[(s tokenize:'\W+')do:[:i|(((i size)>1)&({'the'.'and'.'of'.'to'.'it'.'in'.'or'.'is'}includes:i)not)ifTrue:[q add:(i asLowercase)]. m:=m max:(i size)]]].(q:=q sortedByCount)from:1to:22 do:[:i|'|'display.((i key)*(77-m)//(q first key))timesRepeat:['='display].('| %1'%{i value})displayNl]

score 1 · Accepted Answer

Ruby, 205

This Ruby version handles the "super-long string" case. (The first two lines are almost identical to the previous Ruby programs.)

It must be run like this:

ruby -n0777 golf.rb Alice.txt

W=($_.upcase.scan(/\w+/)-%w(THE AND OF TO A I IT
IN OR IS)).group_by{|x|x}.map{|k,v|[-v.size,k]}.sort[0,22]
u=proc{|m|"_"*(W.map{|n,s|(76.0-s.size)/n}.max*m)}
puts" "+u[W[0][0]],W.map{|n,s|"|%s| "%u[n]+s}

The third line creates a closure or lambda that produces a correctly scaled string of underscores:

u = proc {|m|
  "_" *
    (W.map{|n,s| (76.0 - s.size)/n}.max * m)
}

`.max` is used instead of `.min` because the numbers are negative.
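
To see why `.max` is the right choice with negated counts, here is a small Python sketch of the same arithmetic. The function name `ruby_style_bars` is made up for illustration; the constant 76 is the one used in the Ruby code.

```python
def ruby_style_bars(pairs):
    """pairs: list of (negated_count, word), e.g. (-553, "she")."""
    # Every ratio is negative because the counts are negated. The tightest
    # constraint is the ratio closest to zero, which is the maximum.
    scale = max((76.0 - len(word)) / n for n, word in pairs)
    # Negative scale times negative count gives a positive bar length.
    return ["_" * int(scale * n) for n, _ in pairs]


pairs = sorted([(-2, "you"), (-4, "she")])  # most frequent word sorts first
print([len(bar) for bar in ruby_style_bars(pairs)])
```

Negating the counts is what lets a plain ascending sort put the most frequent word first, at the cost of having to flip `min` to `max` when picking the scale.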

score 1 · Accepted Answer

R, 298 characters

f=scan("stdin","ch")
u=unlist
s=strsplit
a=u(s(u(s(tolower(f),"[^a-z]")),"^(the|and|of|to|it|in|or|is|.|)$"))
v=unique(a)
r=sort(sapply(v,function(i) sum(a==i)),T)[2:23]  #the first item is an empty string, just skipping it
w=names(r)
q=(78-max(nchar(w)))*r/max(r)
cat(" ",rep("_",q[1])," \n",sep="")
for(i in 1:22){cat("|",rep("_",q[i]),"| ",w[i],"\n",sep="")}

The output is:

 _________________________________________________________________________ 
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

And if "you" is replaced with something longer:

 ____________________________________________________________ 
|____________________________________________________________| she
|____________________________________________________| veryverylongstring
|__________________________________________________| said
|___________________________________________| alice
|______________________________________| was
|___________________________________| that
|_____________________________| as
|__________________________| her
|________________________| at
|________________________| with
|______________________| on
|_____________________| all
|___________________| this
|___________________| for
|___________________| had
|__________________| but
|__________________| be
|__________________| not
|________________| they
|________________| so
|_______________| very
|_______________| what

score 1 · Accepted Answer

Python, 320 characters

import sys
i="the and of to a i it in or is".split()
d={}
for j in filter(lambda x:x not in i,sys.stdin.read().lower().split()):d[j]=d.get(j,0)+1
w=sorted(d.items(),key=lambda x:x[1])[:-23:-1]
m=sorted(dict(w).values())[-1]
print" %s\n"%("_"*(76-m)),"\n".join(map(lambda x:("|%s| "+x[0])%("_"*((76-m)*x[1]/w[0][1])),w))

score 1 · Accepted Answer

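Perl, 188 characters

The Perl version above (and any regex-split-based version) can be made a few bytes shorter by including the list of forbidden words as a negative lookahead assertion rather than as a separate list. In addition, the trailing semicolon can be omitted.

I have also included some other suggestions (`-` instead of `<=>`, `for` instead of `foreach`, dropped `keys`).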

$c{$_}++for grep{$_}map{lc=~/\b(?!(?:the|and|a|of|or|i[nts]?|to)\b)[a-z]+/g}<>;@s=sort{$c{$b}-$c{$a}}%c;$f=76-length$s[0];say$"."_"x$f;say"|"."_"x($c{$_}/$c{$s[0]}*$f)."| $_ "for@s[0..21]

I don't know Perl, but I think the (?!(?:...)\b) could lose the ?:.
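
The negative-lookahead idea carries over to other regex engines. Below is a Python sketch of it; the names `WORD` and `words` are invented for illustration, and the stop-word alternation is copied from the Perl one-liner above.

```python
import re

# A word boundary, then a lookahead that rejects any stop word followed by
# another boundary, then the word itself.
WORD = re.compile(r"\b(?!(?:the|and|a|of|or|i[nts]?|to)\b)[a-z]+")


def words(text):
    """Return all lowercase words in `text` that are not stop words."""
    return WORD.findall(text.lower())


print(words("The cat is in a hat, it said to Alice"))
```

In Python at least, the `?:` is not optional with `findall`: a capturing group anywhere in the pattern, even inside the lookahead, makes `findall` return the group contents instead of the whole matches. Whether the Perl version can drop it would need to be checked separately.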

score 1 · Accepted Answer

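Java, slowly getting shorter (~~1500~~ ~~1358~~ ~~1241~~ ~~1020~~ ~~913~~ 890 characters)

Stripped even more whitespace and shortened the variable names. Removed generics where possible, and removed the inline class and the try/catch block. My 900 version had a bug.

Removed another try/catch block.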

import java.net.*;import java.util.*;import java.util.regex.*;import org.apache.commons.io.*;public class G{public static void main(String[]a)throws Exception{String text=IOUtils.toString(new URL(a[0]).openStream()).toLowerCase().replaceAll("\\b(the|and|of|to|a|i[tns]?|or)\\b","");final Map<String,Integer>p=new HashMap();Matcher m=Pattern.compile("\\b\\w+\\b").matcher(text);Integer b;while(m.find()){String w=m.group();b=p.get(w);p.put(w,b==null?1:b+1);}List<String>v=new Vector(p.keySet());Collections.sort(v,new Comparator(){public int compare(Object l,Object m){return p.get(m)-p.get(l);}});boolean t=true;float r=0;for(String w:v.subList(0,22)){if(t){t=false;r=p.get(w)/(float)(80-(w.length()+4));System.out.println(" "+new String(new char[(int)(p.get(w)/r)]).replace('\0','_'));}System.out.println("|"+new String(new char[(int)(((Integer)p.get(w))/r)]).replace('\0','_')+"|"+w);}}}

Readable version:

import java.net.*;
import java.util.*;
import java.util.regex.*;
import org.apache.commons.io.*;

public class G{

    public static void main(String[] a) throws Exception{
        String text =
            IOUtils.toString(new URL(a[0]).openStream())
                .toLowerCase()
                .replaceAll("\\b(the|and|of|to|a|i[tns]?|or)\\b", "");
        final Map<String, Integer> p = new HashMap();
        Matcher m = Pattern.compile("\\b\\w+\\b").matcher(text);
        Integer b;
        while(m.find()){
            String w = m.group();
            b = p.get(w);
            p.put(w, b == null ? 1 : b + 1);
        }
        List<String> v = new Vector(p.keySet());
        Collections.sort(v, new Comparator(){

            public int compare(Object l, Object m){
                return p.get(m) - p.get(l);
            }
        });
        boolean t = true;
        float r = 0;
        for(String w : v.subList(0, 22)){
            if(t){
                t = false;
                r = p.get(w) / (float) (80 - (w.length() + 4));
                System.out.println(" "
                    + new String(new char[(int) (p.get(w) / r)]).replace('\0',
                        '_'));
            }
            System.out.println("|"
                + new String(new char[(int) (((Integer) p.get(w)) / r)]).replace('\0',
                    '_') + "|" + w);
        }
    }
}

score 1 · Accepted Answer

Python 290, 255, 253

290 chars in Python (text read from standard input)

import sys,re
c={}
for w in re.findall("[a-z]+",sys.stdin.read().lower()):c[w]=c.get(w,0)+1-(","+w+","in",a,i,the,and,of,to,it,in,or,is,")
r=sorted((-v,k)for k,v in c.items())[:22]
sf=max((76.0-len(k))/v for v,k in r)
print" "+"_"*int(r[0][0]*sf)
for v,k in r:print"|"+"_"*int(v*sf)+"| "+k

But... after reading the other solutions, I suddenly realized that efficiency is not a requirement. So here is another one, shorter and much slower (255 chars)

import sys,re
w=re.findall("\w+",sys.stdin.read().lower())
r=sorted((-w.count(x),x)for x in set(w)-set("the and of to a i it in or is".split()))[:22]
f=max((76.-len(k))/v for v,k in r)
print" "+"_"*int(f*r[0][0])
for v,k in r:print"|"+"_"*int(f*v)+"| "+k

And after reading some more of the other solutions...

import sys,re
w=re.findall("\w+",sys.stdin.read().lower())
r=sorted((-w.count(x),x)for x in set(w)-set("the and of to a i it in or is".split()))[:22]
f=max((76.-len(k))/v for v,k in r)
print"","_"*int(f*r[0][0])
for v,k in r:print"|"+"_"*int(f*v)+"|",k

And now this solution is almost byte-for-byte identical to Astatine's :-D
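For anyone who wants to see the scaling step without the golfing, here is a rough Python 3 sketch (the snippets above use Python 2 print statements). The names `chart`, `IGNORED`, `top` and `width` are mine, not from the answer. It takes the smallest per-row ratio directly with `min`, where the golfed code gets the same value by taking `max` over negated counts. It also prints the trailing space after each word, which the golfed versions leave out.

```python
import re
from collections import Counter

# Ignore list from the challenge rules.
IGNORED = set("the and of to a i it in or is".split())

def chart(text, top=22, width=80):
    """Return the chart lines for the `top` most frequent words in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in IGNORED)
    rows = counts.most_common(top)
    if not rows:
        return []
    # Each row is "|" + bar + "| " + word + " ", i.e. 4 extra characters,
    # so bar + word may use at most width - 4 columns (76 for width 80).
    # The smallest per-row ratio is the largest scale that fits every row.
    scale = min((width - 4 - len(word)) / count for word, count in rows)
    lines = [" " + "_" * int(rows[0][1] * scale)]
    for word, count in rows:
        lines.append("|" + "_" * int(count * scale) + "| " + word + " ")
    return lines
```

Something like `print("\n".join(chart(sys.stdin.read())))` (after `import sys`) would print the chart for piped input.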

score 1 · Accepted Answer

Javascript, 348 characters

After I finished mine, I stole some ideas from Matt :3

t=prompt().toLowerCase().replace(/\b(the|and|of|to|a|i[tns]?|or)\b/gm,'');r={};o=[];t.replace(/\b([a-z]+)\b/gm,function(a,w){r[w]?++r[w]:r[w]=1});for(i in r){o.push([i,r[i]])}m=o[0][1];o=o.slice(0,22);o.sort(function(F,D){return D[1]-F[1]});for(B in o){F=o[B];L=new Array(~~(F[1]/m*(76-F[0].length))).join('_');print(' '+L+'\n|'+L+'| '+F[0]+' \n')}

Requires support for the print and prompt functions.

score 1 · Accepted Answer

Object Rexx 4.0 with PC-Pipes

Where the PC-Pipes library can be found.
This solution ignores single-letter words.


address rxpipe 'pipe (end ?) < Alice.txt',
   '|regex split /[^a-zA-Z]/', -- split at non-alphabetic characters
   '|locate 2',                -- discard words shorter than 2 chars
   '|xlate lower',             -- translate all words to lower case
   ,                           -- discard list words that match list
   '|regex not match /^(the||and||of||to||it||in||or||is)$/',
   '|l:lookup autoadd before count',  -- accumulate and count words
 '? l:',                       -- no master records to feed into lookup 
 '? l:',                       -- list of counted words comes here
   ,                           -- columns 1-10 hold count, 11-n hold word
   '|sort 1.10 d',             -- sort in descending order by count
   '|take 22',                 -- take first 22 records only
   '|array wordlist',          -- store into a rexx array
   '|count max',               -- get length of longest record 
   '|var maxword'              -- save into a rexx variable

parse value wordlist[1] with count 11 .  -- get frequency of first word
barunit = count % (76-(maxword-10))      -- frequency units per chart bar char

say ' '||copies('_', (count+barunit)%barunit)  -- first line of the chart
do cntwd over wordlist                    
  parse var cntwd count 11 word          -- get word frequency and the word
  say '|'||copies('_', (count+barunit)%barunit)||'| '||word||' '
end

The output generated

________________________________________________________________________________
|________________________________________________________________________________| she
|_____________________________________________________________________| you
|___________________________________________________________________| said
|__________________________________________________________| alice
|____________________________________________________| was
|_________________________________| that
|________________________________________| as
|____________________________________| her
|_________________________________| at
|_________________________________| with
|______________________________| on
|________________| all
|__________________________| this
|__________________________| for
|__________________________| had
|__________________________| but
|________________________| be
|________________________| not
|_______________________| they
|______________________| so
|_____________________| very
|_____________________| what
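
The integer scaling in the Rexx code (`%` is Rexx integer division) can be sketched in Python roughly as follows. This is only an illustration: `bar_lengths`, `rows` and `width` are names made up here, the four counts come from the frequency list in the question, and the `max(1, ...)` guard is an addition that the Rexx code does not have.

```python
def bar_lengths(rows, width=80):
    """rows: (word, count) pairs sorted by descending count."""
    longest = max(len(word) for word, _ in rows)
    top = rows[0][1]
    # Whole number of occurrences represented by one bar character.
    # max(1, ...) is a guard added in this sketch (not in the Rexx code)
    # so that small inputs do not divide by zero.
    unit = max(1, top // (width - 4 - longest))
    return [(word, (count + unit) // unit) for word, count in rows]

rows = [("she", 553), ("you", 481), ("said", 462), ("alice", 403)]
for word, n in bar_lengths(rows):
    print("|" + "_" * n + "| " + word + " ")
```

With these four rows the unit is 553 // 71 = 7, which gives bars of 80, 69, 67 and 58 characters. Because the unit is rounded down, the first bar alone comes out at 80 characters, so that row is longer than 80 characters once the word is added. A floating-point scale avoids that.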

score 1 · Accepted Answer

Yet another T-SQL solution, borrowing some ideas from Martin's solution (min76-, etc.).

declare @ varchar(max),@w real,@j int;select s=@ into[ ]set @=(select*
from openrowset(bulk'a',single_blob)a)while @>''begin set @=stuff(@,1,
patindex('%[a-z]%',@)-1,'')+'.'set @j=patindex('%[^a-z]%',@)if @j>2insert[ ]
select lower(left(@,@j-1))set @=stuff(@,1,@j,'')end;select top(22)s,count(*)
c into # from[ ]where',the,and,of,to,it,in,or,is,'not like'%,'+s+',%'
group by s order by 2desc;select @w=min((76.-len(s))/c),@=' '+replicate(
'_',max(c)*@w)from #;select @=@+'
|'+replicate('_',c*@w)+'| '+s+' 'from #;print @

The whole solution should be 2 lines (concatenate the first 7), although you can cut, paste and run it as is. Total chars = 507 (counting each line break as 1 if you save it in Unix format and run it with SQLCMD)

Assumptions:

There is no temp table #
There is no table named [ ]
Your input is in the default system folder, e.g. C:\windows\system32\a
Your query window has 'set nocount on' active (prevents spurious "rows affected" messages)

And to get onto the list of solutions under 500 chars, here is the "relaxed" edition at 483 chars (no vertical bars / no top bar / no trailing space after the word)

declare @ varchar(max),@w real,@j int;select s=@ into[ ]set @=(select*
from openrowset(bulk'b',single_blob)a)while @>''begin set @=stuff(@,1,
patindex('%[a-z]%',@)-1,'')+'.'set @j=patindex('%[^a-z]%',@)if @j>2insert[ ]
select lower(left(@,@j-1))set @=stuff(@,1,@j,'')end;select top(22)s,count(*)
c into # from[ ]where',the,and,of,to,it,in,or,is,'not like'%,'+s+',%'
group by s order by 2desc;select @w=min((78.-len(s))/c),@=''from #;select @=@+'
'+replicate('_',c*@w)+' '+s from #;print @

Readable version

declare @ varchar(max), @w real, @j int
select s=@ into[ ] -- shortcut to create table; use defined variable to specify column type
-- openrowset reads an entire file
set @=(select * from openrowset(bulk'a',single_blob) a) -- a bit shorter than naming 'BulkColumn'

while @>'' begin -- loop until input is empty
    set @=stuff(@,1,patindex('%[a-z]%',@)-1,'')+'.' -- remove lead up to first A-Z char *
    set @j=patindex('%[^a-z]%',@) -- find first non A-Z char. The +'.' above makes sure there is one
    if @j>2insert[ ] select lower(left(@,@j-1)) -- insert only words >1 char
    set @=stuff(@,1,@j,'') -- remove word and trailing non A-Z char
end;

select top(22)s,count(*)c
into #
from[ ]
where ',the,and,of,to,it,in,or,is,' not like '%,'+s+',%' -- exclude list
group by s
order by 2desc; -- highest occurrence, assume no ties at 22!

-- 80 - 2 vertical bars - 2 spaces = 76
-- @w = weighted frequency
-- this produces a line equal to the length of the max occurrence (max(c))
select @w=min((76.-len(s))/c),@=' '+replicate('_',max(c)*@w)
from #;

-- for each word, append it as a new line. note: embedded newline
select @=@+'
|'+replicate('_',c*@w)+'| '+s+' 'from #;
-- note: 22 words in a table should always fit on an 8k page
--       the order of processing should always be the same as the insert-orderby
--       thereby producing the correct output

print @ -- output

score 0 · Accepted Answer

Python, 250 characters

Borrowing from all the other Python snippets

import re,sys
t=re.findall("\w+","".join(sys.stdin).lower())
W=sorted((-t.count(w),w)for w in set(t)-set("the and of to a i it in or is".split()))[:22]
Z,U=W[0],lambda n:"_"*int(n*(76.-len(Z[1]))/Z[0])
print"",U(Z[0])
for(n,w)in W:print"|"+U(n)+"|",w

If you're cheeky and pass the words to avoid as arguments, 223 characters

import re,sys
t=re.findall("\w+","".join(sys.stdin).lower())
W=sorted((-t.count(w),w)for w in set(t)-set(sys.argv[1:]))[:22]
Z,U=W[0],lambda n:"_"*int(n*(76.-len(Z[1]))/Z[0])
print"",U(Z[0])
for(n,w)in W:print"|"+U(n)+"|",w
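
The "words to avoid as arguments" idea, ungolfed into a rough Python 3 sketch. The names `top_words` and `main` and the script name in the comment are hypothetical. Unlike the snippets above, it matches `[a-z]+` rather than `\w+`, since `\w` also accepts digits and underscores. It uses a `Counter` instead of calling `list.count` once per distinct word.

```python
import re
import sys
from collections import Counter

def top_words(text, ignored, top=22):
    """Count a-z words in `text`, skipping every word listed in `ignored`."""
    skip = set(ignored)
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in skip).most_common(top)

def main(argv=None, stream=None):
    # Hypothetical invocation:
    #   python3 chart.py the and of to a i it in or is < alice.txt
    argv = sys.argv[1:] if argv is None else argv
    stream = sys.stdin if stream is None else stream
    for word, count in top_words(stream.read(), argv):
        print(count, word)
```

Calling `main()` at the bottom of the script prints the counts; drawing the bars works the same way as in the golfed version.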