This program tokenizes the contents of a file and prints the tokens, but why are some of the words merged together in the output?
    import java.io.*;
    import java.util.*;

    class JavaApplication1
    {
        static HashMap<String,Integer> hTable = new HashMap<String,Integer>();
        static int word, uwords, oncewords;

        public static void main(String args[]) throws IOException
        {
            File folder = new File(File.txt);
            File[] lFile = folder.listFiles();
            int len = lFile.length;
            for (int i = 0; i < 1; i++) {
                File file = lFile[i];
                if (file.isFile()) {
                    Scanner scanner = new Scanner(file);
                    String line = null;
                    StringBuilder sb = new StringBuilder();
                    while (scanner.hasNextLine()) {
                        line = scanner.nextLine();
                        sb.append(line);
                    }
                    // StringTokenizer st=new StringTokenizer(sb.toString(),"</>,?.[/]=()+|");
                    StringTokenizer st = new StringTokenizer(sb.toString(), " </DOC>.,TITLE-\n");
                    //System.out.println("*************************");
                    while (st.hasMoreTokens())
                    {
                        String next = st.nextToken();
                        word = word + 1;
                        if (hTable.containsKey(next))
                        {
                            int a = hTable.get(next);
                            hTable.put(next, a + 1);
                            uwords++;
                        }
                        else
                        {
                            hTable.put(next, 1);
                            System.out.println(next);
                            oncewords++;
                        }
                    }
                }
            }
            System.out.println("Total number of tokens in the database is"+word);
            System.out.println("Total number of tokens that are unique in the database are "+ uwords);
            System.out.println("Total number of tokens that occur only once in the database is" +oncewords);
            int count = 0;
            Collection<Integer> setofvalues = hTable.values();
            Object[] Varr = setofvalues.toArray();
            Arrays.sort(Varr, Collections.reverseOrder());
            Set<Object> Set1 = new LinkedHashSet<Object>(Arrays.asList(Varr));
            for (Object i : Set1)
            {
                for (Map.Entry<String, Integer> entry : hTable.entrySet())
                {
                    /* if (i.equals(entry.getValue())&&count<30)
                    {
                    System.out.println(entry.getKey()+ "=" +entry.getValue());
                    count=count+1;
                    }*/
                }
            }
            int avg = (word / len);
            System.out.println("The average number of tokens per document" +avg);
        }
    }
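One detail of the code above that may be worth isolating: `StringTokenizer` treats its second argument as a set of individual delimiter characters, not as a list of tags or words. The sketch below is a minimal, hypothetical helper (`DelimiterSetDemo` and `split` are illustrative names, not part of the original program) that applies the same delimiter string to a single tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterSetDemo {
    // Hypothetical helper: splits text on every character found in `delimiters`.
    public static List<String> split(String text, String delimiters) {
        StringTokenizer st = new StringTokenizer(text, delimiters);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same delimiter string as in the program above.
        String delimiters = " </DOC>.,TITLE-\n";
        // 'T' and 'O' are in the delimiter set, so the tag is cut at those letters.
        System.out.println(split("<AUTHOR>", delimiters)); // prints [AU, H, R]
    }
}
```

Because each of the uppercase letters `D`, `O`, `C`, `T`, `I`, `L` and `E` acts as a delimiter on its own, fragments of the tag names can survive as tokens.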
The contents of the file are:
<DOC>
<DOCNO>
1
</DOCNO>
<TITLE>
experimental investigation of the aerodynamics of a
wing in a slipstream .
</TITLE>
<AUTHOR>
brenckman,m.
</AUTHOR>
<BIBLIO>
j. ae. scs. 25, 1958, 324.
</BIBLIO>
<TEXT>
an experimental study of a wing in a propeller slipstream was
made in order to determine the spanwise distribution of the lift
increase due to slipstream at different angles of attack of the wing
and at different free stream to slipstream velocity ratios . the
results were intended in part as an evaluation basis for different
theoretical treatments of this problem .
the comparative span loading curves, together with supporting
evidence, showed that a substantial part of the lift increment
produced by the slipstream was due to a /destalling/ or boundary-layer-control
effect . the integrated remaining lift increment,
after subtracting this destalling lift, was found to agree
well with a potential flow theory .
an empirical evaluation of the destalling effects was made for
the specific configuration of the experiment .
</TEXT>
</DOC>
The output is:
N
1
experimental
investigation
of
the
aerodynamics
awing
in
a
slipstream
AU
H
R
brenckman
m
B
j
ae
scs
25
1958
324
X
an
study
wing
propeller
wasmade
order
to
determine
spanwise
distribution
liftincrease
due
at
different
angles
attack
wingand
free
stream
velocity
ratios
theresults
were
intended
part
as
evaluation
basis
for
differenttheoretical
treatments
this
problem
comparative
span
loading
curves
together
with
supportingevidence
showed
that
substantial
lift
incrementproduced
by
was
destalling
or
boundary
layer
controleffect
integrated
remaining
increment
after
subtracting
found
agreewell
potential
flow
theory
empirical
effects
made
forthe
specific
configuration
experiment
Total number of tokens in the database is151
Total number of tokens that are unique in the database are 58
Total number of tokens that occur only once in the database is93
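A minimal reproduction of the merged tokens, assuming they come from the way the lines are joined: `Scanner.nextLine()` returns each line without its line terminator, so appending lines to a `StringBuilder` with nothing in between glues the last word of one line to the first word of the next. `MergeDemo` and `tokenize` are hypothetical names used only for this sketch; the delimiter string is the one from the program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MergeDemo {
    // Hypothetical helper: joins the lines with `separator`, then tokenizes
    // with the same delimiter set as the original program.
    public static List<String> tokenize(String[] lines, String separator) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(separator);
        }
        StringTokenizer st = new StringTokenizer(sb.toString(), " </DOC>.,TITLE-\n");
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two consecutive lines from the sample file.
        String[] lines = {
            "experimental investigation of the aerodynamics of a",
            "wing in a slipstream ."
        };
        // No separator, as in the program: "a" and "wing" become "awing".
        System.out.println(tokenize(lines, ""));
        // Newline restored after each line: "a" and "wing" stay separate.
        System.out.println(tokenize(lines, "\n"));
    }
}
```

If this is indeed the cause, appending a separator after each line (for example `sb.append(line).append('\n')`) would keep the words at line boundaries apart, since `\n` is already in the delimiter set.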