java - HTML parser for multiple input files

Question

I want to select multiple HTML files at once and extract only the text from them with an HTML parser, with each HTML file producing its own separate text file. Can you suggest Java code for this?

`FileReader f0 = new FileReader("j.html");
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(f0);
while ((temp1 = br.readLine()) != null) {
    sb.append(temp1);
}
String para = sb.toString().replaceAll("<br>", "\n");
String textonly = Jsoup.parse(para).text();
System.out.println(textonly);
FileWriter f1 = new FileWriter("j.txt");
char buf1[] = new char[textonly.length()];
textonly.getChars(0, textonly.length(), buf1, 0);
for (i = 0; i < buf1.length; i++) {
    if (buf1[i] == '\n')
        f1.write("\r\n");
    f1.write(buf1[i]);
}`

I have this code, but it only handles one file at a time. I want to select multiple files.

score 0 · Accepted Answer

Can't you put the code in a loop? Something like this (untested):

// loop over files you want to change (this assumes they are named 1.html, 2.html, ...)
for (int i = 1; i < 1000; i++) {
   StringBuilder sb = new StringBuilder();
   // try-with-resources closes the reader even if reading fails
   try (BufferedReader br = new BufferedReader(new FileReader(i + ".html"))) {
      String temp1;
      while ((temp1 = br.readLine()) != null) {
         sb.append(temp1);
      }
   }
   String para = sb.toString().replaceAll("<br>", "\n");
   String textonly = Jsoup.parse(para).text();
   System.out.println(textonly);
   // stick .txt on the end of the filename to write out;
   // closing the writer flushes the text to disk
   try (FileWriter f1 = new FileWriter(i + ".txt")) {
      char[] buf1 = new char[textonly.length()];
      textonly.getChars(0, textonly.length(), buf1, 0);
      // use j here: reusing i would clobber the outer file counter
      for (int j = 0; j < buf1.length; j++) {
         if (buf1[j] == '\n') {
            f1.write("\r\n");
         }
         f1.write(buf1[j]);
      }
   }
}
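
If the files are not numbered 1.html, 2.html and so on, the same loop can run over an explicit list of paths instead. The following is only a rough sketch, not tested against your files: `HtmlBatchConverter`, `textPathFor` and `convertAll` are made-up names, UTF-8 is an assumed encoding, and the text extraction is passed in as a function so the example compiles without jsoup. With jsoup on the classpath you would presumably pass `html -> Jsoup.parse(html).text()` as the extractor.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class HtmlBatchConverter {

    // Hypothetical helper: replaces the last extension (e.g. ".html") with ".txt".
    static Path textPathFor(Path htmlFile) {
        String name = htmlFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return htmlFile.resolveSibling(base + ".txt");
    }

    // Converts each input file independently and returns the files written.
    static List<Path> convertAll(List<Path> htmlFiles, Function<String, String> extractor)
            throws IOException {
        List<Path> written = new ArrayList<>();
        for (Path htmlFile : htmlFiles) {
            // Assumption: the input files are UTF-8 encoded.
            String html = new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
            String text = extractor.apply(html);
            Path out = textPathFor(htmlFile);
            Files.write(out, text.getBytes(StandardCharsets.UTF_8));
            written.add(out);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Every command-line argument is treated as one HTML file to convert.
        List<Path> inputs = new ArrayList<>();
        for (String arg : args) {
            inputs.add(Paths.get(arg));
        }
        // Crude stand-in extractor so the sketch runs without jsoup;
        // it is not a real HTML parser.
        Function<String, String> stripTags = html -> html.replaceAll("<[^>]*>", " ").trim();
        for (Path out : convertAll(inputs, stripTags)) {
            System.out.println("wrote " + out);
        }
    }
}
```

Run it as `java HtmlBatchConverter a.html b.html` and each input gets a `.txt` file next to it. The list of paths could just as well come from a directory listing or a file chooser; the loop itself stays the same.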
