java - Line ending confusion

Question

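I wrote a simple parser in Java that reads a file one character at a time and builds up words.

I tried running it on Linux and noticed that searching for '\n' does not work, although comparing the character against the value 10 works as expected. According to the ASCII table, the value 10 is LF (line feed). I read somewhere (I don't remember where) that Java should be able to find a newline just by looking for '\n'.

I am using a BufferedReader and its read method to read the characters.

Edit

I can't use readLine because it causes other problems.

The problem seems to occur when I use a file with Mac/Windows line endings on Linux.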

score 2 · Accepted Answer

Use readLine() to read the text line by line.

Example

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

// The statements below belong inside a method body.
try {
    FileInputStream fstream = new FileInputStream("textfile.txt");
    // Wrap the byte stream in a reader so that readLine() is available
    BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
    String strLine;
    // Read the file line by line; readLine() strips the line terminator
    while ((strLine = br.readLine()) != null) {
        // Print the content on the console
        System.out.println(strLine);
    }
    // Close the reader, which also closes the underlying stream
    br.close();
} catch (Exception e) { // Catch exception if any
    System.err.println("Error: " + e.getMessage());
}

score 1 · Accepted Answer

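Here are two ways to do it:

1 - Read line by line and split each line with a regular expression to get the individual words.

2 - Write your own isDelimiter method and use it to check whether a split condition has been reached.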

package misctests;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
import java.util.ArrayList;
import java.util.List;
import org.junit.Test;


public class SplitToWords {

    String someWords = "Lorem ipsum\r\n(dolor@sit)amet,\nconsetetur!\rsadipscing'elitr;sed~diam";
    String delimsRegEx = "[\\s;,\\(\\)!'@~]+";
    String delimsPlain = ";,()!'@~"; // without whitespaces

    String[] expectedWords = {
        "Lorem",
        "ipsum",
        "dolor",
        "sit",
        "amet",
        "consetetur",
        "sadipscing",
        "elitr",
        "sed",
        "diam"
    };

    private static final class StringReader {
        String input = null;
        int pos = 0;
        int len = 0;
        StringReader(String input) {
            this.input = input == null ? "" : input;
            len = this.input.length();
        }

        public boolean hasMoreChars() {
            return pos < len;
        }

        public int read() {
            return hasMoreChars() ? ((int) input.charAt(pos++)) : 0;
        }
    }

    @Test
    public void splitToWords_1() {
        String[] actual = someWords.split(delimsRegEx);
        assertEqualsWords(expectedWords, actual);
    }

    @Test
    public void splitToWords_2() {
        StringReader sr = new StringReader(someWords);
        List<String> words = new ArrayList<String>();
        StringBuilder sb = null;
        int c = 0;
        while(sr.hasMoreChars()) {
            c = sr.read();
            while(sr.hasMoreChars() && isDelimiter(c)) {
                c = sr.read();
            }
            sb = new StringBuilder();
            while(sr.hasMoreChars() && ! isDelimiter(c)) {
                sb.append((char)c);
                c = sr.read();
            }
            if(! isDelimiter(c)) {
                sb.append((char)c);
            }
            if(sb.length() > 0) { // skip the empty word that trailing delimiters would produce
                words.add(sb.toString());
            }
        }

        String[] actual = new String[words.size()];
        words.toArray(actual);

        assertEqualsWords(expectedWords, actual);
    }

    private boolean isDelimiter(int c) {
        // Whitespace includes '\r' and '\n', so every line-ending style counts as a delimiter
        return Character.isWhitespace(c) || delimsPlain.indexOf(c) >= 0;
    }

    private void assertEqualsWords(String[] expected, String[] actual) {
        assertNotNull(expected);
        assertNotNull(actual);
        assertEquals(expected.length, actual.length);
        for(int i = 0; i < expected.length; i++) {
            assertEquals(expected[i], actual[i]);
        }
    }
}

score 1 · Accepted Answer

If you read the file byte by byte, you have to watch out for all three cases: "\n" on Linux, "\r\n" on Windows, and "\r" on classic Mac OS.
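To make the three cases concrete, here is a minimal sketch of a character-by-character reader that treats "\n", "\r\n" and "\r" each as a single line end. The class name LineEndingAwareReader and the method readLines are made up for this illustration and are not taken from the question's code. The input is a StringReader only so that the example runs without a file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original parser.
class LineEndingAwareReader {

    /** Reads one character at a time and splits on \n, \r\n or \r. */
    static List<String> readLines(Reader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean lastWasCr = false; // true if the previous character was '\r'
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\r') {
                // Classic Mac ending, or the first half of a Windows ending
                lines.add(current.toString());
                current.setLength(0);
                lastWasCr = true;
            } else if (c == '\n') {
                // Only end a line here if this '\n' is not the tail of "\r\n"
                if (!lastWasCr) {
                    lines.add(current.toString());
                    current.setLength(0);
                }
                lastWasCr = false;
            } else {
                current.append((char) c);
                lastWasCr = false;
            }
        }
        // Keep a final line that has no terminator
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Reader input = new StringReader("one\ntwo\r\nthree\rfour");
        System.out.println(readLines(input)); // prints [one, two, three, four]
    }
}
```

The same lastWasCr flag can be dropped into an existing read() loop, so the rest of the parser only ever sees one kind of line end.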

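Use the readLine method instead. It handles all of these cases automatically and returns only the line itself, without the terminator. After reading each line, you can tokenize it to get the individual words.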
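A rough sketch of that approach follows. ReadLineTokenizer and readWords are hypothetical names chosen for this example, and splitting on whitespace only is an assumption; a real parser would probably need a richer set of delimiters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration.
class ReadLineTokenizer {

    /** Reads line by line and splits each line into whitespace-separated words. */
    static List<String> readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        // readLine() accepts \n, \r and \r\n and returns the line without its terminator
        while ((line = reader.readLine()) != null) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) { // an empty line yields one empty token
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Reader input = new StringReader("alpha beta\r\ngamma\rdelta  epsilon\n");
        System.out.println(readWords(input)); // prints [alpha, beta, gamma, delta, epsilon]
    }
}
```

For a real file, the StringReader would be replaced by a FileReader or an InputStreamReader, and the caller is responsible for closing it.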

Also consider using the system property "line.separator". It always holds the system-dependent line terminator, which at least makes your code (though not the files it generates) more portable.
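For illustration, a small sketch of how the property might be used when producing text. LineSeparatorDemo and joinLines are invented names. System.lineSeparator() (available since Java 7) returns the same value that System.getProperty("line.separator") held when the JVM started.

```java
// Hypothetical helper for illustration.
class LineSeparatorDemo {

    /** Terminates every given line with the platform's line separator. */
    static String joinLines(String... lines) {
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            out.append(line).append(System.lineSeparator());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sep = System.lineSeparator();
        // Print the separator in escaped form so it is visible on the console
        System.out.println("separator: " + sep.replace("\r", "\\r").replace("\n", "\\n"));
        System.out.print(joinLines("first", "second"));
    }
}
```

Note that this only helps when writing output. A file created on another platform keeps its own line endings, so the property does not tell you which terminator to expect when reading.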
