java - .tsv ファイルの読み取り中に代替行をスキップする

Question

最後に 39 列の .tsv ファイルがありますが、1 つの列には 100,000 文字を超える長さの文字列としてデータがあります。

何が起こっているかというと、1行目を読んだ後、3行目、5行目、7行目に進みます。すべての行に同じデータがありますが、取得しているログに従ってください

lineNo=3, rowNo=2, customer=503837-100 , last but one cell length=111275
lineNo=5, rowNo=3, customer=503837-100 , last but one cell length=111275
lineNo=7, rowNo=4, customer=503837-100 , last but one cell length=111275
lineNo=9, rowNo=5, customer=503837-100 , last but one cell length=111275
lineNo=11, rowNo=6, customer=503837-100 , last but one cell length=111275
lineNo=13, rowNo=7, customer=503837-100 , last but one cell length=111275
lineNo=15, rowNo=8, customer=503837-100 , last but one cell length=111275
lineNo=17, rowNo=9, customer=503837-100 , last but one cell length=111275
lineNo=19, rowNo=10, customer=503837-100 , last but one cell length=111275

以下は私のコードです：

import java.io.FileReader;
import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class readWithCsvBeanReader {
    public static void main(String[] args) throws Exception{
        readWithCsvBeanReader();
    }


private static void readWithCsvBeanReader() throws Exception {

    ICsvBeanReader beanReader = null;

    try {

        beanReader = new CsvBeanReader(new FileReader("C:\MAP TSV\abc.tsv"), CsvPreference.TAB_PREFERENCE);
        // the header elements are used to map the values to the bean (names must match)
        final String[] header = beanReader.getHeader(true);
        final CellProcessor[] processors = getProcessors();
        TSVReaderBrandDTO tsvReaderBrandDTO = new TSVReaderBrandDTO();

        int i = 0;
        int last = 0;

        while( (tsvReaderBrandDTO = beanReader.read(TSVReaderBrandDTO.class, header, processors)) != null ) {
            if(null == tsvReaderBrandDTO.getPage_cache()){
                last = 0;
            }
            else{
                last = tsvReaderBrandDTO.getPage_cache().length();
            }
            System.out.println(String.format("lineNo=%s, rowNo=%s, customer=%s , last but one cell length=%s", beanReader.getLineNumber(),
                beanReader.getRowNumber(), tsvReaderBrandDTO.getUnique_ID(), last));
            i++;
        }

        System.out.println("Number of rows : "+i);

    }
    finally {
        if( beanReader != null ) {
            beanReader.close();
        }
    }
}

private static CellProcessor[] getProcessors() {

    final CellProcessor[] processors = new CellProcessor[] { 
         new Optional(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new NotNull(), new NotNull(),
         new NotNull(), new NotNull(), new NotNull(), new Optional()};

        return processors;
    }
}

どこが間違っているのか教えてください

score 0 · Accepted Answer

http://supercsv.sourceforge.net/examples_reading.htmlを確認しました。Example CSV fileとOutputをよく見てください。行にエスケープされていない"(ダブルアポストロフィ) 文字が含まれているため、パーサーはデータレコードが 2 つの物理行にまたがっていると見なす可能性はありませんか?

二重引用符として二重アポストロフィ文字を使用しない場合は、CsvPreference を変更できます。http://supercsv.sourceforge.net/apidocs/org/supercsv/prefs/CsvPreference.html を参照してください。引用符とみなされない:

CsvPreference MY_PREFERENCES = new CsvPreference.Builder(
    SOME_NEVER_USED_CHARACTER, ',', "\r\n").build();

もちろん、タブ区切りの CSV の場合は、次のようなものを使用します。

CsvPreference MY_PREFERENCES = new CsvPreference.Builder(
    SOME_NEVER_USED_CHARACTER, '\t', "\r\n").build();

Builder の署名については javadoc を参照し、CsvPreferenceそれに応じて実際の値を修正してください。

java - .tsv ファイルの読み取り中に代替行をスキップする

2 に答える 2

Related

Reference