java - How to skip whitespace-only lines and lines with a variable number of columns using Super CSV
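I'm working on a CSV parsing requirement and I'm using the Super CSV parser library. My CSV file can have 25 columns (separated by tab (|)) and up to 100k rows, plus an additional header row.
I want to ignore whitespace-only lines and lines that contain fewer than 25 columns.
I'm reading the file with ICsvBeanReader, using name mapping (to set the CSV values on a POJO) and field processors (to handle validation).
I assume Super CSV's ICsvBeanReader skips whitespace-only lines by default. But how do I handle a line that contains fewer than 25 columns?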
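To make the rule in the question concrete, here is a minimal plain-Java sketch that classifies a single raw line. It does not use Super CSV: the class name `ColumnCountCheck`, the `Verdict` enum and the `|` delimiter are assumptions made for illustration, and the naive split ignores quoting, so treat it as a statement of the requirement rather than a parser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Super CSV.
final class ColumnCountCheck {

    enum Verdict { BLANK, WRONG_COLUMN_COUNT, OK }

    // Classifies one raw line. Assumes fields never contain the delimiter or quotes.
    static Verdict classify(String line, char delimiter, int expectedColumns) {
        if (line.trim().isEmpty()) {
            return Verdict.BLANK; // empty or whitespace-only line
        }
        // Limit -1 keeps trailing empty fields, so "a|b|" counts as three columns.
        int columns = line.split(Pattern.quote(String.valueOf(delimiter)), -1).length;
        return columns == expectedColumns ? Verdict.OK : Verdict.WRONG_COLUMN_COUNT;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("a|b|c", "   ", "only|two", "x|y|");
        for (String sample : samples) {
            System.out.println("[" + sample + "] -> " + classify(sample, '|', 3));
        }
    }
}
```

Note that a whitespace-only line splits into a single column, so a check on the column count alone would already reject it whenever more than one column is expected.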
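Solution:
You can do this easily by writing your own Tokenizer.
For example, the following Tokenizer behaves like the default one, but skips any line that does not have the correct number of columns.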
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.supercsv.io.Tokenizer;
import org.supercsv.prefs.CsvPreference;

// Tokenizer that drops rows whose column count differs from the expected one.
public class SkipBadColumnCountTokenizer extends Tokenizer {

    private final int expectedColumns;
    private final List<Integer> ignoredLines = new ArrayList<>();

    public SkipBadColumnCountTokenizer(Reader reader,
            CsvPreference preferences, int expectedColumns) {
        super(reader, preferences);
        this.expectedColumns = expectedColumns;
    }

    @Override
    public boolean readColumns(List<String> columns) throws IOException {
        boolean moreInputExists;
        // Keep reading until a row has the expected column count or the input ends.
        while ((moreInputExists = super.readColumns(columns)) &&
                columns.size() != this.expectedColumns) {
            System.out.println(String.format("Ignoring line %s with %d columns: %s",
                    getLineNumber(), columns.size(), getUntokenizedRow()));
            ignoredLines.add(getLineNumber());
        }
        return moreInputExists;
    }

    // Line numbers of the rows that were skipped.
    public List<Integer> getIgnoredLines() {
        return this.ignoredLines;
    }
}
And a simple test using this Tokenizer...
@Test
public void testInvalidRows() throws IOException {
    // Header plus five data rows; only two of the data rows have three columns.
    String input = "column1,column2,column3\n" +
            "has,three,columns\n" +
            "only,two\n" +
            "one\n" +
            "three,columns,again\n" +
            "one,too,many,columns";
    CsvPreference preference = CsvPreference.EXCEL_PREFERENCE;
    int expectedColumns = 3;
    SkipBadColumnCountTokenizer tokenizer = new SkipBadColumnCountTokenizer(
            new StringReader(input), preference, expectedColumns);
    try (ICsvBeanReader beanReader = new CsvBeanReader(tokenizer, preference)) {
        String[] header = beanReader.getHeader(true);
        // TestBean (not shown) is a POJO with column1, column2 and column3 properties.
        TestBean bean;
        while ((bean = beanReader.read(TestBean.class, header)) != null) {
            System.out.println(bean);
        }
        System.out.println(String.format("Ignored lines: %s", tokenizer.getIgnoredLines()));
    }
}
It prints the following output (note how all of the invalid lines are skipped):
TestBean{column1='has', column2='three', column3='columns'}
Ignoring line 3 with 2 columns: only,two
Ignoring line 4 with 1 columns: one
TestBean{column1='three', column2='columns', column3='again'}
Ignoring line 6 with 4 columns: one,too,many,columns
Ignored lines: [3, 4, 6]
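The key part of the tokenizer above is its loop: keep reading until a row has the expected width or the input runs out, and record the line numbers skipped along the way. The following library-free sketch shows only that loop. `SkippingRowReader` is a hypothetical stand-in, not Super CSV code; it splits naively on commas and ignores quoting, and unlike the bean reader it returns the header as an ordinary row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tokenizer; not Super CSV code and not quote-aware.
final class SkippingRowReader {
    private final String[] lines;
    private final int expectedColumns;
    private final List<Integer> ignoredLines = new ArrayList<>();
    private int lineNumber = 0; // 1-based number of the last line read

    SkippingRowReader(String input, int expectedColumns) {
        this.lines = input.split("\n");
        this.expectedColumns = expectedColumns;
    }

    // Reads the next raw row into columns; returns false at end of input.
    private boolean readRaw(List<String> columns) {
        columns.clear();
        if (lineNumber >= lines.length) {
            return false;
        }
        String line = lines[lineNumber++];
        columns.addAll(Arrays.asList(line.split(",", -1)));
        return true;
    }

    // Skips rows until one has the expected column count or the input ends.
    boolean readColumns(List<String> columns) {
        boolean moreInputExists;
        while ((moreInputExists = readRaw(columns)) && columns.size() != expectedColumns) {
            ignoredLines.add(lineNumber);
        }
        return moreInputExists;
    }

    List<Integer> getIgnoredLines() {
        return ignoredLines;
    }

    public static void main(String[] args) {
        String input = "column1,column2,column3\n"
                + "has,three,columns\n"
                + "only,two\n"
                + "one\n"
                + "three,columns,again\n"
                + "one,too,many,columns";
        SkippingRowReader reader = new SkippingRowReader(input, 3);
        List<String> row = new ArrayList<>();
        while (reader.readColumns(row)) {
            System.out.println(row);
        }
        System.out.println("Ignored lines: " + reader.getIgnoredLines());
    }
}
```

Because the width check runs before a row is handed back to the caller, a bean reader layered on top never sees the malformed rows, which is why the field processors in the original setup would not need to deal with them.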
Tags: opencsv, java, csv, supercsv Source: https://codeday.me/bug/20191013/1911028.html