Lucene Core Code Analysis Report 14

Author: Internet

2021SC@SDUSC

Creating the new segment info object

The code is as follows:

newSegment = new SegmentInfo(segment, flushedDocCount, directory, false, true, docStoreOffset, 
docStoreSegment, docStoreIsCompoundFile, docWriter.hasProx()); 
segmentInfos.add(newSegment);

Preparing to delete documents

Code:

docWriter.pushDeletes(); 
 --> deletesFlushed.update(deletesInRAM); 

Here every entry in deletesInRAM is added to deletesFlushed, and deletesInRAM is then cleared. The reason was explained earlier.
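The merge-then-clear step can be illustrated with a minimal sketch. The class below is a hypothetical, heavily simplified stand-in for the buffered-deletes object: it models only term deletes, and the name PushDeletesSketch and the string keys are assumptions made for illustration, not Lucene's actual types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a buffered-deletes container (term deletes only).
class PushDeletesSketch {
    // term text -> limit recorded when the delete was requested
    final Map<String, Integer> terms = new HashMap<>();

    // Copy every entry of 'in' into this buffer, then empty 'in'.
    void update(PushDeletesSketch in) {
        terms.putAll(in.terms);
        in.terms.clear();
    }

    public static void main(String[] args) {
        PushDeletesSketch deletesInRAM = new PushDeletesSketch();
        PushDeletesSketch deletesFlushed = new PushDeletesSketch();
        deletesInRAM.terms.put("id:42", 7);

        deletesFlushed.update(deletesInRAM);

        System.out.println(deletesFlushed.terms);          // {id:42=7}
        System.out.println(deletesInRAM.terms.isEmpty());  // true
    }
}
```

After the call, the flushed buffer owns all pending deletes and the in-RAM buffer is free to collect deletes for the next batch of documents.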

Generating the cfs segment

Code:

docWriter.createCompoundFile(segment); 
newSegment.setUseCompoundFile(true); 
DocumentsWriter.createCompoundFile(String segment) { 
 CompoundFileWriter cfsWriter = new CompoundFileWriter(directory, segment + "." + 
IndexFileNames.COMPOUND_FILE_EXTENSION); 
 // Add every file name recorded above to the writer object for the cfs segment.
 for (final String flushedFile : flushState.flushedFiles) 
   cfsWriter.addFile(flushedFile); 
 cfsWriter.close(); 
 } 
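To make the idea of a compound file concrete, here is a rough sketch of packing several files into one container plus a table of contents. It is an in-memory illustration under stated assumptions: the class CompoundFileSketch, its toc layout and the sample file names are hypothetical, and the real .cfs on-disk format is not reproduced here.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory illustration of "many files packed into one".
class CompoundFileSketch {
    private final Map<String, byte[]> pending = new LinkedHashMap<>();
    // file name -> {offset, length} inside the packed container
    final Map<String, int[]> toc = new LinkedHashMap<>();

    // Register a file; nothing is written until close().
    void addFile(String name, byte[] data) {
        pending.put(name, data);
    }

    // Concatenate all registered files and record where each one starts.
    byte[] close() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, byte[]> e : pending.entrySet()) {
            byte[] data = e.getValue();
            toc.put(e.getKey(), new int[] { out.size(), data.length });
            out.write(data, 0, data.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        CompoundFileSketch cfs = new CompoundFileSketch();
        cfs.addFile("_0.fnm", new byte[] { 1, 2, 3 });
        cfs.addFile("_0.tis", new byte[] { 4, 5 });
        byte[] packed = cfs.close();
        System.out.println(packed.length);             // 5
        System.out.println(cfs.toc.get("_0.tis")[0]);  // 3
    }
}
```

As in the excerpt above, files are only registered by addFile, and the actual copying happens in close().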

Deleting documents

Code:

applyDeletes(); 
boolean applyDeletes(SegmentInfos infos) { 
 if (!hasDeletes()) 
 return false; 
 final int infosEnd = infos.size(); 
 int docStart = 0; 
 boolean any = false; 
 for (int i = 0; i < infosEnd; i++) { 
 assert infos.info(i).dir == directory; 
 SegmentReader reader = writer.readerPool.get(infos.info(i), false); 
 try { 
 any |= applyDeletes(reader, docStart); 
 docStart += reader.maxDoc(); 
 } finally { 
 writer.readerPool.release(reader); 
 } 
 } 
 deletesFlushed.clear(); 
 return any; 
} 
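The docStart bookkeeping in the loop above gives each segment a base offset, so that a global document number can be translated into a number local to one segment. A minimal sketch of that translation, assuming segments are described only by their maxDoc values (the class DocBaseSketch and the method locate are hypothetical names):

```java
import java.util.Arrays;

// Hypothetical helper showing how a running docStart maps global to local doc numbers.
class DocBaseSketch {
    // Returns {segmentIndex, localDocID}, or null if the number lies outside all segments.
    static int[] locate(int[] maxDocs, int globalDocID) {
        int docStart = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            int docEnd = docStart + maxDocs[i];
            if (globalDocID >= docStart && globalDocID < docEnd) {
                return new int[] { i, globalDocID - docStart };
            }
            docStart = docEnd; // the next segment starts where this one ends
        }
        return null;
    }

    public static void main(String[] args) {
        int[] maxDocs = { 3, 4, 2 }; // three segments holding 3, 4 and 2 documents
        System.out.println(Arrays.toString(locate(maxDocs, 5))); // [1, 2]
    }
}
```

Global document 5 falls in the second segment (which covers 3 to 6), so its local number there is 5 - 3 = 2.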

Lucene can delete documents through either a reader or a writer, but in the end the deletion is always carried out by a reader.

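A reader can delete documents in three ways:
By term: delete every document that contains the term.
By document number.
By query object: delete every document that matches the query.
All three ultimately come down to deleting by document number, which is the process of writing the .del file.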
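The point that all three paths reduce to deleting by document number can be sketched with a plain bit set. This is an illustration under assumptions: documents are modeled as strings, a "query" is modeled as a predicate over the text, and java.util.BitSet stands in for whatever structure ends up in the .del file. The class name DeletedDocsSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.function.Predicate;

// Hypothetical toy reader: every delete path ends in deleteDocument(docID).
class DeletedDocsSketch {
    final String[] docs;                  // doc number -> document text
    final BitSet deleted = new BitSet();  // one bit per deleted doc number

    DeletedDocsSketch(String... docs) {
        this.docs = docs;
    }

    // The single primitive: mark one document number as deleted.
    void deleteDocument(int docID) {
        deleted.set(docID);
    }

    // By term: resolve the term to doc numbers, then delete each one.
    void deleteByTerm(String term) {
        for (int docID = 0; docID < docs.length; docID++) {
            if (Arrays.asList(docs[docID].split(" ")).contains(term)) {
                deleteDocument(docID);
            }
        }
    }

    // By query: resolve the query to doc numbers, then delete each one.
    void deleteByQuery(Predicate<String> query) {
        for (int docID = 0; docID < docs.length; docID++) {
            if (query.test(docs[docID])) {
                deleteDocument(docID);
            }
        }
    }

    public static void main(String[] args) {
        DeletedDocsSketch reader =
            new DeletedDocsSketch("red apple", "green apple", "red car", "blue sky");
        reader.deleteByTerm("car");                         // marks doc 2
        reader.deleteDocument(0);                           // marks doc 0
        reader.deleteByQuery(t -> t.startsWith("green"));   // marks doc 1
        System.out.println(reader.deleted);                 // {0, 1, 2}
    }
}
```

Nothing is physically removed; the documents are only flagged, which is why persisting the deletions amounts to writing out the set of flagged document numbers.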

private final synchronized boolean applyDeletes(IndexReader reader, int docIDStart) 
 throws CorruptIndexException, IOException { 
 final int docEnd = docIDStart + reader.maxDoc(); 
 boolean any = false; 
 // Delete by term: delete every document that contains the term.
 TermDocs docs = reader.termDocs(); 
 try { 
 for (Entry<Term, BufferedDeletes.Num> entry: deletesFlushed.terms.entrySet()) { 
 Term term = entry.getKey(); 
 docs.seek(term); 
 int limit = entry.getValue().getNum(); 
 while (docs.next()) { 
 int docID = docs.doc(); 
 if (docIDStart+docID >= limit) 
 break; 
 reader.deleteDocument(docID); 
 any = true; 
 } 
 } 
 } finally { 
 docs.close(); 
 } 
 // Delete by document number.
 for (Integer docIdInt : deletesFlushed.docIDs) { 
 int docID = docIdInt.intValue(); 
 if (docID >= docIDStart && docID < docEnd) { 
 reader.deleteDocument(docID-docIDStart); 
 any = true; 
 } 
 } 
 // Delete by query object: delete every document that matches the query.
 IndexSearcher searcher = new IndexSearcher(reader); 
 for (Entry<Query, Integer> entry : deletesFlushed.queries.entrySet()) { 
 Query query = entry.getKey(); 
 int limit = entry.getValue().intValue(); 
 Weight weight = query.weight(searcher); 
 Scorer scorer = weight.scorer(reader, true, false); 
 if (scorer != null) { 
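 // nextDoc() returns NO_MORE_DOCS (Integer.MAX_VALUE) once the scorer is exhausted, so the limit check below also ends the loop.
 while(true) { 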
 int doc = scorer.nextDoc(); 
 if (((long) docIDStart) + doc >= limit) 
 break; 
 reader.deleteDocument(doc); 
 any = true; 
 } 
 } 
 } 
 searcher.close(); 
 return any; 
}
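The limit checks in the method above mean that a buffered delete only affects documents whose global number is below the limit stored with it. The sketch below assumes that the limit is the number of documents added at the moment the delete was requested; that reading is an assumption, and the class DeleteLimitSketch with its single id-per-document model is hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy writer: each document is just an id string.
class DeleteLimitSketch {
    final List<String> docs = new ArrayList<>();                       // doc number -> id
    final Map<String, Integer> bufferedTerms = new LinkedHashMap<>();  // id -> limit
    final BitSet deleted = new BitSet();

    void addDocument(String id) {
        docs.add(id);
    }

    // Buffer the delete together with the number of documents seen so far (assumed meaning of limit).
    void deleteByTerm(String id) {
        bufferedTerms.put(id, docs.size());
    }

    // Apply buffered deletes, skipping documents at or beyond each delete's limit.
    void applyDeletes() {
        for (Map.Entry<String, Integer> e : bufferedTerms.entrySet()) {
            int limit = e.getValue();
            for (int docID = 0; docID < docs.size(); docID++) {
                if (docID >= limit) {
                    break; // documents added after the delete are untouched
                }
                if (docs.get(docID).equals(e.getKey())) {
                    deleted.set(docID);
                }
            }
        }
        bufferedTerms.clear();
    }

    public static void main(String[] args) {
        DeleteLimitSketch w = new DeleteLimitSketch();
        w.addDocument("a");   // doc 0
        w.deleteByTerm("a");  // limit = 1
        w.addDocument("a");   // doc 1, added after the delete
        w.applyDeletes();
        System.out.println(w.deleted); // {0}
    }
}
```

Under this assumption, deleting a term and then re-adding a document with the same term (an update) removes only the old copy, because the new copy's number is not below the limit.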

Source: https://blog.csdn.net/Embers_Young/article/details/121598434