Package org.opengrok.indexer.analysis
Class FileAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.opengrok.indexer.analysis.AbstractAnalyzer
org.opengrok.indexer.analysis.FileAnalyzer
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
BZip2Analyzer, ELFAnalyzer, GZIPAnalyzer, JarAnalyzer, JavaClassAnalyzer, TarAnalyzer, TextAnalyzer, ZipAnalyzer
Base class for all different File Analyzers.
An Analyzer for a file type provides:
- the file extensions and magic numbers it analyzes
- a Lucene document listing the fields it can support
- TokenStreams for each field that, per the previous item, requires tokenizing
- a cross reference in HTML format
- the type of file data (plain text, etc.)
- Author:
- Chandan
-
Nested Class Summary
Nested classes/interfaces inherited from class org.opengrok.indexer.analysis.AbstractAnalyzer
AbstractAnalyzer.Genre
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.Analyzer.ReuseStrategy, org.apache.lucene.analysis.Analyzer.TokenStreamComponents
-
Field Summary
Fields inherited from class org.opengrok.indexer.analysis.AbstractAnalyzer
countsAggregator, ctags, DUMMY_READER, factory, foldingEnabled, project, scopesEnabled, symbolTokenizerFactory
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
Constructor Summary
Constructors
FileAnalyzer(AnalyzerFactory factory): Creates a new instance of FileAnalyzer.
protected FileAnalyzer(AnalyzerFactory factory, Supplier<JFlexTokenizer> symbolTokenizerFactory): Creates a new instance of FileAnalyzer.
-
Method Summary
Modifier and Type, Method: Description
protected void addNumLinesLOC(org.apache.lucene.document.Document doc, NumLinesLOC counts): Add fields to store document number-of-lines and lines-of-code (LOC).
void analyze(org.apache.lucene.document.Document doc, StreamSource src, Writer xrefOut): Analyze the contents of a source file.
protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName)
getCtagsLang: Subclasses should override to return the case-insensitive name aligning with either a built-in Universal Ctags language name or an OpenGrok custom language name.
getFileTypeName: Returns the normalized name of the analyzer, which should correspond to the file type.
final long getVersionNo(): Gets a version number to be used to tag processed documents so that re-analysis can be done later if a stored version number is different from the current implementation.
protected org.apache.lucene.analysis.TokenStream normalize(String fieldName, org.apache.lucene.analysis.TokenStream in)
protected boolean supportsScopes()
writeXref(WriteXrefArgs args): Derived classes should override to write a cross-referenced HTML file for the specified args.
Methods inherited from class org.opengrok.indexer.analysis.AbstractAnalyzer
getFactory, getGenre, getSpecializedVersionNo, setCountsAggregator, setCtags, setFoldingEnabled, setProject, setScopesEnabled
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
-
Constructor Details
-
FileAnalyzer
Creates a new instance of FileAnalyzer.
- Parameters:
factory - defined instance for the analyzer
-
FileAnalyzer
Creates a new instance of FileAnalyzer.
- Parameters:
factory - defined instance for the analyzer
symbolTokenizerFactory - a defined instance relevant for the file
-
-
Method Details
-
getCtagsLang
Description copied from class: AbstractAnalyzer
Subclasses should override to return the case-insensitive name aligning with either a built-in Universal Ctags language name or an OpenGrok custom language name.
- Specified by:
getCtagsLang in class AbstractAnalyzer
- Returns:
null as there is no aligned language
-
getVersionNo
public final long getVersionNo()
Gets a version number to be used to tag processed documents so that re-analysis can be done later if a stored version number is different from the current implementation.
The value is the union of a FileAnalyzer root version and the value from AbstractAnalyzer.getSpecializedVersionNo(). Changing the root version affects all analyzers simultaneously, while subclasses can override AbstractAnalyzer.getSpecializedVersionNo() to allow changes that affect only a few.
- Specified by:
getVersionNo in class AbstractAnalyzer
- Returns:
(20061115_01 << 32) | AbstractAnalyzer.getSpecializedVersionNo()
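The composition described in the Returns clause can be sketched in plain Java. This is an illustrative sketch and not the OpenGrok source: the class VersionNoSketch and the helpers combine and needsReanalysis are hypothetical names, and it assumes the specialized version is non-negative and fits in the low 32 bits. Note that the root value must be a long, since shifting an int by 32 in Java leaves it unchanged.

```java
public class VersionNoSketch {
    // Root version taken from the Returns clause; the L suffix makes it a
    // long so that "<< 32" really moves it into the high 32 bits.
    public static final long ROOT_VERSION = 20061115_01L;

    // Hypothetical helper: root version in the high half, specialized
    // version in the low half (assumed non-negative and below 2^32).
    public static long combine(long rootVersion, long specializedVersion) {
        return (rootVersion << 32) | specializedVersion;
    }

    // Hypothetical helper: a document tagged with a different version
    // number than the current one is a candidate for re-analysis.
    public static boolean needsReanalysis(long storedVersion, long currentVersion) {
        return storedVersion != currentVersion;
    }

    public static void main(String[] args) {
        long stored = combine(ROOT_VERSION, 1L);
        long current = combine(ROOT_VERSION, 2L);
        System.out.println(stored >>> 32);                    // root part
        System.out.println(current & 0xFFFFFFFFL);            // specialized part
        System.out.println(needsReanalysis(stored, current)); // versions differ
    }
}
```

Because the two halves occupy disjoint bit ranges, bumping the root value changes the result for every analyzer, while bumping a specialized value changes it only for the subclass that overrides it.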
-
supportsScopes
protected boolean supportsScopes()
- Specified by:
supportsScopes in class AbstractAnalyzer
-
getFileTypeName
Returns the normalized name of the analyzer, which should correspond to the file type. Example: The analyzer for the C language (CAnalyzer) would return "c".
- Specified by:
getFileTypeName in class AbstractAnalyzer
- Returns:
Normalized name of the analyzer.
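One way to obtain a name like the one in the CAnalyzer example is to derive it from the simple class name. The sketch below assumes that approach; this page does not state how the name is actually computed, and FileTypeNameSketch, normalizedName, and the empty stand-in classes are hypothetical.

```java
import java.util.Locale;

public class FileTypeNameSketch {
    // Assumption: lowercase the simple class name and drop a trailing
    // "analyzer" suffix if one is present.
    public static String normalizedName(Class<?> analyzerClass) {
        String name = analyzerClass.getSimpleName().toLowerCase(Locale.ROOT);
        String suffix = "analyzer";
        if (name.endsWith(suffix)) {
            return name.substring(0, name.length() - suffix.length());
        }
        return name;
    }

    // Empty stand-ins used only to exercise the helper; they are not the
    // real OpenGrok analyzer classes.
    public static class CAnalyzer { }
    public static class Plain { }

    public static void main(String[] args) {
        System.out.println(normalizedName(CAnalyzer.class)); // prints "c"
        System.out.println(normalizedName(Plain.class));     // prints "plain"
    }
}
```

Using Locale.ROOT keeps the lowercasing independent of the default locale of the JVM.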
-
analyze
public void analyze(org.apache.lucene.document.Document doc, StreamSource src, Writer xrefOut) throws IOException, InterruptedException
Analyze the contents of a source file. This includes populating the Lucene document with fields to add to the index, and writing the cross-referenced data to the specified destination.
- Specified by:
analyze in class AbstractAnalyzer
- Parameters:
doc - the Lucene document
src - the input data source
xrefOut - where to write the xref (may be null)
- Throws:
IOException - if an I/O error occurs
InterruptedException - if a timeout occurs
-
writeXref
Derived classes should override to write a cross-referenced HTML file for the specified args.
- Specified by:
writeXref in class AbstractAnalyzer
- Parameters:
args - a defined instance
- Returns:
the instance used to write the cross-referencing
- Throws:
IOException - if an error occurs
-
createComponents
protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName)
- Specified by:
createComponents in class AbstractAnalyzer
-
addNumLinesLOC
Add fields to store document number-of-lines and lines-of-code (LOC).
- Specified by:
addNumLinesLOC in class AbstractAnalyzer
- Parameters:
doc - Document instance
counts - NumLinesLOC instance
-
normalize
protected org.apache.lucene.analysis.TokenStream normalize(String fieldName, org.apache.lucene.analysis.TokenStream in)
- Specified by:
normalize in class AbstractAnalyzer
-