Package org.opengrok.indexer.analysis
Class TextAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.opengrok.indexer.analysis.AbstractAnalyzer
org.opengrok.indexer.analysis.FileAnalyzer
org.opengrok.indexer.analysis.TextAnalyzer
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
MandocAnalyzer, PlainAnalyzer, TroffAnalyzer, UuencodeAnalyzer, XMLAnalyzer
-
Nested Class Summary
Nested classes/interfaces inherited from class org.opengrok.indexer.analysis.AbstractAnalyzer
AbstractAnalyzer.Genre
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.Analyzer.ReuseStrategy, org.apache.lucene.analysis.Analyzer.TokenStreamComponents
-
Field Summary
Fields inherited from class org.opengrok.indexer.analysis.AbstractAnalyzer
countsAggregator, ctags, DUMMY_READER, factory, foldingEnabled, project, scopesEnabled, symbolTokenizerFactory
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
Constructor Summary
Constructors (Modifier, Constructor, Description):
protected TextAnalyzer(AnalyzerFactory factory)
Creates a new instance of TextAnalyzer.
protected TextAnalyzer(AnalyzerFactory factory, Supplier<JFlexTokenizer> symbolTokenizerFactory)
Creates a new instance of TextAnalyzer.
-
Method Summary
Methods (Modifier and Type, Method, Description):
protected Reader getReader(InputStream stream)
Gets a BOM-stripped Reader (default UTF-8 charset) of the specified stream, wrapped in a ZeroReader.
protected int getSpecializedVersionNo()
Gets a version number used to tag processed documents, so that re-analysis can be done later if a stored version number differs from the current implementation.
protected abstract Xrefer newXref(Reader reader)
Derived classes should implement this to create an xref for the language supported by this analyzer.
writeXref(WriteXrefArgs args)
Writes a cross-referenced HTML file, reading the source from in.
Methods inherited from class org.opengrok.indexer.analysis.FileAnalyzer
addNumLinesLOC, analyze, createComponents, getCtagsLang, getFileTypeName, getVersionNo, normalize, supportsScopes
Methods inherited from class org.opengrok.indexer.analysis.AbstractAnalyzer
getFactory, getGenre, setCountsAggregator, setCtags, setFoldingEnabled, setProject, setScopesEnabled
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
-
Constructor Details
-
TextAnalyzer
protected TextAnalyzer(AnalyzerFactory factory)
Creates a new instance of TextAnalyzer.
- Parameters:
factory - defined instance for the analyzer
-
TextAnalyzer
protected TextAnalyzer(AnalyzerFactory factory, Supplier<JFlexTokenizer> symbolTokenizerFactory)
Creates a new instance of TextAnalyzer.
- Parameters:
factory - defined instance for the analyzer
symbolTokenizerFactory - defined instance for the analyzer
-
-
Method Details
-
getSpecializedVersionNo
protected int getSpecializedVersionNo()
Gets a version number used to tag processed documents, so that re-analysis can be done later if a stored version number differs from the current implementation.
- Overrides:
getSpecializedVersionNo in class AbstractAnalyzer
- Returns:
20171223_00
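The version tag returned by getSpecializedVersionNo can drive a simple staleness check. The following is a minimal sketch of that idea, not OpenGrok's implementation: the class VersionCheckSketch and the method needsReanalysis are hypothetical names introduced only for illustration. The documented value 20171223_00 is a valid Java int literal, since underscores act as digit separators, and equals 2017122300.

```java
/** Hypothetical illustration of version-tag comparison; not part of OpenGrok. */
final class VersionCheckSketch {
    /** Same literal as the documented return value; underscores are digit separators. */
    static final int CURRENT_VERSION = 20171223_00;

    /** Returns true when the stored tag differs from the current analyzer version. */
    static boolean needsReanalysis(int storedVersion) {
        return storedVersion != CURRENT_VERSION;
    }

    public static void main(String[] args) {
        // A document tagged with the current version is up to date.
        System.out.println(needsReanalysis(20171223_00));
        // A document tagged with any other version should be re-analyzed.
        System.out.println(needsReanalysis(20170101_00));
    }
}
```

Comparing for inequality rather than "less than" follows the wording above, which speaks only of a stored number being different from the current one.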
-
writeXref
Writes a cross-referenced HTML file, reading the source from in.
- Overrides:
writeXref in class FileAnalyzer
- Parameters:
args - a defined instance
- Returns:
the instance used to write the cross-referencing
- Throws:
IOException - if an I/O error occurs
-
newXref
Derived classes should implement this to create an xref for the language supported by this analyzer.
- Parameters:
reader - the data to produce xref for
- Returns:
an xref instance
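One plausible reading of how writeXref and newXref fit together is the template-method pattern: the base class owns the write step and asks the subclass for a language-specific xref. The sketch below assumes that relationship and uses simplified, hypothetical stand-ins (SketchXrefer, SketchTextAnalyzer, SketchPlainAnalyzer, render); none of these are OpenGrok types, and the real WriteXrefArgs and Xrefer carry more state than is shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical stand-in for an xref writer; not OpenGrok's Xrefer. */
interface SketchXrefer {
    void write(Writer out) throws IOException;
}

/** Hypothetical base class: writeXref delegates creation to the abstract newXref. */
abstract class SketchTextAnalyzer {
    /** Subclasses supply the xref for their own language. */
    protected abstract SketchXrefer newXref(Reader reader);

    /** Creates the xref from the input, writes it, and returns the instance used. */
    SketchXrefer writeXref(Reader in, Writer out) throws IOException {
        SketchXrefer xref = newXref(in);
        xref.write(out);
        return xref;
    }
}

/** Hypothetical subclass that copies text through, escaping HTML-special characters. */
final class SketchPlainAnalyzer extends SketchTextAnalyzer {
    @Override
    protected SketchXrefer newXref(Reader reader) {
        return out -> {
            for (int c; (c = reader.read()) != -1; ) {
                switch (c) {
                    case '<': out.write("&lt;"); break;
                    case '>': out.write("&gt;"); break;
                    case '&': out.write("&amp;"); break;
                    default: out.write(c);
                }
            }
        };
    }

    /** Convenience wrapper for the sketch: renders a string and returns the output. */
    static String render(String source) {
        try {
            StringWriter out = new StringWriter();
            new SketchPlainAnalyzer().writeXref(new StringReader(source), out);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, adding support for another text format means writing only a new newXref; the write-and-return logic stays in one place.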
-
getReader
Gets a BOM-stripped Reader (default UTF-8 charset) of the specified stream, wrapped in a ZeroReader.
- Parameters:
stream - input stream
- Returns:
Reader instance
- Throws:
IOException
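To make "BOM-stripped" concrete, here is a minimal sketch using only the JDK. It is not the OpenGrok implementation: the class BomStrippingSketch and its methods bomStrippedReader and decode are hypothetical, the sketch recognizes only the UTF-8 byte order mark (EF BB BF), and it omits the ZeroReader wrapper mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of BOM stripping; not part of OpenGrok. */
final class BomStrippingSketch {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns a UTF-8 Reader over the stream, skipping a leading UTF-8 BOM if present. */
    static Reader bomStrippedReader(InputStream stream) throws IOException {
        PushbackInputStream in = new PushbackInputStream(stream, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = in.readNBytes(head, 0, head.length);
        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the reader sees them.
            in.unread(head, 0, n);
        }
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    /** Convenience wrapper for the sketch: decodes a byte array to a String. */
    static String decode(byte[] bytes) {
        try (Reader r = bomStrippedReader(new ByteArrayInputStream(bytes))) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = r.read()) != -1; ) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Stripping the mark before decoding matters because a UTF-8 decoder would otherwise hand U+FEFF to the tokenizer as the first character of the file.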
-