java.lang.Object

org.apache.lucene.analysis.Analyzer

org.opengrok.indexer.analysis.AbstractAnalyzer

org.opengrok.indexer.analysis.FileAnalyzer

All Implemented Interfaces: Closeable, AutoCloseable

Direct Known Subclasses: BZip2Analyzer, ELFAnalyzer, GZIPAnalyzer, JarAnalyzer, JavaClassAnalyzer, TarAnalyzer, TextAnalyzer, ZipAnalyzer

public class FileAnalyzer extends AbstractAnalyzer

Base class for all file analyzers. An analyzer for a file type provides:

1. the file extensions and magic numbers it analyzes
2. a Lucene document listing the fields it can support
3. TokenStreams for each field that item 2 marks as requiring tokenization
4. a cross-reference in HTML format
5. the type of file data (plain text, etc.)

Created on September 21, 2005

Author: Chandan

Nested Class Summary

Nested classes/interfaces inherited from class org.opengrok.indexer.analysis.AbstractAnalyzer
AbstractAnalyzer.Genre

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.Analyzer.ReuseStrategy, org.apache.lucene.analysis.Analyzer.TokenStreamComponents

Field Summary

Fields inherited from class org.opengrok.indexer.analysis.AbstractAnalyzer
countsAggregator, ctags, DUMMY_READER, factory, foldingEnabled, project, scopesEnabled, symbolTokenizerFactory

Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

Constructor Summary

Constructors

| Modifier | Constructor | Description |
|---|---|---|
| | FileAnalyzer(AnalyzerFactory factory) | Creates a new instance of FileAnalyzer. |
| protected | FileAnalyzer(AnalyzerFactory factory, Supplier<JFlexTokenizer> symbolTokenizerFactory) | Creates a new instance of FileAnalyzer. |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| protected void | addNumLinesLOC(org.apache.lucene.document.Document doc, NumLinesLOC counts) | Add fields to store document number-of-lines and lines-of-code (LOC). |
| void | analyze(org.apache.lucene.document.Document doc, StreamSource src, Writer xrefOut) | Analyze the contents of a source file. |
| protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents | createComponents(String fieldName) | |
| String | getCtagsLang() | Subclasses should override to return the case-insensitive name aligning with either a built-in Universal Ctags language name or an OpenGrok custom language name. |
| String | getFileTypeName() | Returns the normalized name of the analyzer, which should correspond to the file type. |
| final long | getVersionNo() | Gets a version number used to tag processed documents so that they can be re-analyzed later if the stored version number differs from that of the current implementation. |
| protected org.apache.lucene.analysis.TokenStream | normalize(String fieldName, org.apache.lucene.analysis.TokenStream in) | |
| protected boolean | supportsScopes() | |
| Xrefer | writeXref(WriteXrefArgs args) | Derived classes should override to write a cross-referenced HTML file for the specified args. |

Methods inherited from class org.opengrok.indexer.analysis.AbstractAnalyzer
getFactory, getGenre, getSpecializedVersionNo, setCountsAggregator, setCtags, setFoldingEnabled, setProject, setScopesEnabled

Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- FileAnalyzer
  
  public FileAnalyzer(AnalyzerFactory factory)
  
  Creates a new instance of FileAnalyzer.
  
  Parameters:
  
  factory - defined instance for the analyzer
- FileAnalyzer
  
  protected FileAnalyzer(AnalyzerFactory factory, Supplier<JFlexTokenizer> symbolTokenizerFactory)
  
  Creates a new instance of FileAnalyzer.
  
  Parameters:
  
  factory - defined instance for the analyzer
  
  symbolTokenizerFactory - a defined instance relevant for the file

Method Details
- getCtagsLang
  
  public String getCtagsLang()
  
  Description copied from class: AbstractAnalyzer
  
  Subclasses should override to return the case-insensitive name aligning with either a built-in Universal Ctags language name or an OpenGrok custom language name.
  
  Specified by:
  
  getCtagsLang in class AbstractAnalyzer
  
  Returns:
  
  null as there is no aligned language
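The override contract described above can be sketched with stand-in classes. This is a minimal illustration, not OpenGrok code: `BaseAnalyzer`, `HypotheticalCAnalyzer`, and `matchesCtagsLang` are hypothetical names introduced here, and only the behavior stated in the description (a `null` default, a case-insensitive language name in subclasses) is taken from the documentation.

```java
public class CtagsLangSketch {

    /** Stand-in for the base analyzer: no aligned ctags language, so it returns null. */
    public static class BaseAnalyzer {
        public String getCtagsLang() {
            return null;
        }
    }

    /** Hypothetical subclass that aligns itself with the Universal Ctags "C" language. */
    public static class HypotheticalCAnalyzer extends BaseAnalyzer {
        @Override
        public String getCtagsLang() {
            return "C";
        }
    }

    /**
     * Hypothetical helper: compares the analyzer's language with a name
     * reported elsewhere, ignoring case and tolerating a null language.
     */
    public static boolean matchesCtagsLang(BaseAnalyzer analyzer, String reportedLang) {
        String lang = analyzer.getCtagsLang();
        return lang != null && lang.equalsIgnoreCase(reportedLang);
    }

    public static void main(String[] args) {
        System.out.println(matchesCtagsLang(new BaseAnalyzer(), "c"));
        System.out.println(matchesCtagsLang(new HypotheticalCAnalyzer(), "c"));
    }
}
```

Because the name is documented as case-insensitive, any caller comparing it should avoid a plain `equals` check, and should handle the `null` returned by analyzers with no aligned language.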
- getVersionNo
  
  public final long getVersionNo()
  
  Gets a version number used to tag processed documents so that they can be re-analyzed later if the stored version number differs from that of the current implementation.
  The value combines a FileAnalyzer root version, shifted into the upper 32 bits, with the value from AbstractAnalyzer.getSpecializedVersionNo() by bitwise OR. Changing the root version affects all analyzers simultaneously, while subclasses can override AbstractAnalyzer.getSpecializedVersionNo() to make changes that affect only a few.
  
  Specified by:
  
  getVersionNo in class AbstractAnalyzer
  
  Returns:
  
  (20061115_01 << 32) | AbstractAnalyzer.getSpecializedVersionNo()
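The returned expression can be worked through in isolation. The sketch below reproduces only the arithmetic; the class and method names are illustrative, the specialized version values are made up, and it assumes the specialized version is non-negative and fits in the lower 32 bits so the two halves do not overlap. The `needsReanalysis` check is a hypothetical illustration of how a stored version might be compared, not the indexer's actual logic.

```java
public class VersionNoSketch {

    // Root version from the documented expression; the Java literal
    // 20061115_01 is the number 2006111501 (underscores are separators).
    public static final long ROOT_VERSION = 20061115_01L;

    /** Packs the root version into the upper 32 bits and the specialized version into the lower 32. */
    public static long versionNo(long specializedVersionNo) {
        return (ROOT_VERSION << 32) | specializedVersionNo;
    }

    /** Hypothetical check: a document is stale when its stored version differs from the current one. */
    public static boolean needsReanalysis(long storedVersionNo, long currentVersionNo) {
        return storedVersionNo != currentVersionNo;
    }

    public static void main(String[] args) {
        long stored = versionNo(1L);   // made-up specialized version
        long current = versionNo(2L);  // the subclass bumped its specialized version

        // Both halves can be recovered from the packed value.
        System.out.println(current >>> 32);          // root version
        System.out.println(current & 0xFFFFFFFFL);   // specialized version
        System.out.println(needsReanalysis(stored, current));
    }
}
```

Bumping either half changes the combined value, which is why a root-version change invalidates documents from every analyzer while a specialized change invalidates only those from the overriding subclass.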
- supportsScopes
  
  protected boolean supportsScopes()
  
  Specified by:
  
  supportsScopes in class AbstractAnalyzer
- getFileTypeName
  
  public String getFileTypeName()
  
  Returns the normalized name of the analyzer, which should correspond to the file type. Example: The analyzer for the C language (CAnalyzer) would return “c”.
  
  Specified by:
  
  getFileTypeName in class AbstractAnalyzer
  
  Returns:
  
  Normalized name of the analyzer.
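One way such a normalized name could be derived is by lower-casing the class's simple name and dropping an "Analyzer" suffix. The sketch below assumes that rule purely for illustration; the documentation above states only the result for CAnalyzer, and the nested classes and the `fileTypeName` helper are hypothetical stand-ins.

```java
import java.util.Locale;

public class FileTypeNameSketch {

    /** Empty stand-ins used only for their class names. */
    public static class CAnalyzer { }
    public static class PlainText { }

    /**
     * Assumed normalization: lower-case the simple class name and strip a
     * trailing "analyzer" if present.
     */
    public static String fileTypeName(Class<?> analyzerClass) {
        String name = analyzerClass.getSimpleName().toLowerCase(Locale.ROOT);
        String suffix = "analyzer";
        if (name.endsWith(suffix)) {
            return name.substring(0, name.length() - suffix.length());
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(fileTypeName(CAnalyzer.class)); // prints "c"
        System.out.println(fileTypeName(PlainText.class)); // prints "plaintext"
    }
}
```

Using `Locale.ROOT` keeps the lower-casing independent of the JVM's default locale, which matters for names containing letters such as "I".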
- analyze
  
  public void analyze(org.apache.lucene.document.Document doc, StreamSource src, Writer xrefOut) throws IOException, InterruptedException
  
  Analyze the contents of a source file. This includes populating the Lucene document with fields to add to the index, and writing the cross-referenced data to the specified destination.
  
  Specified by:
  
  analyze in class AbstractAnalyzer
  
  Parameters:
  
  doc - the Lucene document
  
  src - the input data source
  
  xrefOut - where to write the xref (may be null)
  
  Throws:
  
  IOException - if an I/O error occurs
  
  InterruptedException - if a timeout occurs
- writeXref
  
  public Xrefer writeXref(WriteXrefArgs args) throws IOException
  
  Derived classes should override to write a cross-referenced HTML file for the specified args.
  
  Specified by:
  
  writeXref in class AbstractAnalyzer
  
  Parameters:
  
  args - a defined instance
  
  Returns:
  
  the instance used to write the cross-referencing
  
  Throws:
  
  IOException - if an error occurs
- createComponents
  
  protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName)
  
  Specified by:
  
  createComponents in class AbstractAnalyzer
- addNumLinesLOC
  
  protected void addNumLinesLOC(org.apache.lucene.document.Document doc, NumLinesLOC counts)
  
  Add fields to store document number-of-lines and lines-of-code (LOC).
  
  Specified by:
  
  addNumLinesLOC in class AbstractAnalyzer
  
  Parameters:
  
  doc - Document instance
  
  counts - NumLinesLOC instance
- normalize
  
  protected org.apache.lucene.analysis.TokenStream normalize(String fieldName, org.apache.lucene.analysis.TokenStream in)
  
  Specified by:
  
  normalize in class AbstractAnalyzer
