Class FileAnalyzer

java.lang.Object
org.apache.lucene.analysis.Analyzer
org.opengrok.indexer.analysis.AbstractAnalyzer
org.opengrok.indexer.analysis.FileAnalyzer
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
BZip2Analyzer, ELFAnalyzer, GZIPAnalyzer, JarAnalyzer, JavaClassAnalyzer, TarAnalyzer, TextAnalyzer, ZipAnalyzer

public class FileAnalyzer extends AbstractAnalyzer
Base class for all file analyzers. An analyzer for a file type provides:
  1. the file extensions and magic numbers it analyzes
  2. a Lucene document listing the fields it supports
  3. TokenStreams for each field from item 2 that requires tokenizing
  4. a cross-reference in HTML format
  5. the type of file data (plain text, etc.)
Created on September 21, 2005
Author:
Chandan
  • Constructor Details

    • FileAnalyzer

      public FileAnalyzer(AnalyzerFactory factory)
      Creates a new instance of FileAnalyzer.
      Parameters:
      factory - defined instance for the analyzer
    • FileAnalyzer

      protected FileAnalyzer(AnalyzerFactory factory, Supplier<JFlexTokenizer> symbolTokenizerFactory)
      Creates a new instance of FileAnalyzer.
      Parameters:
      factory - defined instance for the analyzer
      symbolTokenizerFactory - a defined instance relevant for the file
  • Method Details

    • getCtagsLang

      public String getCtagsLang()
      Description copied from class: AbstractAnalyzer
      Subclasses should override to return the case-insensitive name aligning with either a built-in Universal Ctags language name or an OpenGrok custom language name.
      Specified by:
      getCtagsLang in class AbstractAnalyzer
      Returns:
      null as there is no aligned language
    • getVersionNo

      public final long getVersionNo()
      Gets a version number used to tag processed documents so that they can be re-analyzed later if the stored version number differs from that of the current implementation.

      The value combines a FileAnalyzer root version, shifted into the upper 32 bits, with the value from AbstractAnalyzer.getSpecializedVersionNo(). Changing the root version affects all analyzers simultaneously, while subclasses can override AbstractAnalyzer.getSpecializedVersionNo() to make changes that affect only a few.

      Specified by:
      getVersionNo in class AbstractAnalyzer
      Returns:
      (20061115_01 << 32) | AbstractAnalyzer.getSpecializedVersionNo()
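
      The composition in the Returns line can be sketched in isolation as follows. This is an illustration under stated assumptions, not the OpenGrok source: the class VersionNoSketch and its methods compose and needsReanalysis are hypothetical names, and the specialized value is assumed to fit in the lower 32 bits. The sketch widens the root to long before shifting, because shifting a Java int by 32 leaves it unchanged.

```java
// Hypothetical stand-alone sketch; not part of OpenGrok.
final class VersionNoSketch {
    // Root version taken from the Returns line above.
    // Underscores are legal inside Java numeric literals.
    static final int ROOT_VERSION = 20061115_01;

    // Places the root version in the upper 32 bits and ORs in the
    // specialized version (assumed here to fit in the lower 32 bits).
    static long compose(long specializedVersionNo) {
        return ((long) ROOT_VERSION << 32) | specializedVersionNo;
    }

    // A document needs re-analysis when its stored version number
    // differs from the one the current implementation reports.
    static boolean needsReanalysis(long storedVersionNo, long currentVersionNo) {
        return storedVersionNo != currentVersionNo;
    }

    public static void main(String[] args) {
        long stored = compose(0L);   // version recorded at indexing time
        long current = compose(1L);  // a subclass bumped its specialized version
        System.out.println("stored  = 0x" + Long.toHexString(stored));
        System.out.println("current = 0x" + Long.toHexString(current));
        System.out.println("re-analyze: " + needsReanalysis(stored, current));
    }
}
```

      Under this layout, bumping the specialized part changes only the low bits, so only documents handled by that one analyzer compare as stale, whereas a new root version changes the result for every analyzer.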
    • supportsScopes

      protected boolean supportsScopes()
      Specified by:
      supportsScopes in class AbstractAnalyzer
    • getFileTypeName

      public String getFileTypeName()
      Returns the normalized name of the analyzer, which should correspond to the file type. Example: the analyzer for the C language (CAnalyzer) would return “c”.
      Specified by:
      getFileTypeName in class AbstractAnalyzer
      Returns:
      Normalized name of the analyzer.
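
      One way such a normalized name could be derived is from the analyzer's simple class name. The sketch below is an assumption for illustration only: the class FileTypeNameSketch, the method normalizedName, and the rule of lower-casing and dropping an "analyzer" suffix are hypothetical and are not confirmed by this page.

```java
import java.util.Locale;

// Hypothetical stand-alone sketch; not part of OpenGrok.
final class FileTypeNameSketch {
    // Assumed rule: lower-case the simple class name and strip a
    // trailing "analyzer", so that "CAnalyzer" maps to "c".
    static String normalizedName(String simpleClassName) {
        String name = simpleClassName.toLowerCase(Locale.ROOT);
        String suffix = "analyzer";
        if (name.endsWith(suffix)) {
            return name.substring(0, name.length() - suffix.length());
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(normalizedName("CAnalyzer"));
        System.out.println(normalizedName("TextAnalyzer"));
    }
}
```

      Locale.ROOT keeps the lower-casing independent of the default locale of the JVM.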
    • analyze

      public void analyze(org.apache.lucene.document.Document doc, StreamSource src, Writer xrefOut) throws IOException, InterruptedException
      Analyze the contents of a source file. This includes populating the Lucene document with fields to add to the index, and writing the cross-referenced data to the specified destination.
      Specified by:
      analyze in class AbstractAnalyzer
      Parameters:
      doc - the Lucene document
      src - the input data source
      xrefOut - where to write the xref (may be null)
      Throws:
      IOException - if an I/O error occurs
      InterruptedException - if a timeout occurs
    • writeXref

      public Xrefer writeXref(WriteXrefArgs args) throws IOException
      Derived classes should override this method to write a cross-referenced HTML file for the specified args.
      Specified by:
      writeXref in class AbstractAnalyzer
      Parameters:
      args - a defined instance
      Returns:
      the instance used to write the cross-referencing
      Throws:
      IOException - if an error occurs
    • createComponents

      protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName)
      Specified by:
      createComponents in class AbstractAnalyzer
    • addNumLinesLOC

      protected void addNumLinesLOC(org.apache.lucene.document.Document doc, NumLinesLOC counts)
      Add fields to store document number-of-lines and lines-of-code (LOC).
      Specified by:
      addNumLinesLOC in class AbstractAnalyzer
      Parameters:
      doc - Document instance
      counts - NumLinesLOC instance
    • normalize

      protected org.apache.lucene.analysis.TokenStream normalize(String fieldName, org.apache.lucene.analysis.TokenStream in)
      Specified by:
      normalize in class AbstractAnalyzer