java.lang.Object
org.apache.lucene.analysis.ja.dict.TokenInfoMorphData
All Implemented Interfaces:
JaMorphData, MorphData
Direct Known Subclasses:
UnknownMorphData

class TokenInfoMorphData extends Object implements JaMorphData
Morphological information for the system dictionary.
  • Field Details

    • buffer

      private final ByteBuffer buffer
    • posDict

      private final String[] posDict
    • inflTypeDict

      private final String[] inflTypeDict
    • inflFormDict

      private final String[] inflFormDict
    • HAS_BASEFORM

      public static final int HAS_BASEFORM
      Flag indicating that the entry has base form data. Otherwise the entry is not inflected and its base form is the same as the surface form.
    • HAS_READING

      public static final int HAS_READING
      Flag indicating that the entry has reading data. Otherwise the reading is the surface form converted to katakana.
    • HAS_PRONUNCIATION

      public static final int HAS_PRONUNCIATION
      Flag indicating that the entry has pronunciation data. Otherwise the pronunciation is the same as the reading.
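
The three flags above read as independent bits that can be combined in a single per-entry flags value. The following is a minimal sketch of that pattern, not the class's actual code: the numeric values 1, 2 and 4 are assumptions (this page does not list them), and `hasFlag` and `describe` are hypothetical helpers introduced for illustration.

```java
final class MorphFlagsSketch {
    // Assumed values: the page does not state them. Any three distinct
    // single-bit values would behave the same way in this sketch.
    static final int HAS_BASEFORM = 1;
    static final int HAS_READING = 2;
    static final int HAS_PRONUNCIATION = 4;

    /** Hypothetical helper: true if the given bit is set in flags. */
    static boolean hasFlag(int flags, int flag) {
        return (flags & flag) != 0;
    }

    /** Hypothetical helper: reports where each attribute would come from. */
    static String describe(int flags) {
        return "baseForm=" + (hasFlag(flags, HAS_BASEFORM) ? "stored" : "surface")
            + ", reading=" + (hasFlag(flags, HAS_READING) ? "stored" : "katakana(surface)")
            + ", pronunciation=" + (hasFlag(flags, HAS_PRONUNCIATION) ? "stored" : "reading");
    }

    public static void main(String[] args) {
        // An entry with base form and pronunciation data but no reading data.
        int flags = HAS_BASEFORM | HAS_PRONUNCIATION;
        System.out.println(describe(flags));
    }
}
```

Packing the flags into one integer keeps the per-entry overhead small, and an entry with no flags set needs no extra stored strings at all because every attribute can be derived from the surface form.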
  • Constructor Details

  • Method Details

    • populatePosDict

      private static void populatePosDict(DataInput in, int posSize, String[] posDict, String[] inflTypeDict, String[] inflFormDict) throws IOException
      Throws:
      IOException
    • getLeftId

      public int getLeftId(int morphId)
      Description copied from interface: MorphData
      Get the left id of the specified word
      Specified by:
      getLeftId in interface MorphData
      Returns:
      left id
    • getRightId

      public int getRightId(int morphId)
      Description copied from interface: MorphData
      Get the right id of the specified word
      Specified by:
      getRightId in interface MorphData
      Returns:
      right id
    • getWordCost

      public int getWordCost(int morphId)
      Description copied from interface: MorphData
      Get the word cost of the specified word
      Specified by:
      getWordCost in interface MorphData
      Returns:
      word's cost
    • getBaseForm

      public String getBaseForm(int morphId, char[] surfaceForm, int off, int len)
      Description copied from interface: JaMorphData
      Get the base form of the word
      Specified by:
      getBaseForm in interface JaMorphData
      Parameters:
      morphId - word ID of token
      Returns:
      The base form, returned only for inflected words (where it differs from the surface form); otherwise null
    • getReading

      public String getReading(int morphId, char[] surface, int off, int len)
      Description copied from interface: JaMorphData
      Get the reading of the token
      Specified by:
      getReading in interface JaMorphData
      Parameters:
      morphId - word ID of token
      Returns:
      Reading of the token
    • getPartOfSpeech

      public String getPartOfSpeech(int morphId)
      Description copied from interface: JaMorphData
      Get the part of speech of the token
      Specified by:
      getPartOfSpeech in interface JaMorphData
      Parameters:
      morphId - word ID of token
      Returns:
      Part of speech of the token
    • getPronunciation

      public String getPronunciation(int morphId, char[] surface, int off, int len)
      Description copied from interface: JaMorphData
      Get the pronunciation of the token
      Specified by:
      getPronunciation in interface JaMorphData
      Parameters:
      morphId - word ID of token
      Returns:
      Pronunciation of the token
    • getInflectionType

      public String getInflectionType(int morphId)
      Description copied from interface: JaMorphData
      Get the inflection type of the token
      Specified by:
      getInflectionType in interface JaMorphData
      Parameters:
      morphId - word ID of token
      Returns:
      inflection type, or null
    • getInflectionForm

      public String getInflectionForm(int wordId)
      Description copied from interface: JaMorphData
      Get the inflection form of the token
      Specified by:
      getInflectionForm in interface JaMorphData
      Parameters:
      wordId - word ID of token
      Returns:
      inflection form, or null
    • readingOffset

      private int readingOffset(int wordId)
    • pronunciationOffset

      private int pronunciationOffset(int wordId)
    • baseFormOffset

      private static int baseFormOffset(int wordId)
    • hasBaseFormData

      private boolean hasBaseFormData(int wordId)
    • hasReadingData

      private boolean hasReadingData(int wordId)
    • hasPronunciationData

      private boolean hasPronunciationData(int wordId)
    • readString

      private String readString(int offset, int length, boolean kana)
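
The fallback rules documented for getBaseForm, getReading and getPronunciation can be sketched as follows. This is an illustration under stated assumptions rather than the class's implementation: stored values are modelled as nullable strings instead of being read from the ByteBuffer, every method name here is hypothetical, and the katakana conversion assumes the usual Unicode layout in which hiragana U+3041..U+3096 sits 0x60 below the matching katakana.

```java
final class MorphFallbackSketch {
    /** Shifts hiragana to katakana; all other characters pass through unchanged. */
    static String toKatakana(char[] surface, int off, int len) {
        char[] out = new char[len];
        for (int i = 0; i < len; i++) {
            char ch = surface[off + i];
            // Hiragana block U+3041..U+3096 maps to katakana U+30A1..U+30F6.
            out[i] = (ch >= 0x3041 && ch <= 0x3096) ? (char) (ch + 0x60) : ch;
        }
        return new String(out);
    }

    /** Stored reading if present, otherwise the surface form in katakana. */
    static String reading(String storedReading, char[] surface, int off, int len) {
        return storedReading != null ? storedReading : toKatakana(surface, off, len);
    }

    /** Stored pronunciation if present, otherwise whatever the reading is. */
    static String pronunciation(String storedPronunciation, String storedReading,
                                char[] surface, int off, int len) {
        return storedPronunciation != null
            ? storedPronunciation
            : reading(storedReading, surface, off, len);
    }

    /** Caller-side handling of a null base form: fall back to the surface form. */
    static String baseFormOrSurface(String baseForm, char[] surface, int off, int len) {
        return baseForm != null ? baseForm : new String(surface, off, len);
    }

    public static void main(String[] args) {
        // Hiragana "su" + "shi", written as escapes to keep the source ASCII-only.
        char[] surface = "\u3059\u3057".toCharArray();
        System.out.println(reading(null, surface, 0, surface.length));
        System.out.println(pronunciation(null, null, surface, 0, surface.length));
        System.out.println(baseFormOrSurface(null, surface, 0, surface.length));
    }
}
```

With no stored data, the reading and the pronunciation of すし both come out as スシ, while the base form fallback returns the surface form すし unchanged.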