Class SoraniNormalizer

java.lang.Object
org.apache.lucene.analysis.ckb.SoraniNormalizer

class SoraniNormalizer extends Object
Normalizes the Unicode representation of Sorani text.

Normalization consists of:

  • Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
  • Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
  • Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
  • Alternate (joining) form of 'h' (06BE) is converted to 0647
  • Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
  • Harakat, tatweel, and formatting characters such as directional controls are removed.
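
The rules above can be sketched in plain Java. This is a minimal illustration written for this page, not the class's actual source. The class name `SoraniRulesSketch`, the constant names, and the `String` convenience overload are assumptions made for the example. It also assumes the buffer holds a single token, so "word-initial" means the first kept character and "word-final" means the last character of the input.

```java
final class SoraniRulesSketch {
  // Code points named in the list above (constant names are illustrative).
  static final char YEH = '\u064A', DOTLESS_YEH = '\u0649', FARSI_YEH = '\u06CC';
  static final char KAF = '\u0643', KEHEH = '\u06A9';
  static final char HEH = '\u0647', AE = '\u06D5', ZWNJ = '\u200C';
  static final char HEH_DOACHASHMEE = '\u06BE', TEH_MARBUTA = '\u0629';
  static final char REH = '\u0631', RREH = '\u0695', RREH_ABOVE = '\u0692';
  static final char TATWEEL = '\u0640';

  // Normalizes s[0..len) in place and returns the new logical length.
  static int normalize(char[] s, int len) {
    int out = 0; // write index: s[0..out) is already normalized
    for (int i = 0; i < len; i++) {
      char c = s[i];
      switch (c) {
        case YEH:
        case DOTLESS_YEH:
          c = FARSI_YEH;
          break;
        case KAF:
          c = KEHEH;
          break;
        case TEH_MARBUTA:
          c = AE;
          break;
        case HEH_DOACHASHMEE:
          c = HEH;
          break;
        case RREH_ABOVE:
          c = RREH;
          break;
        case HEH:
          if (i == len - 1) c = AE; // word-final heh
          break;
        case REH:
          if (out == 0) c = RREH; // word-initial reh
          break;
        case ZWNJ:
          // heh followed by ZWNJ becomes AE; the ZWNJ itself is dropped
          if (out > 0 && s[out - 1] == HEH) s[out - 1] = AE;
          continue;
        default:
          boolean harakat = c >= '\u064B' && c <= '\u0652';
          if (c == TATWEEL || harakat || Character.getType(c) == Character.FORMAT) {
            continue; // removed characters are simply not copied
          }
      }
      s[out++] = c;
    }
    return out;
  }

  // Convenience wrapper for experimenting with whole tokens.
  static String normalize(String token) {
    char[] buf = token.toCharArray();
    return new String(buf, 0, normalize(buf, buf.length));
  }
}
```

The sketch uses a separate write index, so each removal is just a skipped copy and the buffer is processed in a single pass.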
  • Field Summary

    Fields
    Modifier and Type                      Field    Description
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
    (package private) static final char
  • Constructor Summary

    Constructors
    Constructor    Description
     
  • Method Summary

    Modifier and Type        Method                          Description
    (package private) int    normalize(char[] s, int len)    Normalize an input buffer of Sorani text
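
The `normalize(char[] s, int len)` signature suggests the usual in-place buffer convention: the method rewrites the first `len` characters and returns the length that remains valid afterwards. Because the method is package private, callers outside `org.apache.lucene.analysis.ckb` would normally reach it through a token filter rather than directly. The sketch below only illustrates the calling pattern. `stripTatweel` is a hypothetical stand-in for the real method, and the class name is an assumption.

```java
final class NormalizeBufferUsageSketch {
  // Hypothetical stand-in with the same shape as normalize(char[] s, int len):
  // removes tatweel (U+0640) in place and returns the new logical length.
  static int stripTatweel(char[] s, int len) {
    int out = 0;
    for (int i = 0; i < len; i++) {
      if (s[i] != '\u0640') {
        s[out++] = s[i];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // The buffer may be larger than the token, as with a reusable term buffer.
    char[] buf = new char[16];
    String token = "\u06A9\u0640\u0648\u0640\u0631\u062F";
    token.getChars(0, token.length(), buf, 0);

    // Only buf[0..newLen) is meaningful after the call; the tail is stale.
    int newLen = stripTatweel(buf, token.length());
    String result = new String(buf, 0, newLen);
    System.out.println(result + " (" + newLen + " chars)");
  }
}
```

The caller must use the returned length and ignore the rest of the buffer, since characters past that index are leftovers from the original input.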

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait