java.lang.Object
org.apache.lucene.util.fst.FST<T>
- All Implemented Interfaces:
Accountable
Represents an finite state machine (FST), using a compact byte[] format.
The format is similar to what's used by Morfologik (https://github.com/morfologik/morfologik-stemming).
See the package documentation
for some simple examples.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic final class
Represents a single arc.static class
Reads bytes stored in an FST.static final class
Represents the FST metadata.static enum
Specifies allowed range of each int input label for this FST. -
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final byte
Value of the arc flags to declare a node with fixed length (sparse) arcs designed for binary search.(package private) static final byte
Value of the arc flags to declare a node with continuous arcs designed for pos the arc directly with labelToPos - firstLabel.(package private) static final byte
Value of the arc flags to declare a node with fixed length dense arcs and bit table designed for direct addressing.private static final long
(package private) static final int
static final int
This flag is set if the arc has an output.(package private) static final int
(package private) static final int
(package private) static final int
(package private) static final int
private static final int
static final int
If arc has this label then that arc is final/acceptedprivate static final String
(package private) static final long
private final FSTReader
The reader of the FST, used to read bytes from the underlying FST storage(package private) final FST.FSTMetadata
<T> (package private) static final long
static final int
Version that was used when releasing Lucene 9.0.static final int
Version that started storing continuous arcs.static final int
Current version.private static final int
static final int
First supported version, this is the version that was used when releasing Lucene 7.0.Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
ConstructorsConstructorDescriptionFST
(FST.FSTMetadata<T> metadata, DataInput in) Load a previously saved FST with a DataInput for metdata using anOnHeapFSTStore
with maxBlockBits set toDEFAULT_MAX_BLOCK_BITS
FST
(FST.FSTMetadata<T> metadata, FSTReader fstReader) Create the FST with a metadata object and a FSTReader. -
Method Summary
Modifier and TypeMethodDescriptionfindTargetArc
(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) Finds an arc leaving the incoming arc, replacing the arc in place.private static boolean
flag
(int flags, int bit) static <T> FST
<T> fromFSTReader
(FST.FSTMetadata<T> fstMetadata, FSTReader fstReader) Create a FST from aFSTReader
.Returns aFST.BytesReader
for this FST, positioned at position 0.getFirstArc
(FST.Arc<T> arc) Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node(package private) static int
getNumPresenceBytes
(int labelRange) Gets the number of bytes required to flag the presence of each arc in the given label range, one bit per arc.(package private) boolean
isExpandedTarget
(FST.Arc<T> follow, FST.BytesReader in) Returns whetherarc
's target points to a node in expanded format (fixed length arcs).long
numBytes()
long
Return the memory usage of this object in bytes.static <T> FST
<T> Reads an automaton from a file.readArc
(FST.Arc<T> arc, FST.BytesReader in) Reads an arc.readArcByContinuous
(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) Reads a Continuous node arc, with the provided index in the label range.readArcByDirectAddressing
(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) Reads a present direct addressing node arc, with the provided index in the label range.readArcByDirectAddressing
(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex, int presenceIndex) Reads a present direct addressing node arc, with the provided index in the label range and its corresponding presence index (which is the count of presence bits before it).readArcByIndex
(FST.Arc<T> arc, FST.BytesReader in, int idx) (package private) static <T> FST.Arc
<T> readEndArc
(FST.Arc<T> follow, FST.Arc<T> arc) private void
readFirstArcInfo
(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) readFirstRealTargetArc
(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) readFirstTargetArc
(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) Follow thefollow
arc and read the first arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.int
Reads one BYTE1/2/4 label from the providedDataInput
.readLastArcByContinuous
(FST.Arc<T> arc, FST.BytesReader in) Reads the last arc of a continuous node.readLastArcByDirectAddressing
(FST.Arc<T> arc, FST.BytesReader in) Reads the last arc of a direct addressing node.readLastTargetArc
(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) Follows thefollow
arc and reads the last arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.static <T> FST.FSTMetadata
<T> readMetadata
(DataInput metaIn, Outputs<T> outputs) Read the FST metadata from DataInputreadNextArc
(FST.Arc<T> arc, FST.BytesReader in) In-place read; returns the arc.(package private) int
readNextArcLabel
(FST.Arc<T> arc, FST.BytesReader in) Peeks at next arc's label; does not alter arc.readNextRealArc
(FST.Arc<T> arc, FST.BytesReader in) Never returns null, but you should never call this if arc.isLast() is true.private void
readPresenceBytes
(FST.Arc<T> arc, FST.BytesReader in) Reads the presence bits of a direct-addressing node.private long
void
Writes an automaton to a file.void
save
(DataOutput metaOut, DataOutput out) Save the FST to DataOutput.private void
static <T> boolean
targetHasArcs
(FST.Arc<T> arc) returns true if the node at this address has any outgoing arcstoString()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
Field Details
-
metadata
-
BASE_RAM_BYTES_USED
private static final long BASE_RAM_BYTES_USED -
BIT_FINAL_ARC
static final int BIT_FINAL_ARC- See Also:
-
BIT_LAST_ARC
static final int BIT_LAST_ARC- See Also:
-
BIT_TARGET_NEXT
static final int BIT_TARGET_NEXT- See Also:
-
BIT_STOP_NODE
static final int BIT_STOP_NODE- See Also:
-
BIT_ARC_HAS_OUTPUT
public static final int BIT_ARC_HAS_OUTPUTThis flag is set if the arc has an output.- See Also:
-
BIT_ARC_HAS_FINAL_OUTPUT
static final int BIT_ARC_HAS_FINAL_OUTPUT- See Also:
-
ARCS_FOR_BINARY_SEARCH
public static final byte ARCS_FOR_BINARY_SEARCHValue of the arc flags to declare a node with fixed length (sparse) arcs designed for binary search.- See Also:
-
ARCS_FOR_DIRECT_ADDRESSING
static final byte ARCS_FOR_DIRECT_ADDRESSINGValue of the arc flags to declare a node with fixed length dense arcs and bit table designed for direct addressing.- See Also:
-
ARCS_FOR_CONTINUOUS
static final byte ARCS_FOR_CONTINUOUSValue of the arc flags to declare a node with continuous arcs designed for pos the arc directly with labelToPos - firstLabel. likeARCS_FOR_BINARY_SEARCH
we use flag combinations that will not occur at the same time.- See Also:
-
FILE_FORMAT_NAME
- See Also:
-
VERSION_START
public static final int VERSION_STARTFirst supported version, this is the version that was used when releasing Lucene 7.0.- See Also:
-
VERSION_LITTLE_ENDIAN
private static final int VERSION_LITTLE_ENDIAN- See Also:
-
VERSION_CONTINUOUS_ARCS
public static final int VERSION_CONTINUOUS_ARCSVersion that started storing continuous arcs.- See Also:
-
VERSION_CURRENT
public static final int VERSION_CURRENTCurrent version.- See Also:
-
VERSION_90
public static final int VERSION_90Version that was used when releasing Lucene 9.0.- See Also:
-
FINAL_END_NODE
static final long FINAL_END_NODE- See Also:
-
NON_FINAL_END_NODE
static final long NON_FINAL_END_NODE- See Also:
-
END_LABEL
public static final int END_LABELIf arc has this label then that arc is final/accepted- See Also:
-
fstReader
The reader of the FST, used to read bytes from the underlying FST storage -
outputs
-
DEFAULT_MAX_BLOCK_BITS
private static final int DEFAULT_MAX_BLOCK_BITS
-
-
Constructor Details
-
FST
Load a previously saved FST with a DataInput for metdata using anOnHeapFSTStore
with maxBlockBits set toDEFAULT_MAX_BLOCK_BITS
- Throws:
IOException
-
FST
FST(FST.FSTMetadata<T> metadata, FSTReader fstReader) Create the FST with a metadata object and a FSTReader.
-
-
Method Details
-
flag
private static boolean flag(int flags, int bit) -
fromFSTReader
Create a FST from aFSTReader
. Return null if the metadata is null.- Parameters:
fstMetadata
- the metadatafstReader
- the FSTReader- Returns:
- the FST
-
readMetadata
public static <T> FST.FSTMetadata<T> readMetadata(DataInput metaIn, Outputs<T> outputs) throws IOException Read the FST metadata from DataInput- Type Parameters:
T
- the output type- Parameters:
metaIn
- the DataInput of the metadataoutputs
- the FST outputs- Returns:
- the FST metadata
- Throws:
IOException
- if exception occurred during parsing
-
ramBytesUsed
public long ramBytesUsed()Description copied from interface:Accountable
Return the memory usage of this object in bytes. Negative values are illegal.- Specified by:
ramBytesUsed
in interfaceAccountable
-
toString
-
numBytes
public long numBytes() -
getEmptyOutput
-
getMetadata
-
save
Save the FST to DataOutput.- Parameters:
metaOut
- the DataOutput to write the metadata toout
- the DataOutput to write the FST bytes to- Throws:
IOException
-
save
Writes an automaton to a file.- Throws:
IOException
-
read
Reads an automaton from a file.- Throws:
IOException
-
readLabel
Reads one BYTE1/2/4 label from the providedDataInput
.- Throws:
IOException
-
targetHasArcs
returns true if the node at this address has any outgoing arcs -
getNumPresenceBytes
static int getNumPresenceBytes(int labelRange) Gets the number of bytes required to flag the presence of each arc in the given label range, one bit per arc. -
readPresenceBytes
Reads the presence bits of a direct-addressing node. Actually we don't read them here, we just keep the pointer to the bit-table start and we skip them.- Throws:
IOException
-
getFirstArc
Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node -
readLastTargetArc
FST.Arc<T> readLastTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException Follows thefollow
arc and reads the last arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.- Returns:
- Returns the second argument (
arc
). - Throws:
IOException
-
readUnpackedNodeTarget
- Throws:
IOException
-
readFirstTargetArc
public FST.Arc<T> readFirstTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException Follow thefollow
arc and read the first arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.- Returns:
- Returns the second argument (
arc
). - Throws:
IOException
-
readFirstArcInfo
private void readFirstArcInfo(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) throws IOException - Throws:
IOException
-
readFirstRealTargetArc
public FST.Arc<T> readFirstRealTargetArc(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) throws IOException - Throws:
IOException
-
isExpandedTarget
Returns whetherarc
's target points to a node in expanded format (fixed length arcs).- Throws:
IOException
-
readNextArc
In-place read; returns the arc.- Throws:
IOException
-
readNextArcLabel
Peeks at next arc's label; does not alter arc. Do not call this if arc.isLast()!- Throws:
IOException
-
readArcByIndex
- Throws:
IOException
-
readArcByContinuous
public FST.Arc<T> readArcByContinuous(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) throws IOException Reads a Continuous node arc, with the provided index in the label range.- Parameters:
rangeIndex
- The index of the arc in the label range. It must be within the label range.- Throws:
IOException
-
readArcByDirectAddressing
public FST.Arc<T> readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) throws IOException Reads a present direct addressing node arc, with the provided index in the label range.- Parameters:
rangeIndex
- The index of the arc in the label range. It must be present. The real arc offset is computed based on the presence bits of the direct addressing node.- Throws:
IOException
-
readArcByDirectAddressing
private FST.Arc<T> readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex, int presenceIndex) throws IOException Reads a present direct addressing node arc, with the provided index in the label range and its corresponding presence index (which is the count of presence bits before it).- Throws:
IOException
-
readLastArcByDirectAddressing
public FST.Arc<T> readLastArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in) throws IOException Reads the last arc of a direct addressing node. This method is equivalent to callreadArcByDirectAddressing(Arc, BytesReader, int)
withrangeIndex
equal toarc.numArcs() - 1
, but it is faster.- Throws:
IOException
-
readLastArcByContinuous
Reads the last arc of a continuous node.- Throws:
IOException
-
readNextRealArc
Never returns null, but you should never call this if arc.isLast() is true.- Throws:
IOException
-
readArc
Reads an arc.
Precondition: The arc flags byte has already been read and set; the given BytesReader is positioned just after the arc flags byte.- Throws:
IOException
-
readEndArc
-
findTargetArc
public FST.Arc<T> findTargetArc(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException Finds an arc leaving the incoming arc, replacing the arc in place. This returns null if the arc was not found, else the incoming arc.- Throws:
IOException
-
seekToNextNode
- Throws:
IOException
-
getBytesReader
Returns aFST.BytesReader
for this FST, positioned at position 0.
-