From 72231250ed81e10d66bfe70701e64fa5fe50f712 Mon Sep 17 00:00:00 2001 From: Jon Bratseth Date: Wed, 15 Jun 2016 23:09:44 +0200 Subject: Publish --- fsa/doc/fsa_file_format.html | 69 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+) create mode 100644 fsa/doc/fsa_file_format.html (limited to 'fsa/doc/fsa_file_format.html') diff --git a/fsa/doc/fsa_file_format.html b/fsa/doc/fsa_file_format.html new file mode 100644 index 00000000000..077edd627c3 --- /dev/null +++ b/fsa/doc/fsa_file_format.html @@ -0,0 +1,69 @@ + + + +fsa file format + + +

fsa file format

+ + + + + + +
header256 bytes
symbol tablesize bytes
state tablesize*4 bytes
data tabledata_size bytes
perfect hahs table (optional)size*4 bytes
+ +

header

+ + + + + + + + + + + + + +
fieldoffsetsizedescrption
magic04 (uint32)Magic number (0x79832469)
version44 (uint32)Version number of +the fsa library used for building this fsa
checksum84 (uint32)Checksum
size124 (uint32)Size of fsa (cells)
start164 (uint32)Start state
data_size204 (uint32)Size of data (bytes)
data_type244 (uint32)Type of data +items (0=variable size, 1=fixed size)
fixed_data_size284 (uint32)Data item size if fixed
has_perfect_hash324 +(uint32)Indicator for perfect hash (0=no, 1=yes)
serial364 (uint32)Serial number
reserved40216 (54*uint32)Reserved (pads size to 256 bytes)
+ +

symbol table and state table

+The symbol table and state table contain the transitions of the +automaton, each 1-byte entry in the symbol table corresponds to an +uint32 entry in the state table. For each state, a list of at most 254 +transistions is stored, as the symbol set is 8-bit characters, with +0x00 and 0xff reserved. Each state id is in fact an offset into these +tables. For a given state state, there exists a valid +transition for symbol sym if the symbol table contains +sym at offset state+sym. 0x00 means the +cell is empty, 0xff is a special symbol meaning that the given state +is a final state, anything else means invalid transition (i.e. the +cell is in use by some other state). For valid transitions, the +corresponding entry in the state table yields the next state. For 0xff +transitions, the state table entry contains the offset of the date +item within the data store. + +

data store

+The data store contains the data items for the final states. The 'new +state' entry of a final state transition in the state table (corresponding to the +special final state symbol 0xff) contains the data store offset of the data item +corresponding to that final state. If fixed size items are used, each +item takes fixed_data_size bytes as defined in the header. Variable +size items take 4 bytes (uint32 item_size) plus item_size +bytes. The size of the data store is given in the header. + +

perfect hash table

+The perfect hash table has one uint32 entry for each transition in the +symbol/state table, thus the size of the perfect hash table equals the +size of the state table. The perfect hash value for a final state is +calculated by adding all values in this table for the transitions +along the path from the start state to the final state. + + + + -- cgit v1.2.3