makefsa(1)

Name
makefsa - create finite state automata files from text or binary input

Synopsis
makefsa OPTIONS input_file fsa_file

Description
makefsa creates a finite state automaton file from text or binary input. If input_file is not specified, standard input is used. The input must be sorted and must not contain duplicate input strings (unsorted or duplicate entries are ignored).

Options
    use text input format, with empty meta info (default)
    use text input format
    use binary input format, with base64 encoded meta info
    use binary input format with raw meta info
    use text input with numerical meta info
    data size in bytes for numerical meta info (default=4)
    ignore meta info regardless of input format
    build the automaton with a perfect hash
    set serial number of automaton (default=0)
    be verbose, display progress information and statistics
    display usage help
    display version number

Input formats

Text input format with empty meta info
The input strings are terminated with '\n', and must not contain '\0', '\xff' or '\n' characters. This is the default.

Text input format
Input lines are terminated with '\n'; the input string and meta info are separated by '\t'. Input and meta strings must not contain '\0', '\xff', '\n' or '\t' characters. A terminating '\0' is added to the meta info when it is stored in the automaton.

Text input format with numerical info
Input lines are terminated with '\n'; the input string and meta info are separated by '\t'. Input strings must not contain '\0', '\xff', '\n' or '\t' characters. Meta strings are unsigned integers ([0-9]+), which are stored in binary representation in the automaton. The size of the data is controlled by the data size option; valid values are 1, 2 or 4 bytes, corresponding to uint8_t, uint16_t and uint32_t, respectively. (Default is 4 bytes.)

Binary input format, with base64 encoded meta info
Both the input string and meta info are terminated by '\0'. The input string must not contain the reserved characters '\0' and '\xff'.
The meta info is base64 encoded, as it may contain any character.

Binary input format with raw meta info
Both the input string and meta info are terminated by '\0'. The input string must not contain the reserved characters '\0' and '\xff'. The meta info must not contain '\0'.

Perfect hashes
Automata built with the perfect hash option contain an additional data structure that maps the strings stored in the automaton to unique integers in the range [0, n-1], where n is the number of accepted strings. The size of the fsa file will increase by up to 80%. Lookup time is slightly longer if the hash value needs to be retrieved (but still O(m), where m is the length of the input). Reverse lookup (from hash value back to string) is also possible, though it is more expensive (also O(m), but with a much higher constant).

See also
fsainfo, fsadump.

Author
Written by Peter Boros.
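
Examples
The constraints listed under Input formats can be checked before makefsa is run. The following Python sketch is not part of makefsa; the helper names (check_text_field, text_lines, numeric_meta_fits, binary_record) are hypothetical. It assumes that "sorted" means ordering by byte value, which this page does not state explicitly, and it only checks whether a numerical meta value fits the chosen data size, since the handling of out-of-range values is not described here.

```python
import base64

# Bytes that the tab-separated text format reserves in input and meta strings.
RESERVED_TEXT = (b"\x00", b"\xff", b"\n", b"\t")


def check_text_field(field: bytes) -> None:
    """Raise ValueError if a field contains a reserved byte."""
    for ch in RESERVED_TEXT:
        if ch in field:
            raise ValueError(f"reserved byte {ch!r} in {field!r}")


def text_lines(entries: dict) -> bytes:
    """Render {input string: meta info} as sorted 'input<TAB>meta<LF>' lines.

    Dict keys are unique, so no duplicate input strings can be produced.
    sorted() on bytes compares byte values (assumed ordering).
    """
    out = []
    for key in sorted(entries):
        meta = entries[key]
        check_text_field(key)
        check_text_field(meta)
        out.append(key + b"\t" + meta + b"\n")
    return b"".join(out)


def numeric_meta_fits(value: int, size: int = 4) -> bool:
    """True if value fits an unsigned integer of 1, 2 or 4 bytes."""
    if size not in (1, 2, 4):
        raise ValueError("data size must be 1, 2 or 4")
    return 0 <= value < 1 << (8 * size)


def binary_record(key: bytes, meta: bytes) -> bytes:
    """Build one record for the binary format with base64 encoded meta info.

    Both fields are terminated by a zero byte; the meta info may hold any
    bytes because it is base64 encoded first.
    """
    if b"\x00" in key or b"\xff" in key:
        raise ValueError(f"reserved byte in {key!r}")
    return key + b"\x00" + base64.b64encode(meta) + b"\x00"
```

The bytes returned by text_lines or a concatenation of binary_record results can be written to a file and passed to makefsa as input_file, together with the option that selects the matching input format.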