reflex::Pattern Class Reference

updated Tue Oct 29 2024 by Robert van Engelen
 
Classes | Public Types | Public Member Functions | Static Public Member Functions | Protected Member Functions | Private Types | Private Member Functions | Static Private Member Functions | Private Attributes | Friends | List of all members
reflex::Pattern Class Reference

Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine. More...

#include <pattern.h>

Collaboration diagram for reflex::Pattern:
Collaboration graph
[legend]

Classes

struct  Chars
 Set of chars and meta chars. More...
 
struct  Const
 Common constants. More...
 
struct  DFA
 DFA created by subset construction from regex patterns. More...
 
struct  HFA
 Indexing hash finite state automaton for indexed file search. More...
 
struct  ModConst
 Modifiers 'i', 'm', 'q', 's', 'u' (enable) 'I', 'M', 'Q', 'S', 'U' (disable) More...
 
struct  Option
 Global modifier modes, syntax flags, and compiler options. More...
 
struct  Position
 Finite state machine construction position information. More...
 

Public Types

typedef uint8_t Bitap
 bitap bitmask, may change in a future update More...
 
typedef uint8_t Pred
 predict match bits More...
 
typedef uint16_t Hash
 hash value type, max value is Const::HASH More...
 
typedef uint32_t Index
 index into opcodes array Pattern::opc_ and subpattern indexing More...
 
typedef uint32_t Accept
 group capture index More...
 
typedef uint32_t Opcode
 32 bit opcode word More...
 
typedef void(* FSM) (class Matcher &)
 

Public Member Functions

 Pattern ()
 Construct an unset pattern. More...
 
 Pattern (const char *regex, const char *options=NULL)
 Construct a pattern object given a regex string. More...
 
 Pattern (const char *regex, const std::string &options)
 Construct a pattern object given a regex string. More...
 
 Pattern (const std::string &regex, const char *options=NULL)
 Construct a pattern object given a regex string. More...
 
 Pattern (const std::string &regex, const std::string &options)
 Construct a pattern object given a regex string. More...
 
 Pattern (const Opcode *code, const uint8_t *pred=NULL)
 Construct a pattern object given an opcode table. More...
 
 Pattern (FSM fsm, const uint8_t *pred=NULL)
 Construct a pattern object given a function pointer to FSM code. More...
 
 Pattern (const Pattern &pattern)
 Copy constructor. More...
 
virtual ~Pattern ()
 Destructor, deletes internal code array when owned and allocated. More...
 
void clear ()
 Clear and delete pattern data. More...
 
Patternassign (const char *regex, const char *options=NULL)
 Assign a (new) pattern. More...
 
Patternassign (const char *regex, const std::string &options)
 Assign a (new) pattern. More...
 
Patternassign (const std::string &regex, const char *options=NULL)
 Assign a (new) pattern. More...
 
Patternassign (const std::string &regex, const std::string &options)
 Assign a (new) pattern. More...
 
Patternassign (const Opcode *code, const uint8_t *pred=NULL)
 Assign a (new) pattern. More...
 
Patternassign (FSM fsm, const uint8_t *pred=NULL)
 Assign a (new) pattern. More...
 
Patternoperator= (const Pattern &pattern)
 Assign a (new) pattern. More...
 
Patternoperator= (const char *regex)
 Assign a (new) pattern. More...
 
Patternoperator= (const std::string &regex)
 Assign a (new) pattern. More...
 
Patternoperator= (const Opcode *code)
 Assign a (new) pattern. More...
 
Patternoperator= (FSM fsm)
 Assign a (new) pattern. More...
 
Accept size () const
 Get the number of subpatterns of this pattern object. More...
 
bool empty () const
 Return true if this pattern is not assigned. More...
 
const std::string operator[] (Accept choice) const
 Get subpattern regex of this pattern object or the whole regex with index 0. More...
 
bool reachable (Accept choice) const
 Check if subpattern is reachable by a match. More...
 
size_t nodes () const
 Get the number of finite state machine nodes (vertices). More...
 
size_t edges () const
 Get the number of finite state machine edges (transitions on input characters). More...
 
size_t words () const
 Get the code size in number of words. More...
 
size_t hashes () const
 Get the total number of indexing hash tables constructed for the optional HFA. More...
 
float parse_time () const
 Get elapsed regex parsing and analysis time. More...
 
float nodes_time () const
 Get elapsed DFA vertices construction time. More...
 
float edges_time () const
 Get elapsed DFA edges construction time. More...
 
float words_time () const
 Get elapsed code words assembly time. More...
 
float analysis_time () const
 Get elapsed time of DFA analysis to predict matches and construct an optional HFA. More...
 
bool predict_match (const char *s, size_t n) const
 Returns true when match is predicted, based on s[0..3..e-1] (e >= s + 4). More...
 
bool predict_match (const char *s) const
 Returns true when match is predicted using my PM4 logic. More...
 
bool has_hfa () const
 
bool match_hfa (const uint8_t *indexed, size_t size) const
 

Static Public Member Functions

static uint8_t frequency (uint8_t c)
 Relative frequency of English letters with upper/lower-case ratio = 0.0563, punctuation and UTF-8 bytes. More...
 

Protected Member Functions

virtual void error (regex_error_type code, size_t pos=0) const
 Throw an error. More...
 

Private Types

enum  Meta {
  META_MIN = 0x100, META_WBB = 0x101, META_WBE = 0x102, META_NWB = 0x103,
  META_NWE = 0x104, META_BWB = 0x105, META_EWB = 0x106, META_BWE = 0x107,
  META_EWE = 0x108, META_BOL = 0x109, META_EOL = 0x10a, META_BOB = 0x10b,
  META_EOB = 0x10c, META_UND = 0x10d, META_IND = 0x10e, META_DED = 0x10f,
  META_MAX
}
 Meta characters. More...
 
typedef uint8_t Mod
 
typedef uint16_t Char
 
typedef uint8_t Lazy
 
typedef uint16_t Iter
 
typedef uint16_t Lookahead
 
typedef std::set< LookaheadLookaheads
 
typedef uint32_t Location
 
typedef ORanges< LocationLocations
 
typedef std::map< int, LocationsMap
 
typedef Locations Mods[10]
 
typedef std::vector< PositionLazypos
 
typedef std::vector< PositionPositions
 
typedef std::map< Position, PositionsFollow
 
typedef std::pair< Chars, PositionsMove
 
typedef std::list< MoveMoves
 

Private Member Functions

void init (const char *options, const uint8_t *pred=NULL)
 Initialize the pattern at construction. More...
 
void init_options (const char *options)
 
void parse (Positions &startpos, Follow &followpos, Lazypos &lazypos, Mods modifiers, Map &lookahead)
 
void parse1 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Lazy &lazyidx, Lazypos &lazypos, Mods modifiers, Locations &lookahead, Iter &iter)
 
void parse2 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Lazy &lazyidx, Lazypos &lazypos, Mods modifiers, Locations &lookahead, Iter &iter)
 
void parse3 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Lazy &lazyidx, Lazypos &lazypos, Mods modifiers, Locations &lookahead, Iter &iter)
 
void parse4 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Lazy &lazyidx, Lazypos &lazypos, Mods modifiers, Locations &lookahead, Iter &iter)
 
Char parse_esc (Location &loc, Chars *chars=NULL) const
 
void compile (DFA::State *start, Follow &followpos, const Lazypos &lazypos, const Mods modifiers, const Map &lookahead)
 
void lazy (const Lazypos &lazypos, Positions &pos) const
 
void lazy (const Lazypos &lazypos, const Positions &pos, Positions &pos1) const
 
void greedy (Positions &pos) const
 
void trim_anchors (Positions &follow) const
 
void trim_lazy (Positions *pos, const Lazypos &lazypos) const
 
void compile_transition (DFA::State *state, Follow &followpos, const Lazypos &lazypos, const Mods modifiers, const Map &lookahead, Moves &moves) const
 
void transition (Moves &moves, Chars &chars, const Positions &follow) const
 
void compile_list (Location loc, Chars &chars, const Mods modifiers) const
 
void posix (size_t index, Chars &chars) const
 
void flip (Chars &chars) const
 
void assemble (DFA::State *start)
 
void compact_dfa (DFA::State *start)
 
void encode_dfa (DFA::State *start)
 
void gencode_dfa (const DFA::State *start) const
 
void check_dfa_closure (const DFA::State *state, int nest, bool &peek) const
 
void gencode_dfa_closure (FILE *fd, const DFA::State *start, int nest, bool peek) const
 
void graph_dfa (const DFA::State *start) const
 
void export_code () const
 
void analyze_dfa (DFA::State *start)
 
void gen_min (std::set< DFA::State * > &states)
 
void gen_predict_match (std::set< DFA::State * > &states)
 
void gen_predict_match_start (std::set< DFA::State * > &states, std::map< DFA::State *, std::pair< ORanges< Hash >, ORanges< Char > > > &first_hashes)
 
void gen_predict_match_transitions (size_t level, DFA::State *state, const std::pair< ORanges< Hash >, ORanges< Char > > &previous, std::map< DFA::State *, std::pair< ORanges< Hash >, ORanges< Char > > > &level_hashes)
 
void gen_match_hfa (DFA::State *start)
 
void gen_match_hfa_start (DFA::State *start, HFA::State &index, HFA::StateHashes &hashes)
 
bool gen_match_hfa_transitions (size_t level, size_t &max_level, DFA::State *state, const HFA::HashRanges &previous, HFA::State &index, HFA::StateHashes &hashes)
 
bool match_hfa_transitions (size_t level, const HFA::Hashes &hashes, const uint8_t *indexed, size_t size, HFA::VisitSet &visit, HFA::VisitSet &next_visit, bool &accept) const
 
void write_predictor (FILE *fd) const
 
void write_namespace_open (FILE *fd) const
 
void write_namespace_close (FILE *fd) const
 
size_t find_at (Location loc, char c) const
 
Char at (Location k) const
 
bool eq_at (Location loc, const char *s) const
 
Char escape_at (Location loc) const
 
Char escapes_at (Location loc, const char *escapes) const
 

Static Private Member Functions

static void pos_insert (Positions &s1, const Positions &s2)
 
static void pos_add (Positions &s, const Position &e)
 
static void lazy_insert (Lazypos &s1, const Lazypos &s2)
 
static void lazy_add (Lazypos &s, const Lazy i, Location p)
 
static bool is_modified (Mod mod, const Mods modifiers, Location loc)
 
static void update_modified (Mod mod, Mods modifiers, Location from, Location to)
 
static uint16_t hash_pos (const Positions *pos)
 
static bool valid_goto_index (Index index)
 
static bool valid_take_index (Index index)
 
static bool valid_lookahead_index (Index index)
 
static bool is_meta (Char c)
 
static Opcode opcode_long (Index index)
 
static Opcode opcode_take (Index index)
 
static Opcode opcode_redo ()
 
static Opcode opcode_tail (Index index)
 
static Opcode opcode_head (Index index)
 
static Opcode opcode_goto (Char lo, Char hi, Index index)
 
static Opcode opcode_halt ()
 
static bool is_opcode_long (Opcode opcode)
 
static bool is_opcode_take (Opcode opcode)
 
static bool is_opcode_redo (Opcode opcode)
 
static bool is_opcode_tail (Opcode opcode)
 
static bool is_opcode_head (Opcode opcode)
 
static bool is_opcode_halt (Opcode opcode)
 
static bool is_opcode_goto (Opcode opcode)
 
static bool is_opcode_meta (Opcode opcode)
 
static bool is_opcode_goto (Opcode opcode, unsigned char c)
 
static Char meta_of (Opcode opcode)
 
static Char lo_of (Opcode opcode)
 
static Char hi_of (Opcode opcode)
 
static Index index_of (Opcode opcode)
 
static Index long_index_of (Opcode opcode)
 
static Lookahead lookahead_of (Opcode opcode)
 
static Char lowercase (Char c)
 convert to lower case if c is a letter a-z, A-Z. More...
 
static Char uppercase (Char c)
 convert to upper case if c is a letter a-z, A-Z. More...
 
static uint32_t hash (uint32_t h, uint8_t b)
 predict match hash 0 <= hash() < Const::HASH. More...
 
static uint32_t bihash (uint8_t a, uint8_t b)
 bitap character pairs hash More...
 
static uint32_t indexhash (Hash h, uint8_t b)
 file indexing hash 0 <= indexhash() < 65536, must be additive: indexhash(x,b+1) = indexhash(x,b)+1 modulo 2^16. More...
 

Private Attributes

Option opt_
 pattern compiler options More...
 
HFA hfa_
 indexing hash finite state automaton More...
 
DFA tfa_
 tree DFA constructed from strings More...
 
DFA dfa_
 DFA constructed from regex with subset construction using firstpos/lastpos/followpos. More...
 
std::string rex_
 regular expression string More...
 
std::vector< Locationend_
 entries point to the subpattern's ending '|' or '\0' More...
 
std::vector< bool > acc_
 true if subpattern n is accepting (state is reachable) More...
 
size_t vno_
 number of finite state machine vertices |V| (nodes) More...
 
size_t eno_
 number of finite state machine edges |E| (arrows) More...
 
size_t hno_
 number of indexing hash tables (HFA edges) More...
 
const Opcodeopc_
 points to the table with compiled finite state machine opcodes More...
 
FSM fsm_
 function pointer to FSM code More...
 
Index nop_
 number of opcodes generated More...
 
Index cut_
 DFA s-t cut to improve predict match and HFA accuracy with lbk_ and cbk_. More...
 
size_t len_
 length of chr_[], less or equal to 255 More...
 
size_t min_
 patterns after the prefix are at least this long but no more than 8 More...
 
size_t pin_
 number of needles, 0 to 16 More...
 
std::bitset< 256 > cbk_
 characters to look back over when lbk_ > 0, never includes
More...
 
std::bitset< 256 > fst_
 the beginning characters of the pattern More...
 
char chr_ [256]
 pattern prefix string or character needles for needle-based search More...
 
Bitap bit_ [256]
 bitsets of characters for the first positions (one position per bit) More...
 
Bitap tap_ [Const::BTAP]
 bitap hashed character pairs array More...
 
Pred pmh_ [Const::HASH]
 predict-match bloom filter hash up to first 8 positions More...
 
Pred pma_ [Const::HASH]
 predict-match 4 (PM4) array More...
 
uint16_t lbk_
 lookback distance or 0xffff unlimited lookback or 0 for no lookback (empty cbk_) More...
 
uint16_t lbm_
 loopback minimum distance when lbk_ > 0 More...
 
uint16_t lcp_
 primary least common character position in the pattern or 0xffff More...
 
uint16_t lcs_
 secondary least common character position in the pattern or 0xffff More...
 
size_t bmd_
 Boyer-Moore jump distance on mismatch, B-M is enabled when bmd_ > 0 (<= 255) More...
 
uint8_t bms_ [256]
 Boyer-Moore skip array. More...
 
float pms_
 ms elapsed time to parse regex More...
 
float vms_
 ms elapsed time to compile DFA vertices More...
 
float ems_
 ms elapsed time to compile DFA edges More...
 
float wms_
 ms elapsed time to assemble code words More...
 
float ams_
 ms elapsed time to analyze DFA for predict match and HFA More...
 
uint16_t npy_
 entropy derived from the bitap array bit_[] More...
 
bool one_
 true if matching one string stored in chr_[] without meta/anchors More...
 
bool bol_
 true if matching all patterns at the begin of a line with anchor ^ More...
 

Friends

class Matcher
 permit access by the reflex::Matcher engine More...
 
class FuzzyMatcher
 permit access by the reflex::FuzzyMatcher engine More...
 

Detailed Description

Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.

Member Typedef Documentation

typedef uint32_t reflex::Pattern::Accept

group capture index

typedef uint8_t reflex::Pattern::Bitap

bitap bitmask, may change in a future update

typedef uint16_t reflex::Pattern::Char
private
typedef std::map<Position,Positions> reflex::Pattern::Follow
private
typedef void(* reflex::Pattern::FSM) (class Matcher &)

function pointer to FSM code

typedef uint16_t reflex::Pattern::Hash

hash value type, max value is Const::HASH

typedef uint32_t reflex::Pattern::Index

index into opcodes array Pattern::opc_ and subpattern indexing

typedef uint16_t reflex::Pattern::Iter
private
typedef uint8_t reflex::Pattern::Lazy
private
typedef std::vector<Position> reflex::Pattern::Lazypos
private
typedef uint32_t reflex::Pattern::Location
private
typedef uint16_t reflex::Pattern::Lookahead
private
typedef std::set<Lookahead> reflex::Pattern::Lookaheads
private
typedef std::map<int,Locations> reflex::Pattern::Map
private
typedef uint8_t reflex::Pattern::Mod
private
typedef Locations reflex::Pattern::Mods[10]
private
typedef std::pair<Chars,Positions> reflex::Pattern::Move
private
typedef std::list<Move> reflex::Pattern::Moves
private
typedef uint32_t reflex::Pattern::Opcode

32 bit opcode word

typedef std::vector<Position> reflex::Pattern::Positions
private
typedef uint8_t reflex::Pattern::Pred

predict match bits

Member Enumeration Documentation

enum reflex::Pattern::Meta
private

Meta characters.

Enumerator
META_MIN 
META_WBB 

word boundary at begin \bx

META_WBE 

word boundary at end x\b

META_NWB 

non-word boundary at begin \Bx

META_NWE 

non-word boundary at end x\B

META_BWB 

begin of word at begin \<x

META_EWB 

end of word at begin \>x

META_BWE 

begin of word at end x\<

META_EWE 

end of word at end x\>

META_BOL 

begin of line ^

META_EOL 

end of line $

META_BOB 

begin of buffer \A

META_EOB 

end of buffer \Z

META_UND 

undent boundary \k

META_IND 

indent boundary \i (must be one but the largest META code)

META_DED 

dedent boundary \j (must be the largest META code)

META_MAX 

max meta characters

Constructor & Destructor Documentation

reflex::Pattern::Pattern ( )
inline

Construct an unset pattern.

reflex::Pattern::Pattern ( const char *  regex,
const char *  options = NULL 
)
inlineexplicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern ( const char *  regex,
const std::string &  options 
)
inline

Construct a pattern object given a regex string.

reflex::Pattern::Pattern ( const std::string &  regex,
const char *  options = NULL 
)
inlineexplicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern ( const std::string &  regex,
const std::string &  options 
)
inline

Construct a pattern object given a regex string.

reflex::Pattern::Pattern ( const Opcode code,
const uint8_t *  pred = NULL 
)
inlineexplicit

Construct a pattern object given an opcode table.

reflex::Pattern::Pattern ( FSM  fsm,
const uint8_t *  pred = NULL 
)
inlineexplicit

Construct a pattern object given a function pointer to FSM code.

reflex::Pattern::Pattern ( const Pattern pattern)
inline

Copy constructor.

Parameters
patternpattern to copy
virtual reflex::Pattern::~Pattern ( )
inlinevirtual

Destructor, deletes internal code array when owned and allocated.

Member Function Documentation

float reflex::Pattern::analysis_time ( ) const
inline

Get elapsed time of DFA analysis to predict matches and construct an optional HFA.

Returns
time in ms
void reflex::Pattern::analyze_dfa ( DFA::State start)
private
void reflex::Pattern::assemble ( DFA::State start)
private
Pattern& reflex::Pattern::assign ( const char *  regex,
const char *  options = NULL 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( const char *  regex,
const std::string &  options 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( const std::string &  regex,
const char *  options = NULL 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( const std::string &  regex,
const std::string &  options 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( const Opcode code,
const uint8_t *  pred = NULL 
)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::assign ( FSM  fsm,
const uint8_t *  pred = NULL 
)
inline

Assign a (new) pattern.

Char reflex::Pattern::at ( Location  k) const
inlineprivate
static uint32_t reflex::Pattern::bihash ( uint8_t  a,
uint8_t  b 
)
inlinestaticprivate

bitap character pairs hash

void reflex::Pattern::check_dfa_closure ( const DFA::State state,
int  nest,
bool &  peek 
) const
private
void reflex::Pattern::clear ( )
inline

Clear and delete pattern data.

void reflex::Pattern::compact_dfa ( DFA::State start)
private
void reflex::Pattern::compile ( DFA::State start,
Follow followpos,
const Lazypos lazypos,
const Mods  modifiers,
const Map lookahead 
)
private
void reflex::Pattern::compile_list ( Location  loc,
Chars chars,
const Mods  modifiers 
) const
private
void reflex::Pattern::compile_transition ( DFA::State state,
Follow followpos,
const Lazypos lazypos,
const Mods  modifiers,
const Map lookahead,
Moves moves 
) const
private
size_t reflex::Pattern::edges ( ) const
inline

Get the number of finite state machine edges (transitions on input characters).

Returns
number of edges or 0 when no finite state machine was constructed by this pattern
float reflex::Pattern::edges_time ( ) const
inline

Get elapsed DFA edges construction time.

Returns
time in ms
bool reflex::Pattern::empty ( ) const
inline

Return true if this pattern is not assigned.

Returns
true if this pattern is not assigned
void reflex::Pattern::encode_dfa ( DFA::State start)
private
bool reflex::Pattern::eq_at ( Location  loc,
const char *  s 
) const
inlineprivate
virtual void reflex::Pattern::error ( regex_error_type  code,
size_t  pos = 0 
) const
protectedvirtual

Throw an error.

Parameters
codeerror code
posoptional location of the error in regex string Pattern::rex_
Char reflex::Pattern::escape_at ( Location  loc) const
inlineprivate
Char reflex::Pattern::escapes_at ( Location  loc,
const char *  escapes 
) const
inlineprivate
void reflex::Pattern::export_code ( ) const
private
size_t reflex::Pattern::find_at ( Location  loc,
char  c 
) const
inlineprivate
void reflex::Pattern::flip ( Chars chars) const
private
static uint8_t reflex::Pattern::frequency ( uint8_t  c)
inlinestatic

Relative frequency of English letters with upper/lower-case ratio = 0.0563, punctuation and UTF-8 bytes.

void reflex::Pattern::gen_match_hfa ( DFA::State start)
private
void reflex::Pattern::gen_match_hfa_start ( DFA::State start,
HFA::State index,
HFA::StateHashes hashes 
)
private
bool reflex::Pattern::gen_match_hfa_transitions ( size_t  level,
size_t &  max_level,
DFA::State state,
const HFA::HashRanges previous,
HFA::State index,
HFA::StateHashes hashes 
)
private
void reflex::Pattern::gen_min ( std::set< DFA::State * > &  states)
private
void reflex::Pattern::gen_predict_match ( std::set< DFA::State * > &  states)
private
void reflex::Pattern::gen_predict_match_start ( std::set< DFA::State * > &  states,
std::map< DFA::State *, std::pair< ORanges< Hash >, ORanges< Char > > > &  first_hashes 
)
private
void reflex::Pattern::gen_predict_match_transitions ( size_t  level,
DFA::State state,
const std::pair< ORanges< Hash >, ORanges< Char > > &  previous,
std::map< DFA::State *, std::pair< ORanges< Hash >, ORanges< Char > > > &  level_hashes 
)
private
void reflex::Pattern::gencode_dfa ( const DFA::State start) const
private
void reflex::Pattern::gencode_dfa_closure ( FILE *  fd,
const DFA::State start,
int  nest,
bool  peek 
) const
private
void reflex::Pattern::graph_dfa ( const DFA::State start) const
private
void reflex::Pattern::greedy ( Positions pos) const
private
bool reflex::Pattern::has_hfa ( ) const
inline
static uint32_t reflex::Pattern::hash ( uint32_t  h,
uint8_t  b 
)
inlinestaticprivate

predict match hash 0 <= hash() < Const::HASH.

static uint16_t reflex::Pattern::hash_pos ( const Positions pos)
inlinestaticprivate
size_t reflex::Pattern::hashes ( ) const
inline

Get the total number of indexing hash tables constructed for the optional HFA.

Returns
number of HFA hashes total for all HFA edges
static Char reflex::Pattern::hi_of ( Opcode  opcode)
inlinestaticprivate
static Index reflex::Pattern::index_of ( Opcode  opcode)
inlinestaticprivate
static uint32_t reflex::Pattern::indexhash ( Hash  h,
uint8_t  b 
)
inlinestaticprivate

file indexing hash 0 <= indexhash() < 65536, must be additive: indexhash(x,b+1) = indexhash(x,b)+1 modulo 2^16.

void reflex::Pattern::init ( const char *  options,
const uint8_t *  pred = NULL 
)
private

Initialize the pattern at construction.

void reflex::Pattern::init_options ( const char *  options)
private
static bool reflex::Pattern::is_meta ( Char  c)
inlinestaticprivate
static bool reflex::Pattern::is_modified ( Mod  mod,
const Mods  modifiers,
Location  loc 
)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_goto ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_goto ( Opcode  opcode,
unsigned char  c 
)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_halt ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_head ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_long ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_meta ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_redo ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_tail ( Opcode  opcode)
inlinestaticprivate
static bool reflex::Pattern::is_opcode_take ( Opcode  opcode)
inlinestaticprivate
void reflex::Pattern::lazy ( const Lazypos lazypos,
Positions pos 
) const
private
void reflex::Pattern::lazy ( const Lazypos lazypos,
const Positions pos,
Positions pos1 
) const
private
static void reflex::Pattern::lazy_add ( Lazypos s,
const Lazy  i,
Location  p 
)
inlinestaticprivate
static void reflex::Pattern::lazy_insert ( Lazypos s1,
const Lazypos s2 
)
inlinestaticprivate
static Char reflex::Pattern::lo_of ( Opcode  opcode)
inlinestaticprivate
static Index reflex::Pattern::long_index_of ( Opcode  opcode)
inlinestaticprivate
static Lookahead reflex::Pattern::lookahead_of ( Opcode  opcode)
inlinestaticprivate
static Char reflex::Pattern::lowercase ( Char  c)
inlinestaticprivate

convert to lower case if c is a letter a-z, A-Z.

bool reflex::Pattern::match_hfa ( const uint8_t *  indexed,
size_t  size 
) const
bool reflex::Pattern::match_hfa_transitions ( size_t  level,
const HFA::Hashes hashes,
const uint8_t *  indexed,
size_t  size,
HFA::VisitSet visit,
HFA::VisitSet next_visit,
bool &  accept 
) const
private
static Char reflex::Pattern::meta_of ( Opcode  opcode)
inlinestaticprivate
size_t reflex::Pattern::nodes ( ) const
inline

Get the number of finite state machine nodes (vertices).

Returns
number of nodes or 0 when no finite state machine was constructed by this pattern
float reflex::Pattern::nodes_time ( ) const
inline

Get elapsed DFA vertices construction time.

Returns
time in ms
static Opcode reflex::Pattern::opcode_goto ( Char  lo,
Char  hi,
Index  index 
)
inlinestaticprivate
static Opcode reflex::Pattern::opcode_halt ( )
inlinestaticprivate
static Opcode reflex::Pattern::opcode_head ( Index  index)
inlinestaticprivate
static Opcode reflex::Pattern::opcode_long ( Index  index)
inlinestaticprivate
static Opcode reflex::Pattern::opcode_redo ( )
inlinestaticprivate
static Opcode reflex::Pattern::opcode_tail ( Index  index)
inlinestaticprivate
static Opcode reflex::Pattern::opcode_take ( Index  index)
inlinestaticprivate
Pattern& reflex::Pattern::operator= ( const Pattern pattern)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::operator= ( const char *  regex)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::operator= ( const std::string &  regex)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::operator= ( const Opcode code)
inline

Assign a (new) pattern.

Pattern& reflex::Pattern::operator= ( FSM  fsm)
inline

Assign a (new) pattern.

const std::string reflex::Pattern::operator[] ( Accept  choice) const

Get subpattern regex of this pattern object or the whole regex with index 0.

Returns
subpattern string or "" when not set
void reflex::Pattern::parse ( Positions startpos,
Follow followpos,
Lazypos lazypos,
Mods  modifiers,
Map lookahead 
)
private
void reflex::Pattern::parse1 ( bool  begin,
Location loc,
Positions firstpos,
Positions lastpos,
bool &  nullable,
Follow followpos,
Lazy lazyidx,
Lazypos lazypos,
Mods  modifiers,
Locations lookahead,
Iter iter 
)
private
void reflex::Pattern::parse2 ( bool  begin,
Location loc,
Positions firstpos,
Positions lastpos,
bool &  nullable,
Follow followpos,
Lazy lazyidx,
Lazypos lazypos,
Mods  modifiers,
Locations lookahead,
Iter iter 
)
private
void reflex::Pattern::parse3 ( bool  begin,
Location loc,
Positions firstpos,
Positions lastpos,
bool &  nullable,
Follow followpos,
Lazy lazyidx,
Lazypos lazypos,
Mods  modifiers,
Locations lookahead,
Iter iter 
)
private
void reflex::Pattern::parse4 ( bool  begin,
Location loc,
Positions firstpos,
Positions lastpos,
bool &  nullable,
Follow followpos,
Lazy lazyidx,
Lazypos lazypos,
Mods  modifiers,
Locations lookahead,
Iter iter 
)
private
Char reflex::Pattern::parse_esc ( Location loc,
Chars chars = NULL 
) const
private
float reflex::Pattern::parse_time ( ) const
inline

Get elapsed regex parsing and analysis time.

Returns
time in ms
static void reflex::Pattern::pos_add ( Positions s,
const Position e 
)
inlinestaticprivate
static void reflex::Pattern::pos_insert ( Positions s1,
const Positions s2 
)
inlinestaticprivate
void reflex::Pattern::posix ( size_t  index,
Chars chars 
) const
private
bool reflex::Pattern::predict_match ( const char *  s,
size_t  n 
) const
inline

Returns true when match is predicted, based on s[0..3..e-1] (e >= s + 4).

bool reflex::Pattern::predict_match ( const char *  s) const
inline

Returns true when match is predicted using my PM4 logic.

bool reflex::Pattern::reachable ( Accept  choice) const
inline

Check if subpattern is reachable by a match.

Returns
true if subpattern is reachable
Accept reflex::Pattern::size ( ) const
inline

Get the number of subpatterns of this pattern object.

Returns
number of subpatterns
void reflex::Pattern::transition ( Moves moves,
Chars chars,
const Positions follow 
) const
private
void reflex::Pattern::trim_anchors ( Positions follow) const
private
void reflex::Pattern::trim_lazy ( Positions pos,
const Lazypos lazypos 
) const
private
static void reflex::Pattern::update_modified ( Mod  mod,
Mods  modifiers,
Location  from,
Location  to 
)
inlinestaticprivate
static Char reflex::Pattern::uppercase ( Char  c)
inlinestaticprivate

convert to upper case if c is a letter a-z, A-Z.

static bool reflex::Pattern::valid_goto_index ( Index  index)
inlinestaticprivate
static bool reflex::Pattern::valid_lookahead_index ( Index  index)
inlinestaticprivate
static bool reflex::Pattern::valid_take_index ( Index  index)
inlinestaticprivate
size_t reflex::Pattern::words ( ) const
inline

Get the code size in number of words.

Returns
number of words or 0 when no code was generated by this pattern
float reflex::Pattern::words_time ( ) const
inline

Get elapsed code words assembly time.

Returns
time in ms
void reflex::Pattern::write_namespace_close ( FILE *  fd) const
private
void reflex::Pattern::write_namespace_open ( FILE *  fd) const
private
void reflex::Pattern::write_predictor ( FILE *  fd) const
private

Friends And Related Function Documentation

friend class FuzzyMatcher
friend

permit access by the reflex::FuzzyMatcher engine

friend class Matcher
friend

permit access by the reflex::Matcher engine

Member Data Documentation

std::vector<bool> reflex::Pattern::acc_
private

true if subpattern n is accepting (state is reachable)

float reflex::Pattern::ams_
private

ms elapsed time to analyze DFA for predict match and HFA

Bitap reflex::Pattern::bit_[256]
private

bitsets of characters for the first positions (one position per bit)

size_t reflex::Pattern::bmd_
private

Boyer-Moore jump distance on mismatch, B-M is enabled when bmd_ > 0 (<= 255)

uint8_t reflex::Pattern::bms_[256]
private

Boyer-Moore skip array.

bool reflex::Pattern::bol_
private

true if matching all patterns at the begin of a line with anchor ^

std::bitset<256> reflex::Pattern::cbk_
private

characters to look back over when lbk_ > 0, never includes

char reflex::Pattern::chr_[256]
private

pattern prefix string or character needles for needle-based search

Index reflex::Pattern::cut_
private

DFA s-t cut to improve predict match and HFA accuracy with lbk_ and cbk_.

DFA reflex::Pattern::dfa_
private

DFA constructed from regex with subset construction using firstpos/lastpos/followpos.

float reflex::Pattern::ems_
private

ms elapsed time to compile DFA edges

std::vector<Location> reflex::Pattern::end_
private

entries point to the subpattern's ending '|' or '\0'

size_t reflex::Pattern::eno_
private

number of finite state machine edges |E| (arrows)

FSM reflex::Pattern::fsm_
private

function pointer to FSM code

std::bitset<256> reflex::Pattern::fst_
private

the beginning characters of the pattern

HFA reflex::Pattern::hfa_
private

indexing hash finite state automaton

size_t reflex::Pattern::hno_
private

number of indexing hash tables (HFA edges)

uint16_t reflex::Pattern::lbk_
private

lookback distance or 0xffff unlimited lookback or 0 for no lookback (empty cbk_)

uint16_t reflex::Pattern::lbm_
private

loopback minimum distance when lbk_ > 0

uint16_t reflex::Pattern::lcp_
private

primary least common character position in the pattern or 0xffff

uint16_t reflex::Pattern::lcs_
private

secondary least common character position in the pattern or 0xffff

size_t reflex::Pattern::len_
private

length of chr_[], less or equal to 255

size_t reflex::Pattern::min_
private

patterns after the prefix are at least this long but no more than 8

Index reflex::Pattern::nop_
private

number of opcodes generated

uint16_t reflex::Pattern::npy_
private

entropy derived from the bitap array bit_[]

bool reflex::Pattern::one_
private

true if matching one string stored in chr_[] without meta/anchors

const Opcode* reflex::Pattern::opc_
private

points to the table with compiled finite state machine opcodes

Option reflex::Pattern::opt_
private

pattern compiler options

size_t reflex::Pattern::pin_
private

number of needles, 0 to 16

Pred reflex::Pattern::pma_[Const::HASH]
private

predict-match 4 (PM4) array

Pred reflex::Pattern::pmh_[Const::HASH]
private

predict-match bloom filter hash up to first 8 positions

float reflex::Pattern::pms_
private

ms elapsed time to parse regex

std::string reflex::Pattern::rex_
private

regular expression string

Bitap reflex::Pattern::tap_[Const::BTAP]
private

bitap hashed character pairs array

DFA reflex::Pattern::tfa_
private

tree DFA constructed from strings

float reflex::Pattern::vms_
private

ms elapsed time to compile DFA vertices

size_t reflex::Pattern::vno_
private

number of finite state machine vertices |V| (nodes)

float reflex::Pattern::wms_
private

ms elapsed time to assemble code words


The documentation for this class was generated from the following file: