reflex Namespace Reference

updated Tue Oct 29 2024 by Robert van Engelen
 

Namespaces

 convert_flag
 
 Posix
 
 Unicode
 

Classes

class  AbstractLexer
 The abstract lexer class template that is the abstract root class of all reflex-generated scanners.

class  AbstractMatcher
 The abstract matcher base class template defines an interface for all pattern matcher engines.

class  Bits
 RE/flex Bits class for dynamic bit vectors.

class  BoostMatcher
 Boost matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the Boost::regex library.

class  BoostPerlMatcher
 Boost matcher engine class, extends reflex::BoostMatcher for Boost Perl regex matching.

class  BoostPosixMatcher
 Boost matcher engine class, extends reflex::BoostMatcher for Boost POSIX regex matching.

class  BufferedInput
 Buffered input.

class  FlexLexer
 Flex-compatible FlexLexer abstract base class template derived from reflex::AbstractMatcher for the reflex-generated yyFlexLexer scanner class.

class  FuzzyMatcher
 RE/flex fuzzy matcher engine class, implements reflex::Matcher fuzzy pattern matching interface with scan, find, split functors and iterators.

class  Input
 Input character sequence class for unified access to sources of input text.

struct  lazy_intersection
 Intersection of two ordered sets, with an iterator to get elements lazily.

struct  lazy_union
 Union of two ordered sets, with an iterator to get elements lazily.

class  LineMatcher
 Line matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators for matching lines only. Use option 'A' to include the newline with FIND, option 'N' to also FIND empty lines, and option 'W' to only FIND empty lines.

class  Matcher
 RE/flex matcher engine class, implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators.

class  ORanges
 RE/flex ORanges (open-ended, ordinal value range) template class.

class  Pattern
 Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.

class  PatternMatcher
 The pattern matcher class template extends abstract matcher base class.

class  PatternMatcher< std::string >
 A specialization of the pattern matcher class template for std::string, extends abstract matcher base class.

class  PCRE2Matcher
 PCRE2 JIT-optimized matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the PCRE2 library.

class  PCRE2UTFMatcher
 PCRE2 JIT-optimized native PCRE2_UTF+PCRE2_UCP matcher engine class, extends PCRE2Matcher.

struct  range_compare
 Functor to define a total order on ranges (intervals) represented by pairs.

class  Ranges
 RE/flex Ranges template class.

class  regex_error
 Regex syntax error exceptions.

class  StdEcmaMatcher
 std matcher engine class, extends reflex::StdMatcher for ECMA std::regex::ECMAScript syntax and regex matching.

class  StdMatcher
 std matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the C++11 std::regex library.

class  StdPosixMatcher
 std matcher engine class, extends reflex::StdMatcher for POSIX ERE std::regex::awk syntax and regex matching.

struct  TypeOp
 TypeOp<T>::Type = T, TypeOp<T>::ConstType = const T, TypeOp<T>::NonConstType = non-const T.

struct  TypeOp< const T >
 Template specialization of reflex::TypeOp.


Typedefs

typedef int convert_flag_type
 Conversion flags for reflex::convert.

typedef int regex_error_type
 Regex syntax error exception error code.
 
typedef timeval timer_type
 

Functions

int isword (int c)
 Check ASCII word-like character [A-Za-z0-9_], permitting the character range 0..303 (0x12f) and EOF.

std::string convert (const char *pattern, const char *signature, convert_flag_type flags=convert_flag::none, bool *multiline=NULL, const std::map< std::string, std::string > *macros=NULL)
 Returns the converted regex string given a regex library signature and conversion flags, throws regex_error.

std::string convert (const std::string &pattern, const char *signature, convert_flag_type flags=convert_flag::none, bool *multiline=NULL, const std::map< std::string, std::string > *macros=NULL)

bool supports_modifier (const char *signature, int modchar)

bool supports_escape (const char *signature, int escape)

std::string ztoa (size_t n)

template<typename S1 , typename S2 >
bool is_disjoint (const S1 &s1, const S2 &s2)
 Check if sets s1 and s2 are disjoint.

template<typename T , typename S >
bool is_in_set (const T &x, const S &s)
 Check if value x is in set s.

template<typename S1 , typename S2 >
bool is_subset (const S1 &s1, const S2 &s2)
 Check if set s1 is a subset of set s2.

template<typename S1 , typename S2 >
void set_insert (S1 &s1, const S2 &s2)
 Insert set s2 into set s1.

template<typename S , typename E >
void set_add (S &s, const E &e)
 Insert element e into set s.

template<typename S1 , typename S2 >
void set_delete (S1 &s1, const S2 &s2)
 Delete elements of set s2 from set s1.

template<typename S , typename E >
void set_erase (S &s, const E &e)
 Remove element e from set s when present.

size_t nlcount (const char *s, const char *e)
 Count newlines in string s up to position e in the string.

bool isutf8 (const char *s, const char *e)
 Check if valid UTF-8 encoding and does not include a NUL, but accept surrogates and 3/4 byte overlongs.

void timer_start (timer_type &t)
 Start timer.

float timer_elapsed (timer_type &t)
 Return elapsed time in milliseconds (ms) with microsecond precision since the last call up to 1 minute (wraps if elapsed time exceeds 1 minute!)

std::string latin1 (int a, int b, int esc= 'x', bool brackets=true)
 Convert an 8-bit ASCII + Latin-1 Supplement range [a,b] to a regex pattern.

std::string utf8 (int a, int b, int esc= 'x', const char *par="(", bool strict=true)
 Convert a UCS-4 range [a,b] to a UTF-8 regex pattern.

size_t utf8 (int c, char *s)
 Convert UCS-4 to UTF-8, fills with REFLEX_NONCHAR_UTF8 when out of range, or unrestricted UTF-8 with WITH_UTF8_UNRESTRICTED.

int utf8 (const char *s, const char **r=NULL)
 Convert UTF-8 to UCS, returns REFLEX_NONCHAR for invalid UTF-8 except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove any limits on UTF-8 encodings up to 6 bytes).

std::wstring wcs (const char *s, size_t n)
 Convert UTF-8 string to wide string.

std::wstring wcs (const std::string &s)
 Convert UTF-8 string to wide string.
 

Variables

const unsigned short codepages [][256]
 

Typedef Documentation

typedef int reflex::convert_flag_type

Conversion flags for reflex::convert.

typedef int reflex::regex_error_type

Regex syntax error exception error code.

typedef timeval reflex::timer_type

Function Documentation

std::string reflex::convert ( const char *  pattern,
const char *  signature,
convert_flag_type  flags = convert_flag::none,
bool *  multiline = NULL,
const std::map< std::string, std::string > *  macros = NULL 
)

Returns the converted regex string given a regex library signature and conversion flags, throws regex_error.

A regex library signature is a string of the form "decls:escapes?+.".

The optional "decls:" part specifies which modifiers and other special (?...) constructs are supported:

  • non-capturing group (?:...) is supported
  • letters and digits specify which modifiers e.g. (?ismx) are supported:
  • 'i' specifies that (?i...) case-insensitive matching is supported
  • 'm' specifies that (?m...) multiline mode is supported for the ^ and $ anchors
  • 's' specifies that (?s...) dotall mode is supported
  • 'x' specifies that (?x...) freespace mode is supported
  • ... any other letter or digit modifier, where digit modifiers support (?123) for example
  • # specifies that (?#...) comments are supported
  • = specifies that (?=...) lookahead is supported
  • ' specifies that (?'...) 'name' groups are supported
  • < specifies that (?<...) lookbehind and <name> groups are supported
  • > specifies that (?>...) atomic groups are supported
  • | specifies that (?|...) group resets are supported
  • & specifies that (?&...) subroutines are supported
  • ( specifies that (?(...) conditionals are supported
  • ! specifies that (?!=...) and (?!<...) are supported
  • ^ specifies that (?^...) negative (reflex) patterns are supported
  • * specifies that (*VERB) verbs are supported

The "escapes" characters specify which standard escapes are supported:

  • a for \a (BEL U+0007)
  • b for \b the \b word boundary
  • c for \cX control character specified by X modulo 32
  • d for \d digit [0-9] ASCII or Unicode digit
  • e for \e ESC U+001B
  • f for \f FF U+000C
  • g for \g group capture e.g. \g{1}
  • h for \h ASCII blank [ \t] (SP U+0020 or TAB U+0009)
  • i for \i reflex indent anchor
  • j for \j reflex dedent anchor
  • k for \k reflex undent anchor or group capture e.g. \k{1}
  • l for \l lower case letter [a-z] ASCII or Unicode letter
  • n for \n LF U+000A
  • o for \o octal ASCII or Unicode code
  • p for \p{C} Unicode character classes, also implies Unicode ., \x{X}, \l, \u, \d, \s, \w, and UTF-8 patterns
  • r for \r CR U+000D
  • s for \s space (SP, TAB, LF, VT, FF, or CR)
  • t for \t TAB U+0009
  • u for \u ASCII upper case letter [A-Z] (when not followed by {XXXX})
  • v for \v VT U+000B
  • w for \w ASCII word-like character [0-9A-Z_a-z]
  • x for \xXX 8-bit character encoding in hexadecimal
  • y for \y word boundary
  • z for \z end of input anchor
  • ` for \` begin of input anchor
  • ' for \' end of input anchor
  • < for \< left word boundary
  • > for \> right word boundary
  • A for \A begin of input anchor
  • B for \B non-word boundary
  • D for \D ASCII non-digit [^0-9]
  • H for \H ASCII non-blank [^ \t]
  • L for \L ASCII non-lower case letter [^a-z]
  • N for \N not a newline
  • P for \P{C} Unicode inverse character classes, see 'p'
  • Q for \Q...\E quotations
  • R for \R Unicode line break
  • S for \S ASCII non-space (no SP, TAB, LF, VT, FF, or CR)
  • U for \U ASCII non-upper case letter [^A-Z]
  • W for \W ASCII non-word-like character [^0-9A-Z_a-z]
  • X for \X any Unicode character
  • Z for \Z end of input anchor, before the final line break
  • 0 for \0nnn 8-bit character encoding in octal requires a leading 0
  • '1' to '9' for backreferences (not applicable to lexer specifications)

Note that 'p' is a special case to support Unicode-based matchers that natively support UTF-8 patterns and Unicode classes \p{C}, \P{C}, \w, \W, \d, \D, \l, \L, \u, \U, \N, and \x{X}. In effect, 'p' prevents conversion of Unicode patterns to UTF-8. This special case does not support {NAME} expansions in bracket lists such as [a-z||{upper}] and {lower}{+}{upper} used in lexer specifications.

The optional "?+" specify lazy and possessive support:

  • ? lazy quantifiers for repeats are supported
  • + possessive quantifiers for repeats are supported

An optional "." (dot) specifies that dot matches any character except newline. A dot is implied by the presence of the 's' modifier, and can be omitted in that case.

An optional "[" specifies that bracket list union, intersection, and subtraction are supported, i.e. [--[a-z]].
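To make the signature layout concrete, the following is a minimal, self-contained sketch of how a signature string of the form "decls:escapes?+." might be queried. The helpers `sig_supports_modifier` and `sig_supports_escape` are hypothetical stand-ins written for this illustration; they are not the library's reflex::supports_modifier and reflex::supports_escape, whose implementation may differ. The sketch assumes only what is stated above: modifier characters sit before the ':' separator and escape characters after it.

```cpp
#include <cstring>

// Hypothetical stand-in: true if modchar is listed in the "decls" part,
// that is, before the first ':' of the signature.
bool sig_supports_modifier(const char *signature, int modchar)
{
  if (signature == nullptr)
    return false;
  const char *colon = std::strchr(signature, ':');
  if (colon == nullptr)
    return false; // no "decls:" part, so no modifiers are declared
  const char *s = std::strchr(signature, modchar);
  return s != nullptr && s < colon;
}

// Hypothetical stand-in: true if escape is listed after the ':' separator,
// or anywhere in the signature when there is no "decls:" part.
bool sig_supports_escape(const char *signature, int escape)
{
  if (signature == nullptr || escape == 0)
    return false;
  const char *colon = std::strchr(signature, ':');
  const char *escapes = colon != nullptr ? colon + 1 : signature;
  return std::strchr(escapes, escape) != nullptr;
}
```

With the made-up signature "imsx#=:abdnrstw?+.", `sig_supports_modifier(sig, 'i')` is true because 'i' precedes the colon, while `sig_supports_modifier(sig, 'a')` is false because 'a' only appears among the escapes.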

Parameters
pattern    regex string pattern to convert
signature  regex library signature
flags      conversion flags
multiline  set to true if pattern may be multiline
macros     {name} macros to expand

std::string reflex::convert ( const std::string &  pattern,
const char *  signature,
convert_flag_type  flags = convert_flag::none,
bool *  multiline = NULL,
const std::map< std::string, std::string > *  macros = NULL 
)
inline
template<typename S1 , typename S2 >
bool reflex::is_disjoint ( const S1 &  s1,
const S2 &  s2 
)

Check if sets s1 and s2 are disjoint.

Returns
true or false
template<typename T , typename S >
bool reflex::is_in_set ( const T &  x,
const S &  s 
)
inline

Check if value x is in set s.

Returns
true or false
template<typename S1 , typename S2 >
bool reflex::is_subset ( const S1 &  s1,
const S2 &  s2 
)

Check if set s1 is a subset of set s2.

Returns
true or false
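
The semantics of the three set predicates above can be illustrated with std::set. This is only a sketch: `in_set`, `subset`, and `disjoint` are hypothetical stand-ins, and the sketch assumes ordered containers with `find`, `begin`, and `end`. The library templates accept set types S1 and S2 and may be implemented differently.

```cpp
#include <algorithm>
#include <set>

// Hypothetical stand-in for the meaning of is_in_set: x is a member of s.
template<typename T, typename S>
bool in_set(const T& x, const S& s)
{
  return s.find(x) != s.end();
}

// Hypothetical stand-in for is_subset: every element of s1 is also in s2.
// std::includes requires sorted ranges, which holds for std::set.
template<typename S1, typename S2>
bool subset(const S1& s1, const S2& s2)
{
  return std::includes(s2.begin(), s2.end(), s1.begin(), s1.end());
}

// Hypothetical stand-in for is_disjoint: walk both ordered sets in step
// and stop at the first element they have in common.
template<typename S1, typename S2>
bool disjoint(const S1& s1, const S2& s2)
{
  auto i1 = s1.begin();
  auto i2 = s2.begin();
  while (i1 != s1.end() && i2 != s2.end())
  {
    if (*i1 < *i2)
      ++i1;
    else if (*i2 < *i1)
      ++i2;
    else
      return false; // common element found
  }
  return true;
}
```

For a = {1, 2, 3}, b = {2, 3}, and c = {7, 9}, `subset(b, a)` and `disjoint(a, c)` hold, while `subset(a, b)` and `disjoint(a, b)` do not.
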
bool reflex::isutf8 ( const char *  s,
const char *  e 
)

Check if valid UTF-8 encoding and does not include a NUL, but accept surrogates and 3/4 byte overlongs.
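
As a rough illustration of the kind of check described, the sketch below validates only the byte structure of the range [s, e). The function `looks_like_utf8` is a hypothetical stand-in, not the library's isutf8. It assumes that e is an exclusive end pointer, that 2-byte overlong lead bytes (0xC0, 0xC1) are rejected, and that lead bytes above 0xF4 are rejected; none of these details are confirmed by the text above. Like the documented function, it rejects NUL and does not test for surrogates or 3/4 byte overlongs.

```cpp
// Hypothetical structural UTF-8 check over the half-open range [s, e).
bool looks_like_utf8(const char *s, const char *e)
{
  while (s < e)
  {
    const unsigned char c = static_cast<unsigned char>(*s++);
    int n; // number of continuation bytes expected after the lead byte
    if (c == 0x00)
      return false; // NUL is not permitted
    else if (c < 0x80)
      n = 0; // ASCII
    else if (c >= 0xC2 && c <= 0xDF)
      n = 1;
    else if (c >= 0xE0 && c <= 0xEF)
      n = 2;
    else if (c >= 0xF0 && c <= 0xF4)
      n = 3;
    else
      return false; // stray continuation byte or invalid lead byte
    while (n-- > 0)
    {
      // each continuation byte must have the form 10xxxxxx
      if (s >= e || (static_cast<unsigned char>(*s) & 0xC0) != 0x80)
        return false;
      ++s;
    }
  }
  return true;
}
```
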

int reflex::isword ( int  c)
inline

Check ASCII word-like character [A-Za-z0-9_], permitting the character range 0..303 (0x12f) and EOF.

Returns
nonzero if argument c is in [A-Za-z0-9_], zero otherwise
Parameters
c  Character to check

std::string reflex::latin1 ( int  a,
int  b,
int  esc = 'x',
bool  brackets = true 
)

Convert an 8-bit ASCII + Latin-1 Supplement range [a,b] to a regex pattern.

Returns
regex string to match the UCS range encoded in UTF-8
Parameters
a         lower bound of UCS range
b         upper bound of UCS range
esc       escape char 'x' for hex \xXX, or '0' or '\0' for octal \0nnn and \nnn
brackets  place in [ brackets ]

size_t reflex::nlcount ( const char *  s,
const char *  e 
)

Count newlines in string s up to position e in the string.
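
A one-line equivalent can be sketched with the standard library. The name `count_newlines` is a hypothetical stand-in, and the sketch assumes that e is an exclusive end pointer, so the counted range is [s, e); the text above does not say whether position e itself is included.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in: count '\n' bytes in the half-open range [s, e).
std::size_t count_newlines(const char *s, const char *e)
{
  return static_cast<std::size_t>(std::count(s, e, '\n'));
}
```
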

template<typename S , typename E >
void reflex::set_add ( S &  s,
const E &  e 
)
inline

Insert element e into set s.

template<typename S1 , typename S2 >
void reflex::set_delete ( S1 &  s1,
const S2 &  s2 
)

Delete elements of set s2 from set s1.

template<typename S , typename E >
void reflex::set_erase ( S &  s,
const E &  e 
)

Remove element e from set s when present.

template<typename S1 , typename S2 >
void reflex::set_insert ( S1 &  s1,
const S2 &  s2 
)
inline

Insert set s2 into set s1.
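
The four set-modifying helpers (set_add, set_delete, set_erase, set_insert) can be pictured with std::set as follows. The names `add_element`, `erase_element`, `insert_set`, and `delete_set` are hypothetical stand-ins for this sketch, which assumes containers that offer `insert` and `erase` as std::set does; the library templates may work on other set types.

```cpp
#include <set>

// Hypothetical stand-in for set_add: insert element e into set s.
template<typename S, typename E>
void add_element(S& s, const E& e)
{
  s.insert(e);
}

// Hypothetical stand-in for set_erase: remove e from s, no effect if absent.
template<typename S, typename E>
void erase_element(S& s, const E& e)
{
  s.erase(e);
}

// Hypothetical stand-in for set_insert: insert all elements of s2 into s1.
template<typename S1, typename S2>
void insert_set(S1& s1, const S2& s2)
{
  s1.insert(s2.begin(), s2.end());
}

// Hypothetical stand-in for set_delete: remove every element of s2 from s1.
template<typename S1, typename S2>
void delete_set(S1& s1, const S2& s2)
{
  for (const auto& e : s2)
    s1.erase(e);
}
```

Starting from {1, 2}, adding 3, inserting {3, 4, 5}, and then deleting {1, 5, 8} leaves {2, 3, 4}.
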

bool reflex::supports_escape ( const char *  signature,
int  escape 
)
inline
bool reflex::supports_modifier ( const char *  signature,
int  modchar 
)
inline
float reflex::timer_elapsed ( timer_type &  t)
inline

Return elapsed time in milliseconds (ms) with microsecond precision since the last call up to 1 minute (wraps if elapsed time exceeds 1 minute!)

Parameters
t  timer to be updated

void reflex::timer_start ( timer_type &  t)
inline

Start timer.

Parameters
t  timer to be initialized
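
The start/elapsed pattern of the two timer functions can be sketched portably with std::chrono. This is an illustration under stated assumptions, not the library's code: reflex::timer_type is a timeval, whereas the hypothetical `sketch_timer` below is a steady_clock time point, and `sketch_timer_start` and `sketch_timer_elapsed` are made-up names. Unlike the documented function, the sketch does not wrap after one minute.

```cpp
#include <chrono>

// Hypothetical stand-in for timer_type.
using sketch_timer = std::chrono::steady_clock::time_point;

// Record the current time in t.
void sketch_timer_start(sketch_timer& t)
{
  t = std::chrono::steady_clock::now();
}

// Return the milliseconds elapsed since t was last set, then update t to
// the current time so the next call measures from this point.
float sketch_timer_elapsed(sketch_timer& t)
{
  const auto now = std::chrono::steady_clock::now();
  const std::chrono::duration<float, std::milli> ms = now - t;
  t = now;
  return ms.count();
}
```

A typical use is to call `sketch_timer_start(t)` once and then `sketch_timer_elapsed(t)` after each phase of work to obtain per-phase timings.
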
std::string reflex::utf8 ( int  a,
int  b,
int  esc = 'x',
const char *  par = "(",
bool  strict = true 
)

Convert a UCS-4 range [a,b] to a UTF-8 regex pattern.

Returns
regex string to match the UCS range encoded in UTF-8
Parameters
a       lower bound of UCS range
b       upper bound of UCS range
esc     escape char 'x' for hex \xXX, or '0' or '\0' for octal \0nnn and \nnn
par     capturing or non-capturing parenthesis "(?:"
strict  returned regex is strict UTF-8 (true) or permissive and lean UTF-8 (false)

size_t reflex::utf8 ( int  c,
char *  s 
)
inline

Convert UCS-4 to UTF-8, fills with REFLEX_NONCHAR_UTF8 when out of range, or unrestricted UTF-8 with WITH_UTF8_UNRESTRICTED.

Returns
length (in bytes) of UTF-8 character sequence stored in s
Parameters
c  UCS-4 character U+0000 to U+10ffff (unless WITH_UTF8_UNRESTRICTED)
s  points to the buffer to populate with UTF-8 (1 to 6 bytes), not NUL-terminated
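
For readers unfamiliar with the encoding step, here is a minimal sketch of standard UCS-4 to UTF-8 conversion for code points up to U+10FFFF. The function `encode_utf8` is a hypothetical stand-in, not the library's utf8. In particular, it returns 0 and writes nothing for out-of-range input, whereas the documented function fills the buffer with REFLEX_NONCHAR_UTF8, and it does not cover the WITH_UTF8_UNRESTRICTED forms of up to 6 bytes.

```cpp
#include <cstddef>

// Hypothetical stand-in: encode code point c as UTF-8 into s (not
// NUL-terminated) and return the number of bytes written (1 to 4),
// or 0 when c is outside U+0000..U+10FFFF.
std::size_t encode_utf8(int c, char *s)
{
  if (c < 0 || c > 0x10FFFF)
    return 0; // out of range: this sketch writes nothing
  if (c < 0x80)
  {
    s[0] = static_cast<char>(c);
    return 1;
  }
  if (c < 0x800)
  {
    s[0] = static_cast<char>(0xC0 | (c >> 6));
    s[1] = static_cast<char>(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000)
  {
    s[0] = static_cast<char>(0xE0 | (c >> 12));
    s[1] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
    s[2] = static_cast<char>(0x80 | (c & 0x3F));
    return 3;
  }
  s[0] = static_cast<char>(0xF0 | (c >> 18));
  s[1] = static_cast<char>(0x80 | ((c >> 12) & 0x3F));
  s[2] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
  s[3] = static_cast<char>(0x80 | (c & 0x3F));
  return 4;
}
```

For example, U+20AC (the euro sign) encodes to the three bytes E2 82 AC.
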
int reflex::utf8 ( const char *  s,
const char **  r = NULL 
)
inline

Convert UTF-8 to UCS, returns REFLEX_NONCHAR for invalid UTF-8 except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove any limits on UTF-8 encodings up to 6 bytes).

Returns
UCS character
Parameters
s  points to the buffer with UTF-8 (1 to 6 bytes)
r  points to pointer to set to the new position in s after the UTF-8 sequence, optional
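
The decoding direction, including the optional position output r, can be sketched as follows. The function `decode_utf8` is a hypothetical stand-in, not the library's utf8. It assumes a NUL-terminated buffer, handles sequences of 1 to 4 bytes only, returns -1 for malformed input (the documented function returns REFLEX_NONCHAR), and does not reject overlong encodings or surrogate halves.

```cpp
// Hypothetical stand-in: decode one UTF-8 sequence starting at s and
// return its code point, or -1 if the sequence is malformed. When r is
// not null, *r is set to the position just past the bytes consumed.
int decode_utf8(const char *s, const char **r = nullptr)
{
  const unsigned char *p = reinterpret_cast<const unsigned char*>(s);
  int c = *p++;
  int n = 0; // number of continuation bytes expected
  if (c >= 0x80)
  {
    if ((c & 0xE0) == 0xC0)      { c &= 0x1F; n = 1; }
    else if ((c & 0xF0) == 0xE0) { c &= 0x0F; n = 2; }
    else if ((c & 0xF8) == 0xF0) { c &= 0x07; n = 3; }
    else                         { c = -1; } // invalid lead byte
  }
  while (n-- > 0)
  {
    if ((*p & 0xC0) != 0x80)
    {
      c = -1; // missing continuation byte (also stops at the NUL terminator)
      break;
    }
    c = (c << 6) | (*p++ & 0x3F);
  }
  if (r != nullptr)
    *r = reinterpret_cast<const char*>(p);
  return c;
}
```

Decoding the bytes E2 82 AC yields 0x20AC and advances the position by three bytes, so repeated calls that pass the returned position back in can walk a string one character at a time.
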
std::wstring reflex::wcs ( const char *  s,
size_t  n 
)
inline

Convert UTF-8 string to wide string.

Returns
wide string
Parameters
s  string with UTF-8 to convert
n  length of the string to convert

std::wstring reflex::wcs ( const std::string &  s)
inline

Convert UTF-8 string to wide string.

Returns
wide string
Parameters
s  string with UTF-8 to convert

std::string reflex::ztoa ( size_t  n)
inline

Variable Documentation

const unsigned short reflex::codepages[][256]