Namespaces
	convert_flag

	Posix

	Unicode

Classes
class	AbstractLexer
	The abstract lexer class template that is the abstract root class of all reflex-generated scanners. More...

class	AbstractMatcher
	The abstract matcher base class template defines an interface for all pattern matcher engines. More...

class	Bits
	RE/flex Bits class for dynamic bit vectors. More...

class	BoostMatcher
	Boost matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the Boost::regex library. More...

class	BoostPerlMatcher
	Boost matcher engine class, extends reflex::BoostMatcher for Boost Perl regex matching. More...

class	BoostPosixMatcher
	Boost matcher engine class, extends reflex::BoostMatcher for Boost POSIX regex matching. More...

class	BufferedInput
	Buffered input. More...

class	FlexLexer
	Flex-compatible FlexLexer abstract base class template derived from reflex::AbstractMatcher for the reflex-generated yyFlexLexer scanner class. More...

class	FuzzyMatcher
	RE/flex fuzzy matcher engine class, implements reflex::Matcher fuzzy pattern matching interface with scan, find, split functors and iterators. More...

class	Input
	Input character sequence class for unified access to sources of input text. More...

struct	lazy_intersection
	Intersection of two ordered sets, with an iterator to get elements lazily. More...

struct	lazy_union
	Union of two ordered sets, with an iterator to get elements lazily. More...

class	LineMatcher
	Line matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators for matching lines only. Use option 'A' to include the newline with FIND, option 'N' to also FIND empty lines, and option 'W' to only FIND empty lines. More...

class	Matcher
	RE/flex matcher engine class, implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators. More...

class	ORanges
	RE/flex ORanges (open-ended, ordinal value range) template class. More...

class	Pattern
	Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine. More...

class	PatternMatcher
	The pattern matcher class template extends abstract matcher base class. More...

class	PatternMatcher< std::string >
	A specialization of the pattern matcher class template for std::string, extends abstract matcher base class. More...

class	PCRE2Matcher
	PCRE2 JIT-optimized matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the PCRE2 library. More...

class	PCRE2UTFMatcher
	PCRE2 JIT-optimized native PCRE2_UTF+PCRE2_UCP matcher engine class, extends PCRE2Matcher. More...

struct	range_compare
	Functor to define a total order on ranges (intervals) represented by pairs. More...

class	Ranges
	RE/flex Ranges template class. More...

class	regex_error
	Regex syntax error exceptions. More...

class	StdEcmaMatcher
	std matcher engine class, extends reflex::StdMatcher for ECMA std::regex::ECMAScript syntax and regex matching. More...

class	StdMatcher
	std matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the C++11 std::regex library. More...

class	StdPosixMatcher
	std matcher engine class, extends reflex::StdMatcher for POSIX ERE std::regex::awk syntax and regex matching. More...

struct	TypeOp
	TypeOp<T>::Type = T, TypeOp<T>::ConstType = const T, TypeOp<T>::NonConstType = non-const T. More...

struct	TypeOp< const T >
	Template specialization of reflex::TypeOp. More...

Typedefs
typedef int	convert_flag_type
	Conversion flags for reflex::convert. More...

typedef int	regex_error_type
	Regex syntax error exception error code. More...

typedef timeval	timer_type

Functions
int	isword (int c)
	Check ASCII word-like character `[A-Za-z0-9_]`, permitting the character range 0..303 (0x12f) and EOF. More...

std::string	convert (const char *pattern, const char *signature, convert_flag_type flags=convert_flag::none, bool *multiline=NULL, const std::map< std::string, std::string > *macros=NULL)
	Returns the converted regex string given a regex library signature and conversion flags, throws regex_error. More...

std::string	convert (const std::string &pattern, const char *signature, convert_flag_type flags=convert_flag::none, bool *multiline=NULL, const std::map< std::string, std::string > *macros=NULL)

bool	supports_modifier (const char *signature, int modchar)

bool	supports_escape (const char *signature, int escape)

std::string	ztoa (size_t n)

template<typename S1 , typename S2 >
bool	is_disjoint (const S1 &s1, const S2 &s2)
	Check if sets `s1` and `s2` are disjoint. More...

template<typename T , typename S >
bool	is_in_set (const T &x, const S &s)
	Check if value `x` is in set `s`. More...

template<typename S1 , typename S2 >
bool	is_subset (const S1 &s1, const S2 &s2)
	Check if set `s1` is a subset of set `s2`. More...

template<typename S1 , typename S2 >
void	set_insert (S1 &s1, const S2 &s2)
	Insert set `s2` into set `s1`. More...

template<typename S , typename E >
void	set_add (S &s, const E &e)
	Insert element `e` into set `s`. More...

template<typename S1 , typename S2 >
void	set_delete (S1 &s1, const S2 &s2)
	Delete elements of set `s2` from set `s1`. More...

template<typename S , typename E >
void	set_erase (S &s, const E &e)
	Remove element `e` from set `s` when present. More...

size_t	nlcount (const char *s, const char *e)
	Count newlines in string s up to position e in the string. More...

bool	isutf8 (const char *s, const char *e)
	Check if valid UTF-8 encoding and does not include a NUL, but accept surrogates and 3/4 byte overlongs. More...

void	timer_start (timer_type &t)
	Start timer. More...

float	timer_elapsed (timer_type &t)
	Return elapsed time in milliseconds (ms) with microsecond precision since the last call up to 1 minute (wraps if elapsed time exceeds 1 minute!) More...

std::string	latin1 (int a, int b, int esc= 'x', bool brackets=true)
	Convert an 8-bit ASCII + Latin-1 Supplement range [a,b] to a regex pattern. More...

std::string	utf8 (int a, int b, int esc= 'x', const char *par="(", bool strict=true)
	Convert a UCS-4 range [a,b] to a UTF-8 regex pattern. More...

size_t	utf8 (int c, char *s)
	Convert UCS-4 to UTF-8, fills with REFLEX_NONCHAR_UTF8 when out of range, or unrestricted UTF-8 with WITH_UTF8_UNRESTRICTED. More...

int	utf8 (const char *s, const char **r=NULL)
	Convert UTF-8 to UCS, returns REFLEX_NONCHAR for invalid UTF-8 except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove any limits on UTF-8 encodings up to 6 bytes). More...

std::wstring	wcs (const char *s, size_t n)
	Convert UTF-8 string to wide string. More...

std::wstring	wcs (const std::string &s)
	Convert UTF-8 string to wide string. More...

Variables
const unsigned short	codepages [][256]

Typedef Documentation

typedef int reflex::convert_flag_type

Conversion flags for reflex::convert.

typedef int reflex::regex_error_type

Regex syntax error exception error code.

typedef timeval reflex::timer_type

Function Documentation

std::string reflex::convert	(	const char *	pattern,
		const char *	signature,
		convert_flag_type	flags = `convert_flag::none`,
		bool *	multiline = `NULL`,
		const std::map< std::string, std::string > *	macros = `NULL`
	)

Returns the converted regex string given a regex library signature and conversion flags, throws regex_error.

A regex library signature is a string of the form "decls:escapes?+.".

The optional "decls:" part specifies which modifiers and other special (?...) constructs are supported:

non-capturing group (?:...) is supported
letters and digits specify which modifiers e.g. (?ismx) are supported:
'i' specifies that (?i...) case-insensitive matching is supported
'm' specifies that (?m...) multiline mode is supported for the ^ and $ anchors
's' specifies that (?s...) dotall mode is supported
'x' specifies that (?x...) freespace mode is supported
... any other letter or digit modifier, where digit modifiers support (?123) for example
# specifies that (?#...) comments are supported
= specifies that (?=...) lookahead is supported
' specifies that `(?'...)` 'name' groups are supported
< specifies that (?<...) lookbehind and <name> groups are supported
> specifies that (?>...) atomic groups are supported
| specifies that (?|...) group resets are supported
& specifies that (?&...) subroutines are supported
( specifies that (?(...) conditionals are supported
! specifies that (?!=...) and (?!<...) are supported
^ specifies that (?^...) negative (reflex) patterns are supported
* specifies that (*VERB) verbs are supported

The "escapes" characters specify which standard escapes are supported:

a for \a (BEL U+0007)
b for \b the \b word boundary
c for \cX control character specified by X modulo 32
d for \d digit [0-9] ASCII or Unicode digit
e for \e ESC U+001B
f for \f FF U+000C
g for \g group capture e.g. \g{1}
h for \h ASCII blank [ \t] (SP U+0020 or TAB U+0009)
i for \i reflex indent anchor
j for \j reflex dedent anchor
k for \k reflex undent anchor or group capture e.g. \k{1}
l for \l lower case letter [a-z] ASCII or Unicode letter
n for \n LF U+000A
o for \o octal ASCII or Unicode code
p for \p{C} Unicode character classes, also implies Unicode ., \x{X}, \l, \u, \d, \s, \w, and UTF-8 patterns
r for \r CR U+000D
s for \s space (SP, TAB, LF, VT, FF, or CR)
t for \t TAB U+0009
u for \u ASCII upper case letter [A-Z] (when not followed by {XXXX})
v for \v VT U+000B
w for \w ASCII word-like character [0-9A-Z_a-z]
x for \xXX 8-bit character encoding in hexadecimal
y for \y word boundary
z for \z end of input anchor
` for \` begin of input anchor
' for \' end of input anchor
< for \< left word boundary
> for \> right word boundary
A for \A begin of input anchor
B for \B non-word boundary
D for \D ASCII non-digit [^0-9]
H for \H ASCII non-blank [^ \t]
L for \L ASCII non-lower case letter [^a-z]
N for \N not a newline
P for \P{C} Unicode inverse character classes, see 'p'
Q for \Q...\E quotations
R for \R Unicode line break
S for \S ASCII non-space (no SP, TAB, LF, VT, FF, or CR)
U for \U ASCII non-upper case letter [^A-Z]
W for \W ASCII non-word-like character [^0-9A-Z_a-z]
X for \X any Unicode character
Z for \Z end of input anchor, before the final line break
0 for \0nnn 8-bit character encoding in octal requires a leading 0
'1' to '9' for backreferences (not applicable to lexer specifications)

Note that 'p' is a special case to support Unicode-based matchers that natively support UTF-8 patterns and Unicode classes \p{C}, \P{C}, \w, \W, \d, \D, \l, \L, \u, \U, \N, and \x{X}. Basically, 'p' prevents conversion of Unicode patterns to UTF-8. This special case does not support {NAME} expansions in bracket lists such as [a-z||{upper}] and {lower}{+}{upper} used in lexer specifications.

The optional "?+" specify lazy and possessive support:

? lazy quantifiers for repeats are supported
+ possessive quantifiers for repeats are supported

An optional "." (dot) specifies that dot matches any character except newline. A dot is implied by the presence of the 's' modifier, and can be omitted in that case.

An optional "[" specifies that bracket list union, intersection, and subtraction are supported, e.g. [a-z||[A-Z]], [a-z&&[^aeiou]], and [a-z--[aeiou]].

Parameters

pattern	regex string pattern to convert
signature	regex library signature
flags	conversion flags
multiline	set to true if pattern may be multiline
macros	{name} macros to expand
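
The signature layout described above can be illustrated with a small lookup sketch. The helpers `has_modifier` and `has_escape` are hypothetical names introduced here for illustration and are not the library's `supports_modifier` and `supports_escape`. The sketch assumes only what the text states: modifier flags sit before the ':' separator and escape letters after it.

```cpp
#include <cstring>

// Hypothetical helper (not the library's code): true if modifier flag `m`
// appears in the "decls" part, i.e. before the ':' separator.
bool has_modifier(const char *signature, int m)
{
  if (signature == nullptr || m == 0)
    return false;
  const char *colon = std::strchr(signature, ':');
  if (colon == nullptr)
    return false;                       // no "decls:" part at all
  const char *p = std::strchr(signature, m);
  return p != nullptr && p < colon;
}

// Hypothetical helper: true if `e` appears after the ':' separator, or
// anywhere in the signature when there is no "decls:" part.
bool has_escape(const char *signature, int e)
{
  if (signature == nullptr || e == 0)
    return false;
  const char *colon = std::strchr(signature, ':');
  return std::strchr(colon != nullptr ? colon + 1 : signature, e) != nullptr;
}
```

With a made-up signature such as "imsx#=:abdnrstwx?+.", `has_modifier(sig, 'i')` holds while `has_modifier(sig, 'w')` does not, because 'w' only occurs in the escapes part. Note that this sketch does not separate the trailing "?+." markers from the escape letters.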

std::string reflex::convert	(	const std::string &	pattern,
		const char *	signature,
		convert_flag_type	flags = `convert_flag::none`,
		bool *	multiline = `NULL`,
		const std::map< std::string, std::string > *	macros = `NULL`
	)

inline

template<typename S1 , typename S2 >

bool reflex::is_disjoint	(	const S1 &	s1,
		const S2 &	s2
	)

Check if sets s1 and s2 are disjoint.

Returns: true or false
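
One way to picture a disjointness test on ordered sets is a linear merge walk. The sketch below is an assumption about how such a check can be written for sorted standard containers, not the library's implementation; `disjoint_sorted` is a hypothetical name.

```cpp
#include <set>

// Sketch: walk both sorted sequences in step; any equal pair means overlap.
template<typename S1, typename S2>
bool disjoint_sorted(const S1& s1, const S2& s2)
{
  typename S1::const_iterator i1 = s1.begin();
  typename S2::const_iterator i2 = s2.begin();
  while (i1 != s1.end() && i2 != s2.end())
  {
    if (*i1 < *i2)
      ++i1;
    else if (*i2 < *i1)
      ++i2;
    else
      return false; // common element found
  }
  return true;      // one sequence ran out without a match
}
```

The walk visits each element at most once, so the cost is linear in the combined size of the two sets.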

template<typename T , typename S >

bool reflex::is_in_set	(	const T &	x,
		const S &	s
	)

inline

Check if value x is in set s.

Returns: true or false

template<typename S1 , typename S2 >

bool reflex::is_subset	(	const S1 &	s1,
		const S2 &	s2
	)

Check if set s1 is a subset of set s2.

Returns: true or false
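
For sorted standard containers, the same subset relation can be expressed with `std::includes`. This is a sketch under that assumption, with the hypothetical name `subset_sorted`; it does not claim to mirror how reflex::is_subset is written.

```cpp
#include <algorithm>
#include <set>

// Sketch: s1 is a subset of s2 when the sorted range s2 includes the sorted range s1.
template<typename S1, typename S2>
bool subset_sorted(const S1& s1, const S2& s2)
{
  return std::includes(s2.begin(), s2.end(), s1.begin(), s1.end());
}
```

The argument order matters: the first range passed to `std::includes` is the candidate superset.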

bool reflex::isutf8	(	const char *	s,
		const char *	e
	)

Check if valid UTF-8 encoding and does not include a NUL, but accept surrogates and 3/4 byte overlongs.
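
The following is a rough sketch of a permissive structural check in the spirit of the description: it rejects NUL bytes and malformed sequences but lets surrogates and 3/4 byte overlongs through. The name `looks_like_utf8` and the exact lead-byte ranges are assumptions made for illustration; the library's own checks may differ.

```cpp
// Sketch of a permissive structural UTF-8 check over the byte range [s, e).
bool looks_like_utf8(const char *s, const char *e)
{
  while (s < e)
  {
    unsigned char c = static_cast<unsigned char>(*s++);
    int n; // number of continuation bytes expected
    if (c == 0x00)
      return false;                // NUL is rejected
    else if (c < 0x80)
      n = 0;                       // ASCII
    else if (c >= 0xC2 && c <= 0xDF)
      n = 1;                       // 2-byte sequence (0xC0 and 0xC1 leads rejected)
    else if (c >= 0xE0 && c <= 0xEF)
      n = 2;                       // 3-byte sequence, surrogates and overlongs pass
    else if (c >= 0xF0 && c <= 0xF4)
      n = 3;                       // 4-byte sequence, overlongs pass
    else
      return false;                // stray continuation byte or invalid lead byte
    for (; n > 0; --n)
    {
      if (s >= e || (static_cast<unsigned char>(*s) & 0xC0) != 0x80)
        return false;              // truncated or malformed continuation
      ++s;
    }
  }
  return true;
}
```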

int reflex::isword ( int c )

inline

Check ASCII word-like character [A-Za-z0-9_], permitting the character range 0..303 (0x12f) and EOF.

Returns: nonzero if argument c is in [A-Za-z0-9_], zero otherwise

Parameters

c	Character to check

std::string reflex::latin1	(	int	a,
		int	b,
		int	esc = `'x'`,
		bool	brackets = `true`
	)

Convert an 8-bit ASCII + Latin-1 Supplement range [a,b] to a regex pattern.

Returns: regex string to match the UCS range encoded in UTF-8

Parameters

a	lower bound of UCS range
b	upper bound of UCS range
esc	escape char 'x' for hex \xXX, or '0' or '\0' for octal \0nnn and \nnn
brackets	place in [ brackets ]

size_t reflex::nlcount	(	const char *	s,
		const char *	e
	)

Count newlines in string s up to position e in the string.
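
As a minimal sketch of the same idea, newline counting over a half-open byte range can be done with `std::count`. The name `count_newlines` is hypothetical, and the sketch assumes `e` points one past the last byte to inspect.

```cpp
#include <algorithm>
#include <cstddef>

// Sketch: count '\n' bytes in the half-open range [s, e).
std::size_t count_newlines(const char *s, const char *e)
{
  return static_cast<std::size_t>(std::count(s, e, '\n'));
}
```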

template<typename S , typename E >

void reflex::set_add	(	S &	s,
		const E &	e
	)

inline

Insert element e into set s.

template<typename S1 , typename S2 >

void reflex::set_delete	(	S1 &	s1,
		const S2 &	s2
	)

Delete elements of set s2 from set s1.
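
For containers with erase-by-key, such as `std::set`, set subtraction can be sketched as a loop over the second set. `erase_all` is a hypothetical name, and the sketch assumes both containers hold the same key type; it is not taken from the library source.

```cpp
#include <set>

// Sketch: remove from s1 every element that also occurs in s2.
template<typename S1, typename S2>
void erase_all(S1& s1, const S2& s2)
{
  for (typename S2::const_iterator i = s2.begin(); i != s2.end(); ++i)
    s1.erase(*i); // erase by key is a no-op when the key is absent
}
```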

template<typename S , typename E >

void reflex::set_erase	(	S &	s,
		const E &	e
	)

Remove element e from set s when present.

template<typename S1 , typename S2 >

void reflex::set_insert	(	S1 &	s1,
		const S2 &	s2
	)

inline

Insert set s2 into set s1.

bool reflex::supports_escape	(	const char *	signature,
		int	escape
	)

inline

bool reflex::supports_modifier	(	const char *	signature,
		int	modchar
	)

inline

float reflex::timer_elapsed ( timer_type & t )

inline

Return elapsed time in milliseconds (ms) with microsecond precision since the last call up to 1 minute (wraps if elapsed time exceeds 1 minute!)

Parameters

t	timer to be updated
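
The start/elapsed pattern, where each call to the elapsed function also updates the timer, can be sketched with `std::chrono`. This swaps in `std::chrono::steady_clock` for the `timeval`-based `timer_type`, so it is an analogy rather than the library's code, and unlike the documented function it does not wrap after one minute. All names below are hypothetical.

```cpp
#include <chrono>

// Hypothetical stand-in for timer_type.
typedef std::chrono::steady_clock::time_point sketch_timer;

// Record the current time in `t`.
void sketch_timer_start(sketch_timer& t)
{
  t = std::chrono::steady_clock::now();
}

// Return milliseconds since the previous start/elapsed call and reset `t`.
float sketch_timer_elapsed(sketch_timer& t)
{
  std::chrono::steady_clock::time_point now = std::chrono::steady_clock::now();
  std::chrono::duration<float, std::milli> ms = now - t;
  t = now; // the next call measures from this point
  return ms.count();
}
```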

void reflex::timer_start ( timer_type & t )

inline

Start timer.

Parameters

t	timer to be initialized

std::string reflex::utf8	(	int	a,
		int	b,
		int	esc = `'x'`,
		const char *	par = `"("`,
		bool	strict = `true`
	)

Convert a UCS-4 range [a,b] to a UTF-8 regex pattern.

Returns: regex string to match the UCS range encoded in UTF-8

Parameters

a	lower bound of UCS range
b	upper bound of UCS range
esc	escape char 'x' for hex \xXX, or '0' or '\0' for octal \0nnn and \nnn
par	capturing or non-capturing parenthesis "(?:"
strict	returned regex is strict UTF-8 (true) or permissive and lean UTF-8 (false)

size_t reflex::utf8	(	int	c,
		char *	s
	)

inline

Convert UCS-4 to UTF-8, fills with REFLEX_NONCHAR_UTF8 when out of range, or unrestricted UTF-8 with WITH_UTF8_UNRESTRICTED.

Returns: length (in bytes) of UTF-8 character sequence stored in s without a terminating '\0'

Parameters

c	UCS-4 character U+0000 to U+10ffff (unless WITH_UTF8_UNRESTRICTED)
s	points to the buffer to populate with UTF-8 (1 to 6 bytes) not NUL-terminated
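
The standard UCS-4 to UTF-8 bit layout behind this kind of conversion can be sketched as follows. `encode_utf8` is a hypothetical name; it covers only U+0000 to U+10FFFF (1 to 4 bytes), and for out-of-range input it writes nothing and returns 0 instead of filling in REFLEX_NONCHAR_UTF8 as the documented function does.

```cpp
#include <cstddef>

// Sketch: encode code point `c` into `s` (no terminating '\0'),
// returning the number of bytes written, or 0 when `c` is out of range.
std::size_t encode_utf8(int c, char *s)
{
  if (c < 0 || c > 0x10FFFF)
    return 0;
  if (c < 0x80)
  {
    s[0] = static_cast<char>(c);
    return 1;
  }
  if (c < 0x800)
  {
    s[0] = static_cast<char>(0xC0 | (c >> 6));
    s[1] = static_cast<char>(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000)
  {
    s[0] = static_cast<char>(0xE0 | (c >> 12));
    s[1] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
    s[2] = static_cast<char>(0x80 | (c & 0x3F));
    return 3;
  }
  s[0] = static_cast<char>(0xF0 | (c >> 18));
  s[1] = static_cast<char>(0x80 | ((c >> 12) & 0x3F));
  s[2] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
  s[3] = static_cast<char>(0x80 | (c & 0x3F));
  return 4;
}
```

The caller owns the buffer; four bytes suffice for this sketch, whereas the documented function may need up to six when WITH_UTF8_UNRESTRICTED is in effect.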

int reflex::utf8	(	const char *	s,
		const char **	r = `NULL`
	)

inline

Convert UTF-8 to UCS, returns REFLEX_NONCHAR for invalid UTF-8 except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove any limits on UTF-8 encodings up to 6 bytes).

Returns: UCS character

Parameters

s	points to the buffer with UTF-8 (1 to 6 bytes)
r	points to pointer to set to the new position in s after the UTF-8 sequence, optional
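
The role of the optional `r` parameter is easiest to see in a small decoder sketch. `decode_utf8` is a hypothetical name, the value -1 is a made-up error sentinel standing in for REFLEX_NONCHAR, and the sketch assumes a NUL-terminated buffer. It handles sequences of 1 to 4 bytes and, unlike a strict decoder, does not reject overlong encodings.

```cpp
// Sketch: decode one UTF-8 sequence at `s` and return its code point,
// or -1 on a malformed sequence. If `r` is non-null, *r is set to the
// position just after the bytes that were consumed.
int decode_utf8(const char *s, const char **r = nullptr)
{
  unsigned char c = static_cast<unsigned char>(*s++);
  int cp; // code point accumulated so far
  int n;  // continuation bytes still to read
  if (c < 0x80)                { cp = c;        n = 0; }
  else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; n = 1; }
  else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; n = 2; }
  else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; n = 3; }
  else                         { cp = -1;       n = 0; }
  for (; n > 0 && cp >= 0; --n)
  {
    unsigned char d = static_cast<unsigned char>(*s);
    if ((d & 0xC0) != 0x80)
      cp = -1;                 // stop at the offending byte without consuming it
    else
    {
      cp = (cp << 6) | (d & 0x3F);
      ++s;
    }
  }
  if (r != nullptr)
    *r = s;
  return cp;
}
```

Passing the address of a cursor as `r` lets a caller step through a string one code point at a time.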

std::wstring reflex::wcs	(	const char *	s,
		size_t	n
	)

inline

Convert UTF-8 string to wide string.

Returns: wide string

Parameters

s	string with UTF-8 to convert
n	length of the string to convert

std::wstring reflex::wcs ( const std::string & s )

inline

Convert UTF-8 string to wide string.

Returns: wide string

Parameters

s	string with UTF-8 to convert

std::string reflex::ztoa ( size_t n )

inline

Variable Documentation

const unsigned short reflex::codepages[][256]
