- class AbstractLexer: The abstract lexer class template that is the abstract root class of all reflex-generated scanners.
- class AbstractMatcher: The abstract matcher base class template defines an interface for all pattern matcher engines.
- class Bits: RE/flex Bits class for dynamic bit vectors.
- class BoostMatcher: Boost matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the Boost::regex library.
- class BoostPerlMatcher: Boost matcher engine class, extends reflex::BoostMatcher for Boost Perl regex matching.
- class BoostPosixMatcher: Boost matcher engine class, extends reflex::BoostMatcher for Boost POSIX regex matching.
- class BufferedInput: Buffered input.
- class FlexLexer: Flex-compatible FlexLexer abstract base class template derived from reflex::AbstractMatcher for the reflex-generated yyFlexLexer scanner class.
- class FuzzyMatcher: RE/flex fuzzy matcher engine class, implements reflex::Matcher fuzzy pattern matching interface with scan, find, split functors and iterators.
- class Input: Input character sequence class for unified access to sources of input text.
- struct lazy_intersection: Intersection of two ordered sets, with an iterator to get elements lazily.
- struct lazy_union: Union of two ordered sets, with an iterator to get elements lazily.
- class LineMatcher: Line matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators for matching lines only; use option 'A' to include the newline with FIND, option 'N' to also FIND empty lines, and option 'W' to only FIND empty lines.
- class Matcher: RE/flex matcher engine class, implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators.
- class ORanges: RE/flex ORanges (open-ended, ordinal value range) template class.
- class Pattern: Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.
- class PatternMatcher: The pattern matcher class template extends the abstract matcher base class.
- class PatternMatcher<std::string>: A specialization of the pattern matcher class template for std::string, extends the abstract matcher base class.
- class PCRE2Matcher: PCRE2 JIT-optimized matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the PCRE2 library.
- class PCRE2UTFMatcher: PCRE2 JIT-optimized native PCRE2_UTF+PCRE2_UCP matcher engine class, extends PCRE2Matcher.
- struct range_compare: Functor to define a total order on ranges (intervals) represented by pairs.
- class Ranges: RE/flex Ranges template class.
- class regex_error: Regex syntax error exceptions.
- class StdEcmaMatcher: std matcher engine class, extends reflex::StdMatcher for ECMA std::regex::ECMAScript syntax and regex matching.
- class StdMatcher: std matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the C++11 std::regex library.
- class StdPosixMatcher: std matcher engine class, extends reflex::StdMatcher for POSIX ERE std::regex::awk syntax and regex matching.
- struct TypeOp: TypeOp<T>::Type = T, TypeOp<T>::ConstType = const T, TypeOp<T>::NonConstType = non-const T.
- struct TypeOp<const T>: Template specialization of reflex::TypeOp.

- int isword(int c): Check ASCII word-like character [A-Za-z0-9_], permitting the character range 0..303 (0x12f) and EOF.
- std::string convert(const char *pattern, const char *signature, convert_flag_type flags = convert_flag::none, bool *multiline = NULL, const std::map<std::string, std::string> *macros = NULL): Returns the converted regex string given a regex library signature and conversion flags, throws regex_error.
- std::string convert(const std::string &pattern, const char *signature, convert_flag_type flags = convert_flag::none, bool *multiline = NULL, const std::map<std::string, std::string> *macros = NULL)
- bool supports_modifier(const char *signature, int modchar)
- bool supports_escape(const char *signature, int escape)
- std::string ztoa(size_t n)
- template<typename S1, typename S2> bool is_disjoint(const S1 &s1, const S2 &s2): Check if sets s1 and s2 are disjoint.
- template<typename T, typename S> bool is_in_set(const T &x, const S &s): Check if value x is in set s.
- template<typename S1, typename S2> bool is_subset(const S1 &s1, const S2 &s2): Check if set s1 is a subset of set s2.
- template<typename S1, typename S2> void set_insert(S1 &s1, const S2 &s2): Insert set s2 into set s1.
- template<typename S, typename E> void set_add(S &s, const E &e): Insert element e into set s.
- template<typename S1, typename S2> void set_delete(S1 &s1, const S2 &s2): Delete elements of set s2 from set s1.
- template<typename S, typename E> void set_erase(S &s, const E &e): Remove element e from set s when present.
- size_t nlcount(const char *s, const char *e): Count newlines in string s up to position e in the string.
- bool isutf8(const char *s, const char *e): Check that s up to e is valid UTF-8 and does not include a NUL; surrogates and 3/4-byte overlong encodings are accepted.
- void timer_start(timer_type &t): Start timer.
- float timer_elapsed(timer_type &t): Return elapsed time in milliseconds (ms) with microsecond precision since the last call, up to 1 minute (wraps if the elapsed time exceeds 1 minute!).
- std::string latin1(int a, int b, int esc = 'x', bool brackets = true): Convert an 8-bit ASCII + Latin-1 Supplement range [a,b] to a regex pattern.
- std::string utf8(int a, int b, int esc = 'x', const char *par = "(", bool strict = true): Convert a UCS-4 range [a,b] to a UTF-8 regex pattern.
- size_t utf8(int c, char *s): Convert UCS-4 to UTF-8, filling s with REFLEX_NONCHAR_UTF8 when c is out of range, or producing unrestricted UTF-8 with WITH_UTF8_UNRESTRICTED.
- int utf8(const char *s, const char **r = NULL): Convert UTF-8 to UCS, returns REFLEX_NONCHAR for invalid UTF-8 except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove any limits on UTF-8 encodings up to 6 bytes).
- std::wstring wcs(const char *s, size_t n): Convert UTF-8 string to wide string.
- std::wstring wcs(const std::string &s): Convert UTF-8 string to wide string.
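The utf8(int c, char *s) entry converts a single UCS-4 code point to UTF-8. A minimal sketch of the standard 1- to 4-byte encoding is shown below. The function name ucs4_to_utf8 is hypothetical, and returning 0 for values outside 0..0x10FFFF is an assumption of this sketch; the reflex function is documented to write REFLEX_NONCHAR_UTF8 in that case instead, and that value is not reproduced here.

```cpp
#include <cstddef>

// Hypothetical sketch, not reflex::utf8(int, char*): encodes code point c in
// 0..0x10FFFF as 1 to 4 UTF-8 bytes written to s (which must hold at least
// 4 bytes) and returns the number of bytes written; returns 0 when c is out
// of range (an assumption of this sketch).
std::size_t ucs4_to_utf8(int c, char *s) {
  if (c < 0 || c > 0x10FFFF) return 0;
  if (c < 0x80) {  // 0xxxxxxx
    s[0] = static_cast<char>(c);
    return 1;
  }
  if (c < 0x800) {  // 110xxxxx 10xxxxxx
    s[0] = static_cast<char>(0xC0 | (c >> 6));
    s[1] = static_cast<char>(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000) {  // 1110xxxx 10xxxxxx 10xxxxxx
    s[0] = static_cast<char>(0xE0 | (c >> 12));
    s[1] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
    s[2] = static_cast<char>(0x80 | (c & 0x3F));
    return 3;
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  s[0] = static_cast<char>(0xF0 | (c >> 18));
  s[1] = static_cast<char>(0x80 | ((c >> 12) & 0x3F));
  s[2] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
  s[3] = static_cast<char>(0x80 | (c & 0x3F));
  return 4;
}
```

Surrogate code points U+D800 to U+DFFF are encoded here like any other 3-byte value, which is in line with the tolerance for surrogate halves described for the decoding overload int utf8(const char *s, const char **r).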
|
std::string reflex::convert(const char *pattern, const char *signature, convert_flag_type flags = convert_flag::none, bool *multiline = NULL, const std::map<std::string, std::string> *macros = NULL)
Returns the converted regex string given a regex library signature and conversion flags, throws regex_error.
A regex library signature is a string of the form "decls:escapes?+.".
The optional "decls:" part specifies which modifiers and other special (?...) constructs are supported:
- non-capturing group (?:...) is supported
- letters and digits specify which modifiers, e.g. (?ismx), are supported:
  - 'i' specifies that (?i...) case-insensitive matching is supported
  - 'm' specifies that (?m...) multiline mode is supported for the ^ and $ anchors
  - 's' specifies that (?s...) dotall mode is supported
  - 'x' specifies that (?x...) freespace mode is supported
  - ... any other letter or digit modifier, where digit modifiers support (?123), for example
- '#' specifies that (?#...) comments are supported
- '=' specifies that (?=...) lookahead is supported
- the single quote specifies that (?'...) 'name' groups are supported
- '<' specifies that (?<...) lookbehind and <name> groups are supported
- '>' specifies that (?>...) atomic groups are supported
- '|' specifies that (?|...) group resets are supported
- '&' specifies that (?&...) subroutines are supported
- '(' specifies that (?(...) conditionals are supported
- '!' specifies that (?!=...) and (?!<...) are supported
- '^' specifies that (?^...) negative (reflex) patterns are supported
- '*' specifies that (*VERB) verbs are supported
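Because the decls part ends at the first ':', checking whether a signature declares a given modifier or (?...) construct amounts to looking for that character before the colon. The helper below is a hypothetical sketch in that spirit; it is not the implementation of supports_modifier, and it assumes that the presence of the ':' itself is what declares (?:...) support.

```cpp
#include <cstring>

// Hypothetical helper (not reflex::supports_modifier): true when character d
// is declared in the "decls:" part of the signature, i.e. occurs before the
// first ':'.
bool declares(const char *signature, char d) {
  if (signature == nullptr || d == '\0') return false;
  const char *colon = std::strchr(signature, ':');
  if (colon == nullptr) return false;          // no "decls:" part at all
  if (d == ':') return true;                   // assumed: ':' declares (?:...)
  const char *p = std::strchr(signature, d);   // first occurrence of d
  return p != nullptr && p < colon;
}
```

For example, with the made-up signature "imsx#=<!:abcd", declares(sig, 'i') is true while declares(sig, 'a') is false, because 'a' appears only after the colon.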
The "escapes" characters specify which standard escapes are supported:
- 'a' for \a (BEL U+0007)
- 'b' for \b, word boundary
- 'c' for \cX, control character specified by X modulo 32
- 'd' for \d, digit [0-9], ASCII or Unicode digit
- 'e' for \e (ESC U+001B)
- 'f' for \f (FF U+000C)
- 'g' for \g, group capture, e.g. \g{1}
- 'h' for \h, ASCII blank [ \t] (SP U+0020 or TAB U+0009)
- 'i' for \i, reflex indent anchor
- 'j' for \j, reflex dedent anchor
- 'k' for \k, reflex undent anchor or group capture, e.g. \k{1}
- 'l' for \l, lower case letter [a-z], ASCII or Unicode letter
- 'n' for \n (LF U+000A)
- 'o' for \o, octal ASCII or Unicode code
- 'p' for \p{C}, Unicode character classes; also implies Unicode ., \x{X}, \l, \u, \d, \s, \w, and UTF-8 patterns
- 'r' for \r (CR U+000D)
- 's' for \s, space (SP, TAB, LF, VT, FF, or CR)
- 't' for \t (TAB U+0009)
- 'u' for \u, ASCII upper case letter [A-Z] (when not followed by {XXXX})
- 'v' for \v (VT U+000B)
- 'w' for \w, ASCII word-like character [0-9A-Z_a-z]
- 'x' for \xXX, 8-bit character encoding in hexadecimal
- 'y' for \y, word boundary
- 'z' for \z, end of input anchor
- '`' for \`, begin of input anchor
- the single quote for \', end of input anchor
- '<' for \<, left word boundary
- '>' for \>, right word boundary
- 'A' for \A, begin of input anchor
- 'B' for \B, non-word boundary
- 'D' for \D, ASCII non-digit [^0-9]
- 'H' for \H, ASCII non-blank [^ \t]
- 'L' for \L, ASCII non-lower case letter [^a-z]
- 'N' for \N, not a newline
- 'P' for \P{C}, Unicode inverse character classes, see 'p'
- 'Q' for \Q...\E quotations
- 'R' for \R, Unicode line break
- 'S' for \S, ASCII non-space (no SP, TAB, LF, VT, FF, or CR)
- 'U' for \U, ASCII non-upper case letter [^A-Z]
- 'W' for \W, ASCII non-word-like character [^0-9A-Z_a-z]
- 'X' for \X, any Unicode character
- 'Z' for \Z, end of input anchor, before the final line break
- '0' for \0nnn, 8-bit character encoding in octal, which requires a leading 0
- '1' to '9' for backreferences (not applicable to lexer specifications)
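One kind of conversion the escape list enables is rewriting an escape the target library lacks into an equivalent construct it does support. The sketch below does this only for the ASCII class escapes \d, \h, \s, and \w, using the bracket lists given in the list above. The helper names has_escape and expand_class_escapes are hypothetical, treating a signature without ':' as consisting entirely of escapes is an assumption, and the sketch does not track bracket lists, so an escape inside [...] would be expanded incorrectly. It illustrates the idea and is not what reflex::convert does internally.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: is escape letter e listed after the first ':' of the
// signature (or anywhere in it, when there is no "decls:" part)?
bool has_escape(const char *signature, char e) {
  if (signature == nullptr || e == '\0') return false;
  const char *colon = std::strchr(signature, ':');
  const char *escapes = colon != nullptr ? colon + 1 : signature;
  return std::strchr(escapes, e) != nullptr;
}

// Hypothetical sketch: rewrites the ASCII class escapes d, h, s, w as bracket
// lists when the target signature lacks them; other escapes are copied as-is.
// The bracket lists hold literal whitespace characters so that the output
// does not itself depend on the t, n, v, f, r escapes being supported.
std::string expand_class_escapes(const std::string &regex, const char *signature) {
  std::string out;
  for (std::size_t k = 0; k < regex.size(); ++k) {
    if (regex[k] != '\\' || k + 1 == regex.size()) {
      out += regex[k];  // ordinary character (or a trailing backslash)
      continue;
    }
    char e = regex[++k];  // the escape letter after the backslash
    const char *expansion = nullptr;
    switch (e) {
      case 'd': expansion = "[0-9]"; break;
      case 'h': expansion = "[ \t]"; break;
      case 's': expansion = "[ \t\n\v\f\r]"; break;
      case 'w': expansion = "[0-9A-Z_a-z]"; break;
      default: break;
    }
    if (expansion != nullptr && !has_escape(signature, e)) {
      out += expansion;
    } else {
      out += '\\';
      out += e;
    }
  }
  return out;
}
```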
Note that 'p' is a special case to support Unicode-based matchers that natively support UTF-8 patterns and the Unicode classes \p{C}, \P{C}, \w, \W, \d, \D, \l, \L, \u, \U, \N, and \x{X}. In effect, 'p' prevents conversion of Unicode patterns to UTF-8. This special case does not support {NAME} expansions in bracket lists such as [a-z||{upper}] and {lower}{+}{upper} used in lexer specifications.
The optional "?+" characters specify lazy and possessive support:
- '?' specifies that lazy quantifiers for repeats are supported
- '+' specifies that possessive quantifiers for repeats are supported
An optional "." (dot) specifies that dot matches any character except newline. A dot is implied by the presence of the 's' modifier and can be omitted in that case.
An optional "[" specifies that bracket list union, intersection, and subtraction are supported, i.e. [--[a-z]].
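The optional trailing characters described above act as simple presence flags. A minimal sketch that reads them follows, assuming they are looked up after the first ':' (or in the whole string when there is no ':') and applying the stated rule that the 's' modifier implies the dot flag. The SignatureOptions struct and read_options function are hypothetical names, not part of the reflex API.

```cpp
#include <cstring>

// Hypothetical names: presence flags for the optional trailing "?+.[" part.
struct SignatureOptions {
  bool lazy;        // '?': lazy quantifiers are supported
  bool possessive;  // '+': possessive quantifiers are supported
  bool dot;         // '.': dot matches any character except newline
  bool brackets;    // '[': bracket list union, intersection, subtraction
};

SignatureOptions read_options(const char *signature) {
  if (signature == nullptr) return SignatureOptions{};  // all flags false
  const char *colon = std::strchr(signature, ':');
  // Flags are looked up after the decls part (whole string if there is no ':').
  const char *tail = colon != nullptr ? colon + 1 : signature;
  // The 's' (dotall) modifier in the decls part implies the dot flag.
  bool dotall = false;
  for (const char *p = signature; colon != nullptr && p < colon; ++p)
    if (*p == 's') dotall = true;
  SignatureOptions o;
  o.lazy = std::strchr(tail, '?') != nullptr;
  o.possessive = std::strchr(tail, '+') != nullptr;
  o.dot = dotall || std::strchr(tail, '.') != nullptr;
  o.brackets = std::strchr(tail, '[') != nullptr;
  return o;
}
```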
Parameters:
- pattern: regex string pattern to convert
- signature: regex library signature
- flags: conversion flags
- multiline: set to true if pattern may be multiline
- macros: {name} macros to expand
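The macros parameter supplies {name} definitions to expand. The sketch below shows one plausible single-pass expansion over a std::map<std::string, std::string>. Wrapping each definition in (?:...) so that a following quantifier applies to the whole definition is an assumption of this sketch, as is copying unknown names unchanged, and expand_macros is a hypothetical name rather than part of the reflex API.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch of {name} macro expansion (not reflex::convert): a
// {name} whose name starts with a letter or '_' and is found in the map is
// replaced by its definition wrapped in (?:...). Anything else, such as the
// repeat {3}, an unknown name, or an escaped \{, is copied unchanged.
std::string expand_macros(const std::string &pattern,
                          const std::map<std::string, std::string> &macros) {
  std::string out;
  std::size_t k = 0;
  while (k < pattern.size()) {
    char c = pattern[k];
    if (c == '\\' && k + 1 < pattern.size()) {  // keep escapes verbatim
      out.append(pattern, k, 2);
      k += 2;
      continue;
    }
    if (c == '{' && k + 1 < pattern.size() &&
        (std::isalpha(static_cast<unsigned char>(pattern[k + 1])) ||
         pattern[k + 1] == '_')) {
      std::size_t close = pattern.find('}', k + 1);
      if (close != std::string::npos) {
        auto it = macros.find(pattern.substr(k + 1, close - k - 1));
        if (it != macros.end()) {
          out += "(?:" + it->second + ")";
          k = close + 1;
          continue;
        }
      }
    }
    out += c;
    ++k;
  }
  return out;
}
```

Definitions that themselves contain {name} references are not expanded again in this single pass; a fuller version would need to repeat the expansion and guard against cycles.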