utf8.h File Reference

updated Thu Jan 9 2025 by Robert van Engelen
 

RE/flex UCS to UTF-8 converters.

#include <cstddef>
#include <cstring>
#include <string>

Namespaces

 reflex
 

Macros

#define REFLEX_NONCHAR   (0x200000)
 Replace invalid UTF-8 with the non-character code point U+200000 for guaranteed error detection (substituting U+FFFD instead makes errors harder to detect and easy to miss).
 
#define REFLEX_NONCHAR_UTF8   "\xf8\x88\x80\x80\x80"
 

Functions

std::string reflex::latin1 (int a, int b, int esc= 'x', bool brackets=true)
 Convert an 8-bit ASCII + Latin-1 Supplement range [a,b] to a regex pattern.
 
std::string reflex::utf8 (int a, int b, int esc= 'x', const char *par="(", bool strict=true)
 Convert a UCS-4 range [a,b] to a UTF-8 regex pattern.
 
size_t reflex::utf8 (int c, char *s)
 Convert UCS-4 to UTF-8; writes REFLEX_NONCHAR_UTF8 when the code point is out of range, or produces unrestricted UTF-8 when WITH_UTF8_UNRESTRICTED is defined.
 
int reflex::utf8 (const char *s, const char **r=NULL)
 Convert UTF-8 to UCS; returns REFLEX_NONCHAR for invalid UTF-8, except that the MUTF-8 encoding of U+0000 and the surrogate halves 0xD800-0xDFFF are accepted (define WITH_UTF8_UNRESTRICTED to remove all limits on UTF-8 encodings of up to 6 bytes).
 
std::wstring reflex::wcs (const char *s, size_t n)
 Convert UTF-8 string to wide string.
 
std::wstring reflex::wcs (const std::string &s)
 Convert UTF-8 string to wide string.
 
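The behavior described for the two single-character reflex::utf8 overloads can be illustrated with a minimal standalone sketch. This is not the RE/flex implementation: encode_utf8_sketch, decode_utf8_sketch, SKETCH_NONCHAR and SKETCH_NONCHAR_UTF8 are hypothetical names used only here. The sketch assumes that "out of range" means a negative value or a value above 0x10FFFF, and that the decoder advances past the bytes it consumed when it reports an error. The unrestricted mode selected by WITH_UTF8_UNRESTRICTED is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins mirroring the values of REFLEX_NONCHAR and
// REFLEX_NONCHAR_UTF8 documented on this page.
static const int SKETCH_NONCHAR = 0x200000;
static const char SKETCH_NONCHAR_UTF8[] = "\xf8\x88\x80\x80\x80";

// Encode code point c into s (room for 5 bytes, no terminating NUL written).
// Returns the number of bytes written.
// Assumption: "out of range" means c < 0 or c > 0x10FFFF.
inline std::size_t encode_utf8_sketch(int c, char *s)
{
  assert(s != nullptr);
  if (c < 0 || c > 0x10FFFF)
  {
    std::memcpy(s, SKETCH_NONCHAR_UTF8, 5);
    return 5;
  }
  if (c < 0x80)
  {
    s[0] = static_cast<char>(c);
    return 1;
  }
  // Number of bytes needed for c.
  const int len = c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
  // Lead byte: length marker bits plus the top payload bits.
  static const int lead[5] = { 0, 0, 0xC0, 0xE0, 0xF0 };
  s[0] = static_cast<char>(lead[len] | (c >> (6 * (len - 1))));
  // Continuation bytes: 10xxxxxx, six payload bits each.
  for (int i = 1; i < len; ++i)
    s[i] = static_cast<char>(0x80 | ((c >> (6 * (len - 1 - i))) & 0x3F));
  return static_cast<std::size_t>(len);
}

// Decode one UTF-8 sequence from the NUL-terminated string s.
// If r is not null, *r is set to the position after the consumed bytes.
// Returns SKETCH_NONCHAR for invalid input; the MUTF-8 form C0 80 of U+0000
// and surrogate halves are accepted, as the description above states.
inline int decode_utf8_sketch(const char *s, const char **r = nullptr)
{
  assert(s != nullptr);
  const unsigned char *p = reinterpret_cast<const unsigned char*>(s);
  int c = p[0];
  int n = 1; // bytes consumed so far
  if (c >= 0x80)
  {
    // Sequence length implied by the lead byte; 0 marks an invalid lead byte.
    const int len = (c >= 0xC0 && c < 0xE0) ? 2
                  : (c >= 0xE0 && c < 0xF0) ? 3
                  : (c >= 0xF0 && c < 0xF5) ? 4 : 0;
    // Smallest code point that legitimately needs len bytes.
    const int lowest = len == 2 ? 0x80 : len == 3 ? 0x800 : 0x10000;
    if (len == 0)
    {
      c = SKETCH_NONCHAR;
    }
    else
    {
      c &= 0xFF >> (len + 1); // strip the length marker bits
      while (n < len && (p[n] & 0xC0) == 0x80)
      {
        c = (c << 6) | (p[n] & 0x3F);
        ++n;
      }
      const bool mutf8_nul = (len == 2 && c == 0); // the sequence C0 80
      if (n < len || (c < lowest && !mutf8_nul) || c > 0x10FFFF)
        c = SKETCH_NONCHAR;
    }
  }
  if (r != nullptr)
    *r = s + n;
  return c;
}
```

In this sketch, encoding U+20AC writes the three bytes E2 82 AC and decoding them returns 0x20AC, while an overlong sequence such as C0 AF decodes to 0x200000. Because 0x200000 is not a valid Unicode code point, a caller can test for it without confusing an error with legitimately decoded text.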

Detailed Description

RE/flex UCS to UTF-8 converters.

Author
Robert van Engelen - engelen@genivia.com

Macro Definition Documentation

#define REFLEX_NONCHAR   (0x200000)

Replace invalid UTF-8 with the non-character code point U+200000 for guaranteed error detection (substituting U+FFFD instead makes errors harder to detect and easy to miss).

#define REFLEX_NONCHAR_UTF8   "\xf8\x88\x80\x80\x80"
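
The byte string in REFLEX_NONCHAR_UTF8 matches what results from writing 0x200000 in the 5-byte form 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, which the original UTF-8 definition (RFC 2279) allowed and the current one (RFC 3629) does not. The following sketch checks that arithmetic; encode_5byte_sketch is a hypothetical helper written for this illustration and is not part of utf8.h.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: write c (which must be below 2^26) in the obsolete
// 5-byte UTF-8 form 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx.
inline void encode_5byte_sketch(unsigned long c, unsigned char out[5])
{
  assert(c < 0x4000000UL);
  // Lead byte carries the marker 11111000 plus the top two payload bits.
  out[0] = static_cast<unsigned char>(0xF8 | (c >> 24));
  // Each continuation byte carries the next six payload bits.
  for (std::size_t i = 1; i < 5; ++i)
    out[i] = static_cast<unsigned char>(0x80 | ((c >> (6 * (4 - i))) & 0x3F));
}
```

Calling encode_5byte_sketch(0x200000UL, out) fills out with F8 88 80 80 80: the value has only bit 21 set, so the second byte receives 0x200000 >> 18 = 8, giving 0x88, and every other payload field is zero.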