std::regex_token_iterator

Defined in header `<regex>`
template< class BidirIt, class CharT = typename std::iterator_traits<BidirIt>::value_type, class Traits = std::regex_traits<CharT> > class regex_token_iterator		(since C++11)

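std::regex_token_iterator is a read-only LegacyForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).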

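On construction, it constructs a std::regex_iterator. On every increment it steps through the requested sub-matches of the current match_results, and it increments the underlying std::regex_iterator when incremented past the last requested sub-match.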

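A default-constructed std::regex_token_iterator is the end-of-sequence iterator. When a valid std::regex_token_iterator is incremented after reaching the last sub-match of the last match, it becomes equal to the end-of-sequence iterator. Dereferencing or incrementing it further invokes undefined behavior.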

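Before becoming the end-of-sequence iterator, a std::regex_token_iterator may become a *suffix iterator* if the index -1 (non-matched fragment) appears in the list of requested sub-match indices. Dereferencing such an iterator returns a std::sub_match corresponding to the character sequence between the last match and the end of the sequence.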

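A typical implementation of std::regex_token_iterator holds the underlying std::regex_iterator, a container (e.g. std::vector<int>) of the requested sub-match indices, an internal counter that indexes the current entry of that container, a pointer to std::sub_match that points to the current sub-match of the current match, and a std::sub_match object containing the last non-matched character sequence (used in tokenizer mode).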

Type requirements

BidirIt must meet the requirements of LegacyBidirectionalIterator.

Specializations

Several specializations for common character sequence types are defined:

Defined in header `<regex>`
Type	Definition
`std::cregex_token_iterator`	std::regex_token_iterator<const char*>
`std::wcregex_token_iterator`	std::regex_token_iterator<const wchar_t*>
`std::sregex_token_iterator`	std::regex_token_iterator<std::string::const_iterator>
`std::wsregex_token_iterator`	std::regex_token_iterator<std::wstring::const_iterator>

Member types

Member type	Definition
`value_type`	std::sub_match<BidirIt>
`difference_type`	std::ptrdiff_t
`pointer`	const value_type*
`reference`	const value_type&
`iterator_category`	std::forward_iterator_tag
`iterator_concept` (C++20)	std::input_iterator_tag
`regex_type`	std::basic_regex<CharT, Traits>

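Member functions

(constructor)	constructs a new `regex_token_iterator` (public member function)
(destructor) (implicitly declared)	destructs a `regex_token_iterator`, including the cached value (public member function)
operator=	assigns contents (public member function)
operator==, operator!= (the latter removed in C++20)	compares two `regex_token_iterator`s (public member function)
operator*, operator->	accesses the current sub-match (public member function)
operator++, operator++(int)	advances the iterator to the next sub-match (public member function)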

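Notes

It is the programmer's responsibility to ensure that the std::basic_regex object passed to the iterator's constructor outlives the iterator. Because the iterator stores a std::regex_iterator, which in turn stores a pointer to the regex, incrementing the iterator after the regex has been destroyed results in undefined behavior.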

Example

#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
 
int main()
{
    // Tokenization (non-matched fragments)
    // Note that regex is matched only two times; when the third value is obtained
    // the iterator is a suffix iterator.
    const std::string text = "Quick brown fox.";
    const std::regex ws_re("\\s+"); // whitespace
    std::copy(std::sregex_token_iterator(text.begin(), text.end(), ws_re, -1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
 
    std::cout << '\n';
 
    // Iterating the first submatches
    const std::string html = R"(<p><a href="http://google.com">google</a> )"
                             R"(< a HREF ="http://cppreference.com">cppreference</a>\n</p>)";
    const std::regex url_re(R"!!(<\s*A\s+[^>]*href\s*=\s*"([^"]*)")!!", std::regex::icase);
    std::copy(std::sregex_token_iterator(html.begin(), html.end(), url_re, 1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}

Output:

Quick
brown
fox.
 
http://google.com
http://cppreference.com

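Defect reports

The following behavior-changing defect reports were applied retroactively to previously published C++ standards.

DR	Applied to	Behavior as published	Correct behavior
LWG 3698 (P2770R0)	C++20	`regex_token_iterator` was a `forward_iterator` while being a stashing iterator	made `input_iterator`[1]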

[1] iterator_category is unchanged by the resolution, because changing it to std::input_iterator_tag might break too much existing code.
