
Tokenization

rrahn edited this page Apr 4, 2017 · 15 revisions

This document provides the technical specification for tokenization.

Function overview

split_by

template <typename input_t, typename delimiter_t, typename config_t = decltype(std::ignore)>
    requires forward_range_concept<input_t> && predicate_concept<delimiter_t>
inline auto
split_by(input_t const & input,
         delimiter_t && delimiter,
         config_t && config)  // optional parameter
{
    /* implementation detail */
    return /* optional<view<view<sequence_type>>> */;
}

This function operates on a forward_range and returns a std::optional. The optional is empty if the sequence could not be split, for example because the input is empty. Otherwise the optional holds a view-of-views, so no sequence data is copied until the user explicitly assigns the return value to a container type that owns the data. This is also why input_range_concept is not sufficient: with a single-pass range there is no guarantee that data already seen during tokenization is still available as iteration through the input continues.
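
The following is a minimal sketch of the behaviour described above, not the specified interface. It assumes C++17, uses std::string_view as a stand-in for the inner view and std::vector as a stand-in for the outer view, and omits the config parameter. The name split_by_sketch is hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical sketch: std::string_view approximates the inner view,
// std::vector approximates the outer view. No character data is copied.
template <typename delimiter_t>
inline std::optional<std::vector<std::string_view>>
split_by_sketch(std::string_view input, delimiter_t && is_delimiter)
{
    if (input.empty())
        return std::nullopt;  // empty input: nothing to split

    std::vector<std::string_view> tokens;
    std::size_t begin = 0;
    for (std::size_t pos = 0; pos < input.size(); ++pos)
    {
        if (is_delimiter(input[pos]))
        {
            // Store a view onto the input instead of copying the token.
            tokens.push_back(input.substr(begin, pos - begin));
            begin = pos + 1;
        }
    }
    tokens.push_back(input.substr(begin));  // trailing token, possibly empty
    return tokens;
}
```

With a predicate such as `[](char c) { return c == ','; }`, the input "a,bc,,d" yields four tokens, the third of which is empty. Because the tokens only refer to the input buffer, the caller has to keep the input alive while using them, or copy each token into an owning type such as std::string.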

crop_outer
