Tokenization
template <typename input_range_t, typename output_iterator_t, typename stop_predicate_t, typename ignore_predicate_t>
requires input_range_concept<input_range_t> &&
output_iterator_concept<output_iterator_t> &&
predicate_concept<stop_predicate_t> &&
predicate_concept<ignore_predicate_t>
inline void
read_until(input_range_t &in, output_iterator_t &out, stop_predicate_t &stopFunctor, ignore_predicate_t &ignoreFunctor)
{
if constexpr (is_chunkable_v<output_iterator_t> && is_chunkable_v<input_range_t>)
{
// chunk-wise read
}
else
{
// element-wise read
}
}
- [???] should it be an output range?
- can be expressed as write_until function.
- complexity: O(n) over the number of elements in input.
- throws: possibly alloc_error or stream_error?
Shortcut for read_until with is_newline as the delimiter predicate.
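The read_until contract and its line-based shortcut can be sketched with standard C++17 only. This is a minimal element-wise illustration, not the proposed SeqAn3 interface: the names `read_until_sketch`, `read_line_sketch` and `is_newline_sketch` are hypothetical, an iterator pair replaces the input range, and the concept checks and the chunk-wise branch are left out.

```cpp
#include <iterator>
#include <string>

// Hypothetical element-wise variant (not the wiki's signature): copies from
// [it, last) to out until stop(c) holds, dropping elements where ignore(c) holds.
// `it` is taken by reference so the caller can see where reading stopped.
// Complexity is linear in the number of elements consumed.
template <typename input_it_t, typename output_it_t,
          typename stop_predicate_t, typename ignore_predicate_t>
void read_until_sketch(input_it_t & it, input_it_t last, output_it_t out,
                       stop_predicate_t stop, ignore_predicate_t ignore)
{
    for (; it != last; ++it)
    {
        auto const c = *it;
        if (stop(c))
            return;             // the stop element itself is left unread
        if (!ignore(c))
            *out++ = c;
    }
}

// Assumed stand-in for the is_newline predicate mentioned in the text.
inline bool is_newline_sketch(char c) { return c == '\n' || c == '\r'; }

// Shortcut: read up to (not including) the next line ending, ignoring nothing.
template <typename input_it_t, typename output_it_t>
void read_line_sketch(input_it_t & it, input_it_t last, output_it_t out)
{
    read_until_sketch(it, last, out, is_newline_sketch, [](char) { return false; });
}
```

Called with `std::back_inserter(some_string)` as the output, the sketch appends to a container; whether the real interface should take an output range instead is the open question noted above.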
template <typename input_range_t, typename output_iterator_t, typename assert_predicate_t>
requires input_range_concept<input_range_t> &&
output_iterator_concept<output_iterator_t> &&
predicate_concept<assert_predicate_t>
inline void
read_until(input_range_t &in, output_iterator_t &out, assert_predicate_t &assertFunctor)
{
// unspecified
}
Reads just one single element.
- reads at most n characters
- if n is not specified, reads the whole input range.
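A bounded read with these two properties might look like the following. It is only a sketch: the name `read_sketch`, the iterator-pair interface and the returned element count are assumptions, and "n not specified" is modelled by a default argument equal to the largest `std::size_t`.

```cpp
#include <cstddef>
#include <iterator>
#include <limits>
#include <string>

// Hypothetical helper: copies at most n elements from [it, last) to out and
// returns how many were copied. With the default n the whole input is read.
template <typename input_it_t, typename output_it_t>
std::size_t read_sketch(input_it_t & it, input_it_t last, output_it_t out,
                        std::size_t n = std::numeric_limits<std::size_t>::max())
{
    std::size_t count = 0;
    for (; count < n && it != last; ++it, ++count)
        *out++ = *it;
    return count;
}
```

Reading a single element is then the special case `n == 1`.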
Writes a wrapped line. This might instead be modelled as a simple write with an additional functor.
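One possible reading of "wrapped" is a fixed line width, as used when writing FASTA records. The sketch below assumes exactly that; `write_wrapped_sketch` and its `width` parameter are hypothetical names, and a width of 0 is treated as "no wrapping".

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: writes the elements of `in` to `out`, emitting '\n'
// after every `width` elements and after a trailing partial line.
template <typename output_it_t, typename input_range_t>
void write_wrapped_sketch(output_it_t out, input_range_t const & in, std::size_t width)
{
    std::size_t column = 0;
    for (auto const & c : in)
    {
        *out++ = c;
        if (++column == width)
        {
            *out++ = '\n';
            column = 0;
        }
    }
    if (column != 0)            // terminate the last, incomplete line
        *out++ = '\n';
}
```

The wrapping logic is just a counter, which is why it could equally live in a functor handed to a generic write.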
template <typename input_t, typename delimiter_t, typename config_t = decltype(std::ignore)>
requires forward_range_concept<input_t> && predicate_concept<delimiter_t>
inline auto
split_by(input_t const & input,
delimiter_t && delimiter,
config_t && config) // optional parameter
{
/* implementation detail*/
return // optional<view<view<sequence_type>>>
}
This function operates on a forward range and returns an optional view of views. The optional can be empty if the sequence could not be split, for example because the input is empty. Otherwise the optional holds a view of views, so no sequence data is copied until the user explicitly assigns the return value to a container type that owns the data. This is also why input_range_concept is not applicable: with a single-pass input range there is no guarantee that data already seen during tokenization is still available once iteration through the input continues.
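The non-copying behaviour can be illustrated with `std::string_view`. This sketch rests on several assumptions: `split_by_sketch` is a hypothetical name, the input is restricted to contiguous character data, the config parameter is omitted, and a `std::vector` of `std::string_view` stands in for the view of views.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `input` at every element for which
// is_delimiter(c) holds. The returned tokens only refer to `input`;
// no character data is copied. Empty input yields an empty optional.
template <typename delimiter_t>
std::optional<std::vector<std::string_view>>
split_by_sketch(std::string_view input, delimiter_t is_delimiter)
{
    if (input.empty())
        return std::nullopt;

    std::vector<std::string_view> tokens;
    std::size_t token_begin = 0;
    for (std::size_t pos = 0; pos < input.size(); ++pos)
    {
        if (is_delimiter(input[pos]))
        {
            tokens.push_back(input.substr(token_begin, pos - token_begin));
            token_begin = pos + 1;
        }
    }
    tokens.push_back(input.substr(token_begin));   // last (possibly empty) token
    return tokens;
}
```

Because the tokens point into the caller's buffer, that buffer has to outlive them, which is the forward-range requirement in miniature.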
namespace seqan3::action
{
constexpr ranges::action< crop_outer_fn > crop_outer { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_outer_fn > crop_outer { /* unspecified */ }
}
Modeling these kinds of functions as either views or actions would be desirable. How exactly this is to be implemented remains to be seen.
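To make the view/action distinction concrete, here is a standard-library sketch. It assumes that crop_outer trims leading and trailing elements matching a predicate, which this page does not spell out; both function names are hypothetical and no range-v3 machinery is used.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// View flavour (assumed semantics): returns a non-owning window onto `in`
// with leading and trailing elements that satisfy `ignore` cut off.
template <typename predicate_t>
std::string_view crop_outer_view_sketch(std::string_view in, predicate_t ignore)
{
    while (!in.empty() && ignore(in.front()))
        in.remove_prefix(1);
    while (!in.empty() && ignore(in.back()))
        in.remove_suffix(1);
    return in;
}

// Action flavour (assumed semantics): modifies the container in place.
template <typename predicate_t>
void crop_outer_action_sketch(std::string & in, predicate_t ignore)
{
    std::string_view const kept = crop_outer_view_sketch(in, ignore);
    auto const offset = static_cast<std::size_t>(kept.data() - in.data());
    in = in.substr(offset, kept.size());
}
```

The view leaves the underlying data untouched and is cheap to compose; the action changes the container itself, which is the trade-off the two namespaces are meant to expose.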
namespace seqan3::action
{
constexpr ranges::action< crop_before_last_fn > crop_before_last { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_before_last_fn > crop_before_last { /* unspecified */ }
}
Similar to crop_outer.
namespace seqan3::action
{
constexpr ranges::action< crop_before_first_fn > crop_before_first { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_before_first_fn > crop_before_first { /* unspecified */ }
}
Similar to crop_outer.
namespace seqan3::action
{
constexpr ranges::action< crop_after_last_fn > crop_after_last { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_after_last_fn > crop_after_last { /* unspecified */ }
}
namespace seqan3::action
{
constexpr ranges::action< crop_after_first_fn > crop_after_first { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_after_first_fn > crop_after_first { /* unspecified */ }
}
template <typename input_t, typename predicate_t>
requires forward_range_concept<input_t> && predicate_concept<predicate_t>
inline auto
find_last(input_t const & input,
predicate_t && p)
{
/* unspecified */
return iterator_t<input_t>{begin(input)};
}
find_last is just an algorithm that can be optimised when working on buffered streams, as chunking might be more efficient there. However, it is currently not used anywhere in SeqAn. For standard containers it could simply be replaced with:
view::find_if(view::reverse(buffer), seqan3::equals_char<','>());
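With only the standard library, the same reverse search can be written with reverse iterators. The sketch below uses a hypothetical name, `find_last_sketch`, and assumes a container with bidirectional iterators; the one subtle step is converting the reverse iterator back to a forward one.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical helper: returns an iterator to the last element satisfying p,
// or input.end() if there is none.
template <typename container_t, typename predicate_t>
auto find_last_sketch(container_t const & input, predicate_t p)
{
    auto const rit = std::find_if(input.rbegin(), input.rend(), p);
    // rit.base() points one past the matched element, so step back once.
    return rit == input.rend() ? input.end() : std::prev(rit.base());
}
```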
template <typename input_t, typename predicate_t>
requires forward_range_concept<input_t> && predicate_concept<predicate_t>
inline auto
find_first(input_t const & input,
predicate_t && p)
{
/* unspecified */
return iterator_t<input_t>{begin(input)};
}
find_first is just an algorithm that can be optimised when working on buffered streams, as chunking might be more efficient there. However, it is currently used in only one place in SeqAn, where it operates on a simple CharString buffer. For standard containers it could simply be replaced with:
view::find_if(buffer, seqan3::equals_char<','>());
template <typename iterator_t, typename predicate_t>
requires input_iterator_concept<iterator_t> && predicate_concept<predicate_t>
inline void
skip_until(iterator_t & it, predicate_t && p)
{
if constexpr (is_chunkable_v<iterator_t>)
{
// chunk-wise skip (unspecified)
}
else
{
// element-wise skip (unspecified)
}
}
Modeled on an input iterator. Can operate chunk-wise or element-wise.
template <typename iterator_t>
requires input_iterator_concept<iterator_t>
inline void
skip_line(iterator_t & it)
{
skip_until(it, is_new_line());
// consume platform dependent line ending
}
Delegates to skip_until and consumes the line ending. This must be platform dependent, i.e. differ between \n and \r\n.
should throw if
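The delegation from skip_line to skip_until can be sketched element-wise as follows. Two choices here are assumptions rather than part of the design above: an explicit `last` iterator is passed so that the end of input can be detected, and instead of a compile-time platform switch the sketch accepts `\n`, `\r\n` and a lone `\r` at run time. Both function names are hypothetical.

```cpp
#include <string>

// Hypothetical element-wise skip: advances `it` until p(*it) holds or the
// input is exhausted.
template <typename iterator_t, typename predicate_t>
void skip_until_sketch(iterator_t & it, iterator_t last, predicate_t p)
{
    while (it != last && !p(*it))
        ++it;
}

// Hypothetical skip_line: skips to the next line ending and consumes it.
template <typename iterator_t>
void skip_line_sketch(iterator_t & it, iterator_t last)
{
    skip_until_sketch(it, last, [](char c) { return c == '\n' || c == '\r'; });
    if (it != last && *it == '\r')   // first half of "\r\n", or a lone "\r"
        ++it;
    if (it != last && *it == '\n')   // "\n", or second half of "\r\n"
        ++it;
}
```

Accepting both endings makes files written on another platform readable, at the cost of one extra comparison per line.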
template <typename iterator_t, typename predicate_t>
requires input_iterator_concept<iterator_t> && predicate_concept<predicate_t>
inline void
skip(iterator_t & it, predicate_t && p)
{
// requires p(*it) == true;
++it;
}
template <typename iterator_t>
requires input_iterator_concept<iterator_t>
inline void
skip(iterator_t & it)
{
skip(it, [](auto const &){ return true; });
}
Should throw parse_error if predicate is not fulfilled.
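The throwing behaviour could look like this. It is a sketch under assumptions: `parse_error` is defined locally as a stand-in deriving from `std::runtime_error`, the function is called `skip_one_sketch`, and an explicit `last` iterator is added so that reaching the end of input is also reported as an error.

```cpp
#include <stdexcept>
#include <string>

// Stand-in exception type; the real parse_error is not specified here.
struct parse_error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical helper: consumes exactly one element, which must satisfy p.
// On failure the iterator is left unchanged and parse_error is thrown.
template <typename iterator_t, typename predicate_t>
void skip_one_sketch(iterator_t & it, iterator_t last, predicate_t p)
{
    if (it == last || !p(*it))
        throw parse_error("skip_one: unexpected element or end of input");
    ++it;
}
```

Checking before incrementing keeps the iterator on the offending element, so the caller can report what was actually found.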
TODO