Fast ANSI string operations in C #316

Merged
merged 54 commits into from
Aug 4, 2021

@gaborcsardi commented Jul 17, 2021

Implemented:

  • ansi_simplify()
  • ansi_substr()
  • ansi_has_any()
  • ansi_strip()
  • ansi_html()
  • utf8_nchar()

TODO:

  • Support ANSI hyperlinks in the C ANSI iterator.
  • Better UTF-8 display width, consider combining characters.
  • Make new ANSI functions work for NAs.

Might also implement other ansi_*() functions in C, to make them faster.

Closes #316.
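
For readers unfamiliar with what stripping ANSI sequences involves, here is a minimal sketch in C. It is not the code from this PR: `strip_csi()` is a hypothetical helper, and it assumes the input only contains CSI sequences (`ESC [ ... final byte`). Like the iterator described above, it does not handle hyperlink (OSC) sequences.

```c
#include <stddef.h>

/* Hypothetical helper, not the implementation from this PR.
 * Copies `in` to `out`, dropping CSI sequences (ESC '[' params final).
 * `out` must have room for strlen(in) + 1 bytes.
 * Returns the number of bytes written, excluding the trailing NUL. */
size_t strip_csi(const char *in, char *out) {
  size_t n = 0;
  while (*in) {
    if (in[0] == '\033' && in[1] == '[') {
      in += 2;
      /* Skip parameter and intermediate bytes (0x20..0x3F). */
      while (*in >= 0x20 && *in <= 0x3F) in++;
      /* Skip the final byte (0x40..0x7E), e.g. 'm' for SGR. */
      if (*in >= 0x40 && *in <= 0x7E) in++;
    } else {
      out[n++] = *in++;
    }
  }
  out[n] = '\0';
  return n;
}
```

Because the output is never longer than the input, a caller can allocate the output buffer once, up front.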

* Omit empty tags now.
* Better color representation.
Plus the output classes as well.
Especially broken color sequences.
We only emit it if there was an unknown SGR tag
in the input. Otherwise we use the more specific
closing tags.
* Now we always close a color tag, before we
  start another one.
* 0m is now only emitted when there was an
  unknown SGR tag before.
Also simplify data structure of colors, just store
the color number directly, we are using 254 and 255
for 256 and RGB color, anyway.
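
A sketch of what such a color record could look like, as an illustration only. The struct layout, the `COL_256`/`COL_RGB` names and `fg_sgr()` are assumptions made for this example, not taken from the PR; only the use of 254 and 255 as markers comes from the message above.

```c
#include <stdio.h>

/* Assumed marker values, following the commit message:
 * 254 = color from the 256-color palette, 255 = RGB color. */
#define COL_256 254
#define COL_RGB 255

/* Hypothetical layout: `col` holds the SGR color number directly
 * (e.g. 31 for red foreground), or one of the two markers.
 * For COL_256 the palette index is in `r`; for COL_RGB, r/g/b are used. */
struct color {
  unsigned char col;
  unsigned char r, g, b;
};

/* Format the opening foreground SGR sequence for `c` into `buf`.
 * Returns what snprintf() returns. */
int fg_sgr(const struct color *c, char *buf, size_t size) {
  if (c->col == COL_256) {
    return snprintf(buf, size, "\033[38;5;%dm", c->r);
  }
  if (c->col == COL_RGB) {
    return snprintf(buf, size, "\033[38;2;%d;%d;%dm", c->r, c->g, c->b);
  }
  return snprintf(buf, size, "\033[%dm", c->col);
}
```

Storing the number directly means the common case needs no lookup table. The two markers cannot collide with a regular color, since the standard SGR color codes are all well below 254.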
I.e. <t1><t2>...</t2></t1> instead of
<t1><t2>...</t1></t2>.
Plus helper function for CSS and also ansi_simplify().
When there are no tags.
It does not currently find hyperlink seqs...
Coming to the other functions as well...
They convert to UTF-8 on input, always return UTF-8.
`LENGTH(NULL)` does not seem to work well on
R 3.4 and earlier.
Also add `utf8_nchar()`.
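
As an illustration of the basic idea behind counting characters in UTF-8, a minimal sketch follows. `utf8_count_codepoints()` is a hypothetical name, not the function added here. It assumes valid, NUL-terminated UTF-8 and counts code points only, so it says nothing about display width or combining characters.

```c
#include <stddef.h>

/* Hypothetical helper, not the implementation from this PR.
 * Counts code points in a valid, NUL-terminated UTF-8 string by
 * counting every byte that is not a continuation byte (10xxxxxx). */
size_t utf8_count_codepoints(const char *s) {
  size_t n = 0;
  for (; *s; s++) {
    if (((unsigned char) *s & 0xC0) != 0x80) n++;
  }
  return n;
}
```

This is a single pass with no decoding, which is why a C version can be cheap. A width-aware count would need to decode each code point and look up its width.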

Towards #317.
The second might happen after the DLL was unloaded,
e.g. in testthatlabs, I am not sure why.
It is nice to see that it is happening.
To avoid name clash with testthat.
`ansi_substr()` and `ansi_substring()` use this now,
and the new `utf8_substr()` function as well.
The latter is slightly faster, but it does not work
for ANSI styled strings.
Some chunks in the docs create this, and
it is very small, so it is simpler to just
include it.
For simplified ANSI sequences.
IDK how and when they were removed...
And that ansi_* functions do not handle it.
@gaborcsardi merged commit 84dedca into master Aug 4, 2021
@gaborcsardi deleted the fix/fast-ansi branch September 20, 2021 15:24