Add example for documentation #106

Open
Stargateur opened this issue Jun 6, 2017 · 9 comments

Comments

@Stargateur

I couldn't find any examples.

Would it be possible to add an examples folder covering the basics: printing, iterating over, creating, and reading a UTF-8 string with this library?

@stevengj
Member

stevengj commented Jun 6, 2017

Examples would be welcome, but sounds like they should go into the README or into the manual?

@stevengj
Member

stevengj commented Jun 6, 2017

Note that utf8proc does not handle printing of UTF-8 strings. To print a UTF-8 string you can just use printf (caveat: on Windows, you need to set the terminal to the UTF-8 code page). UTF-8 strings can be created in any decent text editor (since most text editors can be set to edit in UTF-8 mode). And reading a UTF-8 string is also something that you can do with standard C library functions. UTF-8 strings are just bytes read from, e.g., a file in the UTF-8 encoding.
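
A minimal sketch of the points above, using only the standard C library (no utf8proc calls). The helper names `use_utf8_console`, `print_utf8_line`, and `read_utf8_line` are made up for illustration, and the Windows branch assumes that the Win32 `SetConsoleOutputCP` call is what "set the terminal to the UTF-8 code page" amounts to in practice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: on Windows, switch the console output code page to
   UTF-8. Elsewhere it does nothing and assumes the terminal is already UTF-8. */
void use_utf8_console(void) {
#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);
#endif
}

/* Hypothetical helper: print a UTF-8 string and a newline. printf copies the
   bytes unchanged and returns the number of bytes written (negative on error). */
int print_utf8_line(const char *s) {
    assert(s != NULL);
    return printf("%s\n", s);
}

/* Hypothetical helper: read one line into buf and strip the trailing newline.
   Returns the length in bytes (not characters), or -1 on end of file or error. */
long read_utf8_line(FILE *in, char *buf, int cap) {
    assert(in != NULL && buf != NULL && cap > 0);
    if (fgets(buf, cap, in) == NULL) {
        return -1;
    }
    buf[strcspn(buf, "\n")] = '\0';
    return (long)strlen(buf);
}
```

Calling `use_utf8_console()` once at startup and then `print_utf8_line("\xe4\xb8\xad\xe6\x96\x87")` writes the six bytes of the string plus a newline; `read_utf8_line(stdin, buf, (int)sizeof buf)` reads a line back as raw bytes.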

@stevengj
Member

stevengj commented Jun 6, 2017

The main purpose of this library is for things like Unicode normalization, case-folding, etcetera, that require Unicode data tables. There are also functions to encode/decode Unicode codepoints to/from UTF-8, as described in the manual — maybe that is what you mean by "creating" and "reading" UTF-8 strings?
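
To make "encode/decode" concrete, here is a hand-written sketch of what converting between a codepoint and its UTF-8 bytes involves. It deliberately does not call utf8proc: the functions `encode_utf8` and `decode_utf8` are hypothetical names for illustration only. In the library itself, `utf8proc_encode_char` and `utf8proc_iterate` are the calls for these two tasks, and their exact behavior on invalid input may differ from this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one codepoint as UTF-8 into dst (room for 4 bytes).
   Returns the number of bytes written, or 0 for surrogates and values > U+10FFFF. */
size_t encode_utf8(uint32_t cp, unsigned char *dst) {
    assert(dst != NULL);
    if (cp < 0x80) {
        dst[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        dst[0] = (unsigned char)(0xC0 | (cp >> 6));
        dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0; /* surrogates are not encodable */
    }
    if (cp < 0x10000) {
        dst[0] = (unsigned char)(0xE0 | (cp >> 12));
        dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        dst[0] = (unsigned char)(0xF0 | (cp >> 18));
        dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: decode the first codepoint from src (len bytes available).
   Returns the number of bytes consumed, or 0 if the sequence is invalid. */
size_t decode_utf8(const unsigned char *src, size_t len, uint32_t *out) {
    assert(src != NULL && out != NULL);
    if (len == 0) return 0;
    unsigned char b = src[0];
    size_t n;
    uint32_t cp, min;
    if (b < 0x80)                { n = 1; cp = b;        min = 0; }
    else if ((b & 0xE0) == 0xC0) { n = 2; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { n = 3; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { n = 4; cp = b & 0x07; min = 0x10000; }
    else return 0; /* stray continuation byte or invalid lead byte */
    if (n > len) return 0; /* truncated sequence */
    for (size_t i = 1; i < n; ++i) {
        if ((src[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(src[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    *out = cp;
    return n;
}
```

For instance, U+4E2D encodes to the three bytes E4 B8 AD, and decoding E6 96 87 yields U+6587; these are the two characters used in the test string later in this thread.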

@Stargateur
Author

Stargateur commented Jun 6, 2017

> Examples would be welcome, but sounds like they should go into the README or into the manual?

That would be perfect too.

> Note that utf8proc does not handle printing of UTF-8 strings. To print a UTF-8 string you can just use printf (caveat: on Windows, you need to set the terminal to the UTF-8 code page). UTF-8 strings can be created in any decent text editor (since most text editors can be set to edit in UTF-8 mode). And reading a UTF-8 string is also something that you can do with standard C library functions. UTF-8 strings are just bytes read from, e.g., a file in the UTF-8 encoding.

I know, but examples would really help beginners understand basic use of the library, including, as you said, the fact that users have to read and write strings themselves.

This issue comes from a question on Stack Overflow (this one). I was unable to provide an answer because I didn't understand how to use this library.

I tried this, but I'm sure it's not the right way to do it:

#include <stdio.h>
#include <utf8proc.h>
#include <unistd.h>

int main(void) {
  utf8proc_uint8_t const string[6] = "\xe4\xb8\xad\xe6\x96\x87"; // or this u8"ايه الاخبار"
  utf8proc_ssize_t size = sizeof string / sizeof *string;
  utf8proc_int32_t data;
  utf8proc_ssize_t n;

  utf8proc_uint8_t const *pstring = string;
  while ((n = utf8proc_iterate(pstring, size, &data)) > 0) {
    printf("%.*s\n", (int)n, pstring);
    pstring += n;
    size -= n;
  }
}

@cesss

cesss commented Jan 7, 2018

@Stargateur: First of all, your code has an important error that will make it fail no matter which library you use for UTF-8: you statically allocate 6 bytes for a string made of 6 bytes. That's not correct. Strings in the C language are null-terminated: they need a zero byte at the end. So you need to allocate 7 bytes for a string that has 6 bytes of data. For static allocation, the compiler can do this for you automatically if you leave the array length between the brackets empty. Any good chapter about strings in a C language book covers all of this.

Second, you don't need utf8proc for declaring a UTF-8 string and printing it.

In your case, the code could be reduced to just two lines:

const char string[]="ايه الاخبار"; /* no need to prepend "u8" if the file is encoded as UTF-8 with no BOM */
printf("%s\n",string);

As simple as that.
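
The sizing rule can be checked directly. The sketch below is an illustration, not part of utf8proc; the array name `zhongwen` and both helper functions are hypothetical. It shows that leaving the length empty gives 7 bytes of storage for 6 bytes of data. For comparison, C (unlike C++) also accepts an array declared with length 6 and initialized from a 6-byte literal, but that array has no terminator, so it is only safe with functions that take an explicit length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length left empty: the compiler sizes the array as 6 data bytes + 1 NUL. */
const char zhongwen[] = "\xe4\xb8\xad\xe6\x96\x87";

/* Checked at compile time (C11): the array really has 7 elements. */
static_assert(sizeof zhongwen == 7, "6 data bytes plus the terminating NUL");

/* Hypothetical helpers that expose the two sizes for comparison. */
size_t zhongwen_array_size(void)  { return sizeof zhongwen; }  /* storage, including NUL */
size_t zhongwen_byte_length(void) { return strlen(zhongwen); } /* data bytes only */
```

Note that both numbers count bytes, not characters: the string holds two characters of three bytes each.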

@giampaolo

giampaolo commented Jan 25, 2018

I agree a directory of examples would be great to have!
https://julialang.org/utf8proc/doc/ is an API reference, which is very different from documentation, a tutorial, or an "example usage" section in the README; the latter is probably the quickest way to get something working ASAP.

@niblo

niblo commented Jan 3, 2021

(I was about to create a new issue, but this one seems to be a good fit.)

I'm also looking for examples.

What I'm trying to do is implement an iterator function that iterates over graphemes (in C). I'm implementing it as a patch to utf8proc to piggyback on the test infrastructure, but I'm seeing some odd results.

One reference that I haven't looked at yet is the graphemes function in Julia, but I don't know the Julia language.

Do you know of any other references, or perhaps an existing implementation?

@jamesBrosnahan

> I agree a directory of examples would be great to have!
> https://julialang.org/utf8proc/doc/ is an API reference, which is very different from a documentation, a tutorial or an "example usage" section in the README, which is probably the most immediate way to get something working ASAP.

The link 404s.

@wolfield

Bump to this.
