Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fixed-width integers and clarify encoding of C strings. #394

Merged
merged 9 commits into from
Jan 23, 2023

Conversation

fangerer
Copy link
Contributor

Resolves #343 and #355 .

As we decided in the last HPy dev call, we want to use fixed-width integers (where appropriate) instead of int/long in the API/ABI because the sizes of int/long could vary depending on several properties.

Since int is actually mostly used as a flag, status code, or small enum values (like in HPy_RichCompare argument int op), this PR is mainly about HPyLong_From(Long|LongLong).

I did some research about the guarantees and found this: https://en.cppreference.com/w/c/language/arithmetic_types

Based on this document, I assume that int has at least 2 bytes, long at least 4 bytes, and long long at least 8 bytes.
I came to the conclusion that instead of long and long long, I'll use int32_t and int64_t because a smaller type could still be beneficial in some cases.

I then implemented the previous APIs like HPyLong_FromLong as inline helpers delegating to HPyLong_From(Int32|Int64) depending on sizeof(long). Since the size of a type is compile-time constant, the dispatching overhead should be zero.
In case of CPython, HPyLong_From(Int32|Int64) again delegates to PyLong_From(Long|LongLong) depending on the type sizes (again, compile-time constant).

The second change in this PR makes the encoding of C strings in the HPy API clear. We use UTF-8 everywhere. I checked if the const char * arguments are then really consumed by some UTF-8 conversion which is in most cases HPyUnicode_FromString.

@mattip
Copy link
Contributor

mattip commented Dec 29, 2022

On what systems are the macros #if SIZEOF_LONG >= SIZEOF_INT32 and #if SIZEOF_LONG_LONG >= SIZEOF_INT64 true? Or in other words, where will CPython go via the slower c-api calls?

@fangerer
Copy link
Contributor Author

fangerer commented Dec 29, 2022

On what systems are the macros #if SIZEOF_LONG >= SIZEOF_INT32 and #if SIZEOF_LONG_LONG >= SIZEOF_INT64 true? Or in other words, where will CPython go via the slower c-api calls?

You mean: on what system are conditions SIZEOF_LONG >= SIZEOF_INT32 and SIZEOF_LONG_LONG >= SIZEOF_INT64 false ? Because they are true on, e.g. my system since sizeof(long) == sizeof(long long) == 8 these days.

TBH, I don't know. I just tried to cover all possible cases according to the minimal sizes guaranteed by https://en.cppreference.com/w/c/language/arithmetic_types. The slower C API calls will go to e.g. ctx_Long_FromUInt32_t (in ctx_long.c) and currently, it will also just delegate to PyLong_FromLong but the difference is: if SIZEOF_LONG >= SIZEOF_INT32 is false, then you can't even build this function (see ctx_long.c:13). We could, ofc, also do a range check and that case but I thought that can be future work if we ever need it.

@mattip
Copy link
Contributor

mattip commented Dec 29, 2022

OK, thanks. Even though this is a big renaming, I like it. The names are now much more self-explanatory.

@fangerer fangerer force-pushed the fa/fixed-width-long_cstring-enc branch from 4efcdf3 to f8bb7f4 Compare December 29, 2022 14:43
HPy_ID(83)
HPy HPyLong_FromUnsignedLongLong(HPyContext *ctx, unsigned long long v);
HPy HPyLong_FromUInt64_t(HPyContext *ctx, uint64_t v);
HPy_ID(84)
HPy HPyLong_FromSize_t(HPyContext *ctx, size_t value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since size_t is implementation-defined, do we want to do the same for it as for int/long?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a point and we probably should. This basically means that we would remove ctx_Long_FromSize_t from the context and just have an inline helper for it (which, I think, is fine).

HPy_ID(83)
HPy HPyLong_FromUnsignedLongLong(HPyContext *ctx, unsigned long long v);
HPy HPyLong_FromUInt64_t(HPyContext *ctx, uint64_t v);
HPy_ID(84)
HPy HPyLong_FromSize_t(HPyContext *ctx, size_t value);
HPy_ID(85)
HPy HPyLong_FromSsize_t(HPyContext *ctx, HPy_ssize_t value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe unlike CPython we should define HPy_ssize_t to be always an alias to, e.g., uint64_t?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do this in pieces. Since int/long and ascii/utf-8 are the most common places for confusion, let's just stick with those for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe unlike CPython we should define HPy_ssize_t to be always an alias to, e.g., uint64_t?

This would be fine for me but just to consider the original reason for having Py_ssize_t in CPython (https://peps.python.org/pep-0353/): As far as I understand, there are two reasons. (1) CPython previously used int for indexing which is just too small on 64-bit systems, and (2) because the signed integral type ssize_t is not available everywhere.

From my point of view, point (1) implies that the type used for indexing should fit into one register (certainly for performance reasons). Consequently, we should maybe also have such a type but as far as I know, we don't run any tests on 32-bit systems right now. I doubt that HPy even works on 32-bit systems. This indicates that we should probably just remove ctx_Long_FromSsize_t from the context and provide an inline helper.

The more interesting part then is: how should we deal with all the other API functions that use HPy_ssize_t?
Anyway, this looks like a larger topic which I wouldn't tackle in this PR.

@@ -154,7 +154,7 @@ HPy_ID(97)
double HPyFloat_AsDouble(HPyContext *ctx, HPy h);

HPy_ID(98)
HPy HPyBool_FromLong(HPyContext *ctx, long v);
HPy HPyBool_FromBool(HPyContext *ctx, bool v);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have some HPy_bool_t to also remove the compiler-specific bool. Not sure how far we want to take this -- do we want to remove all sources of potential compile-specifics from ABI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the compiler-specific bool

As far as I know, the Boolean type (i.e. _Bool or bool) is part of the C standard since C99 (see, e.g., https://en.cppreference.com/w/c/types). That's why I thought that we could just have it in the ABI, in particular because true and false are now also defined to expand to 1 and 0, respectively.

What we should definitively do is to explicitly make C99 the minimum required standard for HPy (right now, this is just an implicit requirement).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should specify C99 as a minimum required standard.

@mattip
Copy link
Contributor

mattip commented Jan 2, 2023

There is a failure in cppcheck, which I think is causing the other CI runs to abort. Maybe we should be using fail-fast: false in all the non-essential CI runs.

@fangerer
Copy link
Contributor Author

fangerer commented Jan 9, 2023

There is a failure in cppcheck, which I think is causing the other CI runs to abort. Maybe we should be using fail-fast: false in all the non-essential CI runs.

@mattip: I just don't understand why cppcheck is failing. I tried to reproduce locally but it doesn't print me any useful message. There are a lot of messages with level information and then there is a message that not all includes could be resolved. Running with --check-config (as suggested by cppcheck) doesn't give any useful output. I already played around and added include paths and such but nothing helped.
Do you know how to debug that?

@mattip
Copy link
Contributor

mattip commented Jan 9, 2023

I just don't understand why cppcheck is failing

I have no idea. I suggest that we do not try to debug it, rather we make passing cppcheck optional since it is flaky and difficult to control. It is meant to help us, but turns out obstructing progress.

@thedrow
Copy link
Contributor

thedrow commented Jan 9, 2023

I just don't understand why cppcheck is failing

I have no idea. I suggest that we do not try to debug it, rather we make passing cppcheck optional since it is flaky and difficult to control. It is meant to help us, but turns out obstructing progress.

#395 attempts to fix the CPPCheck failures.
We can disable it if it's not useful but it did catch a few bugs related to varargs before so I think it is somewhat useful at the very least.

@fangerer
Copy link
Contributor Author

fangerer commented Jan 9, 2023

#395 attempts to fix the CPPCheck failures.

Well, that's my PR where I made it pass on my local box but it still fails in the GitHub CI. Hence, #395 is not (yet) the solution. I'll do some more investigations ...

@mattip
Copy link
Contributor

mattip commented Jan 9, 2023

I would strongly suggest not spending the very limited time we have on the cppcheck CI run. Let it fail, and let the rest of CI continue. The problem right now is that failing cppcheck run stops the rest of the CI, which then means we have to fix cppcheck before we can do anything else. By adding fail-fast: false to all the non-essential checks, we can make faster progress.

@fangerer
Copy link
Contributor Author

fangerer commented Jan 9, 2023

Agreed. I'll do that.

@mattip
Copy link
Contributor

mattip commented Jan 10, 2023

Thanks, now CI runs to completion. The failures do not seem to be related to this PR. Are they failing elsewhere as well?

@fangerer
Copy link
Contributor Author

fangerer commented Jan 10, 2023

Thanks, now CI runs to completion. The failures do not seem to be related to this PR. Are they failing elsewhere as well?

Yes, in #392 . I've seen those problems on my local machine (Mac OS 12) as well. I thought that the GitHub upgrade to Mac OS 12 is happening in March. I already started to look into the problem. I probably pin to Mac OS 11 for now.

@fangerer fangerer merged commit e8c7c1d into hpyproject:master Jan 23, 2023
@fangerer fangerer deleted the fa/fixed-width-long_cstring-enc branch January 23, 2023 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use fixed width integers (at least in ABI)
4 participants