Partial strings: not representable characters #267

Closed
UWN opened this issue Feb 21, 2020 · 19 comments

UWN commented Feb 21, 2020

?- char_code(C,0), partial_string([C],Xs0,Xs).
caught: error(type_error(character,'\x0\'),partial_string/3)

I believe this should succeed, even if Xs0 will not be a partial string.

And if an error is raised at all, it should be a representation error rather than a type error.

UWN changed the title from "Partial string: not representable characters" to "Partial strings: not representable characters" on Feb 21, 2020
UWN (Author) commented Mar 3, 2020

Not sure, I think it should rather succeed with Xs0 = .('\0\',Xs).

UWN (Author) commented May 6, 2020

In the meantime it's:

?- char_code(C,0), partial_string([C],Xs0,[]).
   C = '\x0\', Xs0 = [].

Expected: Xs0 = "\0\" which is the same as Xs0 = ['\0\']

Further:

?- char_code(C,0), partial_string([C,b,c],Xs0,[]).
   C = '\x0\', Xs0 = "bc".

Expected: Xs0 = "\0\bc"

UWN (Author) commented May 6, 2020

@mthom: [bug], since the following must be true for every Xs that is a list of chars:

partial_string(Xs, Xs0, []), Xs == Xs0.

Counterexample: Xs = "a\0\b".

UWN (Author) commented May 6, 2020

But also

?- Xs = "a\0\b".
   Xs = "ab".

is incorrect.

ghost commented May 18, 2020

@mthom , what is a terminator for PartialString?

mthom (Owner) commented May 18, 2020

I thought it was '\0'. The 0-byte.

ghost commented May 18, 2020

But "a\x0\b" contains '\0'. Wouldn't that cause an issue?

mthom (Owner) commented May 18, 2020

That's what I thought! I'm not sure how to reconcile the two.

ghost commented May 19, 2020

Not sure it is possible; C strings have the same issue. The terminator is the problem. Right now:

pstr.append_chars("\x00ab") == None

Why use a terminator?

mthom (Owner) commented May 19, 2020

I think the general idea is that \0 still causes partial strings to split, as it originally did. "a\x0\b" will write "a\x0" to the first string/segment, and "\x0b\x0" to the second string, i.e. the first string's tail. See #95 for how the terminators are intended to work.

The catch is that no partial string (at least, none represented by HeapCellValue::PartialString) can be entirely empty, i.e. each must contain at least one character. We have [] for empty strings.

> Why use a terminator?

Eventually strings will be stored directly in the heap. The WAM needs to know when they terminate by scanning them since the length of the string won't be stored anywhere. That's not currently the case. Strings are currently stored to dedicated buffers pointed to from within the heap, and are deallocated via Rust's RAII.
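
The splitting described in this comment can be sketched in Rust as follows. This is only an illustration of the stated idea, not Scryer Prolog's actual code: `split_segments` and `encode_segment` are hypothetical names, and the sketch assumes that a NUL character starts a new segment whenever the current segment already holds at least one character, so that no segment is ever empty.

```rust
/// Split `text` into non-empty segments such that a NUL character can only
/// appear as the first character of a segment (hypothetical helper).
fn split_segments(text: &str) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    for c in text.chars() {
        // A NUL after at least one character closes the current segment
        // and becomes the first character of the next one.
        if c == '\0' && !current.is_empty() {
            segments.push(std::mem::take(&mut current));
        }
        current.push(c);
    }
    if !current.is_empty() {
        segments.push(current);
    }
    segments
}

/// Lay a segment out as its UTF-8 bytes followed by one 0-byte terminator
/// (hypothetical helper).
fn encode_segment(segment: &str) -> Vec<u8> {
    let mut bytes = segment.as_bytes().to_vec();
    bytes.push(0);
    bytes
}
```

Under these assumptions, `split_segments("a\0b")` yields the two segments `"a"` and `"\0b"`, which encode to the bytes `a\0` and `\0b\0`, matching the layout described above. An empty input yields no segments at all, which corresponds to using [] for empty strings.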

ghost commented May 19, 2020

This comment seems to state that it isn't possible.

If the length can't be stored, then the terminator is required, but it doesn't seem possible to distinguish a terminator from a '\0' character.

When allocate_pstr is called with "\x00ab", is the allocation done for "\x0\ab"?
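
The ambiguity raised here is the familiar C-string one, and a small Rust sketch makes it concrete. `read_c_style` is a hypothetical function written for this illustration, not part of Scryer Prolog; it assumes a buffer with no stored length that is read up to the first 0 byte.

```rust
/// Read bytes from `buf` up to (not including) the first 0 byte, the way a
/// C-style string is read when no length is stored (hypothetical helper).
/// If no 0 byte is present, the whole buffer is returned.
fn read_c_style(buf: &[u8]) -> &[u8] {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    &buf[..end]
}
```

With this reader, the bytes `a\0b\0` (the text "a", NUL, "b" plus a terminator) read back as just `a`, and `\0ab\0` reads back as an empty string: an embedded NUL cannot be told apart from the terminator.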

mthom (Owner) commented May 19, 2020

It's written to a second string, the tail of "\x0".

ghost commented May 19, 2020

write_pstr is returning None for "\x00\x00". Will do some tests later.

mthom (Owner) commented May 19, 2020

OK, I have it done, according to the above interpretation. That is, a '\x0\' can occur as the first character of a partial string segment, where it will be interpreted as just another character. Anywhere else, it will be interpreted as a null terminator. This is to say that no partial string segment may be empty. This query still succeeds as expected however:

?- partial_string("", Xs, Xs0).
   Xs = Xs0.

I will commit the change and we can hopefully close the issue.
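
A reader that follows this interpretation might look like the Rust sketch below. It is an illustration only, not the committed change: `read_segment` and `decode_all` are hypothetical names, and the sketch assumes, purely for simplicity, that the segments are laid out back to back in one byte buffer, each followed by a 0-byte terminator.

```rust
/// Read one segment starting at `start`. The first byte is always taken as
/// data, even if it is 0; the segment ends at the next 0 byte.
/// Returns the segment bytes and the offset just past its terminator, or
/// `None` if `start` is out of range or no terminator follows.
fn read_segment(buf: &[u8], start: usize) -> Option<(&[u8], usize)> {
    if start >= buf.len() {
        return None;
    }
    // Skip the first byte when searching for the terminator.
    let rest = &buf[start + 1..];
    let rel = rest.iter().position(|&b| b == 0)?;
    let end = start + 1 + rel;
    Some((&buf[start..end], end + 1))
}

/// Concatenate all segments found in `buf` (hypothetical helper).
fn decode_all(buf: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut pos = 0;
    while let Some((segment, next)) = read_segment(buf, pos) {
        out.extend_from_slice(segment);
        pos = next;
    }
    out
}
```

Under these assumptions, the buffer `a\0` followed by `\0b\0` decodes back to the three bytes `a`, NUL, `b`, so the embedded NUL survives the round trip, and an empty buffer decodes to an empty string.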

mthom added a commit that referenced this issue May 19, 2020
triska (Contributor) commented May 19, 2020

The test cases work perfectly now, thank you a lot!

triska (Contributor) commented May 19, 2020

However, I now incorrectly get:

?- Ls = "\x2124\".
   Ls = "\2124\".

Expected answer: Ls = "\x2124\". Note the x.

mthom (Owner) commented May 19, 2020

'\x0' instead of '\0' is acceptable, right?

triska (Contributor) commented May 19, 2020

I suppose you mean '\x0\' and '\0\', i.e., with the trailing \?

Yes, absolutely!

As I wanted to try the equivalence with GNU Prolog, I got:

| ?- X = '\0\'.
uncaught exception: error(syntax_error('user_input:9 (char:34) invalid character code in \\constant\\ sequence'),read_term/3)

So, this seems to be a shortcoming in GNU Prolog...

Definitely '\x0\' is acceptable to denote the character with code 0.

UWN (Author) commented Jul 12, 2020

> So, this seems to be a shortcoming in GNU Prolog...

Not sure what you mean by shortcoming, but there is no requirement in ISO/IEC 13211-1 that '\0\' is a character of the processor character set (PCS, 6.5). Including '\0\' means that the implementation-defined PCS contains an extended character. In GNU Prolog it is not part of the PCS, thus either syntax errors or representation errors occur:

| ?- char_code(C,0).
uncaught exception: error(representation_error(character_code),char_code/2)

UWN closed this as completed Jul 30, 2020