Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial strings: #95

Closed
UWN opened this issue Apr 7, 2019 · 31 comments
Closed

Partial strings: #95

UWN opened this issue Apr 7, 2019 · 31 comments

Comments

@UWN
Copy link

UWN commented Apr 7, 2019

?- X = [a,b,c|_], X = [a,b,c|Y].
X = [a, b, c | _1], Y = _1.                             %%% OK
?- partial_string("abc",X), X = [a,b,c|Y].
X = [a, b, c | _], Y = _.                                 %%% ???

Why this difference?

@mthom
Copy link
Owner

mthom commented Apr 7, 2019

In the second instance, Y points to the end of a singly allocated string buffer, which is not part of the heap, so there is no heap cell to refer to.

@UWN
Copy link
Author

UWN commented Apr 7, 2019

That does not make sense to me. The end must be a real variable.

@UWN
Copy link
Author

UWN commented Apr 7, 2019

On the heap, Xs0 = [a,b,c|Xs], Xs0 = [_|Xs1] would look like (byte-by-byte)"

a <---- Xs0 points here
b <---- Xs1 points here
c
\0\ % zero to terminate the actual string
\0\ % further zeroes for padding, if needed
Xs_b1 % four-byte address on a 32bit system (eight on a 64)
Xs_b2
Xs_b3
Xs_b4

@UWN
Copy link
Author

UWN commented Apr 7, 2019

See #24

@UWN
Copy link
Author

UWN commented Apr 8, 2019

chars_partialstring(Chs, Cs0,Cs) :-
    must_be(chars, Chs),
    can_be(chars, Cs0),
    can_be(chars, Cs),
   /* now the hard part */

@UWN
Copy link
Author

UWN commented Apr 14, 2019

Currently I get:

?- partial_string(_,_).
error: exception thrown: error(syntax_error(inadmissible_query_term), repl/0)

But this is not a syntax error. And further, "inadmissible_query_term" is what is found and not what is expected. See N226 for a lengthy analysis of what errors are about. TL;DR: They signal primarily what is expected and not what is found. In the long run, you will find yourself going through that table of error classification quite frequently.

An existence_error or a permission_error is preferable in this situation. OK, you can also signal a system_error.

@UWN
Copy link
Author

UWN commented Apr 16, 2019

Do you think you will be able to continue work on partial strings?

@mthom
Copy link
Owner

mthom commented Apr 16, 2019

I do. Things are sluggish right now because I'm preparing to leave in two weeks.

@UWN
Copy link
Author

UWN commented May 5, 2019

How do you intend to proceed? It seems in spite of this, you have still these _-variables. There are no such variables in Prolog. The underscore is just a unique variable name. But the variable itself has an identity

@mthom
Copy link
Owner

mthom commented May 5, 2019

I will change the _-variables to something else, probably using a specialized convention to denote that the variable points to the end of a character buffer.

@UWN
Copy link
Author

UWN commented Oct 8, 2019

Here is a related problem:

?- partial_string("abc", L), length(L, N), N = 4.
false.

Expected: success.

@UWN
Copy link
Author

UWN commented Oct 8, 2019

Yet,

?- partial_string("abc", L), L = [_,_,_,d|_],length(L,N), N = 4.
L = [a,b,c,d|_], N = 4, _180 = a, _189 = b, _192 = c, _196 = _.

@mthom
Copy link
Owner

mthom commented Oct 13, 2019

I don't understand why the partial string "abc" should have 4 elements.

@UWN
Copy link
Author

UWN commented Oct 14, 2019

Because partial_string("abc",L) should be equivalent (but not internally equivalent) to L = [a,b,c|_]. And now,

?- L = [a,b,c|_], length(L, N), N = 4.

succeeds as well.

@UWN
Copy link
Author

UWN commented Oct 14, 2019

In other words:

?- partial_string("abc",L), length(L, N).
L = [a,b,c|_], N = 3.
% Expected: More answers
?- L = [a,b,c|_], length(L, N).
L = [a,b,c], N = 3, _181 = [] ;
L = [a,b,c,_10], N = 4, _181 = [_10] ;
L = [a,b,c,_10,_14], N = 5, _181 = [_10,_14] ;
L = [a,b,c,_10,_14,_18], N = 6, _181 = [_10,_14,_18] ...

( In any case: very nice ... )

@UWN
Copy link
Author

UWN commented Oct 14, 2019

Alternatively:

ulrich@p0:~$ export RUST_BACKTRACE=1
ulrich@p0:~$ echo $RUST_BACKTRACE
1
ulrich@p0:~$ /opt/gupu/scryer-prolog/target/debug/scryer-prolog 
?- [user].
l([],0).
l([_|L], s(X)) :- l(L,X).
end_of_file.
?- partial_string("abc",L),l(L,N).
thread 'main' panicked at 'index out of bounds: the len is 18 but the index is 18', libcore/slice/mod.rs:2448:10
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::panicking::panic_bounds_check
   9: <usize as core::slice::SliceIndex<[T]>>::index
             at libcore/slice/mod.rs:2448
  10: core::slice::<impl core::ops::index::Index<I> for [T]>::index
             at libcore/slice/mod.rs:2316
  11: <alloc::vec::Vec<T> as core::ops::index::Index<I>>::index
             at liballoc/vec.rs:1653
  12: <scryer_prolog::prolog::machine::heap::Heap as core::ops::index::Index<usize>>::index
             at src/prolog/machine/heap.rs:83
  13: scryer_prolog::prolog::machine::machine_state_impl::<impl scryer_prolog::prolog::machine::machine_state::MachineState>::execute_fact_instr
             at src/prolog/machine/machine_state_impl.rs:1745
  14: scryer_prolog::prolog::machine::<impl scryer_prolog::prolog::machine::machine_state::MachineState>::dispatch_instr
             at src/prolog/machine/mod.rs:705
  15: scryer_prolog::prolog::machine::<impl scryer_prolog::prolog::machine::machine_state::MachineState>::execute_instr
             at src/prolog/machine/mod.rs:727
  16: scryer_prolog::prolog::machine::<impl scryer_prolog::prolog::machine::machine_state::MachineState>::query_stepper
             at src/prolog/machine/mod.rs:819
  17: scryer_prolog::prolog::machine::Machine::run_query
             at src/prolog/machine/mod.rs:546
  18: scryer_prolog::prolog::machine::Machine::submit_query
             at src/prolog/machine/mod.rs:317
  19: scryer_prolog::prolog::machine::compile::compile_term
             at src/prolog/machine/compile.rs:201
  20: scryer_prolog::prolog::machine::Machine::handle_toplevel_command
             at src/prolog/machine/mod.rs:395
  21: scryer_prolog::prolog::machine::Machine::run_query
             at src/prolog/machine/mod.rs:552
  22: scryer_prolog::prolog::machine::Machine::run_toplevel
             at src/prolog/machine/mod.rs:212
  23: scryer_prolog::main
             at src/main.rs:28
  24: std::rt::lang_start::{{closure}}
             at libstd/rt.rs:74
  25: std::panicking::try::do_call
  26: __rust_maybe_catch_panic
  27: std::rt::lang_start_internal
  28: std::rt::lang_start
             at libstd/rt.rs:74
  29: main
  30: __libc_start_main
  31: <unknown>
ulrich@p0:~$ 

@UWN
Copy link
Author

UWN commented Oct 14, 2019

@triska, can you reproduce this?

@triska
Copy link
Contributor

triska commented Oct 15, 2019

Yes, I can reproduce this, but unfortunately I cannot help with this issue!

@mthom
Copy link
Owner

mthom commented Oct 15, 2019

I will have a look tomorrow.

@mthom
Copy link
Owner

mthom commented Oct 15, 2019

In other words:

?- partial_string("abc",L), length(L, N).
L = [a,b,c|], N = 3.
% Expected: More answers
?- L = [a,b,c|
], length(L, N).
L = [a,b,c], N = 3, _181 = [] ;
L = [a,b,c,_10], N = 4, _181 = [_10] ;
L = [a,b,c,_10,_14], N = 5, _181 = [_10,_14] ;
L = [a,b,c,_10,_14,_18], N = 6, _181 = [_10,_14,_18] ...

( In any case: very nice ... )

I'm not sure how to make sense of that. Internally, a partial string is a buffer of UTF-8 characters. If the buffer is expanded to accommodate new characters, how would that work with respect to unification? The variables beyond c could only be characters, supposing that they are indices into the partial string (or character codes if double_quotes is set to codes). Unless the partial string is continued as an ordinary list somehow, but then, it ceases to be a partial string, doesn't it?

@UWN
Copy link
Author

UWN commented Oct 16, 2019

What about talking about it?

@mthom
Copy link
Owner

mthom commented Oct 16, 2019

Sure, we can work out a time over email.

@mthom
Copy link
Owner

mthom commented Dec 2, 2019

I'm about ready to take a second crack at this from Unsafe Rust. You gave this example layout a few months back:

a <---- Xs0 points here
b <---- Xs1 points here
c
\0\ % zero to terminate the actual string
\0\ % further zeroes for padding, if needed
Xs_b1 % four-byte address on a 32bit system (eight on a 64)
Xs_b2
Xs_b3
Xs_b4

To be sure I understand the example properly, the final four/eight bytes are self-referential until bound, correct? This is how unbound variables are implemented elsewhere.

@UWN
Copy link
Author

UWN commented Dec 2, 2019

Yes. Let's consider a built_in chars_partialstring(Chars, Xs0, Xs):

?- partial_string([a,b,c], Xs0, Xs).

creates above term. Please keep in mind that

?- partial_string([a,b,c], Xs0, []).

might now have just the code for [ ] at the place of Xs_b1 ... Xs_b4/8

@mthom
Copy link
Owner

mthom commented Dec 2, 2019

Alright. And the pointer at Xs could also point into the WAM heap or stack?

@UWN
Copy link
Author

UWN commented Dec 3, 2019

Xs in this example resides on the heap and therefore can only point to the heap. It can never point to the stack, since only stackelements (local variables) can point to other stackelements.

edit
should have read: since only stackelements and (argument and temporary) registers can point to other stackelements.

And thus, strictly speaking, also choicepoints point to other stackelements

@UWN
Copy link
Author

UWN commented Dec 3, 2019

Here is an extreme example, "ab\0\d". This is a four-element list with '\0\' at the third position. How can we represent zero within a zero-terminated UTF-string? Well, we can't!

?- chars_partiallist("ab\0\de", Xs0,Xs).

And these are the resulting terms on the heap of a 64 bit machine:

0'a <------------ Xs0
0'b
0'\0\ to terminate the string
0'\0\ up to 7 for padding, if needed
[8 bytes list pointer to next cell]
[8 bytes for atom '\0\'] this is the car of the list
[8 bytes partial list pointer to next byte] this is the cdr of the list
0'd
0'e
0'\0\ to terminate the string
0'\0\ up to 7 for padding, if needed
[8 bytes self-referential variable] <------- Xs

In another notation we have:

    Xs0 = [a,b|_I1],            % partial string
    _I1 = '.'('\0\',_I2),       % normal dotted pair
    _I2 = [d,e|Xs]              % partial string

It should be noted that zeroes in strings are extremely rare, so it is not worth doing any extra optimization for them.

@UWN
Copy link
Author

UWN commented Jan 6, 2020

Use case

@triska
Copy link
Contributor

triska commented Jan 6, 2020

Another request that calls for this feature:

https://news.ycombinator.com/item?id=21937272

@mthom
Copy link
Owner

mthom commented Feb 8, 2020

I suppose if we have an example a bit more general than your last:

?- chars_partiallist([a,b,X,d,e], Xs0,Xs).

with the uninstantiated X in place of the null character, something similar would occur in the heap:

0'a <------------ Xs0
0'b
0'\0\ to terminate the string
0'\0\ up to 7 for padding, if needed
[8 bytes list pointer to next cell]
[8 byte self-referential pointer to X] this is the car of the list
[8 bytes partial list pointer to next byte] this is the cdr of the list
0'd
0'e
0'\0\ to terminate the string
0'\0\ up to 7 for padding, if needed
[8 bytes self-referential variable] <------- Xs

@UWN
Copy link
Author

UWN commented Feb 8, 2020

See #95 (comment)
The first argument must be a list of characters. And a variable there will thus produce an instantiation error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants