
Text Index: all text queries fail on tiny test data #1404

Open
aindlq opened this issue Jul 17, 2024 · 2 comments

Comments

@aindlq

aindlq commented Jul 17, 2024

I've been trying to better understand how the current text index works by using a very small test dataset, but all queries fail when the data is this simple.

query:

SELECT ?subject ?text WHERE {
  ?text <http://qlever.cs.uni-freiburg.de/builtin-functions/contains-entity> ?subject .
  ?text <http://qlever.cs.uni-freiburg.de/builtin-functions/contains-word> "madon*" .
  ?subject a <http://www.cidoc-crm.org/cidoc-crm/E53_Place> .
}

error message:

Assertion nofBytes > 0 failed. Please report this to the developers. In file "/app/src/index/IndexImpl.Text.cpp " at line 895

test.wordsfile.tsv:

<http://example.com/thuringen>	1	1	1
madonna	0	1	1
suffragio	0	1	1
heiligen	0	1	1
franziskus	0	1	1
klara	0	1	1
ludwig	0	1	1
frankreich	0	1	1
elisabeth	0	1	1

test.docsfile.tsv:

1	Madonna del Suffragio mit den heiligen Franziskus und Klara, Ludwig von Frankreich und Elisabeth

test.nt:

<http://example.com/thuringen> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.cidoc-crm.org/cidoc-crm/E53_Place> .
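For anyone trying to reproduce this, the three input files can be recreated with the Python sketch below. The rows are copied verbatim from the report. The function name `write_test_files` is hypothetical, and reading the wordsfile columns as word, entity flag, record id and score is an assumption about QLever's input format rather than something stated in this issue.

```python
from pathlib import Path

# Rows of test.wordsfile.tsv, copied from the report.
# Assumed column meaning: word, is-entity flag (1/0), record id, score.
WORDS = [
    ("<http://example.com/thuringen>", 1, 1, 1),
    ("madonna", 0, 1, 1),
    ("suffragio", 0, 1, 1),
    ("heiligen", 0, 1, 1),
    ("franziskus", 0, 1, 1),
    ("klara", 0, 1, 1),
    ("ludwig", 0, 1, 1),
    ("frankreich", 0, 1, 1),
    ("elisabeth", 0, 1, 1),
]

# Rows of test.docsfile.tsv: record id, full text.
DOCS = [
    (1, "Madonna del Suffragio mit den heiligen Franziskus und Klara, "
        "Ludwig von Frankreich und Elisabeth"),
]

# Lines of test.nt.
TRIPLES = [
    "<http://example.com/thuringen> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://www.cidoc-crm.org/cidoc-crm/E53_Place> .",
]


def write_test_files(directory: Path) -> None:
    """Write the three reproduction files into `directory` (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / "test.wordsfile.tsv", "w", encoding="utf-8") as f:
        for row in WORDS:
            # Tab-separated, one posting per line.
            f.write("\t".join(str(field) for field in row) + "\n")
    with open(directory / "test.docsfile.tsv", "w", encoding="utf-8") as f:
        for record_id, text in DOCS:
            f.write(f"{record_id}\t{text}\n")
    with open(directory / "test.nt", "w", encoding="utf-8") as f:
        for triple in TRIPLES:
            f.write(triple + "\n")
```

Calling `write_test_files(Path("."))` writes the files into the current directory, ready to be passed to the QLever index builder.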

@azaroth42

azaroth42 commented Jul 22, 2024

I get this error also:

PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX lux: <https://lux.collections.yale.edu/ns/>
SELECT DISTINCT ?what ?txt ?atxt WHERE {
  ?what a crm:E22_Human-Made_Object ; lux:primaryName ?txt ; lux:agentOfProduction ?artist .
  ?artist lux:primaryName ?atxt .
  ?tt ql:contains-entity ?atxt ; ql:contains-word "van gogh" .
  ?t ql:contains-entity ?txt ; ql:contains-word "nuit" .
}

Error: Assertion nofBytes > 0 failed. Please report this to the developers. In file "/app/src/index/IndexImpl.Text.cpp " at line 848

But if I change "van" to "vincent", it works as expected.

Update: For me, any token with three or fewer characters causes the exception, so "de nuit" fails but "night nuit" works.
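The observation above can be turned into a quick pre-check for query phrases before sending them. This Python sketch only encodes that empirical three-character cutoff; the helper name `short_tokens` is hypothetical, and it does not explain the failure in the original report, whose only search term, `madon*`, is longer than three characters.

```python
def short_tokens(phrase: str, max_len: int = 3) -> list[str]:
    """Return the whitespace-separated tokens of a contains-word phrase
    that have at most `max_len` characters.

    The default of 3 is the empirical cutoff reported in this thread,
    not a documented limit of the text index.
    """
    return [tok for tok in phrase.split() if len(tok) <= max_len]


# "van gogh" -> ["van"]  (reported to fail)
# "night nuit" -> []     (reported to work)
```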

@joka921
Member

joka921 commented Jul 26, 2024

Thanks for reporting all the issues with the Text Index.

It is one of the features that has not been under active development in recent years. It is good that you show interest in it, so we can prioritize it.
I have a plan for a complete rewrite of the text index, which should mitigate most of the current limitations, but I think support for Named Graphs (which people are also asking for) has higher priority.
