Initialization of TagReader #165

silvanheller · 2020-12-10T07:57:28Z

silvanheller
Dec 10, 2020
Collaborator

Currently, the TagReader reads all available tags at every startup and puts them into the cache. For V3C1, this takes approximately 300 ms when cottontail is available locally and you are on a fast node (Purple Nodes), 2 s when cottontail is available locally and you are on a slow node (dmi-vitrivr), and significantly longer when accessing cottontail on another machine (11 seconds for me at home). This is not a bug, but it makes every restart when developing on V3C1 slightly more annoying.
I thought the new Discussions feature of GitHub would be a good place to brainstorm on (a) whether this is even a problem, (b) whether there are any sensible solutions, and (c) whether we want to implement them.

One simple solution would be to cache the available tags cineast-side in a file per db-host and then, at startup, ask cottontail for the current state of the table (e.g. hash / version / last change timestamp - such a feature does not yet exist AFAIK but would maybe be a good addition to cottontail anyway?). If the state matches the cached one, the full read can be skipped.
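To make the idea concrete, here is a minimal sketch in Java. Everything in it is an assumption for illustration: `FileTagCache` is a hypothetical class, tags are simplified to plain strings without line breaks, and `currentVersion` stands in for whatever hash / version / timestamp cottontail might report (which, as said, does not exist yet).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical file-backed tag cache; these names exist in neither cineast nor cottontail. */
class FileTagCache {
  private final Path cacheFile;

  FileTagCache(Path cacheDir, String dbHost) {
    // One cache file per db-host, so switching databases never serves another host's tags.
    this.cacheFile = cacheDir.resolve(dbHost.replaceAll("[^A-Za-z0-9._-]", "_") + ".tags");
  }

  /**
   * @param currentVersion ASSUMED to be reported by cottontail (hash / version / timestamp)
   * @param fullLoad       the existing, expensive "read all tags" path
   */
  List<String> load(String currentVersion, Supplier<List<String>> fullLoad) throws IOException {
    if (Files.exists(cacheFile)) {
      List<String> lines = Files.readAllLines(cacheFile, StandardCharsets.UTF_8);
      // The first line holds the version the cache was built from; the rest are tags.
      if (!lines.isEmpty() && lines.get(0).equals(currentVersion)) {
        return new ArrayList<>(lines.subList(1, lines.size()));
      }
    }
    // Cache missing or stale: fall back to the full read and rewrite the file.
    List<String> tags = fullLoad.get();
    List<String> out = new ArrayList<>();
    out.add(currentVersion);
    out.addAll(tags);
    Files.write(cacheFile, out, StandardCharsets.UTF_8);
    return tags;
  }
}
```

With this shape, a restart against an unchanged table costs one small version request plus a local file read instead of the full tag scan.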

Thoughts @ppanopticon @lucaro @sauterl ?

lucaro · 2020-12-10T08:28:15Z

lucaro
Dec 10, 2020
Maintainer

This is a valid point. While I don't think that this is a large problem now, it is certainly something to keep in mind in the context of the tag search. This caching mechanism is used to speed up the suggestion mechanism for tags when entering them on the UI side. With the large number of tags that are available for the datasets used in the recent past, I'd argue for a more powerful mechanism to specify them on the UI side, since simply suggesting every element that contains a matching substring gets unwieldy if you don't know exactly what you are looking for (and if you do know, the suggestions are unnecessary). The caching mechanism currently also lacks a rebuild trigger, based on the assumption that no new tags will be added during operation. This has usually been the case in past use cases, but is not actually a given.

1 reply

silvanheller Dec 10, 2020
Collaborator Author

The rebuild trigger consideration would be another strong argument for a simple way to ask cottontail for the current version of a table - one can imagine performing that request to cottontail either on every API request or periodically.
In general, I agree that the current way to specify tags is underwhelming, especially since there is no way to see which tags are linked semantically.
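A rough sketch of the periodic rebuild trigger in Java, to show how little would be needed on the cineast side. All names here are hypothetical, and `versionSupplier` stands in for the "current version of a table" request that cottontail does not offer yet.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical rebuild trigger; versionSupplier is an ASSUMED cottontail feature. */
class TagCacheRefresher {
  private final Supplier<String> versionSupplier;
  private final Supplier<List<String>> fullLoad;
  private volatile String knownVersion;
  private volatile List<String> tags;

  TagCacheRefresher(Supplier<String> versionSupplier, Supplier<List<String>> fullLoad) {
    this.versionSupplier = versionSupplier;
    this.fullLoad = fullLoad;
    // Read the version before loading, so a concurrent change is caught by the next check.
    this.knownVersion = versionSupplier.get();
    this.tags = fullLoad.get();
  }

  /** Cheap check: rebuilds only if the reported version changed. Returns true on rebuild. */
  synchronized boolean checkNow() {
    String current = versionSupplier.get();
    if (current.equals(knownVersion)) {
      return false;
    }
    tags = fullLoad.get();
    knownVersion = current;
    return true;
  }

  List<String> tags() {
    return tags;
  }

  /** Polls on a daemon thread; the caller is responsible for shutting the executor down. */
  ScheduledExecutorService start(long periodSeconds) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "tag-cache-refresher");
      t.setDaemon(true);
      return t;
    });
    // Note: if checkNow() throws, the executor cancels all further runs.
    executor.scheduleWithFixedDelay(this::checkNow, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    return executor;
  }
}
```

The per-request variant would simply call `checkNow()` at the start of the tag endpoint instead of on a timer. That trades one extra round trip per request for never serving stale tags.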

sauterl · 2020-12-10T08:46:44Z

sauterl
Dec 10, 2020
Maintainer

Contrary to @lucaro, I would suggest expanding the existing endpoint in the backend (cineast). That way, any UI could benefit from the improved behaviour and thus ultimately be faster.

Also, let's not forget that in the current implementation in the UI, we cannot easily get three-letter tags (e.g. car is almost unobtainable, as suggestions only start to appear after the third letter, and all capitalized suggestions are displayed first).

2 replies

silvanheller Dec 10, 2020
Collaborator Author

I think this is easily fixable, no? The three-letter constraint is arbitrary and comes from performance concerns - but I see no reason why we can’t just send the first n tags to the UI when only 1 or 2 letters have been typed. Additionally, the car tag issue is probably more of a sorting issue on the UI.
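Something along these lines is what I have in mind, as a sketch only. `TagSuggester` is a made-up helper, not the existing endpoint, and the ranking (exact match, then prefix, then substring, with shorter tags first) is just one plausible ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical suggestion helper; not the existing cineast endpoint. */
class TagSuggester {

  /** Returns at most limit (non-negative) tags containing the input, best matches first. */
  static List<String> suggest(List<String> allTags, String input, int limit) {
    String needle = input.toLowerCase(Locale.ROOT);
    List<String> matches = new ArrayList<>();
    for (String tag : allTags) {
      if (tag.toLowerCase(Locale.ROOT).contains(needle)) {
        matches.add(tag);
      }
    }
    // Rank: exact match, then prefix match, then any other substring match.
    // Ties are broken by length and then case-insensitively, so "car" is not
    // buried behind capitalized or longer tags.
    matches.sort(Comparator
        .comparingInt((String tag) -> rank(tag.toLowerCase(Locale.ROOT), needle))
        .thenComparingInt(String::length)
        .thenComparing(String.CASE_INSENSITIVE_ORDER));
    // Capping the result is what keeps 1- and 2-letter inputs cheap to send.
    return new ArrayList<>(matches.subList(0, Math.min(limit, matches.size())));
  }

  private static int rank(String lowerTag, String needle) {
    if (lowerTag.equals(needle)) {
      return 0;
    }
    if (lowerTag.startsWith(needle)) {
      return 1;
    }
    return 2;
  }
}
```

For example, with the tags Carpet, CAR WASH, Oscar, car, Cargo and tree, `suggest(tags, "car", 3)` yields car, Cargo, Carpet, and `suggest(tags, "c", 2)` yields car, Cargo.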

lucaro Dec 10, 2020
Maintainer

I don't think it's a problem of just completing based on substrings. We currently also don't support synonyms and such, which would probably be quite useful in this case.

ppanopticon · 2020-12-10T08:51:37Z

ppanopticon
Dec 10, 2020
Maintainer

I like the idea, and on the Cottontail DB side, this should not be a huge change. We're already tracking metadata such as the last changes to an entity. All we really need is a way to query this information. I guess from a functionality perspective, the only real question is whether this should be implemented as a "normal" query to some kind of special entity (as in most DBMS) or whether we should add dedicated endpoints for this.

1 reply

lucaro Dec 10, 2020
Maintainer

Even though I'm not sure that this is the best approach in this particular case, I think it would be very useful to be able to access metadata and maybe even statistics (insofar as they are collected) on cottontail entities.
