Speedup __add_triple_context. #1271

rchateauneu · 2021-03-03T16:40:39Z

Related to #1261.

(cherry picked from commit 85e1cfe)

This eliminates between two and five lookups in the container self.__tripleContexts (depending on the execution) by storing its reference in the variable triple_context = self.__tripleContexts[triple], and reusing it.

(cherry picked from commit 85e1cfe)

coveralls · 2021-03-03T16:57:09Z

Coverage increased (+0.07%) to 75.48% when pulling 8086068 on rchateauneu:memory_only into 9f0f296 on RDFLib:master.

coveralls · 2021-03-03T16:57:10Z

Coverage increased (+0.07%) to 75.48% when pulling 8086068 on rchateauneu:memory_only into 9f0f296 on RDFLib:master.

coveralls · 2021-03-03T16:57:10Z

Coverage increased (+0.07%) to 75.48% when pulling 8086068 on rchateauneu:memory_only into 9f0f296 on RDFLib:master.

coveralls · 2021-03-03T16:57:10Z

Coverage increased (+0.08%) to 75.492% when pulling 53164b2 on rchateauneu:memory_only into 9f0f296 on RDFLib:master.

nicholascar

Passing all tests and seems to make sense to me

white-gecko · 2021-03-05T13:22:28Z

rdflib/plugins/stores/memory.py

-        if self.__tripleContexts[triple] == self.__defaultContexts:
-            del self.__tripleContexts[triple]
+        if triple_context == self.__defaultContexts:
+            del triple_context


Shouldn't this delete the context from self.__tripleContexts? It appears to me, that del triple_context just does nothing.

Yes, it strongly seems you are right.

>>> x={"a","b"} >>> y=x >>> del y`` >>> x`` set(['a', 'b'])`` >>> y`` Traceback (most recent call last):`` File "<stdin>", line 1, in <module>`` NameError: name 'y' is not defined``

It should probably be:

del self.__tripleContexts[triple]

The nasty thing is that it does not appear in the results.

I did the fix in eb5e025

Yes you're right, thanks @white-gecko

white-gecko · 2021-03-05T16:08:20Z

For the wordnet file (cf. #1261) I could see an improvement of 2% for the total time and for some file of mine with just 807682 triples it was 4% faster (the lower table). My setup was not dedicated to performing the test but ran daily things like video conference next to the tests, this explains the high variance. The measurements were executed alternating (once improved, then master, then again improved, …) to provide equal chances.

branch	user	system	cpu	total
improved	244,75s	7,96s	99,00 %	04:14,95
improved	263,58s	3,91s	99,00 %	04:28,22
improved	240,33s	3,85s	99,00 %	04:04,50
improved	226,38s	3,04s	99,00 %	03:49,66
improved	224,06s	2,69s	99,00 %	03:46,95
improved	220,00s	2,74s	99,00 %	03:43,05
improved	213,12s	2,37s	99,00 %	03:35,66

master	266,15s	3,06s	99,00 %	04:29,43
master	291,91s	3,39s	99,00 %	04:55,52
master	241,72s	2,91s	99,00 %	04:04,85
master	222,25s	2,52s	99,00 %	03:45,02
master	208,31s	2,28s	99,00 %	03:30,64
master	225,60s	2,14s	99,00 %	03:47,79
master	218,31s	5,93s	99,00 %	03:44,31

branch	user	system	cpu	total
improved	60,59s	0,96s	99,00 %	01:01,60
improved	60,04s	0,88s	99,00 %	01:00,93

master	64,03s	0,68s	99,00 %	01:04,73
master	62,26s	0,65s	99,00 %	01:03,06

So if the code is corrected, still this is missing: #1271 (comment), it can be merged.

white-gecko · 2021-03-05T16:36:40Z

The tests fail now, I can reproduce this locally, tests pass for 8086068 but fail for eb5e025. Looking at the code eb5e025 should be correct, but … @ashleysommer can you explain us, what is happening?

rchateauneu · 2021-03-06T10:34:22Z

The tests fail now, I can reproduce this locally, tests pass for 8086068 but fail for eb5e025. Looking at the code eb5e025 should be correct, but … @ashleysommer can you explain us, what is happening?

It looks promising performance-wise but if you do not mind, I will restart from scratch to understand where this failures come from. What do you think ? Could we put this PR in draft mode until it is properly tested and understood ?

white-gecko · 2021-03-08T10:57:08Z

I've converted it to draft mode, as requested. But as it seems the test pass now. Raising KeyError instead of IndexError might be part of understanding the situation. So you agree @rchateauneu ?
@nicholascar does your approval still hold? @ashleysommer do you want to take a look?

rchateauneu · 2021-03-08T12:33:48Z

KeyError instead of IndexError

Yes. This could not work.

ashleysommer · 2021-03-09T00:17:56Z

Hi, I'm looking into this now.
On the surface this looks like a great change. And after fixing the KeyError mixup, looks like tests are passing normally.

I did have in mind to try this kind of optimization when I wrote the new MemoryStore implementation, but I didn't get around to it.
Thanks @rchateauneu this is great work.

ashleysommer · 2021-03-09T00:21:09Z

KeyError instead of IndexError

Lately I've been using LookupError it catches both KeyError and IndexError, and means you don't have to think about or guess what kind of structure you're accessing.

rchateauneu · 2021-03-09T17:15:23Z

I have done more tests to check the memory behaviour using tracemalloc. The test script, using english-wordnet-2020.ttl, is attached:
import.py.txt

The result are identical, with about three runs each. Surprisingly, the same code might give slightly different number if the machine has a different load.

NEW VERSION SHA-1: 53164b2168e950364ee704c1ace9bb6cfd7817f5 (branch memory_only)

#1: rdflib/store.py:216: 866369.5 KiB
    self.dispatcher.dispatch(TripleAddedEvent(triple=triple, context=context))
#2: rdflib/term.py:236: 702543.7 KiB
    rt = str.__new__(cls, value)
#3: stores/memory.py:257: 395230.7 KiB
    p = sp[subject] = {}
282 other: 1763466.8 KiB
Total allocated size: 3727610.6 KiB

ORIGINAL SHA-1: dca850e9f649689b73c685148f7f01d7448bf4d8 (branch master)

#1: rdflib/store.py:216: 866369.5 KiB
    self.dispatcher.dispatch(TripleAddedEvent(triple=triple, context=context))
#2: rdflib/term.py:236: 702543.7 KiB
    rt = str.__new__(cls, value)
#3: stores/memory.py:257: 395230.7 KiB
    p = sp[subject] = {}
282 other: 1763466.7 KiB
Total allocated size: 3727610.6 KiB

white-gecko · 2021-03-09T18:34:57Z

Now we are done :-) Thank you.

Speedup __add_triple_context.

8086068

(cherry picked from commit 85e1cfe)

nicholascar requested review from ashleysommer, nicholascar and white-gecko March 5, 2021 11:05

nicholascar approved these changes Mar 5, 2021

View reviewed changes

white-gecko reviewed Mar 5, 2021

View reviewed changes

correctly delete default triple context info

eb5e025

white-gecko approved these changes Mar 5, 2021

View reviewed changes

KeyError instead of IndexError.

53164b2

rchateauneu requested a review from white-gecko March 6, 2021 16:51

white-gecko marked this pull request as draft March 8, 2021 10:52

ashleysommer approved these changes Mar 9, 2021

View reviewed changes

white-gecko approved these changes Mar 9, 2021

View reviewed changes

white-gecko marked this pull request as ready for review March 9, 2021 18:34

white-gecko merged commit 83a6fda into RDFLib:master Mar 9, 2021

white-gecko added this to the rdflib 6.0.0 milestone Mar 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speedup __add_triple_context. #1271

Speedup __add_triple_context. #1271

rchateauneu commented Mar 3, 2021 •

edited by white-gecko

Loading

coveralls commented Mar 3, 2021

coveralls commented Mar 3, 2021

coveralls commented Mar 3, 2021

coveralls commented Mar 3, 2021 •

edited

Loading

nicholascar left a comment

white-gecko Mar 5, 2021

rchateauneu Mar 5, 2021 •

edited

Loading

white-gecko Mar 5, 2021

ashleysommer Mar 9, 2021

white-gecko commented Mar 5, 2021

white-gecko commented Mar 5, 2021

rchateauneu commented Mar 6, 2021

white-gecko commented Mar 8, 2021

rchateauneu commented Mar 8, 2021

ashleysommer commented Mar 9, 2021

ashleysommer commented Mar 9, 2021

rchateauneu commented Mar 9, 2021

white-gecko commented Mar 9, 2021 •

edited

Loading

Speedup __add_triple_context. #1271

Speedup __add_triple_context. #1271

Conversation

rchateauneu commented Mar 3, 2021 • edited by white-gecko Loading

coveralls commented Mar 3, 2021

coveralls commented Mar 3, 2021

coveralls commented Mar 3, 2021

coveralls commented Mar 3, 2021 • edited Loading

nicholascar left a comment

Choose a reason for hiding this comment

white-gecko Mar 5, 2021

Choose a reason for hiding this comment

rchateauneu Mar 5, 2021 • edited Loading

Choose a reason for hiding this comment

white-gecko Mar 5, 2021

Choose a reason for hiding this comment

ashleysommer Mar 9, 2021

Choose a reason for hiding this comment

white-gecko commented Mar 5, 2021

white-gecko commented Mar 5, 2021

rchateauneu commented Mar 6, 2021

white-gecko commented Mar 8, 2021

rchateauneu commented Mar 8, 2021

ashleysommer commented Mar 9, 2021

ashleysommer commented Mar 9, 2021

rchateauneu commented Mar 9, 2021

white-gecko commented Mar 9, 2021 • edited Loading

rchateauneu commented Mar 3, 2021 •

edited by white-gecko

Loading

coveralls commented Mar 3, 2021 •

edited

Loading

rchateauneu Mar 5, 2021 •

edited

Loading

white-gecko commented Mar 9, 2021 •

edited

Loading