
Optimization for adding new vertex and updating vertex properties #4292

Merged 1 commit on Mar 8, 2024

Conversation

ntisseyre (Contributor)

  • Improving operations to detect vertex changes during transaction commit from O(N) to O(1);
  • Optimizing property search by key for a newly created vertex;
  • Optimizing previous property/edge search during vertex update;
  • Adding logging for large transactions to track the progress and time spent on each step.

With this change we have observed a significant improvement for large transaction commits, especially when many new vertices have been created.
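The O(N) to O(1) change can be illustrated with a generic sketch. All names here are hypothetical stand-ins, not JanusGraph's actual classes: the idea is replacing a linear scan over added relations with a hash-indexed lookup keyed by vertex id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of detecting whether a vertex was mutated in the
// current transaction; relations are modeled as long[] {vertexId, relationId}.
class TxChangeIndex {
    // O(N) approach: scan every added relation on each check.
    static boolean wasMutatedLinear(List<long[]> addedRelations, long vertexId) {
        for (long[] rel : addedRelations) {
            if (rel[0] == vertexId) return true;
        }
        return false;
    }

    // O(1) approach: maintain a per-vertex index alongside the list.
    private final Map<Long, List<long[]>> byVertex = new HashMap<>();

    void add(long[] rel) {
        byVertex.computeIfAbsent(rel[0], k -> new ArrayList<>()).add(rel);
    }

    boolean wasMutated(long vertexId) {
        return byVertex.containsKey(vertexId); // constant-time lookup
    }
}
```

The trade-off is extra memory for the index map, which pays off when the same vertex is checked repeatedly during commit.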

Member

@porunov porunov left a comment

Thank you @ntisseyre ! This is a great contribution!
I didn't have enough time to check the whole PR yet, but I will be reviewing it this week and adding comments if I have any.
Below are my comments so far.

Comment on lines 700 to 742
long startTimeStamp = System.currentTimeMillis();
int totalMutations = addedRelations.size() + deletedRelations.size();
boolean isBigDataSet = totalMutations >= backend.getBufferSize();
if (isBigDataSet) {
    log.info("0. Prepare commit for mutations count=" + totalMutations);
}

if (isBigDataSet) {
    log.info("1. Collect deleted edges and their index updates and acquire edge locks");
}
prepareCommitDeletes(deletedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);

if (isBigDataSet) {
    log.info("2. Collect added edges and their index updates and acquire edge locks");
}
prepareCommitAdditions(addedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);

if (isBigDataSet) {
    log.info("3. Collect all index update for vertices");
}
prepareCommitVertexIndexUpdates(mutatedProperties, indexUpdates);

if (isBigDataSet) {
    log.info("4. Acquire index locks (deletions first)");
}
prepareCommitAcquireIndexLocks(indexUpdates, mutator, acquireLocks);

if (isBigDataSet) {
    log.info("5. Add relation mutations");
}
prepareCommitAddRelationMutations(mutations, mutator, tx);

if (isBigDataSet) {
    log.info("6. Add index updates");
}
boolean has2iMods = prepareCommitIndexUpdatesAndCheckIfAnyMixedIndexUsed(indexUpdates, mutator);

if (isBigDataSet) {
    long duration = System.currentTimeMillis() - startTimeStamp;
    log.info("7. Prepare commit is done with mutated vertex count=" + mutations.size() + " in duration=" + duration);
}
Member

I have several thoughts about it and I will split them in sections.

Thought 1:

I would rather not emit such processing logs at INFO level, as it could be quite noisy. On the other hand, users can configure the logging level for the StandardJanusGraph class and raise it to WARN if they don't want to see such info logs.
Perhaps it would be better to keep these logs at debug level instead of info level. Moreover, we may even move them to their own logger, so that users can optionally increase or decrease the logging level for this specific logger without interfering with the main log of this class. For example, you could create the following logger and use it instead of log:

private static final Logger logForPrepareCommit =
    LoggerFactory.getLogger(StandardJanusGraph.class.getName()+".prepareCommit");

I would still prefer keeping these logs at debug level, because users without experience in configuring different logging levels for different logger names via Log4j configuration may find it complicated to disable this log while keeping other info logs present.
If a user wants to keep this debug log enabled while using only info level for all other logs, they can simply configure the log level as INFO for the root logger and DEBUG for the logger org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit.
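As a concrete illustration of that setup, a Log4j 2 properties fragment like the following keeps the root logger at INFO while enabling DEBUG only for the dedicated prepareCommit logger (this is a sketch; it assumes a console appender named Console is defined elsewhere in the configuration):

```properties
# Root stays at INFO for all regular JanusGraph logs
rootLogger.level = INFO
rootLogger.appenderRef.stdout.ref = Console

# DEBUG only for the dedicated prepareCommit logger
logger.prepareCommit.name = org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit
logger.prepareCommit.level = DEBUG
```

The same split can be expressed in XML or YAML configuration; the key point is that the child logger name inherits from the class logger, so it can be tuned independently.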

Thought 2:

Generally, having logs instead of metrics can be convenient for analysing a specific chain of events. However, it requires retrieving logs in a specific structure, configuring the logger to include the thread id for each log line (otherwise it won't be obvious which log relates to which call), and then feeding them to a log-processing system (e.g. Elasticsearch/Kibana or DataDog) configured to properly monitor those logs. This is fairly complex, and when it is done, the recommended approach is to use structured logging and/or logging tags.

So, in this example I think I would prefer moving the values out of the message and into structured log parameters.
I.e. instead of the following log

log.info("7. Prepare commit is done with mutated vertex count=" + mutations.size() + " in duration=" + duration);

I would do something like:

MDC.put("duration", String.valueOf(duration));
MDC.put("mutationsSize", String.valueOf(mutations.size()));
log.info("7. Prepare commit is done");
MDC.remove("duration");
MDC.remove("mutationsSize");
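One caveat with the MDC pattern above: if anything throws between put and remove, the keys leak into unrelated log lines on the same thread, so cleanup belongs in a finally block. The sketch below uses a small map-backed stand-in for org.slf4j.MDC so it is self-contained; in real code the same try/finally shape would wrap MDC.put/MDC.remove.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.slf4j.MDC, just to demonstrate the cleanup pattern.
class DiagnosticContext {
    private static final Map<String, String> CTX = new HashMap<>();
    static void put(String k, String v) { CTX.put(k, v); }
    static void remove(String k) { CTX.remove(k); }
    static String get(String k) { return CTX.get(k); }

    // Populate context, log, and always clean up, even if logging throws.
    static void logPrepareCommitDone(long duration, int mutationsSize) {
        put("duration", String.valueOf(duration));
        put("mutationsSize", String.valueOf(mutationsSize));
        try {
            // log.info("7. Prepare commit is done");
        } finally {
            remove("duration");
            remove("mutationsSize");
        }
    }
}
```

With the real SLF4J MDC the context is thread-local, so leaked keys would only pollute logs on the committing thread, but the try/finally discipline is the same.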

Thought 3:

Generally I prefer analyzing hot spots in the code via OpenTracing spans. In that case we could enable profiling for some time and see the whole chain of code execution and how much time we spent at each place for each call. Profiling is usually more resource hungry, so typically it makes sense to enable it short term when we suspect something isn't right. I didn't check if it works with the current Dropwizard version (I have only used it with Micrometer), but it seems it should work as described here:
https://github.com/opentracing-contrib/java-dropwizard/blob/master/dropwizard-opentracing/README.rst

I'm not asking to integrate profiling in this PR; I just wanted to leave my thoughts in case we want to integrate profiling into JanusGraph later. I assume it may help us in the long run to see hot parts of JanusGraph and fix them as we go.

Thought 4:

Collecting metrics via logs works to represent a chain of execution, but it's not that useful for analysis over a longer period of time (unless metrics are derived by log-processing systems, which are basically normal metrics).
As such, if the purpose of these logs is to monitor the behavior of prepareCommit over an extended period of time, then normal metrics might work better. Of course, logs can be useful as well; maybe big data sets are simply uncommon here, and thus logging such events makes sense to trigger necessary actions after a big dataset is detected. However, if big dataset commits are a typical and expected flow and the only goal is to monitor performance over an extended period, then metrics may suit better. Again, it could even be a combination of logs + metrics if necessary.

JanusGraph has a default MetricManager.INSTANCE which provides some utility methods and a reference to the main Dropwizard MetricRegistry. It can be reused to publish metrics to different systems (for example, it can be used to make JMX reports).

In this situation, if we want to collect the number of mutations, the timing of the method execution and of each step, as well as the average time per single mutation, then something like this could be used (didn't test it):

final Timer.Context timeContext = MetricManager.INSTANCE.getTimer(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "time").time();
try {
    try (final Timer.Context tc = MetricManager.INSTANCE.getTimer(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "deletes", "time").time()) {
        // 1. Collect deleted edges and their index updates and acquire edge locks
        prepareCommitDeletes(deletedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);
    }
    try (final Timer.Context tc = MetricManager.INSTANCE.getTimer(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "additions", "time").time()) {
        // 2. Collect added edges and their index updates and acquire edge locks
        prepareCommitAdditions(addedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);
    }
    // etc...
} finally {
    long elapsedTime = timeContext.stop(); // This returns the elapsed time in nanoseconds and tracks how much in total the processing took.
    if (!mutations.isEmpty()) { // prevent division by 0
        double averageTimePerElement = (double) elapsedTime / mutations.size(); // average processing time per single mutation
        MetricManager.INSTANCE.getCounter(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "mutations", "count").inc(mutations.size()); // Can be useful to track total calls for a specific time window.
        MetricManager.INSTANCE.getHistogram(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "mutations", "histogram").update(mutations.size()); // Probably optional, but may be useful to see statistical distribution of mutation amount per `prepareCommit` call
        MetricManager.INSTANCE.getHistogram(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "mutation", "avg", "time").update((long) averageTimePerElement); // Track statistical distribution of average processing time per single mutation.
    }
}

Conclusion:

Depending on the purpose of these logs, we may choose different approaches to improve the user experience. I added my thoughts on some solutions I see, but I welcome other solutions as well.
Also, it's not necessary to resolve all the issues in one PR. We can always move outstanding work to a separate issue or issues and complete it later.

Contributor Author

Thank you for your response!
Initially, I was using these log lines to track which step was taking too long for a big transaction commit.
This log is not very noisy, since it is only enabled for large transactions, and it was used for analytics to understand the bottlenecks in the prepareCommit functionality.

So, I agree that this log is helpful only for some time, and we can move it to the DEBUG level. I also like the idea of

private static final Logger logForPrepareCommit =
    LoggerFactory.getLogger(StandardJanusGraph.class.getName()+".prepareCommit");

+1 on adding tracing to the logs. I wanted to propose it in different PRs. In our applications we use Zipkin tracing a lot; I have never used OpenTracing before. I will be happy to investigate this question further.

Member

+1 on adding tracing to the logs. I wanted to propose it in different PRs. In our applications we use Zipkin tracing a lot; I have never used OpenTracing before. I will be happy to investigate this question further.

Definitely. It's not necessary to introduce tracing in this PR. I think it will be better suited to a separate PR or PRs.
For me it's the opposite: I have used OpenTracing but not Zipkin. I don't have a strong opinion on which one to use for tracing, so I'm good with any viable solution.

I believe here it would be good to simply switch to a separate logger (Logger logForPrepareCommit) and change it from info to debug. We can then open separate issues to track tracing or metrics improvements.

@porunov (Member)

porunov commented Mar 5, 2024

BTW, the failing jobs are only because of the linter complaining about unused imports in some classes:

Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/relations/CacheVertexProperty.java:29:8: Unused import - java.util.ArrayList. [UnusedImports]
Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/relations/CacheVertexProperty.java:30:8: Unused import - java.util.List. [UnusedImports]
Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/relations/CacheEdge.java:32:8: Unused import - java.util.ArrayList. [UnusedImports]
Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/relations/CacheEdge.java:33:8: Unused import - java.util.List. [UnusedImports]
Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/transaction/addedrelations/SimpleAddedRelations.java:29:17: Using the '.*' form of import should be avoided - java.util.*. [AvoidStarImport]

@porunov porunov added this to the Release v1.1.0 milestone Mar 5, 2024
@ntisseyre ntisseyre force-pushed the update_add_optimization branch 7 times, most recently from 527fdfa to d3f2975 Compare March 6, 2024 12:41
@ntisseyre ntisseyre force-pushed the update_add_optimization branch from d3f2975 to f2a0d83 Compare March 6, 2024 15:12
Member

@porunov porunov left a comment

I had some questions during review, but for each question I had, I checked the code and thought "Ah, that's why" 😅. Thank you @ntisseyre for this improvement!
I have only one comment asking to add a code comment regarding the collector usage, and a reminder to remove an unused property. That said, both comments are nitpicks and can be addressed separately if you want. LGTM.

@JanusGraph/committers if someone wants to review this PR, please do. Otherwise it will be merged using lazy consensus on March 14.

Member

@li-boxuan li-boxuan left a comment

This is great, and I look forward to more contributions from your team!

}

public JanusGraphVertexProperty addProperty(VertexProperty.Cardinality cardinality, JanusGraphVertex vertex, PropertyKey key, Object value, Long id) {
public JanusGraphVertexProperty addProperty(VertexProperty.Cardinality cardinality, JanusGraphVertex vertex, PropertyKey key, Object value, Long id, long previousId) {
Member

I prefer Long to long, for three reasons: 1) in the future, we might want to support property ids of other non-primitive types; 2) existing code uses Long; 3) using "0" as "undefined" is not a well-established contract in JanusGraph.
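The distinction matters because a boxed Long can represent "no previous property" as null, whereas a primitive long forces a sentinel such as 0 that could collide with a real id. A small self-contained sketch (hypothetical names, not the JanusGraph API):

```java
class PropertyRef {
    // Boxed Long: null unambiguously means "no previous property".
    static boolean hasPrevious(Long previousId) {
        return previousId != null;
    }

    // Primitive long: must reserve a sentinel value, so a real id of 0
    // would be indistinguishable from "undefined".
    static boolean hasPreviousWithSentinel(long previousId) {
        return previousId != 0L;
    }
}
```

With the boxed version, an id of 0 is still a valid previous-property id; with the sentinel version it silently reads as "undefined".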

Contributor Author

Fixed to Long

@@ -0,0 +1,69 @@
// Copyright 2022 Unified Catalog Team
Member

I am not sure if this is a legal requirement of your company (I presume not, because in that case it should be your company's name rather than a team's name, which only makes sense within the company), but the common practice in JanusGraph is to put JanusGraph in the copyright.

Contributor Author

Sorry, it was a mistake made while I was cherry-picking from my forked version.

Contributor Author

Fixed

for (Map<StaticBuffer, KCVEntryMutation> store : mutations.values()) {
    for (KCVEntryMutation mut : store.values()) mut.consolidate();
}
mutations.values().parallelStream().forEach(store -> store.values().forEach(KCVEntryMutation::consolidate));
Member

I have no objection, but have you considered the negative implications for small transactions?

Member

Perhaps we could introduce a threshold limit: if the number of mutations is less than N, we keep the old sequential approach; otherwise, we use the parallel approach. I guess this N could differ between servers, but if we add a configuration parameter for it, users can configure their own N.
As the default threshold we could use something that is good enough on most servers.
I didn't test it myself, so it is also interesting to know when one approach is better than the other.
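The threshold idea can be sketched like this. Generic types stand in for JanusGraph's mutation maps, and AtomicInteger::incrementAndGet stands in for the per-mutation consolidate() call so the sketch is self-contained and safe under parallel execution; the threshold value is illustrative:

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class MutationConsolidator {
    // Illustrative default; in the PR this became a configurable option.
    static final int PARALLEL_THRESHOLD = 100;

    static int consolidateAll(Collection<List<AtomicInteger>> stores) {
        int total = stores.stream().mapToInt(List::size).sum();
        if (total < PARALLEL_THRESHOLD) {
            // Small transactions: a sequential loop avoids fork/join overhead.
            for (List<AtomicInteger> store : stores) {
                store.forEach(AtomicInteger::incrementAndGet);
            }
        } else {
            // Large transactions: parallelize across stores.
            stores.parallelStream()
                  .forEach(store -> store.forEach(AtomicInteger::incrementAndGet));
        }
        return total;
    }
}
```

For small inputs the parallel stream's task-splitting and thread hand-off can cost more than the work itself, which is exactly the concern raised above.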

Contributor Author

Sounds good to me. I will add a configuration property.

Contributor Author

Done.

public static final ConfigOption<Integer> NUM_MUTATIONS_PARALLEL_THRESHOLD = new ConfigOption<>(STORAGE_NS, "num-mutations-parallel-threshold",
    "Number of mutations after which use parallel processing for transaction aggregations.",
    ConfigOption.Type.MASKABLE, 100, ConfigOption.positiveInt());

@ntisseyre ntisseyre force-pushed the update_add_optimization branch 2 times, most recently from f1a1acd to 2f8c26f Compare March 7, 2024 16:37
@mbrukman (Member)

mbrukman commented Mar 7, 2024

Thank you for this and prior contributions, @ntisseyre!

Please add yourself to CONTRIBUTORS.txt and add the name of your employer (who I presume owns the copyright to these contributions, given your use of your work email) to AUTHORS.txt.

@ntisseyre ntisseyre force-pushed the update_add_optimization branch from 2f8c26f to c7cd4f7 Compare March 7, 2024 17:31
@porunov (Member)

porunov commented Mar 7, 2024

Thank you for this and prior contributions, @ntisseyre!

Please add yourself to CONTRIBUTORS.txt and add the name of your employer (who I presume owns the copyright to these contributions, given your use of your work email) to AUTHORS.txt.

Those lists are not maintained, as far as I know. Most JanusGraph contributors are not in those lists.

@porunov (Member)

porunov commented Mar 7, 2024

Thank you for this and prior contributions, @ntisseyre!

Please add yourself to CONTRIBUTORS.txt and add the name of your employer (who I presume owns the copyright to these contributions, given your use of your work email) to AUTHORS.txt.

If we want to fix that, then I propose to do it in a separate PR and not mix it up with a functionality PR.

@ntisseyre ntisseyre force-pushed the update_add_optimization branch from c7cd4f7 to 1d75ed1 Compare March 7, 2024 18:19
Signed-off-by: ntisseyre <ntisseyre@apple.com>
@ntisseyre ntisseyre force-pushed the update_add_optimization branch from 1d75ed1 to 9d463b1 Compare March 7, 2024 23:13
@porunov porunov merged commit f3700c4 into JanusGraph:master Mar 8, 2024
174 checks passed