
Optimization for adding new vertex and updating vertex properties #4292

Merged 1 commit on Mar 8, 2024

Conversation

ntisseyre (Contributor)

  • Improving operations to detect vertex changes during transaction commit from O(N) to O(1);
  • Optimizing property search by key for a newly created vertex;
  • Optimizing previous property/edge search during vertex update;
  • Adding logging for large transactions to track the progress and time spent on each step.

With this change we have observed a significant improvement for large transaction commits, especially when many new vertices have been created.
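The O(N) to O(1) change can be illustrated with a generic sketch. All names here are hypothetical stand-ins, not JanusGraph's actual classes: the idea is replacing a linear scan over added relations with a hash-indexed lookup keyed by vertex id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of detecting whether a vertex was mutated in the
// current transaction; relations are modeled as long[] {vertexId, relationId}.
class TxChangeIndex {
    // O(N) approach: scan every added relation on each check.
    static boolean wasMutatedLinear(List<long[]> addedRelations, long vertexId) {
        for (long[] rel : addedRelations) {
            if (rel[0] == vertexId) return true;
        }
        return false;
    }

    // O(1) approach: maintain a per-vertex index alongside the list.
    private final Map<Long, List<long[]>> byVertex = new HashMap<>();

    void add(long[] rel) {
        byVertex.computeIfAbsent(rel[0], k -> new ArrayList<>()).add(rel);
    }

    boolean wasMutated(long vertexId) {
        return byVertex.containsKey(vertexId); // constant-time lookup
    }
}
```

The trade-off is extra memory for the index map, which pays off when the same vertex is checked repeatedly during commit.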

Member

@porunov porunov left a comment

Thank you @ntisseyre ! This is a great contribution!
I didn't have enough time to check the whole PR yet, but I will be reviewing it this week and adding comments if I have any.
Below are my comments so far.

Comment on lines 700 to 742
long startTimeStamp = System.currentTimeMillis();
int totalMutations = addedRelations.size() + deletedRelations.size();
boolean isBigDataSet = totalMutations >= backend.getBufferSize();
if (isBigDataSet) {
    log.info("0. Prepare commit for mutations count=" + totalMutations);
}

if (isBigDataSet) {
    log.info("1. Collect deleted edges and their index updates and acquire edge locks");
}
prepareCommitDeletes(deletedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);

if (isBigDataSet) {
    log.info("2. Collect added edges and their index updates and acquire edge locks");
}
prepareCommitAdditions(addedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);

if (isBigDataSet) {
    log.info("3. Collect all index update for vertices");
}
prepareCommitVertexIndexUpdates(mutatedProperties, indexUpdates);

if (isBigDataSet) {
    log.info("4. Acquire index locks (deletions first)");
}
prepareCommitAcquireIndexLocks(indexUpdates, mutator, acquireLocks);

if (isBigDataSet) {
    log.info("5. Add relation mutations");
}
prepareCommitAddRelationMutations(mutations, mutator, tx);

if (isBigDataSet) {
    log.info("6. Add index updates");
}
boolean has2iMods = prepareCommitIndexUpdatesAndCheckIfAnyMixedIndexUsed(indexUpdates, mutator);

if (isBigDataSet) {
    long duration = System.currentTimeMillis() - startTimeStamp;
    log.info("7. Prepare commit is done with mutated vertex count=" + mutations.size() + " in duration=" + duration);
}
Member

I have several thoughts about it and I will split them in sections.

Thought 1:

I would rather not emit such processing logs at INFO level, as it could be quite noisy. On the other hand, users can configure the logging level for the StandardJanusGraph class and raise it to WARN if they don't want to see such info logs.
Perhaps it would be better to keep these logs at debug level instead of info level. Moreover, we may even move them to their own logger, so that users can optionally increase or decrease the logging level for this specific logger without interfering with the main log of this class. For example, you could create the following logger and use it instead of log:

private static final Logger logForPrepareCommit =
    LoggerFactory.getLogger(StandardJanusGraph.class.getName()+".prepareCommit");

I would still prefer keeping these logs at debug level, because users without experience in configuring different logging levels for different logger names via Log4j configuration may find it complicated to disable this log while keeping other info logs present.
If a user wants to keep this debug log enabled while using only info level for all other logs, they can simply configure the log level as INFO for the root logger and DEBUG for the logger org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit.
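As a concrete illustration of that setup, a Log4j 2 properties fragment like the following keeps the root logger at INFO while enabling DEBUG only for the dedicated prepareCommit logger (this is a sketch; it assumes a console appender named Console is defined elsewhere in the configuration):

```properties
# Root stays at INFO for all regular JanusGraph logs
rootLogger.level = INFO
rootLogger.appenderRef.stdout.ref = Console

# DEBUG only for the dedicated prepareCommit logger
logger.prepareCommit.name = org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit
logger.prepareCommit.level = DEBUG
```

The same split can be expressed in XML or YAML configuration; the key point is that the child logger name inherits from the class logger, so it can be tuned independently.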

Thought 2:

Generally, having logs instead of metrics can be convenient for analysing a specific chain of events. However, it requires retrieving logs in a specific structure, configuring the logger to include the thread id for each log line (otherwise it won't be obvious which log relates to which call), and then feeding them to a log-processing system (e.g. Elasticsearch/Kibana or DataDog) configured to properly monitor those logs. This is fairly complex, and when it is done, the recommended approach is to use structured logging and/or logging tags.

So, in this example I think I would prefer moving the values out of the message and into structured log parameters.
I.e. instead of the following log

log.info("7. Prepare commit is done with mutated vertex count=" + mutations.size() + " in duration=" + duration);

I would do something like:

MDC.put("duration", String.valueOf(duration));
MDC.put("mutationsSize", String.valueOf(mutations.size()));
log.info("7. Prepare commit is done");
MDC.remove("duration");
MDC.remove("mutationsSize");
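One caveat with the MDC pattern above: if anything throws between put and remove, the keys leak into unrelated log lines on the same thread, so cleanup belongs in a finally block. The sketch below uses a small map-backed stand-in for org.slf4j.MDC so it is self-contained; in real code the same try/finally shape would wrap MDC.put/MDC.remove.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.slf4j.MDC, just to demonstrate the cleanup pattern.
class DiagnosticContext {
    private static final Map<String, String> CTX = new HashMap<>();
    static void put(String k, String v) { CTX.put(k, v); }
    static void remove(String k) { CTX.remove(k); }
    static String get(String k) { return CTX.get(k); }

    // Populate context, log, and always clean up, even if logging throws.
    static void logPrepareCommitDone(long duration, int mutationsSize) {
        put("duration", String.valueOf(duration));
        put("mutationsSize", String.valueOf(mutationsSize));
        try {
            // log.info("7. Prepare commit is done");
        } finally {
            remove("duration");
            remove("mutationsSize");
        }
    }
}
```

With the real SLF4J MDC the context is thread-local, so leaked keys would only pollute logs on the committing thread, but the try/finally discipline is the same.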

Thought 3:

Generally I prefer analyzing hot spots in the code via OpenTracing spans. In that case we could enable profiling for some time and see the whole chain of code execution and how much time we spent at each place for each call. Profiling is usually more resource hungry, so typically it makes sense to enable it short term when we suspect something isn't right. I didn't check if it works with the current Dropwizard version (I have only used it with Micrometer), but it seems it should work as described here:
https://github.com/opentracing-contrib/java-dropwizard/blob/master/dropwizard-opentracing/README.rst

I'm not asking to integrate profiling in this PR; I just wanted to leave my thoughts in case we want to integrate profiling into JanusGraph later. I assume it may help us in the long run to see hot parts of JanusGraph and fix them as we go.

Thought 4:

Collecting metrics via logs works to represent a chain of execution, but it's not that useful for analysis over a longer period of time (unless metrics are derived by log-processing systems, which are basically normal metrics).
As such, if the purpose of these logs is to monitor the behavior of prepareCommit over an extended period of time, then normal metrics might work better. Of course, logs can be useful as well; maybe big data sets are simply uncommon here, and thus logging such events makes sense to trigger necessary actions after a big dataset is detected. However, if big dataset commits are a typical and expected flow and the only goal is to monitor performance over an extended period, then metrics may suit better. Again, it could even be a combination of logs + metrics if necessary.

JanusGraph has a default MetricManager.INSTANCE which provides some utility methods and a reference to the main Dropwizard MetricRegistry. It can be reused to publish metrics to different systems (for example, it can be used to make JMX reports).

In this situation, if we want to collect the number of mutations, the timing of the method execution and of each step, as well as the average time per single mutation, then something like this could be used (didn't test it):

final Timer.Context timeContext = MetricManager.INSTANCE.getTimer(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "time").time();
try {
    try (final Timer.Context tc = MetricManager.INSTANCE.getTimer(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "deletes", "time").time()) {
        // 1. Collect deleted edges and their index updates and acquire edge locks
        prepareCommitDeletes(deletedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);
    }
    try (final Timer.Context tc = MetricManager.INSTANCE.getTimer(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "additions", "time").time()) {
        // 2. Collect added edges and their index updates and acquire edge locks
        prepareCommitAdditions(addedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);
    }
    // etc...
} finally {
    long elapsedTime = timeContext.stop(); // This returns the elapsed time in nanoseconds and tracks how much in total the processing took.
    if (!mutations.isEmpty()) { // prevent division by 0
        double averageTimePerElement = (double) elapsedTime / mutations.size(); // average processing time per single mutation
        MetricManager.INSTANCE.getCounter(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "mutations", "count").inc(mutations.size()); // Can be useful to track total calls for a specific time window.
        MetricManager.INSTANCE.getHistogram(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "mutations", "histogram").update(mutations.size()); // Probably optional, but may be useful to see statistical distribution of mutation amount per `prepareCommit` call
        MetricManager.INSTANCE.getHistogram(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "mutation", "avg", "time").update((long) averageTimePerElement); // Track statistical distribution of average processing time per single mutation.
    }
}

Conclusion:

Depending on the purpose of these logs, we may choose different approaches to improve the user experience. I added my thoughts on some solutions I see, but I welcome other solutions as well.
Also, it's not necessary to resolve all the issues in one PR. We can always move outstanding work to a separate issue or issues and complete it later.

Contributor Author

Thank you for your response!
Initially, I was using these log lines to track which step was taking too long for a big transaction commit.
This log is not very noisy, since it is only enabled for large transactions, and it was used for analytics to understand the bottlenecks in the prepareCommit functionality.

So, I agree that this log is helpful only for some time, and we can move it to the DEBUG level. I also like the idea of

private static final Logger logForPrepareCommit =
    LoggerFactory.getLogger(StandardJanusGraph.class.getName()+".prepareCommit");

+1 on adding tracing to the logs. I wanted to propose it in different PRs. In our applications we use Zipkin tracing a lot; I have never used OpenTracing before. I will be happy to investigate this question further.

Member

+1 on adding tracing to the logs. I wanted to propose it in different PRs. In our applications we use Zipkin tracing a lot; I have never used OpenTracing before. I will be happy to investigate this question further.

Definitely. It's not necessary to introduce tracing in this PR. I think it will be better suited to a separate PR or PRs.
For me it's the opposite: I have used OpenTracing but not Zipkin. I don't have a strong opinion on which one to use for tracing, so I'm good with any viable solution.

I believe here it would be good to simply switch to a separate logger (Logger logForPrepareCommit) and change it from info to debug. We can then open separate issues to track tracing or metrics improvements.

@porunov (Member)

porunov commented Mar 5, 2024

BTW, the failing jobs are only because of the linter complaining about unused imports in some classes:

Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/relations/CacheVertexProperty.java:29:8: Unused import - java.util.ArrayList. [UnusedImports]
Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/relations/CacheVertexProperty.java:30:8: Unused import - java.util.List. [UnusedImports]
Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/relations/CacheEdge.java:32:8: Unused import - java.util.ArrayList. [UnusedImports]
Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/relations/CacheEdge.java:33:8: Unused import - java.util.List. [UnusedImports]
Error:  /home/runner/work/janusgraph/janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/transaction/addedrelations/SimpleAddedRelations.java:29:17: Using the '.*' form of import should be avoided - java.util.*. [AvoidStarImport]

@porunov porunov added this to the Release v1.1.0 milestone Mar 5, 2024
@ntisseyre ntisseyre force-pushed the update_add_optimization branch 7 times, most recently from 527fdfa to d3f2975 Compare March 6, 2024 12:41
@ntisseyre ntisseyre force-pushed the update_add_optimization branch from d3f2975 to f2a0d83 Compare March 6, 2024 15:12
Member

@porunov porunov left a comment

I had some questions during review, but for each question I had, I checked the code and thought "Ah, that's why" 😅. Thank you @ntisseyre for this improvement!
I have only one comment asking to add a code comment regarding the collector usage, and a reminder to remove an unused property. That said, both comments are nitpicks and can be addressed separately if you want. LGTM.

@JanusGraph/committers if someone wants to review this PR, please do. Otherwise it will be merged using lazy consensus on March 14.

Member

@li-boxuan li-boxuan left a comment

This is great, and I look forward to more contributions from your team!

}

public JanusGraphVertexProperty addProperty(VertexProperty.Cardinality cardinality, JanusGraphVertex vertex, PropertyKey key, Object value, Long id) {
public JanusGraphVertexProperty addProperty(VertexProperty.Cardinality cardinality, JanusGraphVertex vertex, PropertyKey key, Object value, Long id, long previousId) {
Member

I prefer Long to long, for three reasons: 1) in the future, we might want to support property ids of other non-primitive types; 2) existing code uses Long; 3) using "0" as "undefined" is not a well-established contract in JanusGraph.
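The distinction matters because a boxed Long can represent "no previous property" as null, whereas a primitive long forces a sentinel such as 0 that could collide with a real id. A small self-contained sketch (hypothetical names, not the JanusGraph API):

```java
class PropertyRef {
    // Boxed Long: null unambiguously means "no previous property".
    static boolean hasPrevious(Long previousId) {
        return previousId != null;
    }

    // Primitive long: must reserve a sentinel value, so a real id of 0
    // would be indistinguishable from "undefined".
    static boolean hasPreviousWithSentinel(long previousId) {
        return previousId != 0L;
    }
}
```

With the boxed version, an id of 0 is still a valid previous-property id; with the sentinel version it silently reads as "undefined".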

Contributor Author

Fixed to Long

@@ -0,0 +1,69 @@
// Copyright 2022 Unified Catalog Team
Member

I am not sure if this is a legal requirement of your company (I presume not, because in that case it should be your company's name rather than a team's name, which only makes sense within the company), but the common practice in JanusGraph is to put JanusGraph in the copyright.

Contributor Author

Sorry, it was a mistake made while I was cherry-picking from my forked version.

Contributor Author

Fixed

for (Map<StaticBuffer, KCVEntryMutation> store : mutations.values()) {
    for (KCVEntryMutation mut : store.values()) mut.consolidate();
}
mutations.values().parallelStream().forEach(store -> store.values().forEach(KCVEntryMutation::consolidate));
Member

I have no objection, but have you considered the negative implications for small transactions?

Member

Perhaps we could introduce a threshold limit: if the number of mutations is less than N, we keep the old sequential approach; otherwise, we use the parallel approach. I guess this N could differ between servers, but if we add a configuration parameter for it, users can configure their own N.
As the default threshold we could use something that is good enough on most servers.
I didn't test it myself, so it is also interesting to know when one approach is better than the other.
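The threshold idea can be sketched like this. Generic types stand in for JanusGraph's mutation maps, and AtomicInteger::incrementAndGet stands in for the per-mutation consolidate() call so the sketch is self-contained and safe under parallel execution; the threshold value is illustrative:

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class MutationConsolidator {
    // Illustrative default; in the PR this became a configurable option.
    static final int PARALLEL_THRESHOLD = 100;

    static int consolidateAll(Collection<List<AtomicInteger>> stores) {
        int total = stores.stream().mapToInt(List::size).sum();
        if (total < PARALLEL_THRESHOLD) {
            // Small transactions: a sequential loop avoids fork/join overhead.
            for (List<AtomicInteger> store : stores) {
                store.forEach(AtomicInteger::incrementAndGet);
            }
        } else {
            // Large transactions: parallelize across stores.
            stores.parallelStream()
                  .forEach(store -> store.forEach(AtomicInteger::incrementAndGet));
        }
        return total;
    }
}
```

For small inputs the parallel stream's task-splitting and thread hand-off can cost more than the work itself, which is exactly the concern raised above.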

Contributor Author

Sounds good to me. I will add a configuration property.

Contributor Author

Done.

public static final ConfigOption<Integer> NUM_MUTATIONS_PARALLEL_THRESHOLD = new ConfigOption<>(STORAGE_NS, "num-mutations-parallel-threshold",
    "Number of mutations after which use parallel processing for transaction aggregations.",
    ConfigOption.Type.MASKABLE, 100, ConfigOption.positiveInt());

@ntisseyre ntisseyre force-pushed the update_add_optimization branch 2 times, most recently from f1a1acd to 2f8c26f Compare March 7, 2024 16:37
@mbrukman (Member)

mbrukman commented Mar 7, 2024

Thank you for this and prior contributions, @ntisseyre!

Please add yourself to CONTRIBUTORS.txt and add the name of your employer (who I presume owns the copyright to these contributions, given your use of your work email) to AUTHORS.txt.

@ntisseyre ntisseyre force-pushed the update_add_optimization branch from 2f8c26f to c7cd4f7 Compare March 7, 2024 17:31
@porunov (Member)

porunov commented Mar 7, 2024

Thank you for this and prior contributions, @ntisseyre!

Please add yourself to CONTRIBUTORS.txt and add the name of your employer (who I presume owns the copyright to these contributions, given your use of your work email) to AUTHORS.txt.

Those lists are not maintained, as far as I know. Most JanusGraph contributors are not in those lists.

@porunov (Member)

porunov commented Mar 7, 2024

Thank you for this and prior contributions, @ntisseyre!

Please add yourself to CONTRIBUTORS.txt and add the name of your employer (who I presume owns the copyright to these contributions, given your use of your work email) to AUTHORS.txt.

If we want to fix that, then I propose to do it in a separate PR and not mix it up with a functionality PR.

@ntisseyre ntisseyre force-pushed the update_add_optimization branch from c7cd4f7 to 1d75ed1 Compare March 7, 2024 18:19
Signed-off-by: ntisseyre <ntisseyre@apple.com>
@ntisseyre ntisseyre force-pushed the update_add_optimization branch from 1d75ed1 to 9d463b1 Compare March 7, 2024 23:13
@porunov porunov merged commit f3700c4 into JanusGraph:master Mar 8, 2024
174 checks passed