Optimization for adding new vertex and updating vertex properties #4292
Conversation
Thank you @ntisseyre ! This is a great contribution!
I didn't have enough time to check the whole PR yet, but I will be reviewing it this week and adding my comments if I have some.
Below are my comments so far.
long startTimeStamp = System.currentTimeMillis();
int totalMutations = addedRelations.size() + deletedRelations.size();
boolean isBigDataSet = totalMutations >= backend.getBufferSize();
if(isBigDataSet) {
    log.info("0. Prepare commit for mutations count=" + totalMutations);
}

if(isBigDataSet) {
    log.info("1. Collect deleted edges and their index updates and acquire edge locks");
}
prepareCommitDeletes(deletedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);

if(isBigDataSet) {
    log.info("2. Collect added edges and their index updates and acquire edge locks");
}
prepareCommitAdditions(addedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);

if(isBigDataSet) {
    log.info("3. Collect all index update for vertices");
}
prepareCommitVertexIndexUpdates(mutatedProperties, indexUpdates);

if(isBigDataSet) {
    log.info("4. Acquire index locks (deletions first)");
}
prepareCommitAcquireIndexLocks(indexUpdates, mutator, acquireLocks);

if(isBigDataSet) {
    log.info("5. Add relation mutations");
}
prepareCommitAddRelationMutations(mutations, mutator, tx);

if(isBigDataSet) {
    log.info("6. Add index updates");
}
boolean has2iMods = prepareCommitIndexUpdatesAndCheckIfAnyMixedIndexUsed(indexUpdates, mutator);

if(isBigDataSet) {
    long duration = System.currentTimeMillis() - startTimeStamp;
    log.info("7. Prepare commit is done with mutated vertex count=" + mutations.size() + " in duration=" + duration);
}
I have several thoughts about it and will split them into sections.
Thought 1:
I would rather not emit such processing logs at INFO level, as it could get quite noisy. On the other hand, users can configure the logging level for the StandardJanusGraph class and raise it to WARN if they don't want to see such info logs.
Perhaps it would be better to keep these logs at debug level instead of info level. Moreover, we may even move them to their own logger, so that users can optionally increase or decrease the logging level for this specific logger without interfering with the main log of this class. For example, you could create the following logger and reuse it instead of log:
private static final Logger logForPrepareCommit =
LoggerFactory.getLogger(StandardJanusGraph.class.getName()+".prepareCommit");
I would still prefer keeping these logs at debug level, because users without experience in configuring different logging levels for different logger names via the Log4j configuration may find it complicated to disable this log while keeping other info logs visible. If a user wants to keep this debug log enabled but use only info level for everything else, they can simply configure the log level as INFO for the root logger and DEBUG for the logger org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit (a sketch of doing this programmatically is shown below).
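For illustration, here is a minimal sketch of how that could be done programmatically with Log4j 2 (assuming log4j-core is on the classpath; the same effect can of course be achieved purely in the Log4j configuration file):
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

// Keep the root logger at INFO so general logs stay unaffected.
Configurator.setRootLevel(Level.INFO);
// Raise only the dedicated prepareCommit logger to DEBUG.
Configurator.setLevel("org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit", Level.DEBUG);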
Thought 2:
Generally, having logs instead of metrics can be convenient for analysing a specific chain of events. However, it requires retrieving logs in a specific structure, configuring the logger to include the thread id for each entry (otherwise it won't be obvious which call a log entry relates to), and then feeding them into some log-processing system and configuring that system to properly monitor those logs (using Elasticsearch/Kibana, DataDog, or similar). That is more complex, and generally when this is done the recommended approach is to use structured logging and/or logging tags.
So, in this example I think I would prefer moving the logged values out of the message and into structured log parameters.
I.e. instead of the following log
log.info("7. Prepare commit is done with mutated vertex count=" + mutations.size() + " in duration=" + duration);
I would do something like:
MDC.put("duration", String.valueOf(duration));
MDC.put("mutationsSize", String.valueOf("mutations.size()"));
log.info("7. Prepare commit is done");
MDC.remove("duration");
MDC.remove("mutationsSize");
Thought 3:
Generally I prefer using OpenTracing spans to analyze hot spots in the code. In such a case we could enable profiling for some time and see the whole chain of code execution and how much time we spend in each place for each call. Profiling is usually more resource hungry, so typically it makes sense to enable it short term when we suspect something isn't right. I didn't check whether it works with the current Dropwizard version, and have used it only with Micrometer, but it seems it should work as described here:
https://github.com/opentracing-contrib/java-dropwizard/blob/master/dropwizard-opentracing/README.rst
I'm not asking to integrate profiling in this PR, I just wanted to leave my thoughts in case we want to integrate profiling into JanusGraph later. I assume it may help us in the long run to see hot spots in JanusGraph and fix them as we go.
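Just to illustrate the idea (again, not something I'm asking for in this PR), a rough OpenTracing sketch around prepareCommit could look like this, assuming a tracer has been registered with GlobalTracer:
import io.opentracing.Scope;
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

// Sketch only: wrap the prepareCommit steps in a span so hot spots show up in the trace.
Tracer tracer = GlobalTracer.get();
Span span = tracer.buildSpan("prepareCommit").start();
try (Scope ignored = tracer.activateSpan(span)) {
    span.setTag("mutations", addedRelations.size() + deletedRelations.size());
    // ... existing prepareCommit steps ...
} finally {
    span.finish();
}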
Thought 4:
Collecting metrics via logs works for representing a chain of execution, but it's not that useful for analysis over a longer period of time (unless metrics are derived by a log-processing system, in which case they are basically normal metrics).
As such, if the purpose of these logs is to monitor the behavior of prepareCommit over an extended period of time, then normal metrics might work better. Of course, logs can be useful as well: maybe big data sets are simply uncommon here, in which case logging such events makes sense to trigger the necessary actions once a big dataset is detected. However, if big-dataset commits are a typical and expected flow and the only goal is to monitor performance over an extended period of time, then metrics may suit better. It could even be a combination of both logs and metrics if that's necessary.
JanusGraph has a default MetricManager.INSTANCE which provides some utility methods and a reference to the main Dropwizard MetricRegistry. It can be reused to publish metrics to different systems (for example, it can be used to produce JMX reports).
In this situation, if we want to collect the number of mutations, the timing of the whole method, the timing of each step, and the average execution time per single mutation, then something like this could be used (I didn't test it):
final Timer.Context timeContext = MetricManager.INSTANCE.getTimer(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "time").time();
try {
    try (final Timer.Context tc = MetricManager.INSTANCE.getTimer(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "deletes", "time").time()) {
        // 1. Collect deleted edges and their index updates and acquire edge locks
        prepareCommitDeletes(deletedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);
    }
    try (final Timer.Context tc = MetricManager.INSTANCE.getTimer(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "additions", "time").time()) {
        // 2. Collect added edges and their index updates and acquire edge locks
        prepareCommitAdditions(addedRelations, filter, mutator, tx, acquireLocks, mutations, mutatedProperties, indexUpdates);
    }
    // etc...
} finally {
    long elapsedTime = timeContext.stop(); // Returns the elapsed time in nanoseconds and tracks how long the whole processing took.
    if (!mutations.isEmpty()) { // prevent division by 0
        double averageTimePerElement = (double) elapsedTime / mutations.size(); // Average processing time per single mutation
        MetricManager.INSTANCE.getCounter(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "mutations", "count").inc(mutations.size()); // Can be useful to track the total number of mutations over a specific time window.
        MetricManager.INSTANCE.getHistogram(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "mutations", "histogram").update(mutations.size()); // Probably optional, but may be useful to see the statistical distribution of mutation counts per `prepareCommit` call.
        MetricManager.INSTANCE.getHistogram(tx.getConfiguration().getGroupName(), "tx", "prepareCommit", "mutation", "avg", "time").update((long) averageTimePerElement); // Track the statistical distribution of average processing time per single mutation.
    }
}
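To actually see these values somewhere, a reporter has to be attached to the underlying registry. As a rough, untested sketch (the getRegistry() accessor name is my assumption here; JMX or other Dropwizard reporters can be wired up the same way):
import java.util.concurrent.TimeUnit;
import com.codahale.metrics.ConsoleReporter;
import org.janusgraph.util.stats.MetricManager;

// Sketch only: periodically dump the collected prepareCommit metrics to stdout.
ConsoleReporter reporter = ConsoleReporter.forRegistry(MetricManager.INSTANCE.getRegistry())
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build();
reporter.start(1, TimeUnit.MINUTES);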
Conclusion:
Depending on the purpose of these logs, we may choose different approaches to improve the user experience. I added my thoughts on some solutions I see, but I welcome any other solutions as well.
Also, it's not necessary to resolve all the issues in one PR. We can always move outstanding work to a separate issue / issues and complete it later.
Thank you for your response!
Initially, I was using these log lines to track which step was taking too long for a big transaction commit.
This log is not very noisy since it is only enabled for large transactions, and it was used for analytics to understand the bottlenecks in the prepareCommit functionality.
So, I agree that this log is helpful only for some time and we can move it to the DEBUG level. I also like the idea of
private static final Logger logForPrepareCommit =
LoggerFactory.getLogger(StandardJanusGraph.class.getName()+".prepareCommit");
+1 on adding tracing to the logs. I wanted to propose it in a different PR. In our applications we use a lot of Zipkin tracing; I have never used OpenTracing before. I will be happy to investigate this question further.
+1 on adding tracing to the logs. I wanted to propose it in a different PR. In our applications we use a lot of Zipkin tracing; I have never used OpenTracing before. I will be happy to investigate this question further.
Definitely. It's not necessary to introduce tracing in this PR. I think it will fit better in a separate PR / PRs.
For me it's the opposite: I have used OpenTracing but not Zipkin. I don't have a strong opinion on which one to use for tracing, so I'm good with any viable solution.
I believe here it would be good to simply switch to using the separate logger (Logger logForPrepareCommit) and change it from info to debug. We can then open separate issues to track the tracing or metrics improvements.
BTW the failing jobs are only because of the linter complaining about unused imports in some classes.
I had some questions during review, but for each question I had I checked the code and was like "Ah, that's why" 😅. Thank you @ntisseyre for this improvement!
I have only one comment about adding a code comment regarding the collector usage, and a reminder to remove an unused property. That said, both comments are nitpicks and can be closed separately if you want. LGTM.
@JanusGraph/committers if someone wants to review this PR, please do. Otherwise it will be merged using lazy consensus on March 14.
This is great and I look forward to more contributions from your team!
public JanusGraphVertexProperty addProperty(VertexProperty.Cardinality cardinality, JanusGraphVertex vertex, PropertyKey key, Object value, Long id) {
public JanusGraphVertexProperty addProperty(VertexProperty.Cardinality cardinality, JanusGraphVertex vertex, PropertyKey key, Object value, Long id, long previousId) {
I prefer Long to long. Three reasons: 1) In the future, we might want to support property ids of other non-primitive types. 2) Existing code is using Long. 3) Using "0" as "undefined" is not a well-established contract in JanusGraph.
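(Just to illustrate point 3, not actual PR code: with a boxed Long, the absence of a previous id can be expressed as null instead of overloading 0 as a sentinel.)
// Hypothetical call site: null means "no previous property id", no magic 0 needed.
JanusGraphVertexProperty p = addProperty(cardinality, vertex, key, value, id, /* previousId */ null);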
Fixed to Long
@@ -0,0 +1,69 @@
// Copyright 2022 Unified Catalog Team
I am not sure if this is a legal requirement of your company - I presume not, because in that case it would be your company's name rather than a team name that only makes sense within the company - but the common practice in JanusGraph is to put JanusGraph in the copyright.
Sorry, it is a mistake from when I was cherry-picking from my forked version.
Fixed
for (Map<StaticBuffer, KCVEntryMutation> store : mutations.values()) {
    for (KCVEntryMutation mut : store.values()) mut.consolidate();
}
mutations.values().parallelStream().forEach(store -> store.values().forEach(KCVEntryMutation::consolidate));
I have no objection, but have you considered the negative implications for small transactions?
Perhaps we could introduce some threshold limit. If the number of mutations is less than N, we keep the old sequential approach; otherwise, we use the parallel approach. I guess this N could be different on different servers, but if we add a configuration parameter for it then users could configure their own N.
As the default threshold limit we could use something that is good enough on most servers.
I didn't test it myself, so it would also be interesting to know when one approach is better than the other.
Sounds good to me. I will add a configuration property
Done.
public static final ConfigOption<Integer> NUM_MUTATIONS_PARALLEL_THRESHOLD = new ConfigOption<>(STORAGE_NS,"num-mutations-parallel-threshold",
"Number of mutations after which use parallel processing for transaction aggregations.",
ConfigOption.Type.MASKABLE, 100, ConfigOption.positiveInt());
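For context, roughly how this option gates the consolidation path (illustrative only, not the exact PR code; the configuration variable name is assumed):
// Fall back to the sequential loop for small transactions,
// switch to the parallel stream once the configured threshold is reached.
int parallelThreshold = configuration.get(NUM_MUTATIONS_PARALLEL_THRESHOLD);
if (mutations.size() < parallelThreshold) {
    for (Map<StaticBuffer, KCVEntryMutation> store : mutations.values()) {
        for (KCVEntryMutation mut : store.values()) mut.consolidate();
    }
} else {
    mutations.values().parallelStream().forEach(store -> store.values().forEach(KCVEntryMutation::consolidate));
}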
Thank you for this and prior contributions, @ntisseyre! Please add yourself to
Those lists are not maintained as far as I know. Most JanusGraph contributors are not in those lists.
If we want to fix that, then I propose doing it in a separate PR and not mixing it up with a functionality PR.
With this change we have observed a significant improvement for large transaction commits, especially when lots of new vertices have been created.