
optimizations, bug fixes from private fork #26

Open

georgechrysanthakopoulos opened this issue Nov 17, 2017 · 4 comments

Comments

@georgechrysanthakopoulos
Contributor
georgechrysanthakopoulos commented Nov 17, 2017

I am no longer tied to the xenon main line, but I will occasionally open Issues for things I have fixed/improved on my xenon variant:

  1. NodeGroupService merge logic has a flaw: it ignores an update from a remote node if the local node has marked that node UNAVAILABLE more recently than the node reports itself AVAILABLE. This can happen due to clock drift and has a trivial fix: do this in the merge function:

The key is the new remoteEntry.id.equals check, which accepts the change if the reporting node is the owner for that node's status entry:

            if (remoteEntry.documentVersion == currentEntry.documentVersion && needsUpdate) {
                // pick update with most recent time, even if that is prone to drift and jitter
                // between nodes, except, if the remote entry is the owner for this node status
                if (!remoteEntry.id.equals(remotePeerState.documentOwner)
                        && remoteEntry.documentUpdateTimeMicros < currentEntry.documentUpdateTimeMicros) {
                    logWarning(
                            "Ignoring update for %s from peer %s. Local status: %s, remote status: %s",
                            remoteEntry.id, remotePeerState.documentOwner, currentEntry.status,
                            remoteEntry.status);
                    continue;
                }
            }
  2. Change the ServiceHost executors to use async mode for the fork join pool. Gives a small perf boost (5-10%).
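
Not from the fork, just a minimal sketch of what "async mode" means for a ForkJoinPool: the fourth constructor argument switches the pool to FIFO scheduling for tasks that are never joined, which fits submit()-style event processing. The parallelism value is an arbitrary assumption.

    import java.util.concurrent.ForkJoinPool;

    ForkJoinPool executor = new ForkJoinPool(
            Runtime.getRuntime().availableProcessors(),
            ForkJoinPool.defaultForkJoinWorkerThreadFactory,
            null /* no custom uncaught exception handler */,
            true /* asyncMode: FIFO scheduling for never-joined tasks */);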

  3. Instead of using pragmas to express NO_INDEX_UPDATE or FORWARDING_DISABLED, I expanded OperationOption and added INDEXING_DISABLED and FORWARDING_DISABLED, then modified all places that check for the pragmas to also check for the options. This removes a lot of allocation for operations that will stay on the same host anyway (during service stop, for example).
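
A hedged sketch of the dual check described above. INDEXING_DISABLED / FORWARDING_DISABLED as OperationOption values exist only in the fork, and hasOption / toggleOption / hasPragmaDirective are my reading of the Operation API, so treat the names as illustrative:

    // a local caller can set the option directly, skipping the pragma header allocation:
    op.toggleOption(OperationOption.INDEXING_DISABLED, true);

    // at the check site, accept either the cheap option or the legacy pragma header:
    boolean skipIndexUpdate = op.hasOption(OperationOption.INDEXING_DISABLED)
            || op.hasPragmaDirective(Operation.PRAGMA_DIRECTIVE_NO_INDEX_UPDATE);

    boolean skipForwarding = op.hasOption(OperationOption.FORWARDING_DISABLED)
            || op.hasPragmaDirective(Operation.PRAGMA_DIRECTIVE_NO_FORWARDING);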

  4. Factory service minor change to NOT forward, ever, for a direct client POST if the child service does NOT have owner selection but is just replicated instead. This works well for pure replication services behind an external load balancer that already spreads load across nodes.
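
Purely illustrative pseudocode for that forwarding decision (the helper methods here are made up, not Xenon APIs); the idea is that a replicated-but-not-owner-selected child can be started on whichever node the client happened to hit, with replication fanning the state out from there:

    EnumSet<Service.ServiceOption> childOptions = getChildServiceOptions(); // hypothetical
    boolean ownerSelected = childOptions.contains(Service.ServiceOption.OWNER_SELECTION);
    boolean replicated = childOptions.contains(Service.ServiceOption.REPLICATION);

    if (replicated && !ownerSelected && !post.isFromReplication()) {
        startChildServiceLocally(post); // hypothetical: handle on this node
        return;
    }
    forwardToOwnerNode(post); // hypothetical: existing owner-selection path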

  5. New ServiceOption.CUSTOM_INSTRUMENTATION. 10% boost in throughput for stateful services that do not need the core operation tracking stats but DO need custom stats. It requires changes all over, but under the covers, and it's backwards compatible (since it is a new option). AVAILABLE, CREATE_COUNT and the other core stats now use the CUSTOM_INSTRUMENTATION option, so they don't force stats on all services by accident, which is what happens today.
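
A sketch of how a service might opt in, assuming the fork's semantics as described above; CUSTOM_INSTRUMENTATION is not a mainline ServiceOption, and the split in behavior versus full INSTRUMENTATION is my interpretation:

    import com.vmware.xenon.common.Operation;
    import com.vmware.xenon.common.ServiceDocument;
    import com.vmware.xenon.common.StatefulService;

    public class ExampleService extends StatefulService {
        public static class ExampleState extends ServiceDocument {
            public long counter;
        }

        public ExampleService() {
            super(ExampleState.class);
            toggleOption(ServiceOption.PERSISTENCE, true);
            toggleOption(ServiceOption.REPLICATION, true);
            // fork-only option: custom stats stay available, but the host skips
            // the per-operation tracking done under full INSTRUMENTATION
            toggleOption(ServiceOption.CUSTOM_INSTRUMENTATION, true);
        }

        @Override
        public void handlePatch(Operation patch) {
            adjustStat("customPatchCount", 1); // custom stats still work
            patch.complete();
        }
    }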

  6. Removed web socket support. Not used now that we have SSE.

  7. Reduction in allocations in FactoryService by removing the PARENT replication header and just using a parentUri field in Operation.remoteContext.

  8. Moved the SSE handler fields to Operation.remoteContext so we don't bloat the size of ALL operations, even purely internal ones!

This is all just FYI. I can't create pull requests anymore since my version of xenon has diverged (faster, smaller, not backwards compatible due to method removals and renames).

I still manually pull changes from main xenon, but I can't push back my changes.

fyi @sufiand @asafka @gbelur @toliaqat @ttddyy

You can leave this Issue open, and I can post updates on occasion. Feel free to create Pivotal Tracker items for the individual items I report.

@georgechrysanthakopoulos
Contributor Author

Another one:
LuceneDocumentIndexService, on bulk document expiration, sends DELETE requests without the NO_FORWARDING pragma. This means that when the same document expires across N nodes in a node group, we issue DELETEs from all nodes to all nodes, causing a lot of wasted churn.
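
A minimal sketch of the fix implied here, using mainline names as I recall them (not code from the fork); documentSelfLink stands in for the expired document's link:

    // mark the expiration-driven DELETE as local-only so each node removes its own
    // index entry instead of fanning the DELETE out to the whole node group
    Operation delete = Operation
            .createDelete(UriUtils.buildUri(getHost(), documentSelfLink))
            .addPragmaDirective(Operation.PRAGMA_DIRECTIVE_NO_FORWARDING)
            .setReferer(getUri());
    sendRequest(delete);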

@georgechrysanthakopoulos
Contributor Author

@jvassev fyi on the above

@asafka
Contributor

asafka commented Nov 19, 2017

Thanks Geoge, we will take a look.

@asafka
Contributor

asafka commented Nov 19, 2017

Sorry for the typo George.
