Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BREAK] Remove cache layer and internal calculated property room.usernames #10749

Merged
merged 62 commits into from
Jul 11, 2018

Conversation

rodrigok
Copy link
Member

@rodrigok rodrigok commented May 12, 2018

This is a test case, could be or could not be merged depending on tests results.

Since we introduced our cache layer we have been testing how to improve our performance in edge scenarios and one of the most important part that could be making our performance bad could be the cache layer, that's why we have this PR removing it to be able to run custom builds in certain deployments and track the performance.

Why we added the cache layer

Some time ago we did some small tests and figure out that the most of our queries would run fast or equal if we do them in memory, that would reduce the stress over the database and would allow us to depend less on oplog since we could reproduce the changes in memory and send the events to the clients without wait by the database oplog or even require the oplog been enabled.

Another positive part was the ability to remove the room.usernames from database, which was na array that at that time was of around 150k strings in our room general at our open/demo server. Using the cache we was able to remount that array in memory only and keep using it wherever we need.

Yet another good part was the ability to have relations in memory where we could find all room of a single user without need to send a query containing an array of thousands of ids to the database since we need to find all the subscriptions, extract the rid properties and then find by them.

We had some good results using that approach.

Why we decide to test the cache removal

Every time our instances crash we need to wait several minutes to them be restarted since the cache load will move 200k users records to memory, 4.7mi of subscriptions and 130k rooms, that stops the process by more then 5 minutes, create a huge load in database and increase the process memory to more than 2MB at the beginning. We force the CPU to deal with that and need to run the relationship methods over that amount of data, all that was causing a problem when we have outages since we could not bring all cloud users back at the same time or our cache load would crash our database.

We found some evidences that we need more instances since each one is busy doing queries in memory or even rebuild the relations when data changes, when we have a few users using the system the cache would be much better then the database in response time, but when we start having more users the CPU of the process start to be bottleneck while the database is free of usage.

We removed the room.usernames at all, since that amount of data in an array does not make sense, of course that forced us to back the big queries and some denormalization, our main goal here is to test that on our open server and see how the processes and database will deal with that change. We will introduce some other important performance improvements later to reduce the amount of data we process for notifications.

How it will affect the forks

Well, we will have a document listing all the APIs we removed and we will try to keep the changes compatible, one of the things we will not be able to keep will be the room.usernames array, if you rely on that we could discuss how to deal with the change.

Work in progress

As soon we have more information and metrics we will update this description.

Breaking Changes

  • Method public-settings/get doesn't return the field type anymore.

rodrigok added 4 commits May 2, 2018 10:18
# Conflicts:
#	packages/rocketchat-lib/server/lib/notifyUsersOnMessage.js
#	packages/rocketchat-lib/server/lib/sendEmailOnMessage.js
#	packages/rocketchat-lib/server/lib/sendNotificationsOnMessage.js
#	packages/rocketchat-lib/server/models/Subscriptions.js
#	packages/rocketchat-mentions/server/Mentions.js
@engelgabriel engelgabriel temporarily deployed to rocket-chat-pr-10749 May 12, 2018 20:24 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 12, 2018 20:33 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 12, 2018 20:56 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 12, 2018 21:09 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 12, 2018 21:11 Inactive
@RocketChat RocketChat deleted a comment May 12, 2018
@RocketChat RocketChat deleted a comment May 12, 2018
# Conflicts:
#	packages/rocketchat-api/server/api.js
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 16, 2018 14:06 Inactive
# Conflicts:
#	client/methods/leaveRoom.js
#	server/methods/browseChannels.js
#	server/methods/getUsersOfRoom.js
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 30, 2018 17:44 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 30, 2018 18:19 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 30, 2018 18:51 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 30, 2018 19:01 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 30, 2018 22:06 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 30, 2018 22:33 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 May 31, 2018 17:51 Inactive
// needs to negate globalPref because userPref represents its opposite
const groupByType = userPref !== undefined ? userPref : globalPref;
const groupByType = userPref !== undefined ? userPref : RocketChat.settings.get('UI_Group_Channels_By_Type');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to cache?

const subscriptions = RocketChat.models.Subscriptions.findByRoomId(roomId, {fields: {u: 1}}).fetch().filter(s => s.u && s.u._id && s.u.username);
const userIds = subscriptions.map(s => s.u._id);
const options = {fields: {username: 1, name: 1}};
const subscriptions = RocketChat.models.Subscriptions.find({ rid, 'u.username': { $exists: 1 } }, { fields: { 'u._id': 1 } }).fetch();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this find to the model

@ggazzo ggazzo temporarily deployed to rocket-chat-pr-10749 July 2, 2018 22:30 Inactive
# Conflicts:
#	server/startup/migrations/v129.js
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 July 5, 2018 12:37 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 July 5, 2018 20:42 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 July 6, 2018 12:36 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 July 6, 2018 12:38 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 July 6, 2018 16:52 Inactive
@rodrigok rodrigok temporarily deployed to rocket-chat-pr-10749 July 6, 2018 17:02 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants