Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concept: Cache module #805

Open
Bastian opened this issue Jul 7, 2021 · 4 comments
Open

Concept: Cache module #805

Bastian opened this issue Jul 7, 2021 · 4 comments
Labels
🔨 breaking-change An issue or pull requests that would be a breaking change high priority An issue or pull request with a high priority ✨ feature request New feature or request

Comments

@Bastian
Copy link
Member

Bastian commented Jul 7, 2021

I've already suggested a design for an external cache module in #454 (comment).
However, since this suggestion was not well received because of the immutability, this issue proposes a mutable external cache that also allows sharing data between multiple shards and allows for different data stores like Redis.

Similar to the immutable cache, the mutable cache holds simple immutable and serializable POJOs that do not contain references to other entities except by their ids.

Since the module is independent of Javacord, we can use another JVM language like Kotlin to reduce the necessary boilerplate.

Storing data objects

  • The DataEntity interface that every data object has to extend:
interface DataEntity {
    val indices: Array<Index>
}

data class Index(
    val name: String,
    val value: String
)
  • The data entities itself
@Serializable
data class ServerData(
    val id: Long,
    val name: String,
    // More fields ...
): DataEntity {
    override val indices: Array<Index>
        get() = arrayOf(Index("ID", id.toString()))
}
@Serializable
data class ServerTextChannelData(
    val id: Long,
    val name: String,
    val serverId: Long,
    // More fields ...
): DataEntity {
    override val indices: Array<Index>
        get() = arrayOf(
            Index("ID", id.toString()),
            Index("SERVER", serverId.toString()),
        )
}
  • A DataStore that hold the data entities and manages their indices
interface DataStore {
    fun insert(value: DataEntity)
    fun get(clazz: Class<out DataEntity>, indexName: String, indexValue: String): List<DataEntity>
    fun delete(clazz: Class<out DataEntity>, indexName: String, indexValue: String)
}

The implementation of the data store is extremely simple and can look like this for a classic in-memory store:

class InMemoryDataStore: DataStore {

    val entities = HashMap<String, MutableList<DataEntity>>()
    val reverseIndex = HashMap<DataEntity, Array<String>>()

    override fun insert(value: DataEntity) {
        val indexKeys = value.indices.map {
            value::class.simpleName + "-" + it.name + "-" + it.value
        }.toTypedArray()
        indexKeys.forEach {
            val list = entities[it] ?: mutableListOf()
            list.add(value)
        }
        reverseIndex[value] = indexKeys
    }

    override fun get(clazz: Class<out DataEntity>, indexName: String, indexValue: String): List<DataEntity> {
        return entities[clazz.simpleName + "-" + indexName + "-" + indexValue] ?: listOf()
    }

    override fun delete(clazz: Class<out DataEntity>, indexName: String, indexValue: String) {
        val results = get(clazz, indexName, indexValue)
        results.forEach { entity ->
            val indexKeys = reverseIndex[entity]
            reverseIndex.remove(entity)
            indexKeys?.forEach {
                val list = entities[it] ?: mutableListOf()
                list.remove(entity)
                if (list.isEmpty()) {
                    entities.remove(it)
                }
            }
        }

    }
}

Since all entities can be serialized (e.g. to JSON), it's easy to create other data stores like Redis. A mixture of different data stores for different data classes (e.g. only users/members in Redis) is also possible and easy to implement.

Using the data objects

We don't want to expose the data objects themselves but keep our current API. The current entities can be constructed as needed:

public class Server {

    private final DataStore dataStore;
    private ServerData data;

    public Server(DataStore dataStore, ServerData data) {
        this.dataStore = dataStore;
        this.data = data;
    }

    public String getName() {
        refreshData(data.getId());
        return data.getName();
    }

    public List<ServerTextChannel> getTextChannels() {
        return dataStore.get(ServerTextChannelData.class, "SERVER_ID", String.valueOf(data.getId()))
            .stream()
            .map(dataEntity -> new ServerTextChannel(dataStore, (ServerTextChannelData) dataEntity))
            .collect(Collectors.toList());
    }

    private void refreshData(long id) {
        List<DataEntity> entities = dataStore.get(
            ServerData.class, "ID", String.valueOf(id)
        );
        if (!entities.isEmpty()) {
            data = (ServerData) entities.get(0);
        }
    }

}
public class ServerTextChannel {

    private final DataStore dataStore;
    private ServerTextChannelData data;
    private Server server;

    public ServerTextChannel(DataStore dataStore, ServerTextChannelData data) {
        this.dataStore = dataStore;
        this.data = data;
        this.server = new Server(dataStore, getServerData());
    }

    public String getName() {
        refreshData(data.getId());
        return data.getName();
    }

    public Server getServer() {
        return this.server;
    }

    private ServerData getServerData() {
        List<DataEntity> entities = dataStore.get(
            ServerData.class, "ID", String.valueOf(data.getServerId())
        );
        if (entities.isEmpty()) {
            return null;
        }
        return (ServerData) entities.get(0);
    }

    private void refreshData(long id) {
        List<DataEntity> entities = dataStore.get(
            ServerTextChannelData.class, "ID", String.valueOf(id)
        );
        if (!entities.isEmpty()) {
            data = (ServerTextChannelData) entities.get(0);
        }
    }

}

The two important differences to the previously suggested immutable cache are, that

  1. Children keep references to their parents. This is necessary because elements might no longer be in the data store when methods like #getServer() are called. The other direction is no problem. For example, the deletion of a server would only result in an empty list when #getChannels() is called.
  2. Before returning data for method calls like #getName(), we refresh the cache. This makes sure that we always return the latest data. For data that cannot change (e.g. the id), a refresh is not necessary.

Updating the cache

Updating the cache is very straightforward:

  • Delete the old entity from the cache
  • Insert the new one

Since entities do not store references to other objects, it should be possible to simply map Discord's gateway entities to the data objects and replace them in our cache with very little effort. E.g. for a channel creation or deletion, we can simply add/remove the channel from the cache without having to update any references in the server object (like we currently have to).

Considerations

  • The code above only exists to visualize the concept. There is a lot of space for improvements, e.g. to simplify the code and get rid of some race-conditions
  • Always refreshing the cache can be expensive when using any other data store than the in-memory one (traffic + deserialization). There are multiple ways to compensate for this. One option would be to not do the refresh automatically when using a non-in-memory cache but provide a manual #refresh() method that can be called. Another option that would not require any actions from the user is by introducing a caching layer in the store implementation that keeps frequently used entities in local memory.
  • Keeping references to entities is still a bad practice. However, the effects are now different. During normal operation, keeping entities for an extended period is now slightly worse because we regularly create new entities and do not re-use old ones. However, entities can now "survive" reconnects and keeping "parent" instances (e.g. servers) is much cheaper since they do not have references to all their "children" (e.g., channels and members).
  • Hybrid caches are possible and easy to implement (e.g., a cache that only stores members and users in an external database and the remaining ones in-memory).
  • Indices are simply created by concatenating entity name, key name, and key with a - in-between. This is probably secure (since we control both the entity name and the key name) but it might be worth giving it a second thought.
  • Using external databases (especially non-in-memory ones like MySQL) would result in many methods to become blocking without returning a CompletableFuture. I believe that this is an acceptable compromise. Especially, if you consider that in practice, you don't want every entity to be in a non-memory cache but probably only users and maybe messages. Users that require an external cache should be able to deal with these trade-offs.
  • This implementation would hardly change anything for the user-facing API. This would mainly be an internal change + the possibility to store entities in external databases
  • It would be possible to gradually migrate to the new cache
  • It would be possible to share entities across shards.
  • The entities (i.e., Server and ServerTextChannel) in the examples do not use interfaces. This would not be the case in the final implementation, but I wanted to keep the examples simple for demonstration purposes
  • Many things are not covered by the example implementation above (e.g. what happens on reconnects), but I don't think there's anything that is a deal-breaker that prevents this concept from working
@Bastian Bastian added ✨ feature request New feature or request low priority An issue or pull request with a low priority 🔨 breaking-change An issue or pull requests that would be a breaking change labels Jul 7, 2021
@Bastian
Copy link
Member Author

Bastian commented Jul 8, 2021

From discord/discord-api-docs#2184 (comment):

[...] It is not feasible to serve the entire member list of every guild to bots, and as guild sizes continue to grow it is unlikely we will support this operation over gateway forever. If you haven't already, I would recommend taking steps to move to persisting and caching data. Having stale data is generally considered OK, and you can update data as you need to instead.

This makes this issue even more important. We should expect, that one day, Discord will remove or at least severely limit the options to request all members of a server. When this happens, we should provide a way for users to have a (semi-)persistent cache for users.

@Bastian Bastian added high priority An issue or pull request with a high priority and removed low priority An issue or pull request with a low priority labels Jul 8, 2021
@felldo
Copy link
Member

felldo commented Jul 9, 2021

However, since this suggestion was not well received because of the immutability, this issue proposes a mutable external cache that also allows sharing data between multiple shards and allows for different data stores like Redis.

I think an immutable cache is still possible. If I am not wrong with this cache design it should be possible to have a mutable or immutable cache without the need to have a more complex code base:

private void refreshData(long id) {
       if(api.isImmutableCache){
           return;
       }

        List<DataEntity> entities = dataStore.get(
            ServerTextChannelData.class, "ID", String.valueOf(id)
        );
        if (!entities.isEmpty()) {
            data = (ServerTextChannelData) entities.get(0);
        }
    }

With a simple flag that can be set we can decide whether we want to actually refresh the data from the cache or not. This results in either a mutable or immutable enity.
Having this immutable behaviour should also be very performant as we do not refresh the data from the cache everytime a method is called and at the same time still allows us to share the cache among other bot instances.

@Bastian
Copy link
Member Author

Bastian commented Jul 9, 2021

This results in either a mutable or immutable enity.

This will not work for collections (e.g., Server#getChannels()) though. These are always fetched from the cache and the cache itself is mutable.

@felldo
Copy link
Member

felldo commented Jul 9, 2021

Then this will result in a sort of semi immutable. But once you get the channels from the cache they are immutable entities which you can store in a variable to process them in your task. I think that's a good trade off though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔨 breaking-change An issue or pull requests that would be a breaking change high priority An issue or pull request with a high priority ✨ feature request New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants