- Because Kafka does. The reasoning is that modern operating systems do a great job of caching recently accessed files in memory, so most blocking I/O actually goes to and from memory and is therefore fast.
- These OS caching systems are also more reliable and battle-hardened than any cache you could write yourself.
  See more here
- One directory per topic.
- Messages are stored as they arrive, concatenated in files.
- Once a file has grown to a certain size, a new file is started.
- The files are given arbitrary, unique names.
- The parent directory contains an index file that records, for each topic, the sequence of filenames and, for each file, its lowest and highest message number, the oldest and newest message age, and the seek offset of each message number.
- Message storage files are simply the concatenated byte sequences of the messages. A message file, in and of itself, has no way of knowing where one message stops and the next starts.
- Having the index means that (slow) seek operations inside files are almost entirely avoided.
- The seek-like work of delimiting and fetching messages for the Poll operation happens on in-memory slices, after the necessary message store files have been read into memory in their entirety.
- Makes it possible to determine which message files are relevant to each operation without looking inside any of them.
- Limits the size of message files, so the cost of reading one into memory is constrained.
- Reduces the message-writing cost of the produce operation to a single append to one file.
- Makes it possible to evict old messages without mutating any file: eviction need only delete whole files.
- The random-looking names of message storage files avoid any risk of people assuming the names have semantic significance and then mistakenly relying on this.
- It does not scale horizontally.
- The index file must be read and rewritten for each of the three operations (produce, consume, evict). It should, however, remain relatively small compared with the message storage files, and the serialize/deserialize steps, which use Gob encoding, are relatively fast.
- Access to the index file must be protected by a mutex, which serializes access to the entire store. (Possible enhancement: topics could be made completely independent, each with its own index.)