v1.0: Remove --max-index-size and --max-task-db configuration options #2077

Closed
4 tasks done
maryamsulemani97 opened this issue Jan 3, 2023 · 4 comments · Fixed by #2118

maryamsulemani97 (Contributor) commented Jan 3, 2023

Remove --max-index-size and --max-task-db configuration options

To do

Some may be missing

  • Update /learn/configuration/instance_options.md
  • Update learn/advanced/storage.md
  • Update reference/errors/error_codes.md
  • Update learn/advanced/known_limitations.md

Reference

dureuill (Contributor) commented Jan 10, 2023

Hey docs team 👋

The removal of these options introduces two limitations [1] that are "new" [2]:

  1. The number of indexes that can exist simultaneously in a Meilisearch DB becomes around 200 for Linux/macOS and around 20 for Windows. This is due to OS limits on the amount of virtual memory allocatable by a single process.
  2. The size of an index cannot grow beyond 500GiB.

Should we maybe document these limitations somewhere in the documentation?

EDIT: Oh, I'm seeing that Update learn/advanced/known_limitations.md is in the TODO list, so please disregard my message if this is already in the cards 🙏

Footnotes

  1. See this discussion for more context

  2. The tension between the maximum size of an index and the number of indexes has always existed, with a total maximum size of about 100TiB for the Unixes (Linux and macOS) and 10TiB for Windows. The v1 changes merely set these numbers in stone: since the max size of an index is now hardcoded to 500GiB, the resulting maximum number of indexes is about 200 for the Unixes and 20 for Windows (the arithmetic is sketched just below).
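
A rough back-of-envelope check of those figures (the ~100TiB and ~10TiB address-space sizes are the approximate values quoted in this thread, not exact OS constants):

```python
# Approximate address-space budget per OS family, divided by the
# hardcoded 500GiB per-index map size introduced in v1.
GIB = 2**30
TIB = 2**40

INDEX_MAP_SIZE = 500 * GIB

ADDRESS_SPACE = {
    "Linux/macOS": 100 * TIB,   # ~100TiB usable in practice
    "Windows": 10 * TIB,        # ~10TiB
}

for os_name, space in ADDRESS_SPACE.items():
    print(f"{os_name}: ~{space // INDEX_MAP_SIZE} indexes")
# Linux/macOS: ~204 indexes
# Windows: ~20 indexes
```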

maryamsulemani97 mentioned this issue Jan 10, 2023
guimachiavelli (Member) commented Jan 16, 2023

Hi @dureuill!

I'm working on updating these and realised I need some clarification on a few points.


Maximum number of indexes

Why do we say the maximum number of indexes in an instance is around 200 (for Unix/Unix-like systems)? Is it possible for, e.g., Linux machine A to have a maximum of 201 indexes while Linux machine B only supports 198?

More pragmatically, is the following statement correct?

"A single Meilisearch instance can have up to 200 indexes in Linux and macOS environments."


database_size_limit_reached

What will trigger this error? Reaching the maximum size for a single index? Reaching the maximum size for the task db? Trying to create more indexes than your system can support? All of those?

dureuill (Contributor) commented

Hi @guimachiavelli 👋

Maximum number of indexes

We're being imprecise about the exact number for the following reasons:

  1. the number was established empirically by running tests on an Archlinux system and a macOS system
  2. the number is tied to a very low-level configuration detail of the operating system: the size of the virtual memory address space available to a single process. This value is distinct from swap space, RAM amount, and available disk space, and I cannot point to where the precise value can be found (for example, on macOS, ulimit -v reports the "address space (kbytes)" limit as unlimited, yet I measured it to be around 100TB in practice with dichotomic tests); it may be necessary to read the kernel source code to find the precise value [1]. A rough way to query this limit from userspace is sketched after this list.
  3. The virtual memory address space is shared by the whole process. While indexes are the main consumers of the address space, any memory allocation that occurs during the lifecycle of the application takes from that shared pool. For example, Meilisearch allocates 2/3 of the machine's total RAM at startup [2]. On a machine with 8GB of RAM this takes about 5.33GB from the address space, which is insignificant next to the 500GB a single index takes, but on a machine with 128GB of RAM it takes about 85GB, almost 1/5 of an index, which can make the difference between having 201 or 200 indexes available.
  4. Address space fragmentation can result in the OS being unable to provide a contiguous 500GB region of virtual memory, even if the address space has enough free room in total, just not contiguously. This depends on the internal state of the OS allocator and the "history" of previous allocations, which is typically unique from one execution of Meilisearch to the next.
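
As an aside on point 2, a minimal sketch (Unix-only, using Python's standard resource module) of querying the per-process address-space limit; as noted above, it often reports "unlimited" even though a practical ceiling exists:

```python
import resource

# RLIMIT_AS is the per-process virtual address space limit. On many
# systems the soft/hard limits report "unlimited" even though a
# practical ceiling (e.g. ~128TiB of userspace on 64-bit Linux) applies.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)

def fmt(limit):
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit / 2**40:.1f} TiB"

print(f"RLIMIT_AS soft: {fmt(soft)}, hard: {fmt(hard)}")
```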

Due to these reasons, it is hard to give a firm limit on the number of indexes that can coexist in a Meilisearch instance. If we need a hard limit, it should be safe to pick a smaller number, e.g. 180, which would mean indexes take about 90TB from the address space, and it is unlikely that fragmentation and other allocations would entirely consume the remaining ~10TB; the headroom arithmetic is sketched below. This only works for the Unixes though, because the address space is much smaller on Windows, so fragmentation and other allocations can absolutely not be abstracted away there.
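
The headroom arithmetic behind that 180 figure, using the approximate numbers quoted in this thread (the ~100TB measured address space, and the hypothetical 128GB machine from point 3):

```python
# Rough headroom estimate for 180 indexes on a Unix-like system,
# using the approximate figures from this thread.
TB = 10**12
GB = 10**9

ADDRESS_SPACE = 100 * TB            # ~100TB measured usable address space
INDEX_MAP_SIZE = 500 * GB           # per-index virtual memory map
STARTUP_ALLOC = 128 * GB * 2 // 3   # 2/3 of RAM on a hypothetical 128GB machine

indexes = 180
used = indexes * INDEX_MAP_SIZE + STARTUP_ALLOC
print(f"used: {used / TB:.1f}TB, headroom: {(ADDRESS_SPACE - used) / TB:.1f}TB")
# used: 90.1TB, headroom: 9.9TB
```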

To summarize:

"A single Meilisearch instance can have up to 200 indexes in Linux and macOS environments."

is not a correct statement. A more conservative statement could be:

"A single Meilisearch instance can safely have up to 180 indexes in Linux and macOS environments. A greater number of indexes might also work without issue, or cause allocation failures depending on the runtime environment of the instance."

database_size_limit_reached

What will trigger this error?

Reaching the maximum size for a single index? ✅
Reaching the maximum size for the task db? ✅
Trying to create more indexes than your system can support? ❌

database_size_limit_reached is thrown when an underlying "database" reports that it has filled the virtual memory we allocated for it. A "database" here can refer to a single index, or to the task db.

Trying to create more indexes than your system can support will unfortunately not result in a clear user error: typically, the virtual memory allocation fails when documents are first sent to a freshly created index (the memory is not reserved before that point), reporting an OS-specific "allocation failure". On Windows, where the address space is much smaller, I have also observed unrelated allocations failing (such as further allocations needed to index documents).

I understand that the situation is subtle and not very user-friendly. The root cause is that we allocate upfront the whole address space an index might ever need, which forces us to choose an amount of virtual memory large enough to allow big indexes, but not so large that having multiple indexes becomes impossible.
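
To make the upfront reservation concrete, here is a minimal illustrative sketch (not Meilisearch or LMDB code) showing that a large file-backed memory map mostly consumes address space rather than RAM or disk:

```python
import mmap

# Illustration only (Unix-oriented): a file-backed memory map reserves
# virtual *address space* upfront, not RAM or disk.
MAP_SIZE = 500 * 2**30   # 500GiB, the hardcoded per-index map size in v1

with open("index.data", "w+b") as f:
    f.truncate(MAP_SIZE)                 # sparse file: no disk space used yet
    m = mmap.mmap(f.fileno(), MAP_SIZE)  # cheap, but 500GiB of the process
                                         # address space is now spoken for
    # Pages only consume RAM/disk when actually written. Repeating this
    # mapping ~200 times exhausts the ~100TiB address space on Linux/macOS.
    m.close()
```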

We're currently working on mitigations that would prevent such low-level system details from being exposed to the end-user (such as dynamically resizing the indexes so that they can start with a smaller virtual memory allocation, and closing unused indexes so that we don't have to keep all of them in the virtual memory space), but we didn't want to rush this for v1.

I hope my answer sheds some light on the current status; feel free to ask if you have further questions :-)

Footnotes

  1. This StackOverflow answer points to 128TiB of userspace virtual memory available to Linux programs. It is interesting that I measured less than that; I wonder if some of that virtual memory is used for other purposes.

  2. unless one provides the --max-indexing-memory option with a different value.

guimachiavelli (Member) commented

Thanks for the detailed answer, @dureuill! I think I have enough to move forward, but will soon request your review on the PR to make sure everything's accurate.

bors bot added a commit that referenced this issue Feb 6, 2023
2098: v1.0 r=guimachiavelli a=maryamsulemani97

Staging branch for v1.0.
Closes #2092, #2087, #2086, #2085, #2082, #2079, #2078, #2077, #2075, #2073, #2072, #2069, #2068, #2067, #2066, #2065

Co-authored-by: maryamsulemani97 <maryam@meilisearch.com>
Co-authored-by: gui machiavelli <hey@guimachiavelli.com>
Co-authored-by: Maryam <90181761+maryamsulemani97@users.noreply.github.com>
Co-authored-by: gui machiavelli <gui@meilisearch.com>
bors bot closed this as completed in c5dc37c Feb 6, 2023