LifoReclaim causing MDBX_TXN_FULL #123

AskAlexSharov · 2020-10-16T11:44:43Z

I have a DB with a large GC of 91072045 pages; the GC DBI has 105 entries (I did this intentionally for testing).
I tried to update 2000 keys (small values).
Got "MDBX_TXN_FULL Transaction has too many dirty pages, i.e transaction is too big" with the following txn_info:
{Id:9604 ReadLag:0 SpaceUsed:687662821376 SpaceLimitSoft:687721144320 SpaceLimitHard:2199023255552 SpaceRetired:4337664 SpaceLeftover:17175486464 SpaceDirty:4374528}
After a retry and a restart of the app, I got the same error and the same info as above.
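
To put the numbers in perspective, here is a small Python sketch that only does arithmetic on the txn_info values above. It assumes every Space* field is a byte count, as the names suggest; the helper name `summarize` is made up for illustration.

```python
# Values copied from the txn_info dump above (assumed to be bytes).
txn_info = {
    "SpaceUsed": 687662821376,
    "SpaceLimitSoft": 687721144320,
    "SpaceLimitHard": 2199023255552,
    "SpaceRetired": 4337664,
    "SpaceLeftover": 17175486464,
    "SpaceDirty": 4374528,
}

MIB = 1024 ** 2
GIB = 1024 ** 3


def summarize(info):
    """Derive a few figures that put the dirty volume in context."""
    return {
        "soft_headroom_mib": (info["SpaceLimitSoft"] - info["SpaceUsed"]) / MIB,
        "hard_headroom_gib": (info["SpaceLimitHard"] - info["SpaceUsed"]) / GIB,
        "dirty_mib": info["SpaceDirty"] / MIB,
        "retired_mib": info["SpaceRetired"] / MIB,
        "leftover_gib": info["SpaceLeftover"] / GIB,
    }


summary = summarize(txn_info)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On these figures the dirty data is roughly 4 MiB while SpaceLeftover is close to 16 GiB, so the transaction itself looks small. That is consistent with the error depending on LifoReclaim rather than on the amount of data written.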

Disabling mdbx.LifoReclaim solved the problem: the commit passed. A large commit of 16 million keys also passes.
I tried enabling and disabling Coalesce; it doesn't change anything.

Can try to gather more info next week.

erthink · 2020-10-16T11:54:27Z

I think I understand the reason and will fix this flaw right after bug #121.

erthink · 2020-10-22T15:30:25Z

This change is functionally equivalent to your patch for speeding up the allocation of large batches of multi-page runs.
However, instead of the threshold set here, you should use a (smaller) value that suits your case.

In the next few days I will add support for setting options, and then this can be configured through the API.

AskAlexSharov · 2020-10-30T14:23:21Z

I've started. The test will take 2-3 days.

erthink · 2020-10-30T15:00:13Z

I assume that your test will be successful.
But you need to define a reasonable threshold value (the size of the page list at which GC usage is suspended) for your scenarios.
This threshold can be set at runtime once #128 is implemented.

If the threshold is high, the search for large linear sections (for overflow pages) will be proportionally slowed down.
If the threshold is too low (relative to the total number of pages in the database), the database will tend to constantly grow and accumulate pages in GC.
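
The trade-off can be illustrated with a toy Python model. This is not libmdbx code: `reclaim_lifo`, `find_run`, and the `threshold` parameter are hypothetical names, and the GC is reduced to a dict mapping a record id to a list of free page numbers. The sketch only shows how a cap on the reclaimed page list decides between finding a contiguous run in GC and giving up, in which case a real engine would have to extend the file.

```python
def find_run(sorted_pages, n):
    """Return the first page of the first run of n consecutive pages, or None."""
    run_start, run_len, prev = None, 0, None
    for page in sorted_pages:
        if prev is not None and page == prev + 1:
            run_len += 1
        else:
            run_start, run_len = page, 1
        if run_len >= n:
            return run_start
        prev = page
    return None


def reclaim_lifo(gc_records, pages_needed, threshold):
    """Pull GC records newest-first until a contiguous run is found.

    gc_records: dict of record id -> list of free page numbers (toy GC).
    threshold:  hypothetical cap on the size of the reclaimed page list.
    Returns (start_page_or_None, reclaimed_pages, used_record_ids).
    """
    reclaimed, used = [], []
    for rec_id in sorted(gc_records, reverse=True):  # LIFO: newest id first
        if len(reclaimed) + len(gc_records[rec_id]) > threshold:
            break  # list would exceed the cap: stop using GC
        reclaimed.extend(gc_records[rec_id])
        used.append(rec_id)
        # Re-sorting and re-scanning the whole list is what gets slower
        # as the threshold (and therefore the list) grows.
        start = find_run(sorted(reclaimed), pages_needed)
        if start is not None:
            return start, reclaimed, used
    return None, reclaimed, used


gc = {10: [5, 6, 9], 11: [20, 21], 12: [7, 8, 30]}
high = reclaim_lifo(gc, pages_needed=4, threshold=8)
low = reclaim_lifo(gc, pages_needed=4, threshold=5)
```

With `threshold=8` the run 5..8 is found after all three records are merged. With `threshold=5` the search stops before record 10, no run is found, and the pages stay in GC while the database would have to grow.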

AskAlexSharov · 2020-10-30T16:15:12Z

Yes, it must be bigger than most of the values in the DB. In our case I expect a threshold of 1-10 MB.

AskAlexSharov · 2021-01-22T08:30:12Z

I tested dropping a 40 GB DBI on the current devel branch (e0d4eaf).
Got: mdbx_txn_commit_ex: MDBX_KEYEXIST: Key/data pair already exists

AskAlexSharov · 2021-01-22T08:33:17Z

ah, no, probably it's because I still didn't switch to mdbx_drop(delete=false).

erthink · 2021-01-22T10:12:47Z

> ah, no, probably it's because I still didn't switch to mdbx_drop(delete=false).

No, that can't be the reason.

MDBX_KEYEXIST inside mdbx_txn_commit_ex() can only occur when updating the GC/Freelist, i.e., due to an attempt to insert an entry with a key that already exists there.

I can't reproduce the problem, it is very likely that the manifestation depends on the internal state of the GC / Freelist (i.e., on the history of operations which can't be restored by mdbx_load).

So I want to ask you again to help me:

apply the patch 0001-more-logging-mdbx_update_gc-for-debug.patch;
build the non-debug (without additional logging) libmdbx;
reproduce the problem and give me the logs.

AskAlexSharov · 2021-01-22T13:25:34Z

OK, will do tomorrow. But moving to dbi_drop(del=false) helped.
To give you more context: the app drops and re-creates a DBI with the same name, then starts using it. Applying your advice from #146 (comment) helped.

erthink · 2021-01-25T04:13:02Z

I managed to reproduce the problem and have a rough idea of what is going on.
I think I will fix it today.
No further information or help is needed.

AskAlexSharov · 2021-01-25T05:48:28Z

Great, because for some reason I can't reproduce it.

erthink · 2021-01-25T20:13:35Z

Details are below; in short, this was a regression after #123 (comment).

In LIFO mode, when a transaction was committed, GC records could be overwritten (leaking database pages), or the MDBX_KEYEXIST error could occur, in the following scenario:

The DB history contained two transactions with a huge number of retired pages, which left two corresponding records in GC.
During a later transaction, the second of the huge GC records is reclaimed and produces a huge reclaimed list.
On commit, an attempt is made to split the huge reclaimed list into chunks of one page each. This requires many ids for the records, which under LIFO must be as close as possible to the head of GC, i.e. obtained by reclaiming the most recent GC records from head to tail.
While the most recent records are being reclaimed, the first huge record is reached and reclaiming is interrupted, because otherwise the reclaimed list would overflow.
However, the interruption of reclaiming inside mdbx_update_gc() was treated as GC having no more records, so the corresponding ids were simply added to the list of available ids.
If the list of ids available for placing into GC contained reclaimed ones, the records with ids across the whole list were deleted. The second large record (and possibly the preceding ones) was then deleted, and the page numbers it contained dropped out of circulation.
If the list of available ids contained no reclaimed ones, no cleanup was performed. The subsequent attempt to place chunks of the reclaimed list into GC then failed with MDBX_KEYEXIST, which was returned from mdbx_txn_commit_ex().
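
The scenario above can be mimicked with a toy Python model. It is a deliberately simplified sketch and not the libmdbx implementation: the GC is a plain dict, and `gather_ids`, `store_chunks`, `demo`, `room_left`, `stop_means_empty`, and `purge_first` are all hypothetical names. The only thing it reproduces is the flawed assumption that an interrupted reclaim means GC is empty, and the two outcomes that follow from it.

```python
class KeyExist(Exception):
    """Stands in for MDBX_KEYEXIST in this toy model."""


def gather_ids(gc, start_id, ids_needed, room_left, stop_means_empty):
    """Walk ids downward from start_id, reclaiming records that still fit."""
    ids, reclaimed = [], []
    exhausted = False  # True once the code believes GC has no more records
    for rec_id in range(start_id, 0, -1):
        if len(ids) == ids_needed:
            break
        if rec_id in gc and not exhausted:
            if len(reclaimed) + len(gc[rec_id]) > room_left:
                # Reclaiming must stop to avoid overflowing the list.
                if not stop_means_empty:
                    break             # corrected path: hand out no more ids
                exhausted = True      # flawed path: "GC is empty from here"
            else:
                reclaimed.extend(gc.pop(rec_id))
        ids.append(rec_id)
    return ids, reclaimed


def store_chunks(gc, ids, chunks, purge_first):
    """Insert one chunk per id with no-overwrite semantics."""
    if purge_first:
        for rec_id in ids:
            gc.pop(rec_id, None)      # may silently drop a live record
    for rec_id, chunk in zip(ids, chunks):
        if rec_id in gc:
            raise KeyExist(rec_id)
        gc[rec_id] = chunk


def demo(stop_means_empty, purge_first):
    huge = list(range(1000, 1050))    # the first huge record, still in GC
    gc = {8: [100, 101], 5: list(huge)}
    ids, reclaimed = gather_ids(gc, start_id=8, ids_needed=6, room_left=10,
                                stop_means_empty=stop_means_empty)
    chunks = [[200 + i] for i in range(len(ids))]
    store_chunks(gc, ids, chunks, purge_first)
    tracked = set(reclaimed) | {p for pages in gc.values() for p in pages}
    leaked = [p for p in huge if p not in tracked]
    return ids, leaked
```

In this model, `demo(True, True)` loses all 50 pages of record 5 (the leak case), `demo(True, False)` raises `KeyExist` when the chunk for id 5 is inserted, and `demo(False, False)` hands out only ids 8, 7, 6 and leaves record 5 intact.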

AskAlexSharov mentioned this issue Oct 16, 2020

mdbx support erigontech/erigon#1235

Merged

erthink mentioned this issue Nov 1, 2020

mdbx_txn_commit_ex: MDBX_KEYEXIST: Key/data pair already exists #131

Closed
