You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Lazy evaluation in Postgres is an optimization to save memory and possibly computation in cases
when an SQL function is supposed to return the results of the last statement (and it is a
SELECT) and the caller of the SQL function doesn't require all results from the function at
once.
Consider the following example with 2 functions (an SQL function and a PLPGSQL function):
CREATE OR REPLACE FUNCTION get_rows() RETURNS TABLE(z VARCHAR(100))
...
...
SELECT * from t;
CREATE OR REPLACE PROCEDURE outer_func() LANGUAGE PLPGSQL
...
...
FOR target IN SELECT get_rows() LOOP
....
In outer_func, there is a FOR LOOP that iterates over the results of the get_rows SQL function.
One way to perform this execution would be to fetch all records of the last SELECT statement in
get_rows() and store them in memory. Then, the FOR LOOP can perform each iteration on a new row
picked from memory. However, this requires using a possibly large amount of memory. Also, in
case the FOR LOOP was to exit early based on some condition, the rest of the rows stored in
memory would be rendered useless along with the computation that was done to fetch them in the
get_rows function. For this reason, Postgres allows SQL functions which have a SELECT statement
at the end to be lazily evaluable. This means that the function allows a caller to request rows
one at a time -- hence removing the need for the function to fetch all rows of that SELECT and
store in memory.
Note that in READ COMMITTED isolation, each statement in a volatile function runs on a new
snapshot. Lazy evaluation requires executing a function's last SELECT statement partially
multiple times. But all rows from a lazily evaluated SELECT statement should always correspond
to a consistent snapshot. But, other execution in the backend between two calls to a lazily
evaluable function might require running on a newer snapshot. Postgres handles this by storing
snapshot information in the lazy function's execution state and switching back to that snapshot
when context moves back to the function to fetch more rows. The below code is where Postgres
switches back to a saved snapshot when re-entering a lazily evaluated function -
/* Re-establish active snapshot when re-entering function ...
PushActiveSnapshot(es->qd->snapshot);
In YB, a consistent read point is the analogue of a snapshot. The information about a
consistent read point is maintained in pg_client_session.cc on the tserver process of the same
node. Only one consistent read point can be stored for a transaction at a single point in time.
YSQL is allowed to change the consistent read point in limited ways: change it to the current
time as seen by the tserver process, set it to the restart read time it in case of a read
restart or update it to a specific value in case of backfill. But YSQL doesn't have the
capability to read the consistent read point that is chosen by the tserver process if set to
the current time. So, it is not possible to switch back to the consistent read point of a
lazily evaluated query.
This limitation results in later invocations of a lazily evaluated query to possibly use later
consistent read points that were set as part of other execution between invocations to the
lazily evaluated query. This breaks correctness since the data now doesn't correspond to a
single snapshot.
This limitation will be addressed in #12959. Till we add the necessary framework to allow
saving and reusing snapshots, YSQL disables lazy evaluation in READ COMMITTED isolation to
avoid correctness issues that can stem from not using a single snapshot for a lazily evaluated
query.
The text was updated successfully, but these errors were encountered:
pkj415
changed the title
[YSQL] Block lazy evaluation in functions in READ COMMITTED isolation level
[YSQL] Disable lazy evaluation in functions in READ COMMITTED isolation level
Aug 8, 2022
Jira Link: DB-3051
Description
Lazy evaluation in Postgres is an optimization to save memory and possibly computation in cases
when an SQL function is supposed to return the results of the last statement (and it is a
SELECT) and the caller of the SQL function doesn't require all results from the function at
once.
Consider the following example with 2 functions (an SQL function and a PLPGSQL function):
In outer_func, there is a FOR LOOP that iterates over the results of the get_rows SQL function.
One way to perform this execution would be to fetch all records of the last SELECT statement in
get_rows() and store them in memory. Then, the FOR LOOP can perform each iteration on a new row
picked from memory. However, this requires using a possibly large amount of memory. Also, in
case the FOR LOOP was to exit early based on some condition, the rest of the rows stored in
memory would be rendered useless along with the computation that was done to fetch them in the
get_rows function. For this reason, Postgres allows SQL functions which have a SELECT statement
at the end to be lazily evaluable. This means that the function allows a caller to request rows
one at a time -- hence removing the need for the function to fetch all rows of that SELECT and
store in memory.
Note that in READ COMMITTED isolation, each statement in a volatile function runs on a new
snapshot. Lazy evaluation requires executing a function's last SELECT statement partially
multiple times. But all rows from a lazily evaluated SELECT statement should always correspond
to a consistent snapshot. But, other execution in the backend between two calls to a lazily
evaluable function might require running on a newer snapshot. Postgres handles this by storing
snapshot information in the lazy function's execution state and switching back to that snapshot
when context moves back to the function to fetch more rows. The below code is where Postgres
switches back to a saved snapshot when re-entering a lazily evaluated function -
In YB, a consistent read point is the analogue of a snapshot. The information about a
consistent read point is maintained in pg_client_session.cc on the tserver process of the same
node. Only one consistent read point can be stored for a transaction at a single point in time.
YSQL is allowed to change the consistent read point in limited ways: change it to the current
time as seen by the tserver process, set it to the restart read time it in case of a read
restart or update it to a specific value in case of backfill. But YSQL doesn't have the
capability to read the consistent read point that is chosen by the tserver process if set to
the current time. So, it is not possible to switch back to the consistent read point of a
lazily evaluated query.
This limitation results in later invocations of a lazily evaluated query to possibly use later
consistent read points that were set as part of other execution between invocations to the
lazily evaluated query. This breaks correctness since the data now doesn't correspond to a
single snapshot.
This limitation will be addressed in #12959. Till we add the necessary framework to allow
saving and reusing snapshots, YSQL disables lazy evaluation in READ COMMITTED isolation to
avoid correctness issues that can stem from not using a single snapshot for a lazily evaluated
query.
The text was updated successfully, but these errors were encountered: