[YSQL] Improve index scan to batch select rows from base table #3103

Closed
ndeodhar opened this issue Dec 6, 2019 · 1 comment

Comments

@ndeodhar

ndeodhar commented Dec 6, 2019

Today, for index scans, we first scan the index table to get the rows of interest. Then we send one read query per row to the tserver. We should batch these reads to the tserver instead.

@ndeodhar ndeodhar added the area/ysql Yugabyte SQL (YSQL) label Dec 6, 2019
@ndeodhar ndeodhar added this to the v2.1 milestone Dec 6, 2019
nocaway added a commit that referenced this issue Feb 25, 2020
Summary:
**Background**:
An index scan behaves like the query
        SELECT <data> FROM <table> WHERE ybctid IN (SELECT base_ybctid FROM <index>)
Before this change, after the ybctids were fetched from the IndexTable, they were sent to the tablet server one at a time, so each request selected a single row from storage.
To improve performance, ybctid values are now sent in batches.

**IndexScan's new process**
- Fetch ybctids from the IndexTable in batches. The size of each batch is determined by the yql_prefetch_limit gflag.
- Group the fetched ybctids by the hash-code buckets that are associated with the table's tablets.
- Send the ybctid batches to the tablet servers to query rows in batches.
- Repeat the steps above until all ybctids have been read from the index.
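
The loop above can be sketched in a few lines of Python. This is only an illustration, not the PgGate code: `tablet_for`, `index_scan`, the CRC-based hash, and the 16-bit hash space are hypothetical stand-ins, and `prefetch_limit` plays the role of the gflag named above.

```python
import zlib
from collections import defaultdict
from itertools import islice

HASH_SPACE = 1 << 16  # assumed hash-code space, for illustration only


def tablet_for(ybctid: bytes, num_tablets: int) -> int:
    """Toy stand-in for mapping a ybctid's hash code to a tablet."""
    hash_code = zlib.crc32(ybctid) % HASH_SPACE
    return hash_code * num_tablets // HASH_SPACE


def index_scan(index_ybctids, base_table, prefetch_limit, num_tablets):
    """Return (rows in index order, number of read requests issued)."""
    rows, requests = [], 0
    it = iter(index_ybctids)
    while True:
        # Step 1: fetch the next batch of ybctids from the index.
        batch = list(islice(it, prefetch_limit))
        if not batch:
            break
        # Step 2: group the batch by the tablet that owns each ybctid.
        by_tablet = defaultdict(list)
        for ybctid in batch:
            by_tablet[tablet_for(ybctid, num_tablets)].append(ybctid)
        # Step 3: one read request per tablet instead of one per row.
        fetched = {}
        for tablet, ybctids in by_tablet.items():
            requests += 1
            for ybctid in ybctids:
                fetched[ybctid] = base_table[ybctid]
        rows.extend(fetched[ybctid] for ybctid in batch)
    # Step 4: the loop ends once the index has no more ybctids.
    return rows, requests
```

With 10 rows, a batch size of 4, and 3 tablets, this toy version issues at most 9 read requests, compared with 10 when every row gets its own request; the gap grows with the batch size.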

**Implementation**
(1) New data structures
Two SELECT classes are introduced, PgSelect and PgSelectIndex. PgSelect queries data from the table. PgSelectIndex queries ybctids from the IndexTable.
- Sequential and PrimaryKey scans use PgSelect to query data from the table.
- IndexOnlyScan uses PgSelectIndex to query data from the IndexTable.
- IndexScan uses both PgSelect and PgSelectIndex.
When index-scanning system catalogs, PgSelect and PgSelectIndex combine their read requests into one. This behavior already existed before this change.

(2) Processing ybctids in parallel when querying data.
- For each partition, one read request is created.
- Ybctid values within a partition are added to their associated read request.
- The requests are sent to select data.
- PgGate also keeps track of the order of the ybctid values and returns rows to the user in the same order as the index.
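
One way to picture the ordering step is to tag each ybctid with its position in the index result before it is placed in a per-partition request, then use that position to reassemble the rows. The sketch below assumes this approach; `build_requests`, `read_partition`, and `merge_in_index_order` are hypothetical names, and the actual bookkeeping in PgGate may differ.

```python
from collections import defaultdict


def build_requests(ybctids, partition_of):
    """Create one request per partition; each entry keeps its index position."""
    requests = defaultdict(list)
    for position, ybctid in enumerate(ybctids):
        requests[partition_of(ybctid)].append((position, ybctid))
    return requests


def read_partition(request, base_table):
    """Toy tablet read: return (position, row) pairs for one request."""
    return [(position, base_table[ybctid]) for position, ybctid in request]


def merge_in_index_order(responses, total):
    """Place rows back at their index positions, whatever the arrival order."""
    ordered = [None] * total
    for response in responses:
        for position, row in response:
            ordered[position] = row
    return ordered
```

Because every row carries its original position, the responses can arrive from the partitions in any order without changing the order seen by the user.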

Test Plan: Testing is in progress

Reviewers: mihnea

Reviewed By: mihnea

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D7952
@nocaway

nocaway commented Feb 25, 2020

Fixed by eeab192

@nocaway nocaway closed this as completed Feb 25, 2020
nocaway added a commit that referenced this issue Mar 13, 2020
Summary:
This regression is due to the following commit:
eeab192
```
commit eeab192
Author: neil <nocaway@users.noreply.github.com>
Date:   2020-02-24

    [YSQL] #3103 Improve performance when running index scan to query data
```

When no ybctid is found in the IndexTable, PgGate should stop processing and return an empty result set. However, it continues to process the SELECT on the main table and issues a read request for a full scan.

This bug affects only the scenarios where the secondary-index scan finds no ybctids. When ybctids are found, PgGate issues a read request for only those ybctids.
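
The failure mode can be illustrated with a toy model. It assumes that a read request carrying no ybctids is treated as an unfiltered scan; `tserver_read`, `fetch_rows_buggy`, and `fetch_rows_fixed` are hypothetical names, not PgGate functions.

```python
def tserver_read(base_table, ybctids=()):
    """Toy tablet-server read: with no ybctids attached, scan everything."""
    if not ybctids:
        return list(base_table.values())
    return [base_table[ybctid] for ybctid in ybctids]


def fetch_rows_buggy(base_table, index_ybctids):
    # Regression: an empty index result still produces a read request,
    # which the toy tablet server answers with a full scan.
    return tserver_read(base_table, index_ybctids)


def fetch_rows_fixed(base_table, index_ybctids):
    # Fix: stop early and return an empty result set when the index
    # scan found no ybctids.
    if not index_ybctids:
        return []
    return tserver_read(base_table, index_ybctids)
```

In this model both versions behave identically when the index returns at least one ybctid, which matches the description above that only the empty case is affected.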

Test Plan: Extend test yb_perf_secondary_index_scan.sql for the reported issue.

Reviewers: kannan, mihnea

Reviewed By: kannan, mihnea

Subscribers: kannan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D8118