[server] Faster query for reports
The query for reports has the following form:

```.sql
SELECT col1, col2 FROM reports WHERE col IN
(SELECT col FROM reports WHERE conditions);
```

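This is equivalent to the following query, provided that `col` uniquely identifies a row of "reports" (for example, it is the primary key):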

```.sql
SELECT col1, col2 FROM reports WHERE conditions;
```
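The equivalence, and the condition it depends on, can be checked on a toy table. The following is a minimal sketch that uses Python's built-in `sqlite3` module instead of PostgreSQL. The table layout, column names and data are hypothetical. When the IN-subquery uses the key column `id`, both forms return the same rows. When it uses a non-unique column (`checker`), the first form also returns rows that do not satisfy the conditions themselves.

```python
import sqlite3

# Toy "reports" table (hypothetical): "id" is the primary key, while
# "checker" may repeat across rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports (id INTEGER PRIMARY KEY, checker TEXT, severity TEXT);
    INSERT INTO reports VALUES
        (1, 'core.NullDereference', 'HIGH'),
        (2, 'deadcode.DeadStores', 'LOW'),
        (3, 'core.DivideZero', 'HIGH'),
        (4, 'core.NullDereference', 'LOW');
""")


def rows(sql):
    return conn.execute(sql).fetchall()


# First form: IN-subquery on the unique key column.
nested = rows("SELECT id, checker FROM reports WHERE id IN "
              "(SELECT id FROM reports WHERE severity = 'HIGH') ORDER BY id")

# Second form: the same condition applied directly.
flat = rows("SELECT id, checker FROM reports "
            "WHERE severity = 'HIGH' ORDER BY id")

# IN-subquery on a non-unique column: row 4 is returned because it shares
# its checker name with row 1, even though its severity is LOW.
non_unique = rows("SELECT id, checker FROM reports WHERE checker IN "
                  "(SELECT checker FROM reports WHERE severity = 'HIGH') "
                  "ORDER BY id")

print(nested == flat)      # True
print(non_unique == flat)  # False
```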

PostgreSQL translates each SQL query into a query plan (an execution
strategy), which determines the algorithm used to gather the resulting
records from the database. For the first query, the planner generates a
plan in which the inner select is implemented with a nested loop. Such a
nested loop over the "reports" table is so slow that the query times out.
In this commit the query is rewritten to the second form.

However, even with the second form, PostgreSQL may still choose a plan
with a nested loop in some cases. The generated plan depends not only on
the query but also on table sizes, table statistics and many other
factors, so this timeout issue may come back later. One technique for
minimizing nested loops is to avoid table JOINs where possible. For this
reason, a query has been split into two queries so that the "files"
table does not have to be joined; the results of the two queries are
merged in Python code. To avoid unnecessary table joins, only those
tables are joined which occur in some query condition. Earlier there were
some unnecessary joins; for example, none of the columns of "files" are
used anywhere in the following query:

```.sql
SELECT col1, col2 FROM reports LEFT JOIN files ON ... WHERE col1 = 42;
```
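Both ideas can be illustrated with a minimal sketch, again using `sqlite3` in place of PostgreSQL. The reports query joins "files" only when a filter refers to one of its columns, and the file paths are fetched by a separate second query and merged in Python. The schema, the filter format and the `build_report_query` and `fetch_reports` helpers are hypothetical illustrations, not the code changed in this commit.

```python
import sqlite3

# Hypothetical schema: each report refers to a row of "files" via file_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, filepath TEXT);
    CREATE TABLE reports (id INTEGER PRIMARY KEY, file_id INTEGER,
                          checker TEXT, severity TEXT);
    INSERT INTO files VALUES (10, '/src/a.c'), (20, '/src/b.c');
    INSERT INTO reports VALUES
        (1, 10, 'core.NullDereference', 'HIGH'),
        (2, 20, 'deadcode.DeadStores', 'LOW'),
        (3, 20, 'core.DivideZero', 'HIGH');
""")


def build_report_query(filters):
    """Return (sql, params) for the reports query.

    `filters` maps qualified column names such as "reports.severity" or
    "files.filepath" to required values (hypothetical format). The "files"
    table is joined only if some filter refers to one of its columns.
    """
    sql = "SELECT reports.id, reports.file_id, reports.checker FROM reports"
    if any(column.startswith("files.") for column in filters):
        sql += " JOIN files ON files.id = reports.file_id"
    if filters:
        # Column names are interpolated, so they must come from trusted code;
        # the values are passed as query parameters.
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in filters)
    return sql + " ORDER BY reports.id", list(filters.values())


def fetch_reports(filters):
    """Run the reports query, then fetch file paths with a second query."""
    sql, params = build_report_query(filters)
    reports = conn.execute(sql, params).fetchall()

    # Second query: look up only the files referenced by the reports found.
    file_ids = sorted({file_id for _, file_id, _ in reports})
    paths = {}
    if file_ids:
        placeholders = ", ".join(["?"] * len(file_ids))
        paths = dict(conn.execute(
            f"SELECT id, filepath FROM files WHERE id IN ({placeholders})",
            file_ids).fetchall())

    # Merge the two result sets in Python instead of joining in SQL.
    return [(report_id, checker, paths[file_id])
            for report_id, file_id, checker in reports]


print(fetch_reports({"reports.severity": "HIGH"}))
# [(1, 'core.NullDereference', '/src/a.c'), (3, 'core.DivideZero', '/src/b.c')]
```

Filtering only on `reports.severity` produces a query without any JOIN. Filtering on `files.filepath` adds the join, because that condition cannot be evaluated otherwise.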
bruntib committed May 17, 2021
1 parent 46cca7e commit 3c746a4
Showing 1 changed file with 125 additions and 87 deletions.