The query of reports has the following format:

```.sql
SELECT col1, col2 FROM reports
WHERE col IN (SELECT col FROM reports WHERE conditions);
```

This is almost the same as

```.sql
SELECT col1, col2 FROM reports WHERE conditions;
```

The difference shows when the reports of two runs on different dates are queried. The first form returns every report, from any run, whose bug hash has been selected by the filter. The second form returns only those reports which match the conditions themselves.

SQL queries are translated to query strategies (query plans) by PostgreSQL. The strategy determines the algorithm for gathering the resulting records from the database. The first query generates a strategy where the inner select is implemented with a nested loop. Such a nested loop on the "reports" table is so slow that it times out.

In this commit the query is rewritten to the second form. But even if the second form is used, PostgreSQL may still generate a strategy with a nested loop in some cases. The generated strategy depends not only on the query but also on table size and many other things. So it is possible that this timeout issue will come back later.

One technique for minimizing nested loops is to avoid table joins where possible. For this reason a query has been split into two queries in order to avoid joining the "files" table. The results of the two queries are merged in Python code.

In order to avoid unnecessary table joins, we join only those tables which occur in some query condition. Earlier there were some unnecessary joins. For example, the columns of "files" are not used anywhere in this query:

```.sql
SELECT col1, col2 FROM reports
LEFT JOIN files ON ...
WHERE col1 = 42;
```
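The difference between the two query forms, and the two-query merge used in place of a join, can be illustrated with a minimal sketch. It is not the code from this commit: it uses an in-memory SQLite database from the Python standard library instead of PostgreSQL, so it shows only which rows come back, not the query strategy. The column names (`bug_hash`, `run_date`, `file_id`, `filepath`) and the sample rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical miniature schema; the real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, filepath TEXT);
    CREATE TABLE reports (
        id INTEGER PRIMARY KEY,
        bug_hash TEXT,
        run_date TEXT,
        file_id INTEGER REFERENCES files(id)
    );
    INSERT INTO files VALUES (1, 'a.c'), (2, 'b.c');
    INSERT INTO reports VALUES
        (1, 'h1', '2020-01-01', 1),
        (2, 'h1', '2020-02-01', 1),
        (3, 'h2', '2020-02-01', 2);
""")

# First form: the filter selects bug hashes, then every report with such a
# hash is returned, including reports from other runs.
subquery_ids = [row[0] for row in conn.execute(
    "SELECT id FROM reports WHERE bug_hash IN "
    "(SELECT bug_hash FROM reports WHERE run_date = ?) ORDER BY id",
    ("2020-01-01",),
)]

# Second form: only the reports matching the condition itself are returned.
direct_ids = [row[0] for row in conn.execute(
    "SELECT id FROM reports WHERE run_date = ? ORDER BY id",
    ("2020-01-01",),
)]

# Two queries instead of a join: fetch the reports first ...
reports = conn.execute(
    "SELECT id, bug_hash, file_id FROM reports WHERE run_date = ? ORDER BY id",
    ("2020-02-01",),
).fetchall()

# ... then fetch only the referenced files and merge in Python.
file_ids = sorted({file_id for _, _, file_id in reports})
paths = {}
if file_ids:
    placeholders = ",".join("?" * len(file_ids))
    paths = dict(conn.execute(
        f"SELECT id, filepath FROM files WHERE id IN ({placeholders})",
        file_ids,
    ))

merged = [(rid, bug_hash, paths.get(fid)) for rid, bug_hash, fid in reports]
```

With this sample data, `subquery_ids` contains reports 1 and 2 (both share hash `h1`), while `direct_ids` contains only report 1. The merge step trades one join for an extra round trip and a dictionary lookup per row.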