New sql schema #53

Merged

darix merged 8 commits into opensuse from new_sql_schema

Jul 20, 2020

Member

darix commented Jul 15, 2020

New proposed database schema

for storing file information and mirror mapping

all the mirr_* functions are not ported yet.

For the details see https://github.com/openSUSE/mirrorbrain/wiki/Roadmap

Member

lrupp commented Jul 16, 2020

+1 from my side.
We started with the new Roadmap in 2019. Time to move forward...

lrupp assigned andrii-suse

lrupp added the Prio1 label

lrupp requested a review from andrii-suse

July 16, 2020 08:09

lrupp unassigned andrii-suse

darix force-pushed the new_sql_schema branch from a9b6418 to 6718e9b

July 16, 2020 08:45

Member Author

darix commented Jul 16, 2020

the comment patch is now already in the opensuse branch so it can be ignored here.

darix added 2 commits

July 16, 2020 11:51


          New proposed database schema

e7790b6

for storing file information and mirror mapping

all the mirr_* functions are not ported yet.

For the details see https://github.com/openSUSE/mirrorbrain/wiki/Roadmap


          Port the hexhash view and helper functions

65cdc1f

darix force-pushed the new_sql_schema branch from cef1eb8 to 65cdc1f

July 16, 2020 09:51


          Return numbers of affected rows where possible

59e3f89

and have consistent naming of all functions

andrii-suse requested changes

Collaborator

andrii-suse left a comment

Some comments are less important here, but we should build a common understanding on some of the notes.

sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated

+              CREATE TABLE filemetadata
+              (
+                  id integer GENERATED ALWAYS AS IDENTITY,

Collaborator

andrii-suse Jul 16, 2020

I suggest going with bigint: currently the max id in the db is above 900M, which is close to 50% of int capacity, so the limit may be reached in a few years.

Member Author

darix Jul 16, 2020

will do
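The headroom estimate in the comment above can be checked with a little arithmetic. This is only a sketch: the 900M figure is the approximate value quoted by the reviewer, and the variable and function names are made up for illustration.

```python
# Signed upper bounds of the two PostgreSQL types under discussion.
INT_MAX = 2**31 - 1       # integer, 4 bytes
BIGINT_MAX = 2**63 - 1    # bigint, 8 bytes

# Approximate current maximum id, as quoted in the review comment.
current_max_id = 900_000_000


def used_fraction(current: int, limit: int) -> float:
    """Return the share of the id space that is already consumed."""
    return current / limit


int_used = used_fraction(current_max_id, INT_MAX)
bigint_used = used_fraction(current_max_id, BIGINT_MAX)

print(f"integer: {int_used:.1%} of the range used")
print(f"bigint:  {bigint_used:.2e} of the range used")
```

With integer a bit over 40% of the range is already gone, while with bigint the same id count is a negligible fraction.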

sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated

+                DELETE FROM filemetadata
+                  WHERE id IN (
+                    SELECT filemetadata_id
+                      FROM filemetadata_mirror_count

Collaborator

andrii-suse Jul 16, 2020

The following query should do the same job and is much lighter, because it doesn't calculate exact counts (m.id will be NULL only for those rows that don't have any mirror):

DELETE FROM filemetadata 
USING filemetadata AS f
LEFT OUTER JOIN mirrors AS m ON filemetadata_id = f.id
WHERE filemetadata.id = f.id AND m.id is NULL AND f.mtime < (now()-'3 months'::interval)

I know that the counts are pre-calculated anyway, but maybe we can simplify the workflow and not maintain the MATERIALIZED VIEW at all?

Member Author

darix Jul 16, 2020

explain (analyze, buffers) delete from filemetadata using filemetadata AS f LEFT OUTER JOIN mirrors AS m ON filemetadata_id = f.id WHERE filemetadata_id = f.id AND  m.server_id is NULL AND f.mtime < (now()-'3 months'::interval);
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on filemetadata  (cost=1.00..1116601.47 rows=2862434 width=18) (actual time=0.638..0.638 rows=0 loops=1)
   Buffers: shared read=4
   ->  Nested Loop  (cost=1.00..1116601.47 rows=2862434 width=18) (actual time=0.638..0.638 rows=0 loops=1)
         Buffers: shared read=4
         ->  Nested Loop  (cost=1.00..16.55 rows=1 width=12) (actual time=0.637..0.637 rows=0 loops=1)
               Buffers: shared read=4
               ->  Index Scan using idx_mirrors_server_id on mirrors m  (cost=0.56..8.08 rows=1 width=10) (actual time=0.636..0.636 rows=0 loops=1)
                     Index Cond: (server_id IS NULL)
                     Buffers: shared read=4
               ->  Index Scan using pk_filemetadata on filemetadata f  (cost=0.44..8.46 rows=1 width=10) (never executed)
                     Index Cond: (id = m.filemetadata_id)
                     Filter: (mtime < (now() - '3 mons'::interval))
         ->  Seq Scan on filemetadata  (cost=0.00..888591.96 rows=22799296 width=6) (never executed)
 Planning Time: 5.143 ms
 JIT:
   Functions: 14
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1.544 ms, Inlining 0.000 ms, Optimization 0.000 ms, Emission 0.000 ms, Total 1.544 ms
 Execution Time: 2.353 ms
(19 rows)

Collaborator

andrii-suse Jul 16, 2020

Oh, it should probably be WHERE filemetadata.id instead of WHERE filemetadata_id.
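The anti-join idea from this thread (delete old metadata rows that no mirror references, without computing counts) can be tried out in isolation. The sketch below uses Python's built-in SQLite as a stand-in for PostgreSQL. SQLite has no `DELETE ... USING`, so the join is moved into an `IN` subquery. The reduced table layouts, the sample rows, and the fixed cutoff date are assumptions made for the demonstration, not the schema from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Heavily reduced, hypothetical versions of the two tables.
    CREATE TABLE filemetadata (id INTEGER PRIMARY KEY, path TEXT, mtime TEXT);
    CREATE TABLE mirrors (server_id INTEGER, filemetadata_id INTEGER);

    INSERT INTO filemetadata VALUES
        (1, 'old/with-mirror', '2019-01-01'),
        (2, 'old/orphan',      '2019-01-01'),
        (3, 'new/orphan',      '2020-07-01');
    INSERT INTO mirrors VALUES (10, 1);
""")

# Stand-in for now() - '3 months'::interval; ISO dates compare as strings.
cutoff = "2020-04-16"

# The LEFT JOIN yields NULL mirror columns exactly for files without a mirror,
# so no per-file count is needed.
deleted = conn.execute(
    """
    DELETE FROM filemetadata
     WHERE id IN (
           SELECT f.id
             FROM filemetadata AS f
             LEFT OUTER JOIN mirrors AS m ON m.filemetadata_id = f.id
            WHERE m.filemetadata_id IS NULL
              AND f.mtime < ?)
    """,
    (cutoff,),
).rowcount

remaining = [row[0] for row in conn.execute("SELECT id FROM filemetadata ORDER BY id")]
print(deleted, remaining)
```

Only the old row without a mirror is removed. The old row with a mirror and the recent orphan both survive.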

andrii-suse reviewed

sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated

+                  sha1pieces bytea,
+                  btih bytea,
+                  pgp text,
+                  zblocksize smallint,

Collaborator

andrii-suse Jul 16, 2020

We should use INT instead of SMALLINT, which is limited to 32K: the zsync algorithm doesn't have a practical limit on block size, and e.g. a block size of 1M is much more suitable for huge files (see #47).
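To make the size argument concrete, here is a small sketch that picks the narrowest PostgreSQL integer type able to hold a given block size. The helper name and the sample block sizes are illustrative assumptions. The type ranges are the documented signed bounds.

```python
# Signed upper bounds of PostgreSQL's integer types, narrowest first.
PG_INT_TYPES = {
    "smallint": 2**15 - 1,   # 32767
    "integer": 2**31 - 1,
    "bigint": 2**63 - 1,
}


def smallest_type_for(value: int) -> str:
    """Return the narrowest PostgreSQL integer type that can store value."""
    for name, upper in PG_INT_TYPES.items():
        if value <= upper:
            return name
    raise ValueError(f"{value} does not fit into any PostgreSQL integer type")


small_block = 2048          # a small block size, fits into smallint
large_block = 1024 * 1024   # the 1M block size mentioned in the comment

print(smallest_type_for(small_block))
print(smallest_type_for(large_block))
```

A 1M block size exceeds the smallint range, so the column needs at least integer.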

andrii-suse reviewed

sql/migrations/schema-postgresql-move-to-mirrors.sql Outdated

+                  zblocksize,
+                  zhashlens,
+                  zsums,
+                  encode(zsums, 'hex') AS zsumshex

Collaborator

andrii-suse Jul 16, 2020

We need the path column here as well, so the view can be queried without a join.

Member Author

darix Jul 16, 2020

I thought about it, but right now it is 2 queries anyway.
https://github.com/openSUSE/mirrorbrain/blob/opensuse/mod_mirrorbrain/mod_mirrorbrain.c#L139..L163

Collaborator

andrii-suse Jul 16, 2020

I mean instead of the subquery in the hash query

mirrorbrain/mod_mirrorbrain/mod_mirrorbrain.c

Lines 157 to 159 in f33ca26

    
           "WHERE file_id = (SELECT id " \ 
        
                            "FROM filearr " \ 
        
                            "WHERE path = %s) " \

We can just have WHERE path = %s if the view has path.

Collaborator

andrii-suse Jul 17, 2020

This is how it will look if the column is inside the view: https://github.com/openSUSE/mirrorbrain/pull/52/files#diff-a44eaf51cd9831129dc4b515db0bd2aeL157-R156
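The difference between the two lookup styles can be illustrated with a toy example. The sketch runs on Python's built-in SQLite rather than PostgreSQL, and the table names, view name, and columns (`files`, `hashes`, `hexhash_demo`, `sha1hex`) are simplified placeholders, not the definitions from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, minimal tables: one for paths, one for hashes.
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
    CREATE TABLE hashes (file_id INTEGER REFERENCES files(id), sha1 BLOB);
    INSERT INTO files VALUES (1, 'distribution/a.iso');
    INSERT INTO hashes VALUES (1, x'00ff');

    -- The view exposes path itself, so callers can filter on it directly.
    CREATE VIEW hexhash_demo AS
        SELECT f.path, h.file_id, lower(hex(h.sha1)) AS sha1hex
          FROM hashes AS h
          JOIN files AS f ON f.id = h.file_id;
""")

path = "distribution/a.iso"

# Style 1: the caller resolves the path to an id with a subquery.
via_subquery = conn.execute(
    "SELECT lower(hex(sha1)) FROM hashes "
    "WHERE file_id = (SELECT id FROM files WHERE path = ?)",
    (path,),
).fetchone()

# Style 2: the view already carries path, so the caller needs no subquery.
via_view = conn.execute(
    "SELECT sha1hex FROM hexhash_demo WHERE path = ?",
    (path,),
).fetchone()

print(via_subquery, via_view)
```

Both queries return the same row. The second one keeps the path lookup inside the view definition instead of in every caller.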

andrii-suse reviewed

darix added 5 commits

July 16, 2020 17:37


          rename tables as requested

f506ad9


          allow larger zsum blocks

7d046b9


          make it more clear where each column comes from

c5c7e8a


          This experiment didnt work any way

372bcdd


          rename files to make clear in which order to call them

5eddb87

darix requested a review from andrii-suse

July 20, 2020 00:49

Member Author

darix commented Jul 20, 2020

all your changes should be addressed

darix merged commit e6fd0bb into opensuse

darix deleted the new_sql_schema branch

July 21, 2020 12:12

Labels

Prio1