Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pg_sample doesn't correctly handle generated columns #52

Closed
nicholasd-ff opened this issue Feb 14, 2024 · 6 comments · Fixed by #53
Closed

pg_sample doesn't correctly handle generated columns #52

nicholasd-ff opened this issue Feb 14, 2024 · 6 comments · Fixed by #53

Comments

@nicholasd-ff
Copy link
Contributor

If you create a sample database with a single table as follows:

CREATE TABLE some_numbers(
    id int PRIMARY KEY GENERATED ALWAYS AS IDENTITY
    , important_value int
    , double_value int GENERATED ALWAYS AS (important_value*2) STORED
);

INSERT INTO some_numbers(important_value) (SELECT * FROM generate_series(1,100));

Then dump the table using:

 pg_sample --limit="some_numbers=10%" --file=pg_sample_generated_bug.sql

You get a dump file with the generated columns in, which you can't import because the schema prevents it:

 \i dump_files/pg_sample_generated_bug.sql 
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)

SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
SET
SET
SET
SET
SET
ALTER TABLE
psql:dump_files/pg_sample_generated_bug.sql:85: ERROR:  extra data after last expected column
CONTEXT:  COPY some_numbers, line 1: "14        14      28"
ALTER TABLE

Note that this bug doesn't occur with raw pg_dump, which correctly omits the generated column.

@nicholasd-ff
Copy link
Contributor Author

nicholasd-ff commented Feb 14, 2024

Looks like we can omit the generated columns by querying the information schema:

SELECT column_name, is_generated FROM information_schema.columns WHERE table_name='some_numbers'
column_name   | is_generated 
-----------------+--------------
 id              | NEVER
 important_value | NEVER
 double_value    | ALWAYS

I can try and make the change and make a PR if that's useful? I've not written much Perl lately but I think the change would only be a couple of lines around line 696 to extract columns that are not generated:

  # Only extract non-generated columns
  my ($cols_to_copy) = $dbh->selectrow_array(qq{
       SELECT string_agg(column_name, ', ')
       FROM information_schema.columns 
       WHERE table_name=? 
       AND table_schema=?
       AND is_generated='NEVER' 
  }, undef, ($table->table, $table->schema) );
  notice "only copying cols [$cols_to_copy] ";

  $dbh->do(qq{
    CREATE $unlogged TABLE $sample_table AS
            SELECT $cols_to_copy
         FROM ONLY $table
      $tablesample
             WHERE $where
            $order
            $limit
  });

@mla
Copy link
Owner

mla commented Feb 15, 2024

Thank you, @nicholasd-ff ! I will try to review your changes tomorrow.

nicholasd-ff added a commit to nicholasd-ff/pg_sample that referenced this issue Feb 15, 2024
This change adds support for sampling tables with generated columns
and tests to check the new feature is working correctly.

Fixes mla#52
@nicholasd-ff
Copy link
Contributor Author

@mla I wrote a test and found a bug, so I made a PR with the proper fix for you. Hope that's OK!

@nicholasd-ff
Copy link
Contributor Author

It currently depends on my earlier PR to update the base image, but if that's a problem I can unstack them for you.

@nicholasd-ff
Copy link
Contributor Author

Hm, and I've encountered another bug with the referential integrity stuff I thought I'd covered that with the test but obviously not. I'll investigate and update the PR because I think my generated table needs a FK constraint to properly check the functionality.

@nicholasd-ff
Copy link
Contributor Author

And the bug is now fixed, and the PR updated.

@mla mla closed this as completed in #53 Feb 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants