Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: FK validations are slow #15157

Closed
vivekmenezes opened this issue Apr 19, 2017 · 14 comments
Closed

sql: FK validations are slow #15157

vivekmenezes opened this issue Apr 19, 2017 · 14 comments
Assignees
Labels
A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. C-performance Perf of queries or internals. Solution not expected to change functional behavior. docs-done docs-known-limitation

Comments

@vivekmenezes
Copy link
Contributor

As part of #15026 a complex schema with FKs

CREATE TABLE raw_visits (
	id INT NOT NULL DEFAULT unique_rowid(),
	venue_id INT NOT NULL,
	device_id STRING NOT NULL,
	first_signal TIMESTAMP WITH TIME ZONE NULL,
	last_signal TIMESTAMP WITH TIME ZONE NULL,
	duration_signal INTERVAL NULL,
	signals_per_hour INT NULL,
	avg_distance INT NULL,
	visitortype INT NULL,
	durationtype INT NULL,
	dayssincelastvisit INT NULL,
	visitweight INT NULL,
	min_distance INT NULL DEFAULT 0,
	CONSTRAINT "primary" PRIMARY KEY (id ASC),
	CONSTRAINT fk_venue_id_ref_venues FOREIGN KEY (venue_id, device_id, first_signal, duration_signal) REFERENCES venues (id),
	CONSTRAINT fk_device_id_ref_devices FOREIGN KEY (device_id) REFERENCES devices (id),
	FAMILY "primary" (id, venue_id, device_id, first_signal, last_signal, duration_signal, signals_per_hour, avg_distance, visitortype, durationtype, dayssincelastvisit, visitweight, min_distance)
)

reveal high latency on SQL queries.

@vivekmenezes vivekmenezes changed the title sql: sql requests against tables with FKs are slow sql: requests against tables with FKs are slow Apr 19, 2017
@vivekmenezes
Copy link
Contributor Author

@mattiasmak lets discuss the latency issues you are observing here. Thanks!

@vivekmenezes
Copy link
Contributor Author

We'd appreciate it if you attach the kinds of SQL requests you are running against this cluster, and also include the schema for the venue and device tables. Thanks

@dianasaur323 dianasaur323 added this to the 1.0 milestone Apr 19, 2017
@mattiasmak
Copy link

CREATE TABLE devices (
	id STRING NOT NULL,
	prefix STRING NULL,
	name STRING NULL,
	CONSTRAINT "primary" PRIMARY KEY (id ASC),
	INDEX prefix_index (prefix ASC),
	FAMILY "primary" (id, prefix, name)
)
CREATE TABLE venues (
	id INT NOT NULL DEFAULT unique_rowid(),
	client_id INT NOT NULL,
	name STRING NULL,
	CONSTRAINT "primary" PRIMARY KEY (id ASC),
	CONSTRAINT fk_client_id_ref_clients FOREIGN KEY (client_id) REFERENCES clients (id),
	FAMILY "primary" (id, client_id, name)
)

These two queries are executed:

UPSERT INTO devices (id, prefix) VALUES ('aabbcc', 'aa');

INSERT INTO raw_visits (venue_id, device_id, first_signal, last_signal, duration_signal, signals_per_hour, avg_distance, min_distance) VALUES (1, 'aabbcc', ...etc );

@vivekmenezes vivekmenezes modified the milestones: 1.1, 1.0 Apr 25, 2017
@dt dt removed this from the 1.1 milestone Aug 1, 2017
@dt dt removed their assignment Aug 1, 2017
@jordanlewis jordanlewis changed the title sql: requests against tables with FKs are slow sql: inserts into tables with FKs are slow Sep 21, 2017
@jordanlewis
Copy link
Member

jordanlewis commented Sep 21, 2017

I have a few ideas here, ordered roughly from simpler to more challenging.

  • Combine all of the FK scans for a single key into a single batch to give to dist sender. A similar strategy is used in the row writer itself.
  • Cache foreign key lookups across the lifetime of a foreign key helper. This would have a major performance improvement for the common case of a large batch of inserts for a single entity - for example, inserting the 100,000 stock rows per warehouse in TPCC must ensure that the very same warehouse row is present for all 100,000 stock rows.
  • Combine FK scans for several keys into a single batch, deferring the insertion of that batch's worth of keys until success is returned.
  • Cache foreign key lookups across the lifetime of a transaction somehow, for the other common case where a batch of inserts is performed within a transaction but not in a single SQL batch. There are obvious complications here - what if one of the other statements in the transaction modifies the foreign table?

@jordanlewis jordanlewis self-assigned this Sep 21, 2017
@petermattis
Copy link
Collaborator

@jordanlewis Can you explain how FK lookups are currently done? Is no batching being done?

@jordanlewis
Copy link
Member

jordanlewis commented Sep 22, 2017

There is no batching done at all for FK lookups. On insert, a single scan is sent sequentially for each of the n foreign keys present in the table.

Proposal 1 above involves batching the n scans per insert row.
Proposal 3 involves batching the n * k scans for all k insert rows in a SQL-level insert batch.

@jordanlewis
Copy link
Member

Well, to be more precise, proposal 3 has at most n * k scans in the case that all of the input inserts have distinct foreign key columns.

@petermattis
Copy link
Collaborator

Yowzer, that's not good.

@dt
Copy link
Member

dt commented Sep 22, 2017

I expect helper-lifetime caching to be a big win too and am excited to see where that goes if you try it. Note that even in the statement-lifetime cache, you might want to disable caching for self-referential FKs: unlike cross-table FKs, a self-FK could easily be writing to the checked rows in the same statement, so just like the txn-lifetime cache, you'd have add some substantial invalidation complexity.

@jordanlewis
Copy link
Member

#18730 helped with this, but there's still more work to be done. @BramGruneir would you be interested in looking at this after you're done with the FK actions work?

@BramGruneir BramGruneir self-assigned this Jan 25, 2018
@vivekmenezes vivekmenezes added this to the 2.0 milestone Jan 25, 2018
@jordanlewis jordanlewis modified the milestones: 2.0, 2.1 Feb 5, 2018
@jordanlewis jordanlewis removed their assignment Feb 22, 2018
@petermattis petermattis added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Mar 16, 2018
@jordanlewis jordanlewis changed the title sql: inserts into tables with FKs are slow sql: FK validations are slow Mar 20, 2018
@knz knz modified the milestones: 2.1, 2.2 Aug 31, 2018
@knz knz unassigned emsal0 Aug 31, 2018
@knz knz added the A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. label Sep 21, 2018
@knz
Copy link
Contributor

knz commented Sep 21, 2018

FWIW @nvanbenschoten had proposed a KV-level key existence cache which would help here too.

@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
@sploiselle
Copy link
Contributor

sploiselle commented Oct 24, 2018

@BramGruneir @jordanlewis Disregard request for description; didn't see this was written up in 1.0.

@nvanbenschoten
Copy link
Member

@RaduBerinde think we can close this now? Is it providing any value?

@jordanlewis
Copy link
Member

FKs have been rewritten recently. Feel free to open a new issue if you still are having problems, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. C-performance Perf of queries or internals. Solution not expected to change functional behavior. docs-done docs-known-limitation
Projects
None yet
Development

No branches or pull requests