A tool for convenient and integrity-preserving data sharing between database servers.
Only users interacting with the destination of the data need to install the datajoint-link package:
pip install datajoint-link
Datajoint-link requires access to the database server from which data will be pulled. It is recommended to create a new user for this purpose:
CREATE USER 'djlink'@'%' IDENTIFIED BY 'secret-password';
The user needs to have certain privileges on the table from which data will be pulled:
GRANT SELECT, REFERENCES ON `source\_schema`.`source\_table` TO 'djlink'@'%';
Each table from which data will be pulled also needs an additional helper table:
GRANT ALL PRIVILEGES ON `helper\_schema`.`helper\_table` TO 'djlink'@'%';
In order to preserve data integrity across the link regular users must not have any privileges on this helper table.
Datajoint-link needs to be configured with the username and password of the user created in the previous section. This is accomplished via environment variables:
LINK_USER=djlink
LINK_PASS=secret-password
The destination table is created by passing information about where to find the source table to the link
decorator:
from link import link
@link(
"databaseserver.com",
"source_schema",
"helper_schema",
"helper_table",
"destination_schema"
)
class Table:
"""Some table present in the source schema on the source database server."""
Note that the name of the declared class must match the name of the table from which the data will be pulled.
The class returned by the decorator behaves like a regular table with some added functionality. For one it allows the browsing of rows present in the source:
Table().source
All the rows can be pulled like so:
Table().source.pull() # Hint: Pass display_progress=True to get a progress bar
That said usually we only want to pull rows that match a certain criteria:
(Table().source & "foo = 1").pull()
The deletion of already pulled rows works the same as for any other table:
(Table() & "foo = 1").delete()
The deletion of certain rows from the destination can also be requested by flagging them in the corresponding helper table:
row = (Helper() & "foo = 1").fetch1()
(Helper() & row).delete()
row["is_flagged"] = "TRUE"
Helper().insert1(row)
The flagged
attribute makes the deletion of flagged rows from the destination table convenient:
(Table() & Table().source.flagged).delete()
Deleting a flagged row automatically updates its corresponding row in the helper table:
assert (Helper() & "foo = 1").fetch1("is_deprecated") == "TRUE" # No error!
Now it is save to delete the row from the source table as well!
Data stored in a source table that refers to one (or more) external stores can be stored in different store(s) after pulling:
@link(
...,
stores={"source_store": "destination_store"}
)
class Table:
...
Note that all stores mentioned in the dictionary need to be configured via dj.config
.
Clone this repository and run the following command from within the cloned repository to run all tests:
docker compose run functional_tests tests