Allow prefix to be specified per source (#2650) #2827
Testing will fail; env vars need to be updated if we decide to go with this change.
Need to update the
88beae3 to 3e08211
Unfortunately, the spec in the ticket has been obsoleted by recent changes and I failed to keep it up to date. The prefix should be specific to the source, not the catalog. I'll update the ticket.
UPGRADING.rst
Outdated
The syntax of the value of the AZUL_CATALOGS environment variable was modified
to include a prefix value. To upgrade a deployment, append every catalog entry
in that variable with a prefix ``:foo``.
Prefixes have to be hexadecimal so this isn't a representative example.
src/azul/__init__.py
Outdated
for catalog_name, catalog in catalogs.items():
    reject('/' in catalog_name,
           'It appears AZUL_CATALOGS was not upgraded to include atlas names.')
    reject(catalog.prefix == '')
Empty prefixes should be accepted. In fact, they are the norm.
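The two review comments above (prefixes must be hexadecimal, and the empty prefix is valid) can be illustrated with a minimal sketch. The function name `validate_prefix` is hypothetical and not taken from the Azul code base, and restricting the check to lowercase digits is an assumption based on the usual lowercase rendering of UUIDs.

```python
import string

# Assumption: prefixes are matched against lowercase UUID strings,
# so only the digits 0-9 and the letters a-f are allowed.
HEX_DIGITS = frozenset(string.hexdigits.lower())


def validate_prefix(prefix: str) -> str:
    """
    Return the prefix unchanged if it is empty or consists solely of
    lowercase hexadecimal digits, otherwise raise ValueError.
    """
    invalid = set(prefix) - HEX_DIGITS
    if invalid:
        raise ValueError(f'Prefix {prefix!r} contains non-hexadecimal '
                         f'characters: {sorted(invalid)}')
    return prefix
```

With this check, `''` and `'42'` pass, while the `:foo` example from the upgrade notes would be rejected because `o` is not a hexadecimal digit.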
@amarjandu note my edit to the description of the underlying ticket.
bd3ccd5 to 81a55ab
Codecov Report
@@ Coverage Diff @@
## develop #2827 +/- ##
===========================================
+ Coverage 82.59% 82.60% +0.01%
===========================================
Files 117 117
Lines 12221 12229 +8
===========================================
+ Hits 10094 10102 +8
Misses 2127 2127
a89cbfb to 35533b3
UPGRADING.rst
Outdated
#2650 Add prefix to sources
===========================

The syntax for the value of the ``AZUL_TDR_SOURCES`` and ``AZUL_TDR_…_SOURCES``
- The syntax for the value of the ``AZUL_TDR_SOURCES`` and ``AZUL_TDR_…_SOURCES``
+ The syntax of ``AZUL_TDR_SOURCES`` and ``AZUL_TDR_…_SOURCES``
UPGRADING.rst
Outdated
===========================

The syntax for the value of the ``AZUL_TDR_SOURCES`` and ``AZUL_TDR_…_SOURCES``
environment variables were modified to include a prefix. To upgrade a
- environment variables were modified to include a prefix. To upgrade a
+ environment variables was modified to include a UUID prefix. To upgrade a
scripts/reindex.py
Outdated
'default) no partitioning occurs, the DSS is queried locally and the indexer notification '
'endpoint is invoked for each bundle individually and concurrently using worker threads. '
'This is magnitudes slower that partitioned indexing.')
'The lambda queries the data repository and queues a notification for each matching bundle. '
- 'The lambda queries the data repository and queues a notification for each matching bundle. '
+ 'The lambda queries the repository and queues a notification for each matching bundle. '
scripts/reindex.py
Outdated
'endpoint is invoked for each bundle individually and concurrently using worker threads. '
'This is magnitudes slower that partitioned indexing.')
'The lambda queries the data repository and queues a notification for each matching bundle. '
'If 0 (the default) no partitioning occurs, the data repository is queried locally and the '
- 'If 0 (the default) no partitioning occurs, the data repository is queried locally and the '
+ 'If 0 (the default) no partitioning occurs, the repository is queried locally and the '
src/azul/azulclient.py
Outdated
@@ -262,8 +259,8 @@ def remote_reindex_partition(self, message: JSON) -> None:
bundle_fqids = self.list_bundles(catalog, source, prefix)
bundle_fqids = self.filter_obsolete_bundle_versions(bundle_fqids)
logger.info('After filtering obsolete versions, '
            '%i bundles remain in prefix %r of catalog %r',
            len(bundle_fqids), prefix, catalog)
            '%i bundles remain in prefix %r for the source %r in catalog %r',
- '%i bundles remain in prefix %r for the source %r in catalog %r',
+ '%i bundles remain in prefix %r of source %r in catalog %r',
src/azul/azulclient.py
Outdated
logger.info('Successfully queued %i notification(s) for prefix %s for '
            'the source %r', num_messages, prefix, source)
- logger.info('Successfully queued %i notification(s) for prefix %s for '
-             'the source %r', num_messages, prefix, source)
+ logger.info('Successfully queued %i notification(s) for prefix %s of '
+             'source %r', num_messages, prefix, source)
src/azul/terra.py
Outdated
@@ -75,6 +78,7 @@ class TDRSourceName(SourceName):
    project: str
    name: str
    is_snapshot: bool
    prefix: str
- prefix: str
+ prefix: str = ''
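The effect of the suggested default can be sketched as follows. The class `SourceNameSketch` is a hypothetical stand-in and uses a standard-library dataclass; the real `TDRSourceName` may be declared with a different mechanism. Only the field names come from the diff above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceNameSketch:
    # Hypothetical stand-in for TDRSourceName, for illustration only
    project: str
    name: str
    is_snapshot: bool
    # A field with a default must follow the fields without one.
    # The empty prefix matches every bundle UUID.
    prefix: str = ''


# Call sites that don't care about the prefix can omit it ...
implicit = SourceNameSketch(project='some-project',
                            name='some_snapshot',
                            is_snapshot=True)
# ... and still compare equal to an instance with an explicit empty prefix.
explicit = SourceNameSketch(project='some-project',
                            name='some_snapshot',
                            is_snapshot=True,
                            prefix='')
```

With the default in place, existing call sites and tests that construct a source name without a prefix need no change.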
test/indexer/test_tdr.py
Outdated
is_snapshot=True,
prefix=''))
- is_snapshot=True,
- prefix=''))
+ is_snapshot=True))
scripts/reindex.py
Outdated
else:
azul.reindex(catalog, args.prefix)
azul.reindex(catalog, '')
This is the only call site so it doesn't make sense to fix this here but retain the argument. Why did you remove the args.reindex option?
I think you are still reviewing this, but did you mean args.prefix?
Uhm, yes. Good catch.
I removed the args.prefix option because it seemed unnecessary once the prefix is hardcoded in the source. If a prefix could still be passed to azulclient.reindex, the reindexed bundles would be those under {source.name.prefix} + {args.prefix}. At the time, my thinking was that if we have to redeploy to update the source prefix anyway, we might as well make that the only way to specify the prefix.
Maybe it's better to retain args.prefix, make its default value '', and update the description to say that it is used to narrow down the source prefix rather than to reindex an entirely different prefix.
"This is the only call site so it doesn't make sense to fix this here but retain the argument."
I would think that we want the azulclient to be robust enough to handle a prefix during reindex, so I opted to retain the prefix argument in that method rather than remove it, even though it is only used at this one call site.
I think the problem is that we overload the term prefix in the repository plugins to mean either source.name.prefix or partition prefix. Having distinct optional arguments for prefix and partition prefix in the methods of the repository plugin would make the client more flexible, and is probably worth looking into in the long run.
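The distinction drawn in this comment can be sketched in a few lines. The function names and signatures below are hypothetical and not part of the repository plugin API. They only illustrate the idea that a partition (or command-line) prefix narrows the configured source prefix instead of replacing it.

```python
def effective_prefix(source_prefix: str, partition_prefix: str = '') -> str:
    """
    Combine the prefix configured for a source with an optional, narrower
    partition prefix. Concatenation can only narrow the selection, so the
    result never selects bundles outside of the source prefix.
    """
    return source_prefix + partition_prefix


def is_selected(bundle_uuid: str,
                source_prefix: str,
                partition_prefix: str = '') -> bool:
    # A bundle is selected if its UUID starts with the combined prefix
    return bundle_uuid.startswith(effective_prefix(source_prefix,
                                                   partition_prefix))
```

Keeping the two prefixes as separate arguments makes it explicit at every call site which one is meant, which is the overloading problem described above.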
a5cfec5 to 8a80914
Pulling from review to rebase.
8a80914 to c879124
6a3e5b2 to e01184a
deployments/sandbox/environment.py
Outdated
])
for catalog in ('dcp2ebi', 'it2ebi')
for catalog, prefix in (('dcp2ebi', '42'), ('it2ebi', ''))
The IT shortens the prefix until it finds enough bundles. If a configured prefix prevents the IT from finding any bundles, even with an IT prefix of '', then the configured prefix is wrong and actually includes no bundles. So if a prefix doesn't work for a source of an IT catalog, it wouldn't work for that source in a non-IT catalog either, since the source wouldn't contribute any bundles to that catalog.
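The behaviour described in this comment can be sketched as a loop that drops the last character of the IT prefix until enough bundles are found. This is an illustration under stated assumptions, not the actual integration test code: `find_bundles`, `list_bundles` and `min_bundles` are hypothetical names.

```python
from typing import Callable, List, Tuple


def find_bundles(list_bundles: Callable[[str], List[str]],
                 source_prefix: str,
                 it_prefix: str,
                 min_bundles: int) -> Tuple[str, List[str]]:
    """
    Shorten the IT prefix one character at a time until the combined
    prefix selects at least `min_bundles` bundles, or until the IT prefix
    is empty. Return the IT prefix that was used and the bundles found.
    """
    while True:
        bundles = list_bundles(source_prefix + it_prefix)
        if len(bundles) >= min_bundles or it_prefix == '':
            return it_prefix, bundles
        it_prefix = it_prefix[:-1]  # widen the search


# A fake repository for demonstration purposes only
fake_uuids = ['42aa', '42ab', '42b0', '1300']


def fake_list_bundles(prefix: str) -> List[str]:
    return [uuid for uuid in fake_uuids if uuid.startswith(prefix)]


used_prefix, found = find_bundles(fake_list_bundles, '42', 'aa', 2)
```

If the loop returns an empty list even for an empty IT prefix, the configured source prefix itself selects nothing, which is the misconfiguration the comment refers to.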
add fixme:
The IT failed during the 10-minute wait period for queue stabilization. All notifications were handled by the indexer lambdas. Restarted the IT to double-check.
2a478bf to c8974d1
I didn't see this until now, after my approval. Looks like you asked for approval first, then ran IT. That would be out of order. If the IT passes please assign to @achave11, move to Approved and check the two checklist items, please. Otherwise, back to square one (apply fix fixes, ask for re-review).
Sorry, that was for my reference. I'll keep those types of notes off the PR thread to reduce clutter going forward. The IT does pass on my local deployment.
Just prefix them with "Note to self" or similar.
c8974d1 to 9c23cb7
@hannes-ucsc "for demo, show evidence of diverse prefixes between catalogs in sandbox IT job logs on GitLab".
Author
Author (reindex)
r tag to commit title or this PR does not require reindexing
reindex label to PR or this PR does not require reindexing
Author (freebies & chains)
chain label to the blocking PR or this PR is not chained to another PR
Author (upgrading)
u tag to commit title or this PR does not require upgrading
upgrade label to PR or this PR does not require upgrading
Author (requirements, before every review)
make requirements_update or this PR leaves requirements*.txt, common.mk and Makefile untouched
R tag to commit title or this PR leaves requirements*.txt untouched
reqs label to PR or this PR leaves requirements*.txt untouched
Author (before every review)
make integration_test passes in personal deployment or this PR does not touch functionality that could break the IT
develop, squashed old fixups
Primary reviewer (after approval)
no demo
no sandbox
Operator (before pushing merge the commit)
reindex label and r commit title tag
no demo
sandbox or added no sandbox label
sandbox or this PR does not require reindexing sandbox
sandbox or this PR does not require reindexing sandbox
Operator (after pushing the merge commit)
N reviews labelling is accurate
Operator (reindex)
dev or this PR does not require reindexing dev
dev or this PR does not require reindexing dev
dev or this PR does not require reindexing dev
dev or this PR does not require reindexing dev
prod or this PR does not require reindexing prod
prod or this PR does not require reindexing prod
prod or this PR does not require reindexing prod
prod or this PR does not require reindexing prod
Operator