Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow prefix to be specified per source (#2650) #2827

Merged
merged 3 commits into from
May 3, 2021

Conversation

amarjandu
Copy link
Contributor

@amarjandu amarjandu commented Feb 23, 2021

Author

  • PR title references issue
  • Title of main commit references issue
  • PR is linked to Zenhub issue

Author (reindex)

  • Added r tag to commit title or this PR does not require reindexing
  • Added reindex label to PR or this PR does not require reindexing

Author (freebies & chains)

  • Freebies are blocked on this PR or there are no freebies in this PR
  • Freebies are referenced in commit titles or there are no freebies in this PR
  • This PR is blocked by previous PR in the chain or this PR is not chained to another PR
  • Added chain label to the blocking PR or this PR is not chained to another PR

Author (upgrading)

  • Documented upgrading of deployments in UPGRADING.rst or this PR does not require upgrading
  • Added u tag to commit title or this PR does not require upgrading
  • Added upgrade label to PR or this PR does not require upgrading

Author (requirements, before every review)

  • Ran make requirements_update or this PR leaves requirements*.txt, common.mk and Makefile untouched
  • Added R tag to commit title or this PR leaves requirements*.txt untouched
  • Added reqs label to PR or this PR leaves requirements*.txt untouched

Author (before every review)

  • make integration_test passes in personal deployment or this PR does not touch functionality that could break the IT
  • Rebased branch on develop, squashed old fixups

Primary reviewer (after approval)

  • Commented on demo expectations or labelled PR as no demo
  • Decided if PR can be labeled no sandbox
  • PR title is appropriate as title of merge commit
  • Moved ticket to Approved column
  • Assigned PR to an operator

Operator (before pushing merge the commit)

  • Checked reindex label and r commit title tag
  • Checked that Demo expectations are clear or PR is labeled as no demo
  • Rebased and squashed branch
  • Sanity-checked history
  • Pushed PR branch to Github
  • Branch pushed to Gitlab and build passes in sandbox or added no sandbox label
  • Started reindex in sandbox or this PR does not require reindexing sandbox
  • Checked for failures in sandbox or this PR does not require reindexing sandbox
  • Added PR reference to merge commit title
  • Collected commit title tags in merge commit title
  • Moved linked issue to Merged column
  • Pushed merge commit to Github

Operator (after pushing the merge commit)

  • Moved freebies to Merged column or there are no freebies in this PR
  • Shortened the PR chain or this PR is not the base of another PR
  • Verified that N reviews labelling is accurate
  • Pushed merge commit to Gitlab or this changes can be pushed later, together with another PR
  • Deleted PR branch from Github and Gitlab

Operator (reindex)

  • Purchased BQ slot committment in dev or this PR does not require reindexing dev
  • Started reindex in dev or this PR does not require reindexing dev
  • Deleted BQ slot committment in dev or this PR does not require reindexing dev
  • Checked for failures in dev or this PR does not require reindexing dev
  • Purchased BQ slot committment in prod or this PR does not require reindexing prod
  • Started reindex in prod or this PR does not require reindexing prod
  • Deleted BQ slot committment in prod or this PR does not require reindexing prod
  • Checked for failures in prod or this PR does not require reindexing prod

Operator

  • Unassigned PR

@github-actions github-actions bot added the orange [process] Done by the Azul team label Feb 23, 2021
@amarjandu
Copy link
Contributor Author

Testing will fail; env vars need to be updated if we decide to go with this change.
Waiting on #2650 (comment)

@amarjandu
Copy link
Contributor Author

Need to update the scripts/reindex.py and inspect its usage in pipelines.

@amarjandu amarjandu force-pushed the issues/amar/2650-catalog-prefix branch from 88beae3 to 3e08211 Compare February 23, 2021 20:08
Copy link
Member

@hannes-ucsc hannes-ucsc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the spec in the ticket has been obsoleted by recent changes and I failed to keep it up to date. The prefix should be specific to the source, not the catalog. I'll update the ticket.

UPGRADING.rst Outdated

The syntax of the value of the AZUL_CATALOGS environment variable was modified
to include a prefix value. To upgrade a deployment, append every catalog entry
in that variable with a prefix ``:foo``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefixes have to be hexadecimal so this isn't a representative example.

for catalog_name, catalog in catalogs.items():
reject('/' in catalog_name,
'It appears AZUL_CATALOGS was not upgraded to include atlas names.')
reject(catalog.prefix == '')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty prefixes should be accepted. In fact, they are the norm.

@hannes-ucsc hannes-ucsc changed the title Allow prefix to be specified per catalog (#2650) Allow prefix to be specified per source (#2650) Feb 24, 2021
@hannes-ucsc
Copy link
Member

@amarjandu note my edit to the description of the underlying ticket.

@amarjandu amarjandu force-pushed the issues/amar/2650-catalog-prefix branch 3 times, most recently from bd3ccd5 to 81a55ab Compare March 8, 2021 23:15
@codecov
Copy link

codecov bot commented Mar 9, 2021

Codecov Report

Merging #2827 (2a478bf) into develop (3c6079d) will increase coverage by 0.01%.
The diff coverage is 82.60%.

❗ Current head 2a478bf differs from pull request most recent head 9c23cb7. Consider uploading reports for the commit 9c23cb7 to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2827      +/-   ##
===========================================
+ Coverage    82.59%   82.60%   +0.01%     
===========================================
  Files          117      117              
  Lines        12221    12229       +8     
===========================================
+ Hits         10094    10102       +8     
  Misses        2127     2127              
Impacted Files Coverage Δ
src/azul/azulclient.py 37.98% <0.00%> (-0.14%) ⬇️
src/azul/plugins/repository/tdr/__init__.py 82.72% <ø> (ø)
src/azul/plugins/repository/dss/__init__.py 80.15% <40.00%> (-1.10%) ⬇️
src/azul/__init__.py 82.08% <100.00%> (+0.15%) ⬆️
src/azul/indexer/__init__.py 98.61% <100.00%> (+0.12%) ⬆️
src/azul/indexer/document.py 95.45% <100.00%> (ø)
src/azul/terra.py 50.29% <100.00%> (+0.60%) ⬆️
test/indexer/test_indexer_controller.py 100.00% <100.00%> (ø)
test/service/test_repository_proxy.py 99.37% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 377112d...9c23cb7. Read the comment docs.

@coveralls
Copy link

coveralls commented Mar 9, 2021

Coverage Status

Coverage increased (+0.01%) to 83.129% when pulling 9c23cb7 on issues/amar/2650-catalog-prefix into 377112d on develop.

@amarjandu amarjandu force-pushed the issues/amar/2650-catalog-prefix branch from a89cbfb to 35533b3 Compare March 9, 2021 02:34
@amarjandu amarjandu added the upgrade [process] PR includes commit requiring manual upgrade label Mar 9, 2021
UPGRADING.rst Outdated
#2650 Add prefix to sources
===========================

The syntax for the value of the ``AZUL_TDR_SOURCES`` and ``AZUL_TDR_…_SOURCES``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The syntax for the value of the ``AZUL_TDR_SOURCES`` and ``AZUL_TDR_…_SOURCES``
The syntax of ``AZUL_TDR_SOURCES`` and ``AZUL_TDR_…_SOURCES``

UPGRADING.rst Outdated
===========================

The syntax for the value of the ``AZUL_TDR_SOURCES`` and ``AZUL_TDR_…_SOURCES``
environment variables were modified to include a prefix. To upgrade a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
environment variables were modified to include a prefix. To upgrade a
environment variables was modified to include a UUID prefix. To upgrade a

deployments/sandbox/environment.py Show resolved Hide resolved
'default) no partitioning occurs, the DSS is queried locally and the indexer notification '
'endpoint is invoked for each bundle individually and concurrently using worker threads. '
'This is magnitudes slower that partitioned indexing.')
'The lambda queries the data repository and queues a notification for each matching bundle. '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'The lambda queries the data repository and queues a notification for each matching bundle. '
'The lambda queries the repository and queues a notification for each matching bundle. '

'endpoint is invoked for each bundle individually and concurrently using worker threads. '
'This is magnitudes slower that partitioned indexing.')
'The lambda queries the data repository and queues a notification for each matching bundle. '
'If 0 (the default) no partitioning occurs, the data repository is queried locally and the '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'If 0 (the default) no partitioning occurs, the data repository is queried locally and the '
'If 0 (the default) no partitioning occurs, the repository is queried locally and the '

@@ -262,8 +259,8 @@ def remote_reindex_partition(self, message: JSON) -> None:
bundle_fqids = self.list_bundles(catalog, source, prefix)
bundle_fqids = self.filter_obsolete_bundle_versions(bundle_fqids)
logger.info('After filtering obsolete versions, '
'%i bundles remain in prefix %r of catalog %r',
len(bundle_fqids), prefix, catalog)
'%i bundles remain in prefix %r for the source %r in catalog %r',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'%i bundles remain in prefix %r for the source %r in catalog %r',
'%i bundles remain in prefix %r of source %r in catalog %r',

Comment on lines 282 to 283
logger.info('Successfully queued %i notification(s) for prefix %s for '
'the source %r', num_messages, prefix, source)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.info('Successfully queued %i notification(s) for prefix %s for '
'the source %r', num_messages, prefix, source)
logger.info('Successfully queued %i notification(s) for prefix %s of '
'source %r', num_messages, prefix, source)

@@ -75,6 +78,7 @@ class TDRSourceName(SourceName):
project: str
name: str
is_snapshot: bool
prefix: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prefix: str
prefix: str = ''

Comment on lines 65 to 66
is_snapshot=True,
prefix=''))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is_snapshot=True,
prefix=''))
is_snapshot=True))

else:
azul.reindex(catalog, args.prefix)
azul.reindex(catalog, '')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only call site so it doesn't make sense to fix this here but retain the argument. Why did you remove the args.reindex option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are still reviewing this, but did you mean args.prefix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, yes. Good catch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the args.prefixoption because it seemed like it would not be needed if we hardcode the prefix to the source. By allowing a prefix to be provided to azulclient.reindex the reindexed bundles would be those under {source.name.prefix} + {args.prefix}. At the time I was thinking that if we are required to redeploy to update the the source prefix, why not just make this the only way to specify the prefix.

Maybe it's better to retain the args.prefix, make the default value of it '', and update the description to mention that its used to narrow down the source prefix, rather than to reindex an entirely different prefix.

This is the only call site so it doesn't make sense to fix this here but retain the argument.

I would think that we want the azulclient to be robust enough to handle a prefix during reindex, so I opted to retain the prefix argument in that method vs removing it, even though its only used in this one call site.

I think there is a problem that we overload the term prefix in the repository plugins to mean source.name.prefix or partition prefix. By having distinct optional arguments for prefix and partition prefix within the methods of the repository plugin we can make make the client more flexible, and is probably something worth looking into in the long run.

@hannes-ucsc hannes-ucsc removed their assignment Mar 10, 2021
@hannes-ucsc hannes-ucsc added the 1 review [process] Lead requested changes once label Mar 10, 2021
@amarjandu amarjandu force-pushed the issues/amar/2650-catalog-prefix branch 4 times, most recently from a5cfec5 to 8a80914 Compare March 15, 2021 18:18
@amarjandu
Copy link
Contributor Author

pulling from review to rebase

@hannes-ucsc hannes-ucsc removed their assignment Apr 26, 2021
@amarjandu amarjandu force-pushed the issues/amar/2650-catalog-prefix branch 2 times, most recently from 6a3e5b2 to e01184a Compare April 27, 2021 22:32
])
for catalog in ('dcp2ebi', 'it2ebi')
for catalog, prefix in (('dcp2ebi', '42'), ('it2ebi', ''))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IT shortens the prefix until it finds enough bundles. If a configured prefix prevents the IT from finding any bundles, even with an IT prefix of '' then the configured prefix is wrong and actually includes no bundles. So if a prefix doesn't work for a source of an IT catalog, that prefix wouldn't work for that source on a non-IT catalog either since the source wouldn't contribute any bundles to said catalog.

@hannes-ucsc hannes-ucsc removed their assignment Apr 29, 2021
@amarjandu
Copy link
Contributor Author

add fixme:
"add tooling to aid in prefix choice"
create issue for this as well.

@amarjandu
Copy link
Contributor Author

IT failed during the 10min wait period for queue stabilization. All notifications were handled by the indexer lambdas, restarted IT to double check.

@hannes-ucsc hannes-ucsc force-pushed the issues/amar/2650-catalog-prefix branch from 2a478bf to c8974d1 Compare April 29, 2021 23:27
@hannes-ucsc hannes-ucsc added demo ext [process] To be demonstrated to stakeholders no demo [process] Not to be demonstrated at the end of the sprint labels Apr 29, 2021
@hannes-ucsc hannes-ucsc removed their assignment Apr 29, 2021
@hannes-ucsc
Copy link
Member

IT failed during the 10min wait period for queue stabilization. All notifications were handled by the indexer lambdas, restarted IT to double check.

I didn't see this until now, after my approval. Looks like you asked for approval first, then ran IT. That would be out of order.

If the IT passes please assign to @achave11, move to Approved and check the two checklist items, please.

Otherwise, back to square one (apply fix fixes, ask for re-review).

@amarjandu
Copy link
Contributor Author

IT failed during the 10min wait period for queue stabilization. All notifications were handled by the indexer lambdas, restarted IT to double check.

I didn't see this until now, after my approval. Looks like you asked for approval first, then ran IT. That would be out of order.

If the IT passes please assign to @achave11, move to Approved and check the two checklist items, please.

Otherwise, back to square one (apply fix fixes, ask for re-review).

Sorry that was for my reference, I'll keep those types of notes off the PR thread to reduce clutter going forward, the IT does pass on my local deployment.

@hannes-ucsc
Copy link
Member

Just prefix them with "Note to self" or similar.

@achave11-ucsc achave11-ucsc force-pushed the issues/amar/2650-catalog-prefix branch from c8974d1 to 9c23cb7 Compare April 30, 2021 15:24
@achave11-ucsc achave11-ucsc added the sandbox [process] Resolution is being verified in sandbox deployment label Apr 30, 2021
@achave11-ucsc achave11-ucsc merged commit 20061e8 into develop May 3, 2021
@achave11-ucsc achave11-ucsc deleted the issues/amar/2650-catalog-prefix branch May 3, 2021 23:35
@achave11-ucsc achave11-ucsc removed their assignment May 4, 2021
@theathorn theathorn removed the demo ext [process] To be demonstrated to stakeholders label May 10, 2021
@theathorn
Copy link

@hannes-ucsc "for demo, show evidence of diverse prefixes between catalogs in sandbox IT job logs on GitLab".

@theathorn theathorn removed the no demo [process] Not to be demonstrated at the end of the sprint label May 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4+ reviews [process] Lead requested changes four times or more orange [process] Done by the Azul team sandbox [process] Resolution is being verified in sandbox deployment upgrade [process] PR includes commit requiring manual upgrade
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants