v0.2.0

@clnsmth clnsmth released this 17 Jan 21:34
· 1 commit to main since this release

v0.2.0 (2024-01-17)

Build

  • build: remove dev dependencies to lighten package

Remove developer dependencies from the list of user/production
dependencies to lighten the installation process for users. (5496256)

Ci

  • ci: allow GHA to bypass protection rules on release

Permit GitHub Actions to bypass repository branch protection rules to
enable Python Semantic Release to commit release-related changes back
to the main branch. (d3c012b)

  • ci: fix failing release workflow

Update the failing release workflow (GitHub Action) to enable Python
Semantic Release to automatically create a new release when the
development branch is merged into main.

  • Add the GITHUB_TOKEN to the semantic release step, where it is
    necessary for committing changes, using the standard syntax for
    referencing secrets.
  • Include the missing versioning command in the semantic release step to
    ensure the new version is calculated.
  • Correct the outdated syntax for referencing the version number in the
    pyproject.toml file to align with the requirements of the current version
    of Python Semantic Release. (49c986f)

  • ci: update release token reference

Update the name of the authentication token used by the CD GitHub
Actions Workflow to reflect the new permissive permissions set on the
default repository token, resolving the CD workflow failure. (dc80841)

  • ci: update failing GitHub Action workflows

Adjust failing GitHub Actions workflows for 'checkout' and
'setup-python' used in the CD workflow. This workflow differs from the
CI workflow in that authentication is required for git commit and
merging operations. (bf30458)

  • ci: ignore pylint c-extension-no-member messages

Ignore pylint 'c-extension-no-member' (I1101) messages, originating
from lxml, for the sake of a readable message log.

Another option, adding lxml to the pylint --extension-pkg-allow-list,
may run arbitrary code and is a decision that shouldn't be made for
collaborators running pylint in the context of this project. (3b9cefb)

  • ci: update GitHub Actions

Assign GitHub Actions branch merge permissions to ensure that
'development' remains up-to-date with the main branch after 'main'
is tagged during the release process.

Declare Pylint checks in 'pyproject.toml' to ensure synchronization
between local checks and CI pipeline checks. (19e784f)

Documentation

  • docs: add blank space for release commit

Insert a blank space into the README to ease the creation of a
commit message, via Python Semantic Release, intended solely for bumping
the major version number.

BREAKING CHANGE: This marks the first fully functioning release of the
gbif_registrar package. The APIs of previously released functionality have
been considerably modified. (86b3da0)

  • docs: update CONTRIBUTING for current project status

Revise the CONTRIBUTING file to align with the current status of the project. (71e6380)

  • docs: emphasize running main workflow after creation

Emphasize the importance of running the main workflow after creation,
as skipping this step can result in incomplete registration and uploading
of a package to GBIF. (20d937e)

  • docs: update installation instructions

Revise installation instructions to recommend using pip from GitHub
rather than conda. While installation from conda is possible, the pip
method is more straightforward. (49d3ffa)

  • docs: add examples of public-facing API usage

Add missing examples of public-facing API usage to provide users with
demonstrations of how to use the functions. (85f5364)

  • docs: encourage subscription to API mailing list

Add a note to the developer section of the README, advising maintainers
to subscribe to the EDI and GBIF API mailing lists for timely updates on
outages and changes. This ensures they can adjust expectations or the
codebase accordingly. (d62edb6)

  • docs: correct subsection formatting

Apply a small fix to ensure consistent subsection formatting throughout the
document. (71a837e)

  • docs: clarify dataset synchronization concept

Enhance clarity in the documentation regarding the concept of dataset
synchronization to preempt any potential confusion. (1ba25e7)

  • docs: update README for release

Add major missing components to the README in preparation for release. (e55cccf)

  • docs: standardize descriptions for a consistent API

Standardize function descriptions for API clarity and consistency. (947093b)

  • docs: standardize parameters for consistent API

Standardize function parameter names and definitions for a consistent public
facing API. (92a56ec)

  • docs: comment for clarity and understanding

Update code and test comments for improved clarity and understanding. (88842cb)

  • docs: revise descriptions of a few utilities

Enhance descriptions and provide examples for the utility functions
'get_local_dataset_group_id', 'get_local_dataset_endpoint', and
'get_gbif_dataset_uuid' to facilitate better understanding. (eae5990)

  • docs: fix outdated references to read_registrations

Address outdated mentions of the 'read_registrations' function that were
missed in commit 39367cc. (37745d6)

  • docs: address RTD build deprecation

Switch to 'build.os' instead of 'build.image' to address the deprecation
of the 'build.image' config key in Read the Docs. This change is
necessary for successful documentation building. (85e3fde)

Feature

  • feat: upload new and revised datasets to GBIF

Implement a new function for uploading both new and revised datasets to
GBIF. Build the workflow to handle typical conditions and edge cases.
Additionally, create two kinds of tests: extended integration tests that
make actual HTTP calls and are meant for occasional manual execution, and
tests that mock HTTP calls, which always run and provide faster results. (53219b6)

  • feat: enable registration repair on demand

Modify 'complete_registration_records' to operate on a single record
when directed to do so, rather than always processing all incomplete
registrations. (e164d13)
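
The on-demand behavior described above might look like the following minimal sketch. It is not the package's implementation: the records are plain dictionaries standing in for the registrations table, and the function name, the `repair` callable, and the `local_dataset_id` key and parameter are all assumptions made for illustration.

```python
def complete_records(records, repair, local_dataset_id=None):
    """Repair incomplete records in place and return how many were repaired.

    records: list of dicts (a stand-in for the registrations table).
    repair: hypothetical callable that fills in a record's missing values.
    local_dataset_id: if given, only the matching record is considered.
    """
    repaired = 0
    for record in records:
        if local_dataset_id is not None and record["local_dataset_id"] != local_dataset_id:
            continue  # Skip records the caller did not ask for.
        if record.get("gbif_dataset_uuid") is None:  # Incomplete registration.
            repair(record)
            repaired += 1
    return repaired


def fake_repair(record):
    # Stand-in for a real request to GBIF; the value is a placeholder.
    record["gbif_dataset_uuid"] = "uuid-for-" + record["local_dataset_id"]


records = [
    {"local_dataset_id": "edi.1.1", "gbif_dataset_uuid": None},
    {"local_dataset_id": "edi.2.1", "gbif_dataset_uuid": None},
]
only_one = complete_records(records, fake_repair, local_dataset_id="edi.2.1")
the_rest = complete_records(records, fake_repair)
```

The first call touches only the requested record; the second call, with no identifier, falls back to processing every record that is still incomplete.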

  • feat: check synchronization of local dataset w/GBIF

Report the success or failure of a dataset creation or update operation
to alert users of synchronization issues. Define success and failure by
comparing the publication date of the local dataset EML metadata
and the endpoint of the zip archive download with those of the remote
GBIF instance.

Move get_local_dataset_endpoint to utilities.py to prevent a circular
reference. (c9ebad3)
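
As a rough illustration of the comparison described above, the sketch below treats a dataset as synchronized only when both the publication date and the archive endpoint match. The function name, the dictionary keys, and the example values are assumptions, not the package's actual data structures.

```python
def matches_remote(local, remote):
    """Return True if the local and remote copies describe the same version."""
    same_date = local["publication_date"] == remote["publication_date"]
    same_endpoint = local["endpoint"] == remote["endpoint"]
    return same_date and same_endpoint


# Placeholder values for illustration only.
local = {
    "publication_date": "2024-01-17",
    "endpoint": "https://example.org/archive/2",
}
stale_remote = {
    "publication_date": "2023-06-01",
    "endpoint": "https://example.org/archive/1",
}

in_sync = matches_remote(local, dict(local))
out_of_sync = matches_remote(local, stale_remote)
```

Requiring both fields to match means a remote instance that still points at an older archive is reported as a failure even if its date happens to agree.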

  • feat: wrap get GBIF dataset details for general use

Wrap calls for GBIF dataset details to simplify response handling and
to be DRY when calling from different contexts. (c3ec165)

  • feat: post local datasets to GBIF

Publish a set of functions for posting a local dataset to GBIF and
maintaining synchronization as the local dataset evolves over time. (14336ac)

Fix

  • fix: resolve reference to dataset group, not endpoint

To retrieve the corresponding 'gbif_dataset_uuid' without errors,
use the 'local_dataset_group_id' instead of the
'local_dataset_endpoint'. Because of its one-to-one cardinality, the
'local_dataset_endpoint' does not reference previously used
'gbif_dataset_uuid' values. (5139a62)
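
The reasoning behind this fix can be sketched with a small lookup. The column names come from the notes above, but the function name, the list-of-dictionaries representation, and the example values are assumptions for illustration only.

```python
def find_group_uuid(registrations, local_dataset_group_id):
    """Return the UUID already assigned to a dataset group, or None."""
    for row in registrations:
        same_group = row["local_dataset_group_id"] == local_dataset_group_id
        if same_group and row["gbif_dataset_uuid"]:
            return row["gbif_dataset_uuid"]
    return None


# Two versions of one dataset group; only the first has been registered.
registrations = [
    {
        "local_dataset_group_id": "edi.1",
        "local_dataset_endpoint": "https://example.org/edi.1.1",
        "gbif_dataset_uuid": "uuid-a",
    },
    {
        "local_dataset_group_id": "edi.1",
        "local_dataset_endpoint": "https://example.org/edi.1.2",
        "gbif_dataset_uuid": None,
    },
]

reused = find_group_uuid(registrations, "edi.1")
```

A lookup keyed on the new version's endpoint would find no prior UUID, since each endpoint appears on exactly one row, whereas the group identifier is shared by every version.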

  • fix: update dependencies to resolve doc build failures

Update the 'autoapi.extension' to prevent the exception "'Module' object
has no attribute 'doc'" and to enable successful local and Read the Docs
documentation builds.

Pin project documentation dependencies to address the deprecation of
default project dependencies on Read the Docs (see: https://blog.readthedocs.com/newsletter-september-2023/).

Update related project dependencies and resolve associated deprecation
errors and warnings to maintain a functional code base. (f23fc23)

  • fix: use PASTA environment consistently

Use the PASTA_ENVIRONMENT variable to ensure consistent alignment of
data package references. Using different environments results in data package
reference mismatches and various errors throughout the application code. (2370ab0)

  • fix: use synchronized dataset for testing

Add a dataset that has been synchronized between EDI and GBIF to
'registrations.csv' for testing purposes. (cc30d49)

  • fix: get new uuid if it does not exist

Fix the logic in 'get_gbif_dataset_uuid' for determining an empty gbif_dataset_uuid to ensure a new value is requested if it doesn't yet exist.

Additionally, use pytest-mock to simulate both success and failure conditions for this feature. (1095985)

  • fix: address pylint messages

Address lingering pylint messages to adhere to best practices and clean
up the message log, which has become quite lengthy. (c9ab920)

  • fix: update outdated dependency files

Update the outdated dependency files to build the project without error. (fa4d11c)

Refactor

  • refactor: rename module for improved descriptiveness

Rename the 'crawl.py' module to 'upload.py' to better reflect its
purpose, which involves the user posting content to GBIF rather
than performing crawling operations. (50205d6)

  • refactor: enhance credential security in config file

  • Relocate the configuration file to an external location, removing it
    from version control to ensure the safety of credentials.
  • Introduce a 'write configuration file' helper function, which generates
    a boilerplate configuration to be completed by the user.
  • Create utility functions for loading and unloading the configuration as
    environment variables, making them accessible throughout the package.
  • Note: The current implementation doesn't fully restore the user's
    environment variables to their original state, as any variables with
    the same names will be overwritten by the load_configuration function
    and removed by the unload_configuration function. Addressing this issue
    is a potential improvement for future implementation. (dfa5e39)
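
One way the limitation in the note above could be addressed is to remember the values that loading replaces and put them back on unload. The sketch below is only an illustration of that idea, not the package's code: the function names, their signatures, and the `EXAMPLE_PASSWORD` variable are hypothetical.

```python
import os


def load_with_backup(config):
    """Set config values as environment variables; return what they replaced."""
    # None marks a variable that did not exist before loading.
    previous = {name: os.environ.get(name) for name in config}
    os.environ.update(config)
    return previous


def restore_backup(previous):
    """Undo load_with_backup using the mapping it returned."""
    for name, value in previous.items():
        if value is None:
            os.environ.pop(name, None)  # Did not exist before loading.
        else:
            os.environ[name] = value  # Restore the user's original value.


# The user already has one variable set and lacks the other.
os.environ["PASTA_ENVIRONMENT"] = "user-value"
os.environ.pop("EXAMPLE_PASSWORD", None)

backup = load_with_backup(
    {"PASTA_ENVIRONMENT": "config-value", "EXAMPLE_PASSWORD": "secret"}
)
during = os.environ["PASTA_ENVIRONMENT"]
restore_backup(backup)
```

After `restore_backup` runs, the pre-existing variable holds the user's value again and the variable introduced by the configuration is gone.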

  • refactor: expand abbreviations for clarity

Expand abbreviated references to the registrations data frame for
improved clarity and comprehension. (9ea61b1)

  • refactor: eliminate useless '_has_metadata' function

Remove the '_has_metadata' function as it does not serve a purpose.
Initially, it was designed to determine whether a local dataset group
had a member on GBIF and was used to guide decision logic concerning
resetting dataset endpoints and re-uploading metadata in the event of a
dataset update. However, it became apparent that this function returned
'True' even if only boilerplate stand-in metadata was posted to GBIF
before the actual metadata was posted during a crawl operation. (bff89ba)

  • refactor: check for 'NA' instead of 'None'

When performing decision logic (boolean operations) on values
retrieved from the registrations file, check for 'NA' rather than
'None'. This change avoids the 'boolean value of NA is ambiguous'
error that can arise after commit f23fc23, which switched from 'None'
to 'NA' values in the registrations file in preparation for a future
deprecation in pandas. (4b213cf)
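
The error mentioned above can be reproduced in isolation. The following sketch, which assumes pandas is installed, shows why a plain truthiness test breaks on 'NA' and how an explicit missing-value check avoids it. The helper names are hypothetical and not part of the package.

```python
import pandas as pd


def truthiness_fails(value):
    """Return True if using value in a boolean context raises TypeError."""
    try:
        bool(value)
    except TypeError:  # pandas raises this for pd.NA.
        return True
    return False


def has_value(value):
    """Return True if a registrations value is present (not None or NA)."""
    return not pd.isna(value)


na_is_ambiguous = truthiness_fails(pd.NA)
none_is_ambiguous = truthiness_fails(None)
```

Because `pd.isna` treats both `None` and `pd.NA` as missing, a check written this way keeps working on either side of the transition.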

  • refactor: clarify definition of 'synchronization'

Rename the 'is_synchronized' column to 'synchronized' to clarify its
meaning, shifting from "this dataset is currently synchronized with
GBIF" to "this dataset has in the past been synchronized with GBIF."
Also update the 'check_is_synchronized' function to align with this
renaming. (2cb5392)

  • refactor: internalize utilities for backwards compatibility

Internalize utility functions to reduce the risk of introducing
backward compatibility issues in the public-facing API when
refactoring the codebase. (6a71eec)

  • refactor: default synchronization value to 'False'

Change the default synchronization indicator from 'None' to 'False' to
align the code with example and test usage. (aa6353b)

  • refactor: deprecate extended validation checks

Enforce consistent validation of registration file contents using
extended checks, always. Remove the controlling parameter for this
half-implemented external repository customization feature, which we
have decided not to support. (e5be3d9)

  • refactor: separate concerns of register_dataset

Refactor 'register_dataset' to exclusively handle the registration of a
single dataset, removing the attempt to repair partially registered
datasets resulting from past registration failures. Move the repair
action to 'complete_registration_records'. This separation of
concerns improves code maintainability and usability. (12742f4)

  • refactor: enhance clarity of complete_registrations

Rename the 'complete_registrations' function and update documentation
to reflect that it handles the completion of all components within
registration records, not solely the 'gbif_dataset_uuid'. (6266ef9)

  • refactor: enhance clarity in read_registrations

Rename the 'read_registrations' function and the 'file_path' parameter
to indicate that the registrations file is being read, and to follow a
consistent call pattern being implemented throughout the codebase. (39367cc)

  • refactor: enhance clarity in the register function

  • Rename the 'register' function to explicitly indicate that it
    registers a dataset.
  • Move the 'dataset' parameter to the first position for improved
    function call readability.
  • Rename the 'file_path' parameter to better convey that it represents
    registration information as a file. (c92b496)

  • refactor: improve clarity in initialize_registrations

  • Rename the 'initialize_registrations' function to enhance understanding,
    making it clear that it initializes a file.
  • Enhance file content descriptions and their mappings to concepts in
    the EDI repository for better comprehension.
  • Move the function to the 'register.py' module, where it joins similar
    code for improved findability. (ee98c9e)

  • refactor: deprecate gbif_endpoint_set_datetime

Deprecate gbif_endpoint_set_datetime in favor of is_synchronized to
indicate the synchronization status of an EDI dataset with GBIF.

Related to c9ebad3. (2c7ea77)

  • refactor: apply read_gbif_dataset_metadata

Apply 'read_gbif_dataset_metadata' to functions requiring this
information in their custom implementations to maintain a DRY
codebase. (c5896e3)

  • refactor: rename get_gbif_datatset_details

Improve user understanding by renaming 'get_gbif_datatset_details'.
Replace the 'get' prefix with 'read' to clarify the operation as an I/O
operation with possible parsing. (b1c4786)

  • refactor: fail 'has_metadata' gracefully

Handle HTTP errors gracefully in the 'has_metadata' function to prevent
systematic failures.

Employ pytest-mock to simulate both success and failure conditions. (7381637)

  • refactor: fail 'read_local_dataset_metadata' gracefully

Handle HTTP errors gracefully in the 'read_local_dataset_metadata'
function to prevent systematic failures.

Employ pytest-mock to simulate both success and failure conditions. (a0e9515)

  • refactor: fail 'request_gbif_dataset_uuid' gracefully

Handle HTTP errors gracefully in the 'request_gbif_dataset_uuid'
function to prevent systematic failures. Employ pytest-mock to simulate
both success and failure conditions. (f1bca32)
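
The pattern shared by the three entries above (catch the HTTP error, return a neutral value, and test both outcomes with mocks) can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: it uses the standard library's `urllib` and `unittest.mock` (which pytest-mock wraps) so that it runs without extra dependencies, and the function names and URL are hypothetical.

```python
import io
import json
import urllib.error
import urllib.request
from unittest import mock


def read_json(url):
    """Return the parsed JSON body at url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.URLError as error:  # HTTPError is a subclass of URLError.
        print(f"Request to {url} failed: {error}")
        return None


def simulate_success():
    # Replace urlopen with a mock whose response body is fixed JSON.
    with mock.patch("urllib.request.urlopen") as fake:
        fake.return_value.__enter__.return_value.read.return_value = b'{"title": "demo"}'
        return read_json("https://example.org/dataset")


def simulate_failure():
    # Make urlopen raise an HTTP 503 error instead of returning a response.
    error = urllib.error.HTTPError(
        "https://example.org/dataset", 503, "Service Unavailable", None, io.BytesIO(b"")
    )
    with mock.patch("urllib.request.urlopen", side_effect=error):
        return read_json("https://example.org/dataset")
```

Neither helper touches the network, so both the pass and fail paths can be exercised offline.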

  • refactor: check for metadata before replacing

Prior to replacement, verify the presence of a metadata document in
GBIF. This precaution prevents potential errors when attempting to
replace a metadata document that does not currently exist. (01ddd6c)

  • refactor: reorder func params for better semantics

Reorder the parameter positions of functions in the 'crawl' module to
align function calls more effectively with the underlying semantics. (45f94c6)

Test

  • test: eliminate empty test module

Remove the empty 'test_validate.py' module as testing for these routines
is consolidated in the 'test__utilities.py' module. (4e11f80)

  • test: verify reuse of 'gbif_dataset_uuid' for updates

Confirm that the 'register_dataset' function reuses the
'gbif_dataset_uuid' for members sharing the same
'local_dataset_group_id', enabling updates of the GBIF
dataset instance. (f7210e1)

  • test: register the first dataset w/o error

Test that the 'complete_registration_records' function works for the
scenario of the first dataset to ensure that this situation does not
trigger an error. (3030cf9)

  • test: validate iterative registration repair

Execute 'complete_registration_records' on a registrations file
containing two incomplete registrations to verify the functionality of
iterative registration repair under this specific use case. (07e2e67)

  • test: include missing test for failed registration

Add a test case to verify that a failed registration does not write a
GBIF dataset UUID to the registrations file, returns 'NA', and does not
raise an exception. (e974ac9)

  • test: share fixtures with conftest.py

Utilize 'conftest.py' for sharing test fixtures, currently isolated
within test modules. (e067bae)

  • test: mock HTTP requests for 'register'

Utilize pytest-mock to simulate both success and failure conditions for
the 'register' function. (dabeec1)

  • test: use pytest-mock to mock tests

Utilize pytest-mock to mock tests that involve remote API calls,
allowing tests to run even when offline. This approach enhances the
ability to thoroughly examine both pass and fail conditions. (4943677)