Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

230 cml refactor contributor edit metadata form affiliations handling #720

Conversation

asuworks
Copy link
Contributor

TESTING CONTRIBUTORS

e2e (add some cypress tests later maybe?)

  1. ADD CONTRIBUTOR
    1. From Scratch (New Contributor Form)
      1. with a unique email (non existent in DB)
        1. assert frontend:
          1. all fields are correctly displayed in UI
          2. all fields are enabled (editable)
        2. assert backend:
          1. all fields are correctly stored in DB
          2. update/edit is allowed
      2. with an existing Contributor.email (Contributor is used in a release)
        1. assert frontend:
          1. error message displayed: Contributor with this email already exsists!
        2. assert backend:
          1. update/edit is not allowed: backend should not allow PUT request to /contributors/ if a Contributor with the same email exists and is used by another release
      3. with an existing Contributor.email (Contributor is NOT used in a release)
        1. assert frontend:
          1. all fields are correctly displayed in UI
          2. all fields are enabled (editable)
        2. assert backend:
          1. existing Contributor fields will overwritten, since it is not used in any release
          2. all fields are correctly stored in DB
          3. update/edit is allowed
      4. with an existing AuthUser.email [NOT IMPLEMENTED] [Current bahavior: a new Contributor is created and not linked to existing user]
        1. assert frontend:
          1. error message displayed: a User with this email exists, look it up!
        2. assert backend
          1. update/edit is not allowed
    2. From existing Contributor
      1. Contributor is USED in at least in ONE other release
        1. assert frontend:
          1. autocomplete fields in the Contributor lookup are correctly displayed in UI
          2. fields in the edit form are disabled
        2. assert backend:
          1. Contributor is NOT created
          2. ReleaseContributor IS created
          3. update/edit is not allowed
      2. Contributor is NOT USED in any release
        1. assert frontend:
          1. autocomplete fields in the Contributor lookup are correctly displayed in UI
        2. assert backend:
          1. Contributor is NOT created
          2. ReleaseContributor IS created
          3. update/edit is allowed
    3. From existing User:
      1. a Contributor with corresponding user_id EXISTS:
        1. assert frontend:
          1. fields are correctly prefilled
          2. all UI fields are disabled
        2. assert backend:
          1. Contributor is NOT created (existing Contributor is reused)
          2. ReleaseContributor IS created
          3. update/edit is not allowed
      2. Contributor with corresponding user_id DOESN'T EXIST:
        1. assert frontend:
          1. fields are correctly prefilled
          2. all UI fields are disabled
        2. assert backend:
          1. Contributor IS created
          2. ReleaseContributor IS created
          3. update/edit is not allowed
  2. CREATE DUPLICATE (by username, email...) CONTRIBUTORS is not allowed
    1. assert frontend:
      1. show error message
    2. assert backend:
      1. do not allow duplicates
  3. EDIT CONTRIBUTOR
    1. Contributor updates/edits are allowed ONLY if Contributor belongs ONLY TO ONE RELEASE (this release)
  4. DELETE CONTRIBUTOR
    1. check that Contributors can be deleted
    2. check that the last Contributor cannot be deleted
  5. REARRANGE CONTRIBUTORS
    1. check if drag and drop rearranging of Contributors in the UI is persisted in form of ReleaseContributor.index in the backend

@asuworks asuworks requested review from alee and sgfost May 17, 2024 18:34
@asuworks asuworks self-assigned this May 17, 2024
Copy link
Contributor

@sgfost sgfost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just rename json_affiliations to affiliations now and change the old affiliations to affiliation_tags and add a deprecation comment/help_text? A migration needs to be added anyways


times_used_in_codebases = distinct_codebases.count()
return times_used_in_codebases <= 1
def _is_not_used_or_used_by_current_release_only(self, contributor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're only allowing modifying on the release

def _is_exclusive_to_one_release(self, obj):
  ReleaseContributor.objects.filter(contributor=obj).values_list("release", flat=True).distinct().count() <= 1

would be a lot more concise and should do the same thing assuming its used on updates

# 'type'
# 'affiliations'
# 'json_affiliations'
user_id = validated_data.pop("user_id", None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the user_id stay in validated_data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is from the update() action and this only deals with the case of updating a Contributor which was created from an existing user, I think we shouldn't allow changing Contributor-AuthUser relation (user_id) in this case, and also any of the other user related attributes for the sake of consistency of Contributor-AuthUser fields. .
When a new Contributor is created from an existing AuthUser the user_id will be set inside the create() action which gets the whole user as parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it remove the existing user relation from the contributor? It reads like that but idk if that actually happens

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it doesn't remove the user_id from contributor

validated_data.pop("given_name", None)
validated_data.pop("family_name", None)
if not self._is_not_used_or_used_by_current_release_only(instance):
if self._trying_to_modify_attributes(instance, validated_data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this update() get called when we do stuff like re-arranging/changing roles? I'm not sure what the is_trying_to_modify.. check is for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, update() is called when contributors are rearranged and roles are changed too.
If a contributor is already "used" for another codebase, we still need to be able to add him to any other codebase, just need to make sure that the PUT request (which is called on that newly added contributor) doesn't modify any of it's attributes. Hence the naming: _trying_to_modify_attributes()

we are trying to squeeze to much functionality through one route /contributors...
Next improvement could be to split CREATE and UPDATE actions into 2 different routes in DRF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if its possible to change something in the ReleaseContributorSerializer so that update() on the contributor serializer doesn't get invoked when we are just changing the index/roles/etc

This probably works, the method to check whether we are updating in the update() method just felt wrong lol

@alee alee force-pushed the 230-cml-refactor-contributor-edit-metadata-form-affiliations-handling branch from b051254 to 58b7ec9 Compare May 22, 2024 19:16
@alee alee marked this pull request as ready for review May 23, 2024 04:37
Copy link
Member

@alee alee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM, thanks @asuworks !

after this gets merged and populate_contributor_affiliations is run we should set up a data migration to remove the old contributor.affiliations tags + models and rename json_affiliations -> affiliations as @sgfost mentioned but that can be in a followup PR + issue

@@ -214,7 +234,7 @@ def to_codemeta(self):
if self.orcid_url:
codemeta["@id"] = self.orcid_url
if self.affiliations.exists():
codemeta["affiliation"] = self.formatted_affiliations
codemeta["affiliation"] = self.formatted_json_affiliations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a rich object schema.org/Organization, maybe something like:

affiliation = self.json_affiliations[0] # assuming it exists
{
"@type": "Organization",
"@id": affiliation.ror_id,
"identifier": affiliation.ror_id,
"name": affiliation.name,
"url": affiliation.url,
"sameAs": affiliation.ror_id
}

@alee
Copy link
Member

alee commented May 23, 2024

a few minor notes:

  1. adding a new contributor while searching for an existing User should use their affiliations to populate the affiliations search box if they already have them (either in MemberProfile or Contributor) but this kind of brings up some messy data duplication. How should we handle Contributor -> User -> MemberProfile.affiliations vs Contributor.json_affiliations?
  2. (minor issue) saving seems to take a while on staging but this could be unique to staging
  3. in local testing using the "Search for an Existing User" enters details but disables all data entry fields which makes "adjustments" to the fields impossible and also makes the "Contributors can only be updated when exclusive to one codebase" error message unnecessary

image

  1. on staging it appears that "Search for an Existing User" still allows for updates to those fields, not sure why there's a difference between local and staging?

image

@sgfost
Copy link
Contributor

sgfost commented May 23, 2024

An affiliations model either expanding the json (don't remember why it did get turned into a json field, now that I think about it) or just wrapping the json field is probably the only way to reasonably eliminate duplication there. I don't think theres any need to worry about syncing things aside from copying over user.mp.affiliations on contributor creation

I'd bet the deletion and re-creation of ReleaseContributors for every request is making things a bit slower than they should be, one of the things to look into when refactoring the contributor route

@alee
Copy link
Member

alee commented May 25, 2024

duplicate set bug still present if you manage to click save twice (reproducible on staging where it takes a minute to process)

image

@asuworks
Copy link
Contributor Author

a few minor notes:

  1. adding a new contributor while searching for an existing User should use their affiliations to populate the affiliations search box if they already have them (either in MemberProfile or Contributor) but this kind of brings up some messy data duplication. How should we handle Contributor -> User -> MemberProfile.affiliations vs Contributor.json_affiliations?

It could be the fact that the fake Contributors with a user_id from our DB have totally different values from the corresponding Users table. Could you show an example of duplication you are talking about?

  1. (minor issue) saving seems to take a while on staging but this could be unique to staging
    yes...
  1. in local testing using the "Search for an Existing User" enters details but disables all data entry fields which makes "adjustments" to the fields impossible and also makes the "Contributors can only be updated when exclusive to one codebase" error message unnecessary

This is because of 'bad' data in our DB (you selected a user who already has a Contributor with that user_id, but their fields are totally different). Normally, this shouldn't happen in real life. So when you are trying to create add a Contributor from existing User following happens:

  1. The form is populated from AuthUser (Contributor.json_affiliations will also be populated from MemberProfile.affiliations)
  2. When you click "Save": a new Contributor and a new ReleaseContributor will be created with User data.

The idea behind disabling the UI fields is: we shouldn't allow changing fields if a Contributor is created from an existing User.

What we probably should do in PROD before deploying this, is to sync all fields for all existing Contributors with a user_id with the corresponding AuthUser fields + mp.affiliations. Copy fields from User to Contributor

  1. on staging it appears that "Search for an Existing User" still allows for updates to those fields, not sure why there's a difference between local and staging?

My bet is that James Martin has been created manually as a new Contributor and not through existing User. Can you try deleting him and adding him again. Are the fields still editable?

@asuworks
Copy link
Contributor Author

duplicate set bug still present if you manage to click save twice (reproducible on staging where it takes a minute to process)

image

True story! This is a repeated request bug. I will make a quick fix in frontend (move "Save" button disabling right after it has been clicked)

What about the backend: should we use redis to store requests based on ip? Create some sort of middleware for repeated requests?

@alee alee force-pushed the 230-cml-refactor-contributor-edit-metadata-form-affiliations-handling branch from 5f24068 to 364a346 Compare May 29, 2024 07:58
@asuworks
Copy link
Contributor Author

asuworks commented May 30, 2024

_is_not_used_or_used_by_current_release_only() was doing backend validation: is the user allowed to change the Contributor or not?

is_mutable() doesn't properly check whether the current user is allowed to modify the Contributor. Contributor.user_id could help us avoid a DB call (is_mutable will return False if user_id is not None) only in the case when Contributor was created from a User. What about other cases? (Manually created Contributor, contributor from existing Contributor without user_id?)

Example:
Say, User_A added to his CodebaseRelease_A a manually created Contributor_A (not used for any other Codebase).
Say, User_B wants to add Contributor_A. is_mutable() will return True, since it is used in only one Codebase (Codebase_A) thus allowing to modify a foreign Contributor (which should not happen).

I do not see a way to avoid a DB call in is_mutable(), since we do not have the info (wether the incoming Contributor is used somewhere else or not).

Any other ideas?

@sgfost
Copy link
Contributor

sgfost commented May 30, 2024

I think the only other way of doing something similar without a query was to disallow updating at all once the contributor has been created (obj.id is not None). The case where a contributor needs to be updated because of a mistake of something is probably rare, but on the other hand will create junk data since they'll need to create another.

Only other idea I have (not new, this is or is similar to one of the first things @asuworks proposed) is to redo the contributor structure in order to simplify everything by 'flattening' release contributor and contributor into one model which has

  • id
  • release
  • roles
  • index
  • include_in_citation
  • affiliations
  • user OR name and email (either don't store these for user-tied contributors or set it on create and make it immutable)

The downside to this was that it means 'duplicate' contributors for the same person, but all things considered now I think its much cleaner

@alee
Copy link
Member

alee commented May 30, 2024

_is_not_used_or_used_by_current_release_only() was doing backend validation: is the user allowed to change the Contributor or not?

is_mutable() doesn't properly check whether the current user is allowed to modify the Contributor. Contributor.user_id could help us avoid a DB call (is_mutable will return False if user_id is not None) only in the case when Contributor was created from a User. What about other cases? (Manually created Contributor, contributor from existing Contributor without user_id?)

Example: Say, User_A added to his CodebaseRelease_A a manually created Contributor_A (not used for any other Codebase). Say, User_B wants to add Contributor_A. is_mutable() will return True, since it is used in only one Codebase (Codebase_A) thus allowing to modify a foreign Contributor (which should not happen).

I do not see a way to avoid a DB call in is_mutable(), since we do not have the info (wether the incoming Contributor is used somewhere else or not).

Any other ideas?

I don't think we need to care - the only case where we might want to allow editing of an existing Contributor is for the person who originally submitted that Contributor. All other cases, make it immutable. But since we aren't capturing that information I think we should simplify it to, "All existing Contributors or Contributors created from a User should be immutable". The only other situation where it might make sense for mutability is if we are adding data, like Affiliations, but I think it's fine to simply leave it out.

@sgfost I think the flatten Contributors and ReleaseContributors path is a non-starter, we're already overdue on this particular task which was a bit of an iceberg tbf and I'm not entirely convinced that flattening them is a cleaner design either, just favoring one set of design tradeoffs for another.

@asuworks
Copy link
Contributor Author

asuworks commented May 30, 2024

That will make things a lot easier, but less intuitive: why can't I edit a Contributor, I have just manually created?
In the frontend: "All existing Contributors or Contributors created from a User should be immutable" will translate to basically not allowing any edits at all.

def is_mutable():
    return obj.id is None
    

Is that what we want?

@asuworks
Copy link
Contributor Author

asuworks commented May 30, 2024

in fact, we could drop the update() method completely

@sgfost
Copy link
Contributor

sgfost commented May 30, 2024

All existing Contributors or Contributors created from a User should be immutable

I think that is perfectly fine for now. Just show a note on the form or only show roles/citation when editing

in fact, we could drop the update() method completely

Yep, was just typing this out, instance.id should never be None in update(). Ideally completely disable update by raising an error but since this get called regardless I think we still need that 'is modifying' check

def update(self, instance, validated_data):
  if trying_to_modify(instance, validated_data):
    raise ValidationError(..) # or MethodNotAllowed?
  return instance

@alee alee force-pushed the 230-cml-refactor-contributor-edit-metadata-form-affiliations-handling branch 2 times, most recently from 127d445 to bcda087 Compare June 11, 2024 08:02
@@ -111,7 +111,7 @@
</form>
</div>
<div class="modal-footer border-0">
<button type="button" class="btn btn-outline-gray" @click="resetContributor">Reset</button>
<button v-if="!disableEditForm" type="button" class="btn btn-outline-gray" @click="resetContributor">Reset</button>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this unfortunately makes the reset button disappear once you search for and select a user in the "Create New Contributor" pathway which the only use case you'd want for to begin with

asuworks and others added 6 commits June 12, 2024 14:07
CREATE: ReleaseContributors can be created from

- existing Contributors; editing Contributor fields is prohibited
- existing Users; editing Contributor fields prohibited
- from scratch; editing is permitted for fresh ReleaseContributors. if
  email is in use, notify user

only ReleaseContributor fields (Role, index) are editable for any
existing ReleaseContributors

- fix duplication bug in sortable plugin, closes
  comses/planning#230

- clean up Add Contributor Form and ContributorPage UI
- use collections.defaultdict(list) to manage multiple contributor
affiliations

TODO: add python request retry with HTTPAdapter
move extraction of ROR API return data into an affiliation dict
generated by the to_affiliation method

adjust scoring rubric to use ratio for both the ROR API score and
fuzz.partial_ratio and convert to an OR; for the most part any score
above 80 seems to be a reasonable match and an improvement on the
existing tag based affiliation data
- improve person / organization switching
- disable editing Contributor attributes from UI
- remove spinner on failed server-side validation
- only ReleaseContributor attributes (index, is_citable and roles) can
  be modified from UI
- fix bug in contributor serializer when submitting an organization with
  only a given name and no email
- use ROR affiliations search to generate Contributor.given_name,
  limit jsonAffiliations and ResearchOrgListField to a single entry
- add Reset button, only available for new ReleaseContributors, hidden
  through props when editing

closes comses/planning#209

Co-authored-by: Scott Foster <sgfost@users.noreply.github.com>
Co-authored-by: Allen Lee <alee@users.noreply.github.com>
tests cannot install django debug toolbar which is included in the dev
settings

```
ERRORS:
?: (debug_toolbar.E001) The Django Debug Toolbar can't be used with
tests
    HINT: Django changes the DEBUG setting to False when running tests.
    By default the Django Debug Toolbar is installed because DEBUG is
    set to True. For most cases, you need to avoid installing the
    toolbar when running tests. If you feel this check is in error, you
    can set `DEBUG_TOOLBAR_CONFIG['IS_RUNNING_TESTS'] = False` to bypass
    this check.
```
TODO:
1. populate json_affiliation with ROR lookup on data currently in
   affiliations via `./manage.py populate_contributor_affiliations`
2. add migrations to replace affiliations with json_affiliations

Co-authored-by: Anton Suharev <asuworks@users.noreply.github.com>
@alee alee force-pushed the 230-cml-refactor-contributor-edit-metadata-form-affiliations-handling branch from 1f79d75 to 719f05f Compare June 12, 2024 22:00
@alee alee merged commit 0f39e59 into comses:main Jun 12, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants