Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove multiple duplicates in choice sheets #303

Merged
merged 2 commits into from
Sep 27, 2024

Conversation

Sujanadh
Copy link
Contributor

Issue:

Description:

This PR addresses the issue of duplicate combinations of list_name and name fields in the "choices" sheet when allow_choices_duplicates is enabled. It ensures that only unique combinations are retained after the merging process, improving the accuracy of the choices sheet and preventing redundancy.

Key Changes:

Duplicate Removal in Choices Sheet:

Added logic to remove duplicates based on the list_name and name columns when processing the "choices" sheet.
Implemented using the drop_duplicates(subset=['list_name', 'name']) method, ensuring the choice lists have unique entries.

Before the Fix:
When allow_choices_duplicates was enabled, the "choices" sheet could have multiple duplicate entries for the same list_name and name combination, leading to redundant data in the final output.

After the Fix:
The function now ensures that only unique combinations of list_name and name are retained, improving the integrity of the choices data while retaining necessary survey structure.

@Sujanadh Sujanadh requested a review from spwoodcock September 26, 2024 12:13
@Sujanadh Sujanadh self-assigned this Sep 26, 2024
@github-actions github-actions bot added the bug Something isn't working label Sep 26, 2024
@nrjadkry nrjadkry merged commit 491c6e8 into main Sep 27, 2024
4 of 5 checks passed
@nrjadkry nrjadkry deleted the fix/remove-duplicates-choices branch September 27, 2024 04:55
@spwoodcock
Copy link
Member

Isn't this regressing to the same functionality / issue we had before I removed the duplicate check? @manjitapandey had a form where yes/no choices had a different list name, but they were stripped out as we already have yes/no choices in our mandatory fields (giving an invalid form). I removed the duplicate check as it's valid to have duplicates in the choices sheet. Is this causing issues somehow?

@Sujanadh
Copy link
Contributor Author

Sujanadh commented Sep 27, 2024

We used to check if name field in choice sheet is unique or not, and removed other duplicates despite different list_name and we changed allow_duplicate_fields to True resulting into multiple same choices resulted in to the error of not displaying the label , may be its due to missing translation but I am not sure, removing duplicated yes_no list name worked as expected.

@Sujanadh
Copy link
Contributor Author

So in this PR we check for not only unique name field but the unique combination of name and list_name , so that we can have multiple yes_no choices with different list_name.

@manjitapandey
Copy link

manjitapandey commented Sep 27, 2024

Since its not because of the translation, rather the duplication of option. I believe we are good to go ahead with merging the PR and we also tested it on local the questions are visible now and no duplication of options.
@Sujanadh

@spwoodcock
Copy link
Member

Noted Sujan, thanks for clarifying!

I can see in this form screenshot there is clearly an issue with the translation fields not being filled. Having a combination of both filled label and empty label::English etc will surely mean the label doesn't display at all (I could be wrong, but I thought I have seen this behaviour in xlsforms)
Screenshot_20240928-232243_1

@rsavoye
Copy link
Collaborator

rsavoye commented Sep 28, 2024

This may not make a difference, but you're supposed to have the 2 letter ISO abbreviation, so it'd be label::English(en), etc...

@Sujanadh
Copy link
Contributor Author

For now, i have released the recent fix to work around, we can see this translation issue later on if we encounter related issue again.

@Sujanadh
Copy link
Contributor Author

Sujanadh commented Sep 30, 2024

This issue was originally due to multiple label field for english lang, custom form already had label :: english(1) and our mandatory form has label :: english(en) . ODK Collect does not support mixed naming conventions for the label column.
The correct format for specifying labels in different languages is label::language(code), such as label::English (en) for English.

image

@Sujanadh
Copy link
Contributor Author

proposed solution would be to detect invalid label and convert them to valid labels while injecting mandatory fields.

@spwoodcock
Copy link
Member

The (1) is appended by Pandas when merging two dataframes that have the same column, but both have values present (so they can't merge into one column).

We have the same when we had conflicting values in the save_to column.

Perhaps we need to specify one of the columns to take precedent over the other, likely the user uploaded XLSForm value.
(no need to do column name correction, as this is too complicated and a bit too 'magic')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants