exportBulkRecords exportRecordsTyped choice string does not appear to be formatted for choices redcapAPI_2.8.5 #344

Closed
dpfindll opened this issue Mar 13, 2024 · 14 comments

@dpfindll

Afternoon,
I'm running into an issue exporting my data with exportBulkRecords and exportRecordsTyped. I suspect I'm doing something wrong and that this is a simple fix. Is this a problem with my data in REDCap or with the updated API? Any help would be appreciated.
[Image: Screenshot 2024-03-13 150850]
Thanks,
David

@nutterb
Collaborator

nutterb commented Mar 14, 2024

@dpfindll, could you post a screenshot of how the choices are defined for the giq_1a field?

The error indicates that there is something in the choices definition that violates the assumption in our text matching. It would help to see this definition in order to diagnose the problem.

@spgarbet
Member

This will show it as well:

x <- rcon$metadata()
x$select_choices_or_calculations[x$field_name == "giq_1a"]

@dpfindll
Author

[Image: Screenshot 2024-03-14 084210]

@spgarbet
Member

I think the code is confused because the value is repeated in the text description. Here's the source of the rejection of the select_choices in the library.

> REGEX_MULT_CHOICE_STRICT <- "^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$"
> x <- "0,0 - Not at all|1|2|3|4|5|6, 6 - Very"
> grepl(REGEX_MULT_CHOICE_STRICT, x)
[1] FALSE

@spgarbet
Member

Possible workaround for now: Change it to "0,0 - Not at all|1,|2,|3,|4,|5,|6, 6 - Very"

Something for @nutterb and me to consider is whether this is an error in the library.
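
One way to confirm the workaround before editing the project is to run both strings through the strict pattern quoted above. This is only a minimal sketch: `perl = TRUE` is an assumption made here so that the pattern is interpreted with PCRE semantics, and the variable names are illustrative.

```r
# Strict [code],[label] pattern, as quoted earlier in this thread
REGEX_MULT_CHOICE_STRICT <- "^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$"

original   <- "0,0 - Not at all|1|2|3|4|5|6, 6 - Very"
workaround <- "0,0 - Not at all|1,|2,|3,|4,|5,|6, 6 - Very"

# TRUE means the string is accepted as a strict choice definition
original_ok   <- grepl(REGEX_MULT_CHOICE_STRICT, original, perl = TRUE)
workaround_ok <- grepl(REGEX_MULT_CHOICE_STRICT, workaround, perl = TRUE)

print(c(original = original_ok, workaround = workaround_ok))
```

Note that this workaround string leaves choices 1 through 5 with empty labels, since nothing follows their commas.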

@dpfindll
Author

Thank you for the quick response. Is there a way to make this change through the API, or does it need to be made within the REDCap project itself? I have many fields formatted this way.

@spgarbet
Member

This is a REDCap project change. I think the current version of REDCap does it in the format we expected, but older projects might have had definitions like this. I suspect you set this project up a few years back (please correct me if that's wrong). If we patch the library it'll take a few days to get something through the process.

@dpfindll
Author

Correct, this is an older project. Thank you for all the quick responses and your time.

@nutterb
Collaborator

nutterb commented Mar 14, 2024

We were so close to having this right! We have two regexes for evaluating the multiple choice strings:

REGEX_MULT_CHOICE_STRICT <- "^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$"
REGEX_MULT_CHOICE <- "^(^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$|^(?:[^|,]+\\|)+[^|,]+$)$"

The strict form matches the current format of [code],[label]. The non-strict form also matches the older format of [label] alone. But this particular case uses both in the same definition.
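
To see how the two patterns treat each style, the sketch below runs them against one strict, one legacy, and one mixed string. The strict and legacy example strings are made up for illustration (the mixed one is the string from this issue), and `perl = TRUE` is an assumption made here so that the patterns are interpreted with PCRE semantics.

```r
# Both patterns as quoted in this comment
REGEX_MULT_CHOICE_STRICT <- "^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$"
REGEX_MULT_CHOICE <- "^(^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$|^(?:[^|,]+\\|)+[^|,]+$)$"

examples <- c(
  strict = "0, No|1, Yes",                            # invented example
  legacy = "1|2|3",                                   # invented example
  mixed  = "0,0 - Not at all|1|2|3|4|5|6, 6 - Very"   # string from this issue
)

# One row per example string, one column per pattern
result <- data.frame(
  strict_regex  = grepl(REGEX_MULT_CHOICE_STRICT, examples, perl = TRUE),
  lenient_regex = grepl(REGEX_MULT_CHOICE, examples, perl = TRUE),
  row.names     = names(examples)
)
print(result)
```

The mixed string is rejected by both patterns, because neither alternative allows [code],[label] entries and bare labels within the same definition.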

I'll see if I can blend them, and then determine what kind of chaos it creates downstream.

@dpfindll you say you have a lot of fields formatted like this. Are they all in this mixed style? It looks like things would start to work if you only fixed the fields in the mixed style (though if that's still a lot of fields, it may not be very feasible).

@dpfindll
Author

@nutterb we have several thousand fields, but it looks like it's only this particular form, with 50 fields in the mixed style. I will try standardizing on [code], [label].
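
For a project with thousands of fields, it may help to list which multiple choice fields fail the strict pattern before editing anything. The sketch below uses a made-up metadata data frame with the column names that appear elsewhere in this thread; with a real project, `meta` would presumably be replaced by `rcon$metadata()`. `perl = TRUE` is an assumption made here, not something taken from the library.

```r
# Strict [code],[label] pattern, as quoted earlier in this thread
REGEX_MULT_CHOICE_STRICT <- "^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$"

# Toy stand-in for rcon$metadata(); all rows except giq_1a are invented
meta <- data.frame(
  field_name = c("giq_1a", "sex", "age", "rating"),
  field_type = c("radio", "dropdown", "text", "radio"),
  select_choices_or_calculations = c(
    "0,0 - Not at all|1|2|3|4|5|6, 6 - Very",
    "0, Female|1, Male",
    NA,
    "1|2|3"
  ),
  stringsAsFactors = FALSE
)

is_choice_field <- meta$field_type %in% c("checkbox", "dropdown", "radio")
is_strict <- grepl(REGEX_MULT_CHOICE_STRICT,
                   meta$select_choices_or_calculations,
                   perl = TRUE)

# Multiple choice fields that would need converting to [code], [label]
needs_fix <- meta$field_name[is_choice_field & !is_strict]
print(needs_fix)
```

Here the mixed field and the pure legacy field are both flagged, while the text field is ignored because it is not a multiple choice field.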

@nutterb
Collaborator

nutterb commented Mar 14, 2024

So I managed to put together a very ugly regex that, in principle, satisfies the constraint of matching choice strings with a mix of legacy and strict formats. After really thinking about it, though, permitting this mix essentially decimates any meaningful assumptions about what a choice string should look like. I think we could just as well reduce this check to "neither starts nor ends with a comma or pipe."

This also means that if a user formats a set of choices in a way that is problematic to parse (for instance, by putting a pipe character in a label), then instead of throwing an error that the string can't be interpreted as choices, the library would create an incomprehensible choice matrix and spit out a lot of missing values. And it would do so quietly, without any notice to the user about what is wrong. I'm actually quite confident that we would come to regret this change.
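
The risk described here can be illustrated with a label that contains a pipe. In the sketch below, `LAX_CHECK` is a hypothetical pattern written for this example to approximate "does not start or end with a comma or pipe"; it is not taken from redcapAPI, and `perl = TRUE` is likewise an assumption.

```r
# Strict [code],[label] pattern, as quoted earlier in this thread
REGEX_MULT_CHOICE_STRICT <- "^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$"

# Hypothetical relaxed check: first and last characters are not "," or "|"
LAX_CHECK <- "^[^,|](?:.*[^,|])?$"

# Intended as two choices, but the first label contains a pipe
bad <- "1, A|B | 2, C"

strict_ok <- grepl(REGEX_MULT_CHOICE_STRICT, bad, perl = TRUE)
lax_ok    <- grepl(LAX_CHECK, bad, perl = TRUE)

# Splitting on the pipe yields three pieces, one of which has no code
pieces <- trimws(strsplit(bad, "|", fixed = TRUE)[[1]])

print(list(strict_ok = strict_ok, lax_ok = lax_ok, pieces = pieces))
```

The strict pattern rejects this string, so the user gets an error. The relaxed check accepts it, and the subsequent split produces a stray "B" entry that no longer corresponds to any intended choice.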

I think this may be a case where we are better off developing a script to convert the metadata into the strict format so that it can be re-uploaded to the project and work correctly.

redcapAPI/R/constants.R

Lines 101 to 134 in 515c39c

# REGEX_MULT_CHOICE - matches acceptable formats for multiple choice options,
# to include formats that use only the label. See Issue 145.
# It's a good idea to trim whitespace before using this.
# Explanation -
# ^ : Start of string
#
# [^\\|]+ : A sequence of characters that does not start with a pipe
# , : A literal comma
# [^\\|] : A sequence of characters that does not end with a pipe
# [^\\|]+,[^\\|]* : The composition of the expected strict format. Specifically,
# characters, comma, characters. The * at the end makes it
# non greedy, meaning it will match if there are no characters
# after the comma.
#
# [^|,]+ : any number of characters, but the sequence may not
# include a pipe or comma
# [^\\|] : A set of characters that does not end in a pipe
# [^|,]+[^\\|]* : The composition of the legacy unstrict format. In earlier
# versions of REDCap, choices could be denoted to have the same
# code and label with the syntax 1|2|3
#
# (?:\\|...) : Match a repeating pattern of pipe, characters, up to the next pipe
#
# $ : end of string
#
# Now let's put these pieces together
# ([^\\|]+,[^\\|]*|[^|,]+[^\\|]*) : the regex to match either the strict or the non-strict pattern
# This will appear twice. First, we will use it to catch the
# first instance, then we will put it in the
# (?:\\|...) block to catch any remaining instances.
# The * at the end makes it non-greedy so that it will match even if
# there is only one choice defined.
REGEX_MULT_CHOICE <- "^([^\\|]+,[^\\|]*|[^|,]+[^\\|]*)(?:\\|([^\\|]+,[^\\|]*|[^|,]+[^\\|]*))*$"

@nutterb
Collaborator

nutterb commented Mar 14, 2024

> @nutterb we have several thousand fields, but it looks like it's only this particular form with 50 fields in the mixed style. I will try standardizing on [code], [label]

I'm going to write up a function to convert your metadata. You should be able to upload it to your project (with approval from your REDCap Administrator) and convert these without quite as much effort. Stay tuned....

@nutterb
Collaborator

nutterb commented Mar 14, 2024

@dpfindll

Try running this and let me know if that corrects the formats of the fields with the mixed formats. In fact, this should convert all of the legacy formatted variables to the modern format.

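fixMultipleChoiceFormat <- function(metadata, 
                                    filename = NULL){
  # Strict [code],[label] pattern, as quoted earlier in this thread. Fields
  # are tested against it so that pure legacy strings are selected too.
  strict <- "^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$"
  choices <- metadata$select_choices_or_calculations
  
  # Multiple choice fields whose choice string is not fully in strict format
  w <- which(!is.na(choices) & 
               !grepl(strict, choices) & 
               metadata$field_type %in% c("checkbox", "dropdown", "radio"))
  
  for (i in w){
    # Split into individual choices and trim surrounding whitespace
    ch <- trimws(unlist(strsplit(choices[i], "\\|")))
    # Choices without a comma are legacy labels; reuse the label as the code
    ch_bad <- which(!grepl(",", ch))
    ch[ch_bad] <- sprintf("%s, %s", ch[ch_bad], ch[ch_bad])
    
    metadata$select_choices_or_calculations[i] <- paste0(ch, collapse = " | ")
  }
  
  if (is.null(filename)){
    return(metadata)
  }

  # Write missing values as empty cells rather than the literal text "NA"
  write.csv(metadata, 
            file = filename, 
            row.names = FALSE, 
            na = "")
}

fixMultipleChoiceFormat(rcon$metadata())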
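
To make the effect concrete, here is the core transformation applied to the single choice string from this issue. `convert_choice_string` is a hypothetical helper written for this illustration and is not part of redcapAPI; `perl = TRUE` in the final check is also an assumption.

```r
# Strict [code],[label] pattern, as quoted earlier in this thread
REGEX_MULT_CHOICE_STRICT <- "^[^\\|]+,[^\\|]*(?:\\|[^\\|]+,[^\\|]*)*$"

# Hypothetical helper: convert one choice string to [code], [label] form
convert_choice_string <- function(x) {
  # Split on the pipe and trim whitespace around each choice
  ch <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])
  # A choice without a comma is a bare label; reuse it as its own code
  legacy <- !grepl(",", ch, fixed = TRUE)
  ch[legacy] <- sprintf("%s, %s", ch[legacy], ch[legacy])
  paste(ch, collapse = " | ")
}

mixed     <- "0,0 - Not at all|1|2|3|4|5|6, 6 - Very"
converted <- convert_choice_string(mixed)
print(converted)

# The converted string should now pass the strict check
converted_ok <- grepl(REGEX_MULT_CHOICE_STRICT, converted, perl = TRUE)
print(converted_ok)
```

Each bare label such as `1` becomes `1, 1`, so existing stored values keep their meaning while the definition becomes parseable as strict format.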

@spgarbet added the question label and removed the bug label on Mar 14, 2024
@spgarbet
Member

spgarbet commented Mar 14, 2024

> I'm actually quite confident that we would come to regret this change.

That being the case, maybe the repair should be documented in the troubleshooting section. Could it be a support function, e.g. repairLegacyChoiceDefinitions, that becomes part of the library, with the documented solution referencing it?
