Separate output #43
…usSTR into readme_update
In addition to separating files, this PR changes how lusSTR handles missing data. Previously, it dropped any allele with 0 reads. However, EuroForMix needs this data, so lusSTR no longer drops any allele from the output.
This is ready for review @standage. I also plan to release the next version of lusSTR... it's been a minute since I last did that. :)
- data = uas_load(infile, snp_type_arg)
- data_filt = data.loc[data['Reads'] != 0].reset_index(drop=True)
+ data_filt = uas_load(infile, snp_type_arg).reset_index(drop=True)
Ok, this is where you're retaining alleles with 0 reads.
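For illustration, here is a minimal sketch of the difference between the old and new behavior. The table contents and the `Locus` and `Allele` column names are made up for the example; only the `Reads` column name comes from the diff, and the DataFrame merely stands in for whatever `uas_load()` returns.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by uas_load().
data = pd.DataFrame({
    "Locus": ["D3S1358", "D3S1358", "TH01"],
    "Allele": ["15", "16", "9.3"],
    "Reads": [120, 0, 87],
})

# Old behavior: rows with 0 reads are filtered out.
dropped = data.loc[data["Reads"] != 0].reset_index(drop=True)

# New behavior: every row is kept, including zero-read alleles.
retained = data.reset_index(drop=True)

print(len(dropped), len(retained))
```

With the filter removed, the zero-read row stays in the table and reaches the output.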
if data_filt.loc[j, 'Typed Allele?'] == 'No':
    flag = 'Contains untyped allele'
Does the `Typed Allele?` column refer to whether there were any reads for that allele?
In any case, spaces and punctuation in column names can be problematic. If you have just added the column in this PR, I'd recommend using `IsTyped` instead, and boolean values (`True`/`False`) rather than `"Yes"`/`"No"` strings.
Or `AlleleIsTyped` or something.
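A minimal sketch of what that suggestion could look like, assuming the Yes/No strings are kept as read and a boolean column is derived alongside them. The sample rows are hypothetical, and `IsTyped` is only the name proposed above, not something that exists in lusSTR.

```python
import pandas as pd

# Hypothetical rows; 'Typed Allele?' holds Yes/No strings as read from the report.
df = pd.DataFrame({"Typed Allele?": ["Yes", "No", "Yes"]})

# Derive a boolean column next to the original (IsTyped is the proposed name).
df["IsTyped"] = df["Typed Allele?"].map({"Yes": True, "No": False}).astype(bool)

# A boolean column can be used directly as a mask, with no string comparison.
untyped = df.loc[~df["IsTyped"]]
print(untyped)
```

Note that the `.astype(bool)` step assumes every value is either "Yes" or "No"; any other string would map to a missing value and be coerced to `True`.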
The `Typed Allele?` column comes from the Sample Details Report. It doesn't necessarily indicate an allele with 0 reads, but rather an allele whose reads fall below the various thresholds, so it can have a low number of reads as well as 0 (i.e. it records whether the allele is considered a real allele). The `Yes`/`No` values are read directly from the Sample Details Report, so I'd prefer to leave them as is, but I can change the column name.
> The `Typed Allele?` column is from the Sample Details Report
I see. Maybe worth just leaving it in then...
lusSTR/snps.py
Outdated
try:
    os.mkdir(output_dir)
except FileExistsError:
    pass
I'd suggest:
os.makedirs(output_dir, exist_ok=True)
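A short sketch of how that call behaves, using a temporary directory so nothing is left behind. The nested path is hypothetical and only there to show that intermediate directories are created too.

```python
import os
import tempfile

# Use a throwaway parent directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as parent:
    # Hypothetical nested output path.
    output_dir = os.path.join(parent, "separated", "sample_output")

    # Creates missing intermediate directories too, unlike os.mkdir().
    os.makedirs(output_dir, exist_ok=True)

    # A second call is a no-op instead of raising FileExistsError.
    os.makedirs(output_dir, exist_ok=True)

    created = os.path.isdir(output_dir)
```

One caveat: `exist_ok=True` only suppresses the error when the existing path is a directory. If a regular file with that name exists, `FileExistsError` is still raised.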
Done!
This PR adds an option to split the final output by sample, writing a separate file for each sample. This is useful when feeding the output directly into LLAMAS.
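As a rough sketch of per-sample splitting (not lusSTR's actual implementation), one approach is to group the combined table by sample and write each group to its own file. The `SampleID` column name, the tab-separated format, the file naming, and the `write_per_sample` helper are all assumptions made for this example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical combined output; 'SampleID' is an assumed column name.
combined = pd.DataFrame({
    "SampleID": ["S1", "S1", "S2"],
    "Locus": ["TH01", "D3S1358", "TH01"],
    "Reads": [87, 0, 45],
})


def write_per_sample(table, output_dir):
    """Write one tab-separated file per sample and return the file paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for sample_id, rows in table.groupby("SampleID"):
        path = os.path.join(output_dir, f"{sample_id}.txt")
        rows.to_csv(path, sep="\t", index=False)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = write_per_sample(combined, tmp)
    names = sorted(os.path.basename(p) for p in paths)
```

Zero-read rows are written along with the rest, in keeping with the change to retain every allele in the output.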