ftp: README review #2227

Open
ValWood opened this issue Sep 3, 2024 · 27 comments

Comments

@ValWood
Member

ValWood commented Sep 3, 2024

subtask from:
pombase/pombase-chado#720
#2058

@ValWood
Member Author

ValWood commented Sep 3, 2024

  1. @kimrutherford add file names of new structure to this document
    https://docs.google.com/document/d/1TfvWngsI2U9-wkw2czxhHOZNmmQ8nxRwYa-TctrhMs0/edit

  2. @PCarme write READMEs (a lot of the info will be on the downloads website, so decide how much detail is required and add referring URLs; if you don't know what something is, tag me in the doc)

  3. @ValWood / @kimrutherford to review and fill in missing parts

  4. @PCarme to copy into place in Git

@kimrutherford
Member

add file names of new structure to this document

I can do that but does it make sense to duplicate what is already in Git? Can't we edit the README text directly rather than copy into a Google doc and then back to the files in Git?

The READMEs are here:
https://github.com/pombase/pombase-scripts/tree/main/release_readme_files

There is one README file for each of the directories in the new structure:
https://www.pombase.org/public_releases/pombase-2024-06-01/

@ValWood
Member Author

ValWood commented Sep 4, 2024

good point!

@PCarme
Contributor

PCarme commented Sep 4, 2024

The READMEs are here:
https://github.com/pombase/pombase-scripts/tree/main/release_readme_files

Okay, thanks Kim! I'll review the READMEs in there and let you know when I'm done.

@PCarme
Contributor

PCarme commented Sep 4, 2024

In https://github.com/pombase/pombase-scripts/blob/main/release_readme_files/exports_for_external_resources-README.txt, I have listed the files in the directory, but I don't really know what each of those corresponds to.

@kimrutherford
Member

I have listed the files in the directory, but I don't really know what each of those corresponds to.

Thanks Pascal. I'll work on that one.

@PCarme
Contributor

PCarme commented Sep 4, 2024

The "genome_sequence_and_features" directory contains several subdirectories. Should there be a README for each subdirectory, or a single README describing the contents of all the subdirectories?

@ValWood
Member Author

ValWood commented Sep 4, 2024

The contents are quite diverse so I think each directory needs a README

@PCarme
Contributor

PCarme commented Sep 4, 2024

Also, this file https://www.pombase.org/public_releases/pombase-2024-06-01/protein_features/transmembrane_domain_coords_and_seqs.tsv displays the entire sequence of each protein, not just the transmembrane domain sequences. Is that intended?

@ValWood
Member Author

ValWood commented Sep 4, 2024

It says coordinates and sequences, but it seems strange to put them together...

Maybe this wasn't a file for the public?

@kimrutherford ?

@kimrutherford
Member

It says coordinates and sequences, but it seems strange to put them together...
Maybe this wasn't a file for the public?

This is all I can find about it:

@kimrutherford
Member

This is all I can find about it:

I dug into my old email. This is from Snezhka. The thread is from April 2019, with the subject "transmembrane domains":


Hope everything is well - writing now to bug you with a question, sorry... Wonder if there is a way to, say, 'automatically' collect all transmembrane domains from all proteins. What I want to do is to compare the transmembrane domains (e.g. length distribution, unusual amino acids) in S. pombe to those in S. japonicus. Ideally so that I could do it separately for single spanners vs multispanners.


The file was created for Snezhka but it's updated nightly. Perhaps we don't need it in the new release directories?

@ValWood
Member Author

ValWood commented Sep 5, 2024

Perhaps we don't need it in the new release directories?

agree, it's a bit random

@kimrutherford
Member

agree, it's a bit random

OK, I've removed that file from the script that creates the new release directory structure.

@kimrutherford
Member

The contents are quite diverse so I think each directory needs a README

I've added empty READMEs and checked that the script can process README files for sub-directories correctly.

@PCarme
Contributor

PCarme commented Sep 5, 2024

@PCarme
Contributor

PCarme commented Sep 5, 2024

ValWood closed this as completed Sep 5, 2024
ValWood reopened this Sep 5, 2024
@ValWood
Member Author

ValWood commented Sep 5, 2024

There is one file with introns in CDSs only (it is more important that we have these annotated), and one with introns in CDSs and UTRs
(we started adding UTR introns later, and we definitely don't have them all).

@PCarme
Contributor

PCarme commented Sep 5, 2024

Oh right! I hadn't thought about the UTR introns; it makes sense then. Thanks!

@kimrutherford
Member

Also, this file isn't loaded properly https://www.pombase.org/public_releases/pombase-2024-06-01/genome_sequence_and_features/gff_format/Schizosaccharomyces_pombe_all_chromosomes_unstranded.gff3

I think that's OK. The file is empty because we don't have any unstranded features. Maybe we did have some years ago. I think it's best to remove it to prevent confusion.

@PCarme
Contributor

PCarme commented Sep 6, 2024

I'm done writing the READMEs by the way.

@kimrutherford
Member

I'm done writing the READMEs by the way.

Excellent. Thanks!

I haven't completed exports_for_external_resources-README.txt yet. Once I have, I'll make an example releases directory for 2024-09-01 so we can see if there is anything else needed.

@kimrutherford
Member

Here's how the structure looks with the new READMEs and the latest release:
https://www.pombase.org/public_releases/pombase-2024-09-01/

We currently have the GPI/GPAD files for GO in this directory:
https://www.pombase.org/public_releases/pombase-2024-09-01/exports_for_external_resources/

Maybe they should be in the gene_ontology directory? It could be a sub-directory.

@kimrutherford
Member

I've moved the allele_summaries.json file from exports_for_external_resources to the training_data_for_ML_and_AI directory, since that's what it was created for (I think). There's nothing stopping us from having files in more than one place, so we could keep a copy in exports_for_external_resources if that makes sense.

@kimrutherford
Member

I haven't completed exports_for_external_resources-README.txt yet.

I've done that now:
https://www.pombase.org/public_releases/pombase-2024-09-01/exports_for_external_resources/PomBase_exports_for_external_resources_README.txt

As an experiment, the format is a bit different from the other READMEs. Let me know if you think it's better or worse.

Once I have, I'll make an example releases directory for 2024-09-01 so we can see if there is anything else needed.

I've done that too. Perhaps we can have a chat about it once we're all back from holiday.

https://www.pombase.org/public_releases/pombase-2024-09-01

@ValWood
Member Author

ValWood commented Sep 11, 2024

I agree it makes sense to have the official GO release in the GO directory

@kimrutherford
Member

I've moved the GPI/GPAD files into the gene_ontology directory.
https://www.pombase.org/public_releases/pombase-2024-09-01/
