Update Tutorial.md
johnpatramanis authored Oct 2, 2024
1 parent 05bdf2e commit dafac43
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion GitHub_Tutorial/Tutorial.md
@@ -253,7 +253,7 @@ For now, all you need to know is that activating the right environment is needed

To use this module of the pipeline we just need two input files:

-a) 1 txt file, named ```Proteins.txt``` with the gene names of the proteins we are interested in. For example if we were interested in Enamelin, we would add ```ENAM``` into that file. Only one gene name per line should be provided. If you are interested in a specific [isoform](https://en.wikipedia.org/wiki/Protein_isoform) of a protein, you can select that by adding '::' after the gene name followed by the name of the isoform in Ensembl e.g. ```ENAM::ENAM-202```. If you want to get all possible isoforms follow the same strategy but add the word "ALL" e.g. ```ENAM::ALL```.
+a) 1 txt file, named ```Proteins.txt``` with the gene names of the proteins we are interested in. For example if we were interested in Enamelin, we would add ```ENAM``` into that file. Only one gene name per line should be provided. If you are interested in a specific [isoform](https://en.wikipedia.org/wiki/Protein_isoform) of a protein, you can select that by adding '::' after the gene name followed by the name of the isoform in Ensembl e.g. ```ENAM::202```. If you want to get all possible isoforms follow the same strategy but add the word "ALL" e.g. ```ENAM::ALL```.

b) 1 txt file, named ```Organism.txt``` with the scientific names of the species we are interested in. For this version of the pipeline you can only select from species present in the [Ensembl database](https://www.ensembl.org/info/about/species.html). The species name should be without capital letters and with underscores instead of spaces e.g. ```homo_sapiens```. Once again only one species per line should be provided. Additionally if you want to use a specific reference version/assembly of a species, you can do that. Simply add a tab or a space after the species name and then write the assembly version e.g. ```homo_sapiens GRCh37```. If no specific assembly is provided the latest version of Ensembl will be used. Be aware of the proper name of each assembly as it needs to match perfectly with what Ensembl has in its database (e.g. "GRch37" would not work). You can look some of the assembly names [here](https://www.ensembl.org/info/website/archives/assembly.html). The safest way to find an valid name of an assembly is to look for it in the webpage of a gene:
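
As a rough illustration of the two input formats described above, the following Python sketch parses one line of each file. It is not the pipeline's own code: the names `ProteinEntry`, `OrganismEntry`, `parse_protein_line` and `parse_organism_line` are hypothetical and only mirror the rules stated in the text (an optional `::` isoform suffix in ```Proteins.txt```, an optional assembly after a tab or space in ```Organism.txt```).

```python
from typing import NamedTuple, Optional


class ProteinEntry(NamedTuple):
    gene: str
    # None means no isoform was requested; "ALL" means every isoform.
    isoform: Optional[str]


class OrganismEntry(NamedTuple):
    species: str
    # None means no assembly was given, so the latest Ensembl version applies.
    assembly: Optional[str]


def parse_protein_line(line: str) -> ProteinEntry:
    """Split 'GENE' or 'GENE::ISOFORM' into its parts (hypothetical helper)."""
    gene, separator, isoform = line.strip().partition("::")
    if not gene:
        raise ValueError("expected a gene name")
    return ProteinEntry(gene, isoform if separator else None)


def parse_organism_line(line: str) -> OrganismEntry:
    """Split 'species' or 'species<tab or space>assembly' (hypothetical helper)."""
    parts = line.split()  # splits on any run of tabs or spaces
    if not parts:
        raise ValueError("expected a species name")
    assembly = parts[1] if len(parts) > 1 else None
    return OrganismEntry(parts[0], assembly)
```

For instance, `parse_protein_line("ENAM::202")` yields the gene `ENAM` with isoform `202`, and `parse_organism_line("homo_sapiens GRCh37")` yields the species `homo_sapiens` with assembly `GRCh37`. The sketch does not check that the assembly name matches what Ensembl has in its database.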
