A simple set of scripts to generate a file with all old sgd protein sequences

pombase/all_previous_sgd_peptide_sequences

This repository creates a TSV file (all_previous_seqs.tsv) listing all old protein sequences from SGD genomes. By "old protein sequence" we mean the sequence of a protein that existed in a previous SGD release but has since changed.
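The definition above can be illustrated with a minimal sketch. It is not the code used in this repository: the function name `collect_old_sequences`, the in-memory release format, and the toy sequences are all assumptions made for illustration. It walks through releases in date order and records a sequence as "old" whenever a later release replaces it.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory format: (release date, {gene_id: protein sequence}).
Release = Tuple[str, Dict[str, str]]


def collect_old_sequences(releases: List[Release]) -> List[Tuple[str, str, str]]:
    """Return (gene_id, sequence, date_introduced) for every superseded sequence."""
    # gene_id -> (sequence currently in use, date it was introduced)
    current: Dict[str, Tuple[str, str]] = {}
    old_rows: List[Tuple[str, str, str]] = []
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    for date, proteins in sorted(releases, key=lambda release: release[0]):
        for gene_id, seq in proteins.items():
            previous = current.get(gene_id)
            if previous is None:
                # First time this gene is seen: nothing to supersede yet.
                current[gene_id] = (seq, date)
            elif previous[0] != seq:
                # The sequence changed: the previous one becomes "old".
                old_rows.append((gene_id, previous[0], previous[1]))
                current[gene_id] = (seq, date)
    return old_rows


# Toy data, not real SGD sequences or release dates.
releases = [
    ("2010-01-01", {"YAL001C": "MKT", "YAL002W": "MAA"}),
    ("2012-06-01", {"YAL001C": "MKTL", "YAL002W": "MAA"}),
    ("2015-03-01", {"YAL001C": "MKTLV", "YAL002W": "MAA"}),
]
old = collect_old_sequences(releases)
```

In this toy example only YAL001C changes, so `old` holds its two superseded sequences, and the sequence in the latest release is not listed.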

The columns of the output file contain:

  1. Gene id
  2. Sequence
  3. Date when that sequence was introduced (extracted from the release date)
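As a hedged illustration of how the output could be consumed, the sketch below groups rows by gene id using Python's standard csv module. It assumes the file is tab-separated with exactly the three columns listed above and no header row; check the actual file before relying on this. The function name `read_old_sequences` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_old_sequences(handle):
    """Group (date_introduced, sequence) pairs by gene id.

    Assumes three tab-separated columns (gene id, sequence, date) and no header row.
    """
    by_gene = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            # Skip blank or malformed lines instead of failing.
            continue
        gene_id, sequence, date_introduced = row
        by_gene[gene_id].append((date_introduced, sequence))
    return dict(by_gene)


# Toy rows standing in for all_previous_seqs.tsv (not real SGD data).
sample = io.StringIO("YAL001C\tMKT\t2010-01-01\nYAL001C\tMKTL\t2012-06-01\n")
old_by_gene = read_old_sequences(sample)
```

To read the real file, pass an open file handle instead of the sample, for example `open("all_previous_seqs.tsv", newline="")`.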

Run as github action

The code can be run as a GitHub Action, which updates the file. Instructions to run it locally are provided below.

Run locally

Install dependencies

We use Poetry to install the dependencies (see the Poetry installation instructions).

In the source directory run:

poetry install

This should create a .venv folder containing the Python virtual environment. To activate it, run:

poetry shell

Now when you call python, it will be the one from the .venv.

Run the script

# activate virtual environment
poetry shell

# Run the script (see the comments)
bash run.sh
