Releases: jonathanking/sidechainnet
v0.6.0
Add protein sequences as strings to Dataloaders via batch.str_seqs.
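Because str_seqs holds plain strings, a batch can be exported without first decoding integer-encoded sequences. The sketch below formats a batch as FASTA text. It assumes the batch also exposes protein IDs under a field named pids, and it uses a stand-in namedtuple instead of a real SidechainNet batch. The field name pids, the helper batch_to_fasta and the example IDs are hypothetical.

```python
from collections import namedtuple

# Stand-in for a SidechainNet batch. Only str_seqs comes from the release
# notes; the pids field name is an assumption for illustration.
FakeBatch = namedtuple("FakeBatch", ["pids", "str_seqs"])


def batch_to_fasta(batch, line_width=60):
    """Hypothetical helper: format each (ID, sequence) pair as a FASTA record."""
    records = []
    for pid, seq in zip(batch.pids, batch.str_seqs):
        # Wrap each sequence to a fixed width, as is conventional for FASTA.
        wrapped = [seq[i:i + line_width] for i in range(0, len(seq), line_width)]
        records.append(">" + pid + "\n" + "\n".join(wrapped))
    return "\n".join(records) + "\n"


batch = FakeBatch(pids=("prot_A", "prot_B"), str_seqs=("MKTAYIAK", "GSHM"))
print(batch_to_fasta(batch), end="")
```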
v0.5.0: Merge pull request #32 from jonathanking/dev
- Support custom, user-specified datasets.
- Add original sequence information (3-letter AA codes before AAs are "standardized").
- Make it much easier to reproduce SidechainNet (see scn.create, scn.generate_all) by preprocessing and storing ProteinNet online for user access.
- Change batch.ress to batch.resolutions.
- Add pandas requirement to handle ProteinNet dataset split info.
- Smaller additions and improvements.
See PR #32 for more details.
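Scripts written for earlier releases read batch.ress, which this release renames to batch.resolutions. If a script has to run against both old and new versions, one option is a small accessor that prefers the new field and falls back to the old one. This is a hypothetical helper, not part of SidechainNet; the stand-in namedtuples only mimic the two field names.

```python
from collections import namedtuple


def get_resolutions(batch):
    """Hypothetical accessor: return per-protein resolutions from old or new batches."""
    if hasattr(batch, "resolutions"):  # v0.5.0 and later
        return batch.resolutions
    if hasattr(batch, "ress"):  # earlier releases
        return batch.ress
    raise AttributeError("batch has neither 'resolutions' nor 'ress'")


# Stand-ins that mimic only the renamed field.
OldBatch = namedtuple("OldBatch", ["ress"])
NewBatch = namedtuple("NewBatch", ["resolutions"])

print(get_resolutions(OldBatch(ress=(1.3, 2.5))))         # (1.3, 2.5)
print(get_resolutions(NewBatch(resolutions=(1.3, 2.5))))  # (1.3, 2.5)
```

Checking for the new name first means the helper keeps working unchanged once the old field disappears.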
pip install sidechainnet
This release makes sidechainnet pip installable (to be tagged/uploaded immediately following this release post). It also allows users to create SidechainNet data using scn.create.
v0.3.1
v0.3.0: Merge pull request #16 from jonathanking/improve_batchedstructurebuilder
Added functionality to scn.load and improved the API. Running create.py now requires ProDy v2.0.
See #16 for more information.
v0.2.2
v0.2.1
This version makes resolution information accessible through batching. Resolution information was previously accessible through SidechainNet's Python dictionaries and via scn.load(...filter_by_resolution).
Resolution information for proteins in a given batch is accessible via the ress key when using DataLoaders to batch data, e.g. batch.ress == (1.3, 2.5, None, 3.2). None represents structures that had no available X-ray resolution information.
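Because missing values are None rather than a number, code that filters or averages resolutions has to skip them explicitly. A minimal sketch using the example tuple above; the function names and the 3.0 Å cutoff are arbitrary choices for illustration, not SidechainNet defaults.

```python
def resolution_mask(ress, max_resolution=3.0):
    """True for proteins with a known resolution at or below the cutoff (illustrative)."""
    # The None check comes first so None is never compared with a float.
    return [r is not None and r <= max_resolution for r in ress]


def mean_known_resolution(ress):
    """Average over proteins that have resolution data; None if none do."""
    known = [r for r in ress if r is not None]
    return sum(known) / len(known) if known else None


ress = (1.3, 2.5, None, 3.2)  # e.g. the value of batch.ress in this release
print(resolution_mask(ress))  # [True, True, False, False]
print(mean_known_resolution(ress))
```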
SidechainNet v0.2
This is the first SidechainNet release that includes all intended functionality and data elements. Still, this repository is considered to be under development and is intended for research purposes only.
This version:
- fixes a handful of corner cases with regard to data parsing for the dataset itself (D-amino acids, non-standard amino acids, etc.),
- provides support for secondary structure and structure resolution information (when available, and for training data only), and
- modifies the way data is batched when using the custom Dataloaders provided (i.e. scn.load(...with_pytorch='dataloaders')).
Note that this release is not fully backward compatible with prior releases in how Dataloaders yield items during training. The SidechainNet datasets themselves have also been improved. Please see the updated Colab Walkthrough for current usage examples.
In prior releases, Dataloaders yielded tuples containing training information such as ProteinNet/SidechainNet IDs, sequences, angles, etc. This required the user to know the exact order of the returned items. To make this easier, v0.2 uses Dataloaders that yield a single object per batch: a Batch collections.namedtuple. Please see the README for more information on accessing training information from the namedtuple.
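The difference between the two styles can be sketched in plain Python. The field names below (pids, seqs, angs, ress) and all values are illustrative stand-ins, not the actual Batch fields, which are documented in the README.

```python
from collections import namedtuple

# Before v0.2 (illustrative): a plain tuple, so callers must know the order.
old_style = (("prot_A",), ("MKT",), ([0.1, 0.2, 0.3],), (2.0,))
pids, seqs, angs, ress = old_style  # binds the wrong values if the order changes

# From v0.2 (illustrative fields): a namedtuple, accessed by name.
Batch = namedtuple("Batch", ["pids", "seqs", "angs", "ress"])
batch = Batch(pids=("prot_A",), seqs=("MKT",), angs=([0.1, 0.2, 0.3],), ress=(2.0,))

print(batch.ress)     # (2.0,)
print(batch._fields)  # ('pids', 'seqs', 'angs', 'ress')

# A namedtuple is still a tuple, so positional access keeps working too.
assert batch[3] == batch.ress
```

Because a namedtuple subclasses tuple, existing code that indexes positionally does not fail outright, but reading fields by name is what removes the dependence on ordering.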