v0.2.0
motif
tokenization
- added functions for stemifying glycans (by removing rare modifications)
- added match_composition & match_composition_relaxed for finding glycan structures in stored or provided databases that match a provided composition. Can be narrowed down to, e.g., a species of interest.
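As a rough illustration of what matching a glycan against a composition involves, the sketch below counts monosaccharides in IUPAC-condensed strings and keeps the glycans whose counts equal a requested composition. It is not glycowork's implementation: the helper names (glycan_to_composition, match_composition_sketch) and the toy glycan list are hypothetical, and the parsing is deliberately simplified (no wildcards, no relaxed matching, no species filter).

```python
import re
from collections import Counter


def glycan_to_composition(glycan):
    """Count monosaccharides in an IUPAC-condensed string (simplified sketch)."""
    # Replace linkage descriptors such as (a1-3) or (b1-4) with a separator.
    stripped = re.sub(r"\([^)]*\)", " ", glycan)
    # Branch brackets carry no composition information, so drop them too.
    stripped = stripped.replace("[", " ").replace("]", " ")
    return Counter(stripped.split())


def match_composition_sketch(composition, glycans):
    """Return the glycans whose monosaccharide counts equal `composition`."""
    target = Counter(composition)
    return [g for g in glycans if glycan_to_composition(g) == target]


# Hypothetical toy "database" of glycans, for illustration only.
toy_glycans = [
    "Gal(b1-4)GlcNAc",
    "Fuc(a1-2)Gal(b1-4)GlcNAc",
    "Gal(b1-4)[Fuc(a1-3)]GlcNAc",
    "Man(a1-3)Man",
]

matches = match_composition_sketch({"Gal": 1, "GlcNAc": 1, "Fuc": 1}, toy_glycans)
print(matches)
```

Both fucosylated trisaccharides match here, since a composition does not distinguish linkage or branching.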
graph
- added function to translate glycan graph back to IUPAC-condensed string
- added try_string_conversion function to check whether glycan graph describes valid glycan
- modified generate_graph_features to also work with networks
analysis
- updated plot_embeddings to accept representation dataframes as inputs in addition to dictionaries
- swapped subplots in characterize_monosaccharide and modified labelling to enhance clarity
- get_pvals_motifs now allows for a custom motif_list via the optional motifs argument
- plot_embeddings now allows for a custom color palette
query
- added glytoucan_to_glycan function to interconvert GlyTouCan IDs and glycans
- get_insight now also yields the GlyTouCan ID of a glycan (if available) and, if no taxonomy is recorded in our database, the predicted taxonomy
annotate
- added get_trisaccharides to retrieve a subset of the trisaccharides occurring in a glycan
- added estimate_lower_bound, which gives make_heatmap and get_pvals_motifs a speed-up option via estimate_speedup = True; this typically results in a 3x speed-up (warning: estimate_lower_bound is an estimate and might in theory lead to missed motifs in the motif annotation)
network
- beta version of completely new module that is still in active development
biosynthesis
- added functions to find neighbors in biosynthesis space (one reaction removed)
- added functions to plot biosynthetic network for a set of glycans
- added functions to combine/align biosynthetic networks
glycan_data
- replaced glyco_targets_species_seq_all_V3 (~13,000 species-specific glycans) and v3_sugarbase (~20,000 unique glycans) with glyco_targets_species_seq_all_V4 (~23,000 species-specific glycans) and v4_sugarbase (~47,000 unique glycans)
- correspondingly updated glycan ML models, representations, and substitution matrix
- in addition to the new glycans, many pre-existing glycans are now better specified (e.g., Gal3S instead of GalOS, wherever the location of the modification is known)
- GlyTouCan IDs were added whenever possible
- motif_list was expanded by two new motifs (difucosylated N-glycan core & extended core fucose)
ml
train_test_split
- modified hierarchy_filter to ignore glycans with ‘undetermined’ taxonomy label
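The effect of this change can be pictured with a minimal sketch, assuming records stored as plain dictionaries. This is not the actual hierarchy_filter code: the function name drop_undetermined, the "Species" key, and the toy records are hypothetical and only illustrate the filtering step.

```python
def drop_undetermined(records, rank="Species"):
    """Keep only records whose label at `rank` is not 'undetermined'."""
    return [r for r in records if r.get(rank) != "undetermined"]


# Hypothetical toy records, for illustration only.
toy_records = [
    {"glycan": "Gal(b1-4)GlcNAc", "Species": "Homo_sapiens"},
    {"glycan": "Man(a1-3)Man", "Species": "undetermined"},
    {"glycan": "Fuc(a1-2)Gal", "Species": "Mus_musculus"},
]

kept = drop_undetermined(toy_records)
print([r["glycan"] for r in kept])
```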