
v0.2.0

Bribak released this on 09 Aug 10:44

motif
tokenization

  • added functions for stemifying glycans (by removing rare modifications)
  • added match_composition & match_composition_relaxed for finding glycan structures in stored or provided databases that match a provided composition; the search can be narrowed down to, e.g., a species of interest.
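A minimal sketch of what composition matching involves, assuming a toy tokenizer that strips linkages and branch brackets from IUPAC-condensed strings and a hypothetical database layout (each glycan string mapped to the set of species it was recorded in). This is not glycowork's match_composition; the helper name match_composition_sketch and the database layout are illustrative only.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Set


def composition(glycan: str) -> Counter:
    """Count monosaccharide tokens in an IUPAC-condensed string (toy tokenizer)."""
    # Replace linkage annotations such as "(b1-4)" with spaces...
    stripped = re.sub(r"\([^)]*\)", " ", glycan)
    # ...and drop branch brackets, leaving whitespace-separated residue names.
    stripped = stripped.replace("[", " ").replace("]", " ")
    return Counter(stripped.split())


def match_composition_sketch(
    target: Dict[str, int],
    db: Dict[str, Set[str]],
    species: Optional[str] = None,
) -> List[str]:
    """Return glycans in `db` whose composition equals `target` (hypothetical helper).

    `db` maps a glycan string to the set of species it was recorded in.
    If `species` is given, glycans not recorded for that species are skipped.
    """
    wanted = Counter(target)
    hits = []
    for glycan, recorded_in in db.items():
        if species is not None and species not in recorded_in:
            continue
        if composition(glycan) == wanted:
            hits.append(glycan)
    return hits


# Toy database; entries and species labels are illustrative only.
db = {
    "Gal(b1-4)GlcNAc": {"Homo_sapiens"},
    "Gal(b1-3)GlcNAc": {"Mus_musculus"},
    "Man(a1-3)[Man(a1-6)]Man": {"Homo_sapiens"},
}
print(match_composition_sketch({"Gal": 1, "GlcNAc": 1}, db))
# → ['Gal(b1-4)GlcNAc', 'Gal(b1-3)GlcNAc']
print(match_composition_sketch({"Gal": 1, "GlcNAc": 1}, db, species="Homo_sapiens"))
# → ['Gal(b1-4)GlcNAc']
```

Because the toy tokenizer treats every distinct token as its own residue, a modified residue such as Gal3S would not count towards Gal; a relaxed matcher would need extra rules for that.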

graph

  • added function to translate glycan graph back to IUPAC-condensed string
  • added try_string_conversion function to check whether a glycan graph describes a valid glycan
  • modified generate_graph_features to also work with networks
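One way to picture translating a glycan graph back into an IUPAC-condensed string is with a toy tree in which each monosaccharide stores its linkage to the parent. The Residue class and to_iupac function below are hypothetical (glycowork's own graphs are structured differently), and the sketch assumes that the first child continues the main chain while any further children become bracketed side branches.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Residue:
    """One monosaccharide in a toy glycan tree (hypothetical structure)."""
    name: str
    linkage: Optional[str] = None  # linkage to the parent, e.g. "b1-4"; None at the reducing end
    children: List["Residue"] = field(default_factory=list)


def to_iupac(node: Residue) -> str:
    """Serialize a toy glycan tree into an IUPAC-condensed-style string.

    Assumed convention: the first child continues the main chain and is written
    directly before its parent; any further children are side branches written
    in square brackets between the main chain and the parent.
    """
    parts = []
    if node.children:
        main, *branches = node.children
        parts.append(to_iupac(main))
        parts.extend(f"[{to_iupac(branch)}]" for branch in branches)
    parts.append(node.name)
    if node.linkage:
        parts.append(f"({node.linkage})")
    return "".join(parts)


# Trimannosyl N-glycan core, built from the reducing end outwards.
core = Residue("GlcNAc", children=[
    Residue("GlcNAc", "b1-4", [
        Residue("Man", "b1-4", [Residue("Man", "a1-3"), Residue("Man", "a1-6")]),
    ]),
])
print(to_iupac(core))
# → Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
```

A full converter also has to decide which branch counts as the main chain; this sketch sidesteps that by trusting the order of the children.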

analysis

  • updated plot_embeddings to accept representation dataframes as inputs in addition to dictionaries
  • swapped the subplots in characterize_monosaccharide and modified their labelling to enhance clarity
  • get_pvals_motifs now allows for a custom motif_list via the optional motifs argument
  • plot_embeddings now allows for a custom color palette

query

  • added glytoucan_to_glycan function to interconvert GlyTouCan IDs and glycans
  • get_insight now also reports the GlyTouCan ID of a glycan (if available), as well as the predicted taxonomy if no taxonomy is recorded in our database
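Interconverting database accessions and glycan sequences amounts to a two-way lookup. The sketch below builds one from (accession, glycan) pairs; make_id_converter, its to_accession flag, and the placeholder accessions ACC-1 and ACC-2 are hypothetical. They are neither real GlyTouCan IDs nor the signature of glycowork's glytoucan_to_glycan.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def make_id_converter(
    pairs: Iterable[Tuple[str, str]],
) -> Callable[..., List[Optional[str]]]:
    """Build a two-way lookup from (accession, glycan) pairs (hypothetical helper)."""
    id_to_glycan: Dict[str, str] = dict(pairs)
    glycan_to_id: Dict[str, str] = {g: acc for acc, g in id_to_glycan.items()}

    def convert(queries: Iterable[str], to_accession: bool = False) -> List[Optional[str]]:
        # Choose the lookup direction; queries without a match map to None.
        table = glycan_to_id if to_accession else id_to_glycan
        return [table.get(q) for q in queries]

    return convert


# Placeholder accessions, not real GlyTouCan IDs.
convert = make_id_converter([
    ("ACC-1", "Gal(b1-4)Glc"),
    ("ACC-2", "Neu5Ac(a2-3)Gal(b1-4)Glc"),
])
print(convert(["ACC-2", "ACC-9"]))
# → ['Neu5Ac(a2-3)Gal(b1-4)Glc', None]
print(convert(["Gal(b1-4)Glc"], to_accession=True))
# → ['ACC-1']
```

Returning None for unknown queries keeps the output aligned with the input list, so missing entries are easy to spot.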

annotate

  • added get_trisaccharides to retrieve a subset of the trisaccharides occurring in a glycan
  • added estimate_lower_bound, which gives make_heatmap and get_pvals_motifs a speed-up option via estimate_speedup = True, typically resulting in a 3x speed-up (warning: estimate_lower_bound is an estimate and might, in theory, lead to missed motifs in the motif annotation)
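To illustrate what "trisaccharides occurring in a glycan" means, the sketch below enumerates linear trisaccharides (a residue, its parent, and its grandparent) in a glycan stored as a hypothetical parent-pointer table. Branched trisaccharides such as Man(a1-3)[Man(a1-6)]Man are deliberately skipped, so this sketch also returns only a subset; it is an assumption, not a description of how get_trisaccharides chooses its subset.

```python
from typing import Dict, List, Optional, Tuple

# id -> (monosaccharide, linkage to parent, parent id); hypothetical layout.
Residues = Dict[int, Tuple[str, Optional[str], Optional[int]]]


def linear_trisaccharides(residues: Residues) -> List[str]:
    """List residue-parent-grandparent chains as IUPAC-condensed-style strings."""
    found = []
    for name, link, parent in residues.values():
        if parent is None:
            continue  # the reducing-end residue has no parent
        parent_name, parent_link, grandparent = residues[parent]
        if grandparent is None:
            continue  # the parent is the reducing end, so there is no third residue
        grand_name = residues[grandparent][0]
        found.append(f"{name}({link}){parent_name}({parent_link}){grand_name}")
    return found


# Trimannosyl N-glycan core as a parent-pointer table.
core: Residues = {
    0: ("GlcNAc", None, None),
    1: ("GlcNAc", "b1-4", 0),
    2: ("Man", "b1-4", 1),
    3: ("Man", "a1-3", 2),
    4: ("Man", "a1-6", 2),
}
for tri in linear_trisaccharides(core):
    print(tri)
```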

network

  • beta version of a completely new module that is still in active development

biosynthesis

  • added functions to find neighbors in biosynthesis space (one reaction removed)
  • added functions to plot biosynthetic network for a set of glycans
  • added functions to combine/align biosynthetic networks
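In the simplest view, a glycan's neighbor "one reaction removed" in the precursor direction is the structure left after trimming a single terminal residue. The sketch below enumerates those precursors for a glycan stored as a hypothetical parent-pointer table. It ignores the opposite direction (adding a residue) and any enzyme-specific rules, and it is not glycowork's implementation.

```python
from typing import Dict, List, Optional, Tuple

# id -> (monosaccharide, linkage to parent, parent id); hypothetical layout.
Residues = Dict[int, Tuple[str, Optional[str], Optional[int]]]


def one_step_precursors(residues: Residues) -> List[Tuple[str, Residues]]:
    """Return (removed residue, remaining structure) for every terminal residue."""
    # Any residue that is some other residue's parent is not terminal.
    parents = {parent for _, _, parent in residues.values() if parent is not None}
    result = []
    for rid, (name, link, parent) in residues.items():
        if parent is None or rid in parents:
            continue  # never remove the reducing end or internal residues
        remaining = {k: v for k, v in residues.items() if k != rid}
        result.append((f"{name}({link})", remaining))
    return result


core: Residues = {
    0: ("GlcNAc", None, None),
    1: ("GlcNAc", "b1-4", 0),
    2: ("Man", "b1-4", 1),
    3: ("Man", "a1-3", 2),
    4: ("Man", "a1-6", 2),
}
for removed, rest in one_step_precursors(core):
    print(removed, len(rest))
```

Applying the function repeatedly to each remaining structure would walk the precursor network back towards the reducing-end monosaccharide.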

glycan_data

  • replaced glyco_targets_species_seq_all_V3 (~13,000 species-specific glycans) and v3_sugarbase (~20,000 unique glycans) with glyco_targets_species_seq_all_V4 (~23,000 species-specific glycans) and v4_sugarbase (~47,000 unique glycans)
  • correspondingly updated glycan ML models, representations, and substitution matrix
  • in addition to all the new glycans, many pre-existing glycans are now better specified (e.g., Gal3S instead of GalOS wherever the location of the modification is known)
  • GlyTouCan IDs were added whenever possible
  • motif_list was expanded by two new motifs (difucosylated N-glycan core & extended core fucose)

ml
train_test_split

  • modified hierarchy_filter to ignore glycans with ‘undetermined’ taxonomy label
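Ignoring the 'undetermined' taxonomy label can be pictured as a filter applied before the train/test split. In the sketch below, filter_and_split, the row layout, and the "Kingdom" key are hypothetical; the split is a plain seeded random split and is not necessarily how hierarchy_filter partitions the data.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, str]


def filter_and_split(
    rows: List[Row], label_key: str, test_frac: float = 0.2, seed: int = 42
) -> Tuple[List[Row], List[Row]]:
    """Drop rows labelled 'undetermined', then do a seeded random train/test split."""
    kept = [row for row in rows if row[label_key] != "undetermined"]
    shuffled = kept[:]  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


rows = (
    [{"glycan": f"g{i}", "Kingdom": "Animalia"} for i in range(6)]
    + [{"glycan": f"g{i}", "Kingdom": "Plantae"} for i in range(6, 8)]
    + [{"glycan": f"g{i}", "Kingdom": "undetermined"} for i in range(8, 10)]
)
train, test = filter_and_split(rows, "Kingdom", test_frac=0.25)
print(len(train), len(test))  # → 6 2
```

Filtering before splitting keeps unlabelled glycans out of both partitions, so they can neither be trained on nor counted as misclassifications.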