v1.0.0

@Bribak Bribak released this 04 Dec 12:41
· 251 commits to master since this release
044e18d

Change Log

  • Added a Zenodo badge, to have a release-specific doi for glycowork

glycan_data

  • Updated sugarbase database; sugarbase is now pickled, so literal evaluations are no longer necessary
  • Harmonized glycan column names across generated dataframes; all use ‘glycan’ now, ‘target’ has been deprecated

loader

  • Updated motif_list to be compatible with new position encoding
  • Added Internal_LewisX and Internal_LewisA to motif_list (renamed LewisX and LewisA to Terminal_LewisX and Terminal_LewisA, respectively)
  • Made df_species static again to speed up package import
  • Added find_nth_reverse helper function that finds the starting index of the nth occurrence of a substring from the end of the string
  • Added remove_unmatched_brackets helper function to strip unmatched opening or closing brackets from glycan strings
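
The two helpers above are described only by their behavior, so the following is a minimal sketch of what such functions could look like, not the glycowork implementation. The names carry a `_sketch` suffix to mark them as hypothetical, and two choices are assumptions: occurrences are counted without overlap, and only square brackets (the branch delimiters in IUPAC-condensed strings) are treated as brackets.

```python
def find_nth_reverse_sketch(text: str, sub: str, n: int) -> int:
    """Start index of the n-th occurrence of `sub`, counted from the end of `text`.

    Occurrences are counted without overlap (an assumption of this sketch).
    Returns -1 if `text` contains fewer than n occurrences.
    """
    if n < 1 or not sub:
        raise ValueError("n must be >= 1 and sub must be non-empty")
    end = len(text)
    idx = -1
    for _ in range(n):
        # Search only the part of the string left of the previous hit.
        idx = text.rfind(sub, 0, end)
        if idx == -1:
            return -1
        end = idx
    return idx


def remove_unmatched_brackets_sketch(glycan: str) -> str:
    """Drop every '[' or ']' that has no matching partner."""
    keep = [True] * len(glycan)
    open_positions = []  # indices of '[' still waiting for a ']'
    for i, char in enumerate(glycan):
        if char == "[":
            open_positions.append(i)
        elif char == "]":
            if open_positions:
                open_positions.pop()
            else:
                keep[i] = False  # closing bracket without an opener
    for i in open_positions:
        keep[i] = False  # opening brackets that were never closed
    return "".join(c for c, k in zip(glycan, keep) if k)
```

For example, `find_nth_reverse_sketch("Gal(b1-4)GlcNAc(b1-2)Man", "(", 2)` locates the opening parenthesis of the first linkage, and `remove_unmatched_brackets_sketch("Gal(b1-4)]GlcNAc")` strips the stray closing bracket while leaving balanced branches untouched.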

motif

  • Added more masses to mz_to_composition.csv / mass_dict: Acetonitrile, Formate, Cl-, HCO3-, and NH4+

processing

  • Extended canonicalize_iupac to cases like "NeuGcα3Galβ3(NeuAcα6)GalNAcol" and even more modification formulations, e.g., “6S-GlcNAc”
  • Added canonicalize_composition to convert compositions formatted either in the style of HexNAc2Hex1Fuc3Neu5Ac1 or N2H1F3A1 into dictionaries used by glycowork
  • Added GalNAc4S to permitted reducing end monosaccharides for O-linked glycans in enforce_class
  • MissForest now has a maximum number of iterations and will check for convergence each iteration (immediately finishing upon converging), yielding some speed-ups in most cases
  • The output of min_process_glycans no longer contains empty strings for glycans ending in a linkage
  • Updated choose_correct_isoform to be compatible with change in min_process_glycans
  • Added get_possible_linkages to retrieve linkages matching a wildcarded linkage
  • Added get_possible_monosaccharides to retrieve monosaccharides matching a monosaccharide type (HexNAc, etc.)
  • Added decorators, rescue_glycans and rescue_compositions, to canonicalize them in case a decorated function errors out
  • Added linearcode_to_iupac to support LinearCode as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that coverage may not be complete yet
  • Added iupac_extended_to_condensed to support IUPAC-extended as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that coverage may not be complete yet
  • Added glycoct_to_iupac to support GlycoCT as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that coverage may not be complete yet
  • Added wurcs_to_iupac to support WURCS as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that coverage may not be complete yet
  • Added oxford_to_iupac to support Oxford as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that coverage is currently limited
  • check_nomenclature (formerly in motif.tokenization) now emits warnings when non-string, non-graph nomenclatures or SMILES are used with glycowork functions
  • Expanded find_isomorphs to generate more isomorphic sequence variants, thereby increasing the chances that choose_correct_isoform will have access to the canonical sequence
  • Fixed a rare issue with canonicalize_iupac where sequences coming from structure_to_basic would sometimes be formatted incorrectly if they contained dHex
  • Fixed an issue in find_isomorphs in which double branches were not always correctly swapped
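
To make the composition parsing and the "rescue" decorator idea from the list above more concrete, here is a minimal sketch under stated assumptions. It is not glycowork's code: the `_sketch` names, the set of recognized monosaccharide names, the one-letter mapping, and the choice to normalize `Fuc` to `dHex` are all assumptions made for illustration. The decorator simply retries the wrapped function with a parsed composition if the first call raises.

```python
import functools
import re

# Assumed vocabulary for this sketch; longer names come first so that
# "HexNAc" is tried before "Hex" during matching.
LONG_NAMES = ("Neu5Ac", "Neu5Gc", "HexNAc", "dHex", "Hex", "Fuc")
SHORT_TO_LONG = {"N": "HexNAc", "H": "Hex", "F": "dHex", "A": "Neu5Ac", "G": "Neu5Gc"}
ALIASES = {"Fuc": "dHex"}  # assumption: report fucose under its type name

LONG_RE = re.compile("(" + "|".join(LONG_NAMES) + r")(\d+)")
SHORT_RE = re.compile(r"([A-Z])(\d+)")


def parse_composition_sketch(comp):
    """Turn 'HexNAc2Hex1Fuc3Neu5Ac1' or 'N2H1F3A1' into a {name: count} dict."""
    if isinstance(comp, dict):
        return comp
    if re.fullmatch(r"(?:[A-Z]\d+)+", comp):
        # One-letter shorthand such as N2H1F3A1.
        pairs = [(SHORT_TO_LONG[k], int(v)) for k, v in SHORT_RE.findall(comp)]
    else:
        matches = list(LONG_RE.finditer(comp))
        # Reject strings that contain anything besides known name+count pairs.
        if "".join(m.group(0) for m in matches) != comp:
            raise ValueError(f"cannot parse composition: {comp!r}")
        pairs = [(m.group(1), int(m.group(2))) for m in matches]
    result = {}
    for name, count in pairs:
        name = ALIASES.get(name, name)
        result[name] = result.get(name, 0) + count
    return result


def rescue_compositions_sketch(func):
    """Retry `func` with a parsed composition if the first call fails."""
    @functools.wraps(func)
    def wrapper(composition, *args, **kwargs):
        try:
            return func(composition, *args, **kwargs)
        except Exception:
            return func(parse_composition_sketch(composition), *args, **kwargs)
    return wrapper


@rescue_compositions_sketch
def count_residues(composition):
    """Toy function that only understands dictionaries."""
    return sum(composition.values())
```

With the decorator in place, `count_residues("N2H1F3A1")` fails on the raw string, is retried with the parsed dictionary, and returns the total residue count. Catching every exception keeps the sketch short; a real implementation would likely be more selective about which errors trigger the retry.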

analysis

  • get_heatmap no longer tries to convert data to relative abundances if negative values are detected in the input
  • All functions in analysis that take dataframes as input now also accept the full filepath to a .csv file instead
  • Optimized some of the code for readability and speed (everything should be at least a bit faster now)

annotate

  • get_k_saccharides is now allowed to generate new dynamic motifs with tokens outside of lib (via expand_lib)
  • annotate_glycan and annotate_dataset now also support narrow wildcards
  • Fixed an issue in count_unique_subgraphs_of_size_k in which branched motifs were not always correctly formatted (i.e., opening/closing brackets)
  • get_k_saccharides now outputs dataframes with counts as default and can yield the old nested lists of motifs by setting the new keyword just_motifs to True
  • Fixed an edge case in which get_k_saccharides sometimes overcounted individual monosaccharides if their strings overlapped
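
The overcounting edge case in the last item is easy to reproduce with plain string operations, because a name such as `Gal` is a prefix of `GalNAc`. The sketch below is not how get_k_saccharides works internally; it only illustrates, with a hypothetical helper, why counting whole tokens is safer than counting substrings.

```python
import re
from collections import Counter


def count_monosaccharides_sketch(glycan: str) -> Counter:
    """Count monosaccharides in an IUPAC-condensed string by whole token.

    Linkages in parentheses and branch brackets are replaced by spaces, so
    only complete monosaccharide names remain to be counted.
    """
    stripped = re.sub(r"\([^()]*\)|[\[\]]", " ", glycan)
    return Counter(stripped.split())


glycan = "Gal(b1-3)GalNAc"
naive = glycan.count("Gal")  # substring search also hits the "Gal" in "GalNAc"
by_token = count_monosaccharides_sketch(glycan)["Gal"]
print(naive, by_token)
```

The naive substring count reports two hits for `Gal`, while the token-based count reports one.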

graph

  • subgraph_isomorphism and compare_glycans now support using wildcards and position encoding at the same time. The extra keyword argument is now deprecated and the functions auto-detect whether anything has been specified in wildcards and/or termini_list
  • subgraph_isomorphism and compare_glycans now support automatically inferred narrow wildcards to allow for (i) matching linkages like a1-? to only specified linkages within that group (e.g., a1-3 but not b1-3 etc.) and (ii) matching monosaccharide types like HexNAc to only specified monosaccharides of that type (e.g., GlcNAc but not Glc, etc.)
  • The wildcard_list keyword argument in all graph & annotation functions is now deprecated as wildcards are inferred automatically via narrow wildcards and native full wildcards (?1-? and Monosaccharide)
  • subgraph_isomorphism now behaves as expected for testing motifs ending in linkages on glycans ending in linkages
  • subgraph_isomorphism can now return the matched subgraphs in the input glycan with the new return_matches keyword argument
  • glycan_to_nxGraph is now decorated with the rescue_glycans decorator, which auto-canonicalizes IUPAC strings if they are not in the format preferred by glycowork
  • Fixed mismatch of labels and string_labels in categorical_node_match_wildcard
  • Fixed an issue in subgraph_isomorphism in which, when using positional encoding, sometimes the mirror image of a motif was incorrectly captured if the termini aligned
  • termini_list within subgraph_isomorphism now only requires the specification of monosaccharide positions
  • Added expand_termini_list helper function to facilitate the expansion of monosaccharide-only termini_list into full termini_list behind the scenes
  • Added support for shorthand notation of position encoding, now either ‘terminal’ or ‘t’ will work
  • Improved handling of complex branching in graph_to_string; should be fewer unexpected translations now
  • Fixed an issue in graph_to_string in which induced subgraphs could cause errors due to unexpected or weirdly sorted node indices
  • Fixed an edge case in which the reducing end could sometimes be calculated as ‘internal’ when termini=‘calc’ in glycan_to_nxGraph
  • Deprecated the duplicate copies of character_to_label and string_to_labels
  • Deprecated categorical_termini_match; the functionality is now handled within categorical_node_match_wildcard
  • Deprecated the wildcards keyword argument from compare_glycans as this will now be detected internally, if wildcards are provided via wildcard_list
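
The narrow wildcards described in this section can be pictured as a per-token matching rule. The following is a minimal sketch of that rule using the standard library's `fnmatch`, where `?` stands for exactly one character. It is an illustration only: the function names and the small type table are assumptions, and glycowork's graph matching is more involved than a token-by-token comparison.

```python
from fnmatch import fnmatchcase

# Assumed, deliberately tiny table of monosaccharide types for this sketch.
MONOSACCHARIDE_TYPES = {
    "HexNAc": {"GlcNAc", "GalNAc", "ManNAc"},
    "Hex": {"Glc", "Gal", "Man"},
}


def possible_linkages_sketch(pattern, known_linkages):
    """Return the linkages that a wildcarded linkage such as 'a1-?' can stand for."""
    return [link for link in known_linkages if fnmatchcase(link, pattern)]


def token_matches_sketch(pattern_token, glycan_token):
    """Decide whether one motif token may match one glycan token."""
    if pattern_token == glycan_token:
        return True
    if "?" in pattern_token:
        # Wildcarded linkage: each '?' matches exactly one character.
        return fnmatchcase(glycan_token, pattern_token)
    # Monosaccharide type such as HexNAc: match only members of that type.
    return glycan_token in MONOSACCHARIDE_TYPES.get(pattern_token, set())


linkages = ["a1-2", "a1-3", "b1-3", "a2-3"]
print(possible_linkages_sketch("a1-?", linkages))
print(token_matches_sketch("HexNAc", "GlcNAc"), token_matches_sketch("HexNAc", "Glc"))
```

In this sketch `a1-?` is restricted to `a1-2` and `a1-3`, and `HexNAc` matches `GlcNAc` but not `Glc`, which mirrors the two cases (i) and (ii) listed above.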

tokenization

  • Composition functions (e.g., composition_to_mass) are now decorated with rescue_compositions, which means that they can be used with compositions like “H3N2” (basically anything that canonicalize_composition can handle)
  • Deprecated character_to_label as it’s now handled within string_to_labels
  • Moved check_nomenclature into motif.processing
  • Optimized some of the code for readability and speed (most things should be at least a bit faster now)

draw

  • Support motif highlighting in GlycoDraw: by providing the highlight_motif keyword argument, motifs can be highlighted (everything else will be set to low opacity). Works with IUPAC-condensed motifs and named motifs from known
  • Support wildcards in motif highlighting with the highlight_wildcard_list keyword argument, for instance highlighting all Gal(?1-?)GlcNAc subunits (for Gal(b1-?)GlcNAc you don’t need highlight_wildcard_list, as narrow wildcards are handled automatically)
  • Support positional encoding in motif highlighting with the highlight_termini_list keyword argument, for instance highlighting all terminal, non-reducing end Gal(b1-?)GlcNAc subunits (yes, you can use both wildcards and positional encoding at the same time😊)
  • Support drawing of repeat structures (indicated by brackets and the number of repeats) via the new repeat keyword argument. Internal repeats can also be specified with the additional repeat_range keyword argument.
  • Optimized some of the code for readability and speed (most things should be at least a bit faster now)

network

biosynthesis

  • Optimized some of the code for readability and speed (everything should be up to 2x faster now)

evolution

  • Optimized some of the code for readability and speed (everything should be at least a bit faster now)

ml

  • Optimized some of the code for readability and speed (most things should be at least a bit faster now)