Releases: BojarLab/glycowork
v1.4.0
Change Log
For Version 1.4.0
- Added an example workflow/tutorial for differential glycomics analysis to the Examples tab in the documentation
- Added additional tests via pytest
- Cleaned up repo with more stringent .gitignore, removing unnecessary files
- Added hover-over tooltips to the glycoworkGUI, describing how the input files should be formatted
- Exposed more keyword arguments of get_heatmap in GUI (CLR transformation + tick label control)
glycan_data
- Broadened the motif definition of “Mucin_elongated_core2” in motif_list
- Refined the motif definitions of the O-glycan core motifs in motif_list to prevent overlaps
- Larger (and cleaner) datasets for: df_glycan, df_species, df_tissue, df_disease, and glycan_binding
- Updated lib from 2,366 to 2,565 glycoletters
loader
- Added the glycoproteomics_data_loader, to request stored glycoproteomics datasets
- Added human_milk_N_PMID34087070 and human_keratinocytes_PMID37956981 as example datasets for glycoproteomics_data_loader (data are ID’ed in the “Glycosite” column in the format protein_site_composition)
- Added HexOS and HexNAcOS monosaccharide lists to be used in downstream functions
- Added modification_map to map which monosaccharides can be modified with which post-biosynthetic modification
- Added DataFrameSerializer to have a version-independent serializer for handling df_glycan
stats
- Added get_glycoform_diff to aggregate the differential expression of glycoforms across glycopeptides or glycoproteins via Fisher’s Combined Probability Test
- Fixed a pandas deprecation warning in replace_outliers_winsorization (for pandas >= 2.2.2)
- Added get_glm and process_glm_results to fit and analyze generalized linear models, with interaction terms, to grouped glycoproteomics data
- Added partial_corr to calculate regularized partial correlations
- Added estimate_technical_variance and perform_tests_monte_carlo to account for technical variation in glycomics data
- Added the “cap_side” keyword argument to replace_outliers_with_IQR_bounds and replace_outliers_winsorization to allow users to cap outliers on the “both”, “upper”, or “lower” sides; default: “both”
- Fixed the global NumPy RNG for clr_transformation and alr_transformation to ensure reproducibility
- Added the “correction_method” keyword argument to correct_multiple_testing, to allow users to switch between regular Benjamini-Hochberg and two-stage Benjamini-Hochberg
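The Fisher’s Combined Probability Test named for get_glycoform_diff can be sketched in plain Python as follows. This is a minimal illustration of the statistic itself, not glycowork’s implementation: the function name fisher_combined_pvalue is hypothetical, and the inputs are assumed to be independent p-values (e.g., one per glycoform of the same glycopeptide or glycoprotein).

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). Under the null hypothesis, the statistic
    -2 * sum(ln p) follows a chi-square distribution with 2k degrees of freedom.
    Hypothetical helper for illustration only.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0 or p > 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom (2k), the chi-square survival function has
    # the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Two glycoforms that each show weak evidence combine into stronger evidence.
stat, combined = fisher_combined_pvalue([0.05, 0.05])
print(f"statistic={stat:.3f}, combined p={combined:.4f}")
```

Note that Fisher’s method ignores the direction of each effect, so any sign handling would have to happen separately.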
motif
processing
- Added support for sulfated monosaccharides to get_possible_monosaccharides
- Added parse_glycoform, infer_features_from_composition, and process_for_glycoshift as helper functions in glycoproteomics data analysis
- Expanded canonicalize_composition to deal with compositions of type “9 2 0 0”
- Fine-tuned canonicalize_iupac to not mess up formatting of sequences ending in “GlcOP-ol”
- Added de_wildcard_glycoletter to retrieve a random specified monosaccharide/linkage of the general type present as a wildcard (e.g., Hex->Gal)
- Added get_class to return the glycan class as a string, given a glycan sequence
- If choose_correct_isoform is provided with isomers that have different numbers of ambiguities, it will now prioritize the isomers with the fewest ambiguities
graph
- Added support for mixing monosaccharide and modification wildcards in compare_glycans and subgraph_isomorphism (e.g., “HexNAcOS”)
- Added the handle_negation decorator and subgraph_isomorphism_with_negation to process motif annotation with restrictions (e.g., “Gal(b1-3)[!GlcNAc(b1-6)]GalNAc” to prevent annotating core2 O-glycans as core1)
- subgraph_isomorphism is now decorated with handle_negation, such that if the “motif” argument contains a negating operator (“!”), the function will actually execute subgraph_isomorphism_with_negation
- Added the “allowed_disaccharides” keyword argument to get_possible_topologies to support filtering possible extensions by physiological glycan extensions
- Added a filter to get_possible_topologies to maintain chemically feasible structures by checking that the same carbon does not get two linkages
- Support handling of post-biosynthetic modifications in get_possible_topologies, e.g., allowing things like “{6S}Gal(b1-3)[GlcNAc(b1-6)]GalNAc” as input, with uncertainty about where the sulfate is attached
- Refactored graph_to_string_int to recursively construct a depth-first search tree to construct the IUPAC-condensed string
- Supported monosaccharide-only graphs in generate_graph_features
- Added deduplicate_glycans to remove duplicate glycans (with different IUPAC strings) from a list of glycans
analysis
- Added the “glycoproteomics” and “level” keyword arguments to get_differential_expression to support the analysis of glycoproteomics data if “glycoproteomics=True”. “level” indicates whether different glycoforms should be analyzed at the level of glycopeptides or glycoproteins
- Added get_glycoshift_per_site to analyze whether, and in which way, glycosylation changes between conditions for each glycosylation site (controlling for protein expression etc.) via generalized linear models (GLM) adapted for compositional data (i.e., CLR-transformation)
- Added preprocess_data as a centralization of data preprocessing for easier maintenance
- Moved preprocessing code from get_differential_expression, get_glycanova, get_biodiversity, and get_roc into preprocess_data
- Fixed an issue in clean_up_heatmap in which sometimes the longer string instead of the longer sequence was picked for deduplication (e.g., “Internal_LewisX” vs “SialylLewisX”)
- Moved clean_up_heatmap into motif.annotate
- Added Omega-squared as an effect size output to get_glycanova
- Fixed an issue in get_heatmap in which sometimes the function did not correctly rescue an input by transposing it, if the index contained special characters
- Fixed an issue in get_pca in which the input of a dataframe for group specification resulted in an error
- Disabled Levene’s test in get_differential_expression if either group has fewer than three samples, for numerical stability
- Added the “partial_correlations” keyword argument to get_SparCC. If set to True, it will instead use regularized partial correlations to reduce multicollinearity and enrich associations that represent direct effects (i.e., getting rid of bystander effects)
- Added the “monte_carlo” keyword argument (default False) to preprocess_data and get_differential_expression. If True, this will simulate technical variation by sampling 128 Monte Carlo instances from a Dirichlet distribution for each sample. Only works for sequences & CLR for now. This will substantially increase runtime and be considerably more conservative in yielding significant differences between conditions. Use with caution.
- In get_differential_expression, glycans that had been filtered out by variance filtering now still have their mean abundance and log2FC recorded in the output table
- Added the “show_all” keyword argument to get_heatmap to force all tick labels to display, even if they visually overlap
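The idea behind “monte_carlo” (drawing 128 Dirichlet instances per sample and CLR-transforming each) can be sketched roughly as below. This is only an illustration under stated assumptions: using the raw abundances plus a pseudo-count as Dirichlet concentration parameters is a guess, and all function names are hypothetical rather than glycowork’s own.

```python
import math
import random


def dirichlet_sample(alphas, rng):
    """Draw one composition from Dirichlet(alphas) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def clr(composition):
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def monte_carlo_clr(abundances, n_instances=128, pseudo_count=0.5, seed=0):
    """Simulate technical variation for ONE sample.

    Assumption: abundances (plus a pseudo-count) act as Dirichlet
    concentration parameters; the real scaling may differ.
    """
    rng = random.Random(seed)
    alphas = [a + pseudo_count for a in abundances]
    return [clr(dirichlet_sample(alphas, rng)) for _ in range(n_instances)]


# One sample with four glycans (made-up relative abundances in percent).
instances = monte_carlo_clr([40.0, 30.0, 20.0, 10.0])
print(len(instances), [round(v, 3) for v in instances[0]])
```

Downstream tests would then be run on every instance and aggregated, which is where the extra runtime and conservativeness come from.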
annotate
- Added annotate_glycan_topology_uncertainty to probe whether motifs can be annotated in the case of structural ambiguity (e.g., {Fuc(a1-3)} in N-glycans, to still annotate Lewis X)
- Expanded annotate_dataset to let it automatically switch between annotate_glycan and annotate_glycan_topology_uncertainty, depending on whether structural ambiguity is present in a glycan (the latter is much more costly in terms of computation)
- Added the (default: True) keyword argument “remove_redundant” to quantify_motifs that will call clean_up_heatmap on the output to remove redundant motifs
- Dynamically generated terminal motifs now have the prefix “Terminal_” in all outputs
- Resolved a recent deprecation warning from pandas in get_k_saccharides
- Added a warning to annotate_dataset that will print all features in “feature_set” that are not being recognized
- Support the use of “terminal1” as a synonym to the original “terminal” in “feature_set”
draw
- Support the new “Terminal_” prefix in GlycoDraw and annotate_figure
tokenization
- Added support for sulfated HexA and HexN in map_to_basic
- Added calculate_adduct_mass to calculate the mass for generic molecular formulae (e.g., C2H4O2)
- Added support for chemical tags or adducts in composition_to_mass, glycan_to_mass, and mz_to_composition via the new “adduct” keyword argument
- Added “Pen” to get_core
- The default “glycan_class” in mz_to_composition is now “all” (but it can of course still be user-specified)
- Added the new keyword argument “extras” to mz_to_composition, to allow users to switch off the consideration of adducts or doubly-charged input masses (the default now is to opt out of adducts but users can add that to “extras”)
- Copy the input dictionary in composition_to_mass to prevent any in-place modification of the keys
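To illustrate what computing a mass from a generic molecular formula such as C2H4O2 involves, here is a small standalone sketch. It is not the code of calculate_adduct_mass: the helper name formula_to_mass is hypothetical, only a handful of elements are covered, and the values are standard monoisotopic masses.

```python
import re

# Monoisotopic masses (in Da) for a few common elements; deliberately incomplete.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def formula_to_mass(formula):
    """Sum monoisotopic masses for a flat formula like 'C2H4O2' (no brackets)."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the simple tokenizer could not fully consume.
    if not tokens or "".join(sym + cnt for sym, cnt in tokens) != formula:
        raise ValueError(f"cannot parse formula: {formula!r}")
    mass = 0.0
    for symbol, count in tokens:
        if symbol not in MONOISOTOPIC_MASS:
            raise KeyError(f"no mass stored for element {symbol!r}")
        mass += MONOISOTOPIC_MASS[symbol] * (int(count) if count else 1)
    return mass


print(round(formula_to_mass("C2H4O2"), 4))
```

A mass obtained this way could then be added to a glycan mass to account for a chemical tag or adduct.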
network
biosynthesis
- Made network construction faster via code optimizations
- Added the “mode” keyword argument to choose_path, find_diamonds, trace_diamonds, and evoprune_network to allow for biosynthetic motif analysis to use information from relative abundances
- We now support the use of longitudinal data in get_differential_biosynthesis to analyze whether biosynthetic flows change over time
- Fixed an issue in get_differential_biosynthesis in which N-glycans with high-mannose sequences caused errors (due to the backward direction of synthesis)
- Fixed an issue in get_differential_biosynthesis in which N-glycans, containing many unobserved intermediate sequences, had capacity bottleneck issues
- Added the “min_default” keyword argument to estimate_weights, to allow class-dependent fine-tuning of the minimum capacity
- Modif...
v1.3.0
Change Log
For Version 1.3.0
- Added get_heatmap to the glycoworkGUI
- Added an “About” tab to the glycoworkGUI, describing the glycowork version that it is running and pointers to the reference and documentation
- Added get_lectin_array to the glycoworkGUI
- Added a progress bar to lengthier operations in the glycoworkGUI
- Reduced filesize of glycoworkGUI by ~20% and filesize of glycowork by >80%
- Removed inplace operations from pandas functions, because of PDEP-8
- PyTorch (torch) is now no longer a mandatory requirement for base glycowork. It has been shifted to the setup requirements for the optional glycowork[ml] install. Trying to do machine learning without that install will result in an appropriate ImportError
- gdown is now a mandatory requirement for glycowork, to support hosting larger files outside the package itself
glycan_data
- Updated glycan_binding by averaging results from duplicate sequences with different formatting
- Added processed example glycomics datasets that are available via loader.glycomics_data_loader
- Added processed example lectin array datasets that are available via loader.lectin_array_data_loader
- Added a bit of fuzziness to the motifs in motif_list to allow for broader capture (e.g., “GalOS” instead of “Gal6S” when appropriate, or “Sia” instead of “Neu5Ac”)
- Fixed the definition of Internal_LacNAc_type1 in motif_list
loader
- Added glycomics_data_loader as an object for requesting glycomics data. Use dir(glycomics_data_loader) for displaying available glycomics datasets, and then request them via glycomics_data_loader.XXX (same goes for lectin array data, which is requestable via lectin_array_data_loader)
- Added human_skin_O_PMC5871710, human_skin_O_PMC5871710_BCC, human_skin_O_PMC5871710_SCC, human_colorectal_O_PMC9254241, human_colorectal_N_PMID26085185, human_colorectal_O_PMID19152289, human_gastric_O_PMC4816881, human_gastric_O_PMID28461410, human_gastric_O_PMC5762837, human_gastric_O_PMC7226152, human_liver_O_PMC9254241, human_liver_O_PMC5383776, human_ovarian_O_PMC4468167, human_prostate_O_PMC8010466, human_prostate_N_PMC8010466, human_retina_GSL_PMC5173345, human_leukemia_O_PMID34646384, human_leukemia_N_PMID34646384, HIV_gagtransfection_N_PMID35112714, HIV_gagtransfection_O_PMID35112714, time_series_N_PMID32149347, human_brain_GSL_PMID38343116, human_brain_N_PMID38343116, human_brain_O_PMID38343116, human_platelets_O_PMID36952551, human_platelets_N_PMID36952551, human_serum_bacteremia_N_PMID33535571, time_series_HMO_PMID22649065, and time_series_O_PMID32149347 as datasets for glycomics_data_loader
- Added A549_influenza_PMID33046650 and HEK_XBP1_PMID30305426 as datasets for lectin_array_data_loader
- Added lectin_specificity as a resource for documented lectin specificities for lectin array analysis
- Switched glycan_binding, df_species, and df_glycan to lazy loading for improved package import etc.
- Added strip_suffixes to strip a column of string values of suffixes such as “.1”, “.2” that pandas may assign to duplicate columns
- Added download_model to download hosted large files, such as model weights, when needed
stats
- Fixed an issue in test_inter_vs_intra_group in which mean values were not correctly broadcast if “paired = False” and “grouped_BH = True”
- Added get_equivalence_test to test for significant equivalence of group means via two one-sided t-tests
- Added clr_transformation for the centered log-ratio transformation of a glycomics dataframe with the addition of scale uncertainty via a gamma parameter (see for instance https://arxiv.org/abs/2201.03616 for the theory behind this)
- For impute_and_normalize, the default value for “min_samples” has been changed to 0.1, which now means that at least 10% of the samples (rounded down) need to be non-zero for a glycan to be retained. Further, features for which one group only has zero values will now be imputed with 1e-5 to avoid erroneous homogenization of effects by MissForest
- Changed the “min_feature_variance” default from 0.01 to 0.02 in variance_based_filtering, which now also returns the discarded rows as a second output
- Added replace_outliers_winsorization to cap outliers via Winsorization
- Fixed numpy random seed to 0
- Added anosim for ANOSIM (Analysis of similarities) for the beta-diversity calculation in get_biodiversity
- Added alpha_biodiversity_stats for performing an ANOVA on alpha diversity metrics, if groups > 2 in get_biodiversity
- Fixed a warning if the standard deviation of a paired sample in cohen_d was exactly zero
- Added calculate_permanova_stat and permanova_with_permutation for PERMANOVA (Permutational multivariate analysis of variance) for the beta-diversity calculation in get_biodiversity
- Added alr_transformation, get_procrustes_scores, and get_additive_logratio_transformation to find the ALR reference component and perform the ALR transformation for compositional data analysis
- Added correct_multiple_testing to centralize multiple testing correction and also add a warning if >90% of features are significant (in which case, Bonferroni correction will be applied to make results more conservative)
- Raised tolerance of MissForest from 1e-6 to 1e-5 (as it’s applied to the sum of differences, it’s still very conservative)
- Added omega_squared to calculate Omega squared, as an effect size for ANOVA-type analyses
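As a reminder of what the CLR transformation introduced in this release does to a single sample, here is a minimal sketch. It is not the code of clr_transformation: the function name clr_transform is hypothetical, base-2 logarithms are a choice made here, and modelling scale uncertainty as Gaussian noise with standard deviation gamma on the centering term is a simplifying assumption loosely following the linked arXiv paper.

```python
import math
import random


def clr_transform(sample, gamma=0.0, rng=None):
    """CLR-transform one sample of strictly positive relative abundances.

    Each value becomes log2(x) minus the mean of all log2 values.
    Assumption: if gamma > 0, the centering term is perturbed by Gaussian
    noise with standard deviation gamma to mimic scale uncertainty.
    """
    if any(x <= 0 for x in sample):
        raise ValueError("CLR needs strictly positive values; impute zeros first")
    logs = [math.log2(x) for x in sample]
    center = sum(logs) / len(logs)
    if gamma > 0:
        rng = rng or random.Random(0)
        center += rng.gauss(0.0, gamma)
    return [v - center for v in logs]


print(clr_transform([1.0, 2.0, 4.0]))  # values are centered around zero
noisy = clr_transform([1.0, 2.0, 4.0], gamma=0.5, rng=random.Random(1))
print(noisy)  # shifted as a whole; differences between glycans are unchanged
```

Because the noise shifts all glycans of a sample together, within-sample log-ratios stay intact while between-sample comparisons become less certain.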
motif
analysis
- Changed get_differential_expression to only call TST_grouped_benjamini_hochberg if “grouped_BH = True”, otherwise default to scipy two-stage Benjamini-Hochberg
- get_differential_expression now also outputs equivalence tests for all cases in which the uncorrected p-value is above 0.05
- get_differential_expression, get_glycanova, get_time_series, and get_jtk now will internally CLR- or ALR-transform input glycomics data to appropriately handle compositional data. These functions also newly accept a “gamma” keyword argument to tune the scale uncertainty for lowering the potential for false-positives
- get_heatmap will now automatically transpose the input dataframe if it has been provided in the wrong orientation
- Added the “transform” keyword argument to get_heatmap, to optionally CLR/ALR-transform the input data by setting ‘transform = “CLR”’ or ‘transform = “ALR”’
- The “transform” keyword argument also exists in most other analysis functions and accepts “ALR” and “CLR”, if users wish to override the automatically inferred type of transformation (“Nothing” is accepted for not transforming data at all but this is not recommended in most circumstances)
- Changed multiple testing correction to two-stage Benjamini-Hochberg, even if no grouped Benjamini-Hochberg test is being done
- Also changed the “min_samples” default to 0.1 in get_differential_expression and other functions
- Changed all analysis functions to use Winsorization (glycan_data.stats.replace_outliers_winsorization) instead of IQR capping (glycan_data.stats.replace_outliers_with_IQR_bounds) for outlier treatment
- Added get_SparCC to perform SparCC (Sparse Correlations for Compositional Data) to find pairwise associations between glycan sequences, or motifs, between two glycomics datasets, with the typical interface of .analysis functions (note that you can also use a glycomics dataset together with an, e.g., metagenomics dataset, even if “motifs=True” is set)
- Removed outlier treatment in get_pvals_motifs to avoid removing actual effects of effect-sparse glycan array data
- Added beta-diversity measures (via Euclidean distance on CLR/ALR-transformed data) to get_biodiversity. This function now operates on a shopping cart principle, similar to “feature_set” in the annotation functions. The “metrics” shopping cart currently has “alpha” and “beta” as options. Beta-diversity is tested via ANOSIM (e.g., differences in central tendencies) and PERMANOVA (e.g., variations in dispersions between groups)
- In get_heatmap, a correct color mapping (ascending or contrastive) is now automatically chosen and applied depending on whether negative values are absent or present in the input data, respectively (transform=“CLR” will introduce negative values in the data and trigger contrastive coloring)
- Added the “custom_scale” keyword argument to get_differential_expression, get_glycanova, get_biodiversity, and get_time_series. Only use it if you know what you’re doing. Basically, if you know that the total amount of glycans goes up/down in your condition of interest (in the condition, not in the measurement), then provide the ratio of glycan signal as group2/group1 and that will be used for an informed scale model, as described in https://www.biorxiv.org/content/10.1101/2024.04.01.587602v1 . Alternatively, if you have more than two groups, “custom_scale” can be provided as a dictionary of type: group idx : mean(group)/min(mean(groups)). [In all these cases, “gamma” becomes a parameter describing experimental error in measuring this glycan signal]
- In get_volcano, the default for “x_thresh” has been changed to 0 (post-hoc filtering of results by fold-change invalidates the FDR guarantee) and a new “n” keyword argument exists to provide the sample size for applying a get_alphaN-calculated alpha threshold
- Added get_roc to calculate ROC AUC scores for all features and, optionally, plot the ROC curve of the best feature. Also works in multi-group mode (i.e., best feature to distinguish class A from all other classes) and can use “custom_scale”
- Added get_lectin_array to analyze lectin array data to find out what kind of glycan motifs are increasing/decreasing between conditions
- Added an optional number of keyword arguments to get_volcano that get directly passed onto the seaborn scatterplot function (**kwargs)
- Added the “r...
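The multi-group form of “custom_scale” (group idx : mean(group)/min(mean(groups))) can be derived as sketched below. This assumes you have an independent measurement of total glycan signal per sample; the helper name custom_scale_from_totals and the example numbers are hypothetical.

```python
def custom_scale_from_totals(totals_by_group):
    """Build a {group index: mean(group) / min(mean(groups))} dictionary.

    totals_by_group maps a group index to a list of total glycan signal
    measurements (one per sample in that group). Hypothetical helper.
    """
    means = {g: sum(vals) / len(vals) for g, vals in totals_by_group.items()}
    smallest = min(means.values())
    if smallest <= 0:
        raise ValueError("total glycan signal must be positive")
    return {g: m / smallest for g, m in means.items()}


# Three conditions with made-up total glycan signal per sample.
scale = custom_scale_from_totals({1: [10.0, 10.0], 2: [20.0, 20.0], 3: [15.0, 15.0]})
print(scale)
```

The group with the lowest mean signal gets a value of 1.0, and every other group is expressed relative to it.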
v1.2.0
Change Log
For Version 1.2.0
- Added glycoworkGUI.py to build the .exe based GUI for important glycowork endpoint functions: GlycoDraw, plot_glycans_excel, and get_differential_expression
- Removed python-louvain as a required dependency for glycowork
glycan_data
loader
- Switched from pkg_resources to importlib for loading tabular data into the package
stats
- Fixed an issue in TST_grouped_benjamini_hochberg that caused errors if nothing was significantly different in the entire dataset or in any group
- test_inter_vs_intra_group is now robust to non-paired data and data with differing sample sizes per condition
- Added replace_outliers_with_IQR_bounds to support outlier treatment in motif.analysis
- Added sequence_richness, shannon_diversity_index, and simpson_diversity_index to calculate diversity indices of glycomics data
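For orientation, the three diversity measures can be computed from a vector of glycan abundances roughly as follows. These are textbook definitions written as hypothetical standalone helpers, not glycowork’s functions; in particular, the Simpson index is given here in its “1 minus sum of squared proportions” form, and other conventions exist.

```python
import math


def sequence_richness(abundances):
    """Number of glycans with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln p) over non-zero proportions."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)


def simpson_diversity(abundances):
    """Simpson index in the form 1 - sum(p^2) (one of several conventions)."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)


even = [25.0, 25.0, 25.0, 25.0]    # perfectly even glycome
skewed = [97.0, 1.0, 1.0, 1.0]     # dominated by one glycan
for profile in (even, skewed):
    print(sequence_richness(profile),
          round(shannon_diversity(profile), 3),
          round(simpson_diversity(profile), 3))
```

Both profiles have the same richness, but the evenly distributed one scores higher on the Shannon and Simpson indices.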
motif
processing
- WURCS handling for universal input now encompasses more monosaccharides
- GlycoCT handling for universal input now is robust to the declaration of substituents not immediately following their monosaccharide in the GlycoCT string
- Added equal_repeats to check whether two repeating units of a polysaccharide are the same, just shifted
- Modified glycan nomenclature detection in canonicalize_iupac to be less prone to overidentifying Oxford when it’s just numbers etc.
- Added “ß” to the typo detection in canonicalize_iupac and “(-)” as a variation of linkage uncertainty detection
- Made canonicalize_iupac robust to the variation of using {} instead of () for linkages
graph
- Removed the required usage of lib in glycan_to_nxGraph, compare_glycans, subgraph_isomorphism, and all downstream functions (lib only remains for stemification and deep learning model training/inference)
- The keyword argument “wildcards_ptm” now also works as intended when providing pre-calculated graphs as input to compare_glycans or subgraph_isomorphism
- Fixed a rare issue in which subgraph_isomorphism, when “count = False”, would sometimes erroneously output “False” because of a greedy approach to evaluating potential matches
tokenization
- Added get_unique_topologies to retrieve all base topologies for a given composition that have been observed for a given taxonomic subset
- Added the “obfuscate_ptm” keyword argument to map_to_basic, to allow for mapping Gal6S to Hex6S rather than the default HexOS, if that is required/advantageous
- Support mapping of phosphorylated glycans in map_to_basic
draw
- Fixed an issue where cross-ring fragments were not correctly rendered in GlycoDraw
- plot_glycans_excel can now also be used with filepaths to .xlsx files (in addition to .csv files)
- plot_glycans_excel now also supports compact glycan drawing with the “compact” keyword argument
- Improved drawing resolution in plot_glycans_excel
- GlycoDraw will now more strongly make use of nomenclature canonicalization in case of IUPAC dialects (still not 100%, if you suspect you use a dialect of IUPAC, pass your sequences through canonicalize_iupac first)
- If no filepath is specified, GlycoDraw will now also display drawn glycan structures in a non-Jupyter environment (as the classic matplotlib pop-up). Note that this functionality requires the cairosvg dependency (head to https://bojarlab.github.io/glycowork/examples.html#glycodraw-code-snippets if you’re unsure about that)
analysis
- Functions able to use .csv paths as input can now also deal with .xlsx paths as input
- The new “annotate_volcano” keyword argument now allows for the direct insertion of SNFG images within plots from get_volcano without having to subsequently run draw.annotate_figure
- get_pvals_motifs, get_differential_expression, get_glycanova, get_time_series, and get_jtk now use glycan_data.stats.replace_outliers_with_IQR_bounds to auto-smooth outliers
- Moved hotellings_t2 to glycan_data.stats
- All functions compatible with motif-level analysis now accept the “custom_motifs” keyword argument to be passed to annotate_dataset or quantify_motifs if “custom” is included in “feature_set”
- Changed the “mode” keyword argument in get_heatmap to “motifs” as a Boolean argument, like in all other motif.analysis functions
- Added a call to clean_up_heatmap to get_jtk to avoid redundant motifs
- Added get_biodiversity to compare two groups of glycomics datasets with regard to the sequence diversity that is present (similar to comparable analyses for microbiome data)
regex
- Added filter_dealbreakers to allow for the exclusion of identified matches if they have illegal components beyond the identified match (e.g., the forbidden Fuc in "Fuc-([Gal|GalNAc])?-Gal-([!Fuc]){,1}-GlcNAc"). Before this, the sequence context except the Fuc was extracted and returned.
- Fixed an edge case in filter_matches_by_location in which internal locations sometimes had to handle triple-nested lists which led to errors
- get_match can now also use glycan graphs, such as derived from glycan_to_nxGraph, as input
- Added get_match_batch to process a whole list of glycans at once, with some performance improvements via first pre-compiling the pattern
- Fixed an edge case in get_match in which pattern components consisting of a single monosaccharide with a specified linkage (e.g., “Fuca3”) could sometimes erroneously output no matches
- Added motif_to_regex to convert glycan motifs (e.g., in IUPAC-condensed) into a regular expression suitable for get_match. Limited to simple queries for now.
annotate
- get_terminal_structures now has a “size” keyword argument with which users can control the size of the extracted terminal motifs
- get_k_saccharides now has a “terminal” keyword argument with which users can filter to only count motifs at non-reducing ends
- annotate_dataset and functions using it now can add the “terminal2” and “terminal3” option in “feature_set” to also annotate & analyze terminal motifs of size 2 (e.g., Neu5Ac(a2-3)Gal(b1-4)) or size 3 (e.g., Neu5Ac(a2-3)Gal(b1-4)GlcNAc)
network
biosynthesis
- Added the possibility of providing abundances to construct_network that are then stored as node attributes in the network
- Added add_high_man_removal as a post-processing step in construct_network to allow for the addition of reactions removing mannoses from high-Man N-glycans occurring during maturation
- Added estimate_weights and get_edge_weight_by_abundance to estimate reaction capacities from abundances + estimate missing abundances
- Added get_maximum_flow, get_max_flow_path, and get_reaction_flow to calculate maximum flow paths between network root and endpoints as well as aggregate the flow by reaction type
- Added get_differential_biosynthesis as a wrapper function to compare two groups of glycomes/networks with regard to their biosynthesis (differential flow paths or differential reaction flows)
- Fixed an issue in construct_network in which sometimes nodes with outgoing but no incoming connections were not detected as unconnected nodes, leading to incomplete networks
- Added the rescue_glycans decorator to construct_network, to allow for auto-fixing nomenclature variations
- Improved performance of construct_network by reducing wasteful computation
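The notion of a maximum flow between a network root and an endpoint, given reaction capacities on the edges, can be illustrated with a compact Edmonds-Karp sketch. This is a generic textbook algorithm on a toy network, not the code behind get_maximum_flow; the node labels, capacities, and the function name max_flow are made up for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow; capacity is {u: {v: capacity_of_edge}}."""
    if source == sink:
        raise ValueError("source and sink must differ")
    # Residual graph with zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for the shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy network: a root precursor, two intermediates, one endpoint glycan.
toy = {"root": {"A": 3, "B": 2}, "A": {"tip": 2, "B": 1}, "B": {"tip": 3}}
print(max_flow(toy, "root", "tip"))
```

In a biosynthetic network, the capacities would come from abundance-derived edge weights, and the resulting flow indicates how much material can reach a given endpoint.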
evolution
- Switched get_communities from using python-louvain to the Louvain implementation in networkx
v1.1.0
Change Log
glycan_data
- Updated sugarbase database and all models
stats
- Newly added module to glycowork
- Moved all the statistics functions from motif.processing into this module: cohen_d, mahalanobis_distance, mahalanobis_variance, variance_stabilization, MissForest, impute_and_normalize, and variance_based_filtering
- Added fast_two_sum, two_sum, expansion_sum, hlm, update_cf_for_m_n, jtkdist, jtkinit, jtkstat, and jtkx helper functions for JTK test
- Added get_BF to calculate Jeffreys' approximate Bayes factor based on sample size and p-value
- Added get_alphaN to calculate sample size-appropriate significance cut-offs informed by Bayesian statistics
- Added pi0_tst and TST_grouped_benjamini_hochberg to perform a Two-Stage adaptive Benjamini-Hochberg procedure based on groups (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3175141/ or https://www.biorxiv.org/content/10.1101/2024.01.13.575531v1)
- Added test_inter_vs_intra_group to estimate intra- versus inter-group correlation with a mixed-effects model for groupings of glycans based on domain expertise
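To show what a two-stage adaptive Benjamini-Hochberg procedure does, here is a simplified, ungrouped sketch following the Benjamini-Krieger-Yekutieli two-stage scheme. It deliberately leaves out the group-wise handling that TST_grouped_benjamini_hochberg adds, and the function names bh_reject and two_stage_bh are hypothetical.

```python
def bh_reject(pvalues, q):
    """Benjamini-Hochberg step-up at level q; returns one boolean per p-value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def two_stage_bh(pvalues, q=0.05):
    """Two-stage adaptive BH (ungrouped sketch).

    Stage 1 runs BH at q / (1 + q) to estimate the number of true nulls;
    stage 2 reruns BH with the level scaled up by m / m0_hat.
    """
    m = len(pvalues)
    if m == 0:
        return []
    q_prime = q / (1.0 + q)
    stage1 = bh_reject(pvalues, q_prime)
    r1 = sum(stage1)
    if r1 == 0 or r1 == m:
        return stage1  # nothing or everything rejected: stop early
    m0_hat = m - r1
    return bh_reject(pvalues, q_prime * m / m0_hat)


pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
print(bh_reject(pvals, 0.05))   # regular BH
print(two_stage_bh(pvals, 0.05))  # adaptive two-stage BH
```

When many features are truly different, the estimated number of true nulls shrinks and the second stage gains power compared to regular BH.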
motif
regex
- Newly added module to glycowork
- Added the get_match function and associated functions to implement a regular expression system for glycans. This allows for powerful queries to detect and extract motifs of arbitrary complexity.
processing
- Moved cohen_d, mahalanobis_distance, mahalanobis_variance, variance_stabilization, MissForest, impute_and_normalize, and variance_based_filtering into glycan_data.stats to re-focus processing on processing glycan sequences
- Extended canonicalize_composition to cases like ‘5_4_2_1’, ‘5421’, and ‘(Hex)2 (HexNAc)2 (Deoxyhexose)1 (NeuAc)2 + (Man)3(GlcNAc)2’
- GlycoCT and WURCS handling for universal input now encompass more monosaccharides and more modifications
- Expanded oxford_to_iupac to handle more complex sequences, including sulfation, LacdiNAc, hybrid structures, extended Neu5Ac, complex fucosylation, more custom linkage specifications
- enforce_class can now deal with free glycans regardless of whether they end in ‘-ol’ or not
annotate
- annotate_dataset and downstream functions now accept a new keyword in “feature_set”, called “custom”. If “custom” is added to “feature_set”, a list of custom motifs can and must be added via the “custom_motifs” keyword argument. “custom” can be mixed and matched with all other keywords in “feature_set”
- annotate_dataset now also accepts glyco-regular expressions via the “custom” keyword in “feature_set”. These expressions need to be added within the “custom_motifs” keyword argument and have to start with an “r”, such as "rHex-HexNAc-([Hex|Fuc]){1,2}-HexNAc". Normal motifs and glyco-regular expressions can be freely mixed within “custom_motifs”
- Added group_glycans_core, group_glycans_sia_fuc, and group_glycans_N_glycan_type to group glycans by core structure (for O-glycans), Sia/Fuc/FucSia/Rest, or complex/hybrid/high-man/rest (for N-glycans)
- Fixed a bug in get_k_saccharides, in which redundant columns were not always correctly removed
analysis
- Added get_jtk to analyze circadian expression of glycans in temporal glycomics datasets using the Jonckheere–Terpstra–Kendall (JTK) algorithm, with the typical interface for motifs and imputation etc. analogous to differential expression.
- get_differential_expression, get_glycanova, and get_jtk now use get_alphaN to calculate a sample size-appropriate significance cut-off (see https://journals.sagepub.com/doi/10.1177/14761270231214429) and add a ‘significant’ column to the output to display whether the corrected p-values lie below this threshold
- Added the “zscores” keyword argument to get_pvals_motifs to perform z-score transformation if used data are not yet z-score transformed, by setting “zscores” to False
- For statistical calculations, get_pvals_motifs will now weigh the motif occurrences by z-score magnitude, rather than only using a cut-off for enrichment calculations
- Added effect size calculations to get_pvals_motifs which are also in the output, as Cohen’s d
- Changed get_pvals_motifs such that now both enrichments and depletions will be tested (with depletions resulting in negative effect sizes)
- Added select_grouping to find out which grouping of glycans has the highest intra- versus inter-group correlation, as estimated by glycan_data.stats.test_inter_vs_intra_group
- When “motifs = False” and “grouped_BH = True”, get_differential_expression now tries to use the Two-Stage adaptive Benjamini-Hochberg procedure based on groups for multiple testing correction, if meaningful groups can be found in the glycans [note this makes everything at least one order of magnitude slower, though most datasets should still finish in a few seconds]
draw
- In GlycoDraw, the “highlight_motif” keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single ‘r’ before your glyco-regular expression to indicate that it is indeed a regular expression)
- Added plot_glycans_excel to allow for the automated insertion of GlycoDraw SNFG pictures into an Excel file containing glycan sequences
graph
- categorical_node_match_wildcard now uses string ID for matching, instead of integer ID, which means that even two graphs generated with two different libs can now be successfully compared via compare_glycans or subgraph_isomorphism
- compare_glycans and subgraph_isomorphism (and all functions using these functions) now support negation, by prepending “!”. For instance, “!Fuc(a1-?)Gal(b1-4)GlcNAc” will match subsequences that have a monosaccharide that is NOT Fuc before the Gal. It is highly recommended to generate your own lib via get_lib if you use negation, as monosaccharides such as !Fuc are not within lib and will cause indexing errors.
- Added “?1-?” as another ultimate wildcard (promoting it from a strong narrow wildcard)
- Fixed some cases where “Monosaccharide” was not treated as an ultimate wildcard in graph operations
- Fixed an issue in graph_to_string in which glycans of size 1 (e.g., “GalNAc”) sometimes were missing their first character
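The wildcard and negation semantics described above can be illustrated on single tokens. This is a toy sketch with a hypothetical helper, not glycowork's graph-based matching; it does not distinguish monosaccharide from linkage positions.

```python
def token_matches(pattern, token):
    """Toy check of whether one pattern token accepts one glycan token."""
    # Ultimate wildcards accept anything in this simplified model
    if pattern in ("Monosaccharide", "?1-?"):
        return True
    # Negation: "!Fuc" accepts every token except "Fuc"
    if pattern.startswith("!"):
        return token != pattern[1:]
    # Otherwise require an exact match
    return pattern == token


print(token_matches("!Fuc", "Gal"))   # a non-Fuc monosaccharide is accepted
print(token_matches("!Fuc", "Fuc"))   # Fuc itself is rejected
```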
network
- Updated pre-calculated biosynthetic networks for milk oligosaccharides
biosynthesis
- Refactored find_diff to make networks compatible with the automated, dynamic wildcards (i.e., ? behaves as it should and doesn’t necessarily cause over-branching of the network)
- In highlight_network, the “motif” keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single ‘r’ before your glyco-regular expression to indicate that it is indeed a regular expression)
ml
model_training
- In training_setup, upgraded the loss functions for all classification problems to PolyLoss with label smoothing (see https://arxiv.org/abs/2204.12511 for details)
- In training_setup, the number of classes (for multiclass or multilabel classification) can now be specified via the new “num_classes” keyword argument
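To make the loss change above more concrete, here is a minimal pure-Python sketch of the Poly-1 form of PolyLoss combined with label smoothing for a single sample. That glycowork uses exactly this Poly-1 variant, and the default values of epsilon and smoothing shown here, are assumptions; the function name is hypothetical.

```python
import math


def poly1_cross_entropy(logits, target, epsilon=1.0, smoothing=0.1):
    """Poly-1 loss for one sample: cross-entropy + epsilon * (1 - p_t)."""
    n = len(logits)
    # Numerically stable softmax
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Label smoothing: mix the one-hot target with a uniform distribution
    smooth = [
        smoothing / n + (1.0 - smoothing) * (1.0 if i == target else 0.0)
        for i in range(n)
    ]
    cross_entropy = -sum(t * math.log(p) for t, p in zip(smooth, probs))
    # Probability mass assigned to the (smoothed) target
    p_t = sum(t * p for t, p in zip(smooth, probs))
    return cross_entropy + epsilon * (1.0 - p_t)


loss = poly1_cross_entropy([2.0, 0.5, -1.0], target=0)
```

With epsilon set to 0 and smoothing set to 0, the function reduces to ordinary cross-entropy.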
v1.0.1
Change Log
motif
processing
- Slightly extended WURCS parsing in wurcs_to_iupac
- Fixed an issue in choose_correct_isoform in which errors would be caused if the input list contained only duplicate glycans
- Fixed an issue in choose_correct_isoform in which errors would be caused if the input list contained only glycans without branching
draw
- Adapted cairosvg imports so that, even without cairosvg dependencies, users can plot glycans inline and export as .svg files (only export as .pdf and export of annotate_figure are still restricted to cairosvg)
network
biosynthesis
- Fixed handling of empty outputs of choose_correct_isoform in construct_network
evolution
- Fixed dictionary handling in get_communities
v1.0.0
Change Log
- Added a Zenodo badge, to have a release-specific doi for glycowork
glycan_data
- Updated sugarbase database; sugarbase is now pickled, so literal evaluations are necessary
- Harmonized glycan column names across generated dataframes; all use ‘glycan’ now, ‘target’ has been deprecated
loader
- Updated motif_list to be compatible with new position encoding
- Added Internal_LewisX and Internal_LewisA to motif_list (renamed LewisX and LewisA to Terminal_LewisX and Terminal_LewisA, respectively)
- Made df_species static again to speed up package import
- Added find_nth_reverse helper function that finds the starting index of the nth occurrence of a substring from the end of the string
- Added remove_unmatched_brackets helper function to strip unmatched opening or closing brackets from glycan strings
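The two helper functions described above could look roughly like the following. This is a sketch under assumptions: the names carry a _sketch suffix because the real signatures and edge-case behavior in glycowork are not given here, occurrences are counted without overlap, and only square brackets are handled.

```python
def find_nth_reverse_sketch(string, substring, n):
    """Start index of the n-th occurrence of substring, counted from the end (-1 if absent)."""
    end = len(string)
    idx = -1
    for _ in range(n):
        idx = string.rfind(substring, 0, end)
        if idx == -1:
            return -1
        end = idx  # continue the search to the left of this hit
    return idx


def remove_unmatched_brackets_sketch(glycan):
    """Drop '[' or ']' characters that have no matching partner."""
    keep = [True] * len(glycan)
    open_positions = []
    for i, ch in enumerate(glycan):
        if ch == "[":
            open_positions.append(i)
        elif ch == "]":
            if open_positions:
                open_positions.pop()
            else:
                keep[i] = False  # closing bracket without an opener
    for i in open_positions:
        keep[i] = False  # opening brackets that were never closed
    return "".join(ch for ch, k in zip(glycan, keep) if k)


print(find_nth_reverse_sketch("a-b-c-d", "-", 2))
print(remove_unmatched_brackets_sketch("Gal(b1-4)]GlcNAc"))
```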
motif
- Added more masses to mz_to_composition.csv / mass_dict: Acetonitrile, Formate, Cl-, HCO3-, and NH4+
processing
- Extended canonicalize_iupac to cases like "NeuGcα3Galβ3(NeuAcα6)GalNAcol" and even more modification formulations, e.g., “6S-GlcNAc”
- Added canonicalize_composition to convert compositions formatted either in the style of HexNAc2Hex1Fuc3Neu5Ac1 or N2H1F3A1 into dictionaries used by glycowork
- Added GalNAc4S to permitted reducing end monosaccharides for O-linked glycans in enforce_class
- MissForest now has a maximum number of iterations and will check for convergence each iteration (immediately finishing upon converging), yielding some speed-ups in most cases
- The output of min_process_glycans no longer contains empty strings for glycans ending in a linkage
- Updated choose_correct_isoform to be compatible with the change in min_process_glycans
- Added get_possible_linkages to retrieve linkages matching a wildcarded linkage
- Added get_possible_monosaccharides to retrieve monosaccharides matching a monosaccharide type (HexNAc, etc.)
- Added decorators, rescue_glycans and rescue_compositions, to canonicalize the inputs in case a decorated function errors out
- Added linearcode_to_iupac to support LinearCode as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that for now coverage may not be perfect yet
- Added iupac_extended_to_condensed to support IUPAC-extended as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that for now coverage may not be perfect yet
- Added glycoct_to_iupac to support GlycoCT as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that for now coverage may not be perfect yet
- Added wurcs_to_iupac to support WURCS as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that for now coverage may not be perfect yet
- Added oxford_to_iupac to support Oxford as input format for glycowork (this will be called within canonicalize_iupac and the decorators); note that for now coverage is limited
- check_nomenclature (formerly in motif.tokenization) now handles outputting warning messages for trying to use non-string, non-graph nomenclatures or SMILES with glycowork functions
- Expanded find_isomorphs to generate more isomorphic sequence variants, thereby increasing the chances that choose_correct_isoform will have access to the canonical sequence
- Fixed a rare issue with canonicalize_iupac where sequences coming from structure_to_basic would sometimes be formatted incorrectly if they contained dHex
- Fixed an issue in find_isomorphs in which double branches were not always correctly swapped
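As an illustration of the two composition styles mentioned for canonicalize_composition above (HexNAc2Hex1Fuc3Neu5Ac1 and N2H1F3A1), the sketch below parses both into a dictionary. The function name, the set of supported monosaccharides, the one-letter mapping, and the dictionary key names are all assumptions for illustration; the real function covers far more cases.

```python
import re

# Hypothetical, deliberately small vocabulary for this sketch
LONG_NAMES = ["HexNAc", "Neu5Ac", "Neu5Gc", "Hex", "Fuc"]
SHORT_TO_LONG = {"N": "HexNAc", "H": "Hex", "F": "Fuc", "A": "Neu5Ac", "G": "Neu5Gc"}


def parse_composition_sketch(comp):
    """Parse 'HexNAc2Hex1Fuc3Neu5Ac1' or 'N2H1F3A1' into {name: count}."""
    # Short style: single uppercase letters, each followed by a count
    if re.fullmatch(r"(?:[A-Z]\d+)+", comp):
        pairs = re.findall(r"([A-Z])(\d+)", comp)
        return {SHORT_TO_LONG[letter]: int(count) for letter, count in pairs}
    # Long style: try longer names first so 'HexNAc' is not read as 'Hex'
    names = "|".join(sorted(LONG_NAMES, key=len, reverse=True))
    if not re.fullmatch(rf"(?:(?:{names})\d+)+", comp):
        raise ValueError(f"Unrecognized composition string: {comp}")
    pairs = re.findall(rf"({names})(\d+)", comp)
    return {name: int(count) for name, count in pairs}


long_form = parse_composition_sketch("HexNAc2Hex1Fuc3Neu5Ac1")
short_form = parse_composition_sketch("N2H1F3A1")
```

Both calls yield the same dictionary in this sketch, which is the property that lets downstream functions accept either style.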
analysis
- get_heatmap no longer tries to convert data to relative abundances if negative values are detected in the input
- All functions using dataframes as inputs in analysis can now also be used by providing full filepaths to the .csv file instead
- Optimized some of the code for readability and speed (everything should be at least a bit faster now)
annotate
- get_k_saccharides is now allowed to generate new dynamic motifs with tokens outside of lib (via expand_lib)
- annotate_glycan and annotate_dataset now also support narrow wildcards
- Fixed an issue in count_unique_subgraphs_of_size_k in which branched motifs were not always correctly formatted (i.e., opening/closing brackets)
- get_k_saccharides now outputs dataframes with counts as default and can yield the old nested lists of motifs by setting the new keyword just_motifs to True
- Fixed an edge case in which get_k_saccharides sometimes overcounted individual monosaccharides if their strings overlapped
graph
- subgraph_isomorphism and compare_glycans now support using wildcards and position encoding at the same time. The extra keyword argument is now deprecated and the functions auto-detect whether anything has been specified in wildcards and/or termini_list
- subgraph_isomorphism and compare_glycans now support automatically inferred narrow wildcards to allow for (i) matching linkages like a1-? to only specified linkages within that group (e.g., a1-3 but not b1-3 etc.) and (ii) matching monosaccharide types like HexNAc to only specified monosaccharides of that type (e.g., GlcNAc but not Glc, etc.)
- The wildcard_list keyword argument in all graph & annotation functions is now deprecated as wildcards are inferred automatically via narrow wildcards and native full wildcards (?1-? and Monosaccharide)
- subgraph_isomorphism now behaves as expected for testing motifs ending in linkages on glycans ending in linkages
- subgraph_isomorphism can now return the matched subgraphs in the input glycan with the new return_matches keyword argument
- glycan_to_nxGraph is now decorated with the rescue_glycans decorator, which auto-canonicalizes IUPAC strings if they are not in the format preferred by glycowork
- Fixed mismatch of labels and string_labels in categorical_node_match_wildcard
- Fixed an issue in subgraph_isomorphism in which, when using positional encoding, sometimes the mirror image of a motif was incorrectly captured if the termini aligned
- termini_list within subgraph_isomorphism now only requires the specification of monosaccharide positions
- Added expand_termini_list helper function to facilitate the expansion of monosaccharide-only termini_list into full termini_list behind the scenes
- Added support for shorthand notation of position encoding, now either ‘terminal’ or ‘t’ will work
- Improved handling of complex branching in graph_to_string; there should be fewer unexpected translations now
- Fixed an issue in graph_to_string in which induced subgraphs could cause errors due to unexpected or weirdly sorted node indices
- Fixed an edge case in which the reducing end could sometimes be calculated as ‘internal’ when termini=‘calc’ in glycan_to_nxGraph
- Deprecated a duplicate character_to_label and string_to_labels
- Deprecated categorical_termini_match; the functionality is now handled within categorical_node_match_wildcard
- Deprecated the wildcards keyword argument from compare_glycans as this will now be detected internally, if wildcards are provided via wildcard_list
tokenization
- Composition functions (e.g., composition_to_mass) are now decorated with rescue_compositions, which means that they can be used with compositions like “H3N2” (basically anything that canonicalize_composition can handle)
- Deprecated character_to_label as it’s now handled within string_to_labels
- Moved check_nomenclature into motif.processing
- Optimized some of the code for readability and speed (most things should be at least a bit faster now)
draw
- Support motif highlighting in GlycoDraw: by providing the highlight_motif keyword argument, motifs can be highlighted (everything else will be set to low opacity). Works with IUPAC-condensed motifs and named motifs from known
- Support wildcards in motif highlighting with the highlight_wildcard_list keyword argument, for instance highlighting all Gal(?1-?)GlcNAc subunits (for Gal(b1-?)GlcNAc you don’t need highlight_wildcard_list, as narrow wildcards are handled automatically)
- Support positional encoding in motif highlighting with the highlight_termini_list keyword argument, for instance highlighting all terminal, non-reducing end Gal(b1-?)GlcNAc subunits (yes, you can use both wildcards and positional encoding at the same time😊)
- Support drawing of repeat structures (indicated by brackets and the number of repeats) via the new repeat keyword argument. Internal repeats can also be specified with the additional repeat_range keyword argument.
- Optimized some of the code for readability and speed (most things should be at least a bit faster now)
network
biosynthesis
- Optimized some of the code for readability and speed (everything should be up to 2x faster now)
evolution
- Optimized some of the code for readability and speed (everything should be at least a bit faster now)
ml
- Optimized some of the code for readability and speed (most things should be at least a bit faster now)
v0.8.1-zenodo
Literally no code changes at this point (0.9 is expected to come in December) but Zenodo requires a new release to mint a doi
v0.8.1
v0.8.0
Change Log
For Version 0.8.0
- Linted the package with flake8
- Increased code coverage
- Added another optional extras install, [chem], including glyles, requests, and pubchempy
glycan_data
- Changed lib to be a dict of type glycoletters:index, as it’s faster to index a dict vs. a long list; also adapted all functions using lib to reflect this change
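The reasoning behind the lib change above can be shown with a toy example: looking up a position with list.index scans the list, whereas a dict lookup is a hash lookup. The glycoletters and variable names below are a small made-up stand-in, not the real lib.

```python
# Toy stand-in for a library of glycoletters (the real lib is much larger)
glycoletters = ["Fuc", "Gal", "GlcNAc", "a1-3", "b1-4"]

# Old style: a sorted list, where finding an index requires a linear scan
lib_list = sorted(glycoletters)

# New style: a dict mapping each glycoletter to its index
lib_dict = {letter: idx for idx, letter in enumerate(lib_list)}

# Both give the same index, but the dict lookup does not scan the list
tokens = ["Gal", "b1-4", "GlcNAc"]
encoded_via_list = [lib_list.index(t) for t in tokens]
encoded_via_dict = [lib_dict[t] for t in tokens]
```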
loader
- Added replace_every_second helper function
- Updated linkages list
- Changed linkages and Hex etc. to be sets instead of lists
motif
processing
- Added variance_stabilization for variance stabilization normalization, both globally and group-specific
- Added in_lib helper function to check whether all glycoletters of a glycan are in lib
- Deprecated small_motif_find
- cohen_d now also returns the variance of the effect size and supports paired samples as well (calculating Cohen’s dz in this case)
- Added mahalanobis_distance to calculate Mahalanobis distance as an effect size for multivariate comparisons
- Added mahalanobis_variance to estimate variance of Mahalanobis distance via bootstrapping
- Added MissForest for random forest based data imputation
- Cleaned up canonicalize_iupac and made it slightly faster
- Added variance_based_filtering
- Added impute_and_normalize and underlying helper functions
- Fixed numpy random seed for reproducibility
- Sped up presence_to_matrix
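The difference between Cohen’s d for independent groups and Cohen’s dz for paired samples, as mentioned for cohen_d above, can be sketched with the textbook formulas. The function name is hypothetical, and this sketch omits the effect size variance that the real function also returns.

```python
import math
from statistics import mean, stdev, variance


def cohen_d_sketch(x, y, paired=False):
    """Cohen's d (pooled SD) or, for paired samples, Cohen's dz."""
    if paired:
        # dz: mean of the pairwise differences divided by their standard deviation
        diffs = [a - b for a, b in zip(x, y)]
        return mean(diffs) / stdev(diffs)
    nx, ny = len(x), len(y)
    # Pooled standard deviation from the two sample variances
    pooled_sd = math.sqrt(
        ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    )
    return (mean(x) - mean(y)) / pooled_sd


d = cohen_d_sketch([2, 4, 6], [1, 3, 5])
dz = cohen_d_sketch([2, 4, 7], [1, 3, 5], paired=True)
```

A negative value indicates that the first group has the lower mean.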
tokenization
- Deprecated mz_to_composition
- mz_to_composition2 is now the new mz_to_composition
- Adapted mz_to_structures, compositions_to_structures, and match_composition_relaxed to work with this change
annotate
- Added create_correlation_network to identify clusters of highly correlated glycans/motifs
- Added count_unique_subgraphs_of_size_k as a helper function within get_k_saccharides
- Refactored get_k_saccharides to be faster and more complete (and be, effectively, a replacement of motif_matrix)
- annotate_dataset now uses get_k_saccharides for mono- and disaccharides, instead of motif_matrix
- Deprecated motif_matrix
- annotate_dataset now also creates relevant ?-containing motifs if ‘terminal’ in feature_set, even if they don’t explicitly occur in the glycan strings
- Big speed-up for annotate_dataset if known=True, as we now cache the precalculated motif graphs
- Added quantify_motifs as a wrapper around annotate_dataset to adequately distribute relative abundances across extracted motifs
- Deprecated estimate_lower_bound as speed-ups make it no longer necessary
analysis
- Renamed make_heatmap to get_heatmap
- Renamed make_volcano to get_volcano
- Deprecated replace_zero_with_random_gaussian (this is now handled by MissForest in .processing within impute_and_normalize)
- Added hotellings_t2 for multivariate comparisons
- Changed multiple-testing correction method from Holm-Sidak to Benjamini-Hochberg
- Added variance_stabilization in get_differential_expression
- Added the option to analyze highly correlated sets of glycans/motifs (via create_correlation_network) within get_differential_expression
- Implemented usage of hotellings_t2 and the Mahalanobis distance (as effect size) if sets are analyzed within get_differential_expression
- get_heatmap and get_differential_expression now scale abundances by the actual counts of motifs per glycan, not just absence/presence
- Added get_meta_analysis to estimate combined effect sizes from the results of multiple studies (both fixed-effects and random-effects models can be estimated)
- Added variance_based_filtering in get_differential_expression
- Effect size variances can now also be retrieved within get_differential_expression via the effect_size_variance keyword argument
- get_differential_expression can now also handle paired samples when paired=True
- get_differential_expression now also tests the homogeneity of variances using Levene’s test in all settings (also multiple-testing controlled)
- Added get_glycanova to use ANOVA-based analyses on glycomics datasets (uses basically all the improvements of get_differential_expression, including analysis on the motif level)
- Added get_pca to plot glycomics data (also has the motif interface)
- Added get_pval_distribution to plot the distribution of p-values
- Added get_ma to plot a Bland-Altman plot
- Added get_glycan_change_over_time to detect significant changes in time-course data via OLS fitting
- Added get_time_series as a wrapper around get_glycan_change_over_time to do time series analyses, with all the motif & normalization functionality
- Added get_coverage to visualize glycan expression across samples (ordered by average intensity) in a coverage plot
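For the meta-analysis entry above, the fixed-effects case can be sketched as standard inverse-variance weighting of per-study effect sizes. The function name and return values are hypothetical and this is not the signature of get_meta_analysis; the random-effects model is not covered.

```python
import math


def fixed_effects_meta_sketch(effects, variances):
    """Inverse-variance weighted combined effect size and its standard error."""
    # Studies with smaller variance get proportionally more weight
    weights = [1.0 / v for v in variances]
    combined = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return combined, standard_error


combined, se = fixed_effects_meta_sketch([0.2, 0.6], [0.04, 0.04])
```

With equal variances, as in this usage example, the combined effect is simply the mean of the study effects.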
draw
- Added import warning if draw dependencies are not installed
- Removed pycairo from dependencies
- Modified annotate_figure to be compatible with .svg files from older Matplotlib versions
- Changed “output” to “filepath” in GlycoDraw
- If there are “?” in the provided filepath for GlycoDraw, they will now be automatically replaced with “_” to avoid saving errors
graph
- Sped up glycan_to_graph / glycan_to_nxGraph (and all downstream functions, which are a lot)
- Also improved the runtime of downstream functions, such as subgraph_isomorphism, independent of these advances
- subgraph_isomorphism now also accepts precalculated motif graphs as inputs (in addition to the already supported precalculated glycan graphs)
ml
- Rephrased import warnings to reflect optional install strategy for extra dependencies
model_training
- Sped up train_ml_model
network
biosynthesis
- create_neighbors no longer uses the libr keyword
v0.7.0
Change Log
For Version 0.7.0
- Removed support for Python 3.7; as we use the walrus operator in some of the re-worked functions, Python 3.8+ is now required to use glycowork
- Added optional installs for specialized glycowork usage (‘all’, ‘ml’, and ‘draw’; for now), which install additional dependencies for these usages; more details in docs
glycan_data
- Updated datasets, models, lib to be bigger & better; removed many sequence duplicates with differently written branch orderings
loader
- Added multireplace helper function, to map a dictionary of changes to a string
- Made build_custom_df faster
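Applying a dictionary of changes to a string, as described for multireplace above, can be sketched as follows. The name and argument order are hypothetical; replacements are applied one after another in dictionary order, so earlier replacements can affect later ones.

```python
def multireplace_sketch(string, replacements):
    """Apply every old -> new replacement from a dict to a string, in order."""
    for old, new in replacements.items():
        string = string.replace(old, new)
    return string


cleaned = multireplace_sketch("NeuAcα3Gal", {"α": "a", "NeuAc": "Neu5Ac"})
```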
motif
draw
- Added draw as a new submodule of .motif
- Added GlycoDraw to draw glycans in SNFG style and save them as .svg/.pdf
- Added annotate_figure to replace glycan text with glycan images in .svg figures (heatmaps, volcano plots, etc.)
- Added text_to_glycan, which replaces glycan strings in figures with glycan images
- Added scale_in_range to normalize a list of numbers within a range
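Normalizing a list of numbers into a target range, as scale_in_range is described to do above, is typically a min-max rescaling. The sketch below assumes that behavior; the name, the argument order, and the handling of constant input are illustrative choices, not the documented API.

```python
def scale_in_range_sketch(values, lower, upper):
    """Min-max rescale values so the smallest maps to lower and the largest to upper."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All values identical: map everything to the lower bound (arbitrary choice)
        return [lower for _ in values]
    return [lower + (v - vmin) * (upper - lower) / (vmax - vmin) for v in values]


scaled = scale_in_range_sketch([0, 5, 10], 1, 3)
```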
tokenization
- Sped up glycan_to_composition by 1000x (avoiding explicit stemification and just doing stemification of the building blocks); also speeds up all functions using glycan_to_composition
- Sped up composition_to_mass (independent of the above)
- glycan_to_composition (and downstream functions) now can handle more post-biosynthetic modifications: Ac, PCho, PEtN
- Renamed calculate_theoretical_mass to glycan_to_mass
- Sped up mz_to_composition2 by (i) filtering out duplicate compositions and (ii) selecting compositions from a chosen taxonomic kingdom
- Reprioritized mz_to_composition2 by first searching for native compositions and only then looking for compositions + adducts and only then searching for doubly-charged compositions
- canonicalize_iupac now also handles floating substituents and can handle many more typos / inconsistencies / IUPAC dialects (such as CFG-coded glycans), including improvements made by Kathryn Klarich
- Moved canonicalize_iupac into motif.processing
- Expanded get_core (and downstream functions) with HexA, HexNAc, dHex
- Expanded map_to_basic to (some) post-biosynthetic modifications
- mz_to_structures no longer outright fails if no m/z value can be matched
- Deprecated structures_to_motifs; annotate_dataset can do the same
processing
- Fixed bug in processing glycans with floating substituents in small_motif_find
- Deprecated seed_wildcard
- choose_correct_isoform has been updated to keep up with the improved find_isomorphs
- Added more informative error message to IUPAC_to_SMILES
- get_lib is now slightly faster
graph
- Sped up compare_glycans with string inputs, by avoiding graph operations when the two glycans do not have the same composition
- Added support for enabling modification wildcards in compare_glycans and subgraph_isomorphism (for instance matching GalOS and Gal6S) by setting wildcards_ptm = True
- Sped up glycan_to_nxGraph_int by optimizing node label/attribute assignments
- Refactored graph_to_string to be a lot more robust, streamlined, and faster. Its new integration with canonicalize_iupac may also result in string improvement upon back-translation (e.g., branch order canonicalization)
- ensure_graph now has **kwargs that get passed to glycan_to_nxGraph
- get_possible_topologies now supports internal additions as well, with the keyword argument ‘exhaustive’
- possible_topology_check now supports wildcard matching via **kwargs passed on to compare_glycans
- Made changes to make glycowork compatible with NetworkX 3.0
- Moved bracket_removal to motif.processing
- Fixed a small inconsistency in handling floating substituents in glycan_to_nxGraph_int that could have caused issues with custom libs
- override_reducing_end is no longer needed in glycan_to_nxGraph to delineate linkage-ending glycans (e.g., Fuc(a1-2) ); this is now auto-inferred within glycan_to_nxGraph
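The compare_glycans speed-up described above rests on a cheap pre-check: two glycans with different compositions cannot be identical, so the graph comparison can be skipped. The sketch below illustrates that idea with a rough, string-based monosaccharide count; both helper names are hypothetical and the tokenization is far simpler than glycowork's glycan_to_composition.

```python
import re
from collections import Counter


def rough_composition(glycan):
    """Count monosaccharide tokens in an IUPAC-condensed string (rough approximation)."""
    # Remove branch brackets, then split on linkages such as (b1-4)
    stripped = glycan.replace("[", "").replace("]", "")
    tokens = re.split(r"\([^)]*\)", stripped)
    return Counter(t for t in tokens if t)


def can_skip_graph_comparison(glycan_a, glycan_b):
    """True if the compositions differ, so the glycans cannot be identical."""
    return rough_composition(glycan_a) != rough_composition(glycan_b)


# Different compositions: no graph comparison needed
print(can_skip_graph_comparison("Gal(b1-4)Glc", "Gal(b1-4)GlcNAc"))
# Same composition, different branch order: a graph comparison is still required
print(can_skip_graph_comparison("Gal(b1-4)[Fuc(a1-3)]GlcNAc", "Fuc(a1-3)[Gal(b1-4)]GlcNAc"))
```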
annotate
- Deprecated convert_to_counts_glycoletter and glycoletter_count_matrix; motif_matrix can do both
- Refactored motif_matrix to be substantially faster and more condensed in its output (also speeds up annotate_dataset with the ‘exhaustive’ option in the feature_set argument)
- Expanded motif_matrix to implicitly test for subsumption enrichment (e.g., previously we only explicitly looked for “Gal(b1-?)GlcNAc”; now we also count “Gal(b1-4)GlcNAc” toward the former)
- annotate_glycan is now dual-compatible with string and networkx graph input
- Expanded feature_set in annotate_dataset by the option ‘terminal’, which calls get_terminal_structures
- This usage of get_terminal_structures in annotate_dataset now also does the same implicit test for subsumption enrichment as described for motif_matrix above
- annotate_dataset now creates its own lib, based on the motif list and the provided glycans
- Expanded find_isomorphs to also be able to re-shuffle (some) branched branches
- Moved find_isomorphs into motif.processing
- Linkages-only are no longer considered by motif_matrix / annotate_dataset
analysis
- All functions with the feature_set keyword argument now can also use the ‘terminal’ keyword for analyzing non-reducing end motifs exclusively
- Added get_differential_expression to compare glycomics data, including data cleaning and imputation
- get_pvals_motifs and make_heatmap no longer have the lib keyword argument, as annotate_dataset will generate a suitable lib internally
- Fixed relative abundance summation in motif-mode for make_heatmap
- Added the clean_up_heatmap helper function to remove redundant (i.e., identical) rows in heatmaps, with a prioritization of named motifs and longer motifs containing redundant shorter motifs
- Added make_volcano, to generate a volcano plot from internally calculated differential expression using the get_differential_expression function
- Moved cohen_d into motif.processing
ml
model_training
- train_ml_model no longer has the lib keyword argument, as annotate_dataset will generate a suitable lib internally
network
biosynthesis
- Refactored construct_network pipeline to be faster and more memory-efficient
- reducing_end has been deprecated and is being handled internally
- Added infer_roots to auto-infer permitted_roots (also does not need to be specified any longer in construct_network)
- Implemented distance limit, to prevent combinatorial explosion when outlier glycans are present
- Deprecated subgraph_to_string and make_network_from_edges
- Deprecated fill_with_virtuals and make_network_directed
- Minor speed-up of process_ptm, by pre-calculating stem_lib once instead of for every glycan in network