feat: continuous, embedded covariates re-injected during training #3032

ori-kron-wis · 2024-10-27T13:30:49Z

Added other covariate types (continuous, embedded) so that they can be re-injected into the hidden layers, and not just passed to the input layer.

ori-kron-wis · 2024-10-27T13:31:59Z

Hrovatin · 2024-10-27T20:14:59Z

src/scvi/nn/_base_components.py

-    """A helper class to build fully-connected layers for a neural network.
+    """FCLayers class of scvi-tools adapted to also inject continous covariates.
+
+    The only adaptation is addition of `n_cont` parameter in init and `cont` in forward,


Please remove this docstring part - I only added it as info for implementing the change.

Hrovatin · 2024-10-27T20:17:59Z

src/scvi/nn/_base_components.py

@@ -135,13 +145,18 @@ def _hook_fn_zero_out(grad):
                    b = layer.bias.register_hook(_hook_fn_zero_out)
                    self.hooks.append(b)

-    def forward(self, x: torch.Tensor, *cat_list: int):
+    def forward(
+        self, x: torch.Tensor, cat_list: list | None = None, cont: torch.Tensor | None = None


I changed the forward parametrization from (x, *cat_list) to (x, cat_list, cont), so this will break other Modules that call FCLayers with cat_list passed in expanded * format. Either the parametrization here needs to be fixed, or the use of FCLayers elsewhere needs to be updated.
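To illustrate the breakage described above, here is a minimal sketch using plain Python functions as stand-ins for the two `forward` signatures. The names `forward_old` and `forward_new` and the toy inputs are hypothetical and not taken from the PR; only the parameter shapes follow the diff.

```python
def forward_old(x, *cat_list):
    """Original style: categorical covariates arrive as expanded positional args."""
    return {"x": x, "cat_list": list(cat_list), "cont": None}


def forward_new(x, cat_list=None, cont=None):
    """Changed style: one list of categorical covariates plus optional continuous ones."""
    return {"x": x, "cat_list": list(cat_list or []), "cont": cont}


# Hypothetical stand-ins for per-cell category indices.
batch_index, cell_type = [0, 1], [2, 0]

# An existing call site that expands the categorical covariates positionally.
old_call = forward_old("x", batch_index, cell_type)

# The same call against the new signature binds silently to the wrong parameters:
# `batch_index` becomes `cat_list` and `cell_type` becomes `cont`.
misbound = forward_new("x", batch_index, cell_type)

# A third categorical covariate no longer fits at all and raises TypeError.
try:
    forward_new("x", batch_index, cell_type, [1, 1])
    too_many_args_fails = False
except TypeError:
    too_many_args_fails = True

# Call sites would have to pass the list explicitly instead.
fixed_call = forward_new("x", cat_list=[batch_index, cell_type])
```

The silent mis-binding with exactly two covariates is arguably the more dangerous case, since nothing fails at call time.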

See my comment in #3021

Thanks. I thought I would be able to finish this in one shot for all models, but it's a bit more complex than I expected. So I'll be focusing only on scvi and scanvi for now.

codecov · 2024-10-29T11:13:19Z

Codecov Report

Attention: Patch coverage is 97.82609% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.35%. Comparing base (795297e) to head (77df3e2).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/scvi/nn/_base_components.py | 95.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3032      +/-   ##
==========================================
- Coverage   84.80%   84.35%   -0.46%     
==========================================
  Files         173      173              
  Lines       14797    14806       +9     
==========================================
- Hits        12549    12489      -60     
- Misses       2248     2317      +69

| Files with missing lines | Coverage Δ |
|---|---|
| src/scvi/external/contrastivevi/_module.py | `98.61% <100.00%> (ø)` |
| src/scvi/external/methylvi/_base_components.py | `100.00% <100.00%> (ø)` |
| src/scvi/external/methylvi/_module.py | `81.08% <100.00%> (ø)` |
| src/scvi/module/_mrdeconv.py | `95.13% <100.00%> (ø)` |
| src/scvi/module/_multivae.py | `82.22% <100.00%> (ø)` |
| src/scvi/module/_peakvae.py | `96.22% <100.00%> (+4.71%)` ⬆️ |
| src/scvi/module/_scanvae.py | `86.23% <100.00%> (+0.84%)` ⬆️ |
| src/scvi/module/_totalvae.py | `87.54% <100.00%> (ø)` |
| src/scvi/module/_vae.py | `94.50% <100.00%> (+0.02%)` ⬆️ |
| src/scvi/module/_vaec.py | `85.05% <100.00%> (-0.34%)` ⬇️ |

... and 1 more

... and 5 files with indirect coverage changes

ori-kron-wis · 2024-10-30T13:36:19Z

This kind of change requires touching the module for each model; I don't think there is any way around it. It's risky, which is why I implemented it only for scvi & scanvi at the moment (others may also work, but I didn't test them thoroughly). The rest of the changed modules mainly have changes in function headers and placeholders so that they can cope with the major changes in base_components (i.e. cont_cov is usually None and can't be injected into their hidden layers).

To summarise, we have 2 parameters to think about:

deeply_inject (bool): covariates (continuous or categorical, whether encoded or not) are always injected into the first layer. If this parameter is True, they are also injected into the hidden layers of the encoder/decoder.
encode_covariates (bool): this is only relevant for the categorical covariates. If this parameter is False, they are injected as one-hot vectors; if True, they are encoded as an embedding matrix (this depends on other meta-parameters, such as the size of the embedding vector and batch_representation, which must also be set to "embedded" if we want the batch embedded; otherwise it remains one-hot).
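As a rough illustration of how these two parameters interact, the sketch below computes the input width of each fully-connected layer. It follows the description in this comment rather than the library's code; the helper names `categorical_width` and `layer_input_widths` and all sizes are hypothetical.

```python
def categorical_width(n_categories, encode_covariates, embedding_dim):
    """Width contributed by one categorical covariate.

    Assumption (from the comment above): one-hot when not encoded,
    a fixed-size embedding vector when encoded.
    """
    return embedding_dim if encode_covariates else n_categories


def layer_input_widths(n_in, n_hidden, n_layers, n_cov, deeply_inject):
    """Input width of every layer, given the total covariate width `n_cov`."""
    widths = []
    for i in range(n_layers):
        base = n_in if i == 0 else n_hidden
        # Covariates always reach the first layer; deeper layers only if requested.
        inject_here = i == 0 or deeply_inject
        widths.append(base + (n_cov if inject_here else 0))
    return widths


# Toy setting: 100 genes, 128 hidden units, 3 layers,
# one batch covariate with 4 categories, 2 continuous covariates.
n_cont = 2
one_hot_cov = categorical_width(4, encode_covariates=False, embedding_dim=5) + n_cont
embedded_cov = categorical_width(4, encode_covariates=True, embedding_dim=5) + n_cont

first_only = layer_input_widths(100, 128, 3, one_hot_cov, deeply_inject=False)
all_layers = layer_input_widths(100, 128, 3, one_hot_cov, deeply_inject=True)
all_layers_embedded = layer_input_widths(100, 128, 3, embedded_cov, deeply_inject=True)
```

With one-hot covariates the widths are `[106, 128, 128]` without deep injection and `[106, 134, 134]` with it; with a 5-dimensional embedding they become `[107, 135, 135]`.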

@Hrovatin please try to use this branch

Hrovatin · 2024-11-01T15:29:37Z

src/scvi/module/_vae.py

@@ -372,17 +382,17 @@ def _regular_inference(
        if self.batch_representation == "embedding" and self.encode_covariates:
            batch_rep = self.compute_embedding(REGISTRY_KEYS.BATCH_KEY, batch_index)
            encoder_input = torch.cat([encoder_input, batch_rep], dim=-1)
-            qz, z = self.z_encoder(encoder_input, *categorical_input)
+            qz, z = self.z_encoder(encoder_input, cont_covs, *categorical_input)


I think batch_rep should also be concatenated to cont_covs here, so that it is treated as a continuous covariate, no?

I would prefer to keep them all separate, so that you could do deep injection of either the batch embedding or the covariates.

Hrovatin · 2024-11-01T15:30:14Z

src/scvi/module/_vae.py

@@ -360,7 +370,7 @@ def _regular_inference(
        if self.log_variational:
            x_ = torch.log1p(x_)

-        if cont_covs is not None and self.encode_covariates:
+        if cont_covs is not None:
            encoder_input = torch.cat((x_, cont_covs), dim=-1)


Why do you always concatenate continuous covariates to the expression instead of handling them separately as covariates (the same as for one-hot)?

Hrovatin · 2024-11-01T15:31:28Z

src/scvi/nn/_base_components.py

@@ -45,13 +47,26 @@ class FCLayers(nn.Module):
        Whether to inject covariates in each layer, or just the first (default).
    activation_fn
        Which activation function to use
+    encode_covariates


This is not used anywhere?

encode_covariates and batch_representation are used in Module (e.g. vae) for handling covariates, not here.

Hrovatin · 2024-11-01T15:36:10Z

src/scvi/nn/_base_components.py

        layers_dim = [n_in] + (n_layers - 1) * [n_hidden] + [n_out]

-        if n_cat_list is not None:
-            # n_cat = 1 will be ignored
+        if n_cat_list is not None and self.batch_representation == "one-hot":


I do not think you should check batch_representation here. If n_cat_list is not empty, the covariates need to be accounted for here irrespective of batch_representation, because batch_representation is used in Module to determine whether the batch is added to the cat list. The input cat cov list will therefore already contain the batch if it is not embedded, alongside the other categorical covariates.

Hrovatin · 2024-11-01T15:38:07Z

src/scvi/nn/_base_components.py

+    encode_covariates
+        If ``True``, covariates are concatenated to gene expression prior to passing through
+        the encoder(s). Else, only gene expression is used.
+    batch_representation


This should be also removed and only kept in Module

Hrovatin · 2024-11-01T15:39:52Z

src/scvi/nn/_base_components.py

+                            if i > 0 and self.inject_covariates and cont_covs is not None:
+                                # Need to inject the continous covariates to hidden layers
+                                x = torch.cat((x, cont_covs), dim=-1)
+                            if self.batch_representation == "one-hot":


This should be done regardless of the batch representation, for all covariates that were passed as categorical.

Hrovatin · 2024-11-01T15:45:50Z

@ori-kron-wis I think some things need to be fixed, as per my review comments. To summarise, I think the behaviour should be as follows:

Module passes to the layers the expression, the continuous covariates (can be None), and the categorical covariates that need to be one-hot encoded (can be an empty list).
Module is the one that decides whether the batch is embedded and concatenated to the continuous covariates or added to the categorical list.
Layers should not check how the batch is encoded, as their init params already give the number of continuous/categorical covariates as determined by the Module.
The layers then inject the covariates (continuous and one-hot encoded categorical) either into all layers or only into the first one, so Module should not concatenate continuous covariates to the expression in advance.
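A minimal sketch of this division of responsibilities, written for a single cell with plain Python lists instead of tensors. The function names `module_prepare` and `layers_forward`, the toy embedding table, and the zero-vector placeholder for a layer's output are all hypothetical; this is not the code in the PR.

```python
def one_hot(index, n_cat):
    """One-hot encode a single category index as a list of floats."""
    return [1.0 if i == index else 0.0 for i in range(n_cat)]


def module_prepare(batch_index, cat_covs, cont_covs, batch_representation, batch_embedding):
    """Module side: route the batch to the continuous or the categorical path."""
    cont = list(cont_covs) if cont_covs is not None else []
    cats = list(cat_covs)
    if batch_representation == "embedding":
        cont += batch_embedding[batch_index]  # embedded batch joins the continuous covariates
    else:
        cats.insert(0, batch_index)  # otherwise it joins the one-hot list
    return (cont if cont else None), cats


def layers_forward(x, cat_list, cont, n_cat_list, n_hidden, n_layers, inject_covariates):
    """Layers side: inject whatever was passed, without checking batch_representation."""
    covs = list(cont) if cont is not None else []
    for index, n_cat in zip(cat_list, n_cat_list):
        covs += one_hot(index, n_cat)
    inputs = []
    h = list(x)
    for i in range(n_layers):
        if i == 0 or inject_covariates:
            h = h + covs  # expression (or hidden state) followed by all covariates
        inputs.append(h)
        h = [0.0] * n_hidden  # placeholder for the output of a real linear layer
    return inputs


batch_embedding = {0: [0.1, 0.2], 1: [0.3, 0.4]}  # toy 2-dimensional batch embedding
x = [1.0, 2.0, 3.0]  # toy expression vector

# Embedded batch: it travels with the continuous covariates.
cont_e, cats_e = module_prepare(1, [2], [0.5], "embedding", batch_embedding)
inputs_e = layers_forward(x, cats_e, cont_e, [3], n_hidden=4, n_layers=2, inject_covariates=True)

# One-hot batch: it travels with the categorical covariates.
cont_o, cats_o = module_prepare(1, [2], [0.5], "one-hot", batch_embedding)
inputs_o = layers_forward(x, cats_o, cont_o, [2, 3], n_hidden=4, n_layers=2, inject_covariates=False)
```

In both cases the layers only see counts and values handed over by the Module, which is the point of the proposal: the decision about how the batch is represented lives in one place.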

@canergen can you confirm?

canergen · 2024-11-12T19:38:07Z

src/scvi/external/methylvi/_module.py

@@ -163,7 +163,7 @@ def inference(self, mc, cov, batch_index, cat_covs=None, n_samples=1):
        else:
            categorical_input = ()

-        qz, z = self.z_encoder(methylation_input, batch_index, *categorical_input)
+        qz, z = self.z_encoder(methylation_input, None, batch_index, *categorical_input)


Can you add the None as an argument in line 148? It makes it clearer where it comes from.

This applies throughout this PR.

canergen · 2024-11-12T19:38:44Z

src/scvi/module/_multivae.py

@@ -51,9 +53,9 @@ def __init__(
        )
        self.output = torch.nn.Sequential(torch.nn.Linear(n_hidden, 1), torch.nn.LeakyReLU())

-    def forward(self, x: torch.Tensor, *cat_list: int):
+    def forward(self, x: torch.Tensor, cont_covs: torch.Tensor | None = None, *cat_list: int):


Here it's correct.

canergen · 2024-11-12T19:41:46Z

src/scvi/module/_multivae.py

@@ -581,7 +604,7 @@ def inference(
        mask_acc = x_chr.sum(dim=1) > 0
        mask_pro = y.sum(dim=1) > 0

-        if cont_covs is not None and self.encode_covariates:
+        if cont_covs is not None:


We shouldn't remove the self.encode_covariates check.

canergen · 2024-11-12T19:42:14Z

src/scvi/module/_multivae.py

@@ -597,21 +620,21 @@ def inference(

        # Z Encoders
        qzm_acc, qzv_acc, z_acc = self.z_encoder_accessibility(
-            encoder_input_accessibility, batch_index, *categorical_input
+            encoder_input_accessibility, None, batch_index, *categorical_input


Why is this hard-coded?

canergen · 2024-11-12T19:43:35Z

src/scvi/module/_peakvae.py

@@ -185,6 +189,7 @@ def __init__(
            n_output=self.n_latent,
            n_hidden=self.n_hidden,
            n_cat_list=encoder_cat_list,
+            n_cont=n_continuous_cov,


Only if encode_covariates is set.

canergen · 2024-11-12T19:44:59Z

src/scvi/module/_scanvae.py

+            cont_covs: torch.Tensor = tensors[REGISTRY_KEYS.CONT_COVS_KEY]
+            cont_covs = broadcast_labels(cont_covs, n_broadcast=self.n_labels)[1]
+        else:
+            cont_covs = None


please explain.

canergen · 2024-11-12T19:45:27Z

src/scvi/module/_scanvae.py

-        qz2, z2 = self.encoder_z2_z1(z1s, ys)
-        pz1_m, pz1_v = self.decoder_z1_z2(z2, ys)
+        qz2, z2 = self.encoder_z2_z1(
+            torch.cat((z1s, cont_covs), dim=-1) if cont_covs is not None else z1s, cont_covs, ys


No, this is independent of cont_covs.

canergen · 2024-11-12T19:45:40Z

src/scvi/module/_scanvae.py

+            torch.cat((z1s, cont_covs), dim=-1) if cont_covs is not None else z1s, cont_covs, ys
+        )
+        pz1_m, pz1_v = self.decoder_z1_z2(
+            torch.cat((z2, cont_covs), dim=-1) if cont_covs is not None else z2, cont_covs, ys


This too is independent of cont_covs.

Added other covariates types (continuous, embedded) to be able to rei…

77c5c5c

…njected, and not just to the input layer

ori-kron-wis requested a review from Hrovatin October 27, 2024 13:30

ori-kron-wis self-assigned this Oct 27, 2024

ori-kron-wis added this to the scvi-tools 1.2 milestone Oct 27, 2024

ori-kron-wis changed the title ~~Added other covariates types (continuous, embedded) to be able to rei…~~ feat: continuous, embedded covariates re-injected during training Oct 27, 2024

ori-kron-wis requested a review from canergen October 27, 2024 13:32

remove n_cont=self.n_continuous_cov for several models

88e93d5

Hrovatin suggested changes Oct 27, 2024

ori-kron-wis added 2 commits October 28, 2024 18:07

revert some models and added few fixes for scvi & scanvi only current…

41ab94c

…ly (not done yet)

revert some models and added few fixes for scvi & scanvi only current…

d30de72

…ly (not done yet)

ori-kron-wis added 2 commits October 30, 2024 15:06

Revert previous changes and added the bug fix for scanvi, scvi, peakv…

2045dff

…i, other modules work but w/o the option for continous covariates injection to deep layers, which might be a bit more complex to implement

small fix

77df3e2

ori-kron-wis marked this pull request as ready for review October 30, 2024 13:35

Hrovatin suggested changes Nov 1, 2024

canergen reviewed Nov 12, 2024
