
Use of Conditional (possibly latent) variables (z) in predictor from original JEPA paper. #46

Open
Sharpz7 opened this issue Mar 22, 2024 · 1 comment


Sharpz7 commented Mar 22, 2024

Hey Folks,

I have been looking through V-JEPA and its predecessors, and I am trying to work out whether V-JEPA makes use of the conditional variables in the predictor, as I am struggling to tell from the code myself (I am relatively new to ML). There are only limited mentions of it in the I-JEPA and V-JEPA papers, so I was wondering if it is something left for future research.

Thanks,
Adam


ccaven commented Mar 27, 2024

Hi @Sharpz7. It is not obvious in the paper, but JEPA does use a conditional variable in the predictor. Given some encoding, ask yourself: how do I know which targets to predict? The answer is the positional encoding applied to the mask tokens before they are passed through the predictor.

Lines 213–217 in `jepa/src/models/predictor.py`:

```python
# These are all of our positional encodings
pos_embs = self.predictor_pos_embed.repeat(B, 1, 1)
# Select the encodings corresponding to the targets we want to predict
pos_embs = apply_masks(pos_embs, masks_tgt)
# Repeat for each context mask, interleaved over the batch dimension
pos_embs = repeat_interleave_batch(pos_embs, B, repeat=len(masks_ctxt))
# Add to mask tokens before they get passed through the predictor
pred_tokens += pos_embs
```
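
If it helps, here is a minimal, self-contained sketch (not the repo's code) of that conditioning mechanism. The `apply_masks` helper below is a simplified stand-in for the repo's version, and all shapes are illustrative assumptions: every target prediction starts from the same mask token, so the added positional embedding is the only thing telling the predictor *which* target it is predicting.

```python
import torch

B, N, D = 2, 16, 8                       # batch size, num patches, embed dim (assumed)
pos_embed = torch.randn(1, N, D)         # positional encodings, one per patch
mask_token = torch.zeros(1, 1, D)        # shared mask token (learnable in the real model)

def apply_masks(x, masks):
    # For each mask of patch indices, gather those rows of x,
    # then stack the gathered results along the batch dimension.
    return torch.cat(
        [torch.gather(x, 1, m.unsqueeze(-1).expand(-1, -1, x.size(-1))) for m in masks],
        dim=0,
    )

# Two target masks, each selecting 4 patch positions per batch element.
masks_tgt = [torch.randint(0, N, (B, 4)) for _ in range(2)]

pos_embs = pos_embed.repeat(B, 1, 1)             # (B, N, D)
pos_embs = apply_masks(pos_embs, masks_tgt)      # (B * num_masks, 4, D)

# Identical mask tokens; only the added positional embedding differs,
# so position acts as the conditioning variable z.
pred_tokens = mask_token.repeat(pos_embs.size(0), pos_embs.size(1), 1)
pred_tokens = pred_tokens + pos_embs
print(pred_tokens.shape)                         # torch.Size([4, 4, 8])
```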
