Large scale training #17

iejMac · 2023-11-22T05:16:02Z

Hey, wanted to start this comm channel as I'm looking to do a large scale training run using some of this code. I'm happy to share graphs/samples as I go along and wanted to ask a few things to start off:

Is the implementation of the original paper functionality complete?
What is the best configuration you have found? (I'm seeing talks about LFQ vs FSQ and I see code for diff transformers etc.)

As always, thanks for this! Very helpful

lucidrains · 2023-11-22T15:10:49Z

@iejMac oh hey! yea, should be complete, probably one or two more bugs left to iron out

i would stick with LFQ initially, as that was what the magvit2 paper proposed, although some people have reported better results with FSQ. i put it into one repository so we can test them against each other and find out

jpfeil · 2023-11-22T18:43:18Z

Hi @iejMac I'd like to follow along if that's okay. It would be great if you could share any changes you make to the codebase to allow for larger scale training. I'm happy to share any weights I generate to help people get started with pretrained models.

iejMac · 2023-11-26T08:38:52Z

@jpfeil will do!

Ok I think I'm mostly set up (had to port this code to a repo with a different style). My first question is - do we have some prepared configs (like what layers, how many frames, what fps etc.) which roughly correspond to some models they trained in the paper? Just so we can compare.

For reference, currently I'm using the equivalent of this:

from magvit2_pytorch import VideoTokenizer

tokenizer = VideoTokenizer(
    image_size = 128,
    init_dim = 64,
    max_dim = 512,
    layers = (
        'residual',
        'compress_space',
        ('consecutive_residual', 2),
        'compress_space',
        ('consecutive_residual', 2),
        'linear_attend_space',
        'compress_space',
        ('consecutive_residual', 2),
        'attend_space',
        'compress_time',
        ('consecutive_residual', 2),
        'compress_time',
        ('consecutive_residual', 2),
        'attend_time',
    )
)
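For comparing a config like this against the paper, it helps to know how much it compresses. A rough sketch, assuming each 'compress_space' layer halves height and width and each 'compress_time' layer halves the frame axis (an assumption about the layer semantics, not something confirmed in this thread; any special handling of the first frame is ignored). The helper names are made up for illustration:

```python
from collections import Counter

# Layer spec copied from the config above.
LAYERS = (
    'residual',
    'compress_space',
    ('consecutive_residual', 2),
    'compress_space',
    ('consecutive_residual', 2),
    'linear_attend_space',
    'compress_space',
    ('consecutive_residual', 2),
    'attend_space',
    'compress_time',
    ('consecutive_residual', 2),
    'compress_time',
    ('consecutive_residual', 2),
    'attend_time',
)


def layer_name(layer):
    """Return the layer type, whether the entry is 'name' or ('name', arg)."""
    return layer if isinstance(layer, str) else layer[0]


def compression_factors(layers):
    """Return (spatial_factor, temporal_factor) for a layer spec.

    Assumption: every compress_* layer downsamples its axis by exactly 2.
    """
    counts = Counter(layer_name(layer) for layer in layers)
    return 2 ** counts['compress_space'], 2 ** counts['compress_time']


space_factor, time_factor = compression_factors(LAYERS)
print(space_factor, time_factor)   # 8 4
print(128 // space_factor)         # 16 (latent height/width for image_size = 128)
```

Under that assumption, this config maps a 128x128 frame to a 16x16 latent grid and compresses time by 4x.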

iejMac · 2023-11-26T08:41:32Z

Oh I also noticed one thing - is there a reason we don't normalize the pixels before passing them into the model? Or did I just not catch where that's done?

lucidrains · 2023-11-26T17:43:20Z

@iejMac oh hey, what is the typical normalization for video? i think .ToTensor() here should bring it to [0, 1]?

lucidrains · 2023-11-26T18:19:19Z

@iejMac are you using the LFQ from this repo? the main claim of this paper is that this new quantization method helps them scale to more codes and better generation scores. if i had to sum up the paper, it would be, use independent binary latents + mostly convolutions
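To make "independent binary latents" concrete, here is a minimal sketch of the lookup-free quantization step for a single latent vector: each dimension is quantized on its own to -1 or +1 by its sign, and the token index is just those bits read as a binary number. This is only the forward quantization; the training pieces (straight-through gradients, entropy and commitment losses) are left out, and lfq_quantize is a hypothetical name, not the repo's LFQ class.

```python
def lfq_quantize(latent):
    """Quantize one latent vector with lookup-free quantization.

    Each dimension becomes +1.0 if positive, else -1.0. The token index
    packs the sign bits as a binary number (dimension i contributes 2**i).
    """
    bits = [1 if z > 0 else 0 for z in latent]
    codes = [1.0 if b else -1.0 for b in bits]
    index = sum(b << i for i, b in enumerate(bits))
    return codes, index


codes, index = lfq_quantize([0.3, -1.2, 0.7, -0.1])
print(codes)  # [1.0, -1.0, 1.0, -1.0]
print(index)  # 5

# A d-dimensional latent gives an implicit codebook of 2**d entries,
# with no embedding table to look up.
print(2 ** 4)  # 16
```

Because there is no codebook to search, growing the vocabulary only means adding latent dimensions.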

iejMac · 2023-11-26T23:38:07Z

@lucidrains ah yeah, ToTensor does, but your VideoDataset doesn't do that, and that's what I was using to test (was getting loss ~O(1e5)).

magvit2-pytorch/magvit2_pytorch/data.py, line 159 in b2f105b: def video_to_tensor(
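For anyone hitting the same huge loss values, the missing step is just scaling raw uint8 frames into [0, 1]. A minimal sketch using NumPy, assuming frames arrive as uint8 arrays in [0, 255]; normalize_frames is a hypothetical helper, not the repo's video_to_tensor:

```python
import numpy as np


def normalize_frames(frames):
    """Scale uint8 video frames from [0, 255] to float32 values in [0, 1]."""
    frames = np.asarray(frames)
    if frames.dtype != np.uint8:
        # Refuse anything else so already-normalized data is not scaled twice.
        raise TypeError(f"expected uint8 frames, got {frames.dtype}")
    return frames.astype(np.float32) / 255.0


# Example: 8 random frames of 128x128 RGB, laid out as (frames, height, width, channels).
video = np.random.randint(0, 256, size=(8, 128, 128, 3), dtype=np.uint8)
normalized = normalize_frames(video)
print(normalized.dtype, normalized.shape)
```

Feeding unnormalized [0, 255] pixels inflates a squared-error reconstruction loss by roughly 255^2, which is consistent with a loss on the order of 1e5.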

iejMac · 2023-11-26T23:46:09Z

Yes I'm using LFQ from that. The main question I have about config is like can we figure out a parametrization of VideoTokenizer (given all params you added) that corresponds to like MAGVIT2-small so we can do some nice test runs.

Let's start out with 8-frame videos at 25 FPS. Given that, what are reasonable params for layers and the other values in order to get decent results?

With the setup I sent above the loss curve/reconstructions look like this and it usually gets a 'nan' at some point (that's where it ended):

lucidrains · 2023-11-26T23:51:30Z

@iejMac shoot, i normalized for gifs, but not mp4s.. thank you Maciej!

lucidrains · 2023-11-26T23:54:36Z

@iejMac yup, i can get some of the hyperparameters in line with the paper's, probably Tuesday (currently in the middle of another project)

iejMac · 2023-11-26T23:58:53Z

cool, was just wondering if you have something on hand. I'll try to read/play around and I'll report here if I come up with something. Also for lowish-effort video dataloading video2numpy could be a good option! It's pretty fast and does all the normal preprocessing for you. Maybe I'll make a PR for that if you're interested.

lucidrains · 2023-11-27T00:00:59Z

@iejMac would greatly appreciate it! 🙏

mudtriangle · 2023-11-27T17:46:31Z

Following in this thread since it's also related to video loading:
Shouldn't there be a frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) inside the video_to_tensor function?

lucidrains · 2023-11-27T17:49:33Z

@mudtriangle there's a BGR format? 😅

mudtriangle · 2023-11-27T17:54:32Z

Yep, and I think it's still default in cv2.

lucidrains · 2023-11-27T18:12:38Z

@mudtriangle got it, put in a quick fix here (just doing it in tensor space, as i'm not familiar with cv2 enough)
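For reference, swapping channels without cv2 amounts to reversing the channel axis. A minimal sketch with NumPy, assuming channels-last frames as OpenCV returns them; this illustrates the idea and is not the actual fix committed to the repo, and bgr_to_rgb is a hypothetical name:

```python
import numpy as np


def bgr_to_rgb(frames):
    """Reverse the last (channel) axis, turning BGR frames into RGB."""
    frames = np.asarray(frames)
    if frames.shape[-1] != 3:
        raise ValueError(f"expected 3 channels in the last axis, got {frames.shape[-1]}")
    # Copy so the result is contiguous rather than a negatively strided view.
    return frames[..., ::-1].copy()


# A single pure-blue pixel in BGR order is (255, 0, 0).
blue_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
print(bgr_to_rgb(blue_bgr)[0, 0].tolist())  # [0, 0, 255]
```

For 3-channel frames this gives the same result as cv2.cvtColor(frame, cv2.COLOR_BGR2RGB); for a channels-first tensor the reversal would go on the channel dimension instead of the last one.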

Jason3900 · 2024-03-07T02:37:56Z

Hello, I'm wondering if there's been any progress on aligning the hyperparameter/architecture config with the magvit-v2 paper.
