Does the current code support generation of text + image + image? #25

Wonder1905 · 2024-11-24T12:33:50Z

Wonder1905
Nov 24, 2024

Hi, does the current code support generation of text + image + image, as seen in the paper, where they predict text tokens, then image tokens, and then text tokens again?

lucidrains · 2024-11-24T14:10:42Z

lucidrains
Nov 24, 2024
Maintainer

@Wonder1905 it can do anything, any number of modalities, any order

the sky is the limit
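
To make "any number of modalities, any order" concrete, here is a minimal, framework-free sketch of how one interleaved sample could be represented. It is an illustration only: `TextSegment`, `ModalitySegment`, `modality_order`, and `total_length` are hypothetical names, not this repository's API, and the token ids and latents are placeholder values.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextSegment:
    token_ids: List[int]          # discrete text tokens


@dataclass
class ModalitySegment:
    modality: str                 # free-form label, e.g. "image" or "audio"
    latents: List[List[float]]    # continuous latents, one vector per position


Segment = Union[TextSegment, ModalitySegment]


def modality_order(sample: List[Segment]) -> List[str]:
    """Return the kind of each segment, in the order the segments appear."""
    return [
        "text" if isinstance(seg, TextSegment) else seg.modality
        for seg in sample
    ]


def total_length(sample: List[Segment]) -> int:
    """Count sequence positions across all segments."""
    return sum(
        len(seg.token_ids) if isinstance(seg, TextSegment) else len(seg.latents)
        for seg in sample
    )


# Hypothetical sample: text, then two images back to back, then text again.
sample = [
    TextSegment([5, 17, 3]),
    ModalitySegment("image", [[0.0] * 4 for _ in range(2)]),
    ModalitySegment("image", [[0.0] * 4 for _ in range(2)]),
    TextSegment([9]),
]
order = modality_order(sample)
```

Because the sample is just an ordered list of segments, nothing in this layout restricts how many modalities appear or in which order.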

3 replies

lucidrains Nov 24, 2024
Maintainer

the only thing missing at the moment is support for a learned encoder / decoder flanking the transformer (a unet in the paper, but it can also be anything)

i'm getting that done this morning, hopefully
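
The shape of "an encoder / decoder flanking the transformer" can be sketched without any framework. This rests on loud assumptions: `encode` and `decode` below are fixed average / repeat stand-ins rather than learned modules, `identity_core` is a placeholder for the transformer, and `flank` is a hypothetical helper that does not come from the repository.

```python
from typing import Callable, List

Sequence1D = List[float]


def encode(x: Sequence1D) -> Sequence1D:
    """Stand-in encoder: halve the length by averaging adjacent pairs."""
    if len(x) % 2 != 0:
        raise ValueError("length must be even")
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def decode(z: Sequence1D) -> Sequence1D:
    """Stand-in decoder: restore the length by repeating each value twice."""
    return [v for v in z for _ in range(2)]


def flank(
    core: Callable[[Sequence1D], Sequence1D],
) -> Callable[[Sequence1D], Sequence1D]:
    """Wrap a core model so that it operates on the encoded sequence."""
    def run(x: Sequence1D) -> Sequence1D:
        return decode(core(encode(x)))
    return run


def identity_core(z: Sequence1D) -> Sequence1D:
    """Placeholder for the transformer: returns its input unchanged."""
    return z


model = flank(identity_core)
out = model([1.0, 3.0, 5.0, 7.0])
```

The point of the wrapper is only the data flow: the core sees the shorter encoded sequence, and the decoder maps its output back to the original length. In a real model both flanking pieces would be trained jointly with the core.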

lucidrains Nov 24, 2024
Maintainer

@Wonder1905 once the dust has settled, i'll put together a small demo doing simple arithmetic (1 + 3 = 4), but with numbers substituted with mnist images
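
A sample for such a demo might be laid out as in the sketch below. This is a guess at the data format, not the maintainer's demo: `digit_image` returns a blank placeholder instead of a real MNIST image, and `build_equation` is a hypothetical helper.

```python
from typing import List, Union

Image = List[List[float]]        # 28 x 28 grayscale placeholder
Item = Union[str, Image]


def digit_image(digit: int) -> Image:
    """Placeholder for an MNIST lookup: returns a blank 28 x 28 image.

    A real demo would sample an actual MNIST image labeled `digit`.
    """
    if not 0 <= digit <= 9:
        raise ValueError("expected a single digit")
    return [[0.0] * 28 for _ in range(28)]


def build_equation(a: int, b: int) -> List[Item]:
    """Lay out `a + b = (a + b)` with every number replaced by an image."""
    if a + b > 9:
        raise ValueError("sum must be a single digit")
    return [digit_image(a), "+", digit_image(b), "=", digit_image(a + b)]


# 1 + 3 = 4, with the three numbers swapped for image placeholders.
sample = build_equation(1, 3)
kinds = ["text" if isinstance(item, str) else "image" for item in sample]
```

Here the operators stay as text while the operands and the result are images, so a model trained on such samples has to generate the final image conditioned on the interleaved prefix.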

Wonder1905 Nov 24, 2024
Author

Niceeeeeeeeeeeeeeee!
