multimodal-alignment

Here are 5 public repositories matching this topic...

[Reproduce] Code for the ACL 2019 paper "Multimodal Transformer for Unaligned Multimodal Language Sequences".

A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.

Multimodal alignment of images and point clouds on the ModelNet40-C dataset.

Using a 3D Nearby Self-Attention Transformer to leverage the spatiotemporal nature of video for representation learning.

Official implementation of "Diffusion-Inspired Truncated Sampler for Text-Video Retrieval" (NeurIPS 2024).
