An introduction to attention mechanisms and the vision transformer
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
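To make the RNN-vs-parallel idea concrete, here is a minimal sketch of a linear-attention recurrence in PyTorch. It is a toy illustration of how an attention-like update can run as a constant-cost-per-token recurrence, not RWKV's actual time-mixing formulation; all names, shapes, and the exp feature map are assumptions made for the example.

```python
import torch

def linear_attention_recurrent(q, k, v):
    """Toy linear-attention recurrence (a simplified stand-in for the idea
    behind RNN-style transformers such as RWKV; NOT the RWKV formula).

    q, k, v: tensors of shape (seq_len, dim). The running state S accumulates
    key-value outer products, so each step costs O(dim^2) regardless of
    sequence length.
    """
    seq_len, dim = q.shape
    S = torch.zeros(dim, dim)            # running key-value state
    z = torch.zeros(dim)                 # running normalizer
    outputs = []
    for t in range(seq_len):
        k_t = torch.exp(k[t])            # positive feature map (exp, for simplicity)
        S = S + torch.outer(k_t, v[t])   # accumulate key-value outer product
        z = z + k_t
        q_t = torch.exp(q[t])
        outputs.append((q_t @ S) / (q_t @ z + 1e-8))
    return torch.stack(outputs)

# toy usage
q, k, v = (torch.randn(10, 16) for _ in range(3))
print(linear_attention_recurrent(q, k, v).shape)  # torch.Size([10, 16])
```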
Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Implementation of the GPT decoder block in PyTorch finetuned on Shakespeare's works 🪶
Stock Price Prediction using an Attention-based LSTM
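The typical pattern behind such models is to score the LSTM's hidden states with a small attention layer and pool them into a context vector before the prediction head. A minimal sketch, assuming a generic architecture rather than this repository's exact one:

```python
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    """Minimal attention-based LSTM regressor (generic pattern, not the
    linked repo's architecture). Attention weights each hidden state before
    a linear head predicts the next value."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)     # one score per time step
        self.head = nn.Linear(hidden, 1)     # predicts a single value

    def forward(self, x):                    # x: (batch, seq_len, n_features)
        h, _ = self.lstm(x)                  # h: (batch, seq_len, hidden)
        weights = torch.softmax(self.attn(h), dim=1)
        context = (weights * h).sum(dim=1)   # weighted sum over time
        return self.head(context)            # (batch, 1)

# toy usage: 32 windows of 30 days with 5 features each
model = AttentionLSTM(n_features=5)
print(model(torch.randn(32, 30, 5)).shape)   # torch.Size([32, 1])
```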
Pytorch MIL pipeline for breast ultrasound cancer research
A concise but complete full-attention transformer with a set of promising experimental features from various papers
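The core operation every repository on this page builds on is scaled dot-product attention. A minimal reference sketch (not any specific repository's implementation):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Plain scaled dot-product attention (Vaswani et al., 2017).
    q, k, v: (batch, heads, seq_len, head_dim)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# toy usage
q = k = v = torch.randn(2, 4, 10, 32)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 4, 10, 32])
```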
Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones
Implementation of ChatGPT, but tailored towards primary care medicine, with the reward being able to collect patient histories in a thorough and efficient manner and come up with a reasonable differential diagnosis
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
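As a quick orientation, a Vision Transformer splits the image into patches, embeds each patch as a token, prepends a class token, and runs a standard transformer encoder over the sequence. A minimal sketch using PyTorch's built-in encoder, with invented hyperparameters rather than the repository's:

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Minimal Vision Transformer sketch (patch embedding + class token +
    transformer encoder); illustrative only, details differ from the repo
    above and from the original ViT paper."""
    def __init__(self, image_size=32, patch_size=8, dim=64, depth=2, heads=4, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # turn each patch into a dim-sized token with a strided convolution
        self.to_patches = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                                    # x: (batch, 3, H, W)
        tokens = self.to_patches(x).flatten(2).transpose(1, 2)  # (batch, patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])                      # classify from the class token

# toy usage
print(TinyViT()(torch.randn(2, 3, 32, 32)).shape)            # torch.Size([2, 10])
```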
An awesome repository and a comprehensive survey on the interpretability of LLM attention heads.
Homemade GPT (not that good).
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing [NeurIPS 2024]
Asymmetric Multi-Task Attention Network for Prostate Bed Segmentation in CT Images
The Enterprise-Grade, Production-Ready Multi-Agent Orchestration Framework. Join our community: https://discord.com/servers/agora-999382051935506503
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
Diffusion attentive attribution maps for interpreting Stable Diffusion for image-to-image attention.
Pytorch Implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers"
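Sparse attention restricts each query to a structured subset of keys, for example a local window plus a strided pattern, so cost grows sub-quadratically with sequence length. The sketch below only illustrates the masking pattern from Child et al. (2019) in a dense, hedged form; the repository above (and real implementations) skip the masked computation entirely rather than masking it out:

```python
import math
import torch

def local_strided_mask(seq_len: int, window: int, stride: int):
    """Boolean attention mask combining a causal constraint, a local window,
    and a strided pattern; a simplified sketch of the Sparse Transformers
    idea, not the linked repo's implementation."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i
    local = (i - j) < window                 # attend to recent positions
    strided = (j % stride) == 0              # attend to every stride-th position
    return causal & (local | strided)

def sparse_attention(q, k, v, window=8, stride=4):
    """Dense attention with the sparse mask applied (for illustration only)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    mask = local_strided_mask(q.size(-2), window, stride)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# toy usage
q = k = v = torch.randn(1, 2, 16, 32)        # (batch, heads, seq, head_dim)
print(sparse_attention(q, k, v).shape)       # torch.Size([1, 2, 16, 32])
```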
Engineer-To-Order (ETO) Graph Neural Scheduling (GNS) Project
Fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.