Add papers
emphasis10 committed Aug 27, 2024
1 parent f059336 commit 16d413c
Showing 8 changed files with 206 additions and 0 deletions.
7 changes: 7 additions & 0 deletions README.md
@@ -1,6 +1,11 @@
# Paper List
## 2408
#### [ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large Language Models](summaries/2408.convkgyarn.md)
#### [SWE-bench-java: A GitHub Issue Resolving Benchmark for Java](summaries/2408.14354.md)
#### [Training-free Long Video Generation with Chain of Diffusion Model Experts](summaries/2408.13423.md)
#### [TVG: A Training-free Transition Video Generation Method with Diffusion Models](summaries/2408.13413.md)
#### [LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!](summaries/2408.13402.md)
#### [Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler](summaries/2408.13359.md)
#### [MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?](summaries/2408.13257.md)
#### [LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation](summaries/2408.13252.md)
#### [CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities](summaries/2408.13239.md)
@@ -33,6 +38,7 @@
#### [LLM Pruning and Distillation in Practice: The Minitron Approach](summaries/2408.11796.md)
#### [Critique-out-Loud Reward Models](summaries/2408.11791.md)
#### [FocusLLM: Scaling LLM's Context by Parallel Decoding](summaries/2408.11745.md)
#### [Efficient Detection of Toxic Prompts in Large Language Models](summaries/2408.11727.md)
#### [FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting](summaries/2408.11706.md)
#### [The Vizier Gaussian Process Bandit Algorithm](summaries/2408.11527.md)
#### [TrackGo: A Flexible and Efficient Method for Controllable Video Generation](summaries/2408.11475.md)
@@ -51,6 +57,7 @@
#### [Quantum Artificial Intelligence: A Brief Survey](summaries/2408.10726.md)
#### [Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search](summaries/2408.10635.md)
#### [Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information](summaries/2408.10615.md)
#### [Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution](summaries/2408.10548.md)
#### [MambaEVT: Event Stream based Visual Object Tracking using State Space Model](summaries/2408.10487.md)
#### [MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model](summaries/2408.10198.md)
#### [SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views](summaries/2408.10195.md)
35 changes: 35 additions & 0 deletions summaries/2408.10548.md
@@ -0,0 +1,35 @@
# Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution
## TL;DR
## Summary
- [https://arxiv.org/pdf/2408.10548.pdf](https://arxiv.org/pdf/2408.10548.pdf)

### 1. Summary of Each Section

#### 1.1 Introduction
This paper comprehensively reviews how language modeling for tabular data has evolved. In particular, it analyzes the impact that recently emerged large language models (LLMs) have had on tabular data modeling. Earlier surveys mostly focused on either one-dimensional (1D) or two-dimensional (2D) tabular data, whereas this paper provides a systematic review of both types and covers a range of downstream tasks and datasets.

#### 1.2 Fundamentals of Tabular Data
This section explains the basic concepts of tabular data. It covers data structures (1D and 2D), data types (numeric, categorical, binary, text, timestamp, and so on), downstream tasks (table question answering, table retrieval, table semantic parsing, and so on), and datasets. The main goal is to help researchers understand the characteristics of tabular data and the methodologies used to process it.

#### 1.3 Input Processing Techniques
This section describes how input data is prepared for language models, including data retrieval, table serialization, and context integration. It analyzes the strengths and weaknesses of each approach to determine which one is efficient in which situation. For example, the paper finds that simple text templates perform well on many datasets.
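
As an illustration of template-based table serialization, here is a minimal sketch. The "column is value" template, the function names, and the sample rows are assumptions chosen for this example; the summary does not say which templates the survey evaluates.

```python
def serialize_row(row: dict) -> str:
    """Turn one table row into text using a fixed 'column is value' template."""
    parts = [f"{col} is {val}" for col, val in row.items()]
    return ". ".join(parts) + "."


def serialize_table(rows: list) -> str:
    """Serialize each row on its own line so a language model can read the table as text."""
    return "\n".join(serialize_row(r) for r in rows)


# Hypothetical sample data, used only to show the output shape.
rows = [
    {"name": "Alice", "age": 34, "city": "Seoul"},
    {"name": "Bob", "age": 41, "city": "Busan"},
]
prompt = serialize_table(rows)
print(prompt)
```

A fixed template like this keeps column names next to their values, which is the property that lets a text-only model relate cells to headers without any architectural change.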

#### 1.4 Intermediate Modules
This section covers how the intermediate modules of the Transformer architecture can be modified to adapt to the tabular domain, namely positional encoding and the attention mechanism. It presents several ways to improve model performance through these modules.
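
One common way to make positional encoding table-aware is to give every cell token a row index and a column index in addition to its sequence position. The sketch below only builds those indices; it is an assumed illustration of the idea, and the summary does not state which specific schemes the survey discusses.

```python
def table_position_ids(header, rows):
    """Flatten a table into cell tokens, each tagged with a (row_id, col_id) pair.

    Row 0 is reserved for the header; data rows start at 1.
    """
    tokens, row_ids, col_ids = [], [], []
    for c, name in enumerate(header):
        tokens.append(str(name))
        row_ids.append(0)
        col_ids.append(c)
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row):
            tokens.append(str(cell))
            row_ids.append(r)
            col_ids.append(c)
    return tokens, row_ids, col_ids


tokens, row_ids, col_ids = table_position_ids(["a", "b"], [[1, 2], [3, 4]])
```

In a real model, each index list would select a learned embedding that is added to the token embedding, so two cells in the same column share a signal even when they are far apart in the flattened sequence.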

#### 1.5 Language Modeling Techniques
This section covers language modeling techniques for tabular data, including methods built on pre-trained language models (PLMs) and large language models (LLMs). For example, models such as GPT-3 can perform complex tasks with minimal additional training data. These advances substantially improve the efficiency of tabular data modeling.

#### 1.6 Downstream Tasks
This section covers downstream tasks such as table question answering (TQA), table retrieval (TR), table semantic parsing (TSP), and table metadata prediction (TMP). For each task, it describes and compares the models and techniques currently in use. For example, TaBERT is reported to be highly effective for table question answering.

#### 1.7 Conclusion and Future Research Directions
The conclusion summarizes the paper's main findings and outlines directions for future research. It discusses both the potential of LLM-based tabular data modeling and the problems that remain, emphasizing open challenges such as computational efficiency, interpretability, bias, and data types.

---

### 2. Overall Summary

This paper comprehensively describes how language models process tabular data and how those methods have evolved. Early work focused on 1D or 2D data separately, but with the arrival of large language models (LLMs), research is moving toward handling multi-dimensional tabular data efficiently. The paper covers the basic concepts of tabular data, input processing techniques, intermediate modules, and language modeling techniques, and compares the strengths and weaknesses of each. In particular, recent LLMs such as GPT-3 can perform complex tasks with minimal additional training data, which greatly improves the efficiency of tabular data processing. The paper stresses that future research should address computational efficiency, interpretability, and bias.

Together, these points give a clear picture of the important progress in tabular data modeling and of where the field is heading.
5 changes: 5 additions & 0 deletions summaries/2408.11727.md
@@ -0,0 +1,5 @@
# Efficient Detection of Toxic Prompts in Large Language Models
## TL;DR
## Summary
- [https://arxiv.org/pdf/2408.11727.pdf](https://arxiv.org/pdf/2408.11727.pdf)

33 changes: 33 additions & 0 deletions summaries/2408.13359.md
@@ -0,0 +1,33 @@
# Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
## TL;DR
## Summary
- [https://arxiv.org/pdf/2408.13359.pdf](https://arxiv.org/pdf/2408.13359.pdf)

### Paper Summary

#### 1. Introduction
- **Summary**: The paper explains why finding the optimal learning rate for pre-training large language models (LLMs) is difficult. The widely used cosine learning rate scheduler works well across many models, but it requires the number of training steps to be fixed in advance, which causes problems for intermediate checkpoints and continual training.
- **Key contribution**: It proposes the Power scheduler, a new learning rate scheduler for predicting the optimal learning rate that can be applied regardless of batch size and number of training tokens.

#### 2. Background
- **Summary**: This section describes work on transferring learning rates from small proxy models to large models using Maximum Update Parametrization (µP). It also introduces the three phases of the Warmup-Stable-Decay (WSD) scheduler: warmup, stable, and decay.
- **Key contribution**: µP is efficient for transferring learning rates between models, and the WSD scheduler enables stable training.
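
The three WSD phases can be sketched as a single function of the step index. This is a minimal sketch: the linear warmup, the linear decay shape, and all parameter names are assumptions made for illustration, since the summary only names the phases.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Warmup-Stable-Decay learning rate for a 0-indexed training step."""
    if step < warmup_steps:
        # Warmup: ramp linearly up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate.
        return max_lr
    # Decay: move linearly from max_lr to min_lr, then stay at min_lr.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr + (min_lr - max_lr) * progress


schedule = [wsd_lr(s, 1.0, 10, 100, 10) for s in range(130)]
```

Because the stable phase has no dependence on the total step count, a checkpoint taken there can be continued or decayed later, which is the property that makes WSD attractive compared with cosine.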

#### 3. Optimal Learning Rate Search
- **Summary**: The authors study how the optimal learning rate relates to batch size and number of training tokens, and find that the optimal learning rate decreases as the number of tokens increases. From this they model the relationship among learning rate, batch size, and token count.
- **Key contribution**: They show that the optimal learning rate follows a power-law relationship with batch size and token count, and demonstrate that this relationship transfers across model sizes when µP is used.
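
A power-law relationship of the form `lr = a * tokens**b` becomes a straight line in log-log space, so it can be fitted with ordinary least squares. The sketch below shows only the token-count dependence for a fixed batch size; the coefficients and data points are synthetic values invented to exercise the fit, not results from the paper.

```python
import math


def fit_power_law(tokens, lrs):
    """Fit lr = a * tokens**b by least squares on (log tokens, log lr)."""
    xs = [math.log(t) for t in tokens]
    ys = [math.log(l) for l in lrs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b


# Synthetic points generated from made-up coefficients (not from the paper).
true_a, true_b = 2.0, -0.5
tokens = [1e9, 4e9, 1.6e10, 6.4e10]
lrs = [true_a * t ** true_b for t in tokens]
a, b = fit_power_law(tokens, lrs)
```

A negative fitted exponent `b` corresponds to the finding stated above: the optimal learning rate shrinks as the training token budget grows.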

#### 4. Power Scheduler
- **Summary**: The authors propose the PowerLR scheduler, which lets the optimal learning rate transfer across settings without being affected by batch size or token count. Its advantage is that the learning rate can be set without defining the number of training steps in advance.
- **Key contribution**: They show experimentally that the Power scheduler performs better than or comparably to the existing WSD and cosine schedulers across a range of settings.
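
One plausible reading of such a scheduler is a power-law decay in the number of tokens seen so far, scaled by batch size and capped at a maximum learning rate. The functional form, the parameter names, and the coefficient values below are assumptions for illustration; the summary does not give the paper's exact formula, and warmup and final decay phases are left out.

```python
def power_lr(tokens_seen, batch_size, a, b, max_lr):
    """Learning rate as a capped power law of tokens trained so far.

    a and b are hypothetical power-law coefficients (b is expected to be negative).
    """
    if tokens_seen <= 0:
        # Before any tokens are seen the power law is undefined, so use the cap.
        return max_lr
    return min(max_lr, batch_size * a * tokens_seen ** b)


# Made-up coefficients, chosen only to show the shape of the curve.
early = power_lr(1e4, batch_size=1, a=4.0, b=-0.5, max_lr=0.02)
late = power_lr(1e8, batch_size=1, a=4.0, b=-0.5, max_lr=0.02)
```

Note that the function depends only on how many tokens have been consumed, not on a planned total, which is what removes the need to fix the training length up front.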

#### 5. Pre-Training Experiments
- **Summary**: The authors train 1B- and 3B-parameter models with different learning rate schedulers and find that the Power scheduler consistently performs better on several language modeling and downstream tasks.
- **Key contribution**: They show that the Power scheduler can reach optimal performance under varied conditions and that this performance holds for larger models as well.

#### 6. Conclusion
- **Summary**: The study systematically examines the relationship among learning rate, batch size, and token count, proposes the new Power scheduler, and confirms that it achieves the best performance across a range of settings.
- **Key contribution**: The Power scheduler maintains stable performance while providing an optimal learning rate that is independent of batch size and token count.

### Overall Summary
This paper proposes the Power scheduler, a new learning rate scheduler, to address the problem of finding the optimal learning rate for pre-training large language models. It improves on the shortcomings of the existing cosine and WSD schedulers: the number of training steps does not have to be defined in advance, and the optimal learning rate can be applied regardless of batch size and token count. Through learning rate transfer experiments with µP, the authors show that the Power scheduler keeps stable performance across model sizes. In the experiments, the Power scheduler outperformed existing schedulers on a variety of language modeling and downstream tasks, showing that it can substantially improve the training efficiency of large language models.
33 changes: 33 additions & 0 deletions summaries/2408.13402.md
@@ -0,0 +1,33 @@
# LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!
## TL;DR
## Summary
- [https://arxiv.org/pdf/2408.13402.pdf](https://arxiv.org/pdf/2408.13402.pdf)

### 1. Section Summaries

#### I. Introduction
Building on the work of LLaVa and NousResearch, this paper constructs the first ternary multimodal large language model (LLM) that can process both text and images. Its main contributions are open-sourcing the model together with its weights and training scripts, and highlighting the challenges and opportunities involved in bringing ternary models into the mainstream.

#### II. Related Work
Flamingo marked the starting point for the rapid progress of multimodal models, and several derivative models followed. LLaVa proposed using a text-only GPT to expand multimodal datasets and introduced an open-source framework. The ternary model follows the BitNet b1.58 method, which quantizes weights to very low precision and reduces latency by up to 4x while keeping performance degradation minimal. However, this method still requires large amounts of data and compute.
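
Ternary quantization maps each weight to one of three values, -1, 0, or 1. The sketch below uses absolute-mean scaling followed by round-and-clip, which is how the BitNet b1.58 scheme is commonly described; treat the details (per-tensor scale, the `eps` constant, the flat-list layout) as assumptions, since the summary does not spell them out.

```python
def ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of floats to {-1, 0, 1} using absmean scaling.

    Returns the ternary weights and the scale needed to approximately restore them.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    # Divide by the mean magnitude, round to the nearest integer, clip to [-1, 1].
    q = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return q, scale


weights = [0.9, -0.05, 0.4, -1.2]
q, scale = ternary_quantize(weights)
dequantized = [scale * v for v in q]
```

With only three possible values, each weight carries about 1.58 bits (log2 of 3), and matrix multiplication reduces to additions and subtractions, which is where the latency savings are expected to come from.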

#### III. Model Details
The LLaVaOLMoBitNet1B model consists of a CLIP vision encoder, an MLP connector, and a ternary LLM. An image is processed by the vision encoder and then projected by the MLP into the LLM's embedding space. Finally, the ternary LLM processes these image tokens together with the text query to generate a response.
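
The data flow from vision features to LLM input can be sketched in plain Python. This is a toy illustration under stated assumptions: the two-layer GELU connector mirrors the LLaVa 1.5 design rather than anything stated in this summary, all weights and dimensions are made up, and the vision encoder and LLM themselves are omitted.

```python
import math


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(vec, weight, bias):
    """Dense layer; weight is a list of output rows, each of input length."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def project_patches(patches, w1, b1, w2, b2):
    """Apply a two-layer MLP connector to each vision-encoder patch feature."""
    return [linear([gelu(h) for h in linear(p, w1, b1)], w2, b2) for p in patches]


def build_llm_input(image_tokens, text_embeddings):
    """Place projected image tokens before the text embeddings."""
    return image_tokens + text_embeddings


# Toy sizes: vision features of length 2, LLM embeddings of length 3.
patches = [[1.0, 0.0], [0.0, 1.0]]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]
text_embeddings = [[0.1, 0.2, 0.3]]
llm_input = build_llm_input(project_patches(patches, w1, b1, w2, b2), text_embeddings)
```

The key point is that the connector changes the feature width from the vision encoder's size to the LLM's embedding size, so image patches and text tokens end up in one sequence that the LLM can attend over.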

#### III.B Training Details
Training is split into two stages: (1) a pre-training stage for feature alignment and (2) an end-to-end instruction fine-tuning stage. Both stages follow the methodology of the LLaVa 1.5 paper, and multi-GPU training was carried out with the DeepSpeed library.

#### IV. Results
The final model, LLaVaOLMoBitNet1B, was evaluated both qualitatively and quantitatively. In the qualitative evaluation it mostly produced correct responses, with some inaccuracies. In the quantitative evaluation it scored lower on benchmarks than models of similar size. The authors attribute this to the model having been trained on relatively little data compared with other ternary or full-precision models.

#### V. Future Work
Going forward, it will be important to find effective ways to quantize open-weight pre-trained models into the ternary domain. Ternary models also still share the problems of existing large language models, such as bias, uncertainty, and hallucination. On the hardware side, mapping ternary operations efficiently remains an important challenge.

#### VI. Acknowledgments
The paper acknowledges the LLaVa framework, BitNet b1.58, and NousResearch.

### 2. Overall Summary

This paper proposes LLaVaOLMoBitNet1B, the first ternary multimodal large language model (LLM) that can process both text and images, and releases it as open source for researchers to use. It is built on the LLaVa framework and consists of a CLIP image encoder, an MLP connector, and a ternary LLM. Training proceeds in two stages, and the final model mostly produces correct responses, with some inaccuracies. It scored lower than other models on benchmarks, which the authors attribute to the relatively small amount of training data. Future work should improve performance through quantization into the ternary domain, address bias, and improve hardware efficiency.

27 changes: 27 additions & 0 deletions summaries/2408.13413.md
@@ -0,0 +1,27 @@
# TVG: A Training-free Transition Video Generation Method with Diffusion Models
## TL;DR
## Summary
- [https://arxiv.org/pdf/2408.13413.pdf](https://arxiv.org/pdf/2408.13413.pdf)

### 1. Section Summaries and Key Contributions

#### Abstract
This paper proposes a video transition method that uses Gaussian Process Regression (GPR) and a video-level diffusion model to generate smooth, dynamic transition videos without any training. In addition, to strengthen temporal control, it introduces conditional controls and a Frequency-aware Bidirectional Fusion (FBiF) architecture, which improve the reliability of the generated transitions.

#### Introduction
Traditional video transition techniques lack artistic appeal, require specialist skills, and do not fully immerse viewers. Recent diffusion-based video generation methods create transition videos by generating intermediate frames between images and videos, but they still model inter-frame relationships insufficiently and suffer from abrupt content changes.

#### Preliminary
A diffusion model gradually adds noise to a data sample and then learns to predict the reverse process in order to recover the original data. Generating a transition video from two given input images is done by modeling a conditional distribution. In particular, Latent Diffusion Models (LDMs) are used, which fit the conditional distribution in a latent space to reduce computational complexity.
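
The forward (noising) half of a diffusion model has a closed form in the standard DDPM formulation: a noisy sample at step t is a weighted mix of the clean sample and Gaussian noise. The sketch below shows that step on a plain list of floats; the variable names and the example value of `alpha_bar_t` are assumptions, and the summary does not state which noise schedule the paper uses.

```python
import math
import random


def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    alpha_bar_t is the cumulative signal-retention factor in [0, 1].
    Returns the noisy sample and the noise that was added.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * e
        for x, e in zip(x0, eps)
    ]
    return xt, eps


rng = random.Random(0)
x0 = [1.0, -2.0, 0.5]
xt, eps = forward_noise(x0, alpha_bar_t=0.5, rng=rng)
```

A denoising network is trained to predict `eps` from `xt`; given that prediction, the clean sample can be recovered by inverting the same formula, which is the reverse process the paragraph above refers to.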

#### Method
The method is built mainly on the DynamiCrafter model and applies three key optimizations. First, it refines the conditional images and prompts to control the video generation process and reduce leakage from the conditional images. Second, it integrates Gaussian Process Regression (GPR) into the latent space to strengthen inter-frame consistency and prevent abrupt transitions. Finally, it introduces a Frequency-aware Bidirectional Fusion (FBiF) structure that combines bidirectional generation through feature fusion in the frequency domain.
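
To give a feel for how GPR can produce smooth intermediate latents, the sketch below computes the Gaussian-process posterior mean between two endpoint latents observed at times 0 and 1. It is a simplified stand-in, not the paper's procedure: the RBF kernel, the length scale, the centering on the endpoint average, and the treatment of each latent dimension independently are all assumptions made for this example.

```python
import math


def rbf(t1, t2, length_scale=0.5):
    """Squared-exponential kernel over frame times."""
    return math.exp(-((t1 - t2) ** 2) / (2.0 * length_scale ** 2))


def gpr_interpolate(z_start, z_end, t, length_scale=0.5, noise=1e-6):
    """GP posterior mean at time t, given latents observed at t=0 and t=1."""
    k00 = rbf(0.0, 0.0, length_scale) + noise  # diagonal entry of the 2x2 kernel matrix
    k01 = rbf(0.0, 1.0, length_scale)          # off-diagonal entry
    det = k00 * k00 - k01 * k01
    ks0, ks1 = rbf(t, 0.0, length_scale), rbf(t, 1.0, length_scale)
    # Weights k_*^T K^{-1}, using the closed-form inverse of a symmetric 2x2 matrix.
    w0 = (ks0 * k00 - ks1 * k01) / det
    w1 = (ks1 * k00 - ks0 * k01) / det
    out = []
    for a, b in zip(z_start, z_end):
        mean = 0.5 * (a + b)  # constant prior mean: the endpoint average
        out.append(mean + w0 * (a - mean) + w1 * (b - mean))
    return out


z_start, z_end = [0.0, 2.0], [1.0, -2.0]
frames = [gpr_interpolate(z_start, z_end, i / 4) for i in range(5)]
```

Unlike straight linear blending, the kernel controls how quickly the latent moves away from each endpoint, so a shorter length scale keeps early frames closer to the first image and late frames closer to the second.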

#### Experiments
Experiments were run on the MorphBench and TC-Bench-I2V datasets and focused on evaluating inter-frame consistency and transition smoothness. The proposed method performs notably well in dynamic scenarios and, compared with commercial products, produces more natural and smoother transition videos. It was also strongly preferred in human evaluation.

#### Conclusion
This paper proposes a new method that generates video transitions efficiently without additional training. Using Gaussian Process Regression (GPR) and Frequency-aware Bidirectional Fusion (FBiF), the method produces more consistent and smoother transition videos than existing models. The authors plan further research on generating longer video sequences.

### 2. Overall Summary
This paper proposes a new method for video transitions. By using Gaussian Process Regression (GPR) and a video-level diffusion model, it can generate smooth, dynamic transition videos without training. It also improves the reliability of the transitions through the Frequency-aware Bidirectional Fusion (FBiF) structure. In the experiments, the proposed method delivers more consistent and smoother transitions than existing models and is rated well in human evaluation. As future work, the authors plan to explore ways to generate longer video sequences.