
NVIDIA NeMo-Aligner

Latest News

We released Nemotron-4-340B Base, Instruct, and Reward. The Instruct and Reward variants were trained with NeMo-Aligner. Please see the HelpSteer2 paper for more details on the reward model training.
We are excited to announce the release of accelerated generation support in our RLHF pipeline using TensorRT-LLM. For more information, please refer to our RLHF documentation.
The NeMo-Aligner paper is now out on arXiv!

Introduction

NeMo-Aligner is a scalable toolkit for efficient model alignment. The toolkit supports state-of-the-art model alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be safer, more harmless, and more helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of all the parallelism techniques to ensure their model alignment is done in a performant and resource-efficient manner. For more technical details, please refer to our paper.

The NeMo-Aligner toolkit is built using the NeMo Framework, which enables scalable training across thousands of GPUs using tensor, data, and pipeline parallelism for all alignment components. Additionally, our checkpoints are cross-compatible with the NeMo ecosystem, facilitating inference deployment and further customization (https://github.com/NVIDIA/NeMo-Aligner).
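As a rough illustration of how tensor, data, and pipeline parallelism share a fixed pool of GPUs, the sketch below computes the number of data-parallel replicas left over once the tensor and pipeline sizes are chosen. It assumes only these three dimensions are in play; the function name `data_parallel_size` is hypothetical and is not part of the NeMo or NeMo-Aligner API.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the number of model replicas that fit on `world_size` GPUs.

    Hypothetical helper for illustration only. Assumes each replica
    occupies tensor_parallel * pipeline_parallel GPUs and that no other
    parallelism dimension is used.
    """
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {gpus_per_replica}"
        )
    return world_size // gpus_per_replica


if __name__ == "__main__":
    # 64 GPUs, each replica sharded over 8 tensor-parallel x 2 pipeline-parallel ranks
    print(data_parallel_size(64, tensor_parallel=8, pipeline_parallel=2))  # 4
```

In this example each replica spans 16 GPUs, so 64 GPUs hold 4 replicas that each process a different slice of the batch.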

The toolkit is currently in its early stages. We are committed to improving the toolkit to make it easier for developers to pick and choose different alignment algorithms to build safe, helpful, and reliable models.

Key Features

SteerLM: Attribute Conditioned SFT as a (user-steerable) alternative to RLHF
- Llama3-70B-SteerLM-Chat aligned with NeMo-Aligner.
- Corresponding reward model Llama3-70B-SteerLM-RM.
- Learn more at our SteerLM and HelpSteer2 papers.
Supervised Fine Tuning
Reward Model Training
Reinforcement Learning from Human Feedback using the PPO Algorithm
- Llama3-70B-PPO-Chat aligned with NeMo-Aligner using TRT-LLM.
Direct Preference Optimization as described in Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Llama3-70B-DPO-Chat aligned with NeMo-Aligner.
Self-Play Fine-Tuning (SPIN) as described in Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
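To make the Direct Preference Optimization entry above more concrete, here is a minimal sketch of the per-example DPO loss from the cited paper, written in plain Python. It is not NeMo-Aligner's implementation: the function names, the scalar log-probability inputs, and the default `beta=0.1` are assumptions chosen for illustration.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default, for illustration only
) -> float:
    """Per-example DPO loss (illustrative sketch, not the toolkit's code).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # Implicit rewards: scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin)
    return _softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is ln(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy prefers the chosen response more than the reference does: lower loss.
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model, which is why no separately trained reward model is needed.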

Learn More

Latest Release

For the latest stable release, please see the releases page. All releases come with a pre-built container. Changes in each release are documented in CHANGELOG.md.

Install Your Own Environment

Requirements

NeMo-Aligner has the same requirements as the NeMo Toolkit, with the addition of PyTriton.

Quick start inside NeMo container

NeMo-Aligner is included in the NeMo containers. On a machine with NVIDIA GPUs and drivers installed, run the NeMo container:

docker run --gpus all -it --rm --shm-size=8g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:24.07

Once you are inside the container, NeMo-Aligner is already installed. It can be found under /opt/, together with NeMo and other tools.

Install NeMo-Aligner

Please follow the same steps as outlined in the NeMo Toolkit Installation Guide. After installing NeMo, execute the following additional command:

pip install nemo-aligner

Alternatively, to install the latest commit, run the following from the root of a local clone of the repository:

pip install .
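After either install command, a quick check along these lines can confirm that the package is visible to the current Python environment. This is a sketch using only the standard library; the helper name `aligner_install_status` is hypothetical, the module name `nemo_aligner` is taken from the repository's package directory, and the distribution name `nemo-aligner` is assumed from the pip command above.

```python
import importlib.util
from importlib import metadata


def aligner_install_status() -> dict:
    """Report whether NeMo-Aligner looks installed (hypothetical helper).

    Returns a dict with:
      importable: True if the `nemo_aligner` module can be located.
      version:    installed version of the `nemo-aligner` distribution,
                  or None if no such distribution is found.
    """
    spec = importlib.util.find_spec("nemo_aligner")
    try:
        version = metadata.version("nemo-aligner")
    except metadata.PackageNotFoundError:
        version = None
    return {"importable": spec is not None, "version": version}


if __name__ == "__main__":
    print(aligner_install_status())
```

The check only locates the module and reads package metadata, so it does not import NeMo or initialize any GPU libraries.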

Docker Containers

We provide an official NeMo-Aligner Dockerfile which is based on stable, tested versions of NeMo, Megatron-LM, and TransformerEngine. The primary objective of this Dockerfile is to ensure stability, although it might not always reflect the very latest versions of those three packages. You can access our Dockerfile here.

Alternatively, you can build the NeMo Dockerfile and add RUN pip install nemo-aligner at the end.

Future work

Continue improving the stability of the PPO learning phase.
Improve the performance of RLHF.
Add TRT-LLM inference support for Rejection Sampling.

Contribute to NeMo-Aligner

We welcome community contributions! Please refer to CONTRIBUTING.md for guidelines.

Cite NeMo-Aligner in Your Work

@misc{shen2024nemoaligner,
      title={NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment},
      author={Gerald Shen and Zhilin Wang and Olivier Delalleau and Jiaqi Zeng and Yi Dong and Daniel Egert and Shengyang Sun and Jimmy Zhang and Sahil Jain and Ali Taghibakhshi and Markel Sanz Ausin and Ashwath Aithal and Oleksii Kuchaiev},
      year={2024},
      eprint={2405.01481},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

This toolkit is licensed under the Apache License, Version 2.0.
