LLaVA-O1 is an open-source framework for training, inference, and evaluation of multimodal large language models (MLLMs), built on PyTorch and HuggingFace. It aims to streamline the development of robust multimodal models by integrating efficient workflows with popular libraries.
- Flexible Training Pipelines: Support for various training modes including single-GPU, multi-GPU, and distributed training.
- Inference with HuggingFace Integration: Seamlessly conduct inference using HuggingFace’s model hub.
- Extensive Evaluation: Built-in metrics and evaluation tools for assessing model performance.
- Multimodal Compatibility: Designed to handle multimodal tasks, enabling diverse applications from text-only tasks to multimodal reasoning.
To install and set up LLaVA-O1, you need to have Python 3.8+ and PyTorch installed. Follow these steps to set up the environment:
# Clone the repository
git clone https://github.com/White65534/LLaVA-O1.git
cd LLaVA-O1
# Create a virtual environment
python -m venv llava_env
source llava_env/bin/activate # On Windows, use `llava_env\Scripts\activate`
# Install dependencies
pip install -r requirements.txt
If you plan to use GPU support, ensure you have the appropriate CUDA version installed.
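Before training or inference, it can help to confirm that your environment is set up correctly. The snippet below is a convenience check, not part of the LLaVA-O1 repository; it reports the Python version and whether PyTorch can reach a CUDA device:

```python
# Convenience environment check (not part of LLaVA-O1):
# reports Python version compatibility and CUDA availability.
import sys

def check_environment():
    info = {"python_ok": sys.version_info >= (3, 8)}
    try:
        import torch
        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        # PyTorch is not installed yet
        info["torch_version"] = None
        info["cuda_available"] = False
    return info

print(check_environment())
```

If `cuda_available` is `False` on a GPU machine, reinstall PyTorch with a build matching your CUDA version.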
After installation, you can quickly start by running a simple inference or training example. Here’s how:
- Model Configuration: Define your model configuration by editing the `config.yaml` file.
- Dataset Preparation: Organize your data under the `data/` directory, following the example dataset structures provided in `examples/`.
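The exact schema of `config.yaml` depends on the repository; purely as an illustration, a minimal configuration could look like the following (all field names here are assumptions, not the project's actual keys):

```yaml
# Illustrative only -- these field names are assumptions, not LLaVA-O1's actual schema.
model:
  name_or_path: <huggingface-model-id>   # model to load from the HuggingFace hub
training:
  batch_size: 8
  learning_rate: 2.0e-5
  epochs: 3
data:
  train_dir: data/train
  eval_dir: data/eval
```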
To run a simple inference with LLaVA-O1, use the following command:
python scripts/inference.py --config config.yaml --input data/sample_input.txt
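Judging from the commands in this README, `scripts/inference.py` accepts `--config` and `--input` flags. A hypothetical sketch of such a command-line interface (the actual script may define more options) could be built with `argparse`:

```python
# Hypothetical sketch of the inference CLI implied by this README's commands;
# the actual scripts/inference.py may differ.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="LLaVA-O1 inference")
    parser.add_argument("--config", required=True, help="Path to config.yaml")
    parser.add_argument("--input", required=True, help="Input file to run inference on")
    return parser

# Parse the same flags used in the example command above.
args = build_parser().parse_args(
    ["--config", "config.yaml", "--input", "data/sample_input.txt"]
)
print(args.config, args.input)
```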
LLaVA-O1 supports both single-GPU and distributed training using PyTorch. To train a model, modify the training configurations in `config.yaml` and then run:
python scripts/train.py --config config.yaml
For distributed training, use:
torchrun --nproc_per_node=NUM_GPUS scripts/train.py --config config.yaml
Replace `NUM_GPUS` with the number of GPUs available for training.
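Under `torchrun`, each worker process receives its rank and world size through environment variables. A training script typically reads them as shown below; this is a generic sketch, not the repository's actual code:

```python
# Sketch (assumption, not LLaVA-O1's actual code): reading the environment
# variables torchrun sets for each spawned worker process.
import os

def get_dist_info():
    """Return (rank, world_size, local_rank), with single-process defaults."""
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank

# Run directly (without torchrun), the single-process defaults apply.
print(get_dist_info())
```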
After training, you can perform inference on new data. Specify the model path in `config.yaml` and run:
python scripts/inference.py --config config.yaml --input data/inference_data.txt
To evaluate the model on a test set, use the evaluation script:
python scripts/evaluate.py --config config.yaml --input data/test_data.txt
The evaluation reports metrics such as:
- Accuracy
- Precision/Recall/F1 Scores
- Task-Specific Metrics (e.g., BLEU and ROUGE for text generation)
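For intuition, precision, recall, and F1 can be derived from true-positive, false-positive, and false-negative counts. This is a generic sketch of those formulas, not necessarily how `scripts/evaluate.py` implements them:

```python
# Generic precision/recall/F1 from raw counts (illustrative; the repo's
# evaluate.py may compute these differently, e.g. per-class or macro-averaged).
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

print(precision_recall_f1(8, 2, 2))
```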
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch with your feature/bugfix.
- Submit a pull request with a clear description of your changes.
This project is licensed under the MIT License. See the `LICENSE` file for more information.