unsloth/README.md at main · unslothai/unsloth #625
Comments
Related issues
### #134: marker: Convert PDF to markdown quickly with high accuracy
### Details
Similarity score: 0.9
- [ ] [https://github.com/VikParuchuri/marker#readme](https://github.com/VikParuchuri/marker#readme)
# Marker
Marker converts PDF, EPUB, and MOBI to markdown. It's 10x faster than nougat, more accurate on most documents, and has low hallucination risk.
More Details
## How it works
Marker is a pipeline of deep learning models:
Relying on autoregressive forward passes to generate text is slow and prone to hallucination/repetition. From the nougat paper:
Nougat is an amazing model, but I wanted a faster and more general purpose solution. Marker is 10x faster and has low hallucination risk because it only passes equation blocks through an LLM forward pass.
## Examples
## Performance
The above results are with marker and nougat setup so they each take ~3GB of VRAM on an A6000. See below for detailed speed and accuracy benchmarks, and instructions on how to run your own benchmarks.
## Limitations
PDF is a tricky format, so marker will not always work perfectly. Here are some known limitations that are on the roadmap to address:
## Installation
This has been tested on Mac and Linux (Ubuntu and Debian). You'll need python 3.9+ and poetry.
First, clone the repo:
### Linux
### Mac
## Usage
First, some configuration:
## Convert a single file
Run
Make sure the
## Convert multiple files
Run
## Convert multiple files on multiple GPUs
Run
## Benchmarks
Benchmarking PDF extraction quality is hard. I've created a test set by finding books and scientific papers that have a pdf version and a latex source. I convert the latex to text, and compare the reference to the output of text extraction methods.
Benchmarks show that marker is 10x faster than nougat, and more accurate outside arXiv (nougat was trained on arXiv data). We show naive text extraction (pulling text out of the pdf with no processing) for comparison.
### Speed
### Accuracy
First 3 are non-arXiv books, last 3 are arXiv papers.
Peak GPU memory usage during the benchmark is
### Throughput
Marker takes about 2GB of VRAM on average per task, so you can convert 24 documents in parallel on an A6000.
## Running your own benchmarks
You can benchmark the performance of marker on your machine. First, download the benchmark data here and unzip.
Then run
This will benchmark marker against other text extraction methods. It sets up batch sizes for nougat and marker to use a similar amount of GPU RAM for each. Omit
## Commercial usage
Due to the licensing of the underlying models like layoutlmv3 and nougat, this is only suitable for noncommercial usage. I'm building a version that can be used commercially, by stripping out the dependencies below. If you would like to get early access, email me at marker@vikas.sh.
Here are the non-commercial/restrictive dependencies:
Other dependencies/datasets are openly licensed (doclaynet, byt5), or used in a way that is compatible with commercial usage (ghostscript).
## Thanks
This work would not have been possible without amazing open source models and datasets, including (but not limited to):
Thank you to the authors of these models and datasets for making them available to the community!
### #456: Baseline benchmark for 17 coding models : r/LocalLLaMA
### Details
Similarity score: 0.89
- [ ] [Baseline benchmark for 17 coding models : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/19fc4uf/baseline_benchmark_for_17_coding_models/)
# Baseline Benchmark for 17 Coding Models
Discussion
I am currently working on implementing some ideas for coding models inference strategies (prompting, control, context exploration, CoT, ToT, etc) and I needed a baseline benchmark on a bunch of models. Since I work on a 3060 12GB, I was limited in what I can test so I went for every model that is 7/13B and has an AWQ quant available, since that is what the inference library that I use supports. I thought I'd share some numbers.
Notes:
f"""<s>You are a helpful and respectful assistant. Answer the following question: {question}"""
## Results
I've plotted the results (with horrendous contrasting colors, but alas) to look for any interesting patterns in problem solving. You can find the plots here.
Suggested labels
{ "label-name": "coding-models", "description": "Discussion and benchmark of coding models implementation strategies.", "confidence": 96.82 }
### #160: sid321axn/tinyllama-text2sql-finetuned at main
<details><summary>### Details</summary>Similarity score: 0.89
## tiny-llama-text2sql
## safetensors
- [ ] [sid321axn/tinyllama-text2sql-finetuned at main](https://huggingface.co/sid321axn/tinyllama-text2sql-finetuned/tree/main)
adapter
https://huggingface.co/sid321axn/tiny-llama-text2sql
This model is a fine-tuned version of PY007/TinyLlama-1.1B-Chat-v0.3 on the None dataset.
```
{
"_name_or_path": "PY007/TinyLlama-1.1B-Chat-v0.3",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.37.0.dev0",
"use_cache": false,
"vocab_size": 32003
}
```</details>
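A few attention-related quantities follow directly from the config above. The sketch below copies a subset of its fields and assumes the usual Llama layout, where the head dimension is `hidden_size / num_attention_heads` and keys and values are each cached per layer and per key/value head. The helper name `attention_shape` is hypothetical. Note that the config sets `"use_cache": false`, so the cache estimate only applies if caching is turned on at inference time.

```python
import json

# Subset of the config.json fields shown above.
config = json.loads("""
{
  "hidden_size": 2048,
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "max_position_embeddings": 2048,
  "torch_dtype": "float16"
}
""")

BYTES_PER_ELEMENT = {"float16": 2, "bfloat16": 2, "float32": 4}


def attention_shape(cfg: dict) -> dict:
    """Derive per-head sizes and a KV-cache estimate (hypothetical helper)."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    # Grouped-query attention: several query heads share one key/value head.
    queries_per_kv_head = cfg["num_attention_heads"] // cfg["num_key_value_heads"]
    # Keys and values (factor 2) are stored per layer, per KV head, per token.
    kv_bytes_per_token = (
        2
        * cfg["num_hidden_layers"]
        * cfg["num_key_value_heads"]
        * head_dim
        * BYTES_PER_ELEMENT[cfg["torch_dtype"]]
    )
    return {
        "head_dim": head_dim,
        "queries_per_kv_head": queries_per_kv_head,
        "kv_bytes_per_token": kv_bytes_per_token,
    }


shape = attention_shape(config)
full_context_mib = (
    shape["kv_bytes_per_token"] * config["max_position_embeddings"] / 2**20
)
print(shape)
print(f"KV cache at full context: {full_context_mib:.0f} MiB")
```

With 4 key/value heads instead of 32, the cache is one eighth of what full multi-head attention would need at the same hidden size.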
### #499: marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library.
<details><summary>### Details</summary>Similarity score: 0.89
- [ ] [marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library.](https://github.com/marella/ctransformers?tab=readme-ov-file#gptq)
# CTransformers
[![PyPI version](https://badge.fury.io/py/ctransformers.svg)](https://badge.fury.io/py/ctransformers)
[![Documentation](https://readthedocs.org/images/button/readthedocs-ci.svg)](https://ctransformers.readthedocs.io/)
[![Build and Test](https://github.com/marella/ctransformers/actions/workflows/build.yml/badge.svg)](https://github.com/marella/ctransformers/actions/workflows/build.yml)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
Python bindings for the Transformer models implemented in C/C++ using GGML library. Also see [ChatDocs](https://github.com/marella/chatdocs)
## Supported Models
| Model | Model Type | CUDA | Metal |
| ------ | --------- | :--: | :--: |
| GPT-2 | gpt2 | | |
| GPT-J, GPT4All-J | gptj | | |
| GPT-NeoX, StableLM | gpt_neox | | |
| Falcon | falcon | ✅ | |
| LLaMA, LLaMA 2 | llama | ✅ | ✅ |
| MPT | mpt | ✅ | |
| StarCoder, StarChat | gpt_bigcode | ✅ | |
| Dolly V2 | dolly-v2 | | |
| Replit | replit | | |
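The GPU columns of the table can be transcribed into a small lookup, which is handy when deciding whether GPU offloading is worth requesting for a given model type. This is only a sketch: `GPU_SUPPORT`, `can_offload`, and `suggested_gpu_layers` are hypothetical names and not part of ctransformers, and the data reflects the table as quoted here, which may lag behind the library.

```python
# GPU backends per model type, transcribed from the table above.
GPU_SUPPORT = {
    "gpt2": set(),
    "gptj": set(),
    "gpt_neox": set(),
    "falcon": {"cuda"},
    "llama": {"cuda", "metal"},
    "mpt": {"cuda"},
    "gpt_bigcode": {"cuda"},
    "dolly-v2": set(),
    "replit": set(),
}


def can_offload(model_type: str, backend: str) -> bool:
    """Return True if the table lists `backend` for `model_type`."""
    return backend in GPU_SUPPORT.get(model_type, set())


def suggested_gpu_layers(model_type: str, backend: str, requested: int) -> int:
    """Fall back to 0 GPU layers when the table lists no support."""
    return requested if can_offload(model_type, backend) else 0


print(suggested_gpu_layers("llama", "cuda", 50))
print(suggested_gpu_layers("replit", "cuda", 50))
```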
## Installation
To install via `pip`, simply run:
pip install ctransformers
To stream the output:
for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)
You can load models from Hugging Face Hub directly:
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
If a model repo has multiple model files (
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", model_file="ggml-model.bin")
## 🤗 Transformers
Note: This is an experimental feature and may change in the future.
To use with 🤗 Transformers, create the model and tokenizer using:
from ctransformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)
You can use 🤗 Transformers text generation pipeline:
from transformers import pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))
You can use 🤗 Transformers generation parameters:
pipe("AI is going to", max_new_tokens=256, do_sample=True, temperature=0.8, repetition_penalty=1.1)
You can use 🤗 Transformers tokenizers:
from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)  # Load model from GGML model repo.
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Load tokenizer from original model repo.
## LangChain
It is integrated into LangChain. See LangChain docs.
## GPU
To run some of the model layers on GPU, set the
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)
### CUDA
Install CUDA libraries using:
pip install ctransformers[cuda]
### ROCm
To enable ROCm support, install the
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
### Metal
To enable Metal support, install the
CT_METAL=1 pip install ctransformers --no-binary ctransformers
## GPTQ
Note: This is an experimental feature and only LLaMA models are supported using [ExLlama](https
Install additional dependencies using:
pip install ctransformers[gptq]
Load a GPTQ model using:
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
If the model name or path doesn't contain the word
It can also be used with LangChain. Low-level APIs are not fully supported.
## Documentation
Find the documentation on Read the Docs.
## Config
Find the URL for the model card for GPTQ here.
Made with ❤️ by marella
Suggested labels
null
### #386: SciPhi/AgentSearch-V1 · Datasets at Hugging Face
### Details
Similarity score: 0.89
- [ ] [SciPhi/AgentSearch-V1 · Datasets at Hugging Face](https://huggingface.co/datasets/SciPhi/AgentSearch-V1)
## Getting Started
The AgentSearch-V1 dataset is a comprehensive collection of over one billion embeddings, produced using jina-v2-base. It includes more than 50 million high-quality documents and over 1 billion passages, covering a vast range of content from sources such as Arxiv, Wikipedia, and Project Gutenberg, as well as carefully filtered Creative Commons (CC) data. Our team is dedicated to continuously expanding and enhancing this corpus to improve the search experience. We welcome your thoughts and suggestions – please feel free to reach out with your ideas!
To access and utilize the AgentSearch-V1 dataset, you can stream it via HuggingFace with the following Python code:
from datasets import load_dataset
import json
import numpy as np
# To stream the entire dataset:
ds = load_dataset("SciPhi/AgentSearch-V1", data_files="**/*", split="train", streaming=True)
# Optional, stream just the "arxiv" dataset
# ds = load_dataset("SciPhi/AgentSearch-V1", data_files="arxiv/*", split="train", streaming=True)
# To process the entries:
for entry in ds:
    embeddings = np.frombuffer(
        entry['embeddings'], dtype=np.float32
    ).reshape(-1, 768)
    text_chunks = json.loads(entry['text_chunks'])
    metadata = json.loads(entry['metadata'])
    print(f'Embeddings:\n{embeddings}\n\nChunks:\n{text_chunks}\n\nMetadata:\n{metadata}')
    break
A full set of scripts to recreate the dataset from scratch can be found here. Further, you may check the docs for details on how to perform RAG over AgentSearch.
## Languages
English.
## Dataset Structure
The raw dataset structure is as follows:
{
"url": ...,
"title": ...,
"metadata": {"url": "...", "timestamp": "...", "source": "...", "language": "..."},
"text_chunks": ...,
"embeddings": ...,
"dataset": "book" | "arxiv" | "wikipedia" | "stack-exchange" | "open-math" | "RedPajama-Data-V2"
}
## Dataset Creation
This dataset was created as a step towards making humanity's most important knowledge openly searchable and LLM optimal. It was created by filtering, cleaning, and augmenting locally publicly available datasets.
To cite our work, please use the following:
@software{SciPhi2023AgentSearch,
## Source Data
@online{wikidump,
@misc{paster2023openwebmath,
@software{together2023redpajama,
## License
Please refer to the licenses of the data subsets you use.
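The streaming loop in Getting Started implies a simple storage layout: `embeddings` is a raw buffer of 32-bit floats that is reshaped into rows of 768 values, while `text_chunks` and `metadata` are JSON strings. The sketch below decodes one entry of that shape using only the standard library, so the layout can be checked without downloading anything. The entry here is synthetic, `decode_entry` is a hypothetical helper, and it assumes the buffer uses the machine's native byte order, as the `np.frombuffer` call does.

```python
import json
from array import array

EMBEDDING_DIM = 768  # row width used by the reshape(-1, 768) call


def decode_entry(entry: dict):
    """Split a raw float32 buffer into rows and parse the JSON fields."""
    flat = array("f")  # "f" is a 4-byte C float, i.e. float32
    flat.frombytes(entry["embeddings"])
    if len(flat) % EMBEDDING_DIM != 0:
        raise ValueError("buffer length is not a multiple of the embedding size")
    rows = [
        flat[start:start + EMBEDDING_DIM].tolist()
        for start in range(0, len(flat), EMBEDDING_DIM)
    ]
    return rows, json.loads(entry["text_chunks"]), json.loads(entry["metadata"])


# Synthetic stand-in for one streamed entry (not real dataset content).
fake_entry = {
    "embeddings": array("f", [0.5] * (2 * EMBEDDING_DIM)).tobytes(),
    "text_chunks": json.dumps(["first passage", "second passage"]),
    "metadata": json.dumps({"source": "example"}),
}

rows, chunks, meta = decode_entry(fake_entry)
print(len(rows), len(rows[0]), chunks, meta)
```

For real entries the NumPy version is the better choice, since it avoids copying the buffer into Python lists.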
Suggested labels
{ "key": "knowledge-dataset", "value": "A dataset with one billion embeddings from various sources, such as Arxiv, Wikipedia, Project Gutenberg, and carefully filtered Creative Commons data" }
### #383: deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face
### Details
Similarity score: 0.88
- [ ] [deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face](https://huggingface.co/deepseek-ai/deepseek-coder-5.7bmqa-base)
# Deepseek Coder
## Introduction
Deepseek Coder is a series of code language models, each trained from scratch on 2T tokens with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus with a window size of 16K and an extra fill-in-the-blank task, supporting project-level code completion and infilling. Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.
## Key Features
## Model Summary
## How to Use
This section provides examples of how to use the Deepseek Coder model for code completion, code insertion, and repository-level code completion tasks.
### Code Completion
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
### Code Insertion
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()
input_text = """<|begin|>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<|hole|>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<|end|>"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
### Repository Level Code Completion
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()
input_text = """#utils.py
import torch
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
def load_data():
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    # Standardize the data
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    # Convert numpy data to PyTorch tensors
    X_train = torch.tensor(X_train, dtype=torch.float32)
    X_test = torch.tensor(X_test, dtype=torch.float32)
    y_train = torch.tensor(y_train, dtype=torch.int64)
    y_test = torch.tensor(y_test, dtype=torch.int64)
    return X_train, X_test, y_train, y_test
def evaluate_predictions(y_test, y_pred):
    return accuracy_score(y_test, y_pred)
#model.py
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
class IrisClassifier(nn.Module):
    def __init__(self):
        super(IrisClassifier, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(4, 16),
            nn.ReLU(),
            nn.Linear(16, 3)
        )
    def forward(self, x):
        return self.fc(x)
    def train_model(self, X_train, y_train, epochs, lr, batch_size):
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(self.parameters(), lr=lr)
        # Create DataLoader for batches
        dataset = TensorDataset(X_train, y_train)
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        for epoch in range(epochs):
            for batch_X, batch_y in dataloader:
                optimizer.zero_grad()
                outputs = self(batch_X)
                loss = criterion(outputs, batch_y)
                loss.backward()
                optimizer.step()
    def predict(self, X_test):
        with torch.no_grad():
            outputs = self(X_test)
        _, predicted = outputs.max(1)
        return predicted.numpy()
#main.py
from utils import load_data, evaluate_predictions
from model import IrisClassifier as Classifier
def main():
    # Model training and evaluation
"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=140)
print(tokenizer.decode(outputs[0]))
## License
This code repository is licensed under the MIT License. The use of Deepseek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.
See the LICENSE-MODEL for more details.
## Contact
If you have any questions, please raise an issue or contact us at agi_code@deepseek.com.
Suggested labels
{ "key": "llm-experiments", "value": "Experiments and results related to Large Language Models" } { "key": "AI-Chatbots", "value": "Topics related to advanced chatbot platforms integrating multiple AI models" }
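The two DeepSeek Coder prompt layouts shown earlier (code insertion and repository-level completion) are plain string templates, so they can be assembled by small helpers before being passed to the tokenizer. The sketch below is a minimal illustration: `build_infill_prompt` and `build_repo_prompt` are hypothetical names, and the sentinel strings are copied exactly as they appear in this page's code-insertion example. Those strings may have been altered by page extraction, so check the tokenizer's special tokens before relying on them.

```python
# Sentinel strings as they appear in the code-insertion example on this page.
# Assumption: verify them against the tokenizer's special tokens.
FIM_BEGIN = "<|begin|>"
FIM_HOLE = "<|hole|>"
FIM_END = "<|end|>"


def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Place the hole marker between the code before and after the gap."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"


def build_repo_prompt(files: list[tuple[str, str]]) -> str:
    """Concatenate files, each introduced by a '#<filename>' comment line."""
    parts = []
    for name, source in files:
        body = source if source.endswith("\n") else source + "\n"
        parts.append(f"#{name}\n{body}")
    return "".join(parts)


infill = build_infill_prompt(
    "def add(a, b):\n",
    "\n    return result",
)
repo = build_repo_prompt([
    ("utils.py", "def helper():\n    return 1\n"),
    ("main.py", "from utils import helper\ndef main():\n"),
])
print(infill)
print(repo)
```

Ending the last file with an unfinished function, as the repository-level example does with `main.py`, leaves the model to continue from that point.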
unsloth/README.md at main · unslothai/unsloth
Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory!
✨ Finetune for Free
All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.
🦥 Unsloth.ai News
unsloth/mistral-7b-bnb-4bit
🔗 Links and Resources
⭐ Key Features
🥇 Performance Benchmarking
View on GitHub
Suggested labels