
TabbyML: Self-hosted AI coding assistant. #642

irthomasthomas opened this issue Feb 27, 2024 · 1 comment
Labels
  • AI-Agents: Autonomous AI agents using LLMs
  • code-generation: code generation models and tools like copilot and aider
  • ml-inference: Running and serving ML models
  • python: Python code, tools, info
  • Software2.0: Software development driven by AI and neural networks


tabby/README.md at main · TabbyML/tabby

🐾 Tabby


Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:

  • Self-contained, with no need for a DBMS or cloud service.
  • OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
  • Supports consumer-grade GPUs.


🔥 What's New

Archived
  • 10/15/2023 RAG-based code completion is enabled by default in v0.3.0 🎉! Check out the blog post explaining how Tabby utilizes repo-level context to get even smarter!
  • 11/27/2023 v0.6.0 released!
  • 11/09/2023 v0.5.5 released! With a redesign of UI + performance improvement.
  • 10/04/2023 Check out the model directory for the latest models supported by Tabby.
  • 09/18/2023 Apple's M1/M2 Metal inference support has landed in v0.1.1!
  • 08/31/2023 Tabby's first stable release v0.0.1 🥳.
  • 08/28/2023 Experimental support for the CodeLlama 7B.
  • 08/24/2023 Tabby is now on JetBrains Marketplace!

👋 Getting Started

You can find our documentation here.

Run Tabby in 1 Minute

The easiest way to start a Tabby server is by using the following Docker command:

```bash
docker run -it \
--gpus all -p 8080:8080 -v $HOME/.tabby:/data \
tabbyml/tabby \
serve --model TabbyML/StarCoder-1B --device cuda
```
For additional options (e.g., inference type, parallelism), please refer to the documentation page.
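
Because the server speaks plain HTTP, the OpenAPI interface can be exercised directly once the container is up. The sketch below is a hedged illustration: the `/v1/completions` path and the payload shape are assumptions to verify against the OpenAPI schema your own Tabby instance serves.

```python
# Hedged sketch: endpoint path and payload shape are assumptions; confirm them
# against the OpenAPI schema exposed by your Tabby server before relying on them.
import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "language": "python",
        "segments": {
            "prefix": "def fib(n):\n    ",  # code before the cursor
            "suffix": "\n",                 # code after the cursor
        },
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # completion candidates returned by the server
```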

🤝 Contributing

Full guide at CONTRIBUTING.md.

Get the Code

```bash
git clone --recurse-submodules https://github.com/TabbyML/tabby
cd tabby
```

If you have already cloned the repository, you can run the `git submodule update --recursive --init` command to fetch all submodules.

Build

  1. Set up the Rust environment by following this tutorial.

  2. Install the required dependencies:

```bash
# For macOS
brew install protobuf

# For Ubuntu / Debian
apt-get install protobuf-compiler libopenblas-dev
```

  3. Now, you can build Tabby by running the command `cargo build`.

Start Hacking!

... and don't forget to submit a Pull Request

🌍 Community

  • 🎤 Twitter / X - engage with TabbyML for all things possible
  • 📚 LinkedIn - follow for the latest from the community
  • 💌 Newsletter - subscribe to unlock Tabby insights and secrets

🌟 Star History

Star History Chart

URL: tabby/README.md

Suggested labels


irthomasthomas commented Feb 27, 2024

Related issues

#625: unsloth/README.md at main · unslothai/unsloth

Details: similarity score 0.87 - [ ] [unsloth/README.md at main · unslothai/unsloth](https://github.com/unslothai/unsloth/blob/main/README.md?plain=1)

unsloth/README.md at main · unslothai/unsloth




Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory!

✨ Finetune for Free

All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
| --- | --- | --- | --- |
| Gemma 7b | ▶️ Start on Colab | 2.4x faster | 58% less |
| Mistral 7b | ▶️ Start on Colab | 2.2x faster | 62% less |
| Llama-2 7b | ▶️ Start on Colab | 2.2x faster | 43% less |
| TinyLlama | ▶️ Start on Colab | 3.9x faster | 74% less |
| CodeLlama 34b A100 | ▶️ Start on Colab | 1.9x faster | 27% less |
| Mistral 7b 1xT4 | ▶️ Start on Kaggle | 5x faster* | 62% less |
| DPO - Zephyr | ▶️ Start on Colab | 1.9x faster | 19% less |

🦥 Unsloth.ai News

🔗 Links and Resources

| Type | Links |
| --- | --- |
| 📚 Wiki & FAQ | Read Our Wiki |
| 📜 Documentation | Read The Doc |
| 💾 Installation | unsloth/README.md |
| Twitter (aka X) | Follow us on X |
| 🥇 Benchmarking | Performance Tables |
| 🌐 Released Models | Unsloth Releases |
| ✍️ Blog | Read our Blogs |

⭐ Key Features

  • All kernels written in OpenAI's Triton language. Manual backprop engine.
  • 0% loss in accuracy - no approximation methods - all exact.
  • No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40, etc.). Check your GPU! GTX 1070 and 1080 work, but are slow.
  • Works on Linux and Windows via WSL.
  • Supports 4bit and 16bit QLoRA / LoRA finetuning via bitsandbytes.
  • Open source trains 5x faster - see Unsloth Pro for 30x faster training!
  • If you trained a model with 🦥Unsloth, you can use this cool sticker!  
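
To make the LoRA/QLoRA workflow described in the features above concrete, here is a minimal hedged sketch using Unsloth's FastLanguageModel API; the model name, rank, and target modules are placeholders rather than settings recommended by the project.

```python
# Hedged sketch of Unsloth's QLoRA/LoRA setup; model name and hyperparameters
# are placeholders -- adjust them for your own hardware and dataset.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # placeholder 4-bit base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,                         # QLoRA-style 4-bit loading via bitsandbytes
)

# Attach LoRA adapters; only these low-rank weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# From here, pass `model` and `tokenizer` to a standard trainer (e.g. trl's SFTTrainer).
```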

🥇 Performance Benchmarking

| 1 A100 40GB | 🤗Hugging Face | Flash Attention | 🦥Unsloth Open Source | 🦥Unsloth Pro |
| --- | --- | --- | --- | --- |
| Alpaca | 1x | 1.04x | 1.98x | 15.64x |
| LAION Chip2 | 1x | 0.92x | 1.61x | 20.73x |
| OASST | 1x | 1.19x | 2.17x | 14.83x |
| Slim Orca | 1x | 1.18x | 2.22x | 14.82x |

| Free Colab T4 | Dataset | 🤗Hugging Face | Pytorch 2.1.1 | 🦥Unsloth | 🦥 VRAM reduction |
| --- | --- | --- | --- | --- | --- |
| Llama-2 7b | OASST | 1x | 1.19x | 1.95x | -43.3% |
| Mistral 7b | Alpaca | 1x | 1.07x | 1.56x | -13.7% |
| Tiny Llama 1.1b | Alpaca | 1x | 2.06x | 3.87x | -73.8% |
| DPO with Zephyr | Ultra Chat | 1x | 1.09x | 1.55x | -18.6% |

View on GitHub

Suggested labels

#640: README.md · defog/sqlcoder-7b-2 at main

Details: similarity score 0.85 - [ ] [README.md · defog/sqlcoder-7b-2 at main](https://huggingface.co/defog/sqlcoder-7b-2/blob/main/README.md?code=true)

README.md · defog/sqlcoder-7b-2 at main

DESCRIPTION:

license: cc-by-sa-4.0
library_name: transformers
pipeline_tag: text-generation

Update notice

The model weights were updated at 7 AM UTC on Feb 7, 2024. The new model weights lead to a much more performant model – particularly for joins.

If you downloaded the model before that, please redownload the weights for best performance.

Model Card for SQLCoder-7B-2

A capable large language model for natural language to SQL generation.


Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

  • Developed by: Defog, Inc
  • Model type: [Text to SQL]
  • License: [CC-by-SA-4.0]
  • Finetuned from model: [CodeLlama-7B]

Model Sources [optional]

Uses

This model is intended to be used by non-technical users to understand data inside their SQL databases. It is meant as an analytics tool, and not as a database admin tool.

This model has not been trained to reject malicious requests from users with write access to databases, and should only be used by users with read-only access.

How to Get Started with the Model

Use the code here to get started with the model.

Prompt

Please use the following prompt for optimal results, and remember to set do_sample=False and num_beams=4.

```
### Task
Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]
### Database Schema
The query will run on a database with the following schema:
{table_metadata_string_DDL_statements}
### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{user_question}[/QUESTION]
[SQL]
```
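
Putting the prompt and decoding settings together, a minimal sketch with the standard Hugging Face transformers API might look like the following; the question, DDL string, dtype, and device settings are placeholders/assumptions to adapt to your own hardware.

```python
# Minimal sketch, assuming the standard transformers text-generation API.
# The question and schema below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "defog/sqlcoder-7b-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

user_question = "How many orders were placed in 2023?"  # placeholder question
table_metadata_string_DDL_statements = "CREATE TABLE orders (id INT, created_at DATE);"  # placeholder DDL

prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]
### Database Schema
The query will run on a database with the following schema:
{table_metadata_string_DDL_statements}
### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{user_question}[/QUESTION]
[SQL]
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy/beam decoding, as the model card recommends
    num_beams=4,
)
# Print only the newly generated SQL, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```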

Evaluation

This model was evaluated on SQL-Eval, a PostgreSQL based evaluation framework developed by Defog for testing and alignment of model capabilities.

You can read more about the methodology behind SQLEval here.

Results

We classified each generated question into one of 6 categories. The table displays the percentage of questions answered correctly by each model, broken down by category.

| Model | date | group_by | order_by | ratio | join | where |
| --- | --- | --- | --- | --- | --- | --- |
| sqlcoder-70b | 96 | 91.4 | 97.1 | 85.7 | 97.1 | 91.4 |
| sqlcoder-7b-2 | 96 | 91.4 | 94.3 | 91.4 | 94.3 | 77.1 |
| sqlcoder-34b | 80 | 94.3 | 85.7 | 77.1 | 85.7 | 80 |
| gpt-4 | 72 | 94.3 | 97.1 | 80 | 91.4 | 80 |
| gpt-4-turbo | 76 | 91.4 | 91.4 | 62.8 | 88.6 | 77.1 |
| natural-sql-7b | 56 | 88.6 | 85.7 | 60 | 88.6 | 80 |
| sqlcoder-7b | 64 | 82.9 | 74.3 | 54.3 | 74.3 | 74.3 |
| gpt-3.5 | 72 | 77.1 | 82.8 | 34.3 | 65.7 | 71.4 |
| claude-2 | 52 | 71.4 | 74.3 | 57.1 | 65.7 | 62.9 |

Model Card Contact

Contact us on X at @defogdata, or on email at founders@defog.ai

URL: https://huggingface.co/defog/sqlcoder-7b-2/blob/main/README.md?code=true

Suggested labels

#498: CodeGPTPlus/deepseek-coder-1.3b-typescript · Hugging Face

Details: similarity score 0.85 - [ ] [CodeGPTPlus/deepseek-coder-1.3b-typescript · Hugging Face](https://huggingface.co/CodeGPTPlus/deepseek-coder-1.3b-typescript)

CodeGPTPlus/deepseek-coder-1.3b-typescript

This is a fine-tuned model by the CodeGPT team, specifically crafted for generating expert code in TypeScript. It is fine-tuned from deepseek-ai/deepseek-coder-1.3b-base with a dataset of 0.5B tokens, making it an excellent choice for precise and efficient TypeScript code generation.

The model uses a 16K window size and an additional fill-in-the-middle task for project-level code completion.

How to Use

This model is for completion purposes only. Here are some examples of how to use the model:

Running the model on a GPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript", trust_remote_code=True).cuda()

# Fill-in-the-middle prompt: the model completes the code at <|fim▁hole|>.
input_text = """<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Running with Ollama

Running with Ollama and CodeGPT Autocomplete in VSCode

Fill In the Middle (FIM)

```
<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>
```

Training Procedure

The model was trained using the following hyperparameters:

  • learning_rate: 2e-05
  • train_batch_size: 20
  • eval_batch_size: 20
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 40
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 261
  • num_epochs: 1

For more information, visit the model page.
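
As a rough illustration only (the card does not publish the actual training script), the listed hyperparameters map onto Hugging Face TrainingArguments roughly as sketched below; the output directory is hypothetical and the model/dataset wiring is omitted.

```python
# Hedged mapping of the listed hyperparameters onto transformers.TrainingArguments;
# output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="deepseek-coder-1.3b-typescript",  # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    seed=42,
    gradient_accumulation_steps=2,   # 20 * 2 = total train batch size of 40
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="cosine",
    warmup_steps=261,
    num_train_epochs=1,
)
```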

Suggested labels

{ "label-name": "TypeScript-Code-Generation", "description": "Model for generating TypeScript code", "repo": "CodeGPTPlus/deepseek-coder-1.3b-typescript", "confidence": 70.59 }

#309: openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"

Details: similarity score 0.85 - [ ] [openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"](https://github.com/openai/human-eval)

HumanEval: Hand-Written Evaluation Set

This is an evaluation harness for the HumanEval problem solving dataset described in the paper "Evaluating Large Language Models Trained on Code".

Installation

Make sure to use python 3.7 or later:

```bash
$ conda create -n codex python=3.7
$ conda activate codex
```

Check out and install this repository:

```bash
$ git clone https://github.com/openai/human-eval
$ pip install -e human-eval
```

Usage

This program exists to run untrusted model-generated code. Users are strongly encouraged not to do so outside of a robust security sandbox. The execution call in execution.py is deliberately commented out to ensure users read this disclaimer before running code in a potentially unsafe manner. See the comment in execution.py for more information and instructions.

After following the above instructions to enable execution, generate samples and save them in the following JSON Lines (jsonl) format, where each sample is formatted into a single line like so:

```
{"task_id": "Corresponding HumanEval task ID", "completion": "Completion only without the prompt"}
```

We provide example_problem.jsonl and example_solutions.jsonl under data to illustrate the format and help with debugging.

Here is nearly functional example code (you just have to provide generate_one_completion to make it work) that saves generated completions to samples.jsonl.

```python
from human_eval.data import write_jsonl, read_problems

problems = read_problems()

num_samples_per_task = 200
samples = [
    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)
```
To evaluate the samples, run

```bash
$ evaluate_functional_correctness samples.jsonl
Reading samples...
32800it [00:01, 23787.50it/s]
Running test suites...
100%|...| 32800/32800 [16:11<00:00, 33.76it/s]
Writing results to samples.jsonl_results.jsonl...
100%|...| 32800/32800 [00:00<00:00, 42876.84it/s]
{'pass@1': ..., 'pass@10': ..., 'pass@100': ...}
```
This script provides more fine-grained information in a new file ending in <input_path>_results.jsonl. Each row now contains whether the completion passed along with the execution result which is one of "passed", "timed out", or "failed".

As a quick sanity-check, the example samples should yield 0.5 pass@1.

```bash
$ evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl
Reading samples...
6it [00:00, 3397.11it/s]
Running example suites...
100%|...| 6/6 [00:03<00:00, 1.96it/s]
Writing results to data/example_samples.jsonl_results.jsonl...
100%|...| 6/6 [00:00<00:00, 6148.50it/s]
{'pass@1': 0.4999999999999999}
```

Because there is no unbiased way of estimating pass@k when there are fewer samples than k, the script does not evaluate pass@k for these cases. To evaluate with other k values, pass --k=. For other options, see

```bash
$ evaluate_functional_correctness --help
```

However, we recommend that you use the default values for the rest.

Known Issues

While evaluation uses very little memory, you might see the following error message when the system is running out of RAM. Since this may cause some correct programs to fail, we recommend that you free some memory and try again.

```
malloc: can't allocate region
```
Citation

Please cite using the following bibtex entry:

```bibtex
@article{chen2021codex,
  title={Evaluating Large Language Models Trained on Code},
  author={Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harri Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and Mikhail Pavlov and Alethea Power and Lukasz Kaiser and Mohammad Bavarian and Clemens Winter and Philippe Tillet and Felipe Petroski Such and Dave Cummings and Matthias Plappert and Fotios Chantzis and Elizabeth Barnes and Ariel Herbert-Voss and William Hebgen Guss and Alex Nichol and Alex Paino and Nikolas Tezak and Jie Tang and Igor Babuschkin and Suchir Balaji and Shantanu Jain and William Saunders and Christopher Hesse and Andrew N. Carr and Jan Leike and Josh Achiam and Vedant Misra and Evan Morikawa and Alec Radford and Matthew Knight and Miles Brundage and Mira Murati and Katie Mayer and Peter Welinder and Bob McGrew and Dario Amodei and Sam McCandlish and Ilya Sutskever and Wojciech Zaremba},
  year={2021},
  eprint={2107.03374},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

Suggested labels

{ "key": "llm-evaluation", "value": "Evaluating Large Language Models performance and behavior through human-written evaluation sets" }

#628: LLaVA/README.md at main · haotian-liu/LLaVA

Details: similarity score 0.85 - [ ] [LLaVA/README.md at main · haotian-liu/LLaVA](https://github.com/haotian-liu/LLaVA/blob/main/README.md?plain=1)

LLaVA/README.md at main · haotian-liu/LLaVA

🌋 LLaVA: Large Language and Vision Assistant

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.

📢 LLaVA-NeXT Blog Project Page Demo Data Model Zoo

🤝Community Contributions: llama.cpp Colab 🤗Space Replicate AutoGen BakLLaVA

Improved Baselines with Visual Instruction Tuning Paper HF

Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

Visual Instruction Tuning (NeurIPS 2023, Oral) Paper HF

Haotian Liu*, Chunyuan Li*, Qingyang Wu, Yong Jae Lee (*Equal Contribution)

Release

  • [1/30] 🔥 LLaVA-NeXT (LLaVA-1.6) is out! With additional scaling to LLaVA-1.5, LLaVA-NeXT-34B outperforms Gemini Pro on some benchmarks. It can now process 4x more pixels and perform more tasks/applications than before. Check out the blog post, and explore the demo! Models are available in Model Zoo. Training/eval data and scripts coming soon.
  • [11/10] LLaVA-Plus is released: Learning to Use Tools for Creating Multimodal Agents, with LLaVA-Plus (LLaVA that Plug and Learn to Use Skills). Project Page Demo Code Paper
  • [11/2] LLaVA-Interactive is released: Experience the future of human-AI multimodal interaction with an all-in-one demo for Image Chat, Segmentation, Generation and Editing. Project Page Demo Code Paper
  • [10/26] 🔥 LLaVA-1.5 with LoRA achieves comparable performance as full-model finetuning, with a reduced GPU RAM requirement (ckpts) (script). We also provide a doc on how to finetune LLaVA-1.5 on your own dataset with LoRA.
  • [10/12] Check out the Korean LLaVA (Ko-LLaVA), created by ETRI, who has generously supported our research! 🤗 Demo
  • [10/5] 🔥 LLaVA-1.5 is out! Achieving SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods like Qwen-VL-Chat that use billion-scale data. Check out the technical report, and explore the demo! Models are available in Model Zoo. The training data and scripts of LLaVA-1.5 are released here, and evaluation scripts are released here.
  • [9/26] LLaVA is improved with reinforcement learning from human feedback (RLHF) to improve fact grounding and reduce hallucination. Check out the new SFT and RLHF checkpoints at project LLavA-RLHF.
  • [9/22] LLaVA is accepted by NeurIPS 2023 as oral presentation, and LLaVA-Med is accepted by NeurIPS 2023 Datasets and Benchmarks Track as spotlight presentation.
More
  • [11/6] Support Intel dGPU and CPU platforms. More details here.
  • [10/12] LLaVA is now supported in llama.cpp with 4-bit / 5-bit quantization support!
  • [10/11] The training data and scripts of LLaVA-1.5 are released here, and evaluation scripts are released here!
  • [10/10] Roboflow Deep Dive: First Impressions with LLaVA-1.5.
  • [9/20] We summarize our empirical study of training 33B and 65B LLaVA models in a note. Further, if you are interested in the comprehensive review, evolution and trend of multimodal foundation models, please check out our recent survey paper "Multimodal Foundation Models: From Specialists to General-Purpose Assistants".

  • [7/19] We release a major upgrade, including support for LLaMA-2, LoRA training, 4-/8-bit inference, higher resolution (336x336), and a lot more. We release LLaVA Bench for benchmarking open-ended visual chat with results from Bard and Bing-Chat. We also support and verify training with RTX 3090 and RTX A6000. Check out LLaVA-from-LLaMA-2, and our model zoo!
  • [6/26] CVPR 2023 Tutorial on Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4! Please check out Slides Notes YouTube Bilibili.
  • [6/11] We released the preview for the most requested feature: DeepSpeed and LoRA support! Please see documentations here.
  • [6/1] We released LLaVA-Med: Large Language and Vision Assistant for Biomedicine, a step towards building biomedical domain large language and vision models with GPT-4 level capabilities. Checkout the paper and page.
  • [5/6] We are releasing LLaVA-Lightning-MPT-7B-preview, based on MPT-7B-Chat! See here for more details.
  • [5/2] We are releasing LLaVA-Lightning! Train a lite, multimodal GPT-4 with just $40 in 3 hours! See here for more details.
  • [4/27] Thanks to the community effort, LLaVA-13B with 4-bit quantization allows you to run on a GPU with as few as 12GB VRAM! Try it out here.
  • [4/17] We released LLaVA: Large Language and Vision Assistant. We propose visual instruction tuning, towards building large language and vision models with GPT-4 level capabilities. Checkout the paper and demo.

Code License

Usage and License Notices: This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses, including but not limited to the OpenAI Terms of Use for the dataset and the specific licenses for base language models for checkpoints trained using the dataset (e.g. Llama community license for LLaMA-2 and Vicuna-v1.5). This project does not impose any additional constraints beyond those stipulated in the original licenses. Furthermore, users are reminded to ensure that their use of the dataset and checkpoints is in compliance with all applicable laws and regulations.

Contents

Suggested labels
