Trying to understand LLMs. This is my journey so far:
- A failed experiment with LISA: "Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning", code, paper
- 🛠️ Memory-efficient LLM Training with GaLore, yet another PEFT approach, code
- ⚖️ Evaluating LLMs with Semantic Similarity, code
- 🛠️ Finetune TinyLlama and StableLM 2, code
- 🛠️ Finetune Microsoft's Phi-2, code
- 🛠️ Finetune Mamba, code
- 🛠️ Finetune Llama2 and Mistral using QLoRA, code
- ⚖️ Evaluate LLM language capabilities with Meta's Belebele benchmark, code
- ⚖️ Evaluate LLM language capabilities with BLEU, code
- ⚖️ Llama2-70B as a judge of LLMs performs almost as well as GPT-4, code
- ⚖️ Validation loss is not a good metric for chatbot quality
- ⚖️ Use GPT-3.5 as a judge of open-source LLMs, code
- 🛠️ Finetune Llama on podcast transcripts with QLoRA, code
- 🎨 Use Stable Diffusion for sketch-guided image generation, code
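The semantic-similarity evaluation above boils down to embedding a model answer and a reference answer, then scoring them by cosine similarity. Here is a minimal, self-contained sketch — the `embed` function is a toy bag-of-words stand-in, not the projects' actual code, which would presumably use a neural sentence-embedding model:

```python
import math
from collections import Counter

def embed(text):
    # toy bag-of-words "embedding"; a real eval would use a sentence-embedding model
    return Counter(text.lower().split())

def norm(vec):
    return math.sqrt(sum(c * c for c in vec.values()))

def cosine(a, b):
    # dot product over shared tokens, normalized by vector lengths
    dot = sum(count * b[token] for token, count in a.items())
    return dot / (norm(a) * norm(b))
```

With this toy embedding, `cosine(embed("Paris is the capital of France"), embed("The capital of France is Paris"))` is 1.0 because word order is ignored; a neural embedder would also score genuine paraphrases highly even without word overlap, which is the point of the approach.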
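The BLEU evaluation entry rests on a simple idea: score a candidate sentence by the geometric mean of its n-gram precisions against a reference, discounted by a brevity penalty. A minimal sentence-level sketch (with add-one smoothing on higher-order n-grams; real evals typically use a library implementation such as sacrebleu):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing on higher-order n-grams keeps short sentences scorable
        precisions.append((overlap + 1) / (total + 1) if n > 1 else overlap / total)
    if min(precisions) == 0:
        return 0.0
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

An exact match scores 1.0, a correct but truncated candidate scores somewhere in between, and a candidate with no unigram overlap scores 0.0 — which also illustrates BLEU's known weakness for chatbot-style answers, where a valid reply may share few words with the reference.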
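The LLM-as-judge entries (Llama2-70B and GPT-3.5 as judges) share one mechanical core: build a pairwise comparison prompt, send it to the judge model, and parse a verdict out of its free-form reply. A sketch of that scaffolding — the prompt template and verdict tokens are hypothetical, not the projects' actual ones, and the model call itself is omitted:

```python
def judge_prompt(question, answer_a, answer_b):
    # hypothetical pairwise-comparison template for a judge model
    return (
        "You are an impartial judge. Compare the two answers to the question.\n"
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Reply with exactly one token: A, B, or TIE."
    )

def parse_verdict(reply):
    # map the judge's raw reply to a verdict, defaulting to TIE on anything unexpected
    token = reply.strip().upper()
    return token if token in {"A", "B", "TIE"} else "TIE"
```

Defaulting malformed replies to TIE is a conservative choice; in practice, judge pipelines also swap the A/B order across runs to control for the position bias these models exhibit.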