Skip to content

Commit

Permalink
Merge pull request #1023 from JohnSnowLabs/chore/final_website_updates
Browse files Browse the repository at this point in the history
website updates
  • Loading branch information
chakravarthik27 authored May 15, 2024
2 parents ce7a06a + 3688c5c commit 9e84ae9
Show file tree
Hide file tree
Showing 10 changed files with 4,058 additions and 233 deletions.
2,417 changes: 2,417 additions & 0 deletions demo/tutorials/benchmarks/Benchmarking_with_Harness.ipynb

Large diffs are not rendered by default.

19 changes: 2 additions & 17 deletions demo/tutorials/benchmarks/Langtest_Cli_Eval_Command.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -46,24 +46,9 @@
"id": "OPPUwGvzyAoV",
"outputId": "670c68e7-83fe-418c-8e3e-094590f5b7f2"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m19.7/19.7 MB\u001b[0m \u001b[31m73.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h\u001b[33mWARNING: langtest 2.1.0rc2 does not provide the extra 'all'\u001b[0m\u001b[33m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.0/13.0 MB\u001b[0m \u001b[31m99.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m105.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m345.4/345.4 kB\u001b[0m \u001b[31m41.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 2.2.1 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0m"
]
}
],
"outputs": [],
"source": [
"!pip install -q langtest[all]==2.1.0rc2"
"!pip install -q langtest[all]"
]
},
{
Expand Down
1,017 changes: 1,017 additions & 0 deletions demo/tutorials/llm_notebooks/NER Casual LLM.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions demo/tutorials/misc/Data_Augmenter_Notebook.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions demo/tutorials/misc/MultiPrompt_MultiDataset.ipynb

Large diffs are not rendered by default.

21 changes: 12 additions & 9 deletions docs/_includes/docs-langtest-pagination.html
Original file line number Diff line number Diff line change
@@ -1,12 +1,15 @@
<ul class="pagination owl-carousel pagination_big">
<li><a href="release_notes_2_1_0">2.1.0</a></li>
<li><a href="release_notes_2_0_0">2.0.0</a></li>
<li><a href="release_notes_1_10_0">1.10.0</a></li>
<li><a href="release_notes_1_9_0">1.9.0</a></li>
<li><a href="release_notes_1_8_0">1.8.0</a></li>
<li><a href="release_notes_1_7_0">1.7.0</a></li>
<li><a href="release_notes_1_6_0">1.6.0</a></li>
<li><a href="release_notes_1_5_0">1.5.0</a></li>
<li><a href="release_notes_1_4_0">1.4.0</a></li>
<li><a href="release_notes_1_3_0">1.3.0</a></li>
<li><a href="release_notes_1_2_0">1.2.0</a></li>
<li><a href="release_notes_1_1_0">1.1.0</a></li>
<li><a href="release_notes_1_0_0">1.0.0</a></li>
</ul>
<li><a href="release_notes_1_7_0">1.7.0</a></li>
<li><a href="release_notes_1_6_0">1.6.0</a></li>
<li><a href="release_notes_1_5_0">1.5.0</a></li>
<li><a href="release_notes_1_4_0">1.4.0</a></li>
<li><a href="release_notes_1_3_0">1.3.0</a></li>
<li><a href="release_notes_1_2_0">1.2.0</a></li>
<!-- <li><a href="release_notes_1_1_0">1.1.0</a></li>
<li><a href="release_notes_1_0_0">1.0.0</a></li> -->
</ul>
482 changes: 276 additions & 206 deletions docs/pages/docs/langtest_versions/latest_release.md

Large diffs are not rendered by default.

327 changes: 327 additions & 0 deletions docs/pages/docs/langtest_versions/release_notes_2_1_0.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,327 @@
---
layout: docs
header: true
seotitle: LangTest - Deliver Safe and Effective Language Models | John Snow Labs
title: LangTest Release Notes
permalink: /docs/pages/docs/langtest_versions/release_notes_2_1_0
key: docs-release-notes
modify_date: 2024-04-02
---

<div class="h3-box" markdown="1">

## 2.1.0

## 📢 Highlights

John Snow Labs is thrilled to announce the release of LangTest 2.1.0! This update brings exciting new features and improvements designed to streamline your language model testing workflows and provide deeper insights.

- **🔗 Enhanced API-based LLM Integration:** LangTest now supports testing API-based Large Language Models (LLMs). This allows you to seamlessly integrate diverse LLM models with LangTest and conduct performance evaluations across various datasets.

- **📂 Expanded File Format Support:** LangTest 2.1.0 introduces support for additional file formats, further increasing its flexibility in handling different data structures used in LLM testing.

- **📊 Improved Multi-Dataset Handling:** We've made significant improvements in how LangTest manages multiple datasets. This simplifies workflows and allows for more efficient testing across a wider range of data sources.

- **🖥️ New Benchmarking Commands**: LangTest now boasts a set of new commands specifically designed for benchmarking language models. These commands provide a structured approach to evaluating model performance and comparing results across different models and datasets.

</div><div class="h3-box" markdown="1">

## 🔥 Key Enhancements:

### **🔗 Streamlined Integration and Enhanced Functionality for API-Based Large Language Models:**
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/Generic_API-Based_Model_Testing_Demo.ipynb)

This feature empowers you to seamlessly integrate virtually any language model hosted on an external API platform. Whether you prefer OpenAI, Hugging Face, or even custom vLLM solutions, LangTest now adapts to your workflow. `input_processor` and `output_parser` functions are not required for openai api compatible server.

#### Key Features:

- **Effortless API Integration:** Connect to any API system by specifying the API URL, parameters, and a custom function for parsing the returned results. This intuitive approach allows you to leverage your preferred language models with minimal configuration.

- **Customizable Parameters:** Define the URL, parameters specific to your chosen API, and a parsing function tailored to extract the desired output. This level of control ensures compatibility with diverse API structures.

- **Unparalleled Flexibility:** Generic API Support removes platform limitations. Now, you can seamlessly integrate language models from various sources, including OpenAI, Hugging Face, and even custom vLLM solutions hosted on private platforms.

#### How it Works:

**Parameters:**
Define the `input_processer` function for creating a payload and the `output_parser` function is used to extract the output from the response.

```python
GOOGLE_API_KEY = "<YOUR API KEY>"
model_url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key={GOOGLE_API_KEY}"

# headers
headers = {
"Content-Type": "application/json",
}

# function to create a payload
def input_processor(content):
return {"contents": [
{
"role": "user",
"parts": [
{
"text": content
}
]
}
]}

# function to extract output from model response
def output_parser(response):
try:
return response['candidates'][0]['content']['parts'][0]['text']
except:
return ""
```

To take advantage of this feature, users can utilize the following setup code:

```python
from langtest import Harness

# Initialize Harness with API parameters
harness = Harness(
task="question-answering",
model={
"model": {
"url": url,
"headers": headers,
"input_processor": input_processor,
"output_parser": output_parser,
},
"hub": "web",
},
data={
"data_source": "OpenBookQA",
"split": "test-tiny",
}
)
# Generate, Run and get Report
harness.generate().run().report()
```
![image](https://github.com/JohnSnowLabs/langtest/assets/23481244/9754c506-e715-4e2c-8b9d-dfd98f0695e5)


### 📂 Streamlined Data Handling and Evaluation

This feature streamlines your testing workflows by enabling LangTest to process a wider range of file formats directly.

#### Key Features:

- **Effortless File Format Handling:** LangTest now seamlessly ingests data from various file formats, including pickles (.pkl) in addition to previously supported formats. Simply provide the data source path in your harness configuration, and LangTest takes care of the rest.

- **Simplified Data Source Management**: LangTest intelligently recognizes the file extension and automatically selects the appropriate processing method. This eliminates the need for manual configuration, saving you time and effort.

- **Enhanced Maintainability**: The underlying code structure is optimized for flexibility. Adding support for new file formats in the future requires minimal effort, ensuring LangTest stays compatible with evolving data storage practices.

#### How it works:

```python
from langtest import Harness

harness = Harness(
task="question-answering",
model={
"model": "http://localhost:1234/v1/chat/completions",
"hub": "lm-studio",
},
data={
"data_source": "path/to/file.pkl", #
},
)
# generate, run and report
harness.generate().run().report()
```
### 📊 Multi-Dataset Handling and Evaluation
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Multiple_dataset.ipynb)

This feature empowers you to efficiently benchmark your language models across a wider range of datasets.

#### Key Features:

- **Effortless Multi-Dataset Testing:** LangTest now seamlessly integrates and executes tests on multiple datasets within a single harness configuration. This streamlined approach eliminates the need for repetitive setups, saving you time and resources.

- **Enhanced Fairness Evaluation**: By testing models across diverse datasets, LangTest helps identify and mitigate potential biases. This ensures your models perform fairly and accurately on a broader spectrum of data, promoting ethical and responsible AI development.

- **Robust Accuracy Assessment:** Multi-dataset support empowers you to conduct more rigorous accuracy testing. By evaluating models on various datasets, you gain a deeper understanding of their strengths and weaknesses across different data distributions. This comprehensive analysis strengthens your confidence in the model's real-world performance.

#### How it works:

Initiate the Harness class
```python
harness = Harness(
task="question-answering",
model={"model": "gpt-3.5-turbo-instruct", "hub": "openai"},
data=[
{"data_source": "NQ-open", "split": "test-tiny",},
{"data_source": "MedQA", "split": "test-tiny"},
{"data_source": "LogiQA", "split": "test-tiny"},
],
)
```
Configure the accuracy tests in Harness class
```python
harness.configure(
{
"tests": {
"defaults": {"min_pass_rate": 0.65},

"accuracy": {
"llm_eval": {"min_score": 0.60},
"min_rouge1_score": {"min_score": 0.60},
"min_rouge2_score": {"min_score": 0.60},
"min_rougeL_score": {"min_score": 0.60},
"min_rougeLsum_score": {"min_score": 0.60},
},
}
}
)
```
harness.generate() generates testcases, .run() executes them, and .report() compiles results.
```python
harness.generate().run().report()
```
![image](https://github.com/JohnSnowLabs/langtest/assets/23481244/0d48be2f-e5bc-4971-b0a1-2756a10d3f24)

### 🖥️ Streamlined Evaluation Workflows with Enhanced CLI Commands
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/benchmarks/Langtest_Cli_Eval_Command.ipynb)

LangTest's evaluation capabilities, focusing on report management and leaderboards. These enhancements empower you to:

- **Streamlined Reporting and Tracking:** Effortlessly save and load detailed evaluation reports directly from the command line using `langtest eval`, enabling efficient performance tracking and comparative analysis over time, with manual file review options in the `~/.langtest` or `./.langtest` folder.

- **Enhanced Leaderboards:** Gain valuable insights with the new langtest show-leaderboard command. This command displays existing leaderboards, providing a centralized view of ranked model performance across evaluations.

- **Average Model Ranking:** Leaderboard now include the average ranking for each evaluated model. This metric provides a comprehensive understanding of model performance across various datasets and tests.

### How it works:

First, create the `parameter.json` or `parameter.yaml` in the working directory

**JSON Format**
```json
{
"task": "question-answering",
"model": {
"model": "google/flan-t5-base",
"hub": "huggingface"
},
"data": [
{
"data_source": "MedMCQA"
},
{
"data_source": "PubMedQA"
},
{
"data_source": "MMLU"
},
{
"data_source": "MedQA"
}
],
"config": {
"model_parameters": {
"max_tokens": 64,
"device": 0,
"task": "text2text-generation"
},
"tests": {
"defaults": {
"min_pass_rate": 0.70
},
"robustness": {
"add_typo": {
"min_pass_rate": 0.70
}
}
}
}
}
```
**Yaml Format**
```yaml
task: question-answering
model:
model: google/flan-t5-base
hub: huggingface
data:
- data_source: MedMCQA
- data_source: PubMedQA
- data_source: MMLU
- data_source: MedQA
config:
model_parameters:
max_tokens: 64
device: 0
task: text2text-generation
tests:
defaults:
min_pass_rate: 0.70
robustness:
add_typo:
min_pass_rate: 0.7

```
And open the terminal or cmd in your system
```bash
langtest eval --model <your model name or endpoint> \
--hub <model hub like hugging face, lm-studio, web ...> \
-c < your configuration file like parameter.json or parameter.yaml>
```
Finally, we can know the leaderboard and rank of the model.
![image](https://github.com/JohnSnowLabs/langtest/assets/23481244/a405d0c6-5ef1-4efb-924c-0ba8667ebe43)

----

To visualize the leaderboard anytime using the CLI command
```bash
langtest show-leaderboard
```
![image](https://github.com/JohnSnowLabs/langtest/assets/23481244/f357c173-e4b1-4dc8-86ad-98438046b89c)

## 📒 New Notebooks

{:.table2}
| Notebooks | Colab Link |
|--------------------|-------------|
| Generic API-based Model Testing | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/Generic_API-Based_Model_Testing_Demo.ipynb)|
| Multi-Dataset | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Multiple_dataset.ipynb) |
| Langtest Eval Cli Command | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/benchmarks/Langtest_Cli_Eval_Command.ipynb) |


## 🐛 Fixes

- Fixed multi-dataset support for accuracy task [#998]
- Fixed bugs in langtest package [#1003][#1004]


## ⚡ Enhancements
- Improved the error handling in Harness run method [#990]
- Websites Updates [#1001]
- Updated new version for dependencies [#992]
- Improved the data augmentation for Question-Answering task [#991]

## What's Changed

* Feautre/integration with web api by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/986
* Refactor TestFactory class to handle exceptions in async tests by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/990
* data augmentation support for question-answering task by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/991
* Updated dependencies by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/992
* Fix/implement the multiple dataset support for accuracy tests by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/998
* Feature/add support for other file formats by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/993
* Bug Fix: Generated results are none by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1000
* Feature/implement load & save for benchmark reports by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/999
* Fix/bug fixes langtest 2 1 0 rc1 by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1003
* website updates by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1001
* Fix/bug fixes langtest 2 1 0 rc1 by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1004
* Release/2.0.1 by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1005


**Full Changelog**: https://github.com/JohnSnowLabs/langtest/compare/2.0.0...2.1.0

## ⚒️ Previous Versions
</div>
{%- include docs-langtest-pagination.html -%}
Loading

0 comments on commit 9e84ae9

Please sign in to comment.