Release/2.4.0 #1122

chakravarthik27 · 2024-09-20T08:35:54Z

📢 Highlights

John Snow Labs is excited to announce the release of LangTest 2.4.0! This update introduces cutting-edge features and resolves key issues to further enhance model testing and evaluation across multiple modalities.

🔗 Multimodality Testing with VQA Task: We are thrilled to introduce multimodality testing, now supporting Visual Question Answering (VQA) tasks! With the addition of 10 new robustness tests, you can now perturb images to challenge and assess your model’s performance across visual inputs.
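To make the idea of perturbing images concrete, here is a minimal sketch of how a few image perturbations could be turned into VQA test cases. It is not LangTest's implementation: the function names, the `PERTURBATIONS` registry, and the test-case dictionary layout are hypothetical, and images are modeled as plain grayscale pixel grids so the example runs without extra dependencies (a real implementation would more likely use an imaging library such as Pillow).

```python
from typing import Callable, Dict, List

# Grayscale image: one inner list per row, pixel values in 0..255.
Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the pixel grid a quarter turn clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def adjust_brightness(img: Image, factor: float = 1.5) -> Image:
    """Scale every pixel by `factor`, clamping to the valid 0..255 range."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


# Hypothetical registry; the keys are illustrative, not LangTest test names.
PERTURBATIONS: Dict[str, Callable[[Image], Image]] = {
    "flip_horizontal": flip_horizontal,
    "rotate_90_clockwise": rotate_90_clockwise,
    "adjust_brightness": adjust_brightness,
}


def make_test_cases(img: Image, question: str) -> List[dict]:
    """Pair the unchanged question with each perturbed copy of the image."""
    return [
        {"test": name, "image": perturb(img), "question": question}
        for name, perturb in PERTURBATIONS.items()
    ]
```

A robustness check would then ask the model the same question on the original and on each perturbed image and compare the answers.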

📝 New Robustness Tests for Text Tasks: LangTest 2.4.0 comes with two new robustness tests, AddNewLines and AddTabs, applicable to text classification, question-answering, and summarization tasks. These tests push your models to handle text variations and maintain accuracy.
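As a rough illustration of what these two perturbations do, the sketch below inserts a random number of newlines or tabs between sentences. It is an assumption-laden approximation, not the LangTest code: the function names, the sentence-splitting regex, and the `max_tabs` and `seed` parameters are hypothetical (only `max_lines` is named in this PR's commits).

```python
import random
import re
from typing import Optional

# Split after sentence-ending punctuation followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def _rejoin(text: str, sep: str, max_repeat: int, seed: Optional[int]) -> str:
    """Join sentences with 1..max_repeat copies of `sep` at each boundary."""
    rng = random.Random(seed)
    sentences = _SENTENCE_BOUNDARY.split(text.strip())
    out = sentences[0]
    for sentence in sentences[1:]:
        out += sep * rng.randint(1, max_repeat) + sentence
    return out


def add_new_lines(text: str, max_lines: int = 3, seed: Optional[int] = None) -> str:
    """Break the text into sentences separated by up to `max_lines` newlines."""
    return _rejoin(text, "\n", max_lines, seed)


def add_tabs(text: str, max_tabs: int = 2, seed: Optional[int] = None) -> str:
    """Separate sentences with up to `max_tabs` tab characters."""
    return _rejoin(text, "\t", max_tabs, seed)
```

The perturbed text keeps every word of the original, so a robust model should return the same label, answer, or summary for both versions.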

🔄 Improvements to Multi-Label Text Classification: We have resolved accuracy and fairness issues affecting multi-label text classification evaluations, ensuring more reliable and consistent results.

🛡 Basic Safety Evaluation with Prompt Guard: We have added basic safety evaluation tests using the prompt_guard model, which provides initial layers of protection to identify and mitigate harmful or unintended responses from your language models.

🛠 NER Accuracy Test Fixes: LangTest 2.4.0 resolves issues in the Named Entity Recognition (NER) accuracy tests, improving the reliability of performance assessments for NER tasks.

🔒 Security Enhancements: We have upgraded various dependencies to address security vulnerabilities, making LangTest more secure for users.

This commit refactors the PromptGuard class in the modelhandler/promptguard.py module. The changes include:
- Simplifying the initialization process by using a singleton pattern
- Loading the model and tokenizer from Hugging Face
- Preprocessing the input text to remove spaces and mitigate prompt injection tactics
- Calculating class probabilities for a single or batch of texts
- Adding methods to get jailbreak scores and indirect injection scores for a single input text or a batch of texts
- Processing texts in batches to improve efficiency

The commit also includes changes in the safety.py module:
- Importing the PromptGuard class from the modelhandler/promptguard.py module
- Replacing the pipeline usage with the PromptGuard class to get indirect injection scores

Lastly, the commit includes changes in the output.py and sample.py modules:
- Adding a greater than or equal to comparison method in the MaxScoreOutput class
- Updating the comparison method in the QASample class to use the new comparison method in MaxScoreOutput
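The scoring steps described in this commit message can be sketched as follows. This is a simplified illustration, not the code in modelhandler/promptguard.py: it starts from raw classifier logits instead of loading a model, and it assumes a three-class head ordered as benign, injection, jailbreak, with the indirect injection score taken as the sum of the last two probabilities. The label order, the function names, and the `temperature` and `batch_size` parameters are assumptions to verify against the actual model configuration.

```python
import math
from typing import Iterator, List, Sequence

# Assumed label order of the classifier head; verify against the model config.
BENIGN, INJECTION, JAILBREAK = 0, 1, 2


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to class probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def jailbreak_score(logits: Sequence[float]) -> float:
    """Probability mass assigned to the jailbreak class."""
    return softmax(logits)[JAILBREAK]


def indirect_injection_score(logits: Sequence[float]) -> float:
    """Probability that the text is an injection or a jailbreak."""
    probs = softmax(logits)
    return probs[INJECTION] + probs[JAILBREAK]


def batched(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices so a model can score texts batch by batch."""
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])
```

A safety test would compare such a score against a threshold, which is where a greater-than-or-equal comparison on the output object becomes useful.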

chakravarthik27 added 30 commits September 14, 2024 14:58

Added: implemeted the breaking sentence by newline in robustness.

f331b69

refactor the add_new_lines and while random selection of number of new lines.

f274765

parameter: number_of_lines -> max_lines.

0414f71

Merge pull request #1109 from JohnSnowLabs/feature/implement-the-addnewlines-test-in-robustness-category

a3986b4

Added: implemeted the breaking sentence by newline in robustness.

Implemented the add_tabs test in robustness category

3160b1d

Merge remote-tracking branch 'origin/release/2.4.0' into feature/implement-the-addtabs-test-in-robustness-category

8179145

implemented: basic structured to handle visualQA

c8a9511

Refactor VisualQASample class to include additional attributes and documentation

f7b53e6

Refactor llm_modelhandler.py to include PretrainedModelForVisualQA class

6eec7ca

Refactor VisualQA class to fix typo in base class name

b95ecf3

Merge pull request #1110 from JohnSnowLabs/feature/implement-the-addtabs-test-in-robustness-category

ca2f9d6

Feature/implement the addtabs test in robustness category

Merge remote-tracking branch 'origin/release/2.4.0' into feature/implement-the-support-for-multimodal-with-new-vqa-task

adf18db

updated: image handling while loading dataset.

d3e6fa5

implemented the different tests under robusntess category and supported upto generating testcases.

3ee5f8f

Refactor image handling in robustness tests

3dd6770

Refactor image handling in robustness tests and add support for multimodal VQA task

d95e558

Refactor image handling in robustness tests and update VisualQASample class

ebd7bfd

Refactor image handling in robustness tests and exclude image-related perturbations

4538490

fixed: format issues.

41f0db2

Refactor image handling in robustness tests and remove commented code

3521927

Refactor image handling in robustness tests and update VisualQASample class for huggingface DataSource.

a87e96c

- added new tests in image robustness.

04e18e3

Add pillow library to pyproject.toml

8039ef8

Update transformers version to 4.44.2

febf855

Update transformers version to 4.43.1

101305a

Update pyproject.toml to force CPU installation of torch

96cc4f1

Update accelerate version to 0.22.0

d64312d

Update accelerate version to 0.33.0 and pyproject.toml to force CPU installation of torch

4780cf0

Now handles the multi-label in accuracy tests.

0c7c9b0

Refactor accuracy tests to handle multi-label classification

54f235d

chakravarthik27 added 25 commits September 17, 2024 11:49

Update transformers version to 4.44.2 and mlflow version to 2.16.2

2d0f0d8

Refactor calculate_f1_score function to handle different types of y_true and y_pred

3745e6a

formatted.

bcdfc92

Merge pull request #1116 from JohnSnowLabs/fix/error-in-accuracy-tests-for-ner-task

b0a1a26

Fix/error in accuracy tests for ner task

Merge pull request #1112 from JohnSnowLabs/update/fixing-security-issues

d3a4663

Update transformer's version to 4.44.2

Merge remote-tracking branch 'origin/release/2.4.0' into feature/implement-the-support-for-multimodal-with-new-vqa-task

a5ae26a

Refactor security.py to add new security checks

10aa4b3

resolve OutofMemory issues

b29f9dd

updated the notebook

16a3aa5

Update pillow version to 10.0.0 and make it a required dependency

b337d2b

Merge pull request #1111 from JohnSnowLabs/feature/implement-the-support-for-multimodal-with-new-vqa-task

67c641d

Feature/implement the support for multimodal with new vqa task

Refactor typing imports in accuracy.py and safety.py

62b77b1

Refactor prepare_model_response method to handle multi-label classification in text-classification task and predictions in ner task

409cb96

fixed: circular import errors

d98a9d3

Refactor test type in safety.py and add decimal formatting in output.py

7a58067

Refactor multi-label handling in TestResultManager

5e482e1

fixed: formatted issue

e9c54e9

Merge pull request #1118 from JohnSnowLabs/fix/error-in-accuracy-tests-for-multi-label-classification

4664bbf

Fix/AttributeError in accuracy tests for multi label classification

Refactor fairness test to handle multi-label classification in text classification task and predictions in ner task

092b3e9

fixed: format and liniting issues.

f362a62

Merge pull request #1121 from JohnSnowLabs/fix/error-in-fairness-tests-for-multi-label-classification

7e2b232

Refactor fairness test to handle multi-label classification

Merge pull request #1119 from JohnSnowLabs/feature/enhance-security-tests-with-promptguard

d89477a

Feature/enhance safety tests with promptguard

Refactor security.py: Remove unused classes and methods

da9f58b

update version to 2.4.0 in pyproject.toml for release

90e902f

chakravarthik27 self-assigned this Sep 20, 2024

jailbreak and injection tests supports for text-classification.

551cc12

chakravarthik27 requested a review from RakshitKhajuria September 22, 2024 13:41

RakshitKhajuria approved these changes Sep 22, 2024

chakravarthik27 merged commit 1b9c7db into main Sep 22, 2024
3 checks passed
