Chunking for confidence if context exceeds context length #635

rajasbansal · 2023-11-17T00:12:17Z

This PR chunks confidence prompts so that we can get confidence scores from a model whose context length is smaller than that of the generation model. We split the document into smaller chunks and send the prompt along with each chunk. We support multiple aggregation functions over the per-chunk confidence scores.

For extraction tasks, max makes more sense as the aggregation function; mean might make more sense for a classification task.
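To illustrate the aggregation step, here is a minimal sketch of combining per-chunk confidence scores. The names `AGGREGATIONS` and `aggregate_confidence` are hypothetical and are not taken from the diff; the actual code in this PR may be organized differently.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical registry of aggregation functions (illustrative names only).
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "max": max,    # keep the most confident chunk, e.g. for extraction
    "mean": mean,  # average over all chunks, e.g. for classification
}


def aggregate_confidence(chunk_scores: List[float], how: str = "max") -> float:
    """Combine per-chunk confidence scores into one score for the document."""
    if not chunk_scores:
        raise ValueError("need at least one chunk score")
    if how not in AGGREGATIONS:
        raise ValueError(f"unknown aggregation: {how}")
    return AGGREGATIONS[how](chunk_scores)


scores = [0.25, 0.75, 0.5]
print(aggregate_confidence(scores, "max"))
print(aggregate_confidence(scores, "mean"))
```

With max, a single chunk that contains the extracted span is enough to give the document a high score, whereas mean lets chunks without relevant content pull the score down.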

nihit

lgtm

nihit · 2023-11-20T17:25:36Z

src/autolabel/labeler.py

+        """Returns the number of tokens in the prompt"""
+        return len(self.confidence_tokenizer.encode(str(inp)))
+
+    def chunk_string(self, inp: str, chunk_size: int) -> List[str]:


Move to utils.py?

It uses the tokenizer for both encoding and decoding, so I kept it here.
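For context on why the method depends on the tokenizer, here is a rough sketch of token-based chunking. The body of `chunk_string` is not shown in this thread, so this is an assumption about the approach (encode, slice, decode), written as a free function. `WhitespaceTokenizer` is a toy stand-in that only mimics an `encode`/`decode` interface; it is not the tokenizer used in the PR.

```python
from typing import Dict, List


class WhitespaceTokenizer:
    """Toy stand-in exposing encode/decode; a real tokenizer would differ."""

    def __init__(self) -> None:
        self.vocab: List[str] = []
        self.index: Dict[str, int] = {}

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self.index:
                self.index[word] = len(self.vocab)
                self.vocab.append(word)
            ids.append(self.index[word])
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)


def chunk_string(tokenizer, inp: str, chunk_size: int) -> List[str]:
    """Split inp into pieces of at most chunk_size tokens each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = tokenizer.encode(inp)
    # Slice the token ids, then decode each slice back into text.
    return [
        tokenizer.decode(tokens[i : i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


print(chunk_string(WhitespaceTokenizer(), "a b c d e", 2))
```

Because the chunk boundaries are measured in tokens rather than characters, the function needs both directions of the tokenizer, which is the coupling mentioned in the reply above.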

Chunking for confidence

d98abf9

rajasbansal requested review from nihit, yadavsahil197 and Abhinav-Naikawadi November 17, 2023 00:12

nihit requested a review from DhruvaBansal00 November 17, 2023 05:30

rajasbansal added 5 commits November 17, 2023 00:09

tokenize using huggingface

d95a64d

remove print

e9875f4

empty prompt just replaces key to chunk

3bfcc54

empty prompt just replaces key to chunk

74de9fa

empty prompt just replaces key to chunk

28b8fe5

nihit approved these changes Nov 20, 2023


rajasbansal merged commit d4b0c0c into main Nov 20, 2023
2 checks passed

rajasbansal deleted the confidence_chunking branch November 20, 2023 19:57
