Question about ignoring <|endoftext|> #18

lxysl · 2024-05-08T03:29:46Z

Thanks for your nice work. I have a question about whether the model is trained to predict the <STOP> token. In the original LLaVA paper, the stop tokens are part of the prediction targets, and the corresponding preprocessing code is:

https://github.com/haotian-liu/LLaVA/blob/3e337ad269da3245643a2724a1d694b5839c37f9/llava/train/train.py#L470-L481

Your code, however, does not seem to predict these stop tokens:

llava-phi/llava_phi/train/train.py

Lines 363 to 370 in 5cb6ed1

    
    if has_image:
        round_len = len(tokenizer_image_token(rou, tokenizer)) + 1  # +1 for <|endoftext|>
        instruction_len = len(tokenizer_image_token(parts[0], tokenizer)) - 1
    else:
        round_len = len(tokenizer(rou).input_ids) + 1  # +1 for <|endoftext|>
        instruction_len = len(tokenizer(parts[0]).input_ids) - 1
    target[cur_len: cur_len + instruction_len] = IGNORE_INDEX

Could you please explain the reason for this difference?
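
To make the question concrete, here is a minimal sketch of the masking idea under discussion. It is not code from either repository: `mask_round`, the toy token IDs, and `EOS_ID` are hypothetical, and `IGNORE_INDEX = -100` is assumed to match the default `ignore_index` of PyTorch's cross-entropy loss. It only shows how one can check whether the stop-token position still carries a label after the instruction span is masked.

```python
IGNORE_INDEX = -100  # assumed value; labels equal to this are skipped by the loss


def mask_round(round_ids, instruction_len, ignore_index=IGNORE_INDEX):
    """Return labels for one round with the first `instruction_len` tokens ignored."""
    labels = list(round_ids)
    labels[:instruction_len] = [ignore_index] * instruction_len
    return labels


# Hypothetical toy round: 3 instruction tokens, 2 answer tokens, 1 stop token.
EOS_ID = 0
round_ids = [11, 12, 13, 21, 22, EOS_ID]

labels = mask_round(round_ids, instruction_len=3)

# The stop token is trained on only if its label is not IGNORE_INDEX.
stop_is_supervised = labels[-1] != IGNORE_INDEX
print(labels, stop_is_supervised)
```

In this toy case the last label stays equal to `EOS_ID`, so the stop token would contribute to the loss. If the masked span or the per-round length is off by one, the stop-token position can end up as `IGNORE_INDEX` instead, which is the behaviour I am asking about.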
