[P1] Cannot load trained model anymore - "type must be tuple of ints,but got NoneType" #125

chris-aeviator · 2024-08-02T06:01:38Z

After a recent pyreft update, I get errors when loading a trained model. This affects newly trained models as well as previously trained ones. I'm loading from disk.

The error seems to occur because my config is not being read correctly: kwargs['low_rank_dimension'] is None. If I set it to my correct value (e.g. 8 or 12) and set the intervention type to my class, the model loads.

Name: pyvene
Version: 0.1.2

Name: pyreft
Version: 0.0.6 --> there seems to be only a version 0.0.5 ?! can't explain this, maybe due to direct code edits in site-packages?

File ~/micromamba/envs/trtf/lib/python3.9/site-packages/pyreft/interventions.py:40, in LoreftIntervention.__init__(self, **kwargs)
---> 40 rotate_layer = LowRankRotateLayer(
     41     self.embed_dim, kwargs["low_rank_dimension"], init_orth=True)
     42 self.rotate_layer = torch.nn.utils.parametrizations.orthogonal(rotate_layer)
     43 self.learned_source = torch.nn.Linear(
     44     self.embed_dim, kwargs["low_rank_dimension"]).to(
     45     kwargs["dtype"] if "dtype" in kwargs else torch.bfloat16)

File ~/micromamba/envs/trtf/lib/python3.9/site-packages/pyreft/interventions.py:19, in LowRankRotateLayer.__init__(self, n, m, init_orth)
     17 super().__init__()
     18 # n > m
---> 19 self.weight = torch.nn.Parameter(torch.empty(n, m), requires_grad=True)

     21     torch.nn.init.orthogonal_(self.weight)

sample config file

{
  "intervention_constant_sources": [
    true
  ],
  "intervention_dimensions": [
    4096
  ],
  "intervention_types": [
    "<class 'transforms.autobrew.trft.raft.subspace.SubloreftIntervention'>"
  ],
  "mode": "parallel",
  "representations": [
    [
      16,
      "block_output",
      "pos",
      1,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null,
      null
    ]
  ],
  "sorted_keys": [
    "layer.16.comp.block_output.unit.pos.nunit.1#0"
  ],
  "transformers_version": "4.43.3"
}
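A quick way to check whether a saved config is missing the rank is to inspect the representations entries directly. The sketch below uses only the standard library. It assumes that the fifth entry (index 4) of each saved representation is the low_rank_dimension slot; that position is an assumption about pyvene's serialization and is not confirmed in this thread. The helper name find_missing_rank is made up for illustration.

```python
import json

# Assumption (not confirmed in this thread): index 4 of each saved
# representation holds low_rank_dimension. Verify against your pyvene version.
LOW_RANK_INDEX = 4


def find_missing_rank(config_text):
    """Return the sorted_keys entries whose assumed rank slot is null."""
    config = json.loads(config_text)
    keys = config.get("sorted_keys", [])
    missing = []
    for i, rep in enumerate(config["representations"]):
        if rep[LOW_RANK_INDEX] is None:
            # Fall back to the list position if no matching key is stored.
            missing.append(keys[i] if i < len(keys) else str(i))
    return missing


# Trimmed copy of the sample config above (same representation entry).
sample = """
{
  "intervention_dimensions": [4096],
  "mode": "parallel",
  "representations": [
    [16, "block_output", "pos", 1,
     null, null, null, null, null, null, null, null, null]
  ],
  "sorted_keys": ["layer.16.comp.block_output.unit.pos.nunit.1#0"]
}
"""

print(find_missing_rank(sample))
```

If the list is non-empty, the saved config carries no rank for those interventions, which would be consistent with kwargs['low_rank_dimension'] arriving as None at load time.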

dhruvbpai · 2024-08-04T07:43:45Z

Could you clarify how you fixed this error?

frankaging · 2024-08-04T07:58:51Z

@chris-aeviator Thanks for raising the issue. Could you try installing directly from the source code and see whether you still get the error?

pip install git+https://github.com/stanfordnlp/pyreft.git

dhruvbpai · 2024-08-04T08:09:59Z

I am still getting the error on my end after installing from source. I'm replicating the Alpaca training script, with the following snippet added at the end:
@masonwang025

reft_model.save(
    save_directory=training_args.output_dir,
)

new_model = pyreft.ReftModel.load(
    training_args.output_dir, model
)

frankaging · 2024-08-05T07:38:40Z

@chris-aeviator @dhruvbpai Hey, thanks for the follow-ups. I tested the following code on my end by installing from the source for both pyvene and pyreft.

import torch, pyvene, pyreft
import transformers

model_name_or_path = "meta-llama/Meta-Llama-3-8B"
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.bfloat16, device_map="cuda")

class DummyIntervention(pyvene.SourcelessIntervention):

    def forward(self, base, source=None, subspaces=None):
        return base

reft_config = pyreft.ReftConfig(representations=[{
    "layer": 15, "component": "block_output",
    "low_rank_dimension": 1,
    "intervention": DummyIntervention(
        embed_dim=model.config.hidden_size, low_rank_dimension=1)} for l in [15]])
reft_model = pyreft.get_reft_model(model, reft_config)
reft_model.set_device("cuda")
reft_model.print_trainable_parameters()

reft_model.save("./tmp_test")

pyreft.ReftModel.load("./tmp_test", model)

On my end, this block of code runs without an error. Could you check whether it works for you, and whether your code does something similar?

Thanks.

frankaging · 2024-08-05T07:40:31Z

One thing to note: since you are using a custom intervention, please make sure you also pass low_rank_dimension in the pyreft.ReftConfig representation entry, outside your intervention's initialization.

reft_config = pyreft.ReftConfig(representations=[{
    "layer": 15, "component": "block_output",
    "low_rank_dimension": 1, # <--- this is dummy but needed.
    "intervention": DummyIntervention(
        embed_dim=model.config.hidden_size, 
        low_rank_dimension=1 # <--- this is for intervention init.
    )} for l in [15]])
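To avoid the two values drifting apart, the representation entry can be built by a small helper that takes the rank once and writes it to both places. This is only a sketch: make_representation and StandInIntervention are hypothetical names, and the stand-in class exists solely so the snippet runs without pyvene installed. In real code you would pass your own intervention class and hand the resulting list to pyreft.ReftConfig(representations=...).

```python
def make_representation(layer, rank, intervention_cls, embed_dim,
                        component="block_output"):
    """Build one representation dict with the rank set in both places."""
    return {
        "layer": layer,
        "component": component,
        # Top-level entry, described in the thread as "dummy but needed".
        "low_rank_dimension": rank,
        # Same rank passed to the intervention's own constructor.
        "intervention": intervention_cls(
            embed_dim=embed_dim, low_rank_dimension=rank),
    }


class StandInIntervention:
    """Hypothetical placeholder for a custom intervention class."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


representations = [
    make_representation(layer, 1, StandInIntervention, embed_dim=4096)
    for layer in [15]
]
print(representations[0]["low_rank_dimension"])
```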

frankaging · 2024-08-06T02:10:19Z

++ Adding more evidence that this should work: here is our huggingface live demo for the emoji intervention <link>.

This demo loads a saved model from the Hugging Face model hub and responds to user requests. The code for loading interventions can be found here.

chris-aeviator · 2024-08-06T07:07:53Z

I can’t speak for loading from HF, I’m loading from disk.

frankaging · 2024-08-06T07:33:21Z

> I can’t speak for loading from HF, I’m loading from disk.

Hey @chris-aeviator, when you create your reft config, do you specify the low-rank dimension in two places, as in the following example?

reft_config = pyreft.ReftConfig(representations=[{
    "layer": 15, "component": "block_output",
    "low_rank_dimension": 1, # <--- this is dummy but needed.
    "intervention": DummyIntervention(
        embed_dim=model.config.hidden_size, 
        low_rank_dimension=1 # <--- this is for intervention init.
    )} for l in [15]])

Thanks.

frankaging · 2024-08-06T08:59:01Z

> I am still getting the error on my end after installing from source. Replicating Alpaca training script with the following additional tidbit after: @masonwang025
>
> reft_model.save(
>     save_directory=training_args.output_dir,
> )
>
> new_model = pyreft.ReftModel.load(
>     training_args.output_dir, model
> )

Hey @dhruvbpai, I just checked in a fix for this issue. Please (1) pull from the ToT and (2) pip install again, then retry the Alpaca setting. Let me know if the issue persists.

The fix applies the change shown above to the following line (i.e., the dummy specification of low_rank_dimension is needed for now):
https://github.com/stanfordnlp/pyreft/blob/main/examples/alpaca/train.py#L107

I also changed train.py to test model loading at the end. After my change, the script ran successfully with the following command:

python train.py --model_name_or_path yahma/llama-7b-hf \
	--data_path ./alpaca_data.json \
	--output_dir ./test/ \
	--layers "8;19" \
	--rank 4 \
	--position "f1+l1" \
	--num_train_epochs 1 \
	--per_device_train_batch_size 4 \
	--per_device_eval_batch_size 4 \
	--gradient_accumulation_steps 8 \
	--evaluation_strategy "no" \
	--save_strategy "no" \
	--learning_rate 2e-5 \
	--weight_decay 0. \
	--warmup_ratio 0.03 \
	--lr_scheduler_type "cosine" \
	--logging_steps 1 \
	--max_n_train_example 100

"""
{'loss': 1.3949, 'grad_norm': 0.9871354103088379, 'learning_rate': 2e-05, 'epoch': 0.32}                                                                                   
{'loss': 1.2105, 'grad_norm': 1.0076665878295898, 'learning_rate': 1e-05, 'epoch': 0.64}                                                                                   
{'loss': 1.1549, 'grad_norm': 1.0621919631958008, 'learning_rate': 0.0, 'epoch': 0.96}                                                                                     
{'train_runtime': 13.6936, 'train_samples_per_second': 7.303, 'train_steps_per_second': 0.219, 'train_loss': 1.253439982732137, 'epoch': 0.96}                             
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.64s/it]
Directory './test/' already exists.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
wandb: \ 0.012 MB of 0.036 MB uploaded
wandb: Run history:
wandb:         train/epoch ▁▅██
wandb:   train/global_step ▁▅██
wandb:     train/grad_norm ▁▃█
wandb: train/learning_rate █▅▁
wandb:          train/loss █▃▁
wandb: 
wandb: Run summary:
wandb:               total_flos 0.0
wandb:              train/epoch 0.96
wandb:        train/global_step 3
wandb:          train/grad_norm 1.06219
wandb:      train/learning_rate 0.0
wandb:               train/loss 1.1549
wandb:               train_loss 1.25344
wandb:            train_runtime 13.6936
wandb: train_samples_per_second 7.303
wandb:   train_steps_per_second 0.219
"""

chris-aeviator changed the title ~~Cannot load model trained anymore - "type must be tuple of ints,but got NoneType"~~ Cannot load trained model anymore - "type must be tuple of ints,but got NoneType" Aug 2, 2024

frankaging self-assigned this Aug 4, 2024

frankaging changed the title ~~Cannot load trained model anymore - "type must be tuple of ints,but got NoneType"~~ [P0] Cannot load trained model anymore - "type must be tuple of ints,but got NoneType" Aug 4, 2024

frankaging added the question Further information is requested label Aug 5, 2024

frankaging changed the title ~~[P0] Cannot load trained model anymore - "type must be tuple of ints,but got NoneType"~~ [P1] Cannot load trained model anymore - "type must be tuple of ints,but got NoneType" Aug 6, 2024
