Zero.Init() for cpu-offloading of T5 models #3880

apzl · 2023-07-05T11:46:30Z

I'm hitting the following error:

RuntimeError: `<class 'transformers.models.t5.modeling_t5.T5DenseGatedActDense'>' was not properly set up for sharding by zero.Init(). A subclass of torch.nn.Module must be defined before zero.Init() where an instance of the class is created.

when trying to initialize the model like this:

    with deepspeed.zero.Init(config=ds_config):
        model = AutoModelForSeq2SeqLM.from_pretrained("flan-t5-xl")

I couldn't find any proper examples of using deepspeed.zero.Init() for CPU offloading. I also want to run DeepSpeed training as a function call, without using the launcher. Can someone help with examples of how to do this?
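For reference, here is a minimal sketch of the kind of setup being asked about. It assumes a single process on one GPU and uses the Hugging Face integration route, where an `HfDeepSpeedConfig` object is created before `from_pretrained` so that transformers enters `zero.Init()` itself instead of the model being wrapped manually. Whether this avoids the error above is not confirmed. The helper names (`build_ds_config`, `set_single_process_env`, `load_engine`), the port, the learning rate, and the batch size are all illustrative assumptions.

```python
import os


def build_ds_config(micro_batch_size: int = 1) -> dict:
    """ZeRO stage 3 config with parameters and optimizer state offloaded to CPU.

    The values here (batch size, learning rate) are placeholders, not tuned settings.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
        "zero_optimization": {
            "stage": 3,
            "offload_param": {"device": "cpu", "pin_memory": True},
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
        },
    }


def set_single_process_env(port: int = 29500) -> None:
    """Set the variables the launcher would normally export (one process, one GPU).

    Existing values are left untouched so a real launcher still takes precedence.
    """
    defaults = {
        "MASTER_ADDR": "localhost",
        "MASTER_PORT": str(port),  # assumed free port
        "RANK": "0",
        "LOCAL_RANK": "0",
        "WORLD_SIZE": "1",
    }
    for key, value in defaults.items():
        os.environ.setdefault(key, value)


def load_engine(model_name: str):
    """Load a seq2seq model under ZeRO-3 and wrap it in a DeepSpeed engine."""
    # Imported lazily so the helpers above can be used without deepspeed installed.
    import deepspeed
    from transformers import AutoModelForSeq2SeqLM
    from transformers.integrations import HfDeepSpeedConfig

    set_single_process_env()
    ds_config = build_ds_config()

    # Must be created before from_pretrained and kept alive afterwards;
    # with stage 3 it lets transformers shard weights while loading.
    dschf = HfDeepSpeedConfig(ds_config)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )
    return engine, dschf  # return dschf so the caller keeps a reference to it
```

Calling `load_engine("google/flan-t5-xl")` from an ordinary Python function would then stand in for the launcher; the Hub ID is an assumption about which checkpoint is meant, and a local path would work the same way.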
