Compatibility issue between paddle 2.4.1 and paddlenlp #4593
Comments
Hi! We have received your issue and will arrange for a technician to respond as soon as possible, so please be patient. Please double-check that you have provided a clear problem description, reproduction code, environment and version details, and error messages. You can also consult the official API documentation, the FAQ, past GitHub issues, and the AI community for answers. Have a nice day!
You installed the CPU version of PaddlePaddle. I recommend installing the GPU version instead: pip install paddlepaddle-gpu==2.4.1. See https://www.paddlepaddle.org.cn/
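One way to confirm which build is actually active is to ask the installed package itself. The following is a minimal sketch, assuming `paddle.device.is_compiled_with_cuda()` and `paddle.device.get_device()` behave as documented for Paddle 2.x; the helper name `describe_paddle_build` is made up for illustration.

```python
import importlib.util


def describe_paddle_build():
    """Return a short description of the Paddle build in this interpreter."""
    # Check for the package first so a missing install does not raise ImportError.
    if importlib.util.find_spec("paddle") is None:
        return "paddle is not installed in this Python environment"

    import paddle

    # A CPU-only wheel (paddlepaddle) reports False here; paddlepaddle-gpu reports True.
    build = "GPU" if paddle.device.is_compiled_with_cuda() else "CPU"
    return f"paddle {paddle.__version__} ({build} build), device: {paddle.device.get_device()}"


print(describe_paddle_build())
```

If this reports a CPU build on a GPU machine, training will run on the CPU regardless of the hardware available.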
On AI Studio, I changed it to:
Also, after installation, the following error sometimes appears.
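Run pip list | grep paddle to check whether it is installed correctly. Paddle may have been installed under a different Python version. If several versions of paddle are present, uninstall the others and keep only the one you want.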
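The same check can be made from inside the notebook kernel, which guarantees that the interpreter being inspected is the one that runs the training code. This is a rough sketch using only the standard library (`importlib.metadata`, Python 3.8+); the function name `find_paddle_packages` is illustrative.

```python
from importlib import metadata


def find_paddle_packages():
    """Return {distribution name: version} for installed packages whose name contains 'paddle'."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with broken metadata (name may be None).
        if name and "paddle" in name.lower():
            found[name] = dist.version
    return found


for name, version in sorted(find_paddle_packages().items()):
    print(f"{name}=={version}")
```

Seeing both `paddlepaddle` and `paddlepaddle-gpu` in the output would point to the conflicting-install situation described above.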
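On AI Studio, running:
It is indeed 10.1, so the installation probably did not succeed. I checked, and only one paddle version is installed.
This error means paddle was not installed successfully. I have run into it before.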
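First run nvcc -V to check whether the CUDA version meets the requirements, then install the paddle build that matches your CUDA version. For example, with CUDA 11.2: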
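The `nvcc -V` check itself can be scripted. The sketch below does not reproduce the install command from the comment above; it only extracts the CUDA release number so it can be compared against the Paddle install matrix. The sample string is an assumed example of the usual `nvcc -V` last line, and the function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc -V` output, or None if no release is found."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def local_cuda_release():
    """Run `nvcc -V` if nvcc is on PATH; return (major, minor) or None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


# Assumed sample of the version line printed by nvcc.
sample = "Cuda compilation tools, release 10.1, V10.1.243"
print(parse_nvcc_release(sample))  # (10, 1)
print(local_cuda_release())
```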
Following the approach above, I restructured the code as follows:
train_data_loader = create_dataloader(train_ds, trans_fn=trans_func, mode="train", batch_size=batch_size, batchify_fn=batchify_fn)
Number of training epochs: epochs = 1
Directory for saving model parameters during training: saved_dir = "skep_ckpt"
Number of steps needed for training, where len(train_data_loader) is the steps in one epoch: num_training_steps = len(train_data_loader) * epochs
Loss function: criterion = paddle.nn.loss.CrossEntropyLoss()
Optimizer: optimizer = paddle.optimizer.AdamW(
Accuracy metric: metric = paddle.metric.Accuracy()
6. Start training #CUDA_VISIBLE_DEVICES=0,1,2,3
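The paddle installation itself should be fine now, but the problem in the title still occurs: GPU memory is exhausted as soon as training starts. So I still think this is a compatibility problem between paddle 2.4.1 and paddlenlp. The environment is BML CodeLab (V100, 32 GB GPU memory).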
Sorry, the AI Studio project address is as follows:
@VBPython There is a small problem in the code:
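Describe the Bug
On AI Studio, when I run ERNIE's SKEP pretrained model (paddlenlp) with paddle 2.2.2, training runs at 0.8 step/s. Under the same conditions (V100, 32 GB GPU memory), after upgrading paddle to 2.4.1 it becomes extremely slow: out of four runs, one ran at 0.01 step/s and three ran out of memory outright. Please help check whether there is a compatibility problem between paddle 2.4.1 and paddlenlp 2.5.0, or between these frameworks and AI Studio. Our usual practice is to upgrade paddle and paddlenlp to the latest versions before deep learning training, so a compatibility problem would have a significant impact on training. Please investigate, thank you.
The code I ran is attached below: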
'''python
!pip install --upgrade paddlenlp -i https://pypi.tuna.tsinghua.edu.cn/simple
!pip install --upgrade paddlepaddle==2.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
from paddlenlp.datasets import load_dataset
train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])
from paddlenlp.transformers import SkepForSequenceClassification, SkepTokenizer
# Specify the model name to load the model in one step
model = SkepForSequenceClassification.from_pretrained(pretrained_model_name_or_path="skep_ernie_1.0_large_ch", num_classes=len(train_ds.label_list))
# Likewise, load the matching tokenizer by model name; it handles text processing such as tokenization and converting tokens to token ids
tokenizer = SkepTokenizer.from_pretrained(pretrained_model_name_or_path="skep_ernie_1.0_large_ch")
import os
from functools import partial
import numpy as np
import paddle
import paddle.nn.functional as F
from paddlenlp.data import Stack, Tuple, Pad
from utils import create_dataloader
def convert_example(example,
                    tokenizer,
                    max_seq_length=512,
                    is_test=False):
    ...  # function body not shown in the post
# Batch size
batch_size = 32
# Maximum text sequence length
max_seq_length = 256
# Convert the data into the format the model reads
trans_func = partial(
    convert_example,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length)
# Assemble the data into batches:
# pad text sequences of different lengths to the longest length in the batch,
# and stack the label of each sample together
batchify_fn = lambda samples, fn=Tuple(
    Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
    Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # token_type_ids
    Stack()  # labels
): [data for data in fn(samples)]
train_data_loader = create_dataloader(
    train_ds,
    mode='train',
    batch_size=batch_size,
    batchify_fn=batchify_fn,
    trans_fn=trans_func)
dev_data_loader = create_dataloader(
    dev_ds,
    mode='dev',
    batch_size=batch_size,
    batchify_fn=batchify_fn,
    trans_fn=trans_func)
import time
from utils import evaluate
# Number of training epochs
epochs = 1
# Directory for saving model parameters during training
ckpt_dir = "skep_ckpt"
# len(train_data_loader) is the number of steps in one epoch
num_training_steps = len(train_data_loader) * epochs
# AdamW optimizer
optimizer = paddle.optimizer.AdamW(
    learning_rate=2e-5,
    parameters=model.parameters())
# Cross-entropy loss function
criterion = paddle.nn.loss.CrossEntropyLoss()
# Accuracy metric
metric = paddle.metric.Accuracy()
# Start training
global_step = 0
tic_train = time.time()
for epoch in range(1, epochs + 1):
    for step, batch in enumerate(train_data_loader, start=1):
        input_ids, token_type_ids, labels = batch
        # Feed the data to the model
        logits = model(input_ids, token_type_ids)
        # Compute the loss
        loss = criterion(logits, labels)
        # Predicted class probabilities
        probs = F.softmax(logits, axis=1)
        # Compute accuracy
        correct = metric.compute(probs, labels)
        metric.update(correct)
        acc = metric.accumulate()
        # Backpropagate and update the parameters
        global_step += 1
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()
'''
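The step/s figures quoted in the report (0.8 versus 0.01) can be reproduced with a small throughput counter wrapped around the training loop. This is only an illustrative sketch: `StepRateMeter` is a hypothetical helper, not part of paddle or paddlenlp, and the `time.sleep` call stands in for one real training step.

```python
import time


class StepRateMeter:
    """Track training throughput in steps per second."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = clock()
        self._steps = 0

    def update(self, n=1):
        """Record that n more steps have finished."""
        self._steps += n

    def rate(self):
        """Return the average steps per second since the meter was created."""
        elapsed = self._clock() - self._start
        return self._steps / elapsed if elapsed > 0 else 0.0


meter = StepRateMeter()
for step in range(1, 6):
    time.sleep(0.01)  # stand-in for one training step
    meter.update()
print(f"{meter.rate():.2f} step/s")
```

In the loop above, `meter.update()` would go right after the optimizer update, and `meter.rate()` could be printed every few steps so the same measurement can be compared across paddle versions.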
Additional Supplementary Information
No response