On December 2 last year, the PyTorch team renamed the in-development 1.14 release to 2.0, announcing the next major version of PyTorch. On March 19 this year, PyTorch 2.0 finally graduated from Preview (Nightly) to Stable.

I finally found some free time recently to try out PyTorch 2.0's much-touted ability to speed up training with a single line of code!

My system is Ubuntu 22.04 with an RTX 3060 GPU and CUDA 12.0.

Let's take PyTorch 2.0 for a spin!

Environment Setup

Create a new conda environment:

conda create -n pt2 python=3.10

As of this writing (April 2023), PyTorch 2.0's compilation speedup does not yet support Python 3.11, so the environment must be pinned to Python 3.10.

Activate the environment:

conda activate pt2

Install PyTorch 2.0:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

This command comes from the official PyTorch installation instructions. At the moment PyTorch ships builds for CUDA 11.7 and 11.8. My machine runs CUDA 12.0, but that's fine: the CUDA 11.8 wheel works with a newer driver.
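
Optionally, run a quick sanity check that the install worked (a minimal sketch; the exact version string will vary):

import torch

print(torch.__version__)              # should start with "2.", e.g. "2.0.0+cu118"
print(torch.cuda.is_available())      # True if PyTorch can see the GPU
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3060"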

Project Setup

I'll reuse the BERT text-classification project I've written about several times before:

GPU Server First Impressions: Building a PyTorch GPU Development Environment from Scratch

Below are some additional dependencies it needs:

conda install scikit-learn boto3 regex tqdm chardet

In the PyTorch 1.x days, one training run of this project took about 43 minutes:

Epoch [1/3]
Iter:      0,  Train Loss:   2.4,  Train Acc:  9.38%,  Val Loss:   2.4,  Val Acc:  9.08%,  Time: 0:00:26 *
Iter:    100,  Train Loss:  0.37,  Train Acc: 89.06%,  Val Loss:  0.36,  Val Acc: 89.24%,  Time: 0:01:31 *
Iter:    200,  Train Loss:  0.39,  Train Acc: 87.50%,  Val Loss:  0.32,  Val Acc: 90.63%,  Time: 0:02:36 *
Iter:    300,  Train Loss:  0.33,  Train Acc: 89.84%,  Val Loss:  0.32,  Val Acc: 90.52%,  Time: 0:03:41 *
Iter:    400,  Train Loss:   0.4,  Train Acc: 88.28%,  Val Loss:  0.28,  Val Acc: 91.38%,  Time: 0:04:49 *
Iter:    500,  Train Loss:  0.24,  Train Acc: 92.97%,  Val Loss:  0.26,  Val Acc: 91.86%,  Time: 0:05:56 *
Iter:    600,  Train Loss:  0.27,  Train Acc: 90.62%,  Val Loss:  0.25,  Val Acc: 91.87%,  Time: 0:07:02 *
Iter:    700,  Train Loss:  0.21,  Train Acc: 90.62%,  Val Loss:  0.24,  Val Acc: 92.47%,  Time: 0:08:08 *
Iter:    800,  Train Loss:  0.15,  Train Acc: 94.53%,  Val Loss:  0.23,  Val Acc: 92.60%,  Time: 0:09:16 *
Iter:    900,  Train Loss:  0.21,  Train Acc: 93.75%,  Val Loss:  0.23,  Val Acc: 92.72%,  Time: 0:10:22 *
Iter:   1000,  Train Loss:  0.18,  Train Acc: 93.75%,  Val Loss:  0.23,  Val Acc: 92.71%,  Time: 0:11:25
Iter:   1100,  Train Loss:  0.21,  Train Acc: 94.53%,  Val Loss:  0.21,  Val Acc: 93.09%,  Time: 0:12:48 *
Iter:   1200,  Train Loss:  0.21,  Train Acc: 92.19%,  Val Loss:  0.21,  Val Acc: 93.00%,  Time: 0:13:57 *
Iter:   1300,  Train Loss:  0.23,  Train Acc: 90.62%,  Val Loss:  0.21,  Val Acc: 93.06%,  Time: 0:15:04
Iter:   1400,  Train Loss:  0.31,  Train Acc: 91.41%,  Val Loss:   0.2,  Val Acc: 93.56%,  Time: 0:16:19 *
Epoch [2/3]
Iter:   1500,  Train Loss:   0.2,  Train Acc: 92.97%,  Val Loss:   0.2,  Val Acc: 93.30%,  Time: 0:17:41 *
Iter:   1600,  Train Loss:  0.17,  Train Acc: 93.75%,  Val Loss:   0.2,  Val Acc: 93.72%,  Time: 0:18:51
Iter:   1700,  Train Loss:  0.16,  Train Acc: 95.31%,  Val Loss:  0.19,  Val Acc: 93.94%,  Time: 0:20:16 *
Iter:   1800,  Train Loss:  0.12,  Train Acc: 95.31%,  Val Loss:   0.2,  Val Acc: 93.91%,  Time: 0:21:21
Iter:   1900,  Train Loss:  0.11,  Train Acc: 96.09%,  Val Loss:   0.2,  Val Acc: 93.78%,  Time: 0:22:25
Iter:   2000,  Train Loss:  0.14,  Train Acc: 96.88%,  Val Loss:   0.2,  Val Acc: 93.82%,  Time: 0:23:28
Iter:   2100,  Train Loss:  0.16,  Train Acc: 95.31%,  Val Loss:   0.2,  Val Acc: 93.86%,  Time: 0:24:36
Iter:   2200,  Train Loss:  0.13,  Train Acc: 94.53%,  Val Loss:   0.2,  Val Acc: 93.93%,  Time: 0:25:43
Iter:   2300,  Train Loss:   0.1,  Train Acc: 95.31%,  Val Loss:   0.2,  Val Acc: 93.75%,  Time: 0:26:48
Iter:   2400,  Train Loss: 0.052,  Train Acc: 98.44%,  Val Loss:   0.2,  Val Acc: 93.92%,  Time: 0:27:57
Iter:   2500,  Train Loss:  0.11,  Train Acc: 96.09%,  Val Loss:   0.2,  Val Acc: 93.87%,  Time: 0:29:05
Iter:   2600,  Train Loss: 0.094,  Train Acc: 95.31%,  Val Loss:   0.2,  Val Acc: 94.06%,  Time: 0:30:09
Iter:   2700,  Train Loss:   0.1,  Train Acc: 96.09%,  Val Loss:  0.19,  Val Acc: 94.16%,  Time: 0:31:22 *
Iter:   2800,  Train Loss:  0.12,  Train Acc: 97.66%,  Val Loss:  0.19,  Val Acc: 94.08%,  Time: 0:32:33 *
Epoch [3/3]
Iter:   2900,  Train Loss:  0.13,  Train Acc: 96.88%,  Val Loss:  0.19,  Val Acc: 93.92%,  Time: 0:33:40
Iter:   3000,  Train Loss: 0.079,  Train Acc: 98.44%,  Val Loss:   0.2,  Val Acc: 93.96%,  Time: 0:34:47
Iter:   3100,  Train Loss: 0.049,  Train Acc: 98.44%,  Val Loss:  0.21,  Val Acc: 93.92%,  Time: 0:35:55
Iter:   3200,  Train Loss:  0.13,  Train Acc: 96.88%,  Val Loss:  0.21,  Val Acc: 94.13%,  Time: 0:37:02
Iter:   3300,  Train Loss: 0.059,  Train Acc: 98.44%,  Val Loss:   0.2,  Val Acc: 94.11%,  Time: 0:38:10
Iter:   3400,  Train Loss:  0.05,  Train Acc: 98.44%,  Val Loss:  0.21,  Val Acc: 94.24%,  Time: 0:39:17
Iter:   3500,  Train Loss: 0.071,  Train Acc: 97.66%,  Val Loss:   0.2,  Val Acc: 94.31%,  Time: 0:40:24
Iter:   3600,  Train Loss:  0.01,  Train Acc: 100.00%,  Val Loss:   0.2,  Val Acc: 94.34%,  Time: 0:41:32
Iter:   3700,  Train Loss:  0.13,  Train Acc: 96.88%,  Val Loss:   0.2,  Val Acc: 94.04%,  Time: 0:42:39
Iter:   3800,  Train Loss:   0.1,  Train Acc: 95.31%,  Val Loss:   0.2,  Val Acc: 94.35%,  Time: 0:43:46
No optimization for a long time, auto-stopping...
Test Loss:  0.17,  Test Acc: 94.82%

The Speedup Code

According to the official docs, speeding up training takes just one line of code: model = torch.compile(model)

Add it to train_eval.py:

def train(config, model, train_iter, dev_iter, test_iter):
    start_time = time.time()
    model = torch.compile(model)  # the one-line PyTorch 2.0 speedup
    model.train()
    ...
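
For context, here is what that one line looks like in isolation — a minimal, self-contained sketch on a toy model (the layer sizes are arbitrary):

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
model = torch.compile(model)  # wrap the model; no other code changes needed

x = torch.randn(32, 128, device=device)
y = model(x)   # the first call triggers compilation; later calls reuse the compiled code
print(y.shape) # torch.Size([32, 10])

torch.compile also accepts a mode argument (e.g. mode='reduce-overhead' or mode='max-autotune') to trade longer compile times for faster steady-state runs; the default mode is used throughout this post. Back to the BERT project — kick off training: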

python run.py --model bert

Epoch [1/3]
/home/guodong/miniconda3/envs/pt2/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:90: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
[2023-04-08 13:41:26,662] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
Iter:      0,  Train Loss:   2.4,  Train Acc: 14.06%,  Val Loss:   2.4,  Val Acc:  9.08%,  Time: 0:00:43 *
Iter:    100,  Train Loss:  0.51,  Train Acc: 85.16%,  Val Loss:  0.38,  Val Acc: 89.13%,  Time: 0:01:39 *
Iter:    200,  Train Loss:  0.33,  Train Acc: 90.62%,  Val Loss:  0.32,  Val Acc: 90.31%,  Time: 0:02:33 *
Iter:    300,  Train Loss:  0.26,  Train Acc: 93.75%,  Val Loss:  0.31,  Val Acc: 90.58%,  Time: 0:03:31 *
Iter:    400,  Train Loss:  0.38,  Train Acc: 89.84%,  Val Loss:  0.26,  Val Acc: 91.93%,  Time: 0:04:29 *
Iter:    500,  Train Loss:  0.23,  Train Acc: 93.75%,  Val Loss:  0.27,  Val Acc: 91.78%,  Time: 0:05:21
Iter:    600,  Train Loss:  0.24,  Train Acc: 91.41%,  Val Loss:  0.25,  Val Acc: 92.13%,  Time: 0:06:19 *
Iter:    700,  Train Loss:  0.26,  Train Acc: 92.97%,  Val Loss:  0.24,  Val Acc: 92.26%,  Time: 0:07:15 *
Iter:    800,  Train Loss:  0.18,  Train Acc: 93.75%,  Val Loss:  0.21,  Val Acc: 93.12%,  Time: 0:08:10 *
Iter:    900,  Train Loss:  0.23,  Train Acc: 92.19%,  Val Loss:  0.21,  Val Acc: 93.10%,  Time: 0:09:07 *
Iter:   1000,  Train Loss:  0.19,  Train Acc: 90.62%,  Val Loss:  0.21,  Val Acc: 93.11%,  Time: 0:09:58
Iter:   1100,  Train Loss:  0.25,  Train Acc: 92.97%,  Val Loss:   0.2,  Val Acc: 93.27%,  Time: 0:10:54 *
Iter:   1200,  Train Loss:  0.17,  Train Acc: 94.53%,  Val Loss:   0.2,  Val Acc: 93.34%,  Time: 0:11:49 *
Iter:   1300,  Train Loss:  0.22,  Train Acc: 92.19%,  Val Loss:   0.2,  Val Acc: 93.38%,  Time: 0:12:44 *
Iter:   1400,  Train Loss:   0.3,  Train Acc: 91.41%,  Val Loss:   0.2,  Val Acc: 93.48%,  Time: 0:13:39 *
[2023-04-08 13:55:06,652] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
Epoch [2/3]
Iter:   1500,  Train Loss:  0.15,  Train Acc: 95.31%,  Val Loss:  0.19,  Val Acc: 93.64%,  Time: 0:14:51 *
Iter:   1600,  Train Loss:  0.19,  Train Acc: 94.53%,  Val Loss:   0.2,  Val Acc: 93.84%,  Time: 0:15:43
Iter:   1700,  Train Loss:  0.15,  Train Acc: 93.75%,  Val Loss:  0.19,  Val Acc: 93.94%,  Time: 0:16:39 *
Iter:   1800,  Train Loss:   0.1,  Train Acc: 96.09%,  Val Loss:   0.2,  Val Acc: 93.71%,  Time: 0:17:30
Iter:   1900,  Train Loss:  0.11,  Train Acc: 96.88%,  Val Loss:  0.19,  Val Acc: 94.02%,  Time: 0:18:22
Iter:   2000,  Train Loss:  0.11,  Train Acc: 96.88%,  Val Loss:  0.19,  Val Acc: 93.98%,  Time: 0:19:14
Iter:   2100,  Train Loss:  0.14,  Train Acc: 95.31%,  Val Loss:  0.19,  Val Acc: 93.97%,  Time: 0:20:06
Iter:   2200,  Train Loss:  0.09,  Train Acc: 98.44%,  Val Loss:  0.19,  Val Acc: 94.04%,  Time: 0:20:57
Iter:   2300,  Train Loss: 0.078,  Train Acc: 96.88%,  Val Loss:  0.19,  Val Acc: 94.01%,  Time: 0:21:49
Iter:   2400,  Train Loss: 0.065,  Train Acc: 97.66%,  Val Loss:  0.19,  Val Acc: 93.99%,  Time: 0:22:41
Iter:   2500,  Train Loss: 0.096,  Train Acc: 98.44%,  Val Loss:  0.19,  Val Acc: 94.03%,  Time: 0:23:33
Iter:   2600,  Train Loss: 0.099,  Train Acc: 96.09%,  Val Loss:  0.18,  Val Acc: 94.17%,  Time: 0:24:32 *
Iter:   2700,  Train Loss:  0.11,  Train Acc: 95.31%,  Val Loss:  0.19,  Val Acc: 94.18%,  Time: 0:25:24
Iter:   2800,  Train Loss:  0.11,  Train Acc: 96.88%,  Val Loss:  0.17,  Val Acc: 94.27%,  Time: 0:26:19 *
Epoch [3/3]
Iter:   2900,  Train Loss:  0.11,  Train Acc: 97.66%,  Val Loss:  0.18,  Val Acc: 94.11%,  Time: 0:27:11
Iter:   3000,  Train Loss: 0.072,  Train Acc: 97.66%,  Val Loss:  0.19,  Val Acc: 94.21%,  Time: 0:28:03
Iter:   3100,  Train Loss: 0.032,  Train Acc: 99.22%,  Val Loss:  0.19,  Val Acc: 94.28%,  Time: 0:28:55
Iter:   3200,  Train Loss:  0.13,  Train Acc: 96.88%,  Val Loss:  0.19,  Val Acc: 94.25%,  Time: 0:29:46
Iter:   3300,  Train Loss: 0.042,  Train Acc: 98.44%,  Val Loss:  0.19,  Val Acc: 94.43%,  Time: 0:30:38
Iter:   3400,  Train Loss:  0.09,  Train Acc: 97.66%,  Val Loss:   0.2,  Val Acc: 94.31%,  Time: 0:31:30
Iter:   3500,  Train Loss: 0.049,  Train Acc: 98.44%,  Val Loss:  0.19,  Val Acc: 94.67%,  Time: 0:32:22
Iter:   3600,  Train Loss: 0.0093,  Train Acc: 100.00%,  Val Loss:  0.19,  Val Acc: 94.64%,  Time: 0:33:14
Iter:   3700,  Train Loss:  0.12,  Train Acc: 97.66%,  Val Loss:  0.19,  Val Acc: 94.43%,  Time: 0:34:05
Iter:   3800,  Train Loss: 0.061,  Train Acc: 98.44%,  Val Loss:  0.19,  Val Acc: 94.66%,  Time: 0:34:57
No optimization for a long time, auto-stopping...
Test Loss:  0.17,  Test Acc: 94.57%

This run took 34 minutes 57 seconds (call it 35 minutes), 8 minutes less than the earlier 43: a roughly 18.6% reduction in training time. Note that the very first iteration was slower than before (0:00:43 vs. 0:00:26), because the first forward/backward pass triggers compilation; the per-step speedup pays that overhead back quickly.

Speeding Up Further

But we're not done yet! Look at the ⚠️ warning printed right at the start of training:

/home/guodong/miniconda3/envs/pt2/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:90: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting torch.set_float32_matmul_precision('high') for better performance.

It points to a setting that unlocks more performance. Add this to run.py:

torch.set_float32_matmul_precision('high')
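
This enables TensorFloat-32 tensor cores for float32 matrix multiplications on Ampere-class GPUs like the RTX 3060, trading a small amount of float32 precision for much faster matmuls. For reference, the accepted values:

import torch

# 'highest' keeps full float32 precision (the default),
# 'high'    allows TF32 tensor cores for float32 matmuls,
# 'medium'  additionally allows bfloat16-based float32 matmuls where supported.
torch.set_float32_matmul_precision('high')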

Train again:

Epoch [1/3]
Iter:      0,  Train Loss:   2.4,  Train Acc: 14.06%,  Val Loss:   2.4,  Val Acc:  9.08%,  Time: 0:01:18 *
Iter:    100,  Train Loss:  0.41,  Train Acc: 88.28%,  Val Loss:  0.39,  Val Acc: 88.94%,  Time: 0:02:00 *
Iter:    200,  Train Loss:  0.45,  Train Acc: 88.28%,  Val Loss:  0.41,  Val Acc: 88.40%,  Time: 0:02:37
Iter:    300,  Train Loss:   0.3,  Train Acc: 89.84%,  Val Loss:  0.33,  Val Acc: 90.12%,  Time: 0:03:18 *
Iter:    400,  Train Loss:  0.46,  Train Acc: 85.94%,  Val Loss:   0.3,  Val Acc: 91.03%,  Time: 0:03:59 *
Iter:    500,  Train Loss:  0.31,  Train Acc: 92.19%,  Val Loss:  0.27,  Val Acc: 91.57%,  Time: 0:04:42 *
Iter:    600,  Train Loss:  0.29,  Train Acc: 89.84%,  Val Loss:  0.27,  Val Acc: 91.43%,  Time: 0:05:19
Iter:    700,  Train Loss:  0.23,  Train Acc: 93.75%,  Val Loss:  0.25,  Val Acc: 91.90%,  Time: 0:06:02 *
Iter:    800,  Train Loss:  0.19,  Train Acc: 93.75%,  Val Loss:  0.24,  Val Acc: 92.24%,  Time: 0:06:47 *
Iter:    900,  Train Loss:  0.22,  Train Acc: 92.19%,  Val Loss:  0.22,  Val Acc: 93.18%,  Time: 0:07:29 *
Iter:   1000,  Train Loss:  0.19,  Train Acc: 92.19%,  Val Loss:  0.23,  Val Acc: 92.43%,  Time: 0:08:06
Iter:   1100,  Train Loss:  0.24,  Train Acc: 92.19%,  Val Loss:  0.21,  Val Acc: 93.33%,  Time: 0:08:48 *
Iter:   1200,  Train Loss:  0.24,  Train Acc: 92.19%,  Val Loss:   0.2,  Val Acc: 93.26%,  Time: 0:09:28 *
Iter:   1300,  Train Loss:  0.21,  Train Acc: 91.41%,  Val Loss:   0.2,  Val Acc: 93.56%,  Time: 0:10:10 *
Iter:   1400,  Train Loss:  0.31,  Train Acc: 92.19%,  Val Loss:  0.19,  Val Acc: 93.74%,  Time: 0:10:51 *
[2023-04-08 11:23:03,069] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
Epoch [2/3]
Iter:   1500,  Train Loss:  0.14,  Train Acc: 96.09%,  Val Loss:  0.19,  Val Acc: 93.65%,  Time: 0:13:11
Iter:   1600,  Train Loss:  0.25,  Train Acc: 92.97%,  Val Loss:  0.21,  Val Acc: 93.47%,  Time: 0:13:48
Iter:   1700,  Train Loss:  0.17,  Train Acc: 95.31%,  Val Loss:   0.2,  Val Acc: 93.76%,  Time: 0:14:26
Iter:   1800,  Train Loss:  0.14,  Train Acc: 96.09%,  Val Loss:  0.21,  Val Acc: 93.81%,  Time: 0:15:03
Iter:   1900,  Train Loss:  0.12,  Train Acc: 96.09%,  Val Loss:   0.2,  Val Acc: 93.80%,  Time: 0:15:40
Iter:   2000,  Train Loss:  0.13,  Train Acc: 96.09%,  Val Loss:  0.21,  Val Acc: 93.48%,  Time: 0:16:17
Iter:   2100,  Train Loss:  0.21,  Train Acc: 93.75%,  Val Loss:   0.2,  Val Acc: 93.85%,  Time: 0:16:54
Iter:   2200,  Train Loss: 0.099,  Train Acc: 96.88%,  Val Loss:  0.21,  Val Acc: 93.84%,  Time: 0:17:32
Iter:   2300,  Train Loss: 0.074,  Train Acc: 96.88%,  Val Loss:   0.2,  Val Acc: 94.03%,  Time: 0:18:09
Iter:   2400,  Train Loss: 0.066,  Train Acc: 97.66%,  Val Loss:   0.2,  Val Acc: 94.02%,  Time: 0:18:47
No optimization for a long time, auto-stopping...
Test Loss:  0.18,  Test Acc: 94.08%

This time the whole run took only 18 minutes 47 seconds — about a 58% cut versus the original 43 minutes!!! (One caveat: early stopping also triggered sooner this run, at iteration 2400 rather than 3800, so part of the saving comes from fewer iterations; but the per-iteration pace is also clearly faster, roughly 37 s per 100 iterations in epoch 2 versus about 65 s in the baseline.)

The whole thing feels like Might Guy opening all Eight Gates to unleash Night Guy — except unlike Guy, who pays with his life, this PyTorch 2.0 boost costs a grand total of two lines of code. Practically free.

Of course, the gains will differ from project to project, but a speedup this effortless makes PyTorch 2.0 well worth having!

Changes to the Inference Code

One thing worth noting: after training finishes, the script that loads the model for inference must also add model = torch.compile(model). The reason is that compiling wraps the model in an OptimizedModule whose state-dict keys carry an _orig_mod. prefix, so a checkpoint saved from a compiled model only loads cleanly into another compiled model (an alternative that strips the prefix is sketched after the script). The full inference script:

import torch
from importlib import import_module
import time

key = {
    0: 'finance',
    1: 'realty',
    2: 'stocks',
    3: 'education',
    4: 'science',
    5: 'society',
    6: 'politics',
    7: 'sports',
    8: 'game',
    9: 'entertainment'
}
model_name = 'bert'
x = import_module('models.' + model_name)
config = x.Config('THUCNews')
model = x.Model(config).to(config.device)
model = torch.compile(model)  # add this line, matching the training script
model.load_state_dict(torch.load(config.save_path, map_location='cpu'))
model.eval()  # switch to inference mode (disables dropout)

def build_predict_text_raw(text):
    token = config.tokenizer.tokenize(text)
    token = ['[CLS]'] + token
    seq_len = len(token)
    mask = []
    token_ids = config.tokenizer.convert_tokens_to_ids(token)
    pad_size = config.pad_size
    # pad with zeros (or truncate) to exactly pad_size tokens
    if pad_size:
        if len(token) < pad_size:
            mask = [1] * len(token_ids) + ([0] * (pad_size - len(token)))
            token_ids += ([0] * (pad_size - len(token)))
        else:
            mask = [1] * pad_size
            token_ids = token_ids[:pad_size]
            seq_len = pad_size
    return token_ids, seq_len, mask

def build_predict_text(text):
    token_ids, seq_len, mask = build_predict_text_raw(text)
    ids = torch.LongTensor([token_ids]).cuda()
    seq_len = torch.LongTensor([seq_len]).cuda()
    mask = torch.LongTensor([mask]).cuda()
    return ids, seq_len, mask

def predict(text):
    """Predict the class of a single text."""
    data = build_predict_text(text)
    with torch.no_grad():
        outputs = model(data)
        num = torch.argmax(outputs)
    return key[int(num)]

if __name__ == '__main__':
    t = "张家界天门山排队跳崖事情"  # a Chinese news headline used as the test input
    print(predict(t))

Output: society
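
As an aside, if you would rather keep the inference script uncompiled, a checkpoint saved from a compiled model can still be loaded by stripping the _orig_mod. prefix from its keys — a minimal sketch:

# Load a compiled-model checkpoint into a plain (uncompiled) model.
state = torch.load(config.save_path, map_location='cpu')
state = {k.removeprefix('_orig_mod.'): v for k, v in state.items()}
model.load_state_dict(state)  # no torch.compile() call needed in this case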