Preface

Creating and training your first neural network starts from:

Download dataset -> load dataset -> load TensorBoard

Build the neural network -> train -> test... -> save the trained model

Each step below is its own heading, so you can click one to jump straight to that section.

1. Dataset

If you want to use a built-in dataset, start by importing torchvision:

import torchvision

test_data = torchvision.datasets.CIFAR10('file path', train=True,
                                         transform=torchvision.transforms.ToTensor(), download=True)
# A dataset normally includes a train set and a test set; the 'train' parameter picks which one you get.
# The 'transform' parameter decides how to transform the images from the dataset (here: PIL image -> Tensor).
# 'download=True' downloads CIFAR10 first if it is not already at 'file path'.
# After all that, test_data holds the whole set.
print(len(test_data))  # prints 50000 with train=True, because the train set has 50000 images
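As a quick check, each element of the dataset is an (image, label) pair. A minimal sketch, assuming the download above succeeded:

img, label = test_data[0]  # take the first sample out of the dataset
print(img.shape)  # torch.Size([3, 32, 32]): ToTensor gave a 3-channel 32x32 tensor
print(label)  # an int from 0 to 9, the class index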

2. DataLoader

from torch.utils.data import DataLoader

test_loader = DataLoader(dataset=test_data, batch_size=4, num_workers=0, drop_last=True)
# The 'dataset' parameter is the target to be loaded.
# 'batch_size' sets how many images one package (batch) holds; here we set it to 4.
print(len(test_loader))  # prints 12500 (50000 / 4); with batch_size=64 it would print 781
# 'num_workers' sets how many worker processes load the data; 0 means loading happens in the main process.

# With batch_size=64, not all 50000 images can be batched evenly, since 50000 / 64 isn't an integer,
# so drop_last=True drops the leftover images of the last, incomplete batch.
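To see what one batch looks like, you can iterate the loader. A minimal sketch under the batch_size=4 setting above:

for imgs, tags in test_loader:
    print(imgs.shape)  # torch.Size([4, 3, 32, 32]): 4 images per batch
    print(tags.shape)  # torch.Size([4]): one answer tag per image
    break  # one batch is enough for a peek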

3. Import an image & transform it

3.1 Use PIL to open an image

from PIL import Image

myPhoto = Image.open('your image path')
myPhoto.show()  # opens the photo you selected in your default image viewer
myPhoto = myPhoto.convert('RGB')
# Not every image format has exactly 3 color channels; a .png, for example, can also carry a transparency (alpha) channel.
# .convert('RGB') normalizes the photo to 3 color channels, which later steps (such as the network below, whose first Conv2d expects 3 input channels) require.
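To feed such a photo into a network later, you can transform it into a tensor. A small sketch, assuming myPhoto from above:

import torchvision

to_tensor = torchvision.transforms.ToTensor()
tensor_photo = to_tensor(myPhoto)  # PIL image -> torch.Tensor, values scaled to [0, 1]
print(tensor_photo.shape)  # torch.Size([3, H, W]): channels come first after ToTensor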

4. Use TensorBoard

4.1 SummaryWriter()

SummaryWriter() is the entry point for writing data into TensorBoard.

from torch.utils.tensorboard import SummaryWriter

MyDataBoard = SummaryWriter('Your board name')
# Remember this name: a folder with this name is created (mkdir) to hold the event files,
# and it also acts as the board's title (the run name) in TensorBoard.
# 'MyDataBoard' is now an instance of the SummaryWriter class, so through 'MyDataBoard' you operate the tensorboard.

After you finish the operations, close the writer:

MyDataBoard.close()
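To actually view the board, run TensorBoard from a terminal, point it at the folder the writer created, and open the URL it prints (usually http://localhost:6006):

tensorboard --logdir 'Your board name'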

4.2 add_scalar()

for i in range(500):
    MyDataBoard.add_scalar('y=x**2+2x-6', i**2 + 2*i - 6, i)
    # each .add_scalar() call draws one scalar point on the graph
    # .add_scalar('tag', scalar_value, global_step)
    # calls that reuse the same 'tag' land in the same graph
    # 'scalar_value' is the point's y coordinate, 'global_step' is its x coordinate

4.3 add_image()

Images in torch.Tensor, numpy.ndarray, or string form can be added to TensorBoard.

import numpy as np

np_image = np.array(myPhoto)  # turn the PIL photo from section 3.1 into a numpy array
# .add_image('tag', img, global_step, dataformats)
MyDataBoard.add_image("I think it's a 只因!", np_image, 1, dataformats='HWC')
# This step adds an image in numpy.ndarray form.
# Note that the default 'dataformats' in .add_image() is 'CHW' (channel, height, width),
# but an image opened through PIL.Image and converted with np.array() is 'HWC' (height, width, channel),
# so you have to set the 'dataformats' parameter yourself.

5. Create a Neural Network class

from torch import nn

class MyfirstNet(nn.Module):
    def __init__(self):
        super(MyfirstNet, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Conv2d(32, 32, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Conv2d(32, 64, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Flatten(),  # flatten: multi-dimensional -> one-dimensional
            nn.Linear(64 * 4 * 4, 64),  # linear mapping: in 64*4*4, out 64
            nn.Linear(64, 10)
        )

    def forward(self, inputData):
        resolveData = self.model(inputData)
        return resolveData

As you can see, this code is a complete neural network.
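To verify the 64 * 4 * 4 input size of the first Linear layer, you can push a dummy CIFAR10-shaped batch through the model. A minimal sketch:

import torch

net = MyfirstNet()
dummy = torch.ones((64, 3, 32, 32))  # a fake batch: 64 images, 3 channels, 32x32 pixels
print(net(dummy).shape)  # torch.Size([64, 10]): 10 class scores per image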

5.1 Create the class: inherit & override the superclass

from torch import nn  # import the superclass

class MyfirstNet(nn.Module):  # inherit
    def __init__(self):
        super(MyfirstNet, self).__init__()  # run the superclass's __init__
        self.model = nn.Sequential(
            # your layers go here
        )
        # define self.model with nn.Sequential(), which keeps the code clear

5.2 forward()

The forward() method of your nn.Module subclass defines how your input data flows through the layers and what is returned.

def forward(self, inputData):
    resolveData = self.model(inputData)
    return resolveData
    # forward() sends inputData through the self.model layers we set up and returns the processed data.
    # In other words, it defines how we process our data.
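Note that you rarely call forward() by hand: nn.Module's __call__ runs it for you, which is why writing Training(imgs) later in this chapter works.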

5.3 nn.Sequential()

The Sequential() function is a convenience; even without Sequential() we can build the same process, as the sketch after this code block shows.

from torch import nn

class MyfirstNet(nn.Module):
    def __init__(self):
        super(MyfirstNet, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Conv2d(32, 32, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Conv2d(32, 64, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Flatten(),  # flatten: multi-dimensional -> one-dimensional
            nn.Linear(64 * 4 * 4, 64),  # linear mapping: in 64*4*4, out 64
            nn.Linear(64, 10)
        )

    def forward(self, inputData):
        resolveData = self.model(inputData)
        return resolveData
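For comparison, here is a minimal sketch of the same network without nn.Sequential(): every layer becomes its own attribute and forward() chains them by hand (the class name MyfirstNetNoSeq is made up for this illustration).

from torch import nn


class MyfirstNetNoSeq(nn.Module):
    def __init__(self):
        super(MyfirstNetNoSeq, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 5, 1, 2)
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(32, 32, 5, 1, 2)
        self.pool2 = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(32, 64, 5, 1, 2)
        self.pool3 = nn.MaxPool2d(2)
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(64 * 4 * 4, 64)
        self.linear2 = nn.Linear(64, 10)

    def forward(self, inputData):
        x = self.pool1(self.conv1(inputData))
        x = self.pool2(self.conv2(x))
        x = self.pool3(self.conv3(x))
        x = self.flatten(x)
        return self.linear2(self.linear1(x))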

6. Train the Neural Network

6.1 Instantiate the training class

It's a basic operation, just like any other instantiation.

Training = MyfirstNet()

6.2 Load the optimizer and loss function etc.

    # load the loss function
    loss_fn = nn.CrossEntropyLoss()  # cross entropy
    # with this function we can calculate the loss between the network output and the target, like:
    # loss = loss_fn(outputData, tags)  # compute the loss
    # loss.backward()  # backpropagation: compute the gradients

    # load the optimizer
    learning_rate = 1e-2
    # a smaller learning_rate gives finer updates, but learning then takes much longer (and an overlong schedule can drift into overfitting)
    optimizer = torch.optim.SGD(Training.parameters(), lr=learning_rate)
    # torch.optim.SGD(neural_network_parameters, learning_rate)

Through the loss function and the optimizer we get the key operations for correcting the model's parameters: computing the loss, backpropagation, computing gradients, and gradient descent.

7. Run the Training Loop

Through the steps we learned above, we can now train a model completely.

    # training config
    # set the number of train epochs
    epoch = 300
    # open the tensorboard writer
    writer = SummaryWriter('First_unbroken_train')
    # start training
    total_train_step = 1  # global training step
    total_test_step = 1  # global test step
    for n in range(epoch):
        print(f'------ epoch {n} training starting ------')
        Training.train()  # switch to training mode; undoes the eval() set during the previous epoch's test
        # start training and load training data
        for data in train_data_load:  # take one batch out of the DataLoader
            imgs, tags = data  # unpack the images and the answer tags
            outputData = Training(imgs)  # feed them into the network
            loss = loss_fn(outputData, tags)  # compute the loss
            optimizer.zero_grad()  # clear the old gradients
            loss.backward()  # backpropagation: compute the gradients
            optimizer.step()  # gradient descent: update the parameters
            # print(type(loss)) shows loss is a Tensor
            # print(type(loss.item())) shows loss.item() is a float, so we need .item() to take the raw number out
            if total_train_step % 100 == 0:  # every 100 steps, print and log the loss
                print(f'Training in epoch {n}, ongoing step {total_train_step}\'s Loss is {loss.item()}')  # print the loss
                writer.add_scalar('lossOfTrain', loss.item(), total_train_step)
            total_train_step += 1  # one more batch learned
        # training over, start the test

8. Analyze Loss and Accuracy

        test_total_loss = 0  # total loss of this test
        total_test_accuracy = 0  # number of correct predictions in this test
        Training.eval()  # switch to evaluation mode for the verification test
        with torch.no_grad():  # test without gradients to save memory
            print('------ Start verifying the accuracy of the NN model this time ------')
            for data in test_data_load:
                imgs, tags = data
                testOut = Training(imgs)
                loss = loss_fn(testOut, tags)
                test_total_loss += loss.item()
                # Because of the final Linear layer, each image was mapped onto one row,
                # so scanning every row with argmax(1) gives every image's prediction, ready to compare.
                accuracy = (testOut.argmax(1) == tags).sum()
                # the comparison returns True or False per image, and .sum() counts them with True=1, False=0
                total_test_accuracy += accuracy
        print(f'After this training, the total loss on test_data is {test_total_loss}')
        print(f'After this training, the accuracy on test_data is {total_test_accuracy/test_data_len}')
        # correct predictions / all test data
        writer.add_scalar('lossOfTest', test_total_loss, total_test_step)
        writer.add_scalar('accuracyOftest', total_test_accuracy/test_data_len, total_test_step)
        total_test_step += 1  # test over

torch.no_grad() saves memory and learning time during the test, because we don't need the test dataset to train the model, right?

8.1 How to calculate the test accuracy

In the Sequential() above, the last step, nn.Linear(64, 10), maps each image's forecast into one row of 10 results.

In .argmax(1), the 1 means searching for the biggest value row by row (along axis 1) and returning its index within each row.

Note that a one-dimensional array has no second axis, but it can still use .argmax(0), which returns the index of the single largest value in the array instead of the largest of every column (see the one-dimensional example below).

An axis example:

import numpy as np

example = np.array([[1, 9, 3, 4],
                    [7, 8, 5, 10],
                    [2, 5, 8, 6]])

print(example.argmax(0))
# result: [1 0 2 1]  (column by column: the row index of each column's maximum)
print(example.argmax(1))
# result: [1 3 2]  (row by row: the column index of each row's maximum)
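And the one-dimensional case mentioned above:

vec = np.array([3, 9, 1])
print(vec.argmax(0))
# result: 1, the index of the single maximum; same as vec.argmax()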

 

accuracy = (testOut.argmax(1) == tags).sum()
# testOut.argmax(1) finds the biggest value in each row and returns that value's index.
# testOut is a two-dimensional Tensor, like:
# [[0.5, 0.3, ..., 0.64], ...]
# so .argmax(1) can work row by row on this tensor

(testOut.argmax(1) == tags)

compares each predicted index with the answer tag (which is an index too); where they are the same it returns True (1), otherwise False (0).

.sum() then counts all those returns and adds them up. Since the returns are only 1 or 0, the result is the number of accepted answers.
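A tiny sketch of that counting trick, on made-up tensors:

import torch

pred = torch.tensor([1, 3, 2])  # hypothetical predicted class indices
tags = torch.tensor([1, 0, 2])  # hypothetical answer tags
print((pred == tags).sum())  # tensor(2): two predictions matched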

9. Save & Load the Trained Neural Network

9.1 Save

torch.save(Training, f'./mymodule/MyFirstTrainModule_NO{n+1}.pth')
# torch.save(instantiation_of_NN, store_path)
# the 'instantiation' argument is the nn instance you decide to store
# the 'store_path' argument is where the file is stored; usually we store it in '.pth' form

Note that torch.save() will not create the file path if it doesn't exist; it just raises an error.

So check the path first
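A small guard you could add before saving, assuming the './mymodule' folder used in the snippet above:

import os

os.makedirs('./mymodule', exist_ok=True)  # create the folder if it is missing, so torch.save() won't error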

9.2 Load

model = torch.load('./mymodule/MyFirstTrainModule_NO30.pth')

This step instantiates a model of your class from the saved file.

But even though you saved the model completely, you still need to paste the model's class definition into the loading script, or import the class (recommended):

from fullStructure import *  # import the NNmodule class

or

class MyfirstNet(nn.Module):
    def __init__(self):
        super(MyfirstNet, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Conv2d(32, 32, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Conv2d(32, 64, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Flatten(),  # flatten: multi-dimensional -> one-dimensional
            nn.Linear(64 * 4 * 4, 64),  # linear mapping: in 64*4*4, out 64
            nn.Linear(64, 10)
        )

    def forward(self, inputData):
        resolveData = self.model(inputData)
        return resolveData
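With the class definition in scope, a minimal usage sketch (assuming the .pth file from the save step exists):

import torch

model = torch.load('./mymodule/MyFirstTrainModule_NO30.pth')
model.eval()  # switch to evaluation mode for inference
with torch.no_grad():
    out = model(torch.ones((1, 3, 32, 32)))  # a dummy 1-image batch
print(out.argmax(1))  # the predicted class index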

10. Summary

In this chapter, in not-at-all-idiomatic English, we roughly described how to create your own first complete training network and train it on CIFAR10, a basic dataset.

We also learned to use TensorBoard to display the Loss and Accuracy during training, which makes tuning the neural network convenient.

The next chapter will explain, even more roughly, how to use the already-trained neural network.

CIFAR10 provides airplanes, cars, chickens (birds, strictly speaking), cats, deer, dogs, toads (frogs, strictly speaking), horses, ships, and trucks. In the next chapter we will test whether it can recognize a 只因.

Complete code

import torch.optim
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter


class MyfirstNet(nn.Module):
    def __init__(self):
        super(MyfirstNet, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Conv2d(32, 32, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Conv2d(32, 64, 5, 1, 2),  # convolution
            nn.MaxPool2d(2),  # pooling
            nn.Flatten(),  # flatten: multi-dimensional -> one-dimensional
            nn.Linear(64 * 4 * 4, 64),  # linear mapping: in 64*4*4, out 64
            nn.Linear(64, 10)
        )

    def forward(self, inputData):
        resolveData = self.model(inputData)
        return resolveData


if __name__ == '__main__':
    # instantiate the training model
    Training = MyfirstNet()
    # load the train dataset
    train_data = torchvision.datasets.CIFAR10("./CIFAR10_dataset", train=True,
                                              transform=torchvision.transforms.ToTensor(), download=True)
    # load the test dataset
    test_data = torchvision.datasets.CIFAR10("./CIFAR10_dataset", train=False,
                                             transform=torchvision.transforms.ToTensor(), download=True)
    # load the train DataLoader
    train_data_load = DataLoader(dataset=train_data, batch_size=64, num_workers=0, drop_last=True)
    # load the test DataLoader
    test_data_load = DataLoader(dataset=test_data, batch_size=64, num_workers=0, drop_last=True)
    # count the total number of images in the train/test set
    train_data_len = len(train_data)
    print(f'length of trainData {train_data_len}')
    test_data_len = len(test_data)
    print(f'length of testData {test_data_len}')
    # load the loss function
    loss_fn = nn.CrossEntropyLoss()  # cross entropy
    # load the optimizer
    learning_rate = 1e-2
    optimizer = torch.optim.SGD(Training.parameters(), lr=learning_rate)
    # training config
    # set the number of train epochs
    epoch = 300
    # open the tensorboard writer
    writer = SummaryWriter('First_unbroken_train')
    # start training
    total_train_step = 1  # global training step
    total_test_step = 1  # global test step
    for n in range(epoch):
        print(f'------ epoch {n} training starting ------')
        Training.train()  # switch to training mode; undoes the eval() set during the previous epoch's test
        # start training and load training data
        for data in train_data_load:  # take one batch out of the DataLoader
            imgs, tags = data  # unpack the images and the answer tags
            outputData = Training(imgs)  # feed them into the network
            loss = loss_fn(outputData, tags)  # compute the loss
            optimizer.zero_grad()  # clear the old gradients
            loss.backward()  # backpropagation: compute the gradients
            optimizer.step()  # gradient descent: update the parameters
            # print(type(loss)) shows loss is a Tensor
            # print(type(loss.item())) shows loss.item() is a float
            if total_train_step % 100 == 0:  # every 100 steps, print and log the loss
                print(f'Training in epoch {n}, ongoing step {total_train_step}\'s Loss is {loss.item()}')  # print the loss
                writer.add_scalar('lossOfTrain', loss.item(), total_train_step)
            total_train_step += 1  # one more batch learned
        # training over, start the test
        test_total_loss = 0  # total loss of this test
        total_test_accuracy = 0  # number of correct predictions in this test
        Training.eval()  # switch to evaluation mode for the verification test
        with torch.no_grad():  # test without gradients to save memory
            print('------ Start verifying the accuracy of the NN model this time ------')
            for data in test_data_load:
                imgs, tags = data
                testOut = Training(imgs)
                loss = loss_fn(testOut, tags)
                test_total_loss += loss.item()
                # Because of the final Linear layer, each image was mapped onto one row,
                # so scanning every row with argmax(1) gives every image's prediction, ready to compare.
                accuracy = (testOut.argmax(1) == tags).sum()
                # the comparison returns True or False per image, and .sum() counts them with True=1, False=0
                total_test_accuracy += accuracy
        print(f'After this training, the total loss on test_data is {test_total_loss}')
        print(f'After this training, the accuracy on test_data is {total_test_accuracy/test_data_len}')
        # correct predictions / all test data
        writer.add_scalar('lossOfTest', test_total_loss, total_test_step)
        writer.add_scalar('accuracyOftest', total_test_accuracy/test_data_len, total_test_step)
        total_test_step += 1  # test over
        # save the ready model; note that the epoch counter n starts from 0
        if n in (49, 59, 69, 79, 99, 119, 149, 169, 189, 199, 209, 219, 239, 249, 269, 279, 299):
            torch.save(Training, f'./mymodule/MyFirstTrainModule_NO{n+1}.pth')

    # close the tensorboard
    writer.close()