[PyTorch] Build the Neural Network



guungyul 2025. 1. 10. 20:58

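Neural networks are made up of layers and modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch is defined as a subclass of nn.Module. A neural network is itself a module that consists of other modules (layers).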

In the example below, we build a neural network to classify images from the FashionMNIST dataset.

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Get Device for Training

To train the model on a hardware accelerator such as a GPU, we check whether torch.cuda or torch.backends.mps is available. If neither is available, we fall back to the CPU.

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")
# Using cuda device
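The conditional expression above checks the backends in a fixed priority order. Below is a plain-Python sketch of the same logic. The `pick_device` function and its boolean arguments are hypothetical; they stand in for the results of `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
# Hypothetical helper mirroring the cascaded conditional above:
# prefer CUDA, then Apple's MPS backend, and fall back to the CPU.
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


print(pick_device(False, True))  # mps
```

CUDA wins even when MPS is also reported as available, because it is checked first.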

Define the Class

A custom neural network is defined by subclassing nn.Module. The network's layers are initialized in __init__. Every nn.Module subclass implements its operations on the input data in the forward method.

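class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Flatten each 28x28 image into a 784-element vector (batch dim kept)
        self.flatten = nn.Flatten()
        # Fully connected layers with ReLU activations in between
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)  # raw, unnormalized class scores
        return logits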
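For a single input image x (batch dimension omitted), the forward pass of this class can be written as the equations below. The names h_0, h_1, h_2, W_i, b_i and z are labels chosen here for illustration. The weight shapes follow nn.Linear's (out_features, in_features) convention, which matches the parameter sizes printed at the end of this post.

```latex
\begin{aligned}
h_0 &= \mathrm{flatten}(x) \in \mathbb{R}^{784} \\
h_1 &= \mathrm{ReLU}(W_1 h_0 + b_1), \quad W_1 \in \mathbb{R}^{512 \times 784},\ b_1 \in \mathbb{R}^{512} \\
h_2 &= \mathrm{ReLU}(W_2 h_1 + b_2), \quad W_2 \in \mathbb{R}^{512 \times 512},\ b_2 \in \mathbb{R}^{512} \\
z   &= W_3 h_2 + b_3, \quad W_3 \in \mathbb{R}^{10 \times 512},\ b_3 \in \mathbb{R}^{10}
\end{aligned}
```

Here z is the vector of 10 logits returned by forward.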

We then create an instance of NeuralNetwork and move it to the device.

model = NeuralNetwork().to(device)
print(model)
# NeuralNetwork(
#   (flatten): Flatten(start_dim=1, end_dim=-1)
#   (linear_relu_stack): Sequential(
#     (0): Linear(in_features=784, out_features=512, bias=True)
#     (1): ReLU()
#     (2): Linear(in_features=512, out_features=512, bias=True)
#     (3): ReLU()
#     (4): Linear(in_features=512, out_features=10, bias=True)
#   )
# )

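To use the model, we pass it the input data, which runs the model's forward method. We do not call model.forward() directly: calling the model object also runs PyTorch's background operations, such as registered hooks.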
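The convention of calling the module object instead of forward() can be illustrated with the plain-Python sketch below. TinyModule and Doubler are hypothetical names. The real nn.Module.__call__ does more than this simplified dispatch (for example, it runs hooks).

```python
# Simplified, hypothetical stand-in for the nn.Module calling convention.
# Calling the object dispatches to forward(); PyTorch's real __call__ also
# performs extra bookkeeping that this sketch leaves out.
class TinyModule:
    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class Doubler(TinyModule):
    def forward(self, x):
        return [2 * v for v in x]


m = Doubler()
print(m([1, 2, 3]))  # [2, 4, 6]
```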

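Calling the model on an input returns a 2-dimensional tensor of raw predicted values (logits) with shape (batch size, 10):

dim=0 indexes the samples in the minibatch (here there is only one)
dim=1 holds the 10 raw predicted values, one for each class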

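We get the prediction probabilities by passing this output through an instance of the nn.Softmax module. The predicted class is then the index with the highest probability.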

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
# Predicted class: tensor([7], device='cuda:0')

Model Layers

To break down the layers of the model, we take a sample minibatch of 3 images of size 28x28 and see what happens as we pass it through the network.

input_image = torch.rand(3, 28, 28)
print(input_image.size())
# torch.Size([3, 28, 28])

nn.Flatten

The nn.Flatten layer converts each 2D 28x28 image into a contiguous array of 784 pixel values. (The minibatch dimension, dim=0, is preserved.)

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
# torch.Size([3, 784])
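Below is a plain-Python sketch of what nn.Flatten does with its default arguments (start_dim=1, end_dim=-1). It uses nested lists instead of tensors, and `flatten_batch` is a hypothetical helper that only handles a (batch, height, width) input. Pixels are laid out row by row, so the pixel at (r, c) lands at index r*28 + c.

```python
# Merge every dimension after the batch dimension into one:
# a (3, 28, 28) batch becomes (3, 784), row by row.
def flatten_batch(batch):
    return [[pixel for row in image for pixel in row] for image in batch]


# Made-up batch of 3 "images" whose pixel values encode their position
batch = [[[float(r * 28 + c) for c in range(28)] for r in range(28)] for _ in range(3)]
flat = flatten_batch(batch)
print(len(flat), len(flat[0]))  # 3 784
```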

nn.Linear

nn.Linear is a module that applies a linear transformation to its input using its stored weights and biases.

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
# torch.Size([3, 20])
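nn.Linear computes y = x W^T + b, where W has shape (out_features, in_features) and b has shape (out_features,). This is why a (3, 784) input to layer1 produces a (3, 20) output. Below is a plain-Python sketch of that computation. The `linear` function and the small weights are made up for illustration.

```python
# y = x @ W.T + b for a batch of input vectors.
# weight: out_features rows, each with in_features entries.
def linear(x_batch, weight, bias):
    out = []
    for x in x_batch:
        row = []
        for w_row, b in zip(weight, bias):
            # dot product of one weight row with the input, plus that row's bias
            row.append(sum(w * xi for w, xi in zip(w_row, x)) + b)
        out.append(row)
    return out


weight = [[1.0, 0.0, -1.0],   # out_features=2, in_features=3
          [0.5, 0.5, 0.5]]
bias = [0.1, -0.2]
x_batch = [[1.0, 2.0, 3.0]]   # batch of one sample
out = linear(x_batch, weight, bias)
print(out)
```

Each of the 2 outputs is the dot product of one weight row with the input plus one bias term. This is approximately [-1.9, 2.8] here, up to floating-point rounding.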

nn.ReLU

Non-linear activations are what create the complex mappings between the model's inputs and outputs.

In the model above, we use nn.ReLU between the linear layers.

print(f"Before ReLU: {hidden1}\n\n")
# Before ReLU: tensor([[ 0.4158, -0.0130, -0.1144,  0.3960,  0.1476, -0.0690, -0.0269,  0.2690,
#           0.1353,  0.1975,  0.4484,  0.0753,  0.4455,  0.5321, -0.1692,  0.4504,
#           0.2476, -0.1787, -0.2754,  0.2462],
#         [ 0.2326,  0.0623, -0.2984,  0.2878,  0.2767, -0.5434, -0.5051,  0.4339,
#           0.0302,  0.1634,  0.5649, -0.0055,  0.2025,  0.4473, -0.2333,  0.6611,
#           0.1883, -0.1250,  0.0820,  0.2778],
#         [ 0.3325,  0.2654,  0.1091,  0.0651,  0.3425, -0.3880, -0.0152,  0.2298,
#           0.3872,  0.0342,  0.8503,  0.0937,  0.1796,  0.5007, -0.1897,  0.4030,
#           0.1189, -0.3237,  0.2048,  0.4343]], grad_fn=<AddmmBackward0>)

hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
# After ReLU: tensor([[0.4158, 0.0000, 0.0000, 0.3960, 0.1476, 0.0000, 0.0000, 0.2690, 0.1353,
#          0.1975, 0.4484, 0.0753, 0.4455, 0.5321, 0.0000, 0.4504, 0.2476, 0.0000,
#          0.0000, 0.2462],
#         [0.2326, 0.0623, 0.0000, 0.2878, 0.2767, 0.0000, 0.0000, 0.4339, 0.0302,
#          0.1634, 0.5649, 0.0000, 0.2025, 0.4473, 0.0000, 0.6611, 0.1883, 0.0000,
#          0.0820, 0.2778],
#         [0.3325, 0.2654, 0.1091, 0.0651, 0.3425, 0.0000, 0.0000, 0.2298, 0.3872,
#          0.0342, 0.8503, 0.0937, 0.1796, 0.5007, 0.0000, 0.4030, 0.1189, 0.0000,
#          0.2048, 0.4343]], grad_fn=<ReluBackward0>)
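ReLU is defined as ReLU(x) = max(0, x), applied element-wise: negative entries become 0 and positive entries pass through unchanged. The sketch below applies it with plain Python to the first three values of the first two rows printed above. The `relu` helper is hypothetical.

```python
# Element-wise ReLU over a batch of rows.
def relu(batch):
    return [[max(0.0, v) for v in row] for row in batch]


# First three values of rows 1 and 2 of hidden1 before ReLU (from the output above)
hidden = [[0.4158, -0.0130, -0.1144],
          [0.2326, 0.0623, -0.2984]]
print(relu(hidden))  # [[0.4158, 0.0, 0.0], [0.2326, 0.0623, 0.0]]
```

The result matches the corresponding entries of the "After ReLU" tensor.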

nn.Sequential

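nn.Sequential is an ordered container of modules. Data is passed through all the modules in the same order in which they are defined.

You can use a sequential container like this to quickly put together a network, such as seq_modules below.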

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
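Conceptually, nn.Sequential is function composition: the output of each module becomes the input of the next. Below is a hypothetical plain-Python analogue. The `sequential` helper and the lambda "layers" are made up for illustration and are not PyTorch APIs.

```python
from functools import reduce


# Pass the input through each callable in order; each output feeds the next.
def sequential(*layers):
    def run(x):
        return reduce(lambda acc, layer: layer(acc), layers, x)
    return run


pipeline = sequential(
    lambda x: [v * 2 for v in x],       # stand-in for a linear layer
    lambda x: [max(0, v) for v in x],   # ReLU
    sum,                                # stand-in for a final reduction
)
print(pipeline([1, -3, 2]))  # 6
```

Here [1, -3, 2] becomes [2, -6, 4], then [2, 0, 4], then 6. Changing the order of the layers would change the result.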

nn.Softmax

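The last linear layer of the neural network returns logits, which are raw values in (-inf, inf), and these are passed to the nn.Softmax module. Softmax scales the logits to values in [0, 1] that represent the model's predicted probability for each class. The dim parameter indicates the dimension along which the values must sum to 1.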

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
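Softmax maps each logit z_i to exp(z_i) / sum_j exp(z_j). With dim=1 on a (batch, classes) tensor, this is done separately for each row. The plain-Python sketch below illustrates it. `softmax_rows` and the logits are made up, and subtracting the row maximum is a standard numerical-stability trick that does not change the result.

```python
import math


# Softmax along dim=1 of a (batch, classes) matrix: each row becomes
# probabilities in [0, 1] that sum to 1.
def softmax_rows(logits):
    probs = []
    for row in logits:
        m = max(row)  # subtract the row max to avoid overflow in exp
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


logits = [[2.0, 1.0, 0.1],   # made-up logits: 2 samples, 3 classes
          [-1.0, 0.0, 3.0]]
probs = softmax_rows(logits)
print([round(sum(r), 6) for r in probs])  # [1.0, 1.0]
print([r.index(max(r)) for r in probs])   # [0, 2]
```

Softmax preserves the ordering of the values within a row. So taking argmax over the probabilities picks the same class as taking argmax over the raw logits.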

Model Parameters

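Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training.

Subclassing nn.Module automatically tracks all fields (submodules and parameters) defined inside your model object. It also makes all parameters accessible through the model's parameters() or named_parameters() methods.

In the example below, we iterate over each parameter and print its size and a preview of its values.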

print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

"""
Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0273,  0.0296, -0.0084,  ..., -0.0142,  0.0093,  0.0135],
        [-0.0188, -0.0354,  0.0187,  ..., -0.0106, -0.0001,  0.0115]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0155, -0.0327], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0116,  0.0293, -0.0280,  ...,  0.0334, -0.0078,  0.0298],
        [ 0.0095,  0.0038,  0.0009,  ..., -0.0365, -0.0011, -0.0221]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([ 0.0148, -0.0256], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0147, -0.0229,  0.0180,  ..., -0.0013,  0.0177,  0.0070],
        [-0.0202, -0.0417, -0.0279,  ..., -0.0441,  0.0185, -0.0268]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([ 0.0070, -0.0411], device='cuda:0', grad_fn=<SliceBackward0>)
"""