PyTorch forward Parallelism and the Computation Graph


기계는 거짓말하지 않는다 ("Machines Don't Lie")


KillinTime 2025. 3. 31. 23:04

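In PyTorch, forward() issues its operations one after another in Python order, but that does not mean the hardware does only one thing at a time. Each operation is parallelized internally, and on a GPU the kernels are launched asynchronously: the Python call returns once the work is queued, before the result has been computed.

Kernels queued on the same CUDA stream still run in issue order, so independent operations overlap only when they are issued on different streams and the GPU has spare capacity.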
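The asynchronous launch can be observed directly. The following is a minimal sketch, not a benchmark: the matrix size and the `sync` helper are arbitrary choices made for illustration, and on a CPU-only machine the two printed times will be nearly identical because CPU ops finish before the call returns.

```python
import time

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(1024, 1024, device=device)


def sync():
    # Illustrative helper: a no-op on CPU, where each op completes before returning.
    if device == "cuda":
        torch.cuda.synchronize()


a @ a  # warm-up so one-time initialization is not measured
sync()

start = time.perf_counter()
b = a @ a  # on a GPU this only enqueues the kernel
launch_time = time.perf_counter() - start
sync()  # block until the queued work has actually finished
total_time = time.perf_counter() - start

print(f"call returned after {launch_time:.6f}s, work finished after {total_time:.6f}s")
```

On a GPU the first number is typically much smaller than the second, which is why timing code has to synchronize before reading the clock.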

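Computation Graph

The computation graph that PyTorch builds while forward() runs is a dynamic computation graph.

It is constructed on the fly during the forward pass: each operation on a tensor that requires gradients adds its node to the graph at the moment it executes.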
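Because the graph is recorded as the code runs, ordinary Python control flow decides which operations end up in it. The sketch below assumes a toy `forward` function and fixed inputs chosen only so that each branch is taken once; it is not taken from any real model.

```python
import torch


def forward(x, w):
    # Hypothetical toy forward pass: the branch taken depends on the data.
    h = x @ w
    if h.sum() > 0:
        return torch.relu(h)
    return torch.tanh(h)


w = torch.ones(4, 4, requires_grad=True)

out_pos = forward(torch.ones(2, 4), w)   # h.sum() > 0, so relu is recorded
out_neg = forward(-torch.ones(2, 4), w)  # h.sum() < 0, so tanh is recorded

# grad_fn is the backward node created by the last recorded operation.
name_pos = out_pos.grad_fn.name()
name_neg = out_neg.grad_fn.name()
print(name_pos, name_neg)
```

The two calls use the same function and the same weights, yet each one produces its own graph with a different final node.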

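Characteristics of the Computation Graph

Dynamic graph
Unlike the static graphs of TensorFlow 1.x, which are defined first and executed later, PyTorch builds the graph at run time. (TensorFlow 2.x also executes eagerly by default.)
The model structure can therefore change from one call to the next, and debugging is comparatively easy because ordinary Python tools work.
Integration with Autograd
PyTorch's automatic differentiation system (Autograd) walks this graph backward during backpropagation to compute gradients.
Each operation adds a node that knows how to compute its own gradient and is linked to the nodes that produced its inputs.
Potential for parallel execution
Nodes with no dependency between them can, in principle, run in parallel.
On a GPU in particular, independent CUDA kernels can run concurrently when they are issued on separate streams; on a single stream they run in issue order.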
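The node structure described above can be inspected from Python. This is a small sketch with made-up scalar inputs: `left` and `right` do not depend on each other, and the graph records exactly that. Note that it shows the dependency structure only; it does not by itself make the two branches execute concurrently.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

# Two branches with no dependency on each other, joined by an addition.
left = a * a
right = torch.sin(b)
y = left + right

# Every result remembers the backward node that produced it,
# and each node links to the nodes of its inputs.
print(y.grad_fn.name())
input_nodes = [fn.name() for fn, _ in y.grad_fn.next_functions]
for name in input_nodes:
    print("  <-", name)

# Autograd walks the graph from y back to the leaves.
y.backward()
print(a.grad, b.grad)  # dy/da = 2a, dy/db = cos(b)
```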

See also: "A Gentle Introduction to torch.autograd" (torch.autograd 에 대한 간단한 소개), PyTorch tutorials in Korean, tutorials.pytorch.kr

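The code below is a simple example that declares two separate GRU instances and compares actual execution times to see whether they can be processed in parallel.

A CUDA-capable GPU must be available.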

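import time

import torch
import torch.nn as nn

# CUDA streams need a GPU; on CPU only the sequential timing is reported.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_cuda = device == "cuda"

gru1 = nn.GRU(input_size=10, hidden_size=20, batch_first=True).to(device)
gru2 = nn.GRU(input_size=10, hidden_size=20, batch_first=True).to(device)

x1 = torch.randn(32, 5, 10, device=device)  # batch_size=32, seq_len=5, input_size=10
x2 = torch.randn(32, 5, 10, device=device)

def sync():
    # Kernel launches are asynchronous, so wait for the GPU before reading the clock.
    if use_cuda:
        torch.cuda.synchronize()

def timed(fn, iters=100):
    sync()
    fn()  # warm-up call keeps one-time CUDA/cuDNN setup out of the measurement
    sync()
    start_time = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()
    return (time.perf_counter() - start_time) / iters

# Sequential execution: both GRUs on the default stream
def run_sequential():
    output1, hidden1 = gru1(x1)
    output2, hidden2 = gru2(x2)

# Parallel execution: one stream per GRU, so the kernels are allowed to overlap
def run_parallel():
    with torch.cuda.stream(stream1):
        output1, hidden1 = gru1(x1)
    with torch.cuda.stream(stream2):
        output2, hidden2 = gru2(x2)

with torch.no_grad():  # same autograd setting for both measurements
    print(f"Sequential execution time: {timed(run_sequential):.6f} seconds")
    if use_cuda:
        stream1, stream2 = torch.cuda.Stream(), torch.cuda.Stream()
        print(f"Parallel execution time: {timed(run_parallel):.6f} seconds")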

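In the author's measurement, the two timings differed by roughly a factor of two. Such gaps should be read with care: GPU timings are only meaningful after a warm-up run and with torch.cuda.synchronize() called before each clock read, and for models as small as these GRUs the result is often dominated by launch overhead rather than by kernel overlap.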
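One way to take such measurements on the GPU itself is with CUDA events. The helper below is a sketch under stated assumptions: `time_forward` and its `iters` parameter are names invented for this example, and the events are recorded on the current stream, so it only measures work issued there. On a machine without CUDA it falls back to a wall-clock timer.

```python
import time

import torch
import torch.nn as nn


def time_forward(fn, iters=50):
    """Average seconds per call of fn, measured after one warm-up call."""
    fn()  # warm-up
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        torch.cuda.synchronize()  # make sure `end` has been reached
        return start.elapsed_time(end) / 1000.0 / iters  # elapsed_time is in ms
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters


device = "cuda" if torch.cuda.is_available() else "cpu"
gru = nn.GRU(input_size=10, hidden_size=20, batch_first=True).to(device)
x = torch.randn(32, 5, 10, device=device)

with torch.no_grad():
    seconds = time_forward(lambda: gru(x))
print(f"{seconds:.6f} seconds per forward pass")
```

Averaging over many iterations matters here because a single forward pass of a model this small is close to the resolution of the timer.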
