Machines Don't Lie

torch error: Could not load library libcudnn_cnn_train.so.8

KillinTime 2024. 7. 3. 22:07

If you get the following error while running a PyTorch (torch) program in Python (here it is raised in loss.backward()):

Could not load library libcudnn_cnn_train.so.8. Error: /usr/local/cuda-12.2/lib64/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn3cnn34layerNormFwd_execute_internal_implERKNS_7backend11VariantPackEP11CUstream_stRNS0_18LayerNormFwdParamsERKNS1_20NormForwardOperationEmb, version libcudnn_cnn_infer.so.8
Traceback (most recent call last):
  File "/python_path.py", line 267, in <module>
    main()
  File "/python_path.py", line 264, in main
    train(model, dataloader_dict, criterion, optimizer, scheduler, num_classes, device, save_path_dir, num_epochs, enable_validate)
  File "/python_path.py", line 107, in train
    loss.backward()
  File "/lib_path/python3.10/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/lib_path/python3.10/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: GET was unable to find an engine to execute this computation

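This may be caused by a conflict between a locally installed cuDNN and the cuDNN installed by pip. In the error above, libcudnn_cnn_train.so.8 is loaded from the local CUDA directory (/usr/local/cuda-12.2/lib64) but cannot resolve a symbol it expects from libcudnn_cnn_infer.so.8, which is consistent with the two libraries coming from different cuDNN installations.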
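One way to see both sides of the suspected conflict is to list the cuDNN shared libraries visible on the loader path and compare them with the cuDNN version torch reports. This is a minimal sketch that assumes the local cuDNN is found through LD_LIBRARY_PATH (the error above loads it from /usr/local/cuda-12.2/lib64); the helper names library_dirs and find_cudnn_libs are hypothetical.

```python
import glob
import os


def library_dirs(value=None):
    """Split an LD_LIBRARY_PATH-style string into non-empty directories.

    Assumption: the local cuDNN is located via LD_LIBRARY_PATH.
    """
    if value is None:
        value = os.environ.get("LD_LIBRARY_PATH", "")
    return [d for d in value.split(os.pathsep) if d]


def find_cudnn_libs(dirs):
    """Return sorted paths of libcudnn*.so* files found in the given directories."""
    found = []
    for d in dirs:
        found.extend(glob.glob(os.path.join(d, "libcudnn*.so*")))
    return sorted(found)


if __name__ == "__main__":
    print("cuDNN libraries on LD_LIBRARY_PATH:")
    for path in find_cudnn_libs(library_dirs()):
        print("  ", path)

    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        # cudnn.version() returns an integer, or None if cuDNN is unavailable
        print("torch CUDA version:", torch.version.cuda)
        print("cuDNN version used by torch:", torch.backends.cudnn.version())
```

If libraries show up under the local CUDA directory while pip has also installed its own cuDNN, both copies are candidates for being loaded.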

Check which packages pip has installed:

# Check packages installed by pip
pip list

# Example output
...
nvidia-cudnn-cu12        8.9.2.26
...
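The same check can be done from Python, which is handy inside a script or notebook. This is a minimal sketch using the standard library's importlib.metadata; the helper names installed_packages and filter_cudnn_packages are hypothetical, and matching on the nvidia-cudnn prefix is an assumption based on the package name shown above.

```python
from importlib.metadata import distributions


def installed_packages():
    """Yield (name, version) for every distribution in the current environment."""
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:
            yield name, dist.version


def filter_cudnn_packages(packages, prefix="nvidia-cudnn"):
    """Keep packages whose normalized name starts with the prefix (assumed naming)."""
    matches = []
    for name, version in packages:
        # pip treats "_" and "-" as equivalent and ignores case in names
        normalized = name.lower().replace("_", "-")
        if normalized.startswith(prefix):
            matches.append((name, version))
    return sorted(matches)


if __name__ == "__main__":
    hits = filter_cudnn_packages(installed_packages())
    if hits:
        for name, version in hits:
            print(f"{name:<24} {version}")
    else:
        print("no nvidia-cudnn-* package installed via pip")
```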

If a package named like nvidia-cudnn-<CUDA version> (such as nvidia-cudnn-cu12 above) is installed, uninstall it:

pip uninstall nvidia-cudnn-cu12

Then run the program again to check that the error is gone.
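To re-check without launching the whole training job, a tiny convolution forward and backward pass can serve as a smoke test. This sketch assumes the original model uses convolutions (the failing library is libcudnn_cnn_train.so.8 and the error was raised in loss.backward()); check_cudnn_backward is a hypothetical helper and the layer sizes are arbitrary.

```python
def check_cudnn_backward():
    """Run a tiny Conv2d forward/backward pass on the GPU and report the outcome.

    Hypothetical helper: layer sizes are arbitrary; the goal is only to
    exercise a cuDNN convolution in the backward pass.
    """
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    if not torch.cuda.is_available():
        return "CUDA is not available; cuDNN was not exercised"

    device = torch.device("cuda")
    conv = torch.nn.Conv2d(3, 8, kernel_size=3).to(device)
    x = torch.randn(2, 3, 16, 16, device=device)
    try:
        loss = conv(x).sum()
        loss.backward()  # the original error surfaced during the backward pass
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return f"backward failed: {exc}"
    return f"ok (cuDNN version {torch.backends.cudnn.version()})"


if __name__ == "__main__":
    print(check_cudnn_backward())
```

If it reports "backward failed" with the same GET was unable to find an engine message, the library conflict is probably still present.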
