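Run PyTorch model inference on different hardware targets with ONNX Runtime

As a developer who wants to deploy a PyTorch or ONNX model and maximize performance and hardware flexibility, you can use ONNX Runtime to run your model optimally on your hardware platform.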

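In this tutorial, you'll learn how to:

Use a PyTorch ResNet-50 model for image classification,
Convert it to ONNX, and
Deploy it with ONNX Runtime to the default CPU, NVIDIA CUDA (GPU), and Intel OpenVINO, using the same application code to load the model and run inference on each hardware platform.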

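ONNX is an open-source format for machine learning models, developed by Microsoft, Meta, Amazon, and other technology companies to standardize and simplify deploying models across a range of hardware. ONNX Runtime, contributed and maintained by Microsoft, optimizes the performance of ONNX models exported from frameworks such as PyTorch and TensorFlow. ResNet-50, trained on the ImageNet dataset, is widely used for image classification.

This tutorial demonstrates how to run an ONNX model on CPU, GPU, and Intel hardware with OpenVINO and ONNX Runtime, using Microsoft Azure Machine Learning.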

Setup

OS prerequisites

Your environment should have curl installed.

Device prerequisites

The onnxruntime-gpu library needs access to an NVIDIA CUDA accelerator on your device or compute cluster. The CPU and OpenVINO-CPU demos run on a CPU alone.
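A quick way to confirm which execution providers your installed ONNX Runtime build exposes is onnxruntime.get_available_providers(). The sketch below wraps that call in a small fallback helper. The pick_providers function and its preference order are illustrative assumptions, not part of ONNX Runtime.

```python
# Hypothetical helper: choose execution providers in order of preference.
PREFERRED = (
    "CUDAExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
)

def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    # Fall back to the CPU provider, which is normally present in every build.
    return chosen or ["CPUExecutionProvider"]

try:
    import onnxruntime
    available = onnxruntime.get_available_providers()
except ImportError:
    # ONNX Runtime is not installed yet; assume CPU only for this sketch.
    available = ["CPUExecutionProvider"]

print(pick_providers(available))
```

The returned list could be passed as the providers argument of onnxruntime.InferenceSession, which tries providers in the order given.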

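Inference prerequisites

Make sure you have an image to run inference on. This tutorial uses a "cat.jpg" image located in the same directory as the notebook file.

Environment prerequisites

In an Azure Notebook terminal or Anaconda prompt window, run the following commands to create up to three environments, one each for CPU, GPU, and/or OpenVINO.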

CPU

conda create -n cpu_env_demo python=3.8
conda activate cpu_env_demo
conda install -c anaconda ipykernel
conda install -c conda-forge ipywidgets
python -m ipykernel install --user --name=cpu_env_demo
jupyter notebook

GPU

conda create -n gpu_env_demo python=3.8
conda activate gpu_env_demo 
conda install -c anaconda ipykernel
conda install -c conda-forge ipywidgets
python -m ipykernel install --user --name=gpu_env_demo 
jupyter notebook

OpenVINO

conda create -n openvino_env_demo python=3.8
conda activate openvino_env_demo 
conda install -c anaconda ipykernel
conda install -c conda-forge ipywidgets
python -m ipykernel install --user --name=openvino_env_demo
python -m pip install --upgrade pip
pip install openvino

Library requirements

In the first code cell, install the required libraries with the following snippets.

CPU + GPU

import sys

if sys.platform in ['linux', 'win32']: # Linux or Windows
    !{sys.executable} -m pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
else: # Mac
    print("PyTorch 1.9 MacOS Binaries do not support CUDA, install from source instead")

!{sys.executable} -m pip install onnxruntime-gpu onnx onnxconverter_common==1.8.1 pillow

OpenVINO

import sys

if sys.platform in ['linux', 'win32']: # Linux or Windows
    !{sys.executable} -m pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
else: # Mac
    print("PyTorch 1.9 MacOS Binaries do not support CUDA, install from source instead")

!{sys.executable} -m pip install onnxruntime-openvino onnx onnxconverter_common==1.8.1 pillow

import openvino.utils as utils
utils.add_openvino_libs_to_path()

ResNet-50 demo

Environment setup

Import the libraries needed to get the model and run inference.

from torchvision import models, datasets, transforms as T
import torch
from PIL import Image
import numpy as np

Load and export the pre-trained ResNet-50 model to ONNX

Download a pre-trained ResNet-50 model from PyTorch and export it to ONNX format.

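resnet50 = models.resnet50(pretrained=True)
resnet50.eval()  # inference mode, so the forward pass below does not update BatchNorm running statistics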

# Download ImageNet labels
!curl -o imagenet_classes.txt https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

# Read the categories
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]

# Export the model to ONNX
image_height = 224
image_width = 224
x = torch.randn(1, 3, image_height, image_width, requires_grad=True)
torch_out = resnet50(x)
torch.onnx.export(resnet50,                     # model being run
                  x,                            # model input (or a tuple for multiple inputs)
                  "resnet50.onnx",              # where to save the model (can be a file or file-like object)
                  export_params=True,           # store the trained parameter weights inside the model file
                  opset_version=12,             # the ONNX opset version to export the model to
                  do_constant_folding=True,     # whether to execute constant folding for optimization
                  input_names = ['input'],      # the model's input names
                  output_names = ['output'])    # the model's output names

Sample output

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 10472 100 10472 0 0 50581 0 --:--:-- --:--:-- --:--:-- 50834

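Set up pre-processing for inference

Create the pre-processing for the image (for example, cat.jpg) that you want the model to run inference on.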

# Pre-processing for ResNet-50 Inferencing, from https://pytorch.com.tw/hub/pytorch_vision_resnet/
resnet50.eval()  
filename = 'cat.jpg' # change to your filename

input_image = Image.open(filename)
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
print("GPU Availability: ", torch.cuda.is_available())
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    resnet50.to('cuda')

Sample output

GPU Availability: False
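To make the last two transforms concrete, here is a NumPy-only sketch of what T.ToTensor() followed by T.Normalize(mean, std) computes: pixel values are scaled to [0, 1], each channel is normalized with the ImageNet mean and standard deviation, and the layout changes from height-width-channel to channel-height-width. The helper name to_normalized_chw and the random stand-in image are assumptions for illustration. Resizing and cropping are left out.

```python
import numpy as np

# ImageNet statistics used by the tutorial's T.Normalize call.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_normalized_chw(image_hwc_uint8):
    """Hypothetical helper mirroring T.ToTensor() + T.Normalize(mean, std)."""
    x = image_hwc_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - MEAN) / STD                            # per-channel normalization
    return np.transpose(x, (2, 0, 1))               # HWC -> CHW

# Random stand-in for an already resized and cropped 224x224 RGB image.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Add the batch dimension, as input_tensor.unsqueeze(0) does.
batch = to_normalized_chw(fake_image)[np.newaxis, ...]
print(batch.shape, batch.dtype)  # (1, 3, 224, 224) float32
```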

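Run inference on the ResNet-50 ONNX model with ONNX Runtime

Run inference on the model with ONNX Runtime by selecting the execution provider that matches your environment. If your environment uses CPU, uncomment CPUExecutionProvider; if it uses NVIDIA CUDA, uncomment CUDAExecutionProvider; and if it uses OpenVINO, uncomment OpenVINOExecutionProvider. Comment out the other onnxruntime.InferenceSession lines in each case.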

# Inference with ONNX Runtime
import onnxruntime
from onnx import numpy_helper
import time

session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['CPUExecutionProvider'])
# session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['CUDAExecutionProvider'])
# session_fp32 = onnxruntime.InferenceSession("resnet50.onnx", providers=['OpenVINOExecutionProvider'])

def softmax(x):
    """Compute softmax over all scores in x (expects a single prediction)."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

latency = []
def run_sample(session, image_file, categories, inputs):
    start = time.time()
    input_arr = inputs.cpu().detach().numpy()
    ort_outputs = session.run([], {'input':input_arr})[0]
    latency.append(time.time() - start)
    output = ort_outputs.flatten()
    output = softmax(output) # this is optional
    top5_catid = np.argsort(-output)[:5]
    for catid in top5_catid:
        print(categories[catid], output[catid])
    return ort_outputs

ort_output = run_sample(session_fp32, 'cat.jpg', categories, input_batch)
print("ONNX Runtime CPU/GPU/OpenVINO Inference time = {} ms".format(format(sum(latency) * 1000 / len(latency), '.2f')))

Sample output

Egyptian cat 0.78605634
tabby 0.117310025
tiger cat 0.020089425
Siamese cat 0.011728076
plastic bag 0.0052174763
ONNX Runtime CPU Inference time = 32.34 ms
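The softmax helper above normalizes over the entire array, which works for a single flattened prediction. For batched outputs, a per-row variant is a small change. Subtracting the row maximum before exponentiating keeps np.exp from overflowing without changing the result. The sketch below is illustrative: top_k and the toy logits are assumptions, not part of the tutorial's code.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along one axis (one distribution per row)."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

def top_k(probs, k=5):
    """Hypothetical helper: indices and values of the k largest entries of a 1-D array."""
    idx = np.argsort(-probs)[:k]
    return idx, probs[idx]

# Toy logits: the second row would overflow np.exp without the max shift.
logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 999.0]])
probs = softmax(logits)
print(probs.sum(axis=-1))  # each row sums to 1

idx, vals = top_k(probs[0], k=2)
print(idx, vals)
```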

Comparison with PyTorch

Use PyTorch to benchmark accuracy and latency against ONNX Runtime on CPU and GPU.

# Inference with PyTorch (baseline for the ONNX Runtime results above)
latency = []
start = time.time()
with torch.no_grad():
    torch_out = resnet50(input_batch)
latency.append(time.time() - start)

# Show the top five categories
probabilities = torch.nn.functional.softmax(torch_out[0], dim=0)
top5_prob, top5_catid = torch.topk(probabilities, 5)
for i in range(top5_prob.size(0)):
    print(categories[top5_catid[i]], top5_prob[i].item())

print("PyTorch CPU/GPU Inference time = {} ms".format(format(sum(latency) * 1000 / len(latency), '.2f')))

print("***** Verifying correctness *****")
for i in range(2):
    print('PyTorch and ONNX Runtime output {} are close:'.format(i), np.allclose(ort_output, torch_out.cpu().numpy(), rtol=1e-05, atol=1e-04))

Sample output

Egyptian cat 0.7820879
tabby 0.113261245
tiger cat 0.020114701
Siamese cat 0.012514038
plastic bag 0.0056432663
PyTorch CPU Inference time = 31.83 ms
***** Verifying correctness *****
PyTorch and ONNX Runtime output 0 are close: True
PyTorch and ONNX Runtime output 1 are close: True
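The timings above come from a single call, which typically includes one-time costs such as memory allocation and graph or kernel setup. A more stable estimate runs a few untimed warm-up iterations and then aggregates several timed runs. The sketch below shows one way to do that. The benchmark helper, its defaults, and the stand-in workload are assumptions for illustration.

```python
import statistics
import time

def benchmark(fn, warmup=3, runs=20):
    """Hypothetical helper: call fn() `warmup` times untimed, then return `runs` latencies in ms."""
    for _ in range(warmup):
        fn()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms

# Stand-in workload. In this tutorial it could instead be, for example,
# lambda: session_fp32.run([], {'input': input_arr})
timings = benchmark(lambda: sum(i * i for i in range(10_000)))

print("mean = {:.2f} ms, median = {:.2f} ms".format(
    statistics.mean(timings), statistics.median(timings)))
```

When timing PyTorch on a GPU, CUDA kernels launch asynchronously, so calling torch.cuda.synchronize() before reading the clock is generally needed for meaningful numbers.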

Comparison with OpenVINO

Use OpenVINO to benchmark accuracy and latency against ONNX Runtime with OpenVINO.

# Inference with OpenVINO
from openvino.runtime import Core

ie = Core()
onnx_model_path = "./resnet50.onnx"
model_onnx = ie.read_model(model=onnx_model_path)
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")

# inference
output_layer = next(iter(compiled_model_onnx.outputs))

latency = []
input_arr = input_batch.cpu().detach().numpy()  # .cpu() is needed if the batch was moved to CUDA earlier
inputs = {'input':input_arr}
start = time.time()
request = compiled_model_onnx.create_infer_request()
output = request.infer(inputs=inputs)

outputs = request.get_output_tensor(output_layer.index).data
latency.append(time.time() - start)

print("OpenVINO CPU Inference time = {} ms".format(format(sum(latency) * 1000 / len(latency), '.2f')))

print("***** Verifying correctness *****")
for i in range(2):
    print('OpenVINO and ONNX Runtime output {} are close:'.format(i), np.allclose(ort_output, outputs, rtol=1e-05, atol=1e-04))

Sample output

Egyptian cat 0.7820879
tabby 0.113261245
tiger cat 0.020114701
Siamese cat 0.012514038
plastic bag 0.0056432663
OpenVINO CPU Inference time = 31.83 ms
***** Verifying correctness *****
OpenVINO and ONNX Runtime output 0 are close: True
OpenVINO and ONNX Runtime output 1 are close: True
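The correctness check relies on np.allclose(a, b, rtol, atol), which passes when |a - b| <= atol + rtol * |b| holds for every element. A bare True or False hides how close the outputs actually are, so it can help to also report the largest difference and whether the top prediction agrees. The sketch below assumes a hypothetical compare_outputs helper and toy arrays in place of real model outputs.

```python
import numpy as np

def compare_outputs(reference, candidate, rtol=1e-05, atol=1e-04):
    """Hypothetical helper: summarize how closely two output arrays agree."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    abs_diff = np.abs(candidate - reference)
    return {
        # Same test as the tutorial: |candidate - reference| <= atol + rtol * |reference|
        "close": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
        "max_abs_diff": float(abs_diff.max()),
        "same_top1": int(np.argmax(reference)) == int(np.argmax(candidate)),
    }

# Toy values standing in for two runtimes' logits.
reference = np.array([1.0, 2.0, 3.0])
report_ok = compare_outputs(reference, reference + 5e-5)   # within tolerance
report_bad = compare_outputs(reference, reference + 1e-3)  # outside tolerance

print(report_ok)
print(report_bad)
```

Agreement on the top class matters for classification even when raw outputs differ slightly, which is why the sketch reports it separately from the tolerance check.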

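Conclusion

We have shown that ONNX Runtime is an effective way to run your PyTorch or ONNX model on CPU, NVIDIA CUDA (GPU), and Intel hardware through OpenVINO. ONNX Runtime can be deployed to many more types of hardware, which are listed under Execution Providers. We would love to hear your feedback; join us in the ONNX Runtime GitHub repo.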

Video demonstration

Watch the video here for more explanation of ResNet-50 deployment and flexible inferencing, along with a step-by-step guide.