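ORT model format

What is the ORT model format?

The ORT format is the model format supported by reduced-size ONNX Runtime builds. Reduced-size builds may be more appropriate for size-constrained environments such as mobile and web applications.

A full ONNX Runtime build supports both ORT format models and ONNX models.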

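Backwards compatibility

Generally, the goal is that a particular version of ONNX Runtime can run ORT format models at the current version (as of that ONNX Runtime release) or at older versions.

Although we try to maintain backwards compatibility, there have been some breaking changes.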

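ONNX Runtime version	ORT format version support	Notes
1.14+	v5, v4 (limited support)	See here for details about the limited v4 support.
1.13	v5	v5 breaking change: removed kernel def hashes.
1.12-1.8	v4	v4 breaking change: updated kernel def hash computation.
1.7	v3, v2, v1
1.6	v2, v1
1.5	v1	ORT format introduced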
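
The table above can be encoded as a small lookup, which may be handy when checking whether a deployed runtime can load a given model. This is only a sketch: the function name supported_ort_format_versions is hypothetical and not part of ONNX Runtime, and the code simply restates the rows of the table, treating every release from 1.14 onward alike.

```python
from typing import List


def supported_ort_format_versions(ort_version: str) -> List[int]:
    """Return the ORT format versions the table lists for an ONNX Runtime release.

    `ort_version` is a "major.minor[.patch]" string such as "1.13.1".
    """
    # Only major and minor matter for the table; patch numbers are ignored.
    release = tuple(int(part) for part in ort_version.split(".")[:2])
    if release >= (1, 14):
        return [5, 4]  # v4 support is limited in these releases
    if release == (1, 13):
        return [5]
    if (1, 8) <= release <= (1, 12):
        return [4]
    if release == (1, 7):
        return [3, 2, 1]
    if release == (1, 6):
        return [2, 1]
    if release == (1, 5):
        return [1]
    raise ValueError(f"ORT format is not listed for ONNX Runtime {ort_version}")
```

For instance, supported_ort_format_versions("1.10.0") yields [4], so a model saved in the v5 format would not be expected to load in that release.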

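Convert ONNX models to ORT format

ONNX models are converted to ORT format using the convert_onnx_models_to_ort script.

The conversion script performs two functions:

Loads and optimizes ONNX format models, and saves them in ORT format
Determines the operators, and optionally the data types, required by the optimized models, and saves these in a configuration file for use in a reduced operator build, if required

The conversion script can run on a single ONNX model or on a directory. If run against a directory, the directory is searched recursively for ".onnx" files to convert.

Each ".onnx" file is loaded, optimized, and saved in ORT format as a file with the ".ort" extension, in the same location as the original ".onnx" file.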
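
The input-to-output mapping described above can be sketched with pathlib. The helper name planned_ort_outputs is hypothetical; it does not call ONNX Runtime and only mirrors the behaviour stated in the text (recursive search, same location, ".ort" extension). File names produced by a particular release may differ, for example when more than one optimization style is generated.

```python
from pathlib import Path
from typing import Dict


def planned_ort_outputs(model_path_or_dir: str) -> Dict[Path, Path]:
    """Map each .onnx input to the .ort path described in the text above."""
    root = Path(model_path_or_dir)
    if root.is_dir():
        # Recursive search, mirroring the script's directory behaviour.
        sources = sorted(p for p in root.rglob("*.onnx") if p.is_file())
    else:
        # A single model file was given.
        sources = [root]
    # The .ort file sits next to the original and differs only in extension.
    return {src: src.with_suffix(".ort") for src in sources}
```

Listing the planned outputs before running the conversion can help spot stale ".ort" files that would be overwritten.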

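Outputs of the script

One ORT format model for each ONNX model
A build configuration file ("required_operators.config") containing the operators required by the optimized ONNX models

If type reduction is enabled (ONNX Runtime version 1.7 or later), the configuration file also includes the types required for each operator and is named "required_operators_and_types.config".

If you are using a pre-built ONNX Runtime iOS, Android, or web package, the build configuration file is not used and can be ignored.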

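Script location

The ORT model format is supported by version 1.5.2 of ONNX Runtime or later.

Converting ONNX format models to ORT format uses the ONNX Runtime python package, because the model is loaded into ONNX Runtime and optimized as part of the conversion process.

For ONNX Runtime version 1.8 and later, the conversion script is run directly from the ONNX Runtime python package.

For earlier versions, the conversion script is run from a local ONNX Runtime repository.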

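Install ONNX Runtime

Install the onnxruntime python package from https://pypi.org/project/onnxruntime/ in order to convert models from ONNX format to the internal ORT format. Version 1.5.3 or higher is required.

Install the latest release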

pip install onnxruntime

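Install a previous release

If you are building ONNX Runtime from source (custom, reduced, or minimal builds), you must match the python package version to the branch of the ONNX Runtime repository you checked out.

For example, to use the 1.7 release: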

git checkout rel-1.7.2
pip install onnxruntime==1.7.2

If you are using the main branch of the git repository, you should use the nightly ONNX Runtime python package:

pip install -U -i https://test.pypi.org/simple/ ort-nightly

Convert ONNX models to ORT format script usage

ONNX Runtime 1.8 or later

python -m onnxruntime.tools.convert_onnx_models_to_ort <onnx model file or dir>

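where:

onnx model file or dir is the path to a .onnx file or to a directory containing one or more .onnx models

The current optional arguments can be listed by running the script with the --help argument. Supported arguments and defaults differ slightly across ONNX Runtime versions.

Help text from ONNX Runtime 1.11: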

python -m onnxruntime.tools.convert_onnx_models_to_ort --help

  usage: convert_onnx_models_to_ort.py [-h] [--optimization_style {Fixed,Runtime} [{Fixed,Runtime} ...]] [--enable_type_reduction] [--custom_op_library CUSTOM_OP_LIBRARY] [--save_optimized_onnx_model] [--allow_conversion_failures] [--nnapi_partitioning_stop_ops NNAPI_PARTITIONING_STOP_OPS]
                                      [--target_platform {arm,amd64}]
                                      model_path_or_dir

  Convert the ONNX format model/s in the provided directory to ORT format models. All files with a `.onnx` extension will be processed. For each one, an ORT format model will be created in the same directory. A configuration file will also be created containing the list of required
  operators for all converted models. This configuration file should be used as input to the minimal build via the `--include_ops_by_config` parameter.

  positional arguments:
    model_path_or_dir     Provide path to ONNX model or directory containing ONNX model/s to convert. All files with a .onnx extension, including those in subdirectories, will be processed.

  optional arguments:
    -h, --help            show this help message and exit
    --optimization_style {Fixed,Runtime} [{Fixed,Runtime} ...]
                          Style of optimization to perform on the ORT format model. Multiple values may be provided. The conversion will run once for each value. The general guidance is to use models optimized with 'Runtime' style when using NNAPI or CoreML and 'Fixed' style otherwise.
                          'Fixed': Run optimizations directly before saving the ORT format model. This bakes in any platform-specific optimizations. 'Runtime': Run basic optimizations directly and save certain other optimizations to be applied at runtime if possible. This is useful when
                          using a compiling EP like NNAPI or CoreML that may run an unknown (at model conversion time) number of nodes. The saved optimizations can further optimize nodes not assigned to the compiling EP at runtime.
    --enable_type_reduction
                          Add operator specific type information to the configuration file to potentially reduce the types supported by individual operator implementations.
    --custom_op_library CUSTOM_OP_LIBRARY
                          Provide path to shared library containing custom operator kernels to register.
    --save_optimized_onnx_model
                          Save the optimized version of each ONNX model. This will have the same level of optimizations applied as the ORT format model.
    --allow_conversion_failures
                          Whether to proceed after encountering model conversion failures.
    --nnapi_partitioning_stop_ops NNAPI_PARTITIONING_STOP_OPS
                          Specify the list of NNAPI EP partitioning stop ops. In particular, specify the value of the "ep.nnapi.partitioning_stop_ops" session options config entry.
    --target_platform {arm,amd64}
                          Specify the target platform where the exported model will be used. This parameter can be used to choose between platform-specific options, such as QDQIsInt8Allowed(arm), NCHWc (amd64) and NHWC (arm/amd64) format, different optimizer level options, etc.

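Optional script arguments

Optimization style

Since ONNX Runtime 1.11

Specify whether the converted model will be fully optimized ("Fixed") or have saved runtime optimizations ("Runtime"). Both types of models are produced by default. See here for more information.

This replaces the optimization level option from earlier ONNX Runtime versions.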

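Optimization level

ONNX Runtime 1.10 and earlier

Set the optimization level that ONNX Runtime uses to optimize the model before saving it in ORT format.

For ONNX Runtime version 1.8 and later, all is recommended if the model will be run with the CPU EP.

For earlier versions, extended is recommended, because the all level previously included device-specific optimizations that would limit the portability of the model.

If the model is to be run with the NNAPI EP or CoreML EP, it is recommended to create the ORT format model using the basic optimization level. Performance testing should then compare running this model with the NNAPI or CoreML EP enabled against running a model optimized to a higher level with the CPU EP, to determine the best setup.

See the documentation on performance tuning for mobile scenarios for more information.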

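Enable type reduction

In ONNX Runtime version 1.7 and later, the data types supported by the required operators can be limited to further reduce the build size. This pruning is referred to as "operator type reduction" in this document. As the ONNX models are converted, the input and output data types required by each operator are accumulated and included in the configuration file.

If you wish to enable operator type reduction, the Flatbuffers python package must be installed: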

pip install flatbuffers

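For example, the ONNX Runtime kernel for Softmax supports both float and double. If your model uses Softmax but only with float data, the implementation that supports double can be excluded to reduce the kernel's binary size.

Custom operator support

If your ONNX model uses custom operators, you must provide the path to the library containing the custom operator kernels so that the ONNX model can be loaded successfully. The custom operators are preserved in the ORT format model.

Save optimized ONNX model

Add this flag to save the optimized ONNX model. The optimized ONNX model contains the same nodes and initializers as the ORT format model, and can be viewed in Netron for debugging and performance tuning.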
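
The optional arguments described in this section can be combined in a single invocation. The following sketch assembles the argument list in Python; the helper name build_convert_command is hypothetical, while the flags are those shown in the ONNX Runtime 1.11 help text (--optimization_style is not available in 1.10 and earlier, so the set of accepted flags depends on the installed version).

```python
import sys
from typing import List, Optional, Sequence


def build_convert_command(
    model_path_or_dir: str,
    optimization_styles: Optional[Sequence[str]] = None,
    enable_type_reduction: bool = False,
    custom_op_library: Optional[str] = None,
    save_optimized_onnx_model: bool = False,
) -> List[str]:
    """Assemble the argument list for the conversion script."""
    # The positional path goes first so that the multi-valued
    # --optimization_style option cannot absorb it.
    cmd = [
        sys.executable,
        "-m",
        "onnxruntime.tools.convert_onnx_models_to_ort",
        str(model_path_or_dir),
    ]
    if optimization_styles:
        cmd += ["--optimization_style", *optimization_styles]
    if enable_type_reduction:
        cmd.append("--enable_type_reduction")
    if custom_op_library:
        cmd += ["--custom_op_library", str(custom_op_library)]
    if save_optimized_onnx_model:
        cmd.append("--save_optimized_onnx_model")
    return cmd
```

Passing the returned list to subprocess.run(cmd, check=True) would run the conversion, provided the onnxruntime package is installed for the same interpreter.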

Previous versions of ONNX Runtime

Prior to ONNX Runtime version 1.8, the model conversion script must be run from a cloned source repository:

python <ONNX Runtime repository root>/tools/python/convert_onnx_models_to_ort.py <onnx model file or dir>

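Load and execute a model in ORT format

The API for executing ORT format models is the same as for ONNX models.

See the ONNX Runtime API documentation for details on individual API usage.

APIs by platform

Platform	Available APIs
Android	C, C++, Java, Kotlin
iOS	C, C++, Objective-C (Swift via bridge)
Web	JavaScript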

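ORT format model loading

If you provide a file name for the ORT format model, a file with the ".ort" extension is inferred to be an ORT format model.

If you provide in-memory bytes for the ORT format model, a marker in those bytes is checked to determine whether it is an ORT format model.

If you wish to state explicitly that the InferenceSession input is an ORT format model, you can do so via SessionOptions, although this is generally not necessary.

Load ORT format model from a file path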

C++ API

Ort::SessionOptions session_options;
session_options.AddConfigEntry("session.load_model_format", "ORT");

Ort::Env env;
Ort::Session session(env, <path to model>, session_options);

Java API

SessionOptions session_options = new SessionOptions();
session_options.addConfigEntry("session.load_model_format", "ORT");

OrtEnvironment env = OrtEnvironment.getEnvironment();
OrtSession session = env.createSession(<path to model>, session_options);

JavaScript API

import * as ort from "onnxruntime-web";

const session = await ort.InferenceSession.create("<path to model>");
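
For comparison, the same explicit setting can be sketched with the ONNX Runtime Python API, using the session.load_model_format key from the snippets above. The helper names needs_explicit_format and create_session are hypothetical, and the exact match on ".ort" is an assumption: the runtime's own extension check may treat letter case differently.

```python
from pathlib import Path


def needs_explicit_format(model_path: str) -> bool:
    """True when the ORT format cannot be inferred from a ".ort" extension."""
    return Path(model_path).suffix != ".ort"


def create_session(model_path: str, is_ort_format: bool = True):
    """Create an InferenceSession, stating the ORT format only when needed."""
    # Imported lazily; requires `pip install onnxruntime`.
    import onnxruntime as ort

    session_options = ort.SessionOptions()
    if is_ort_format and needs_explicit_format(model_path):
        # Same config key as in the C++ and Java snippets.
        session_options.add_session_config_entry("session.load_model_format", "ORT")
    return ort.InferenceSession(
        model_path,
        sess_options=session_options,
        providers=["CPUExecutionProvider"],
    )
```

Setting the entry only when the extension is not ".ort" reflects the note above that the explicit option is usually unnecessary.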

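Load ORT format model from an in-memory byte array

If a session is created from an input byte array containing the ORT format model data, by default the model bytes are copied at session creation time to ensure that the model bytes buffer remains valid.

You can instead use the model bytes directly by setting the SessionOptions config entry session.use_ort_model_bytes_directly to 1. This may reduce the peak memory usage of ONNX Runtime Mobile, but you must guarantee that the model bytes remain valid throughout the lifespan of the ORT session. For ONNX Runtime Web, this option is enabled by default.

If session.use_ort_model_bytes_directly is enabled, there is a further option to use the model bytes directly for initializers, which reduces peak memory usage further. Set the SessionOptions config entry session.use_ort_model_bytes_for_initializers to 1 to enable this. Note that if an initializer is pre-packed, the peak memory saving from using the model bytes directly for that initializer is undone, because a new buffer must be allocated for the pre-packed data. Pre-packing is an optional performance optimization that changes the initializer layout to the optimal ordering for the current platform, if that differs. If reducing peak memory usage matters more than the potential performance gain, pre-packing can be disabled by setting session.disable_prepacking to 1.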

C++ API

#include <fstream>
#include <iterator>
#include <vector>

Ort::SessionOptions session_options;
session_options.AddConfigEntry("session.load_model_format", "ORT");
session_options.AddConfigEntry("session.use_ort_model_bytes_directly", "1");

// Read the whole model file into memory.
std::ifstream stream(<path to model>, std::ios::in | std::ios::binary);
std::vector<uint8_t> model_bytes((std::istreambuf_iterator<char>(stream)), std::istreambuf_iterator<char>());

// model_bytes must outlive the session because the bytes are used directly.
Ort::Env env;
Ort::Session session(env, model_bytes.data(), model_bytes.size(), session_options);

Java API

SessionOptions session_options = new SessionOptions();
session_options.addConfigEntry("session.load_model_format", "ORT");
session_options.addConfigEntry("session.use_ort_model_bytes_directly", "1");

byte[] model_bytes = Files.readAllBytes(Paths.get(<path to model>));

OrtEnvironment env = OrtEnvironment.getEnvironment();
OrtSession session = env.createSession(model_bytes, session_options);

JavaScript API

import * as ort from "onnxruntime-web";

const response = await fetch(modelUrl);
const arrayBuffer = await response.arrayBuffer();
const model_bytes = new Uint8Array(arrayBuffer);

const session = await ort.InferenceSession.create(model_bytes);
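
The memory-related options discussed in this section depend on one another, which the following Python sketch tries to make explicit. The helper names ort_bytes_config and create_session_from_bytes are hypothetical; the config keys are the ones named in the text above, and whether each option has the described effect in the Python package is not something this sketch verifies.

```python
from typing import Dict


def ort_bytes_config(
    use_bytes_directly: bool = False,
    use_bytes_for_initializers: bool = False,
    disable_prepacking: bool = False,
) -> Dict[str, str]:
    """Build the session config entries for loading ORT format bytes."""
    if use_bytes_for_initializers and not use_bytes_directly:
        # The initializer option only applies when bytes are used directly.
        raise ValueError(
            "session.use_ort_model_bytes_for_initializers requires "
            "session.use_ort_model_bytes_directly"
        )
    entries = {"session.load_model_format": "ORT"}
    if use_bytes_directly:
        entries["session.use_ort_model_bytes_directly"] = "1"
    if use_bytes_for_initializers:
        entries["session.use_ort_model_bytes_for_initializers"] = "1"
    if disable_prepacking:
        entries["session.disable_prepacking"] = "1"
    return entries


def create_session_from_bytes(model_bytes: bytes, entries: Dict[str, str]):
    """Create a session from in-memory bytes with the given config entries."""
    # Imported lazily; requires `pip install onnxruntime`.
    import onnxruntime as ort

    session_options = ort.SessionOptions()
    for key, value in entries.items():
        session_options.add_session_config_entry(key, value)
    # The caller should keep model_bytes alive for the session's lifetime
    # whenever the bytes are used directly.
    return ort.InferenceSession(
        model_bytes,
        sess_options=session_options,
        providers=["CPUExecutionProvider"],
    )
```

A call such as ort_bytes_config(use_bytes_directly=True, use_bytes_for_initializers=True, disable_prepacking=True) corresponds to the lowest-peak-memory combination described above, at the cost of the pre-packing optimization.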