I/O Binding

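When you use a non-CPU execution provider, it is most efficient to place the inputs (and/or outputs) on the target device (the device abstracted by that execution provider) before executing the graph, that is, before calling Run(). If an input has not been copied to the target device, ORT copies it from the CPU as part of the Run() call. Similarly, if an output has not been pre-allocated on the device, ORT assumes the output is requested on the CPU and copies it back from the device as the last step of the Run() call. These copies are counted in the graph's execution time, which can mislead users into thinking ORT is slow when most of the time is actually spent on the copies.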

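To address this, ORT provides the IOBinding concept. The core idea is to copy the inputs to the device and pre-allocate the outputs on the device before calling Run(). IOBinding is available in all of ORT's language bindings.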
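The copy accounting described above can be illustrated with a small, self-contained toy model. Everything in it (`Location`, `Buffer`, `ToySession`, `CopiesInsideRun`) is hypothetical: it only mimics the behavior described in this section, is not the ONNX Runtime API, and performs no real device transfers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical types, NOT the ONNX Runtime API.
// It counts the host<->device copies that happen *inside* Run().
enum class Location { Host, Device };

struct Buffer {
  std::vector<float> data;
  Location where;
};

// Stand-in for a session running on a non-CPU execution provider.
struct ToySession {
  std::size_t copies_in_run = 0;

  void Run(const Buffer& input, Buffer& output) {
    // Input left on the host: it is staged to the device on every call.
    if (input.where == Location::Host) ++copies_in_run;
    // Dummy "kernel" that doubles each element.
    output.data.resize(input.data.size());
    for (std::size_t i = 0; i < input.data.size(); ++i) {
      output.data[i] = 2.0f * input.data[i];
    }
    // Output not pre-placed on the device: copied back to the host at the end.
    if (output.where == Location::Host) ++copies_in_run;
  }
};

// Runs the toy session `runs` times and returns the copies made inside Run().
// With `prebind`, input and output are assumed to have been placed on the
// device once, before the loop, which is the idea behind IOBinding.
std::size_t CopiesInsideRun(int runs, bool prebind) {
  assert(runs >= 0);
  const Location loc = prebind ? Location::Device : Location::Host;
  Buffer input{{1.0f, 2.0f, 3.0f}, loc};
  Buffer output{{}, loc};
  ToySession session;
  for (int r = 0; r < runs; ++r) {
    session.Run(input, output);
  }
  return session.copies_in_run;
}
```

In this toy, `CopiesInsideRun(100, false)` returns 200 (one copy in and one copy out per call), while `CopiesInsideRun(100, true)` returns 0, because the one-time transfer happens before the loop and is not attributed to Run().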

The following snippets show how to use this feature in various languages.

C++

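  // model_path, session_options, run_options, memory_info, input_tensor_values,
  // input_tensor_size and input_node_dims are assumed to be defined elsewhere.
  Ort::Env env;
  Ort::Session session(env, model_path, session_options);
  Ort::IoBinding io_binding{session};
  // Wrap the existing input buffer (whose location memory_info describes) as a
  // 4-D float tensor, without copying it, and bind it to the input "input1".
  auto input_tensor = Ort::Value::CreateTensor<float>(memory_info, input_tensor_values.data(), input_tensor_size, input_node_dims.data(), 4);
  io_binding.BindInput("input1", input_tensor);
  // Describes default memory on CUDA device 0.
  Ort::MemoryInfo output_mem_info{"Cuda", OrtDeviceAllocator, 0,
                                  OrtMemTypeDefault};
  // Use this to bind output to a device when the shape is not known in advance. If the shape is known you can use the other overload of this function that takes an Ort::Value as input (IoBinding::BindOutput(const char* name, const Value& value)).
  // This internally calls the BindOutputToDevice C API.
  io_binding.BindOutput("output1", output_mem_info);
  session.Run(run_options, io_binding);
  // Retrieve the output tensors that the session allocated on the device.
  std::vector<Ort::Value> output_values = io_binding.GetOutputValues();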

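Note that in the code sample above, the output tensor is not allocated before binding; instead, an Ort::MemoryInfo is bound as the output. This lets the session allocate the tensor with whatever shape the run requires, which makes it a good way to get a correct allocation for data-dependent or dynamic shapes. However, if the output shape is known and the output tensor should be reused, it is beneficial to bind an Ort::Value to the output instead. That value can be allocated with the session allocator or backed by external memory. For more details, see the device tensor documentation.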

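  // output_shape (e.g. a std::vector<int64_t>) is assumed to hold the known
  // output shape; the element type must match the model output (float16 here).
  Ort::Allocator gpu_allocator(session, output_mem_info);
  auto output_value = Ort::Value::CreateTensor(
      gpu_allocator, output_shape.data(), output_shape.size(),
      ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16);
  // Bind the preallocated tensor itself (not the MemoryInfo) so it can be reused across runs.
  io_binding.BindOutput("output1", output_value);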
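The trade-off between binding an Ort::MemoryInfo and binding a preallocated Ort::Value can be sketched with another toy model. The names `OutputBinding`, `ToyRun` and `AllocationsInsideRun` are hypothetical, the "allocations" are ordinary host-side std::vector allocations, and the model ignores details such as ORT's memory arena; it only shows why reusing a buffer of known shape avoids allocating inside every run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical names, NOT the ONNX Runtime API.
// Strategy (a): bind only a location; the output is allocated inside each run.
// Strategy (b): bind a buffer preallocated once with the known output shape.
struct OutputBinding {
  bool preallocated = false;    // true selects strategy (b)
  std::vector<float> buffer;    // bound or run-allocated output storage
  std::size_t allocations = 0;  // allocations performed inside ToyRun()
};

// Dummy op: adds 1 to each element. The output length is only fixed once the
// op runs, standing in for a shape the caller might not know in advance.
void ToyRun(const std::vector<float>& input, OutputBinding& out) {
  const std::size_t n = input.size();
  if (!out.preallocated) {
    out.buffer = std::vector<float>(n);  // strategy (a): allocate per call
    ++out.allocations;
  }
  assert(out.buffer.size() == n);  // strategy (b) requires a matching shape
  for (std::size_t i = 0; i < n; ++i) {
    out.buffer[i] = input[i] + 1.0f;
  }
}

// Runs the toy op `runs` times and returns how many allocations ToyRun() made.
std::size_t AllocationsInsideRun(int runs, bool preallocate) {
  const std::vector<float> input{1.0f, 2.0f, 3.0f, 4.0f};
  OutputBinding out;
  if (preallocate) {
    out.preallocated = true;
    out.buffer.resize(input.size());  // one allocation, made before any run
  }
  for (int r = 0; r < runs; ++r) {
    ToyRun(input, out);
  }
  return out.allocations;
}
```

Here `AllocationsInsideRun(10, false)` returns 10 and `AllocationsInsideRun(10, true)` returns 0. If the input length changed between calls, the size check in strategy (b) would fail (in a build with assertions enabled), which mirrors why binding only a memory location suits dynamic or data-dependent shapes.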

Python (see the Python API documentation)
C# (see OrtIoBindingAllocationTest.cs)