Get started with ORT for C#

Install the NuGet packages with the .NET CLI

dotnet add package Microsoft.ML.OnnxRuntime --version 1.16.0
dotnet add package System.Numerics.Tensors --version 0.1.0

Import the libraries

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors; // TensorElementType, used in the type checks below
using System.Numerics.Tensors;

Create method for inference

This is an Azure Function example that uses ORT with C# to run inference on an NLP model created with SciKit Learn.

 public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
            ILogger log, ExecutionContext context)
        {
            log.LogInformation("C# HTTP trigger function processed a request.");

            string review = req.Query["review"];

            string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            dynamic data = JsonConvert.DeserializeObject(requestBody);
            review ??= data.review;
            Debug.Assert(!string.IsNullOrEmpty(review), "Expecting a string with a content");

            // Get path to model to create inference session.
            const string modelPath = "./model.onnx";

            // Create an InferenceSession from the Model Path.
            // Creating and loading a session is expensive, so avoid doing it on every request.
            // Cache the session and reuse it across requests.
            using var session = new InferenceSession(modelPath);

            // create input tensor (nlp example)
            using var inputOrtValue = OrtValue.CreateTensorWithEmptyStrings(OrtAllocator.DefaultInstance, new long[] { 1, 1 });
            inputOrtValue.StringTensorSetElementAt(review, 0);

            // Create input data for session. Request all outputs in this case.
            var inputs = new Dictionary<string, OrtValue>
            {
                { "input", inputOrtValue }
            };

            using var runOptions = new RunOptions();

            // The output of interest is a sequence of maps.
            // Only the first element (map) of that sequence is needed.
            using var outputs = session.Run(runOptions, inputs, session.OutputNames);
            Debug.Assert(outputs.Count > 0, "Expecting some output");

            // We want the last output, which is the sequence of maps
            var lastOutput = outputs[outputs.Count - 1];

            // Optional code to check the output type
            {
                var outputTypeInfo = lastOutput.GetTypeInfo();
                Debug.Assert(outputTypeInfo.OnnxType == OnnxValueType.ONNX_TYPE_SEQUENCE, "Expecting a sequence");

                var sequenceTypeInfo = outputTypeInfo.SequenceTypeInfo;
                Debug.Assert(sequenceTypeInfo.ElementType.OnnxType == OnnxValueType.ONNX_TYPE_MAP, "Expecting a sequence of maps");
            }

            var elementsNum = lastOutput.GetValueCount();
            Debug.Assert(elementsNum > 0, "Expecting a non empty sequence");

            // Get the first map in sequence
            using var firstMap = lastOutput.GetValue(0, OrtAllocator.DefaultInstance);

            // Optional code just checking
            {
                // Maps always have two elements, keys and values
                // We are expecting this to be a map of strings to floats
                var mapTypeInfo = firstMap.GetTypeInfo().MapTypeInfo;
                Debug.Assert(mapTypeInfo.KeyType == TensorElementType.String, "Expecting keys to be strings");
                Debug.Assert(mapTypeInfo.ValueType.OnnxType == OnnxValueType.ONNX_TYPE_TENSOR, "Values are in the tensor");
                Debug.Assert(mapTypeInfo.ValueType.TensorTypeAndShapeInfo.ElementDataType == TensorElementType.Float, "Result map value is float");
            }

            var inferenceResult = new Dictionary<string, float>();
            // Use the visitor to read map keys and values
            // Here keys and values are represented with the same number of corresponding entries
            // string -> float
            firstMap.ProcessMap((keys, values) => {
                // Access native buffer directly
                var valuesSpan = values.GetTensorDataAsSpan<float>();

                var entryCount = (int)keys.GetTensorTypeAndShape().ElementCount;
                inferenceResult.EnsureCapacity(entryCount);
                for (int i = 0; i < entryCount; ++i)
                {
                    inferenceResult.Add(keys.GetStringElement(i), valuesSpan[i]);
                }
            }, OrtAllocator.DefaultInstance);


            // Return the inference result as json.
            return new JsonResult(inferenceResult);

        }
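
The comments in the function above note that sessions are expensive to create and should be cached. One possible way to do that is sketched below, assuming a single model per process. The `ModelSession` class and its members are hypothetical names introduced for illustration, and the model path simply mirrors the one used above. `InferenceSession.Run` can be called concurrently from multiple threads, so one shared instance can serve all requests.

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

namespace Samples
{
    // Hypothetical helper: holds one InferenceSession for the lifetime of the process.
    static class ModelSession
    {
        // Assumption: same relative model path as in the function above.
        private const string ModelPath = "./model.onnx";

        // Lazy<T> is thread-safe by default, so the session is created
        // only once, on first access, even under concurrent requests.
        private static readonly Lazy<InferenceSession> LazySession =
            new Lazy<InferenceSession>(() => new InferenceSession(ModelPath));

        // True once the model has been loaded.
        public static bool IsLoaded => LazySession.IsValueCreated;

        // The shared session. Callers must not dispose it.
        public static InferenceSession Instance => LazySession.Value;
    }
}
```

In the function above, `using var session = new InferenceSession(modelPath);` would then become `var session = ModelSession.Instance;`, without `using`, because the shared session must outlive the request.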

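Reuse input/output tensor buffers

In some scenarios, you may want to reuse input/output tensors. This often happens when you want to chain two models (i.e. feed one model's output as input to another), or when you want to speed up inference across multiple inference runs.

Chaining: feed model A's output(s) as input(s) to model B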

using System.Collections.Generic; // Dictionary
using System.Linq;                // First()
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

namespace Samples
{
    class FeedModelAToModelB
    {
        static void Main()
        {
            const string modelAPath = "./modelA.onnx";
            const string modelBPath = "./modelB.onnx";
            using InferenceSession session1 = new InferenceSession(modelAPath);
            using InferenceSession session2 = new InferenceSession(modelBPath);

            // Illustration only
            float[] inputData = { 1, 2, 3, 4 };
            long[] inputShape = { 1, 4 };

            using var inputOrtValue = OrtValue.CreateTensorValueFromMemory(inputData, inputShape);

            // Create input data for session. Request all outputs in this case.
            var inputs1 = new Dictionary<string, OrtValue>
            {
                { "input", inputOrtValue }
            };

            using var runOptions = new RunOptions();

            // session1 inference
            using (var outputs1 = session1.Run(runOptions, inputs1, session1.OutputNames))
            {
                // get intermediate value
                var outputToFeed = outputs1.First();

                // Create the input list for session2. The dictionary key is the
                // input name that model B expects for the intermediate value.
                var inputs2 = new Dictionary<string, OrtValue>
                {
                    { "inputNameForModelB", outputToFeed }
                };

                // session2 inference
                using (var results = session2.Run(runOptions, inputs2, session2.OutputNames))
                {
                    // manipulate the results
                }
            }
        }
    }
}

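Multiple inference runs with fixed-size input(s) and output(s)

If the model has fixed-size numeric tensor inputs and outputs, prefer OrtValue and its API to speed up inference and minimize data transfer. The OrtValue class makes it possible to reuse the underlying buffers of the input and output tensors. It pins managed buffers and uses them for inference. It also provides direct access to the native buffers of outputs. You can also preallocate an OrtValue for outputs or create it on top of an existing buffer. This avoids some overhead, which may be beneficial for smaller models where that overhead is a noticeable share of the overall execution time.

Keep in mind that the OrtValue class, like many other classes in the ONNX Runtime C# API, is IDisposable. It must be properly disposed to either unpin the managed buffers or release the native buffers, so that memory is not leaked.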
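
A minimal sketch of this pattern follows. It assumes a model with one float input named "input" of shape { 1, 4 } and one float output named "output" of shape { 1, 2 }. Those names and shapes, and the `FixedSizeRuns` helper, are placeholders for illustration and must be adapted to the actual model. The input OrtValue is created on top of a managed array, the output OrtValue is preallocated once, and both are reused on every run.

```csharp
using System;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

namespace Samples
{
    static class FixedSizeRuns
    {
        // Runs the model `iterations` times with one input and one output buffer
        // and returns a copy of the output of the last run.
        public static float[] RunRepeatedly(InferenceSession session, int iterations)
        {
            float[] inputData = new float[4];   // managed buffer, pinned by the OrtValue below
            long[] inputShape = { 1, 4 };       // assumed input shape
            long[] outputShape = { 1, 2 };      // assumed output shape

            // Tensor on top of the managed array: no copy is made.
            using var inputOrtValue = OrtValue.CreateTensorValueFromMemory(inputData, inputShape);

            // Output tensor allocated once and overwritten by every run.
            using var outputOrtValue = OrtValue.CreateAllocatedTensorValue(
                OrtAllocator.DefaultInstance, TensorElementType.Float, outputShape);

            string[] inputNames = { "input" };     // assumed input name
            string[] outputNames = { "output" };   // assumed output name
            OrtValue[] inputValues = { inputOrtValue };
            OrtValue[] outputValues = { outputOrtValue };

            using var runOptions = new RunOptions();
            var lastResult = new float[2];

            for (int i = 0; i < iterations; ++i)
            {
                // Writing to the pinned array updates the input tensor in place.
                for (int j = 0; j < inputData.Length; ++j)
                {
                    inputData[j] = i + j;
                }

                // This overload writes the results into the preallocated outputs.
                session.Run(runOptions, inputNames, inputValues, outputNames, outputValues);

                // Read the output directly from its native buffer.
                outputOrtValue.GetTensorDataAsSpan<float>().CopyTo(lastResult);
            }

            return lastResult;
        }
    }
}
```

Both OrtValue instances are declared with `using`, so the managed array is unpinned and the native output buffer is released when the method returns.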

Running on GPU (Optional)

If you are using the GPU package, simply use the appropriate SessionOptions when creating an InferenceSession.

int gpuDeviceId = 0; // The GPU device ID to execute on
using var gpuSessionOptions = SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId);
using var session = new InferenceSession("model.onnx", gpuSessionOptions);
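
If the same code also has to run on machines without a usable CUDA setup, one option is to fall back to the default CPU options. The sketch below assumes that a failure to register the CUDA provider surfaces as an `OnnxRuntimeException`. The `GpuSession` helper and the fallback policy are illustrative assumptions, not part of the ONNX Runtime API.

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

namespace Samples
{
    static class GpuSession
    {
        // Returns CUDA-enabled options when the CUDA provider can be registered,
        // otherwise default (CPU) options. The caller owns and disposes the result.
        public static SessionOptions CreateOptions(int gpuDeviceId = 0)
        {
            try
            {
                return SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId);
            }
            catch (OnnxRuntimeException e)
            {
                // Assumption: provider registration errors are reported this way.
                Console.Error.WriteLine($"CUDA provider unavailable, using CPU: {e.Message}");
                return new SessionOptions();
            }
        }
    }
}
```

The result is used like the options above: `using var options = GpuSession.CreateOptions(0);` followed by `new InferenceSession("model.onnx", options)`.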

ONNX Runtime C# API

ONNX Runtime provides C# .NET bindings for running inference on ONNX models on any platform that supports .NET Standard.

Supported Versions

.NET Standard 1.1

Builds

Artifact	Description	Supported Platforms
Microsoft.ML.OnnxRuntime	CPU (Release)	Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)…more details: compatibility
Microsoft.ML.OnnxRuntime.Gpu	GPU - CUDA (Release)	Windows, Linux, Mac, X64…more details: compatibility
Microsoft.ML.OnnxRuntime.DirectML	GPU - DirectML (Release)	Windows 10 1709+
onnxruntime	CPU, GPU (Dev), CPU (On-Device Training)	Same as Release versions
Microsoft.ML.OnnxRuntime.Training	CPU On-Device Training (Release)	Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)…more details: compatibility