Azure Execution Provider (Preview)

The Azure Execution Provider enables ONNX Runtime to invoke a remote Azure endpoint for inference. The endpoint must be deployed or otherwise available beforehand.

Starting from version 1.16, the following pluggable operators are available from onnxruntime-extensions:

Through these operators, the Azure Execution Provider supports two usage patterns:

The Azure Execution Provider is in preview; all APIs and usage are subject to change.

Contents

Install

Starting from version 1.16, the Azure Execution Provider ships by default with the Python and NuGet packages.

Requirements

Starting from version 1.16, all Azure Execution Provider operators ship with the onnxruntime-extensions (>=v0.9.0) Python and NuGet packages. Please make sure the correct onnxruntime-extensions package is installed before using the Azure Execution Provider.
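As a quick sanity check, the minimal sketch below (assuming onnxruntime_extensions exposes a __version__ attribute) verifies the installed onnxruntime-extensions version and registers its native custom-op library with SessionOptions, which is the step that makes the Azure operators resolvable at session creation.

import onnxruntime_extensions as ortx
from onnxruntime import SessionOptions

# The Azure operators live in the onnxruntime-extensions native library.
assert tuple(int(x) for x in ortx.__version__.split('.')[:2]) >= (0, 9), \
    'onnxruntime-extensions >= 0.9.0 is required for the Azure operators'

sess_opt = SessionOptions()
sess_opt.register_custom_ops_library(ortx.get_library_path())  # make the Azure ops visible to ORT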

Build

For build instructions, please see the build page.

Usage

Run edge and Azure in parallel

In this mode, two models run at the same time. The Azure model is executed asynchronously through the RunAsync API, which is also available in Python and C#.

import os
import onnx
from onnx import helper, TensorProto
from onnxruntime_extensions import get_library_path
from onnxruntime import SessionOptions, InferenceSession
import numpy as np
import threading


# Generate the local model by:
# https://github.com/microsoft/onnxruntime-extensions/blob/main/tutorials/whisper_e2e.py
def get_whisper_tiny():
    return '/onnxruntime-extensions/tutorials/whisper_onnx_tiny_en_fp32_e2e.onnx'


# Generate the azure model
def get_openai_audio_azure_model():
    auth_token = helper.make_tensor_value_info('auth_token', TensorProto.STRING, [1])
    model = helper.make_tensor_value_info('model_name', TensorProto.STRING, [1])
    response_format = helper.make_tensor_value_info('response_format', TensorProto.STRING, [-1])
    file = helper.make_tensor_value_info('file', TensorProto.UINT8, [-1])

    transcriptions = helper.make_tensor_value_info('transcriptions', TensorProto.STRING, [-1])

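    # OpenAIAudioToText is one of the pluggable operators shipped in onnxruntime-extensions.
    # Node attributes (model_uri, audio_format, verbose) configure the remote call statically,
    # while the inputs (auth_token, model_name, response_format, file) are supplied per request.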
    invoker = helper.make_node('OpenAIAudioToText',
                               ['auth_token', 'model_name', 'response_format', 'file'],
                               ['transcriptions'],
                               domain='com.microsoft.extensions',
                               name='audio_invoker',
                               model_uri='https://api.openai.com/v1/audio/transcriptions',
                               audio_format='wav',
                               verbose=False)

    graph = helper.make_graph([invoker], 'graph', [auth_token, model, response_format, file], [transcriptions])
    model = helper.make_model(graph, ir_version=8,
                              opset_imports=[helper.make_operatorsetid('com.microsoft.extensions', 1)])
    model_name = 'openai_whisper_azure.onnx'
    onnx.save(model, model_name)
    return model_name


if __name__ == '__main__':
    sess_opt = SessionOptions()
    sess_opt.register_custom_ops_library(get_library_path())

    azure_model_path = get_openai_audio_azure_model()
    azure_model_sess = InferenceSession(azure_model_path,
        sess_opt, providers=['CPUExecutionProvider', 'AzureExecutionProvider'])  # load AzureEP

    with open('test16.wav', "rb") as _f:  # read raw audio data from a local wav file
        audio_stream = np.asarray(list(_f.read()), dtype=np.uint8)

    azure_model_inputs = {
        "auth_token": np.array([os.getenv('AUDIO', '')]),  # read auth from env variable
        "model_name": np.array(['whisper-1']),
        "response_format":  np.array(['text']),
        "file": audio_stream
    }


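    # Small helper that bridges the async callback back to the main thread: it stores the
    # outputs (or error) and signals an event once the callback has fired.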
    class RunAsyncState:
        def __init__(self):
            self.__event = threading.Event()
            self.__outputs = None
            self.__err = ''

        def fill_outputs(self, outputs, err):
            self.__outputs = outputs
            self.__err = err
            self.__event.set()

        def get_outputs(self):
            if self.__err != '':
                raise Exception(self.__err)
            return self.__outputs

        def wait(self, sec):
            self.__event.wait(sec)


    def azureRunCallback(outputs: np.ndarray, state: RunAsyncState, err: str) -> None:
        state.fill_outputs(outputs, err)


    run_async_state = RunAsyncState()
    # infer azure model asynchronously
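    # run_async(output_names, input_feed, callback, user_data): passing None for output_names
    # requests all outputs; run_async_state is handed back to azureRunCallback as its state argument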
    azure_model_sess.run_async(None, azure_model_inputs, azureRunCallback, run_async_state)

    # at the same time, run the edge model locally
    edge_model_path = get_whisper_tiny()
    edge_model_sess = InferenceSession(edge_model_path,
        sess_opt, providers=['CPUExecutionProvider'])

    edge_model_outputs = edge_model_sess.run(None, {
        'audio_stream': np.expand_dims(audio_stream, 0),
        'max_length': np.asarray([200], dtype=np.int32),
        'min_length': np.asarray([0], dtype=np.int32),
        'num_beams': np.asarray([2], dtype=np.int32),
        'num_return_sequences': np.asarray([1], dtype=np.int32),
        'length_penalty': np.asarray([1.0], dtype=np.float32),
        'repetition_penalty': np.asarray([1.0], dtype=np.float32)
    })

    print("\noutput from whisper tiny: ", edge_model_outputs)
    run_async_state.wait(10)
    print("\nresponse from openAI: ", run_async_state.get_outputs())
    # compare results and pick the better

Merge and run as hybrid

Alternatively, the local and Azure models can be merged into a single hybrid model, which is then inferenced like an ordinary ONNX model. A sample script can be found here.
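The linked sample shows the recommended way to do the merge. As a rough illustration of the idea only (not the sample itself), the sketch below places the two graphs side by side in one model; the file names are placeholders, name prefixes avoid collisions, and opset handling is deliberately simplified (one entry per domain, last version seen wins).

import onnx
from onnx import helper
from onnx.compose import add_prefix

# Placeholder file names; prefix every name so the two graphs cannot collide.
edge = add_prefix(onnx.load('whisper_onnx_tiny_en_fp32_e2e.onnx'), prefix='edge_')
azure = add_prefix(onnx.load('openai_whisper_azure.onnx'), prefix='azure_')

# Concatenate nodes, inputs, outputs and initializers into one graph.
hybrid_graph = helper.make_graph(
    list(edge.graph.node) + list(azure.graph.node),
    'hybrid',
    list(edge.graph.input) + list(azure.graph.input),
    list(edge.graph.output) + list(azure.graph.output),
    list(edge.graph.initializer) + list(azure.graph.initializer))

# Naive opset merge: keep a single entry per domain.
opsets = {o.domain: o.version for o in list(edge.opset_import) + list(azure.opset_import)}
hybrid = helper.make_model(
    hybrid_graph,
    opset_imports=[helper.make_operatorsetid(d, v) for d, v in opsets.items()])
onnx.save(hybrid, 'hybrid_whisper.onnx')

The resulting model is then loaded like the Azure model above, with the onnxruntime-extensions library registered and the AzureExecutionProvider enabled.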

Current limitations

  • Builds and runs only on Windows, Linux and Android.
  • On Android, AzureTritonInvoker is not supported.