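Azure Execution Provider (Preview)
The Azure Execution Provider enables ONNX Runtime to invoke a remote Azure endpoint for inference. The endpoint must already be deployed or otherwise available.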
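Since version 1.16, the following pluggable operators are available from onnxruntime-extensions:
- OpenAIAudioToText
- AzureTextToText
- AzureTritonInvoker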
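With these operators, the Azure Execution Provider supports two modes of usage:
- Edge and Azure side by side
- Merge and run the hybrid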
The Azure Execution Provider is in preview, and all APIs and usage are subject to change.
Install
Since 1.16, the Azure Execution Provider ships by default with both the Python and NuGet packages.
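Before creating a session, it can help to confirm that the installed package really includes the Azure Execution Provider. The following is a minimal sketch, assuming you want to fall back to CPU-only when the Azure EP is missing. `select_providers` is a hypothetical helper. `onnxruntime.get_available_providers()` is a real ONNX Runtime function, and the provider order matches the example later on this page.

```python
def select_providers(available):
    """Hypothetical helper: build a providers list, adding the Azure EP only if it is present."""
    providers = ["CPUExecutionProvider"]
    if "AzureExecutionProvider" in available:
        providers.append("AzureExecutionProvider")
    return providers


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed")
    else:
        # get_available_providers() lists the EPs compiled into this onnxruntime build
        print(select_providers(ort.get_available_providers()))
```

The returned list can be passed directly as the `providers=` argument of `InferenceSession`.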
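Requirements
Since 1.16, all Azure Execution Provider operators ship with the onnxruntime-extensions (>= v0.9.0) Python and NuGet packages. Make sure the correct onnxruntime-extensions package is installed before using the Azure Execution Provider.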
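In Python, the >= v0.9.0 requirement can be checked by reading the installed distribution version with the standard library's `importlib.metadata`. This is a minimal sketch. `parse_version` and `extensions_ok` are hypothetical helpers, and the simple parser only compares the leading numeric X.Y.Z components, so a pre-release such as `0.9.0rc1` is treated as `0.9.0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum onnxruntime-extensions version stated in the requirements above
MIN_EXTENSIONS = (0, 9, 0)


def parse_version(text):
    """Parse the leading numeric X.Y.Z part of a version string into a tuple of ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "rc1" or "dev0"
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # treat "1.0" as (1, 0, 0)
    return tuple(parts)


def extensions_ok(text, minimum=MIN_EXTENSIONS):
    """Return True if the version string meets the minimum requirement."""
    return parse_version(text) >= minimum


if __name__ == "__main__":
    try:
        installed = version("onnxruntime-extensions")
    except PackageNotFoundError:
        print("onnxruntime-extensions is not installed")
    else:
        status = "OK" if extensions_ok(installed) else "too old, need >= 0.9.0"
        print(installed, status)
```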
Build
For build instructions, see the build page.
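Usage
Edge and Azure side by side
In this mode, two models run at the same time: a local (edge) model and a model served from Azure. The Azure model runs asynchronously through the RunAsync API, which is also available in Python and C#.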
import os
import threading

import numpy as np
import onnx
from onnx import helper, TensorProto
from onnxruntime import SessionOptions, InferenceSession
from onnxruntime_extensions import get_library_path

# Generate the local model by:
# https://github.com/microsoft/onnxruntime-extensions/blob/main/tutorials/whisper_e2e.py
def get_whisper_tiny():
    return '/onnxruntime-extensions/tutorials/whisper_onnx_tiny_en_fp32_e2e.onnx'

# Generate the azure model
def get_openai_audio_azure_model():
    auth_token = helper.make_tensor_value_info('auth_token', TensorProto.STRING, [1])
    model_name_input = helper.make_tensor_value_info('model_name', TensorProto.STRING, [1])
    response_format = helper.make_tensor_value_info('response_format', TensorProto.STRING, [-1])
    file = helper.make_tensor_value_info('file', TensorProto.UINT8, [-1])
    transcriptions = helper.make_tensor_value_info('transcriptions', TensorProto.STRING, [-1])
    # OpenAIAudioToText (from onnxruntime-extensions) sends the audio to the endpoint in model_uri
    invoker = helper.make_node('OpenAIAudioToText',
                               ['auth_token', 'model_name', 'response_format', 'file'],
                               ['transcriptions'],
                               domain='com.microsoft.extensions',
                               name='audio_invoker',
                               model_uri='https://api.openai.com/v1/audio/transcriptions',
                               audio_format='wav',
                               verbose=False)
    graph = helper.make_graph([invoker], 'graph',
                              [auth_token, model_name_input, response_format, file],
                              [transcriptions])
    model = helper.make_model(graph, ir_version=8,
                              opset_imports=[helper.make_operatorsetid('com.microsoft.extensions', 1)])
    model_name = 'openai_whisper_azure.onnx'
    onnx.save(model, model_name)
    return model_name

if __name__ == '__main__':
    sess_opt = SessionOptions()
    sess_opt.register_custom_ops_library(get_library_path())  # register onnxruntime-extensions ops

    azure_model_path = get_openai_audio_azure_model()
    azure_model_sess = InferenceSession(azure_model_path, sess_opt,
                                        providers=['CPUExecutionProvider', 'AzureExecutionProvider'])  # load AzureEP

    with open('test16.wav', 'rb') as _f:  # read raw audio data from a local wav file
        audio_stream = np.asarray(list(_f.read()), dtype=np.uint8)

    azure_model_inputs = {
        'auth_token': np.array([os.getenv('AUDIO', '')]),  # read auth from env variable
        'model_name': np.array(['whisper-1']),
        'response_format': np.array(['text']),
        'file': audio_stream
    }

    class RunAsyncState:
        """Holds the outputs (or the error) delivered by the run_async callback."""
        def __init__(self):
            self.__event = threading.Event()
            self.__outputs = None
            self.__err = ''

        def fill_outputs(self, outputs, err):
            self.__outputs = outputs
            self.__err = err
            self.__event.set()

        def get_outputs(self):
            if self.__err != '':
                raise Exception(self.__err)
            return self.__outputs

        def wait(self, sec):
            return self.__event.wait(sec)  # True if the callback fired in time

    def azureRunCallback(outputs, state: RunAsyncState, err: str) -> None:
        state.fill_outputs(outputs, err)

    run_async_state = RunAsyncState()
    # infer the azure model asynchronously
    azure_model_sess.run_async(None, azure_model_inputs, azureRunCallback, run_async_state)

    # meanwhile, run the edge model locally
    edge_model_path = get_whisper_tiny()
    edge_model_sess = InferenceSession(edge_model_path, sess_opt, providers=['CPUExecutionProvider'])
    edge_model_outputs = edge_model_sess.run(None, {
        'audio_stream': np.expand_dims(audio_stream, 0),
        'max_length': np.asarray([200], dtype=np.int32),
        'min_length': np.asarray([0], dtype=np.int32),
        'num_beams': np.asarray([2], dtype=np.int32),
        'num_return_sequences': np.asarray([1], dtype=np.int32),
        'length_penalty': np.asarray([1.0], dtype=np.float32),
        'repetition_penalty': np.asarray([1.0], dtype=np.float32)
    })
    print("\noutput from whisper tiny: ", edge_model_outputs)

    if not run_async_state.wait(10):
        raise TimeoutError('no response from the Azure endpoint within 10 seconds')
    print("\nresponse from openAI: ", run_async_state.get_outputs())
    # compare results and pick the better one
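Instead of hand-rolling an event/state class, the callback can be adapted into a standard `concurrent.futures.Future`. This is a minimal sketch, assuming `run_async(output_names, input_feed, callback, user_data)` calls `callback(outputs, user_data, err)` with an empty `err` string on success, which is what the example above relies on. `run_async_future` is a hypothetical helper, and `FakeSession` is a hypothetical stub so the sketch runs without a model or network access.

```python
import threading
from concurrent.futures import Future


def run_async_future(session, inputs, output_names=None):
    """Adapt a callback-style run_async call into a concurrent.futures.Future."""
    future = Future()

    def callback(outputs, user_data, err):
        # An empty err string is assumed to mean success.
        if err:
            user_data.set_exception(RuntimeError(err))
        else:
            user_data.set_result(outputs)

    # The Future itself is passed as user_data, so the callback keeps no other state.
    session.run_async(output_names, inputs, callback, future)
    return future


class FakeSession:
    """Hypothetical stand-in for InferenceSession, used only to exercise the adapter."""

    def run_async(self, output_names, inputs, callback, user_data):
        # Deliver the "result" from another thread, the way a real thread pool would.
        threading.Thread(target=callback, args=([inputs["x"] * 2], user_data, "")).start()


if __name__ == "__main__":
    fut = run_async_future(FakeSession(), {"x": 21})
    print(fut.result(timeout=10))  # prints [42]
```

With a real session, `run_async_future(azure_model_sess, azure_model_inputs).result(timeout=10)` raises `concurrent.futures.TimeoutError` if the endpoint does not answer in time and re-raises any reported error as a `RuntimeError`, while the edge model keeps running on the calling thread.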
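Merge and run the hybrid
Alternatively, the local and Azure models can be merged into a single hybrid model, which is then run like any ordinary ONNX model. A sample script can be found here.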
Current limitations
- Builds and runs only on Windows, Linux, and Android.
- On Android, AzureTritonInvoker is not supported.
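An application that ships on several platforms may want to check these limitations before loading a model. The following is a minimal sketch that encodes only the two constraints listed above. `current_platform` and `azure_ep_issues` are hypothetical helpers, and Android detection assumes that Android builds of CPython expose `sys.getandroidapilevel`.

```python
import platform
import sys

# Platforms listed under "Current limitations"
SUPPORTED_PLATFORMS = {"windows", "linux", "android"}


def current_platform():
    """Best-effort platform name; Android builds of CPython expose sys.getandroidapilevel."""
    if hasattr(sys, "getandroidapilevel"):
        return "android"
    return platform.system().lower()  # e.g. "windows", "linux", "darwin"


def azure_ep_issues(plat, uses_triton_invoker):
    """Return the list of documented limitations that the given setup would hit."""
    issues = []
    if plat not in SUPPORTED_PLATFORMS:
        issues.append(f"Azure Execution Provider is not built for '{plat}'")
    if plat == "android" and uses_triton_invoker:
        issues.append("AzureTritonInvoker is not supported on Android")
    return issues


if __name__ == "__main__":
    print(azure_ep_issues(current_platform(), uses_triton_invoker=True))
```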