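ONNX Runtime Custom Excel Functions for BERT NLP Tasks in JavaScript

In this tutorial we look at how to create custom Excel functions (ORT.Sentiment() and ORT.Question()) that run BERT NLP models with ONNX Runtime Web, bringing deep learning to spreadsheet tasks. Inference happens locally, right in Excel!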


Prerequisites

Node.js
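Office connected to a Microsoft 365 subscription (including Office on the web). If you don't already have Office, you can join the Microsoft 365 developer program to get a free, 90-day renewable Microsoft 365 subscription to use during development.
See the Office Add-ins tutorial for more information.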

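What are custom functions?

Excel has many built-in functions that you are probably familiar with, such as SUM(). Custom functions let you create new functions and add them to Excel by defining them in JavaScript as part of an add-in. They can then be called in Excel just like any built-in function.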
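
As a minimal sketch of the idea, a custom function is an ordinary exported function whose JSDoc comment carries the @customfunction tag, the same tag used by the functions later in this tutorial. The add function below is purely illustrative and is not part of the tutorial project.

```typescript
/**
 * Adds two numbers.
 * @customfunction
 * @param first First number
 * @param second Second number
 * @returns The sum of the two numbers.
 */
export function add(first: number, second: number): number {
  // Plain TypeScript logic; Excel passes the cell values in as arguments.
  return first + second;
}
```

Assuming the add-in registers its functions under the ORT namespace, a function like this would be entered in a cell along the lines of =ORT.ADD(1, 2).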

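Creating the custom function project

Now that we understand what custom functions are, let's look at how to create functions that run model inference locally, either to get the sentiment of the text in a cell or to extract information from a cell by asking a question and returning the answer to the cell.

If you plan to follow along, clone the project that we discuss in this blog. The project was created from the template project in the Yeoman CLI. Learn more about the base project in this quickstart.
Run the following commands to install the packages and build the project.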

npm install
npm run build

The command below runs the add-in in Excel on the web and sideloads it to the spreadsheet provided in the command.

// Command to run on the web.
// Replace "{url}" with the URL of an Excel document.
npm run start:web -- --document {url}

Use the following command to run in the Excel desktop client.

// Command to run on desktop (Windows or Mac)
npm run start:desktop

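The first time you run the project, two prompts appear:
- One asks you to enable developer mode. This is required to sideload add-ins.
- Next, when prompted, accept the certificate for the add-in service.
To access the custom functions, type =ORT.Sentiment("TEXT") or =ORT.Question("QUESTION","CONTEXT") in an empty cell and pass in the arguments.

Now we are ready to dig into the code!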

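The `manifest.xml` file

The manifest.xml file specifies that all custom functions belong to the ORT namespace. You use this namespace to access the custom functions in Excel. Update the following values in manifest.xml to ORT.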

<bt:String id="Functions.Namespace" DefaultValue="ORT"/>
<ProviderName>ORT</ProviderName>

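Learn more about configuring the manifest file here.

The `functions.ts` file

In the functions.ts file we define the function names, parameters, logic, and return types.

At the top of functions.ts, import the inferenceQuestion and inferenceSentiment functions. (We cover the logic in these functions later in this tutorial.)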

/* global console */
import { inferenceQuestion } from "./bert/inferenceQuestion";
import { inferenceSentiment } from "./bert/inferenceSentiment";

Next, add the sentiment and question functions.

/**
 * Returns the sentiment of a string.
 * @customfunction
 * @param text Text string
 * @returns sentiment string.
 */
export async function sentiment(text: string): Promise<string> {
  const result = await inferenceSentiment(text);
  // result[0] is the header row; result[1] is the top-scoring emotion.
  console.log(result[1][0]);
  return result[1][0].toString();
}

/**
 * Returns the answer to a question about a context string.
 * @customfunction
 * @param question Question string
 * @param context Context string
 * @returns answer string.
 */
export async function question(question: string, context: string): Promise<string> {
  const result = await inferenceQuestion(question, context);
  if (result.length > 0) {
    console.log(result[0].text);
    return result[0].text.toString();
  }
  return "Unable to find answer";
}

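The `inferenceQuestion.ts` file

The inferenceQuestion.ts file contains the logic for running the question-answering BERT model. The model was created using this tutorial. We then used the ORT quantization tool to reduce the size of the model. Learn more about quantization here.

First, import onnxruntime-web and the helper functions from question_answer.ts. question_answer.ts is an edited version of the tensorflow example found here. You can find the edited version in the source code of this project here.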

/* eslint-disable no-undef */
import * as ort from "onnxruntime-web";
import { create_model_input, Feature, getBestAnswers, Answer } from "./utils/question_answer";

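The inferenceQuestion function takes in the question and the context and returns answers based on the inference result. We then set the path to the model. This path is set up in webpack.config.js with the CopyWebpackPlugin, which copies the required assets to the dist folder at build time.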

export async function inferenceQuestion(question: string, context: string): Promise<Answer[]> {
  const model: string = "./bert-large-uncased-int8.onnx";

Now let's create the ONNX Runtime inference session and set the options. Learn more about all the SessionOptions here.

  // create session, set options
  const options: ort.InferenceSession.SessionOptions = {
    executionProviders: ["wasm"],
    // executionProviders: ['webgl']
    graphOptimizationLevel: "all",
  };
  console.log("Creating session");
  const session = await ort.InferenceSession.create(model, options);

Next, we encode the question and context using the create_model_input function from question_answer.ts. This returns a Feature.

  // Get encoded ids from text tokenizer.
  const encoded: Feature = await create_model_input(question, context);
  console.log("encoded", encoded);

The Feature interface imported from question_answer.ts has the following shape.

export interface Feature {
  input_ids: Array<any>;
  input_mask: Array<any>;
  segment_ids: Array<any>;
  origTokens: Token[];
  tokenToOrigMap: { [key: number]: number };
}

Now that we have the encoded Feature, we need to create arrays of BigInt values (input_ids, attention_mask, and token_type_ids) from which to build the ort.Tensor inputs.

  // Create arrays of correct length
  const length = encoded.input_ids.length;
  var input_ids = new Array(length);
  var attention_mask = new Array(length);
  var token_type_ids = new Array(length);

  // Get encoded.input_ids as BigInt
  input_ids[0] = BigInt(101);
  attention_mask[0] = BigInt(1);
  token_type_ids[0] = BigInt(0);
  var i = 0;
  for (; i < length; i++) {
    input_ids[i + 1] = BigInt(encoded.input_ids[i]);
    attention_mask[i + 1] = BigInt(1);
    token_type_ids[i + 1] = BigInt(0);
  }
  input_ids[i + 1] = BigInt(102);
  attention_mask[i + 1] = BigInt(1);
  token_type_ids[i + 1] = BigInt(0);

  console.log("arrays", input_ids, attention_mask, token_type_ids);
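
The same construction can be sketched more compactly. The toInt64Inputs helper and the Int64Inputs interface below are hypothetical names introduced for illustration and are not part of the project. The sketch assumes, as the code above does, that 101 and 102 are the ids of the start and end marker tokens, that every position is attended to, and that all segment ids are 0. The sample token ids are arbitrary.

```typescript
// Hypothetical container for the three int64 model inputs.
interface Int64Inputs {
  inputIds: BigInt64Array;
  attentionMask: BigInt64Array;
  tokenTypeIds: BigInt64Array;
}

// Hypothetical helper: wraps the token ids with the start (101) and
// end (102) marker ids and converts everything to BigInt64Array.
function toInt64Inputs(tokenIds: number[]): Int64Inputs {
  const START_ID = 101;
  const END_ID = 102;
  const ids = [START_ID, ...tokenIds, END_ID];
  return {
    inputIds: BigInt64Array.from(ids.map((id) => BigInt(id))),
    // Every position is a real token, so the mask is all ones.
    attentionMask: BigInt64Array.from(ids.map(() => BigInt(1))),
    // All positions are assigned segment 0, mirroring the loop above.
    tokenTypeIds: BigInt64Array.from(ids.map(() => BigInt(0))),
  };
}

// Arbitrary example token ids.
const inputs = toInt64Inputs([2054, 2003]);
console.log("sequence length:", inputs.inputIds.length);
```

Building the full id list first and mapping over it keeps the three arrays the same length by construction, which is the property the tensors created in the next step rely on.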

Create an ort.Tensor from each of the arrays.

  const sequence_length = input_ids.length;
  var input_ids_tensor: ort.Tensor = new ort.Tensor("int64", BigInt64Array.from(input_ids), [1, sequence_length]);
  var attention_mask_tensor: ort.Tensor = new ort.Tensor("int64", BigInt64Array.from(attention_mask), [ 1, sequence_length]);
  var token_type_ids_tensor: ort.Tensor = new ort.Tensor("int64", BigInt64Array.from(token_type_ids), [ 1, sequence_length]);
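
The shape [1, sequence_length] describes a batch containing one sequence, so each flat BigInt64Array is expected to hold exactly 1 × sequence_length values. The sketch below checks that relationship with a hypothetical elementCount helper (not part of the project or of onnxruntime-web), using stand-in data.

```typescript
// Hypothetical helper: number of elements implied by a tensor shape.
function elementCount(dims: number[]): number {
  return dims.reduce((product, dim) => product * dim, 1);
}

// Stand-in data: three token ids for a single sequence.
const tokenData = BigInt64Array.from([BigInt(101), BigInt(2054), BigInt(102)]);
const tokenDims = [1, tokenData.length];

// The flat data length should equal the product of the dimensions.
const shapeMatches = elementCount(tokenDims) === tokenData.length;
console.log("shape matches data:", shapeMatches);
```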

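We are ready to run inference! Here we create the OnnxValueMapType (the input object) and the FetchesType (the output labels to return). You can pass a plain object and a string array without declaring the types, but adding the types is useful.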

  const model_input: ort.InferenceSession.OnnxValueMapType = {
    input_ids: input_ids_tensor,
    input_mask: attention_mask_tensor,
    segment_ids: token_type_ids_tensor,
  };
  const output_names: ort.InferenceSession.FetchesType = ["start_logits", "end_logits"];
  const output = await session.run(model_input, output_names);
  const result_length = output["start_logits"].data.length;

Next, loop through the results and create a number array from each of the resulting start_logits and end_logits.

  // Copy each logit into a plain number array.
  const start_logits: number[] = [];
  const end_logits: number[] = [];
  for (let i = 0; i < result_length; i++) {
    start_logits.push(Number(output["start_logits"].data[i]));
  }
  for (let i = 0; i < result_length; i++) {
    end_logits.push(Number(output["end_logits"].data[i]));
  }
  console.log("start_logits", start_logits);
  console.log("end_logits", end_logits);
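
The conversion can also be written without explicit index bounds. In the sketch below, rawLogits is a hypothetical stand-in for a tensor's data array and toNumberArray is an illustrative helper, not a function from the project.

```typescript
// Hypothetical helper: copies numeric tensor data into a plain number[].
function toNumberArray(data: ArrayLike<number>): number[] {
  // Array.from visits exactly data.length elements, so it cannot read
  // past the end of the buffer.
  return Array.from(data, (value) => Number(value));
}

// Stand-in for the data of a logits tensor.
const rawLogits = new Float32Array([0.25, -1.5, 3.0]);
const logits = toNumberArray(rawLogits);
console.log("logits", logits);
```

Iterating over the data itself, rather than a separately computed length, rules out pushing a NaN that would come from reading one element too many.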

Finally, we call getBestAnswers from question_answer.ts. It takes the results and post-processes them to extract the answer from the inference output.

  const answers: Answer[] = getBestAnswers(
    start_logits,
    end_logits,
    encoded.origTokens,
    encoded.tokenToOrigMap,
    context
  );
  console.log("answers", answers);
  return answers;
}
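
The exact post-processing lives in question_answer.ts and is not reproduced here. As a rough illustration of the general idea behind extractive question answering, the sketch below picks the token span whose start and end logits have the highest combined score. The bestSpan function, the Span interface, and the maxAnswerLength default are assumptions made for this example. The project's getBestAnswers also maps tokens back to the original text, which this sketch omits.

```typescript
// Hypothetical result type: token indices of the span and its score.
interface Span {
  start: number;
  end: number;
  score: number;
}

// Simplified span search: try every start/end pair with start <= end
// and keep the pair with the largest summed logit.
function bestSpan(startLogits: number[], endLogits: number[], maxAnswerLength = 30): Span | null {
  let best: Span | null = null;
  for (let start = 0; start < startLogits.length; start++) {
    // Limit how far the end index may run past the start index.
    const lastEnd = Math.min(endLogits.length, start + maxAnswerLength);
    for (let end = start; end < lastEnd; end++) {
      const score = startLogits[start] + endLogits[end];
      if (best === null || score > best.score) {
        best = { start, end, score };
      }
    }
  }
  return best;
}

// Made-up logits for a four-token sequence.
const span = bestSpan([0.1, 2.0, 0.3, 0.2], [0.0, 0.5, 3.0, 0.1]);
console.log("best span", span);
```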

The answers are then returned to the question function in functions.ts, and the resulting string is returned and populated into the Excel cell.

export async function question(question: string, context: string): Promise<string> {
  const result = await inferenceQuestion(question, context);
  if (result.length > 0) {
    console.log(result[0].text);
    return result[0].text.toString();
  }
  return "Unable to find answer";
}

Now you can run the command below to build the add-in and sideload it to your Excel spreadsheet!

// Command to run on the web.
// Replace "{url}" with the URL of an Excel document.
npm run start:web -- --document {url}

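That covers the ORT.Question() custom function in detail. Next we break down how ORT.Sentiment() is implemented.

The `inferenceSentiment.ts` file

The inferenceSentiment.ts file contains the logic for running inference and getting the sentiment of the text in an Excel cell. The code here is adapted from this example. Let's take a closer look at how this part works.

First, let's import the required packages. As you will see in this tutorial, the bertProcessing functions create our model input. bert_tokenizer is the JavaScript tokenizer for BERT models. onnxruntime-web enables inference in JavaScript in the browser.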

/* eslint-disable no-undef */
import * as bertProcessing from "./bertProcessing";
import * as ort from "onnxruntime-web";
import { EMOJIS } from "./emoji";
import { loadTokenizer } from "./bert_tokenizer";

Now let's load the quantized BERT model that has been fine-tuned for sentiment analysis, then create the ort.InferenceSession.SessionOptions and the ort.InferenceSession.

export async function inferenceSentiment(text: string) {
  // Set model path.
  const model: string = "./xtremedistill-go-emotion-int8.onnx";
  const options: ort.InferenceSession.SessionOptions = {
    executionProviders: ["wasm"],
    // executionProviders: ['webgl']
    graphOptimizationLevel: "all",
  };
  console.log("Creating session");
  const session = await ort.InferenceSession.create(model, options);

Next, we tokenize the text to create the model_input and send it to session.run together with the output label output_0 to get the inference result.

  // Get encoded ids from text tokenizer.
  const tokenizer = loadTokenizer();
  const encoded = await tokenizer.then((t) => {
    return t.tokenize(text);
  });
  console.log("encoded", encoded);
  const model_input = await bertProcessing.create_model_input(encoded);
  console.log("run session");
  const output = await session.run(model_input, ["output_0"]);
  const outputResult = output["output_0"].data;
  console.log("outputResult", outputResult);

Next, we parse the output to get the top results and map each one to its label, score, and emoji.

  let probs = [];
  for (let i = 0; i < outputResult.length; i++) {
    let sig = bertProcessing.sigmoid(outputResult[i]);
    probs.push(Math.floor(sig * 100));
  }
  console.log("probs", probs);
  const result = [];
  for (var i = 0; i < EMOJIS.length; i++) {
    const t = [EMOJIS[i], probs[i]];
    result[i] = t;
  }
  result.sort(bertProcessing.sortResult);
  console.log(result);
  const result_list = [];
  result_list[0] = ["Emotion", "Score"];
  for (i = 0; i < 6; i++) {
    result_list[i + 1] = result[i];
  }
  console.log(result_list);
  return result_list;
}
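
The bertProcessing.sigmoid and bertProcessing.sortResult helpers are not shown in this tutorial. A minimal sketch of what they might look like follows, assuming a standard logistic sigmoid and a comparator that sorts by score in descending order. The project's own implementations may differ, and the labels and logits here are made up.

```typescript
// Assumed implementation: standard logistic function.
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

// A [label, score] pair, mirroring the [EMOJIS[i], probs[i]] rows above.
type Scored = [string, number];

// Assumed comparator: highest score first.
function sortResult(a: Scored, b: Scored): number {
  return b[1] - a[1];
}

// Hypothetical labels and raw model outputs.
const labels = ["joy", "anger", "neutral"];
const rawScores = [2.0, -1.0, 0.0];

// Convert each raw output to a whole-number percentage, then sort.
const scored: Scored[] = labels.map(
  (label, i): Scored => [label, Math.floor(sigmoid(rawScores[i]) * 100)]
);
scored.sort(sortResult);
console.log("top emotion:", scored[0][0], scored[0][1]);
```

Applying a sigmoid to each output independently, rather than a softmax across all outputs, lets several emotions score highly at once, which suits a multi-label emotion model.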

The result_list is returned and parsed so that the top result is returned to the Excel cell.

export async function sentiment(text: string): Promise<string> {
  const result = await inferenceSentiment(text);
  console.log(result[1][0]);
  return result[1][0].toString();
}

Now you can run the command below to build the add-in and sideload it to your Excel spreadsheet!

// Command to run on the web.
// Replace "{url}" with the URL of an Excel document.
npm run start:web -- --document {url}

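Conclusion

Here we went over the logic needed to create custom functions in an Excel add-in using JavaScript with ONNX Runtime Web and open source models. From here you can take this logic and adapt it to your own model or use case. Be sure to check out the full source code, which includes the tokenizers and the pre- and post-processing needed to complete the tasks above.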
