影像辨識 Tool

使用 Automation Tool 上傳圖片並辨識影像

此範例會讓用戶上傳一張圖片，並讓 AI 依據用戶的問題加上圖片去回答。整個範例會建立 3 個 Processor 來完成辨識影像的流程，最後藉由預覽來驗證功能。

4-1 Validate Payload
4-2 LLM Completion
4-3 Response
4-4 預覽

前置作業：可以支援辨識影像的 Completion Model、一張圖片

4-1. Validate Payload

在 Schema 輸入需要驗證的 JSON Schema，此處我們需要讓用戶輸入一個問題，因此先用 question。

{
  "type": "object",
  "properties": {
    "question": {
      "type": "string"
    }
  },
  "required": [
    "question"
  ]
}

Note: 詳細 JSON Schema 寫法，請參考JSON Schema

File Alias: 由於需要用戶上傳圖片辨識，此處需要新增一個 file，File Type 請選擇 image，Alias 可以自訂，範例使用 img 作為別名。

4-2. LLM Completion

在 Completion Model 選擇已經建好的 model 或點擊 Add 新增 model（注意模型需要支援 image），提示語 Prompt 的部分選擇 Template，並提示 LLM 觀察圖片並回答用戶輸入的問題。

你是影像辨識的助手，負責觀察使用者上傳的圖片，並且回答使用者的問題。

* 這是使用者的問題: '{{{prevPayload.question}}} '

Output Schema: 將 LLM 的回答存到 answer

{
  "type": "object",
  "properties": {
    "answer": {
      "type": "string",
      "description": "answer 是你要回答用戶的答案"
    }
  },
  "required": [
    "answer"
  ]
}

MaxToken: 設定消耗 Token 上限為必填。

Blob IDs: 請選擇 Expression 並輸入以下範例。

(() => {
  // return the result of the expression
  return img.blobId;
})()

Note: 可以到模型供應商查詢模型支援的功能，以 OpenAI 為例，gpt-4o 可以支援 input image。

platform.openai.com

4-3. Response

利用 Payload 屬性輸出與訊息對應的結構化資料。選擇 Expression 並填入以下範例。

(() => {
  // return the result of the expression
  return {answer: answer}
})()

4-4.預覽 Automation Tool

點擊 preview 來實際存取該 API 測試，類型請選 Form-data，在 Form data 的地方，可以看到剛剛規定需要輸入的 question，此處輸入問題後並點擊底下的 Add 上傳一張圖片，上傳後點擊 Send。底下 Response Body 可以成功讓 LLM 辨識圖片並描述該張圖片內容作為 answer:

"answer": "這是皮卡丘，一個來自《寶可夢》系列的虛構角色。"

使用 Automation Tool 上傳圖片並辨識影像​

4-1. Validate Payload​

4-2. LLM Completion​

4-3. Response​

4-4.預覽 Automation Tool​

使用 Automation Tool 上傳圖片並辨識影像

4-1. Validate Payload

4-2. LLM Completion

4-3. Response

4-4.預覽 Automation Tool