Structured Extraction API

Overview

The Structured Extraction API extracts key information from PDF files and returns it as structured data. The fields to extract are defined by a custom JSON Schema supplied with each request.

Primary Use Cases

  • Resume information extraction from PDFs
  • Key information extraction from contract documents
  • Structured data extraction from invoice PDFs
  • Organization of key information from report documents
  • Digitization of certificate documents

API Endpoint

POST https://freetier-01.cn-hangzhou.cluster.matrixonecloud.cn/byoa/api/v1/explore/extract

Request Headers

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| moi-key | String | Yes | API key |
| Content-Type | String | Yes | Fixed value: application/json |
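
As a minimal sketch, the two headers above can be assembled from an environment variable so that the API key stays out of source code. The variable name `MOI_API_KEY` and the helper `build_headers` are assumptions made for illustration; they are not part of the API.

```python
import os


def build_headers(env_var: str = "MOI_API_KEY") -> dict:
    """Return the request headers, reading the API key from the environment.

    The environment variable name is a hypothetical choice, not an API convention.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {
        "moi-key": api_key,
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the `headers` argument of `requests.post`.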

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file_path | String | Yes | URL of the PDF file |
| json_schema | Object | Yes | JSON Schema that defines the fields to extract |
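
The request body can also be built programmatically. The sketch below is one possible approach, assuming every field is a required string, which is the only shape shown in this document; whether the API accepts other JSON Schema types is not confirmed here. The helper names `build_schema` and `build_request_body` are hypothetical.

```python
def build_schema(title: str, fields: dict) -> dict:
    """Build a JSON Schema from a mapping of field name to description.

    Assumption: every field is typed as a string and marked as required.
    """
    return {
        "title": title,
        "type": "object",
        "properties": {
            name: {"type": "string", "description": description}
            for name, description in fields.items()
        },
        "required": list(fields),
    }


def build_request_body(file_path: str, schema: dict) -> dict:
    """Combine the two required request parameters into one request body."""
    return {"file_path": file_path, "json_schema": schema}


schema = build_schema(
    "ExtractInfo",
    {"email": "Contact email", "book_name": "Book name"},
)
body = build_request_body("http://120.26.117.79:8080/files/xiyou.pdf", schema)
```

Here `body` has the same structure as the `data` dictionary in the Python example that follows.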

Request Example

Python Example

import requests

url = "https://freetier-01.cn-hangzhou.cluster.matrixonecloud.cn/byoa/api/v1/explore/extract"
headers = {
    "moi-key": "your-api-key",
    "Content-Type": "application/json"
}

data = {
    "file_path": "http://120.26.117.79:8080/files/xiyou.pdf",
    "json_schema": {
        "title": "ExtractInfo",
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Contact email"
            },
            "book_name": {
                "type": "string",
                "description": "Book name"
            }
        },
        "required": [
            "email",
            "book_name"
        ]
    }
}

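# Send the request; the 60-second timeout is an illustrative value, not an API requirement
response = requests.post(url, headers=headers, json=data, timeout=60)
response.raise_for_status()  # raise an exception on HTTP 4xx/5xx responses
print(response.json())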

cURL Example

curl --location --request POST 'https://freetier-01.cn-hangzhou.cluster.matrixonecloud.cn/byoa/api/v1/explore/extract' \
--header 'moi-key: your-api-key' \
--header 'Content-Type: application/json' \
--data-raw '{
    "file_path": "http://120.26.117.79:8080/files/xiyou.pdf",
    "json_schema": {
        "title": "ExtractInfo",
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Contact email"
            },
            "book_name": {
                "type": "string",
                "description": "Book name"
            }
        },
        "required": [
            "email",
            "book_name"
        ]
    }
}'

Response Format

{
    "code": "ok",
    "msg": "ok",
    "data": {
        "req_id": "607654fa-f12b-4415-a4c4-38dd07c7926e",
        "msg": "success",
        "file_path": "http://120.26.117.79:8080/files/xiyou.pdf",
        "file_size_bytes": 1562886,
        "results": {
            "email": "466698432@qq.com",
            "book_name": "Journey to the West"
        }
    }
}
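
In the response above, the extracted values appear under `data.results`, keyed by the property names from the submitted schema. The following sketch shows one way to pull them out of a decoded response. It assumes that any `code` other than `"ok"` indicates a failure; the error format is not documented here, so that check is a guess. `ExtractionError` and `parse_extract_response` are hypothetical names.

```python
class ExtractionError(Exception):
    """Raised when the response does not look like a successful extraction."""


def parse_extract_response(payload: dict) -> dict:
    """Return the extracted fields from a decoded JSON response.

    Assumption: a top-level "code" other than "ok" means the request failed.
    """
    if payload.get("code") != "ok":
        raise ExtractionError(f"request failed: {payload.get('msg')!r}")
    data = payload.get("data") or {}
    results = data.get("results")
    if not isinstance(results, dict):
        raise ExtractionError("response contains no 'results' object")
    return results


# Sample payload copied from the response shown above
sample = {
    "code": "ok",
    "msg": "ok",
    "data": {
        "req_id": "607654fa-f12b-4415-a4c4-38dd07c7926e",
        "msg": "success",
        "file_path": "http://120.26.117.79:8080/files/xiyou.pdf",
        "file_size_bytes": 1562886,
        "results": {
            "email": "466698432@qq.com",
            "book_name": "Journey to the West",
        },
    },
}
fields = parse_extract_response(sample)
print(fields["book_name"])
```

With a live request, the function would be called as `parse_extract_response(response.json())`.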