Document Processing

Process documents using Optical Character Recognition (OCR) to extract structured data according to predefined templates.

Template Management

Templates must be created through the web interface. You cannot create or modify templates via API.

Premium Feature: Multi-document Processing

Sending multiple files in a single request, or sending a ZIP archive, requires a premium subscription.

Endpoint

POST /documents/parse

Authentication

Include your API key in the request header:

x-api-key: your_api_key_here
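
As a minimal sketch of building this header in Python, the helper below reads the key from an environment variable so that it never appears in source code. The function name `auth_headers` is hypothetical, and the variable name `PARSELYZE_API_KEY` is an assumption taken from the Python example on this page.

```python
import os


def auth_headers(env_var: str = "PARSELYZE_API_KEY") -> dict[str, str]:
    """Return the x-api-key header, reading the key from an environment variable."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set {env_var} before calling the API")
    return {"x-api-key": api_key}
```

The returned dictionary can be passed as the `headers` argument of an HTTP client such as `requests`.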

Request Body

Send as multipart/form-data with these fields:

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| files | File(s) | Document(s) to analyze (PDF, PNG, JPEG, ZIP) | Yes |
| templateId | string (UUID) | ID of existing template | Yes |
| language | string | Language hint for OCR (optional) | No |
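
The non-file fields can be assembled and checked on the client before sending. The sketch below is one possible approach, not part of the API: `build_form_fields` is a hypothetical helper that rejects a malformed `templateId` locally and includes `language` only when it is given.

```python
import uuid
from typing import Optional


def build_form_fields(template_id: str, language: Optional[str] = None) -> dict[str, str]:
    """Build the text fields of the multipart/form-data request body."""
    # uuid.UUID raises ValueError if the string cannot be parsed as a UUID
    uuid.UUID(template_id)
    fields = {"templateId": template_id}
    if language:
        # language is optional, so it is omitted unless the caller sets it
        fields["language"] = language
    return fields
```

With `requests`, the result would be passed as the `data` argument while the documents go in `files`.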

Language (Optional)

You can pass a language hint to improve OCR accuracy for a specific language. The hint is generally not needed; set the language parameter only when you want to force a particular language.

Supported languages include: en (English), fr (French), es (Spanish), de (German), it (Italian), pt (Portuguese), ru (Russian), zh (Chinese), ja (Japanese), ko (Korean), ar (Arabic), and many more.

Examples:

  • "en" - English documents
  • "fr" - French documents
  • "zh" - Chinese documents

Processing Types

The endpoint automatically detects and handles:

  1. Single Document: One PDF or image file
  2. Multiple Documents: Multiple files (premium only)
  3. ZIP Archive: ZIP containing multiple documents (premium only)
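
Detection happens on the server, but a client can predict which processing type a request will fall into, for example to warn a non-premium user before uploading. The sketch below is a client-side guess based only on file names; `processing_type` is a hypothetical helper, and the server's actual detection logic is not documented here.

```python
from pathlib import Path


def processing_type(paths: list[str]) -> tuple[str, bool]:
    """Return (processing type, whether a premium subscription is required)."""
    if not paths:
        raise ValueError("at least one file is required")
    # Any ZIP archive in the request counts as ZIP processing
    if any(Path(p).suffix.lower() == ".zip" for p in paths):
        return ("zip", True)
    if len(paths) > 1:
        return ("multiple", True)
    return ("single", False)
```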

Example Requests

Single Document

curl -X POST https://api.parselyze.com/documents/parse \
  -H "x-api-key: your_api_key_here" \
  -F "files=@invoice.pdf" \
  -F "templateId=<YOUR_TEMPLATE_ID>"

Single Document with Language Hint

curl -X POST https://api.parselyze.com/documents/parse \
  -H "x-api-key: your_api_key_here" \
  -F "files=@facture_francaise.pdf" \
  -F "templateId=<YOUR_TEMPLATE_ID>" \
  -F "language=fr"

Multiple Documents with Language Hint (Premium)

curl -X POST https://api.parselyze.com/documents/parse \
  -H "x-api-key: your_api_key_here" \
  -F "files=@document1.pdf" \
  -F "files=@document2.jpg" \
  -F "templateId=<YOUR_TEMPLATE_ID>" \
  -F "language=en"

JavaScript/Node.js Examples

import { Parselyze } from "parselyze";

const parselyze = new Parselyze("plz_xxxxxxxx...xxxxxx");

(async function () {
  console.log("Start parsing document...");

  const result = await parselyze.documents.parse({
    files: ["./invoice.pdf"],
    templateId: "<YOUR_TEMPLATE_ID>",
  });

  console.log("Parsing complete:", result);
})();

Python Examples

import os
from contextlib import ExitStack

import requests


def analyze_multiple_documents():
    template_id = '<YOUR_TEMPLATE_ID>'
    documents = [
        ('document1.pdf', 'application/pdf'),
        ('document2.jpg', 'image/jpeg'),
    ]
    data = {
        'templateId': template_id,
        'language': 'en',  # Optional
    }

    # ExitStack closes every opened file, even if the request raises
    with ExitStack() as stack:
        files = [
            ('files', (name, stack.enter_context(open(name, 'rb')), mime))
            for name, mime in documents
        ]
        response = requests.post(
            "https://api.parselyze.com/documents/parse",
            headers={"x-api-key": os.getenv('PARSELYZE_API_KEY')},
            files=files,
            data=data,
        )

    response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
    result = response.json()
    print("Multi-document analysis:", result)
    return result

Response Format

Single Document

{
  "result": {
    "invoice": {
      "number": "INV-2025-001",
      "date": "26/05/2025",
      "total": 1250.75,
      "vendor": "Acme Corporation"
    }
  },
  "pageCount": 1,
  "pageUsed": 1,
  "pageRemaining": 999
}

Multiple Documents

{
  "results": [
    {
      "filename": "document1.pdf",
      "result": {
        "invoice": {
          "number": "INV-2025-001",
          "total": 1250.75
        }
      }
    },
    {
      "filename": "document2.jpg",
      "result": {
        "invoice": {
          "number": "INV-2025-002",
          "total": 850.50
        }
      }
    }
  ],
  "totalPageCount": 3,
  "pageUsed": 3,
  "pageRemaining": 997
}
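
Because the single-document response uses a top-level `result` key while the multi-document response uses a `results` list, client code may find it convenient to normalize both into one shape. The sketch below assumes only the two response shapes shown above; `normalize_results` is a hypothetical helper, and the `single_filename` argument exists because the single-document response carries no filename.

```python
from typing import Any, Optional


def normalize_results(
    payload: dict[str, Any], single_filename: Optional[str] = None
) -> list[tuple[Optional[str], dict[str, Any]]]:
    """Return a list of (filename, extracted data) pairs for either response shape."""
    if "results" in payload:
        # Multi-document shape: one entry per processed file
        return [(item.get("filename"), item["result"]) for item in payload["results"]]
    # Single-document shape: the caller supplies the filename, if it is known
    return [(single_filename, payload["result"])]
```

Downstream code can then loop over the pairs without checking which kind of request produced the response.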

Template Requirements

Templates must be created through the web interface before use. You cannot create templates via API.

Supported File Types

  • PDF: Multi-page documents supported
  • Images: PNG, JPEG formats
  • ZIP: Archive containing multiple documents (premium only)
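
A client can reject unsupported files before uploading them. The sketch below maps file extensions to content types for the formats listed above. The helper name `content_type_for` is hypothetical, and treating both `.jpg` and `.jpeg` as JPEG is an assumption based on the `document2.jpg` example on this page.

```python
from pathlib import Path

# Extension-to-content-type map for the documented formats
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".zip": "application/zip",
}


def content_type_for(path: str) -> str:
    """Return the content type for a supported file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
    return CONTENT_TYPES[suffix]
```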

Page Quota

Your monthly page quota is consumed based on:

  • PDF files: 1 page per physical page in the document
  • Image files: 1 page per image
  • ZIP files: Sum of all pages in contained documents
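
The counting rules above can be sketched as a small estimator. `estimate_pages` is a hypothetical helper: it takes PDF page counts as input rather than reading the files, and for a ZIP archive the caller is assumed to pass the names of the documents it contains.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def estimate_pages(names: list[str], pdf_pages: dict[str, int]) -> int:
    """Estimate quota usage: 1 page per image, the given page count per PDF."""
    total = 0
    for name in names:
        suffix = Path(name).suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            total += 1
        elif suffix == ".pdf":
            # Page counts are supplied by the caller; a missing entry raises KeyError
            total += pdf_pages[name]
        else:
            raise ValueError(f"Cannot estimate pages for {name}")
    return total
```

For example, assuming `document1.pdf` has two pages, `estimate_pages(["document1.pdf", "document2.jpg"], {"document1.pdf": 2})` returns 3.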

Error Responses

| Code | Error | Description |
|------|-------|-------------|
| 400 | Bad Request | Missing required fields or invalid file format |
| 401 | Unauthorized | Invalid or missing API key |
| 403 | Forbidden | Feature requires premium subscription or testing mode restricted |
| 404 | Not Found | Template not found or not accessible |
| 413 | Payload Too Large | File size exceeds limit (50MB) |
| 429 | Too Many Requests | Rate limit exceeded |
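
One way to act on these codes in client code is to translate a status code into a message and a retry decision. The sketch below is illustrative only: `describe_error` is a hypothetical helper, and treating 429 as the only retryable status is an assumption, since the page does not document retry behavior.

```python
# Descriptions copied from the error table above
ERROR_DESCRIPTIONS = {
    400: "Missing required fields or invalid file format",
    401: "Invalid or missing API key",
    403: "Feature requires premium subscription or testing mode restricted",
    404: "Template not found or not accessible",
    413: "File size exceeds limit (50MB)",
    429: "Rate limit exceeded",
}

# Assumption: only rate-limit errors are worth retrying after a delay
RETRYABLE_STATUSES = frozenset({429})


def describe_error(status_code: int) -> tuple[str, bool]:
    """Return (description, should_retry) for an HTTP status code."""
    message = ERROR_DESCRIPTIONS.get(
        status_code, f"Unexpected HTTP status {status_code}"
    )
    return message, status_code in RETRYABLE_STATUSES
```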
