Asynchronous Processing

Process documents asynchronously for large files or batch operations. The async API allows you to submit jobs and receive results via webhooks or polling.

When to Use Async Processing
  • Large documents that take longer to process
  • Batch processing of multiple documents
  • Integration with background job systems
  • Webhook-based workflows
  • Retrieving results later, or more than once

Not sure which to choose? See the Processing Overview for a detailed comparison.

How It Works

  1. Submit Job: Upload document and create async job
  2. Job Processing: Document is queued and processed in background
  3. Get Results: Receive results via webhook or poll the job status
  4. Retrieve Data: Access the processed data and download results

Endpoints

Submit Async Job

Create a new async processing job.

POST /v1/documents/parse/async

Authentication

Include your API key in the request header:

x-api-key: your_api_key_here

Request Body

Send as multipart/form-data with these fields:

Field        Type            Description                            Required
file         File            Document to analyze (PDF, PNG, JPEG)   Yes
templateId   string (UUID)   ID of an existing template             Yes
language     string          Language hint for OCR                  No
Single File Only

Async processing currently supports one file per job. For multiple files, submit separate jobs.
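Because each job carries exactly one file, batch processing comes down to looping over files and submitting one job each. A minimal sketch, where `submit` stands in for whatever function performs the POST above (its name and return shape are assumptions for illustration):

```python
from typing import Callable, Dict, List

def submit_batch(paths: List[str], submit: Callable[[str], dict]) -> Dict[str, str]:
    """Submit one async job per file and map each path to its job ID.

    `submit` is expected to POST the file to /v1/documents/parse/async
    and return the parsed JSON response (which contains "jobId").
    """
    return {path: submit(path)["jobId"] for path in paths}
```

Keeping the returned path-to-jobId mapping makes it easy to poll or reconcile webhook events per file later.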

Idempotency

Parselyze automatically handles duplicate submissions to prevent redundant processing:

  • Same File Detection: If you submit the exact same file with the same template, the API returns the existing job instead of creating a new one
  • File Comparison: Files are compared using content fingerprinting, not file names
  • Automatic Deduplication: No configuration needed; idempotency is built in

Avoid Redundant Processing

If you accidentally submit the same document twice, you'll receive the original job ID and its current status. This prevents duplicate charges and unnecessary processing.
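The exact fingerprinting algorithm isn't documented; a content hash such as SHA-256 (an assumption here, purely for illustration) captures the key property: identical bytes produce the same fingerprint no matter what the file is called.

```python
import hashlib

def content_fingerprint(data: bytes) -> str:
    # Hash the raw bytes only -- the file name plays no part,
    # mirroring "content fingerprinting, not file names".
    return hashlib.sha256(data).hexdigest()
```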

Example Request

curl -X POST https://api.parselyze.com/v1/documents/parse/async \
  -H "x-api-key: plz_xxxxxxxx...xxxxxx" \
  -F "file=@large_document.pdf" \
  -F "templateId=YOUR_TEMPLATE_ID"

Response

New Job:

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "message": "Job queued for processing",
  "createdAt": "2026-01-27T10:30:00Z"
}

Existing Job (Duplicate File):

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "message": "This document has already been submitted. Returning existing job.",
  "createdAt": "2026-01-27T10:28:15Z"
}

Idempotency

When you submit the exact same file with the same template, you always get back the original job. Use GET /v1/jobs/:jobId to get the full job details, including the result.

Status Values:

  • pending: Job is queued and waiting to be processed
  • processing: Job is currently being processed
  • completed: Job finished successfully
  • failed: Job failed after all retries
  • retrying: Job failed and will be retried
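In client code it helps to separate terminal states from in-flight ones: only completed and failed end the lifecycle, while pending, processing, and retrying all mean "check again later". A small helper along those lines:

```python
TERMINAL_STATUSES = {"completed", "failed"}
IN_FLIGHT_STATUSES = {"pending", "processing", "retrying"}

def is_terminal(status: str) -> bool:
    """True once the job will no longer change state on its own."""
    return status in TERMINAL_STATUSES
```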

Get Job Status

Retrieve the current status and result of a job.

GET /v1/jobs/:jobId

Example Request

curl -X GET https://api.parselyze.com/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
  -H "x-api-key: plz_xxxxxxxx...xxxxxx"

Response - Processing

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "fileName": "large_document.pdf",
  "templateId": "YOUR_TEMPLATE_ID",
  "result": null,
  "error": null,
  "pageCount": null,
  "attempts": 1,
  "createdAt": "2026-01-27T10:30:00Z",
  "startedAt": "2026-01-27T10:30:05Z",
  "completedAt": null
}

Response - Completed

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "fileName": "large_document.pdf",
  "templateId": "YOUR_TEMPLATE_ID",
  "result": {
    "invoice": {
      "number": "INV-2025-001",
      "date": "26/05/2025",
      "total": 1250.75,
      "vendor": "Acme Corporation"
    }
  },
  "error": null,
  "pageCount": 5,
  "attempts": 1,
  "createdAt": "2026-01-27T10:30:00Z",
  "startedAt": "2026-01-27T10:30:05Z",
  "completedAt": "2026-01-27T10:30:45Z"
}

Response - Failed

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "fileName": "large_document.pdf",
  "templateId": "YOUR_TEMPLATE_ID",
  "result": null,
  "error": "Failed to extract text from document",
  "pageCount": null,
  "attempts": 3,
  "createdAt": "2026-01-27T10:30:00Z",
  "startedAt": "2026-01-27T10:30:05Z",
  "completedAt": "2026-01-27T10:32:15Z"
}

JavaScript/Node.js Examples

Submit and Poll for Results

import { Parselyze } from "parselyze";

const parselyze = new Parselyze(process.env.PARSELYZE_API_KEY);

async function processDocumentAsync() {
  // Submit job
  const job = await parselyze.documents.parseAsync({
    file: "./large_document.pdf",
    templateId: "YOUR_TEMPLATE_ID"
  });

  console.log(`Job submitted: ${job.jobId}`);

  // Poll for completion
  let result;
  while (true) {
    result = await parselyze.jobs.get(job.jobId);

    if (result.status === "completed") {
      console.log("Job completed:", result.result);
      break;
    } else if (result.status === "failed") {
      console.error("Job failed:", result.error);
      break;
    }

    console.log(`Status: ${result.status}, waiting...`);
    await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5s
  }
}

processDocumentAsync();

Python Examples

Submit and Poll

import requests
import time
import os

API_KEY = os.getenv('PARSELYZE_API_KEY')
BASE_URL = 'https://api.parselyze.com'

def process_async():
    # Submit job
    with open('./large_document.pdf', 'rb') as file:
        files = {'file': file}
        data = {'templateId': 'YOUR_TEMPLATE_ID'}

        response = requests.post(
            f"{BASE_URL}/v1/documents/parse/async",
            headers={"x-api-key": API_KEY},
            files=files,
            data=data
        )

    job = response.json()
    job_id = job['jobId']
    print(f"Job submitted: {job_id}")

    # Poll for completion
    while True:
        response = requests.get(
            f"{BASE_URL}/v1/jobs/{job_id}",
            headers={"x-api-key": API_KEY}
        )

        result = response.json()
        status = result['status']

        if status == 'completed':
            print("Job completed:", result['result'])
            break
        elif status == 'failed':
            print("Job failed:", result['error'])
            break

        print(f"Status: {status}, waiting...")
        time.sleep(5)

process_async()

Retry Policy

Jobs automatically retry on failure with exponential backoff:

  • Max Attempts: 3 (initial attempt + 2 retries)
  • Backoff Strategy: Exponential starting at 5 seconds
    • 1st retry: after 5 seconds
    • 2nd retry: after 25 seconds (5s × 5)
  • Retry Status: Job status changes to retrying during retry attempts
  • Attempts Counter: The attempts field in the job response shows how many attempts have been made
  • Final Status: failed after all retries are exhausted

Smart Retry Logic

The system only retries on transient errors (network issues, temporary unavailability). Permanent errors (invalid file format, missing template) fail immediately without retrying.
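The delays above follow a base of 5 seconds multiplied by 5 per retry. A one-line sketch of that schedule (the formula is inferred from the two documented delays, not an official specification):

```python
def retry_delay(retry_number: int, base: float = 5.0, factor: float = 5.0) -> float:
    # retry 1 -> 5 s, retry 2 -> 25 s (5 s x 5)
    return base * factor ** (retry_number - 1)
```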

Best Practices

  1. Use Webhooks: Configure webhooks to receive automatic notifications when jobs complete - much more efficient than polling (see Webhook Guide)
  2. Leverage Idempotency: Don't worry about duplicate submissions - the API automatically returns existing jobs for identical files
  3. Handle Retries: Jobs may take longer than expected during retries; implement appropriate timeout logic
  4. Store Job IDs: Save job IDs in your database for audit trails and later retrieval
  5. Polling Interval: If not using webhooks, don't poll too frequently (recommended: every 5-10 seconds)
  6. Handle Failures: Implement error handling for failed jobs and check the error field for details
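Practices 3, 5, and 6 can be combined into a single polling helper. A sketch with the HTTP call injected as `get_job` so the loop itself stays testable; the names, defaults, and timeout value are illustrative, not part of the API:

```python
import time
from typing import Callable

def poll_until_done(get_job: Callable[[str], dict], job_id: str,
                    interval: float = 5.0, timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll GET /v1/jobs/:jobId (via get_job) until a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job  # caller checks job["error"] on failure
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
```

The generous default timeout leaves headroom for the retry schedule above, and injecting `sleep` lets unit tests run without real waiting.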

Error Responses

Code   Error               Description
400    Bad Request         Invalid file format or missing required fields
401    Unauthorized        Invalid or missing API key
404    Not Found           Job or template not found
413    Payload Too Large   File exceeds 50MB limit
429    Too Many Requests   Rate limit exceeded
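Of these, only 429 is transient; the others indicate a problem with the request itself, so resubmitting it unchanged won't help. A sketch of that distinction (the classification is our reading of the table above, not metadata returned by the API):

```python
RETRYABLE_CODES = {429}                  # back off, then resubmit later
PERMANENT_CODES = {400, 401, 404, 413}   # fix the request before resubmitting

def should_resubmit(status_code: int) -> bool:
    """True only for errors that can clear up on their own."""
    return status_code in RETRYABLE_CODES
```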

Rate Limits

Async job submission is subject to the same rate limits as synchronous processing:

  • Free Plan: 10 requests/minute
  • Starter Plan: 30 requests/minute
  • Pro Plan: 60 requests/minute
  • Business Plan: 60 requests/minute
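To stay under a plan's limit when submitting many jobs, space requests at least 60/limit seconds apart. A tiny helper using the per-plan figures from the list above (the lowercase plan keys are our own naming):

```python
PLAN_LIMITS = {"free": 10, "starter": 30, "pro": 60, "business": 60}  # requests/minute

def min_interval_seconds(plan: str) -> float:
    """Smallest safe gap between submissions for a given plan."""
    return 60.0 / PLAN_LIMITS[plan]
```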

See Also