Asynchronous Processing
Process documents asynchronously for large files or batch operations. The async API lets you submit jobs and receive results via webhooks or polling. Use async processing when you have:
- Large documents that take longer to process
- Batch processing of multiple documents
- Integration with background job systems
- Webhook-based workflows
- Need to retrieve results multiple times or later
Not sure which to choose? See the Processing Overview for a detailed comparison.
How It Works
- Submit Job: Upload document and create async job
- Job Processing: Document is queued and processed in background
- Get Results: Receive results via webhook or poll the job status
- Retrieve Data: Access the processed data and download results
Endpoints
Submit Async Job
Create a new async processing job.
POST /v1/documents/parse/async
Authentication
Include your API key in the request header:
x-api-key: your_api_key_here
Request Body
Send as multipart/form-data with these fields:
| Field | Type | Description | Required |
|---|---|---|---|
| file | File | Document to analyze (PDF, PNG, JPEG) | Yes |
| templateId | string (UUID) | ID of existing template | Yes |
| language | string | Language hint for OCR (optional) | No |
Async processing currently supports one file per job. For multiple files, submit separate jobs.
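For a folder of documents, that means one request per file. A minimal sketch, assuming a hypothetical `submit_job(path)` helper that POSTs the file to the endpoint above and returns the job ID:

```python
from pathlib import Path

def submit_all(folder: str, submit_job) -> dict:
    """Submit one async job per PDF in `folder`; returns {filename: job_id}.
    `submit_job(path)` is an assumed helper wrapping POST /v1/documents/parse/async."""
    return {p.name: submit_job(p) for p in sorted(Path(folder).glob("*.pdf"))}
```

Because each file becomes its own job, you get one job ID per document to store and poll independently.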
Idempotency
Parselyze automatically handles duplicate submissions to prevent redundant processing:
- Same File Detection: If you submit the exact same file with the same template, the API returns the existing job instead of creating a new one
- File Comparison: Files are compared using content fingerprinting, not file names
- Automatic Deduplication: No configuration is needed; idempotency is built in
If you accidentally submit the same document twice, you'll receive the original job ID and its current status. This prevents duplicate charges and unnecessary processing.
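Content fingerprinting can be pictured as hashing the file's bytes together with the template ID, so the file's name never matters. The sketch below is illustrative only, not Parselyze's actual implementation:

```python
import hashlib

def fingerprint(file_bytes: bytes, template_id: str) -> str:
    """Illustrative dedup key: same bytes + same template -> same key,
    regardless of the file's name."""
    return hashlib.sha256(file_bytes + template_id.encode()).hexdigest()

# The same content submitted under two different names yields one key...
a = fingerprint(b"%PDF-1.7 sample", "tpl-123")
b = fingerprint(b"%PDF-1.7 sample", "tpl-123")
assert a == b

# ...while a different template (or different bytes) yields a new key.
c = fingerprint(b"%PDF-1.7 sample", "tpl-456")
assert a != c
```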
Example Request
```shell
curl -X POST https://api.parselyze.com/v1/documents/parse/async \
  -H "x-api-key: plz_xxxxxxxx...xxxxxx" \
  -F "file=@large_document.pdf" \
  -F "templateId=YOUR_TEMPLATE_ID"
```
Response
New Job:
```json
{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "message": "Job queued for processing",
  "createdAt": "2026-01-27T10:30:00Z"
}
```
Existing Job (Duplicate File):
```json
{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "message": "This document has already been submitted. Returning existing job.",
  "createdAt": "2026-01-27T10:28:15Z"
}
```
When you submit the exact same file with the same template, you always get back the original job. Use GET /v1/jobs/:jobId to get the full job details including the result.
Status Values:
- `pending`: Job is queued and waiting to be processed
- `processing`: Job is currently being processed
- `completed`: Job finished successfully
- `failed`: Job failed after all retries
- `retrying`: Job failed and will be retried
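When polling, these five values split into two groups: `completed` and `failed` are terminal, while `pending`, `processing`, and `retrying` mean the job is still in flight. A tiny helper of our own (not part of any SDK) keeps that check in one place:

```python
TERMINAL_STATUSES = {"completed", "failed"}

def is_terminal(status: str) -> bool:
    """True once a job will no longer change state on its own."""
    return status in TERMINAL_STATUSES
```

A polling loop can then exit on `is_terminal(job["status"])` instead of comparing against individual strings.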
Get Job Status
Retrieve the current status and result of a job.
GET /v1/jobs/:jobId
Example Request
```shell
curl -X GET https://api.parselyze.com/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
  -H "x-api-key: plz_xxxxxxxx...xxxxxx"
```
Response - Processing
```json
{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "fileName": "large_document.pdf",
  "templateId": "YOUR_TEMPLATE_ID",
  "result": null,
  "error": null,
  "pageCount": null,
  "attempts": 1,
  "createdAt": "2026-01-27T10:30:00Z",
  "startedAt": "2026-01-27T10:30:05Z",
  "completedAt": null
}
```
Response - Completed
```json
{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "fileName": "large_document.pdf",
  "templateId": "YOUR_TEMPLATE_ID",
  "result": {
    "invoice": {
      "number": "INV-2025-001",
      "date": "26/05/2025",
      "total": 1250.75,
      "vendor": "Acme Corporation"
    }
  },
  "error": null,
  "pageCount": 5,
  "attempts": 1,
  "createdAt": "2026-01-27T10:30:00Z",
  "startedAt": "2026-01-27T10:30:05Z",
  "completedAt": "2026-01-27T10:30:45Z"
}
```
Response - Failed
```json
{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "fileName": "large_document.pdf",
  "templateId": "YOUR_TEMPLATE_ID",
  "result": null,
  "error": "Failed to extract text from document",
  "pageCount": null,
  "attempts": 3,
  "createdAt": "2026-01-27T10:30:00Z",
  "startedAt": "2026-01-27T10:30:05Z",
  "completedAt": "2026-01-27T10:32:15Z"
}
```
JavaScript/Node.js Examples
Submit and Poll for Results
```javascript
import { Parselyze } from "parselyze";

const parselyze = new Parselyze(process.env.PARSELYZE_API_KEY);

async function processDocumentAsync() {
  // Submit job
  const job = await parselyze.documents.parseAsync({
    file: "./large_document.pdf",
    templateId: "YOUR_TEMPLATE_ID"
  });
  console.log(`Job submitted: ${job.jobId}`);

  // Poll for completion
  while (true) {
    const result = await parselyze.jobs.get(job.jobId);
    if (result.status === "completed") {
      console.log("Job completed:", result.result);
      break;
    } else if (result.status === "failed") {
      console.error("Job failed:", result.error);
      break;
    }
    console.log(`Status: ${result.status}, waiting...`);
    await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5s
  }
}

processDocumentAsync();
```
Python Examples
Submit and Poll
```python
import requests
import time
import os

API_KEY = os.getenv('PARSELYZE_API_KEY')
BASE_URL = 'https://api.parselyze.com'

def process_async():
    # Submit job
    with open('./large_document.pdf', 'rb') as file:
        files = {'file': file}
        data = {'templateId': 'YOUR_TEMPLATE_ID'}
        response = requests.post(
            f"{BASE_URL}/v1/documents/parse/async",
            headers={"x-api-key": API_KEY},
            files=files,
            data=data
        )
    job = response.json()
    job_id = job['jobId']
    print(f"Job submitted: {job_id}")

    # Poll for completion
    while True:
        response = requests.get(
            f"{BASE_URL}/v1/jobs/{job_id}",
            headers={"x-api-key": API_KEY}
        )
        result = response.json()
        status = result['status']
        if status == 'completed':
            print("Job completed:", result['result'])
            break
        elif status == 'failed':
            print("Job failed:", result['error'])
            break
        print(f"Status: {status}, waiting...")
        time.sleep(5)

process_async()
```
Retry Policy
Jobs automatically retry on failure with exponential backoff:
- Max Attempts: 3 (initial attempt + 2 retries)
- Backoff Strategy: Exponential, starting at 5 seconds
  - 1st retry: after 5 seconds
  - 2nd retry: after 25 seconds (5s × 5)
- Retry Status: Job status changes to `retrying` during retry attempts
- Attempts Counter: The `attempts` field in the job response shows how many attempts have been made
- Final Status: `failed` after all retries are exhausted
The system only retries on transient errors (network issues, temporary unavailability). Permanent errors (invalid file format, missing template) fail immediately without retrying.
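Under this schedule (first retry after 5 seconds, each subsequent delay multiplied by 5), the delays form a simple geometric series. This is our reading of the documented numbers, not an official formula:

```python
def retry_delay(retry_number: int, base: int = 5, factor: int = 5) -> int:
    """Delay in seconds before the Nth retry (1-indexed)."""
    return base * factor ** (retry_number - 1)

# Matches the documented schedule: 5s before the 1st retry, 25s before the 2nd.
assert [retry_delay(n) for n in (1, 2)] == [5, 25]
```

With 3 max attempts, a job can therefore remain in `retrying` for roughly 30 seconds of backoff on top of its processing time; size your polling timeouts accordingly.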
Best Practices
- Use Webhooks: Configure webhooks to receive automatic notifications when jobs complete - much more efficient than polling (see Webhook Guide)
- Leverage Idempotency: Don't worry about duplicate submissions - the API automatically returns existing jobs for identical files
- Handle Retries: Jobs may take longer than expected during retries; implement appropriate timeout logic
- Store Job IDs: Save job IDs in your database for audit trails and later retrieval
- Polling Interval: If not using webhooks, don't poll too frequently (recommended: every 5-10 seconds)
- Handle Failures: Implement error handling for `failed` jobs and check the `error` field for details
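The timeout and polling-interval advice above can be combined into one helper. Here `fetch_job` is an injected function (hypothetical; in practice it would wrap `GET /v1/jobs/:jobId`), which also makes the helper easy to test without network access:

```python
import time

def wait_for_job(fetch_job, job_id, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_job(job_id) until the job reaches a terminal status
    ('completed' or 'failed') or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)

# Example with a stubbed fetcher that completes on the third poll.
responses = iter([{"status": "pending"}, {"status": "processing"},
                  {"status": "completed", "result": {"total": 1250.75}}])
job = wait_for_job(lambda _id: next(responses), "job-1", sleep=lambda _s: None)
assert job["status"] == "completed"
```

Injecting `sleep` keeps the default behavior (a real 5-second pause) while letting tests run instantly.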
Error Responses
| Code | Error | Description |
|---|---|---|
| 400 | Bad Request | Invalid file format or missing required fields |
| 401 | Unauthorized | Invalid or missing API key |
| 404 | Not Found | Job or template not found |
| 413 | Payload Too Large | File exceeds 50MB limit |
| 429 | Too Many Requests | Rate limit exceeded |
Rate Limits
Async job submission is subject to the same rate limits as synchronous processing:
- Free Plan: 10 requests/minute
- Starter Plan: 30 requests/minute
- Pro Plan: 60 requests/minute
- Business Plan: 60 requests/minute
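If you batch-submit jobs, a simple client-side throttle keeps you under these per-minute limits. This is an illustrative sketch (adjust `per_minute` to your plan), not part of any SDK:

```python
import time

class Throttle:
    """Spaces calls so at most `per_minute` happen in any rolling minute."""
    def __init__(self, per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self._next_at = 0.0

    def wait(self):
        """Block until the next call is allowed."""
        now = self.clock()
        if now < self._next_at:
            self.sleep(self._next_at - now)
            now = self._next_at
        self._next_at = now + self.interval

# Example: 10 requests/minute (Free plan) -> one call every 6 seconds.
# A frozen clock and recording sleep make the spacing visible without waiting.
waits = []
t = Throttle(10, clock=lambda: 0.0, sleep=waits.append)
t.wait()  # first call goes through immediately
t.wait()  # second call must wait 6 seconds
assert waits == [6.0]
```

Call `throttle.wait()` before each job submission; the same instance can guard both submissions and status polls, since they share the rate limit.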
See Also
- Webhooks: Receive job completion notifications automatically
- Synchronous Processing: For small documents with immediate results
- Authentication: API key management
- Error Handling: Comprehensive error handling guide