We’re actively expanding webhook support.
Just landed: automatic retries with exponential backoff. Deliveries are now attempted up to 5 times over a 30-minute window.
Coming soon:
  • Team-wide default webhook URLs
  • Cryptographic signatures for payload verification
Want early access? Reach out at [email protected] or join our Slack community.

Overview

Webhooks deliver real-time HTTP POST notifications to your server when long-running operations complete. Instead of polling for status, your application receives instant updates.

Use Cases

Async Processing

Get notified when batches or crawls complete instead of polling

Pipeline Triggers

Automatically trigger downstream processing when data is ready

Alerting

Send alerts to Slack, email, or other systems on completion

Data Sync

Keep your database in sync with Olostep results

Supported Events

event.batch.completed: Fired when a batch finishes processing (all items completed or failed).
{
  "id": "event_a1b2c3d4e5f6g7h8",
  "object": "event.batch.completed",
  "timestamp": 1737570000000,
  "delivery_attempt": "1/5",
  "data": {
    "id": "batch_xyz123",
    "object": "batch",
    "status": "completed",
    "items_total": 100,
    "items_completed": 98,
    "items_failed": 2,
    "created_at": "2024-01-15T10:00:00Z",
    "completed_at": "2024-01-15T10:05:32Z"
  }
}
event.crawl.completed: Fired when a crawl finishes and all discovered pages have been processed.
{
  "id": "event_x9y8z7w6v5u4t3s2",
  "object": "event.crawl.completed",
  "timestamp": 1737570000000,
  "delivery_attempt": "1/5",
  "data": {
    "id": "crawl_abc789",
    "object": "crawl",
    "status": "completed",
    "start_url": "https://example.com",
    "urls_count": 87,
    "max_pages": 100,
    "max_depth": 3,
    "actual_max_depth": 3,
    "start_epoch": 1737569500000,
    "start_date": "2025-01-22"
  }
}
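Because both events share the same envelope, a receiver can route on the object field. The following is a minimal sketch, not an official client: the handler names on_batch_completed and on_crawl_completed and the dispatch function are hypothetical, and only the field names come from the sample payloads above.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]

def on_batch_completed(data: Event) -> str:
    # Hypothetical handler: summarize a finished batch.
    return f"batch {data['id']}: {data['items_completed']}/{data['items_total']} items completed"

def on_crawl_completed(data: Event) -> str:
    # Hypothetical handler: summarize a finished crawl.
    return f"crawl {data['id']}: {data['urls_count']} URLs from {data['start_url']}"

# Map each event type (the envelope's "object" field) to its handler.
HANDLERS: Dict[str, Callable[[Event], str]] = {
    "event.batch.completed": on_batch_completed,
    "event.crawl.completed": on_crawl_completed,
}

def dispatch(event: Event) -> str:
    handler = HANDLERS.get(event["object"])
    if handler is None:
        # Ignore unknown types instead of failing, so they do not trigger retries.
        return f"ignored {event['object']}"
    return handler(event["data"])
```

Ignoring unknown event types (rather than raising) is a design choice in this sketch: it keeps the endpoint returning 2xx if new event types are introduced later.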

Setting Up Webhooks

Pass the webhook parameter when creating a batch or crawl. The URL you provide receives the completion notification.
Parameter name: The canonical parameter is webhook. For backward compatibility, webhook_url is also accepted as an alias.
import requests

# Batch example
response = requests.post(
    "https://api.olostep.com/v1/batches",
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={
        "items": [
            {"url": "https://example.com/page1", "custom_id": "1"},
            {"url": "https://example.com/page2", "custom_id": "2"}
        ],
        "webhook": "https://your-server.com/webhooks/olostep"
    }
)

# Crawl example
response = requests.post(
    "https://api.olostep.com/v1/crawls",
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={
        "start_url": "https://example.com",
        "max_pages": 50,
        "webhook": "https://your-server.com/webhooks/olostep"
    }
)

Webhook Payload

All webhook payloads follow a unified envelope structure:
{
  "id": "event_a1b2c3d4e5f6g7h8",
  "object": "event.batch.completed",
  "timestamp": 1737570000000,
  "delivery_attempt": "1/5",
  "data": {
    "id": "batch_xyz123",
    "object": "batch",
    "status": "completed",
    "items_total": 100,
    "items_completed": 98,
    "items_failed": 2
  }
}

Envelope Fields

Field             Description
id                Event ID; identical across all retry attempts
object            Event type (e.g., event.batch.completed)
timestamp         When this delivery attempt was sent (epoch ms)
delivery_attempt  Current attempt / max attempts (e.g., 1/5, 3/5)
data              The actual resource data (same format as API response)
Use the id field to deduplicate webhook deliveries in your receiver. The same event ID appears in all retry attempts.
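To make the envelope easier to work with, a receiver might parse it into a typed object. This is a sketch under stated assumptions: the Envelope class and parse_envelope function are hypothetical names, and the parsing relies only on the field formats described above (epoch milliseconds for timestamp, "attempt/max" for delivery_attempt).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict

@dataclass(frozen=True)
class Envelope:
    id: str
    event_type: str
    sent_at: datetime
    attempt: int
    max_attempts: int
    data: Dict[str, Any]

def parse_envelope(payload: Dict[str, Any]) -> Envelope:
    # "delivery_attempt" is a string such as "1/5": current attempt / max attempts.
    attempt, max_attempts = (int(part) for part in payload["delivery_attempt"].split("/"))
    # "timestamp" is epoch milliseconds; convert it to a timezone-aware UTC datetime.
    sent_at = datetime.fromtimestamp(payload["timestamp"] / 1000, tz=timezone.utc)
    return Envelope(
        id=payload["id"],
        event_type=payload["object"],
        sent_at=sent_at,
        attempt=attempt,
        max_attempts=max_attempts,
        data=payload["data"],
    )
```

With the parsed form, a check such as envelope.attempt == envelope.max_attempts can flag the final delivery attempt in your logs.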

Retry Behavior

Failed webhook deliveries are automatically retried with exponential backoff over a 30-minute window:
Attempt  Delay Before Attempt  Cumulative Time
1        Immediate             0 min
2        ~2 min                ~2 min
3        ~4 min                ~6 min
4        ~7 min                ~13 min
5        ~15 min               ~28 min
Total retry window: 30 minutes
Per-request timeout: 30 seconds
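The cumulative column is the running sum of the per-attempt delays. The sketch below reproduces that arithmetic; the delays are the approximate values from the table, and the function names are hypothetical. Actual delays may vary slightly, since the documented values are approximate.

```python
from itertools import accumulate
from typing import List

# Approximate delay before each attempt, in minutes, copied from the retry table.
DELAYS_MIN: List[int] = [0, 2, 4, 7, 15]
TIMEOUT_SECONDS = 30  # per-request timeout

def cumulative_schedule(delays: List[int]) -> List[int]:
    # Running total: minutes after the first attempt at which each attempt starts.
    return list(accumulate(delays))

def latest_finish_seconds(delays: List[int], timeout_seconds: int = TIMEOUT_SECONDS) -> int:
    # Worst case: the last attempt starts on schedule and runs until it times out.
    return sum(delays) * 60 + timeout_seconds
```

Under these assumptions, the last attempt starts about 28 minutes after the first and finishes within 30 seconds of that, which fits inside the 30-minute retry window.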

What Counts as Success

Your endpoint must return a 2xx status code within 30 seconds. Any other response triggers a retry.
Response            Result
200 OK              ✅ Delivered
201 Created         ✅ Delivered
301 Redirect        ❌ Retry (we don’t follow redirects)
400 Bad Request     ❌ Retry
500 Server Error    ❌ Retry
Timeout (>30s)      ❌ Retry
Connection refused  ❌ Retry
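When testing your own endpoint, it can help to encode the success rule as a small predicate. This is an illustrative sketch of the rule stated above, not Olostep's delivery code; counts_as_delivered is a hypothetical helper.

```python
from typing import Optional

TIMEOUT_SECONDS = 30  # per-request timeout from the section above

def counts_as_delivered(status_code: Optional[int], elapsed_seconds: float) -> bool:
    """Return True if a response would count as a successful delivery.

    status_code is None when no HTTP response arrived at all
    (for example, connection refused).
    """
    if status_code is None or elapsed_seconds > TIMEOUT_SECONDS:
        return False
    # Only 2xx counts; redirects (3xx) are not followed and trigger a retry.
    return 200 <= status_code < 300
```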

Best Practices

Return 200 OK immediately and process the webhook asynchronously. If your processing takes longer than 30 seconds, we’ll retry — causing duplicate deliveries.
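import threading
from queue import Queue

from flask import Flask, request

app = Flask(__name__)
webhook_queue = Queue()  # In-memory only; use a durable queue in production

def process_event(event):
    # Placeholder: replace with your own (possibly slow) processing
    print(f"Processing {event['id']}")

@app.route('/webhooks/olostep', methods=['POST'])
def handle_webhook():
    # Queue the payload for async processing
    webhook_queue.put(request.get_json())

    # Return immediately so the delivery counts as successful
    return 'OK', 200

def process_webhooks():
    # Worker loop: runs in a background thread
    while True:
        event = webhook_queue.get()
        try:
            # Slow processing happens here
            process_event(event)
        except Exception as exc:
            # Keep the worker alive if a single event fails
            print(f"Failed to process {event.get('id')}: {exc}")

threading.Thread(target=process_webhooks, daemon=True).start()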
Use the id field to deduplicate. Store processed event IDs and skip duplicates.
processed_events = set()  # Use Redis/DB in production

def handle_event(event):
    if event['id'] in processed_events:
        return  # Already processed
    
    # Process the event
    process_batch_completed(event['data'])
    
    # Mark as processed
    processed_events.add(event['id'])
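The in-memory set above is lost on restart and is not shared between processes. As one possible way to follow the "use Redis/DB in production" advice, the sketch below stores processed event IDs in SQLite from the Python standard library. The table name processed_events and the helpers open_store and claim_event are assumptions made for illustration.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # Use a file path instead of ":memory:" so IDs survive restarts.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed_events (id TEXT PRIMARY KEY)")
    conn.commit()
    return conn

def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True the first time an event ID is seen, False for duplicates."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (id) VALUES (?)",
            (event_id,),
        )
    # rowcount is 1 if the row was inserted, 0 if the primary key already existed.
    return cur.rowcount == 1
```

Note that this variant claims the ID before processing, so the check and the insert happen in a single statement. If processing can fail, delete the row on failure so that a later retry is processed again.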
Log all webhook receipts for debugging. Include the event ID, timestamp, and processing result.
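import logging

from flask import Flask, request

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

def process_event(event):
    pass  # Placeholder: replace with your own processing

@app.route('/webhooks/olostep', methods=['POST'])
def handle_webhook():
    event = request.get_json()
    logging.info(
        "Webhook received: id=%s type=%s timestamp=%s attempt=%s",
        event['id'], event['object'], event['timestamp'], event['delivery_attempt'],
    )

    try:
        process_event(event)
        logging.info("Webhook processed: id=%s", event['id'])
    except Exception:
        # Log the traceback, then re-raise so the 500 response triggers a retry
        logging.exception("Webhook failed: id=%s", event['id'])
        raise

    return 'OK', 200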
Always use HTTPS for webhook endpoints. HTTP endpoints are vulnerable to eavesdropping and man-in-the-middle attacks.
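A simple guard before creating a batch or crawl can catch a non-HTTPS webhook URL early. This is a minimal sketch using only the standard library; is_valid_webhook_url is a hypothetical helper and is not part of the Olostep API.

```python
from urllib.parse import urlparse

def is_valid_webhook_url(url: str) -> bool:
    # Require the https scheme and a non-empty host.
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.hostname)
```

For example, you might call this on the value you are about to send in the webhook parameter and refuse to submit the request if it returns False.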

Troubleshooting

If webhooks are not arriving:
  1. Verify the webhook parameter was included in your request
  2. Verify your endpoint is publicly accessible (not localhost)
  3. Check your server logs for incoming requests
  4. Ensure you’re returning a 2xx status code
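To work through the checks above, you can send a sample event to your own endpoint and inspect the status code it returns. The sketch below is a local testing aid, not an Olostep tool: send_test_event and SAMPLE_EVENT are hypothetical, the Content-Type header is an assumption, and the payload simply mirrors the envelope documented earlier. It uses http.client, which does not follow redirects, so a 3xx response is reported as-is.

```python
import http.client
import json
from typing import Any, Dict
from urllib.parse import urlparse

# Sample payload shaped like the documented envelope (values are illustrative).
SAMPLE_EVENT: Dict[str, Any] = {
    "id": "event_test_0001",
    "object": "event.batch.completed",
    "timestamp": 1737570000000,
    "delivery_attempt": "1/5",
    "data": {"id": "batch_xyz123", "object": "batch", "status": "completed"},
}

def send_test_event(url: str, event: Dict[str, Any] = SAMPLE_EVENT, timeout: float = 30.0) -> int:
    """POST a sample event to url and return the HTTP status code."""
    parsed = urlparse(url)
    if parsed.scheme == "https":
        conn = http.client.HTTPSConnection(parsed.hostname, parsed.port, timeout=timeout)
    else:
        # Plain HTTP is only appropriate for local testing.
        conn = http.client.HTTPConnection(parsed.hostname, parsed.port, timeout=timeout)
    try:
        body = json.dumps(event).encode("utf-8")
        conn.request("POST", parsed.path or "/", body=body,
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().status
    finally:
        conn.close()
```

A status in the 2xx range means the endpoint would count as delivered; anything else, or an exception such as a refused connection or timeout, corresponds to a retry.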
Receiving duplicate deliveries is expected during retries. Implement idempotent handling using the id field:
def handle_event(event):
    if already_processed(event['id']):
        return  # Skip duplicate
    
    process_event(event)
    mark_processed(event['id'])
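If deliveries time out, remember that your endpoint must respond within 30 seconds. Process webhooks asynchronously:
# Assumes app is a Flask app and queue is a job queue exposing enqueue(), such as RQ
@app.route('/webhooks', methods=['POST'])
def webhook():
    queue.enqueue(process_webhook, request.get_json())
    return 'OK', 200  # Respond immediately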

Coming Soon

Team Default URL

Configure a default webhook URL in your account settings. All requests will use this URL unless overridden.

Signature Verification

Cryptographic signatures (HMAC-SHA256) to verify webhook payloads came from Olostep.
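Signature verification has not shipped yet, so the header name, secret format, and exact signing scheme are not documented. Purely as an illustration of how HMAC-SHA256 verification generally works, here is a generic sketch; verify_signature, the hex encoding, and signing the raw request body are all assumptions and may differ from what Olostep eventually releases.

```python
import hashlib
import hmac

def verify_signature(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    # Assumption: the signature is the hex-encoded HMAC-SHA256 of the raw request body.
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking information via timing.
    return hmac.compare_digest(expected, signature_hex)
```

Whatever the final scheme is, verification of this kind must run against the raw request bytes rather than re-serialized JSON, because re-serialization can change the bytes being signed.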
Want early access to these features? Contact us at [email protected] or join our Slack community.