Background Jobs

Automation, Workers, and Event-Driven Workflows


A user clicks "Export Report." The request hits your controller, which queries 200,000 rows, compiles them into a spreadsheet, writes the file to storage, and returns a download link. That request takes 45 seconds. The load balancer times out at 30. The user sees a 504. They click the button again. Now two export jobs are running against the same dataset, fighting for memory.

Background job processing exists because HTTP requests are the wrong place for slow work. Every Laravel application we have built since 2005, and we have built over 50, eventually needs a queue. The question is never whether you need one. It is how you design it so jobs do not pile up, fail silently, or corrupt data when they retry.

This page covers the architectural patterns we use in production: queue topology, driver selection, idempotent job design, failure handling with dead letter queues, monitoring through Horizon, and the specific failure modes that tutorials never mention. If you have already read the Laravel queue documentation and want to know what changes when you move from tutorial to production, this is where that conversation starts.


The Constraint: Why Synchronous Processing Fails

Every HTTP request in a Laravel application runs inside a process with finite memory, a configured timeout, and a user waiting on the other end. For most requests (fetching a page, saving a form, returning JSON from an API), the lifecycle is fine. The work finishes in milliseconds. The problems start when the work is slow, unpredictable, or both.

PDF generation for a complex invoice might take 3 seconds or 30, depending on how many line items it contains. Sending a batch of 500 emails through an SMTP relay takes as long as the relay takes. Importing a CSV with 50,000 rows means 50,000 database writes, each with validation, event dispatch, and potential webhook callbacks to external systems.

Running any of these inside a web request creates three categories of failure.

Timeouts

PHP's max_execution_time, Nginx's proxy_read_timeout, and any load balancer timeout all compete. The strictest one wins, and the user gets a blank error page.

Memory exhaustion

Processing 50,000 rows in a single request can exceed PHP's memory limit. The process dies. No cleanup runs. Partial data sits in the database.

Worker blocking

While one request grinds through a data import, that PHP-FPM worker is unavailable. With enough concurrent slow requests, the entire application becomes unresponsive for everyone.

The naive fix is to increase timeouts, raise memory limits, and add more workers. This delays the problem. It does not fix it. The fix is to move slow work out of the request lifecycle entirely: accept the request, dispatch a job to a queue, return a response immediately, and let a separate worker process handle the heavy lifting. This is background job processing. It is not a feature. It is infrastructure.
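In code, the pattern is a one-line change of responsibility in the controller. A minimal sketch; ExportReport is a hypothetical job class implementing ShouldQueue:

```php
use App\Jobs\ExportReport;
use Illuminate\Http\Request;

class ReportController extends Controller
{
    public function export(Request $request)
    {
        // Serialise the work onto the queue; a worker picks it up.
        ExportReport::dispatch($request->user(), $request->input('filters', []));

        // The user gets an instant 202 instead of a 45-second wait.
        return response()->json(['status' => 'queued'], 202);
    }
}
```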


Queue Architecture and Driver Selection

Laravel's queue system abstracts the transport layer behind a consistent API. You dispatch jobs the same way regardless of whether they end up in Redis, Amazon SQS, a database table, or Beanstalkd. The driver choice matters for operations, not for application code.

Redis (the default for production)

Redis is the most common queue driver for Laravel applications, and for good reason. It is fast (operations measured in microseconds), supports blocking pops (workers do not poll), and integrates with Laravel Horizon for monitoring and metrics. We use Redis for the majority of our queue infrastructure.

The trade-off: Redis is an in-memory store. If the Redis instance restarts without persistence configured, queued jobs disappear. In production, this means either Redis with AOF persistence, Redis Sentinel for failover, or a managed service like AWS ElastiCache with Multi-AZ.

Amazon SQS

SQS is the right choice when durability matters more than latency. Messages are replicated across multiple availability zones. You do not manage the infrastructure. The trade-off is that SQS does not support blocking pops (workers poll), message ordering is best-effort (FIFO queues exist but add complexity), and Horizon does not support SQS. We use SQS for jobs where message loss is unacceptable, such as webhook-triggered jobs from payment providers: you cannot afford to lose a Stripe webhook because Redis restarted.

Database queues

The database driver stores jobs in a jobs table. No additional infrastructure is needed. For applications with low job volume (fewer than 1,000 jobs per day), this is a reasonable starting point. It fails at scale because every job dispatch and every job pickup is a database write, competing with your application's transactional queries for connection pool capacity.

Queue topology

A single default queue is where most applications start. Staying there is usually a sign that queue design was never considered at all. In production, we typically configure three to five named queues with explicit priorities.

Queue | Purpose | Example Jobs
critical | User-facing, time-sensitive | Password reset emails, payment confirmations
webhooks | External system callbacks | Stripe webhooks, CRM sync events
default | Standard processing | Notification dispatch, cache warming
bulk | High-volume batch work | CSV imports, report generation
scheduled | Time-triggered jobs | Daily digest emails, data cleanup

Workers are then assigned to queues with priority ordering. A password reset email will never wait behind a 50,000-row CSV import. The critical queue drains first. Always.
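Priority ordering lives in the worker command itself: a worker checks its queues left to right and only pulls from a later queue when the earlier ones are empty. A sketch, assuming the Redis driver and the queue names above:

```shell
# Drains critical first, then webhooks, and so on down the list.
php artisan queue:work redis --queue=critical,webhooks,default,bulk,scheduled
```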


Job Dispatching Patterns

Laravel provides several dispatch mechanisms, each suited to different situations. The choice affects execution timing, failure isolation, and how jobs relate to each other.

Synchronous dispatch. dispatch_sync() runs the job inline, in the current process. Use this for testing, for local development, and for jobs that must complete before the response returns. Never use synchronous dispatch in production for slow work.

Standard async dispatch. The job serialises onto the configured queue and returns immediately. A worker picks it up when capacity is available. This is the default and the right choice for most jobs.

Delayed dispatch. The job sits on the queue but is not available for processing until the delay expires. Useful for scheduled follow-ups, rate-limited API calls, and retry windows. Be aware that delayed jobs on Redis use sorted sets, which consume memory proportional to the number of delayed jobs.
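The three mechanisms, sketched with a hypothetical ProcessPodcast job:

```php
// Inline, in the current process -- testing and local development only.
ProcessPodcast::dispatchSync($podcast);

// Standard async dispatch: serialise onto the queue, return immediately.
ProcessPodcast::dispatch($podcast);

// Delayed dispatch: invisible to workers for ten minutes.
ProcessPodcast::dispatch($podcast)->delay(now()->addMinutes(10));

// Route to a specific named queue.
ProcessPodcast::dispatch($podcast)->onQueue('bulk');
```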

Job chaining

Each job in a chain runs only if the previous one succeeded. If the first step fails, the validation and notification jobs never execute. This is the correct pattern for multi-step workflows where later steps depend on earlier results. Compare this with workflow engines, which handle more complex branching and conditional logic.
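A chain is expressed through the Bus facade; the job classes here are hypothetical:

```php
use Illuminate\Support\Facades\Bus;

// Each job runs only if the previous one succeeded.
Bus::chain([
    new ImportCsv($upload),
    new ValidateImportedRows($upload),
    new NotifyUserOfCompletion($upload),
])->catch(function (Throwable $e) {
    // Invoked once, when any job in the chain fails.
})->dispatch();
```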

Job batching

Batching runs jobs concurrently with aggregate callbacks. Use it for parallelisable work: processing rows in a CSV, generating thumbnails for uploaded images, or sending notifications to a list of recipients. The batch tracks progress, so you can show the user a percentage complete via a real-time dashboard.
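A batch sketch, again with a hypothetical ImportChunk job; the stored batch ID is what a dashboard would poll for its percentage:

```php
use Illuminate\Bus\Batch;
use Illuminate\Support\Facades\Bus;

$batch = Bus::batch(
    $chunks->map(fn ($chunk) => new ImportChunk($chunk))->all()
)->then(function (Batch $batch) {
    // All jobs completed successfully.
})->catch(function (Batch $batch, Throwable $e) {
    // First failure detected in the batch.
})->dispatch();

// Later: Bus::findBatch($batch->id)->progress() returns a percentage.
```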

Job middleware

Middleware wraps job execution with cross-cutting concerns. The two we use most frequently:

Rate limiting

RateLimited::class prevents jobs from exceeding external API limits. If your Stripe account allows 100 requests per second, rate-limiting middleware ensures your payment sync jobs respect that cap.

Preventing overlaps

WithoutOverlapping::class ensures only one instance of a job runs for a given key at a time. Critical for jobs that modify the same resource, such as recalculating an account balance.
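Both middleware attach through a job's middleware() method. A sketch; the job class, the 'stripe' limiter name, and the accountId key are assumptions:

```php
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\Middleware\RateLimited;
use Illuminate\Queue\Middleware\WithoutOverlapping;

class SyncStripePayments implements ShouldQueue
{
    public function __construct(public readonly int $accountId) {}

    public function middleware(): array
    {
        return [
            // 'stripe' is a named limiter defined with RateLimiter::for().
            new RateLimited('stripe'),
            // Only one sync per account runs at a time; others are released.
            new WithoutOverlapping($this->accountId),
        ];
    }

    public function handle(): void
    {
        // ... call the Stripe API ...
    }
}
```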


Designing Idempotent Jobs

A job is idempotent if running it twice with the same input produces the same result as running it once. This property is not optional. It is a requirement for any job that might be retried.

Retries happen constantly in production. A worker crashes mid-job. Redis fails over. A deployment restarts all workers. The queue system re-dispatches the job because it was never acknowledged as complete. If that job already wrote half its data to the database, the retry writes it again. Without idempotency, you get duplicate records, double-charged customers, or emails sent twice.

The rule: Jobs run "at least once", not "exactly once". Every job that performs a side effect must be designed so that running it multiple times produces the same outcome as running it once.

Unique job identifiers

Assign each job a UUID at dispatch time. Before processing, check whether a job with that UUID has already completed. Store completed UUIDs in a cache or database table with a TTL.
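A minimal sketch of the check, assuming the job carries a jobUuid property assigned at dispatch:

```php
use Illuminate\Support\Facades\Cache;

public function handle(): void
{
    $key = "job-completed:{$this->jobUuid}";

    // A previous attempt already finished this work: do nothing.
    if (Cache::has($key)) {
        return;
    }

    $this->process();

    // Mark completion only after the work succeeds; the TTL bounds storage.
    Cache::put($key, true, now()->addDays(7));
}
```

A small race window remains between the check and the write; where that matters, wrap the body in Cache::lock() or fall back to a database unique constraint.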

Database transactions with constraints

Wrap job work in a transaction and rely on unique constraints to prevent duplicates. If the job creates an invoice, a unique constraint on [order_id, invoice_type] ensures the retry fails gracefully rather than creating a second invoice.

Upserts over inserts

Use updateOrCreate() instead of create() when the job writes records that might already exist. The second execution updates the existing record rather than failing or duplicating.

Separating side effects

Move non-idempotent side effects (sending emails, calling external APIs) to the end of the job, after the idempotent database work. If the job fails before reaching the side effect, no email is sent. If it fails after, the retry skips the database work and re-sends the email, which is typically acceptable.
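The last two patterns combine naturally in a single handle() method. A sketch; the Invoice model, the InvoiceIssued mailable, and the column names are assumptions:

```php
use Illuminate\Support\Facades\Mail;

public function handle(): void
{
    // Idempotent work first: a retry updates rather than duplicates.
    $invoice = Invoice::updateOrCreate(
        ['order_id' => $this->orderId, 'invoice_type' => 'standard'],
        ['total' => $this->total, 'issued_at' => now()]
    );

    // Non-idempotent side effect last: a retry that reaches this line
    // re-sends the email, which is typically acceptable.
    Mail::to($invoice->customer_email)->send(new InvoiceIssued($invoice));
}
```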

Idempotency is not a library you install. It is a design discipline applied to every job class. We review job idempotency during code review the same way we review database migrations: as infrastructure that must be correct.


Failed Job Handling and Dead Letter Queues

Jobs fail. Connections drop. External APIs return 500s. A CSV contains a row with a malformed date that crashes the parser. The question is not whether jobs will fail but what happens when they do.

Retry strategies

Laravel's default retry behaviour is configurable per job. You can set the number of attempts and a backoff schedule (for example, 10 seconds, then 60, then 300). The backoff prevents a job from hammering an external service that is already struggling.

For transient failures (network timeouts, rate limits), retries with backoff usually resolve the issue. For permanent failures (invalid data, missing dependencies), retries waste resources. The job needs to distinguish between the two.
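Both decisions are expressed on the job class itself. A sketch; CrmClient and InvalidContactException are hypothetical:

```php
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;

class SyncCrmContacts implements ShouldQueue
{
    use InteractsWithQueue;

    // Total attempts before the job lands in failed_jobs.
    public $tries = 4;

    // Seconds to wait before each retry: 10, then 60, then 300.
    public function backoff(): array
    {
        return [10, 60, 300];
    }

    public function handle(CrmClient $crm): void
    {
        try {
            $crm->push($this->contact);
        } catch (InvalidContactException $e) {
            // Permanent failure: do not burn the remaining attempts.
            $this->fail($e);
        }
    }
}
```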

Failure Type | Example | Response
Transient | Network timeout, rate limit, temporary API outage | Retry with exponential backoff
Permanent | Invalid email address, deleted record, bad data | Fail immediately, log for review
Resource | Out of memory, disk full, connection pool exhausted | Release job back to queue, alert operations
Dependency | External service down, API returning 500s | Delay retry, activate circuit breaker

Dead letter queues

When a job exhausts all retries, Laravel moves it to the failed_jobs table. This is your dead letter queue. These jobs need attention. They represent work the system could not complete. The dead letter queue pattern, borrowed from enterprise integration patterns, treats failed jobs as messages that require human or automated intervention.

In our systems, we implement three layers of handling.

Automated triage

A scheduled job scans failed_jobs hourly. Jobs that failed due to known transient issues (a third-party API outage that has since resolved) are automatically retried.

Alerting

When the failed job count exceeds a threshold (we typically set this at 10 failures per hour), an alert fires to Slack or PagerDuty. This catches systemic failures: a bad deployment, a database connection leak, or an external dependency that is down.

Manual review

Jobs that cannot be automatically retried are reviewed by a developer. The failed_jobs table stores the serialised job payload and the exception trace, providing everything needed to diagnose and replay.
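The review and replay workflow runs through Artisan:

```shell
# List failed jobs with their UUIDs, exception class, and failure time.
php artisan queue:failed

# Retry everything in the dead letter queue (or pass a single UUID).
php artisan queue:retry all

# Delete jobs that will never succeed.
php artisan queue:flush
```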

The worst failure mode is silent failure. A job fails, nobody notices, and the customer never receives their invoice. Dead letter queue monitoring prevents this.


Monitoring, Horizon, and Operational Visibility

A queue without monitoring is a queue where problems go undetected until a customer reports them.

Laravel Horizon

Laravel Horizon provides a dashboard and configuration layer for Redis-based queues. It shows real-time metrics for every queue and worker: jobs per minute, job runtime with percentile breakdowns, failed jobs with full exception traces, wait time before a worker picks up a job, and worker status across all processes.

Horizon also manages worker processes through its supervisor configuration, automatically scaling workers up or down based on queue depth. This is more reliable than manually managing queue:work processes with Supervisor.

What to monitor and alert on

Beyond Horizon's dashboard, we set up alerts for specific conditions that require human intervention.

Metric | Warning Threshold | Critical Threshold
Queue depth | 500 pending jobs | 2,000 pending jobs
Wait time | 30 seconds | 120 seconds
Failure rate | 1% of processed | 5% of processed
Worker count | Below expected | Zero workers running
Memory per worker | 100 MB | 200 MB

Process management

Queue workers are long-running PHP processes. They do not restart between jobs. This makes them susceptible to memory leaks, stale database connections, and accumulated state. In production, we configure workers to restart regularly: process up to 1,000 jobs or run for one hour, whichever comes first, then exit cleanly. Supervisor or systemd restarts the worker immediately. This limits the impact of memory leaks and ensures workers pick up code changes after deployments.
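The lifetime limits map directly onto queue:work flags:

```shell
# Exit after 1,000 jobs or one hour, whichever comes first;
# Supervisor or systemd restarts the process immediately.
php artisan queue:work redis --queue=critical,default \
    --max-jobs=1000 --max-time=3600 --timeout=120 --tries=3
```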


Production Failure Modes

Tutorials show you how to dispatch a job and process it. They do not show you what happens when things go wrong at scale. These are the failure modes we encounter repeatedly across the 50+ Laravel applications we maintain.

Memory leaks in long-running workers

PHP was designed for request-response cycles where memory is freed after each request. Queue workers break this assumption. Common sources: Eloquent model events that accumulate listeners, logging handlers that buffer output, and image processing libraries that do not release resources. The fix: limit worker lifetime with --max-jobs and --max-time.

Job timeouts and the --timeout flag

A job that hangs blocks the worker indefinitely. The worker's --timeout flag sets the default limit; a job's $timeout property overrides it for that job. When the limit is hit, Laravel kills the worker process, so cleanup code inside the job never runs. Crucially, the retry_after value in your queue configuration must be greater than the longest timeout you allow; otherwise the queue re-dispatches a job that is still being processed, and it runs twice.

Race conditions between concurrent jobs

Two workers pick up two jobs that both modify the same account balance. Without locking, one writes an outdated value. Database-level locking (lockForUpdate()) prevents this but adds contention. For high-throughput scenarios, use atomic cache operations or redesign the job to append events rather than mutate state directly.
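The database-level approach, sketched with a hypothetical Account model:

```php
use Illuminate\Support\Facades\DB;

DB::transaction(function () {
    // The second worker blocks here until the first commits.
    $account = Account::whereKey($this->accountId)
        ->lockForUpdate()
        ->firstOrFail();

    $account->balance = $account->transactions()->sum('amount');
    $account->save();
});
```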

Tenant isolation in multi-tenant queues

In multi-tenant Laravel applications, jobs must execute within the correct tenant context. Capture the tenant identifier at dispatch time, restore it at the start of handle(). Without this, a queue worker processing jobs from multiple tenants will retain the context from the previous job.
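A sketch of capture-and-restore, assuming a makeCurrent()-style helper such as the one in spatie/laravel-multitenancy; the job and model names are assumptions:

```php
use Illuminate\Contracts\Queue\ShouldQueue;

class SyncTenantData implements ShouldQueue
{
    public function __construct(
        // Captured at dispatch time and serialised into the job payload.
        public readonly int $tenantId,
        public readonly array $payload,
    ) {}

    public function handle(): void
    {
        // Restore context before anything else; never trust whatever
        // the previous job on this worker left behind.
        Tenant::findOrFail($this->tenantId)->makeCurrent();

        // ... tenant-scoped work ...
    }
}
```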

Webhook-triggered job patterns

External systems (Stripe, Xero, CRM platforms) send webhooks to your application. Each webhook should dispatch a job rather than processing inline. This ensures the webhook endpoint returns a 200 quickly and isolates the processing from the HTTP request.

The challenge with webhook jobs is idempotency. Stripe sends the same webhook event multiple times as a reliability measure. Your job must handle receiving the same event three times without creating three payment records. Store the webhook event ID and check it before processing, following the same idempotency patterns described above. This pattern connects to our broader approach to API integrations, where incoming data from external systems flows through queued jobs with validation, deduplication, and error handling at each stage.
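A deduplication sketch, assuming Laravel 10+'s UniqueConstraintViolationException and a hypothetical processed_webhooks table with a unique index on event_id:

```php
use Illuminate\Database\UniqueConstraintViolationException;
use Illuminate\Support\Facades\DB;

public function handle(): void
{
    try {
        // The unique index on event_id makes the insert the idempotency check.
        DB::table('processed_webhooks')->insert([
            'event_id'   => $this->event['id'],
            'created_at' => now(),
        ]);
    } catch (UniqueConstraintViolationException $e) {
        return; // Stripe already delivered this event; nothing to do.
    }

    $this->process($this->event);
}
```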


When Background Jobs Change How a Business Operates

The technical patterns above are infrastructure. Their business impact is what makes them worth the engineering investment.

  • Reports that generate themselves: Users click "Generate" and receive an email with the finished report. No waiting, no timeouts, no 504 errors.
  • Data imports with progress tracking: Batch jobs with progress bars replace cron jobs that blocked other scheduled tasks and left operations guessing.
  • Payment webhooks that never go missing: Dedicated queues with SQS durability, dead letter monitoring, and automatic retry. Revenue stops leaking through infrastructure gaps.
  • Responsive applications under load: Heavy work happens in the background. Users never wait for email servers or report generation. Pages respond instantly.

These are patterns we have implemented across order management systems, financial operations platforms, and service delivery tools since 2005. Background job processing connects to workflow engines for complex multi-step processes, to real-time dashboards for operational visibility, and to infrastructure decisions about how workers are deployed and scaled.


Build Reliable Queue Infrastructure

If your Laravel application is running slow work inside HTTP requests, or if your queue setup works in development but causes problems in production, we can help. The first conversation is free, comes with no obligation, and usually surfaces at least one queue design issue worth fixing.

Book a discovery call →