Server Management, Self-Hosting, and DevOps Basics

Your application is only as reliable as the servers it runs on. Web application infrastructure is the set of decisions that prevent failure: hosting, deployment pipelines, monitoring, backups, and a recovery plan you have actually tested. For most UK businesses the foundation is a single server you control, sized correctly and managed properly. VPS providers such as Hetzner and DigitalOcean make that server cheap and quick to provision. The most carefully tested Laravel codebase still becomes worthless when a deployment corrupts the database, a disk fills up at 3am, or a certificate expires without anyone noticing.

This guide covers how we make those decisions, starting with the question every UK team faces first: which kind of hosting, and which provider to trust with it. The patterns here come from provisioning and managing infrastructure across a wide range of Laravel applications, some running continuously for over a decade. They reflect real failures, real fixes, and the operational discipline that keeps production systems alive. That includes taking over servers configured by previous developers, with no documentation and no obvious way to rebuild them.

The Constraint: Why Infrastructure Decisions Compound

Most development teams treat infrastructure as an afterthought. The application gets attention; the server gets whatever the default setup provides. This works until it does not.

A single misconfigured Nginx worker pool causes request queuing under load. A database running on the same disk as the application means a large import fills the volume and crashes both. An SSL certificate that renews manually gets forgotten and takes the site offline on a Saturday morning.

The compounding effect: Every infrastructure decision constrains future options. The hosting provider you choose determines your scaling options. Your deployment method determines your rollback speed. Your monitoring setup determines whether you find problems or your users do.

The constraint is this: infrastructure must be reproducible, observable, and recoverable. If you can't rebuild a server from scratch in under an hour, you don't have infrastructure. You have a snowflake.

The Naive Approach: Manual Servers and Hope

The tutorial version of deployment looks like this: SSH into a server, run git pull, run composer install, run php artisan migrate, restart PHP-FPM. It works on the first deploy. It fails on the fiftieth.

The problems accumulate gradually. File permissions drift. Environment variables get edited directly on the server and never recorded anywhere. A failed migration leaves the database in a half-applied state.

Then comes the worst one. A composer install pulls a new dependency version that breaks the application, and there is no way to roll back without restoring a full backup.

No documentation: One developer configured the server months ago. Nobody else knows the Nginx configuration or which PHP extensions are installed.

No monitoring: Someone checks the site occasionally. Disk space is not tracked. Memory usage is not graphed. The first sign of trouble is a customer complaint.

No rollback: A failed deployment means restoring a full backup. Recovery time is measured in hours, not seconds.

No reproducibility: The server is a snowflake. Rebuilding it from scratch would take days of detective work.

The pattern extends to hosting decisions. A team starts on shared hosting because it is cheap. The application grows, and shared hosting throttles CPU during peak hours. The response is to upgrade: a bigger plan, then a VPS, then a managed platform. Each migration means a full rebuild, because nothing was documented or automated in the first place.

The Reliable Pattern: Infrastructure as a Managed System

We treat web application infrastructure as code, not as a set of manual configurations. Every server we provision follows a repeatable process. Every deployment is atomic and produces an immutable build artefact. Every failure mode has a documented recovery path.

The stack

Our standard Laravel infrastructure stack uses these components, each chosen for a specific reason and managed against a specific failure mode.

Ubuntu LTS on a VPS

Hetzner or DigitalOcean, depending on region and requirements. Long-term support releases provide security patches without breaking changes.

Laravel Forge for provisioning

Server provisioning, SSL management, and deployment orchestration. Handles PHP installation, Nginx configuration, and Let's Encrypt certificates.

Nginx + PHP-FPM + OPcache

Nginx handles request routing and SSL termination. PHP-FPM manages application worker processes, with pool sizes calculated from available memory (each worker consumes roughly 30-50MB). OPcache eliminates repeated PHP compilation, reducing response times by 50-70% on typical Laravel requests.

PostgreSQL + Redis

PostgreSQL on a separate volume or server for I/O isolation. Redis provides sub-millisecond reads for cache, session storage, and queue processing.

This is not a complex stack. It is deliberately simple. Fewer moving parts means fewer failure points, and every component has been battle-tested across hundreds of thousands of production hours.

PHP-FPM worker sizing

Most PHP hosting guides say "tune your workers" without providing the actual calculation. Here it is.

pm.max_children formula:

(Total RAM - OS overhead - PostgreSQL shared_buffers - Redis maxmemory - queue worker memory) / per-worker RSS

Measure per-worker RSS with:

ps -eo rss,comm | grep php-fpm | awk '{sum+=$1; n++} END {print sum/n/1024 " MB average"}'

On a typical Laravel application, expect 30-50MB per worker. A 4GB VPS with PostgreSQL and Redis running locally leaves roughly 2.5GB for PHP-FPM, which supports around 50-80 workers. Set pm = static for predictable memory usage on dedicated servers, or pm = dynamic on shared environments where memory must be reclaimed during quiet periods.
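As a worked example of the formula, here is a minimal Python sketch. The function name and the way the 4GB budget is split between the OS, PostgreSQL, Redis, and queue workers are illustrative assumptions chosen to leave 2.5GB for PHP-FPM; substitute your own measurements.

```python
def max_children(total_mb, os_mb, pg_shared_buffers_mb, redis_maxmemory_mb,
                 queue_workers_mb, per_worker_rss_mb):
    """Estimate pm.max_children from the memory budget formula."""
    if per_worker_rss_mb <= 0:
        raise ValueError("per-worker RSS must be positive")
    available_mb = (total_mb - os_mb - pg_shared_buffers_mb
                    - redis_maxmemory_mb - queue_workers_mb)
    # Round down: a fractional worker is a worker that does not fit.
    return max(available_mb // per_worker_rss_mb, 0)


# Hypothetical 4GB VPS: 512MB OS, 512MB shared_buffers, 256MB Redis,
# 256MB queue workers, which leaves 2560MB for PHP-FPM.
heavy = max_children(4096, 512, 512, 256, 256, per_worker_rss_mb=50)
light = max_children(4096, 512, 512, 256, 256, per_worker_rss_mb=30)
print(heavy, light)
```

The two results bracket the range you would expect for 30-50MB workers. Leaving some headroom below the computed figure is a sensible precaution, because per-worker RSS is an average and individual requests can exceed it.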

Why Laravel Forge

We use Laravel Forge to provision and manage servers. Forge handles the tedious parts of server setup, while still allowing SSH access for the remaining 10% that needs custom configuration. The alternative is maintaining Ansible playbooks or shell scripts by hand. We have done both, and Forge wins on the 90% case: it reduces the operational burden without taking away the escape hatch.

Forge also provides a deployment pipeline: pull code, install dependencies, run migrations, build assets, restart PHP-FPM. Each step is logged. Failures halt the pipeline before the application is affected. For teams that need more control over deployment orchestration, Envoyer provides dedicated zero-downtime deployment management with release history and one-click rollback.
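The "halt on first failure" behaviour is the important property of any deployment pipeline, whichever tool runs it. The sketch below illustrates that idea in Python; it is not how Forge is implemented, and the function name and step layout are assumptions made for the example.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run (name, argv) steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, argv in steps:
        print(f"[deploy] {name}: {' '.join(argv)}")
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            # Log the failure and stop: later steps never run.
            print(f"[deploy] {name} failed (exit {result.returncode})",
                  file=sys.stderr)
            print(result.stderr.strip(), file=sys.stderr)
            break
        completed.append(name)
    return completed


# Illustrative step list for a Laravel release (adjust to your project):
# steps = [
#     ("install", ["composer", "install", "--no-dev", "--optimize-autoloader"]),
#     ("migrate", ["php", "artisan", "migrate", "--force"]),
# ]
```

Because each step is an argument list rather than a shell string, there is no shell quoting to get wrong, and the log shows exactly what ran.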

Hosting Decisions: UK VPS Hosting, Cloud, and Managed Platforms

Choosing where to host a web application is a decision with long-term consequences. The wrong choice costs money, limits scaling options, or creates vendor dependency. The comparison that matters most for a UK business is cloud hosting vs VPS: a managed cloud platform that bills per request, or a single server you control.

VPS hosting (our default)

For most Laravel applications serving under 50,000 daily users, a well-configured VPS is the correct choice. A single server with 4 vCPUs, 8GB RAM, and SSD storage handles more traffic than most businesses generate.

We default to Hetzner for European hosting, where the cost difference is large. At the time of writing, a Hetzner CPX31 (4 vCPU, 8GB RAM, 160GB SSD) costs roughly €15/month. The equivalent DigitalOcean droplet costs $48/month, and the equivalent AWS EC2 instance costs roughly $70/month before storage and data transfer. That price gap compounds over years.

VPS hosting also means you own your infrastructure. No platform lock-in, no proprietary APIs, no sudden pricing changes. This ties directly into digital sovereignty. When your business logic, customer data, and operational processes live on servers you control, you keep the freedom to move, modify, or grow without asking a platform vendor for permission.

When to use cloud platforms

AWS, Google Cloud, and Azure make sense when you need specific managed services: object storage with CDN (S3 + CloudFront), managed database clusters with automated failover, or serverless compute for unpredictable workloads.

Cost warning: Cloud billing is notoriously difficult to predict. Egress charges, NAT gateway fees, and per-request pricing on managed services can triple the expected monthly cost. We have seen AWS bills double overnight because a misconfigured logging pipeline was writing gigabytes to CloudWatch.

Our rule: start with a VPS. Move specific services to cloud platforms when you have a concrete requirement that a VPS cannot satisfy. Do not start on AWS because it feels professional. Start on a VPS because it is simple, fast, and cheap.

Managed application platforms

Laravel Vapor, Railway, and similar platforms abstract away server management entirely. These platforms suit applications with highly variable traffic or teams with no infrastructure expertise. The cost per request is higher, but the operational burden is near zero. The limitation is control: when something goes wrong at the platform level, you wait for their support team.

Containerisation: when it helps and when it does not

Docker and Kubernetes appear in most infrastructure discussions. For teams running fewer than five services with a development team under ten people, containerisation adds operational overhead without proportional benefit. The debugging complexity, resource consumption, and learning curve exceed what most SMB applications require.

Containers become genuinely useful when you manage five or more distinct services, need environment parity across a large development team, or run a mature CI/CD pipeline that benefits from immutable build artefacts. Below those thresholds, Forge on a VPS has a lower total cost of ownership and a simpler failure surface. This is not an opinion against containers. It is a decision framework: match the tooling to the complexity of the problem.

Zero-Downtime Deployments

Every production deployment we run follows the atomic deployment pattern. A deployment pipeline is not a luxury. It is the difference between a five-second rollback and a two-hour recovery.

Create a new release directory

Clone or pull the latest code into a fresh directory on the server. Install Composer dependencies with --no-dev --optimize-autoloader.

Run migrations and build assets

Run database migrations with a pre-flight check. Build frontend assets if required. Run a health check against the new release.

Swap the symlink

Point the current symlink at the new release directory. This is atomic. The application serves the old release until the exact moment the symlink changes.

Reload and clean up

Graceful PHP-FPM restart (no dropped connections). Purge old releases, keeping the last five for rollback. Rollback means pointing the symlink at a previous directory: a one-second operation.

Some teams adopt blue-green or canary deployment strategies at this stage, running the new release alongside the old and shifting traffic gradually. For most single-server Laravel applications, the symlink swap achieves the same outcome with less operational overhead.

Contrast that with the naive approach. Running git pull in the live directory means the application serves partially-updated code during deployment. A request hitting the server mid-pull might load old controllers with new views, which causes errors that are difficult to reproduce and diagnose.
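The symlink swap and release pruning described above can be sketched in a few lines of Python. This is an illustration under assumptions, not a replacement for Forge or Envoyer: the releases/ and current layout, the function name, and the convention that release names sort by age (for example timestamps) are all choices made for the example.

```python
import os
import shutil
from pathlib import Path


def activate_release(base: Path, release_name: str, keep: int = 5) -> None:
    """Point base/current at base/releases/<release_name>, then prune."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    releases = base / "releases"
    target = releases / release_name
    if not target.is_dir():
        raise FileNotFoundError(target)

    # Create the new link under a temporary name, then rename it over
    # "current". The rename is atomic on POSIX filesystems, so requests
    # see either the old release or the new one, never a mixture.
    tmp_link = base / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(target.resolve(), tmp_link)
    os.replace(tmp_link, base / "current")

    # Keep the newest `keep` releases. Names are assumed to sort by age.
    stale = sorted(p for p in releases.iterdir() if p.is_dir())[:-keep]
    for path in stale:
        if path != target:  # never delete the release we just activated
            shutil.rmtree(path)
```

Rollback is the same call with an older release name, which is why it takes about as long as the swap itself.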

When deployments fail

Every zero-downtime deployment guide covers the happy path. The harder question is what happens when a migration fails mid-deploy, when the new release passes health checks but breaks under real traffic, or when a Composer dependency introduces a runtime error that only surfaces in production.

The answer depends on the type of migration that ran. Additive migrations (adding columns, creating tables) are rollback-safe: point the symlink back to the previous release and the old code ignores the new columns. Destructive migrations (dropping columns, renaming tables) are not rollback-safe: the previous release expects columns that no longer exist. This is why we separate additive schema changes from destructive cleanup, deploying them in different releases with a buffer period between.

Rollback rule: If the migration was additive, roll back code immediately (symlink swap, one second). If the migration was destructive, you must roll forward with a fix. This distinction is the reason we never combine additive and destructive schema changes in a single deployment.
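The rollback rule is simple enough to encode as a pre-deploy check. The sketch below assumes each migration in a release has already been labelled additive or destructive by a person or a review step; the enum and function names are hypothetical.

```python
from enum import Enum


class MigrationKind(Enum):
    ADDITIVE = "additive"        # adding columns, creating tables
    DESTRUCTIVE = "destructive"  # dropping columns, renaming tables


def recovery_action(kinds):
    """Return how to recover if a release with these migrations fails."""
    if any(kind is MigrationKind.DESTRUCTIVE for kind in kinds):
        return "roll forward"  # old code expects schema that is gone
    return "roll back"         # old code simply ignores new columns


def mixes_kinds(kinds):
    """True if a release combines additive and destructive changes."""
    return len(set(kinds)) > 1
```

A CI step that rejects any release where mixes_kinds returns True enforces the separation automatically instead of relying on reviewers to remember it.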

Monitoring and Alerting

Monitoring without alerting is data collection. Alerting without monitoring is guesswork. We configure both.

Metric	Alert Threshold	Why It Matters
HTTP 5xx rate	Above 1%	Application errors affecting users
Response time (p95)	Above 500ms	Performance degradation under load
Disk usage	80% warning, 90% critical	Most common infrastructure failure
Memory	Below 500MB free	Process starvation and OOM kills
SSL certificate expiry	14 days before expiry	Auto-renewal can fail silently
Queue depth	Above configured threshold	Background work is backing up
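To make the table concrete, here is a minimal Python sketch that checks a set of readings against those thresholds. The metric key names, the default queue threshold, and the generic "alert" severity for rows where the table gives no severity are assumptions for illustration.

```python
def evaluate(metrics, queue_threshold=1000):
    """Return (severity, message) pairs for readings that breach a threshold."""
    alerts = []
    if metrics["http_5xx_rate_pct"] > 1.0:
        alerts.append(("alert", "HTTP 5xx rate above 1%"))
    if metrics["p95_ms"] > 500:
        alerts.append(("alert", "p95 response time above 500ms"))

    # Disk is the only row with two levels in the table.
    disk = metrics["disk_used_pct"]
    if disk >= 90:
        alerts.append(("critical", "disk usage at or above 90%"))
    elif disk >= 80:
        alerts.append(("warning", "disk usage at or above 80%"))

    if metrics["mem_free_mb"] < 500:
        alerts.append(("alert", "less than 500MB memory free"))
    if metrics["cert_days_left"] <= 14:
        alerts.append(("alert", "SSL certificate expires within 14 days"))
    if metrics["queue_depth"] > queue_threshold:
        alerts.append(("alert", "queue depth above configured threshold"))
    return alerts
```

In practice the readings would come from whichever monitoring tool you run; the point of the sketch is that every threshold is an explicit, reviewable number rather than a judgement made at 3am.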

Server monitoring tools for small estates

Enterprise APM tools (Datadog, New Relic) cost more per month than the servers they monitor at SMB scale. For teams running 1-5 servers, the right server monitoring tools come in layers, and together they cover every metric without enterprise pricing. This is the practical side of site reliability engineering (SRE): matching observability tooling to the size of the estate.

Oh Dear or Uptime Robot

External uptime monitoring, SSL certificate expiry alerts, and mixed-content detection. These catch problems that server-side monitoring cannot: DNS resolution failures, CDN outages, and certificate renewal issues.

Laravel Pulse

Application-level metrics built into Laravel: slow queries, cache hit rates, queue throughput, and user request patterns. No external service required.

Netdata

Real-time server metrics (CPU, memory, disk I/O, network) with zero-configuration installation. Lightweight enough for production use on the same server it monitors.

Sentry

Error tracking with full stack traces, release tracking, and integration with deployment pipelines. Shows which deployment introduced a new error class.

Lessons from production

The metrics above tell you what to watch. Experience tells you which failures actually happen. These four have caught us out, or nearly caught us out, often enough to be worth stating plainly. Each one is cheap to prevent and expensive to discover live.

✓ Disk full is the most common failure. Log files, failed job output, and temporary upload files fill disks silently. Automated log rotation and disk monitoring prevent this.

✓ Memory leaks in queue workers cause gradual degradation. Workers should restart after a fixed number of jobs (--max-jobs=1000).

✓ Certificate auto-renewal fails silently when the ACME challenge cannot complete. For HTTP-01, the usual culprits are firewall rules blocking port 80 or a CDN proxying the challenge request. For DNS-01, it is expired API credentials. In either case, check whether the Certbot timer was disabled by an OS upgrade. Test renewal manually after any networking change.

✓ DNS changes can take up to 48 hours to propagate, depending on record TTLs and resolver caching. Plan DNS changes well ahead of any deadline.
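For the certificate lesson in particular, an independent expiry check is cheap insurance against silent renewal failures. The sketch below uses Python's standard ssl module to turn a certificate's notAfter string into days remaining. The function name is hypothetical, and how you obtain the string is left open: one option is the "notAfter" field of the dictionary returned by ssl.SSLSocket.getpeercert().

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format found in certificate dictionaries,
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now.timestamp()) / 86400


# Example with fixed values so the arithmetic is easy to follow.
remaining = days_until_expiry(
    "Jan  5 09:34:43 2018 GMT",
    datetime(2018, 1, 1, 9, 34, 43, tzinfo=timezone.utc),
)
print(remaining)
```

Run from a scheduled job on a different machine, a check like this catches the case where the renewal timer itself has stopped, which server-side tooling cannot report.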

Backup Strategy

We follow the 3-2-1 rule: three copies of data, on two different storage types, with one copy off-site.

Database backups

Automated daily PostgreSQL dumps via pg_dump, stored locally and replicated to off-site object storage. For production databases, we enable WAL (Write-Ahead Log) archiving, which provides point-in-time recovery: the ability to restore the database to any second, not just the last nightly dump.

Uploaded files

Synced to off-site storage nightly. Large media files stored directly in object storage (S3 or equivalent) rather than on the application server.

Server configuration

Managed through Forge. A new server can be provisioned from scratch in under 30 minutes.

The discipline that separates a backup from a hope is testing restores. A backup that has never been restored is not a backup. We test database restores monthly and document the recovery time, so a disaster recovery plan is grounded in a real number rather than an assumption.

Two numbers govern that plan. The Recovery Time Objective (RTO) sets how long the business can tolerate being offline. The Recovery Point Objective (RPO) sets how much data loss is acceptable. A nightly pg_dump gives an RPO of up to 24 hours. WAL archiving with continuous shipping can reduce RPO to seconds.

If a full restore takes longer than the business can tolerate, we adjust the strategy. That might mean faster storage, a parallel restore, or a standby replica that can be promoted immediately.
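The relationship between backup cadence, measured restore time, and the two objectives can be checked with simple arithmetic. A minimal sketch, with hypothetical function and key names:

```python
from datetime import timedelta


def meets_objectives(backup_interval, measured_restore, rpo, rto):
    """Compare a backup strategy against the business's RPO and RTO.

    Worst-case data loss equals the gap between backups, so the
    interval must not exceed the RPO. The restore time measured in
    the last test must not exceed the RTO.
    """
    return {
        "rpo_ok": backup_interval <= rpo,
        "rto_ok": measured_restore <= rto,
    }


# Nightly pg_dump, 45-minute tested restore, against a 1-hour RPO
# and a 2-hour RTO (illustrative figures).
nightly = meets_objectives(
    backup_interval=timedelta(hours=24),
    measured_restore=timedelta(minutes=45),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=2),
)
print(nightly)
```

In this example the restore is fast enough but the nightly dump alone misses the RPO, which is the situation WAL archiving is meant to address.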

Scaling: Vertical First, Then Horizontal

Premature scaling wastes money and adds operational complexity. Most Laravel applications never need horizontal scaling. Vertical scaling (upgrading the server) is simpler, cheaper, and sufficient for the vast majority of workloads.

Vertical scaling

A single well-configured server handles more traffic than most teams expect. Nginx serves static assets efficiently, often straight from the operating system's page cache. Redis caching avoids repeated database queries. PHP-FPM worker tuning ensures available memory is used efficiently. When a server is genuinely under-resourced, the fix is straightforward: increase RAM, add CPU cores, switch to faster storage.

When horizontal scaling is necessary

Horizontal scaling becomes necessary in two situations. The first is when a single server cannot handle the load regardless of how large you make it. The second is when you need geographic distribution to cut latency for users far from the server. Both require architectural changes that are best planned before they are needed.

The prerequisites are not optional. Each one removes a piece of local state that would otherwise break the moment a second server enters the picture.

✓ Sessions in Redis (not the filesystem)
✓ File uploads in object storage (not the local disk)
✓ Load balancer distributing traffic across application servers
✓ Database connection pooling to prevent connection exhaustion
✓ Centralised logging so you can debug across multiple servers
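Two of these prerequisites can be checked mechanically from the application's environment file. The sketch below is a rough illustration: it assumes the SESSION_DRIVER and FILESYSTEM_DISK keys used in recent Laravel .env files (older versions may name them differently), and the function names are hypothetical.

```python
def parse_env(text):
    """Parse KEY=value lines, ignoring blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def local_state_warnings(env_text):
    """List settings that keep state on a single application server."""
    env = parse_env(env_text)
    warnings = []
    if env.get("SESSION_DRIVER") != "redis":
        warnings.append("sessions are not stored in Redis")
    # Assumption: "local" and "public" are disks on the server itself.
    if env.get("FILESYSTEM_DISK", "local") in ("local", "public"):
        warnings.append("uploads default to the local disk")
    return warnings
```

The load balancer, connection pooling, and centralised logging live outside the application configuration, so they still need checking by hand.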

The decision rule: if your monthly hosting cost is under £500 and you are not experiencing performance problems, you do not need horizontal scaling. Invest that engineering time in application-level optimisation instead: query optimisation, caching strategies, and efficient background job processing.

Common Infrastructure Symptoms

When something goes wrong in production, the symptom is rarely the cause. This reference maps the errors you see to the infrastructure problems that produce them.

Symptom	Likely Cause	Fix
502 Bad Gateway	PHP-FPM socket not running or worker pool exhausted	Check `pm.max_children` against available memory. Restart PHP-FPM.
504 Gateway Timeout	Nginx `fastcgi_read_timeout` exceeded	Optimise the slow request, or move long-running work to a background job.
Disk full	Unrotated log files, failed job output, or temporary uploads	Configure `logrotate` for Laravel logs. Monitor disk usage with alerts at 80%.
SSL certificate expired	Let's Encrypt ACME renewal failed silently	Check HTTP-01 challenge accessibility. Test `certbot renew --dry-run`.
"Too many connections"	PHP-FPM workers exceeding PostgreSQL `max_connections`	Add PgBouncer for connection pooling, or reduce `pm.max_children`.
OOM killer in dmesg	`pm.max_children` set too high for available RAM	Recalculate worker count using the formula above. Switch to `pm = static`.
Stuck queue jobs	Worker crash from memory leak or unhandled exception	Set `--max-jobs=1000` and `--max-time=3600` to force periodic worker restarts.
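The "Too many connections" row is a budgeting problem, and the arithmetic is worth writing down before adding a second application server. A minimal sketch, assuming each PHP-FPM worker and each queue worker holds at most one PostgreSQL connection; the function name is hypothetical, and the default of 3 reserved connections matches PostgreSQL's default superuser_reserved_connections.

```python
def connection_headroom(max_connections, app_servers, max_children,
                        queue_workers=0, reserved=3):
    """Spare PostgreSQL connections once every worker is connected.

    A negative result means the workers can exhaust the database's
    connection limit under load.
    """
    demand = app_servers * max_children + queue_workers
    return (max_connections - reserved) - demand


# PostgreSQL's default max_connections is 100.
one_server = connection_headroom(100, app_servers=1, max_children=80,
                                 queue_workers=8)
two_servers = connection_headroom(100, app_servers=2, max_children=80,
                                  queue_workers=8)
print(one_server, two_servers)
```

With one server the budget just fits; with two it is far over, which is the point at which PgBouncer or a lower pm.max_children becomes necessary.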

Infrastructure as an Asset

Web application infrastructure is not a cost centre. It is an operational asset that determines uptime, deployment speed, and recovery capability.

✓ Deployments happen multiple times per day: Atomic deploys with one-second rollback. No stress, no downtime, no maintenance windows.

✓ Failures are detected before users notice: Continuous monitoring with targeted alerts. Problems found in minutes, not days.

✓ Recovery follows a documented procedure: Tested backups, rehearsed restores, known recovery times. No panicked improvisation.

✓ Provider independence: Standard Linux servers on standard infrastructure. Move providers in a day, not a quarter.

This is closely related to the question of owning versus renting your systems. Infrastructure decisions also shape your security and operational posture. Every component in the stack is a potential attack surface. Fewer components, kept current and monitored, means a smaller surface to defend.

The same logic applies during change. If you are migrating from a legacy system, the infrastructure plan has to account for the transition: running old and new systems in parallel, synchronising data, and the eventual cutover. Infrastructure isn't a one-time decision either. It's an ongoing maintenance commitment that compounds in value when you treat it as a first-class concern.

Get Your Infrastructure Right

If you are running a web application and your infrastructure needs attention, we are happy to talk it through. Infrastructure management is a core part of our ongoing support service, covering monitoring, security patches, deployment pipelines, and capacity planning.

