Drinking the VPS Kool-Aid

Isaac Van Doren
2026-02-16

I drank the Kool-Aid. I'm sorry to say it, but I'm a VPS guy now.

I decided to build the ticketing system and website for Software Should Work myself. There are some excellent off-the-shelf options like Tito or Luma that you probably should just use. But I had good reasons to build it myself instead.

Given this bulletproof cost-benefit analysis, I set out to build instead of buy. Source code available on GitHub.

A server

Step one was to find a server. After shopping around a bit, a $5 VPS from Hetzner in Virginia seemed like the best option. After passing through a surprisingly involved identity verification process and entering my credit card into a very German checkout form, I was off to the races. I've heard lots of good things about Hetzner and have not been disappointed in the least. The service, pricing, and console have all been great.

An application

I built the app server in Rust using Axum. Again, it's been a great experience: statically linked executables, excellent performance, good tooling, a rich type system, etc. There's more to say about how the ticketing system itself works, but I'll write another post about that.
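
One way to get the statically linked executables mentioned above is to build against musl instead of glibc. Roughly (the binary name here is made up, and you may need musl-tools installed):

    # One-time setup: add the musl target.
    rustup target add x86_64-unknown-linux-musl

    # Produce a fully static release binary that runs on a bare VPS.
    cargo build --release --target x86_64-unknown-linux-musl
    file target/x86_64-unknown-linux-musl/release/ssw   # reports "statically linked"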

A database

I opted for SQLite and have been very pleased overall. It certainly has some oddities, like primary keys allowing nulls by default, foreign key checks being opt-in on a per-connection basis, and a very loose type system, but it's hard to beat the performance and operational simplicity of file-as-DB.
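
To illustrate the foreign-key quirk: enforcement is off unless every connection opts in with a pragma. A quick sketch using the sqlite3 CLI (the table names here are invented):

    # Foreign key checks are opt-in per connection; without the pragma
    # SQLite will happily delete rows that other tables still reference.
    sqlite3 ssw.db "PRAGMA foreign_keys = ON; DELETE FROM events WHERE id = 1;"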

I couldn't risk losing ticket information, so naturally I needed backups. For this I chose Litestream, a tool that streams changes to object storage and supports super simple point-in-time recovery: litestream restore ssw.db. It's wonderful to have such frequently recorded, easy-to-use backups for barely any cost.
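
Day to day it boils down to two commands, roughly like this (the bucket URL is a placeholder, and in reality replication runs as a systemd service rather than by hand):

    # Continuously stream database changes to object storage.
    litestream replicate ssw.db s3://some-bucket/ssw.db

    # Point-in-time recovery onto a fresh machine: pull the latest
    # copy back down as a plain SQLite file.
    litestream restore -o ssw.db s3://some-bucket/ssw.db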

I chose Backblaze B2 as my object storage because I read online that its free plan was structured in a way that would cover Litestream usage for many projects at no cost. Unfortunately this hasn't worked out in practice, and I've had to pay a steep $2/month for Litestream. Of course this amount is inconsequential, but something seems off here. Litestream is issuing something like 3,500 Class C transactions every day, which exceeds B2's free tier. I suspect it is calling ListObjectsV2 at a regular interval to do compaction. Unfortunately, none of my attempts at modifying the configuration have decreased this rate. In principle I see no reason why Litestream should be continually making requests when the DB almost never changes.

My experience with Litestream overall has been mixed. The fundamental thing it does is fantastic and seems to work well, but there have been some rough edges. For example, I had to dig through the commit history to find a workaround (setting sign-payload: true) for a recently introduced bug that mysteriously caused B2 to reject the requests. Nevertheless, the amount of reliability per dollar I'm getting with it is fantastic.

A web server

I used Caddy for the web server and have been extremely pleased. It automatically sets up SSL certs for you, which is wonderful, and the whole user experience has been solid. I'm using it to serve static files and proxy the rest of the traffic to the app server.

A way to deploy

My deployment strategy is a ~100-line bash script called deploy.sh. It mostly copies files (using scp and rsync) and runs commands on the server over ssh. Caddy, Litestream, Grafana Alloy (used for logging), and the application are all run as systemd services and are restarted or reloaded by the script. This was my first time really using systemd, and it's wonderful.
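
The overall shape is something like this; the host, paths, and service names are placeholders rather than the real ones:

    #!/usr/bin/env bash
    set -euo pipefail

    SERVER=deploy@ssw-vps   # placeholder host

    # Build locally, then ship the binary and supporting files.
    cargo build --release --target x86_64-unknown-linux-musl
    scp target/x86_64-unknown-linux-musl/release/ssw "$SERVER":/opt/ssw/
    rsync -az static migrations Caddyfile "$SERVER":/opt/ssw/

    # Run migrations, then restart or reload the relevant services
    # (the blue-green swap for the app itself is described below).
    ssh "$SERVER" 'bash /opt/ssw/migrate.sh'
    ssh "$SERVER" 'sudo systemctl reload caddy && sudo systemctl restart litestream alloy'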

To run database migrations I wrote a short bash script that checks a migration table in the DB for the latest migration number and then runs any SQL files in /migrations with a higher index (001_init.sql, 002_add_column.sql, etc.). I could have used something like Flyway, but why? This does everything I need with no dependencies.
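
A sketch of the idea, assuming a migration table called schema_migrations with a single version column (the real script's details differ a bit):

    #!/usr/bin/env bash
    set -euo pipefail

    DB=/opt/ssw/ssw.db
    current=$(sqlite3 "$DB" "SELECT COALESCE(MAX(version), 0) FROM schema_migrations;")

    for file in /opt/ssw/migrations/*.sql; do
        num=$(basename "$file" | cut -d_ -f1)      # "001" from 001_init.sql
        if [ $((10#$num)) -gt "$current" ]; then   # 10# avoids octal surprises
            echo "applying $file"
            sqlite3 "$DB" < "$file"
            sqlite3 "$DB" "INSERT INTO schema_migrations (version) VALUES ($((10#$num)));"
        fi
    done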

I could have gotten away with simply restarting the app server with systemd, but this was too risky. If I restarted the service and the new version crashed (suppose I misconfigured an essential environment variable), boom: downtime. To avoid this, I implemented blue-green deployments in 30 lines of bash. Fortunately, systemd makes it very simple to parameterize a service's configuration. Once that was done I could create blue.env and green.env, specifying ports 3000 and 3001 respectively. Suppose blue is currently running on port 3000: the script will start green on 3001, hit /healthcheck to confirm it's healthy, point Caddy at 3001, and signal blue to shut down gracefully (allowing any in-flight requests to complete). What results is a simple yet robust setup with no additional dependencies.
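
Condensed, the swap looks roughly like this. It assumes a templated unit (ssw@blue / ssw@green) that reads its port from the matching env file, and it repoints Caddy by rewriting a small imported snippet; the real script differs in the details:

    #!/usr/bin/env bash
    set -euo pipefail
    # Assumed to run as root on the server.

    live=$(cat /opt/ssw/active_color)   # "blue" or "green"
    if [ "$live" = blue ]; then new=green; port=3001; else new=blue; port=3000; fi

    # Start the new color alongside the old one.
    systemctl restart "ssw@$new"

    # Wait until it reports healthy before sending it traffic.
    for _ in $(seq 1 30); do
        curl -fsS "http://localhost:$port/healthcheck" >/dev/null && break
        sleep 1
    done
    curl -fsS "http://localhost:$port/healthcheck" >/dev/null || {
        echo "new instance never became healthy; aborting" >&2
        exit 1
    }

    # Repoint Caddy at the new port and reload it gracefully.
    printf 'reverse_proxy localhost:%s\n' "$port" > /etc/caddy/upstream.caddy
    systemctl reload caddy

    # Gracefully stop the old color; in-flight requests are allowed to finish.
    systemctl stop "ssw@$live"
    echo "$new" > /opt/ssw/active_color

In this sketch the main Caddyfile would import that snippet inside its site block, so reloading Caddy is all it takes to shift traffic to the new instance.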

CI/CD

I tried out RWX for CI/CD on this project and have been totally blown away. I keep reading complaints about GitHub Actions online and thinking to myself, "Hey, RWX solves that!" RWX executes work as chunks of compute called tasks instead of being tied to a job-per-VM model. These tasks can then be arranged in a DAG, which you'll realize is clearly the right way to specify a CI pipeline once you try it. Each task gets aggressive, automatic, content-based caching, which makes the whole thing extremely fast.

Observability

I started out reading logs directly with journalctl on the server (which works quite well, honestly!), but I wanted a UI with easier querying and external log storage. To get this, I opted for Grafana Cloud's free tier. Once I had Alloy, Grafana's log-shipping agent, running as another systemd service and had modified the configuration to treat the logs from blue and green as a single service, I was good to go. Grafana Cloud is quite nice: it's very feature-rich, which is great but can be overwhelming at times.

Originally I set up a free account with HetrixTools to run healthchecks every minute against my /healthcheck endpoint and the home page. This was dead simple to configure and a great experience. However, once I had Grafana Cloud running, I realized I could replace the healthchecks with synthetic checks and alert rules, removing the need for HetrixTools. This was much more fiddly to get working and still isn't quite right: about twice a week I get a false positive from the healthcheck.

EDIT: Turns out the Grafana Cloud free tier includes a limited number of synthetics evaluations per month, so I ended up switching back to HetrixTools anyway.

A CDN

I'm using Cloudflare. No complaints! Delivering assets from their CDN helps with the landing page load time significantly. I also subsetted the font files to decrease their size from ~95kb to 5kb and converted the images to AVIF, both of which made for big speed-ups.
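
Both optimizations are easy to reproduce with off-the-shelf tools; something like this, with made-up file names and a character range that obviously depends on what the page actually uses:

    # Subset a font to just the glyphs the site needs
    # (pyftsubset comes with the Python fonttools package).
    pyftsubset SomeFont.ttf \
        --unicodes="U+0020-007E" \
        --flavor=woff2 \
        --output-file=SomeFont-subset.woff2

    # Convert an image to AVIF (avifenc ships with libavif).
    avifenc hero.png hero.avif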

I have made no mistakes and nothing can be improved

In the spirit of Cunningham's Law, I hereby assert that every decision I made was optimal and nothing can be improved about my setup. (Suggestions welcome at [email protected]).

Would I do it again?

Absolutely. What I have now is a fast, cheap, powerful, and surprisingly robust system that I understand very well. I certainly could have avoided much of this effort by using an off-the-shelf tool, but that's no fun and I wouldn't have learned anything.

Psst...

Do you like building good software? Come to Software Should Work!