Wrote 640 TB/Year to My SSD. Here’s the Fix

TL;DR

– A logging bug is writing ~640 TB/year to a single SQLite file
– A 1 TB consumer SSD is rated for about 600 TBW total, and this bug could eat that in under a year
– A fix has been implemented; update now
– Workaround: SQLite trigger blocks writes immediately

—

If you left the CLI running while you slept, your SSD absorbed roughly half a terabyte of writes overnight. Not a metaphor. Not an exaggeration. A logging bug is writing hundreds of terabytes per year to one SQLite file.

The bug is real.

The data is verified. Fix: already implemented.

Short version: the feedback logger runs at TRACE verbosity globally.

Every module writes maximum diagnostic detail, all the time.

That’s roughly 640 TB per year. A typical 1 TB consumer SSD is rated for about 600 TBW over its entire lifespan. Leave the CLI running for twelve months and you’ve burned through the drive’s entire rated endurance. Not from heavy work. Just from leaving it running.
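The arithmetic is short enough to check yourself. A quick sketch in Python, assuming only the two figures quoted above (640 TB/year of writes, a 600 TBW rating) and a constant write rate, which real usage won't match exactly:

```python
# Back-of-envelope endurance math using the figures quoted in this post.
# Assumption: the write rate is constant over the whole year.
WRITES_TB_PER_YEAR = 640   # reported extrapolation
RATED_TBW = 600            # typical rating for a 1 TB consumer SSD


def tb_per_day(tb_per_year: float) -> float:
    """Average daily write volume implied by a yearly figure."""
    return tb_per_year / 365


def years_to_exhaust(rated_tbw: float, tb_per_year: float) -> float:
    """How long the rated endurance lasts at the given write rate."""
    return rated_tbw / tb_per_year


daily = tb_per_day(WRITES_TB_PER_YEAR)
years = years_to_exhaust(RATED_TBW, WRITES_TB_PER_YEAR)
print(f"{daily:.2f} TB/day, rating exhausted in {years:.2f} years")
# prints "1.75 TB/day, rating exhausted in 0.94 years"
```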

What’s Actually Writing to Your Drive

The culprit sits at `~/.codex/logs_2.sqlite`. It uses SQLite’s write-ahead logging (WAL) mode, which sounds reasonable until you realize the logging sink is cranked to global TRACE. Every module writes at maximum verbosity constantly, even when you’re not doing anything. Stream a single response and watch `logs_2.sqlite-wal` grow in real time.

Users reported constant write activity even with no active sessions.

The real problem is write amplification. The actual data being logged (timestamps, module names, event types) is tiny. But WAL appends whole pages, not individual rows, on every insert. So the physical bytes hitting storage are orders of magnitude more than what the logs are actually storing. One developer measured about 36,211 rows every 15 seconds while it sat idle. That’s roughly 145,000 rows per minute. It’s not diagnostics.

It’s a logging DoS attack on your own SSD.
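You can see the amplification on a throwaway database. The sketch below is an illustration of the mechanism, not the tool's actual schema: the table layout, column names, and row contents are made up, and auto-checkpointing is switched off so every WAL frame stays visible for measurement.

```python
import os
import sqlite3
import tempfile


def measure_wal_amplification(n_rows: int = 200) -> tuple[int, int]:
    """Insert n_rows tiny rows, one transaction each.

    Returns (approximate payload bytes, size of the -wal file in bytes).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")
        # isolation_level=None means autocommit: every INSERT is its own commit.
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL")
        # Keep all frames in the WAL so the file size reflects every write.
        conn.execute("PRAGMA wal_autocheckpoint=0")
        # Hypothetical log table, for illustration only.
        conn.execute("CREATE TABLE logs (ts REAL, module TEXT, event TEXT)")
        payload = 0
        for i in range(n_rows):
            row = (float(i), "core.session", "heartbeat")
            conn.execute("INSERT INTO logs VALUES (?, ?, ?)", row)
            payload += 8 + len(row[1]) + len(row[2])  # rough size of the logged data
        # Measure before closing: closing the last connection checkpoints the WAL.
        wal_bytes = os.path.getsize(path + "-wal")
        conn.close()
    return payload, wal_bytes


payload_bytes, wal_bytes = measure_wal_amplification()
print(f"payload ~{payload_bytes} B, WAL {wal_bytes} B, "
      f"ratio ~{wal_bytes / payload_bytes:.0f}x")
```

The exact ratio depends on the SQLite page size and on how rows are batched into transactions, so treat the number as a demonstration of scale rather than a measurement of the real tool.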

Storage bloat gets extreme.

Someone found their `logs_2.sqlite` database had swollen to 27 GB. Running `VACUUM;` collapsed it to 73 MB. Twenty-seven gigabytes of trace logs. Most of it noise.
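If you want to see what compaction buys you, here is a minimal sketch that runs on a throwaway database. The helper names and the demo table are my own, not anything the tool ships. To try `vacuum_and_report` on the real file, quit the CLI first so nothing else holds the database open.

```python
import os
import sqlite3
import tempfile


def vacuum_and_report(path: str) -> tuple[int, int]:
    """Run VACUUM on the database at path; return (bytes_before, bytes_after)."""
    before = os.path.getsize(path)
    # Autocommit mode, because VACUUM cannot run inside a transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("VACUUM")
    conn.close()
    return before, os.path.getsize(path)


def build_bloated_db(path: str, n_rows: int = 20_000) -> None:
    """Throwaway database: fill a table, then delete every row.

    The deleted rows leave free pages behind, so the file stays large.
    """
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE logs (event TEXT)")  # hypothetical table
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO logs VALUES (?)", [("x" * 200,)] * n_rows)
    conn.execute("COMMIT")
    conn.execute("DELETE FROM logs")
    conn.close()


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.sqlite")
    build_bloated_db(demo_path)
    before, after = vacuum_and_report(demo_path)
    print(f"{before} bytes before VACUUM, {after} bytes after")
```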

Why This Bug Matters More Than It Seems

Here’s what actually concerns me: this has been a known issue for some time. Multiple users reported disk space exhaustion from leaving it running. Community forums had threads describing the exact behavior.

The bug reporter did the forensic work themselves.

Sampling row ID distributions. Measuring write rates over 15-second windows. Calculating the per-year extrapolation. That’s not the developers finding their own bug. That’s a user doing the QA for them.
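That kind of measurement is easy to reproduce. A rough sketch, assuming the log table is named `logs` (the name used in the trigger workaround later in this post) and that rowids only grow; the function names are my own:

```python
import sqlite3
import time


def max_rowid(conn: sqlite3.Connection, table: str = "logs") -> int:
    """Highest rowid currently in the table (0 if it is empty)."""
    # The table name is interpolated directly, so only pass trusted names.
    query = f"SELECT COALESCE(MAX(rowid), 0) FROM {table}"
    return conn.execute(query).fetchone()[0]


def insert_rate(conn, window_s=15.0, table="logs", sleep=time.sleep) -> float:
    """Approximate rows inserted per second over one sampling window."""
    start = max_rowid(conn, table)
    sleep(window_s)
    end = max_rowid(conn, table)
    return (end - start) / window_s


# The figure quoted in the report, scaled from a 15-second window to one minute.
rows_per_minute = 36_211 * (60 // 15)
print(rows_per_minute)  # 144844
```

To point it at the real file, open the database read-only so the measurement itself adds no writes, for example `sqlite3.connect("file:/path/to/.codex/logs_2.sqlite?mode=ro", uri=True)` with your own home directory filled in.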

And the root cause isn’t subtle. Global TRACE default. No rate limiting. No log rotation. No way to disable it without blocking SQLite entirely. This isn’t some complex async edge case. It’s a logging configuration problem that would’ve taken an hour to catch in staging. The fact that it shipped as the default for months, hit the front page of Hacker News, and still sat unfixed while users’ drives degraded in real time? That’s the actual story.

How to Stop the Bleeding Right Now

The fix that has been implemented switches from global TRACE to targeted INFO-level logging for core modules.
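To make the difference concrete, here is a toy comparison using Python's standard `logging` module. This is not the tool's code: the module names are hypothetical, and because Python has no TRACE level, DEBUG stands in for it.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts records instead of writing them anywhere."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        self.count += 1


def count_records(default_level: int, overrides: dict) -> int:
    """Emit the same log calls under a given config; return how many get through."""
    base = logging.getLogger("demo")
    base.propagate = False          # keep the demo out of the root logger
    handler = CountingHandler()
    base.handlers = [handler]
    base.setLevel(default_level)
    modules = ["demo.core", "demo.net", "demo.ui"]  # hypothetical module names
    for name in modules:
        # NOTSET means "inherit the level from the parent logger".
        logging.getLogger(name).setLevel(overrides.get(name, logging.NOTSET))
    for name in modules:
        log = logging.getLogger(name)
        for _ in range(100):
            log.debug("tick")       # chatty diagnostic noise
        log.info("state change")    # the occasional useful event
    return handler.count


global_verbose = count_records(logging.DEBUG, {})
targeted = count_records(logging.WARNING, {"demo.core": logging.INFO})
print(global_verbose, targeted)  # 303 1
```

Same code paths, same log calls. The only thing that changed is the default level and one per-module override.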

Update now and the problem’s solved.

For those who can’t update immediately, or want immediate relief, two options are worth knowing.

First: a SQLite trigger that blocks new inserts before they happen:

```
sqlite3 ~/.codex/logs_2.sqlite "CREATE TRIGGER IF NOT EXISTS block_log_inserts BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"
```
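If you’d rather see what that trigger does before touching the real file, this sketch applies the same statement to an in-memory database. The column layout here is invented for the demo; only the table name `logs` and the trigger itself come from the command above.

```python
import sqlite3

TRIGGER_SQL = (
    "CREATE TRIGGER IF NOT EXISTS block_log_inserts "
    "BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"
)


def demo_trigger() -> tuple[int, int]:
    """Return (row count while the trigger is active, row count after dropping it)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns; the real schema may differ.
    conn.execute("CREATE TABLE logs (ts REAL, module TEXT, event TEXT)")
    conn.execute("INSERT INTO logs VALUES (1.0, 'core', 'before trigger')")

    conn.execute(TRIGGER_SQL)
    # RAISE(IGNORE) silently skips the row: no error, nothing stored.
    conn.execute("INSERT INTO logs VALUES (2.0, 'core', 'blocked')")
    blocked = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

    # Undo the workaround: drop the trigger and inserts work again.
    conn.execute("DROP TRIGGER block_log_inserts")
    conn.execute("INSERT INTO logs VALUES (3.0, 'core', 'after drop')")
    restored = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    conn.close()
    return blocked, restored


blocked_count, restored_count = demo_trigger()
print(blocked_count, restored_count)  # 1 2
```

The `DROP TRIGGER block_log_inserts` statement is also how you reverse the workaround on the real database once you’ve updated.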

Keeps it functional. Stops all further log writes. Drive stops bleeding. Lose diagnostics. Your call.

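Second: move the log database to RAM so writes never touch persistent storage. Quit the CLI first, then replace the file with a symlink to a path under `/tmp` (a tmpfs is emptied on reboot, so the `mkdir` has to be rerun after each restart):

```
mkdir -p /tmp/codex_logs
# Remove the old database and its WAL sidecar files (this discards existing logs)
rm -f ~/.codex/logs_2.sqlite ~/.codex/logs_2.sqlite-wal ~/.codex/logs_2.sqlite-shm
ln -s /tmp/codex_logs/logs_2.sqlite ~/.codex/logs_2.sqlite
```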

Caveat: verify `/tmp` is actually tmpfs-backed with `df -h /tmp`. On some systems `/tmp` is just a regular disk partition and you’ve moved the wear to a different part of the same drive. That includes macOS, where `/tmp` lives on the boot volume, so you’d need to set up a RAM disk yourself. On Linux with a memory-backed tmpfs, this eliminates the wear entirely.
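On Linux you can also script that check. A small sketch that reads the mount table format used by `/proc/mounts` and reports the filesystem type behind a path; the function name is my own, and it ignores the octal escapes `/proc/mounts` uses for mount points containing spaces:

```python
import os
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Filesystem type of the mount that contains path, or None if no mount matches.

    mounts_text is expected in /proc/mounts format:
    "<device> <mount point> <fs type> <options> ...", one mount per line.
    """
    target = os.path.realpath(path) + "/"
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Longest matching mount point wins; later entries win ties (overmounts).
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if os.path.exists("/proc/mounts"):  # Linux only
    with open("/proc/mounts") as f:
        print("/tmp is backed by:", fs_type_for("/tmp", f.read()))
```

If it prints `tmpfs`, the symlink trick keeps the writes in memory. Anything else (`ext4`, `btrfs`, `xfs`, and so on) means the writes still land on a disk.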

For CI environments, same logic applies at the runner level. Point `~/.codex` at a scratch tmpfs inside your container. Problem disappears.

Side note: their docs are a mess and there’s no mention of this anywhere in the official troubleshooting guides. Just saying.

What This Means for Small Operators

I run a small agency. Leave dev tools running all day. Don’t watch iostat while I work.

Stuff like this is exactly the kind of silent hardware killer that eats into replacement budgets without anyone noticing until a drive dies.

The real issue isn’t the bug itself.

Every complex tool ships with configuration problems. It’s the timeline. Known for some time. Fix implemented recently. That’s months of unnecessary wear on every user who left their machine running. And it took a community issue going viral on HN and Reddit before it got prioritized.

That’s the pattern worth paying attention to. It’s not unique to this tool.

If you’re using it right now, update first. Then decide whether the trigger or the RAM workaround fits your situation. And if you’re evaluating AI CLI tools for your stack, add “what does the logging config look like in production” to your checklist. Apparently it matters.

Check your version. Run the trigger command if you need to. This one was left to the community to catch.

Sources

– GitHub issue #28224: original bug report and forensic analysis
– Changelog: fix implemented recently
– SQLite documentation: WAL mode and write amplification