Vibe Coding is Fun — Vibe Shipping is Dangerous

I’ve been running AI coding tools for the past year on personal projects, on late nights, and on things I’d normally be too busy to prototype. I wanted to understand the gains and the costs before scaling them. Andrej Karpathy named the phenomenon in early 2025[1]: fully give in to the vibes, forget the code even exists, and accept whatever the model gives back. The internet quickly turned it into a methodology.
What I’ve actually found: vibe coding is one of the most powerful prototyping workflows I’ve encountered, and simultaneously a habit that can unintentionally erode code quality and fundamental engineering skills if left unchecked. Both statements are true, and the tension between them is where the real work of engineering begins.
Why AI Coding Tools are Context-Blind
The common narrative is that the AI is an “anak intern” (Indonesian slang for “the intern kid”): a lightning-fast intern with infinite confidence but zero context.
It can implement common patterns at blistering speed and reasonable quality. What it cannot do is understand the invisible gravity of your specific codebase. It doesn’t know that your custom authentication middleware handles edge cases differently than the standard library patterns. It doesn’t know this specific field is already sanitized upstream and doesn’t need to be double-processed. It has no memory of the deliberate architectural trade-offs your team made three months ago—trade-offs the new code is about to silently reverse.
The model’s confidence is completely decoupled from its accuracy. Because it always returns syntactically flawless code, it masks logic errors under a veneer of professional polish.
Where AI-Generated Code Breaks in Production
AI-generated code is naturally optimized for the “happy path”—making the demo work in isolation. Production, however, is defined by the adversarial edge cases that the demo ignores.
When auditing AI-assisted commits, the same structural cracks appear repeatedly: missing input validation, hardcoded or insecurely handled configurations, access control assumed rather than checked, and complex error paths completely skipped.
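To make those cracks concrete, here is a small Python sketch. Everything in it (the `User` type, the in-memory `DOCUMENTS` store, both function names) is hypothetical and invented for illustration; it is not taken from any audited commit. The first function is the shape a demo-ready answer tends to take, and the second adds the validation, error path, and access check the first one assumes away.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    is_admin: bool = False


# Hypothetical in-memory store standing in for a real database.
DOCUMENTS = {1: {"owner_id": 7, "body": "q3 forecast"}}


def get_document_happy_path(user: User, doc_id: int) -> str:
    # Works in the demo: assumes the id is valid, the document exists,
    # and the caller is allowed to read it.
    return DOCUMENTS[doc_id]["body"]


def get_document(user: User, doc_id: object) -> str:
    # Input validation: reject anything that is not a positive integer.
    if not isinstance(doc_id, int) or isinstance(doc_id, bool) or doc_id <= 0:
        raise ValueError("doc_id must be a positive integer")

    # Error path: a missing document is an expected case, not a crash.
    doc = DOCUMENTS.get(doc_id)
    if doc is None:
        raise LookupError(f"document {doc_id} not found")

    # Access control is checked explicitly rather than assumed.
    if doc["owner_id"] != user.id and not user.is_admin:
        raise PermissionError("caller may not read this document")

    return doc["body"]
```

Both versions return the same result for the owner, which is why the gap is easy to miss in a demo: the difference only appears when a different user, a bad id, or a missing record shows up.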
Then there is a subtler issue, nyenggol (Indonesian slang for bumping into something): the side effect of a change nudging or colliding with adjacent, seemingly unrelated modules. Because a model lacks a holistic, stateful understanding of a complex monolith or microservice mesh, it can generate a locally correct implementation that silently breaks a neighboring dependency.
I’ve deleted more AI-generated code than I’ve shipped. The cleanup isn’t as simple as deleting a bad function; it’s the invisible tax of untangling side effects, restoring overwritten architectural conventions, and ensuring the next engineer doesn’t inherit a codebase of shifting sands. That cleanup cost rarely shows up in sprint velocities, but it shows up later in post-mortems and emergency refactors.
A 2025 Cloud Security Alliance review of over 1,400 vibe-coded production applications found that 65% had security issues, 58% contained at least one critical vulnerability, and AI-assisted commits exposed secrets at more than twice the rate of human-only code.[2] Models generate code that passes the demo but fails under adversarial production conditions.[3] This isn’t confined to research papers or small teams; we’ve seen major production outages at some of the world’s largest tech infrastructure providers tied to AI.[4][5]
There is also the critical issue of data exposure. In early 2023, Samsung engineers pasted confidential source code and meeting transcripts into ChatGPT to debug issues and generate notes, which led to immediate corporate bans.[6] These engineers weren’t malicious; they were simply moving fast. When speed is the only metric, security and intellectual property boundaries can be the first things to erode.
The Junior Engineer Paradox
We know that deep engineering intuition is built through the painful cycle of reading, writing, breaking, and manually debugging code. With the rise of vibe coding, the industry narrative has shifted to “code is cheap,” and the expectation has moved from writing code to simply “reviewing” and “owning” it. This shift is especially dangerous for junior engineers, who still need to build their engineering fundamentals. How can you review what you do not yet know how to build?
A junior engineer who generates a complex function they don’t understand, reviews it superficially, and merges it has shipped code they cannot truly own. They can describe the feature, but they cannot explain the code. More importantly, they cannot debug it at 2 AM when an upstream service fails. If we bypass the frustrating, manual process of writing basic code, we risk raising a generation of engineers who are fluent in prompting but illiterate in systems.
You cannot learn what you skip.
To break this paradox, I began experimenting with a flipped model:
- Test-Driven Generation: Instead of having the AI write the code and me reviewing it, I write the detailed integration tests and edge-case assertions first. Only then do I let the AI generate the implementation to satisfy those tests. This forces me to think deeply about system boundaries and expectations rather than defaulting to passive code-reading.
- Reconstructive Reviews: When I do use AI to generate complex logic, I force myself to perform a “reverse-engineered” review. I must be able to articulate the runtime mechanics and resource footprint of the generated code before I consider it complete.
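A minimal Python sketch of the test-first flow described above. The helper `normalize_amount` and its rules (two decimal places, half-up rounding, no negatives) are hypothetical, chosen only for illustration. The checks are the human-written part and come first; the function body stands in for whatever the model would generate to satisfy them.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


# Step 1 (written by hand, before any implementation exists):
# the expectations and edge cases that define the boundary.
def check_normalize_amount(fn) -> None:
    assert fn("10") == Decimal("10.00")
    assert fn(" 10.5 ") == Decimal("10.50")   # surrounding whitespace is tolerated
    assert fn("1.005") == Decimal("1.01")     # half-up rounding to 2 places
    for bad in ["", "abc", "-1", "NaN", "Infinity"]:
        try:
            fn(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")


# Step 2 (stand-in for the generated implementation): accepted only
# once it passes every check above.
def normalize_amount(raw: str) -> Decimal:
    try:
        value = Decimal(raw.strip())
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {raw!r}") from exc
    # Reject NaN and infinities before comparing, then reject negatives.
    if not value.is_finite() or value < 0:
        raise ValueError(f"amount out of range: {raw!r}")
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


check_normalize_amount(normalize_amount)
```

Writing the rejected inputs down first is where most of the thinking happens: each entry in the `bad` list is a decision about the boundary that a passive review would probably never have forced.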
A Safe Workflow for AI-Assisted Engineering
Despite the risks, AI tools have earned a permanent place in my workflow. They excel at rapid prototyping, setting up boilerplate, and acting as a sounding board.
My personal favorite use case is utilizing models to stress-test specifications and system designs before writing a single line of code. Feeding a raw architectural spec to a model and asking it to “identify edge cases, race conditions, or missing validation paths in this design” routinely surfaces structural gaps we would have otherwise missed.
At OCBC Indonesia, we introduced AI using a strict, low-blast-radius framework. We use it for generating unit test coverage, draft documentation, and onboarding summaries, and for pair programming. In change reviews, AI is used strictly as an adversarial “devil’s advocate” generator, never as a decision-maker.
During reviews, we feed the proposed change documentation into a sandboxed model to generate an exhaustive, customized checklist of potential risks (e.g., “What happens if the migration fails halfway? Did we consider the database lock duration on this table under high load?”). The tollgate officers then review this checklist to ensure no blind spots were missed. The AI does not write the audit trail, and it does not sign off on the release. It challenges our assumptions; it does not replace our judgment.
Operational Guardrails
Idealistic rules like “review every line” fail under shipping pressure. To keep guardrails realistic, enforce them through automated workflows and structural constraints:
Automated PR Gates
Don’t rely solely on human vigilance. Pre-commit hooks and CI/CD pipelines are essential to catch common AI failure modes before they reach human reviewers:
- Mandatory scanning for hardcoded secrets, high-entropy strings, and credentials.
- Static analysis rules that block common insecure patterns (like raw SQL concatenation or unvalidated redirects).
- Automated verification that test coverage matches or exceeds the baseline, preventing the shipping of massive generated features without corresponding assertions.
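As a rough illustration of the first gate, here is a small Python sketch of a secret scanner that a pre-commit hook could call on each added line. The regular expressions, the 20-character minimum, and the 4.0 bits-per-character entropy threshold are assumptions picked for this example, not tuned values; a real pipeline would more likely rely on a maintained scanning tool.

```python
import math
import re
from collections import Counter

# Illustrative patterns only. The keyword list and thresholds are assumptions.
KEYWORD_ASSIGNMENT = re.compile(
    r"""(password|passwd|secret|api_?key|token)\w*\s*[:=]\s*["'][^"']{8,}["']""",
    re.IGNORECASE,
)
LONG_TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_line(line: str, entropy_threshold: float = 4.0) -> list[str]:
    """Return the reasons this line looks like it contains a secret."""
    findings = []
    if KEYWORD_ASSIGNMENT.search(line):
        findings.append("credential-like assignment")
    for token in LONG_TOKEN.findall(line):
        if shannon_entropy(token) >= entropy_threshold:
            findings.append("high-entropy string")
            break
    return findings
```

A hook would run `scan_line` over the added lines of a diff and fail the commit on any non-empty result. Entropy checks produce false positives on hashes and generated identifiers, so an allowlist is usually needed alongside them.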
Vertical Slicing and Context Limits
All PRs must represent small, vertically sliced features (under ~300 LoC). If a PR is larger, it is automatically rejected by the CI gate. This forces developers to slice their AI interactions into manageable units that can actually be comprehended and reviewed line-by-line, preventing “blind trust” merging.
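One way such a size gate could be sketched in Python, assuming the CI job captures the output of `git diff --numstat` for the PR and passes it in as text. Counting added plus deleted lines and skipping binary files are assumptions of this sketch; the 300-line limit mirrors the figure above.

```python
def changed_lines(numstat_output: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line has the form "<added>\t<deleted>\t<path>". Binary files are
    reported as "-\t-\t<path>" and are skipped here.
    """
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # blank or unexpected line
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary file, no line counts
        total += int(added) + int(deleted)
    return total


def pr_within_limit(numstat_output: str, limit: int = 300) -> bool:
    """True when the PR stays under the line limit."""
    return changed_lines(numstat_output) < limit
```

In practice a gate like this usually needs exclusions for lockfiles and generated code, otherwise a dependency bump alone can trip it.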
The No-AI Zone
Because we operate in a highly regulated environment, systems handling customer PII, transaction ledgers, or interest calculation algorithms are strictly “no-AI zones” for logic generation. A silent decimal hallucination in a financial calculation is a risk no one should ever accept for minor velocity gains.
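To show how quietly a decimal error can slip in, here is a short Python sketch. The `monthly_interest` function and its rounding policy (simple interest, half-up to two places) are hypothetical and are not how any real ledger is implemented; the float behavior shown is standard binary floating-point arithmetic.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
assert 0.1 + 0.2 != 0.3
assert round(2.675, 2) == 2.67  # the stored value is slightly below 2.675

# Decimal keeps exact decimal values and makes the rounding rule explicit.
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
assert Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP) == Decimal("2.68")


def monthly_interest(principal: Decimal, annual_rate: Decimal) -> Decimal:
    """Simple interest for one month, rounded half-up to 2 places.

    Hypothetical policy, for illustration only.
    """
    raw = principal * annual_rate / Decimal(12)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Generated code that reaches for `float` and `round` here would look reasonable and pass a casual demo, which is exactly the kind of error a review under time pressure tends to miss.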
The Engineer of Tomorrow
The workforce is changing shape, and the entry-level bar is rising, not falling.
In the past, a junior developer could build a career by writing predictable CRUD boilerplate. That work is gone. The valuable engineer of tomorrow is not the one who can write code the fastest—the machine has already won that race. The valuable engineer is the one who can orchestrate constraints, design robust system boundaries, and exercise rigorous skeptical judgment over the code that is generated.
Will AI replace software engineers? No. But the engineers who treat these tools as a replacement for thinking will quickly find themselves replaced. The engineers who use AI as a force multiplier for deep, adversarial thinking will own the future of software.
Vibe coding is fun. Vibe shipping is dangerous.
