Your API Already Knows What’s Wrong. Why Doesn’t It Say So?
I wasted twenty minutes last month because an API wouldn’t tell me what I did wrong.
I was creating a webhook subscription. Six fields in the body. The response:
{
  "error": "Bad request"
}
No code. No detail. Nothing. So I did the thing we all do. I started commenting out fields one by one, sending requests, guessing. Twenty minutes later I found it. The url field needed HTTPS, not HTTP.
The server knew this the entire time. The validation function checked the scheme, found HTTP, and rejected it. That information existed. Someone made the decision to not include it in the response. Or more likely, nobody thought about it at all.
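One field and a sentence would have turned twenty minutes into ten seconds. Something like this (the shape is illustrative; the rest of this post builds up to it properly):
{
  "error": {
    "code": "VALIDATION_FAILED",
    "field": "url",
    "rejected_value": "http://...",
    "message": "url must use HTTPS.",
    "hint": "Change http:// to https://."
  }
}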
That’s the real problem. Error responses are rarely designed. They’re whatever falls out of a catch block.
Your errors have a new audience
This has always been frustrating for developers. But here’s what’s changed: your API’s error responses are no longer just read by humans.
LLM agents are calling APIs through tool-use. AI coding assistants are generating integration code and running it. Orchestration pipelines are chaining calls across services and making decisions based on what comes back.
When an agent hits {"error": "Bad request"}, it’s completely stuck. It can’t self-correct. It can’t decide whether to retry, fix the input, or give up. It has the same option the developer had: guess.
A structured error with a code, a field name, a rejected value, and a hint? The agent reads it, fixes the input, and retries. No human involved. The developer who set up the agent doesn’t even know the error happened.
Your error responses aren’t just developer UX anymore. They’re a machine interface. And most APIs are failing at both.
Building errors that carry context
The root cause is almost always the same: context gets destroyed on its way to the response. A validation function knows exactly which field failed and why. By the time it reaches the error middleware, all that’s left is a string.
The fix is an error class that carries structured data from origin to response:
class ApiError extends Error {
  constructor(code, overrides = {}) {
    // Defaults come from the catalog so every instance of a code has a
    // consistent baseline; overrides add instance-specific detail.
    const catalog = ERROR_CATALOG[code];
    super(overrides.message || catalog.message);
    this.code = code;
    this.status = overrides.status || catalog.status;
    this.hint = overrides.hint || catalog.hint;
    // Free-form bag: field-level validation results, version numbers,
    // retry timing, whatever this particular error needs.
    this.meta = overrides.meta || {};
  }

  toJSON() {
    return {
      error: {
        code: this.code,
        message: this.message,
        hint: this.hint,
        ...this.meta,
        docs_url: `https://api.example.com/errors/${this.code}`,
      },
    };
  }
}
It pulls defaults from a catalog so every error code has a consistent baseline, but accepts overrides for instance-specific detail. The meta bag carries whatever the specific error needs: field-level validation results, version numbers, retry timing.
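The examples in this post imply a catalog shaped roughly like this. The AUTH_MISSING entry mirrors the /errors response shown later; the other statuses and wording are illustrative:
const ERROR_CATALOG = {
  AUTH_MISSING: {
    status: 401,
    message: "API key is missing from the request.",
    hint: "Include your key in the Authorization header.",
  },
  VALIDATION_FAILED: {
    status: 422, // assumption; 400 works too, just pick one and stay consistent
    message: "Request body failed validation.",
    hint: "Check the 'errors' array for specific fields that need fixing.",
  },
  VERSION_CONFLICT: {
    status: 409,
    message: "The resource has changed since the version you referenced.",
    hint: "Fetch the latest version, re-apply your changes, and retry.",
  },
};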
The middleware then does one job: format the error, not replace it.
app.use((err, req, res, _next) => {
  if (err instanceof ApiError) {
    // Known error: format it, attach the request id, pass everything through.
    return res.status(err.status).json({
      ...err.toJSON(),
      request_id: req.id,
    });
  }

  // Unexpected errors: log everything, expose nothing
  console.error(`[${req.id}] Unhandled error:`, err);
  res.status(500).json({
    error: {
      code: "INTERNAL_ERROR",
      message: "An unexpected internal error occurred.",
      hint: "If this persists, contact support with the request_id.",
    },
    // Top level, matching the ApiError path, so consumers always find it
    // in the same place.
    request_id: req.id,
  });
});
This is where most codebases go wrong. The middleware becomes a black hole. Some well-meaning developer adds a generic catch-all that converts every error into "Something went wrong" and the structured context that existed three layers up is gone. Don’t do this. Your middleware should be a formatter, not a filter.
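For contrast, the black hole looks something like this. Every structured field the route attached dies here:
app.use((err, req, res, _next) => {
  // Anti-pattern: err.code, err.hint, and err.meta all exist on the
  // error object, and none of them survive to the response.
  console.error(err);
  res.status(500).json({ error: "Something went wrong" });
});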
Scenarios that actually come up
Everything below is from a reference Express API I built as a companion to this post. You can clone it, run it, and try every curl command yourself.
Validation that actually helps
A developer creates a webhook with an HTTP URL and a made-up event type:
curl -s http://localhost:3000/webhooks -X POST \
  -H "Authorization: Bearer sk-live-demo" \
  -H "Content-Type: application/json" \
  -d '{"url":"http://myapp.com/hook","events":["fake.event"]}' | jq
The route validates each field and collects every problem, not just the first one:
app.post("/webhooks", requireAuth, rateLimit, (req, res, next) => {
const { url, events } = req.body || {};
const errors = [];
if (!url) {
errors.push({ field: "url", message: "Required." });
} else if (!url.startsWith("https://")) {
errors.push({
field: "url",
message: "Must use HTTPS.",
rejected_value: url,
hint: "Webhook endpoints must be served over TLS. Change http:// to https://.",
});
}
if (!events || !Array.isArray(events) || events.length === 0) {
errors.push({
field: "events",
message: "Must be a non-empty array of event types.",
available_events: VALID_EVENTS,
});
} else {
const invalid = events.filter((e) => !VALID_EVENTS.includes(e));
if (invalid.length > 0) {
errors.push({
field: "events",
message: `Unknown event types: ${invalid.join(", ")}`,
rejected_values: invalid,
available_events: VALID_EVENTS,
});
}
}
if (errors.length > 0) {
return next(new ApiError("VALIDATION_FAILED", { meta: { errors } }));
}
// ... create the webhook
});
The response:
{
  "error": {
    "code": "VALIDATION_FAILED",
    "message": "Request body failed validation.",
    "hint": "Check the 'errors' array for specific fields that need fixing.",
    "errors": [
      {
        "field": "url",
        "message": "Must use HTTPS.",
        "rejected_value": "http://myapp.com/hook",
        "hint": "Webhook endpoints must be served over TLS. Change http:// to https://."
      },
      {
        "field": "events",
        "message": "Unknown event types: fake.event",
        "rejected_values": ["fake.event"],
        "available_events": ["order.created", "order.updated", "payment.completed", "payment.failed"]
      }
    ]
  },
  "request_id": "req_a1b2c3d4e5f6g7h8"
}
Both problems, in one response, with the rejected values and the valid options. The developer fixes everything in a single pass instead of playing whack-a-mole with "Bad request". An AI agent reads the available_events list and corrects the request without asking anyone.
Returning only the first validation error is a choice that disrespects the developer’s time. Collect them all.
Version conflicts that enable smart recovery
Two systems update the same config. The second one sends a stale ETag:
curl -s http://localhost:3000/configs/cfg_001 -X PATCH \
  -H "Authorization: Bearer sk-live-demo" \
  -H "Content-Type: application/json" \
  -H 'If-Match: "etag-v2"' \
  -d '{"theme":"dark"}' | jq
app.patch("/configs/:id", requireAuth, rateLimit, (req, res, next) => {
const config = configs[req.params.id];
const clientEtag = req.headers["if-match"];
if (clientEtag !== config.etag) {
const clientVersion = parseInt(clientEtag.replace(/[^0-9]/g, ""), 10) || 0;
const changesSince = config.history.filter((h) => h.version > clientVersion);
return next(new ApiError("VERSION_CONFLICT", {
hint: `Your request was based on version ${clientVersion} but the current version is ${config.version}. Fetch the latest with GET /configs/${config.id}, re-apply your changes, and retry.`,
meta: {
your_version: clientVersion,
current_version: config.version,
current_etag: config.etag,
changes_since_your_version: changesSince,
},
}));
}
// ... apply the update
});
The response tells the caller what version they had, what the current version is, and every change that happened in between. Who changed what, when.
This is the difference between a blind overwrite and an informed decision. A developer can build a merge UI. A CI/CD pipeline can detect that a human made manual changes and pause before overwriting. An agent can fetch the latest, diff the fields, and re-apply only what doesn’t conflict.
"Precondition failed" enables none of this.
Partial batch failures
A batch of messages. One has a bad phone number:
curl -s http://localhost:3000/messages/batch -X POST \
  -H "Authorization: Bearer sk-live-demo" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"to":"+44700000001","body":"Hello"},{"to":"bad","body":"Hi"}]}' | jq
{
  "summary": { "total": 2, "succeeded": 1, "failed": 1 },
  "results": [
    { "index": 0, "status": "sent", "message_id": "msg_a1b2c3d4", "to": "+44700000001" },
    {
      "index": 1,
      "status": "failed",
      "error": {
        "code": "VALIDATION_FAILED",
        "errors": [{
          "field": "to",
          "message": "'bad' is not a valid E.164 phone number.",
          "hint": "Phone numbers must include country code, e.g. +44700900000",
          "rejected_value": "bad"
        }]
      }
    }
  ]
}
HTTP 207. Per-item results. The consumer knows exactly what happened: item 0 sent, item 1 failed. An orchestration agent processes the success and retries only the failure.
Returning 400 Bad Request for a batch where some of the items succeeded is wrong. It forces the consumer into an impossible position: did anything send? Should I resend everything and risk duplicates? Batch endpoints that don’t support partial success reporting will cause data integrity issues. It’s a matter of when, not if.
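With per-item results, consumer-side handling is almost mechanical. A sketch against the 207 shape above:
async function sendBatch(messages) {
  const res = await fetch("http://localhost:3000/messages/batch", {
    method: "POST",
    headers: {
      Authorization: "Bearer sk-live-demo",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages }),
  });
  const { results } = await res.json();
  return {
    // Already delivered; resending these is what causes duplicates.
    sent: results.filter((r) => r.status === "sent"),
    // Each failure carries the same structured error as everywhere else,
    // so it can be fixed and retried individually.
    failed: results.filter((r) => r.status === "failed"),
  };
}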
Downstream failures with retry safety
The API calls a third-party document verification service. It times out:
curl -s http://localhost:3000/documents/verify -X POST \
  -H "Authorization: Bearer sk-live-demo" | jq
{
  "error": {
    "code": "DOWNSTREAM_TIMEOUT",
    "message": "Document verification timed out. Our provider did not respond within 30s.",
    "hint": "This is a transient upstream issue, not a problem with your request. Safe to retry with backoff.",
    "retry_safe": true,
    "retry_after_seconds": 5,
    "provider": "document-verification",
    "provider_status": "degraded",
    "status_page": "https://status.example.com"
  },
  "request_id": "req_f8e7d6c5b4a39281"
}
retry_safe is the most important field in this entire response. It tells the caller whether retrying could cause duplicate side effects. For a read-only verification check, it’s true. For an endpoint that triggers a financial transaction and times out after the request left your system, it would be false.
Without this field, careful consumers default to giving up. That means your transient failures become permanent failures for anyone who builds their integration responsibly. You’re punishing the developers who are trying to do the right thing.
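A retry wrapper can key off those two fields directly. A sketch (assumes the error shape above and a makeRequest function that returns a fetch Response):
async function callWithRetry(makeRequest, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await makeRequest();
    if (res.ok) return res.json();

    const { error } = await res.json();
    // Only retry when the API explicitly says it has no side effects.
    if (!error.retry_safe || attempt === maxAttempts) {
      throw new Error(`${error.code}: ${error.message}`);
    }
    const seconds = error.retry_after_seconds ?? 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
  }
}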
Be transparent about downstream issues. Your developers know you depend on other services. Pretending a third-party timeout is a mysterious “Bad gateway” doesn’t protect you. It just makes everyone’s debugging harder.
Your error catalog should be an endpoint
One pattern worth stealing from the reference API: the error catalog is a live endpoint, not just a docs page.
curl -s http://localhost:3000/errors | jq
{
  "errors": [
    {
      "code": "AUTH_MISSING",
      "status": 401,
      "message": "API key is missing from the request.",
      "hint": "Include your key in the Authorization header.",
      "docs_url": "http://localhost:3000/errors/AUTH_MISSING"
    }
  ],
  "total": 8
}
SDKs fetch this at startup and provide rich client-side error messages, even for codes added after the SDK was published. AI agents query it before making requests to understand what errors are possible. Documentation generates from it automatically. One source of truth that serves every consumer.
If your error codes only exist as prose in a docs page somewhere, they’re incomplete. Make them queryable.
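Client-side, the startup fetch is a few lines. A sketch against the /errors shape above:
let catalog = new Map();

async function loadErrorCatalog() {
  const { errors } = await (await fetch("http://localhost:3000/errors")).json();
  catalog = new Map(errors.map((e) => [e.code, e]));
}

function describeError(code) {
  // Codes added after this client shipped still resolve, because the
  // catalog comes from the live endpoint, not a baked-in list.
  const entry = catalog.get(code);
  return entry ? `${entry.message} ${entry.hint}` : `Unknown error code: ${code}`;
}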
Start here
You don’t need to redesign your entire API.
Find the error that generates the most support tickets. Add a hint field. Ship it this week.
One field. One deploy. Immediate impact.
Then when you’re ready:
- Audit your five worst error paths. For each one: could a developer fix this without contacting support? Could an AI agent self-correct from this response? If not, the error is missing context.
- Create an error catalog. One file. Every code, what it means, the fix. Expose it as an endpoint. Start with ten.
- Monitor by error code, not HTTP status. A dashboard of 400s is useless. VALIDATION_FAILED vs RATE_LIMIT_EXCEEDED vs AUTH_MISSING is actionable (a minimal sketch follows this list).
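The counting itself is one hook in the error middleware. A sketch, assuming a metrics client with an increment(name, tags) method (the client is illustrative):
function recordError(err) {
  // Group by semantic code, not HTTP status: three different 400s
  // are three different problems.
  const code = err instanceof ApiError ? err.code : "INTERNAL_ERROR";
  metrics.increment("api.errors", { code });
}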
I built a reference Express API you can clone and run to see all of these patterns working. There’s also an interactive demo that walks through the scenarios visually.
For production examples of this done well: Stripe’s error codes, Twilio’s error dictionary, OpenAI’s error handling. For a formal standard: RFC 9457 (Problem Details for HTTP APIs).
Your API already knows what went wrong. Start telling people about it.