Playbook · 10 min read · For engineering

Engineer's playbook: integrating Axiru into a Stripe codebase without breaking anything

What integrating Axiru actually looks like in a Stripe-native codebase. Wire shape, where to call into the decision engine, idempotency, error handling, observability hooks, rollback plan. Code-flavored without being a copy-paste tutorial.

You are the engineer who pulled the Axiru ticket. The pitch is straightforward: insert a policy decision between your application and the Stripe refund/payout API. The reality has three integration concerns that matter: where exactly to call the decision engine, how to handle the decision outcomes correctly, and how to keep the system observable enough that 'why did Axiru block this' is a one-minute investigation.

This playbook covers the integration shape Stripe-native teams converge on, the failure modes to plan for, and the rollback story.

The integration shape

Every outbound Stripe call that creates money movement (stripe.refunds.create, stripe.payouts.create, stripe.transfers.create, stripe.applicationFees.createRefund) gets wrapped. The wrap is a function call into Axiru's decision engine that returns one of three verdicts: APPROVE (execute the Stripe call), ESCALATE (queue for human approval, do not execute yet), or DENY (do not execute, return a structured reason to the caller).

The wrap lives in your codebase, not on Stripe's side. Axiru does not sit on the network path between your service and Stripe; it sits in your service. That matters for latency (sub-100ms p95 on the decision call), for failure modes (Axiru being down does not mean Stripe is unreachable), and for evidence (your service is the source of truth for what was attempted).

Practical placement: a thin module like services/outflow/governed.ts that exports governedRefund, governedPayout, etc. Every caller in the app that used to call stripe.refunds.create directly now calls governedRefund. The migration is mechanical and grep-able.
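A minimal sketch of what that module could look like, assuming hypothetical DecisionClient and RefundExecutor interfaces in place of the real Axiru and Stripe clients (neither interface comes from either SDK). It shows only how the three verdicts route; idempotency handling is deliberately left out here and is covered in the next section.

```typescript
// Sketch of services/outflow/governed.ts. DecisionClient and RefundExecutor are
// hypothetical interfaces, not the real Axiru or Stripe SDK surfaces.
type Verdict = "APPROVE" | "ESCALATE" | "DENY";

interface Decision {
  verdict: Verdict;
  reason?: string; // structured reason, expected on DENY
}

interface RefundRequest {
  chargeId: string;
  amount: number; // minor units, e.g. cents
}

interface DecisionClient {
  decide(req: RefundRequest): Promise<Decision>;
}

interface RefundExecutor {
  // In a real service this would delegate to stripe.refunds.create.
  createRefund(req: RefundRequest): Promise<{ id: string }>;
}

type GovernedResult =
  | { status: "executed"; refundId: string }
  | { status: "pending_approval" }
  | { status: "denied"; reason: string };

async function governedRefund(
  axiru: DecisionClient,
  stripe: RefundExecutor,
  req: RefundRequest,
): Promise<GovernedResult> {
  const decision = await axiru.decide(req);
  if (decision.verdict === "APPROVE") {
    // Only an APPROVE verdict reaches Stripe.
    const refund = await stripe.createRefund(req);
    return { status: "executed", refundId: refund.id };
  }
  if (decision.verdict === "ESCALATE") {
    // Queued for human approval; nothing is executed yet.
    return { status: "pending_approval" };
  }
  return { status: "denied", reason: decision.reason ?? "unspecified" };
}
```

Injecting both clients as parameters keeps the wrap testable with fakes, and it keeps the direct Stripe call in exactly one place.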

Idempotency: get this one right

Stripe accepts an Idempotency-Key header on every write call to deduplicate retries. Axiru needs the same key on the decision call. Use one key for the (decision + Stripe execute) pair, not two separate keys, so a retried request results in one decision record and one Stripe call, not two.

The wrap function signature should accept an idempotencyKey parameter and refuse to run without one. Callers will pass the same key they would have passed to Stripe directly; the wrap fans it out to Axiru and to Stripe.

This single rule prevents the most common integration bug: a request fails mid-flight, the caller retries, and now there are two decisions and one or two refunds depending on where the original failed. With shared keys, the second call is a no-op everywhere.
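The shared-key rule can be sketched as follows. KeyedDecisionClient and KeyedRefundClient are hypothetical shapes invented for illustration; the point is only that one key is required up front and passed unchanged to both calls.

```typescript
// Hypothetical client shapes; neither is the real Axiru or Stripe SDK surface.
type KeyedVerdict = "APPROVE" | "ESCALATE" | "DENY";

interface RefundInput {
  chargeId: string;
  amount: number; // minor units
}

interface KeyedDecisionClient {
  decide(input: RefundInput, idempotencyKey: string): Promise<KeyedVerdict>;
}

interface KeyedRefundClient {
  // A real implementation would forward the key as the Stripe
  // idempotencyKey request option on stripe.refunds.create.
  createRefund(input: RefundInput, idempotencyKey: string): Promise<{ id: string }>;
}

async function governedRefundWithKey(
  axiru: KeyedDecisionClient,
  stripe: KeyedRefundClient,
  input: RefundInput,
  idempotencyKey: string,
): Promise<{ verdict: KeyedVerdict; refundId: string | null }> {
  // Refuse to run without a key rather than silently generating one.
  if (idempotencyKey.trim() === "") {
    throw new Error("idempotencyKey is required");
  }
  // The same key goes to the decision call and to the Stripe call.
  const verdict = await axiru.decide(input, idempotencyKey);
  if (verdict !== "APPROVE") {
    return { verdict, refundId: null };
  }
  const refund = await stripe.createRefund(input, idempotencyKey);
  return { verdict, refundId: refund.id };
}
```

Generating the key inside the wrap would defeat the purpose: a retried request would get a fresh key and look like a new refund to both systems.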

Handling ESCALATE correctly

APPROVE and DENY are easy: APPROVE means execute the Stripe call, DENY means return a structured error to the upstream caller. ESCALATE is the interesting one. The decision is pending until a human approves or denies in the Axiru UI (or Slack, or wherever your approvers live).

Your service has two choices for ESCALATE: (a) return a 202 Accepted-style response to the caller and let your downstream system poll or subscribe to the eventual outcome via webhook, or (b) block the synchronous request and wait for the verdict. Choice (a) scales better and is the recommended path for any non-trivial volume.

Axiru emits a webhook (decision.resolved) when the human approves or denies. Your service subscribes, looks up the original idempotency key, and either executes the Stripe call (on approve) or notifies the original caller (on deny). Make the webhook handler idempotent, so that replaying a decision.resolved event breaks nothing.
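One way to structure that handler is sketched below. The event payload shape (idempotencyKey, outcome) and the in-memory Map are assumptions made for illustration; the real decision.resolved payload and your pending-request store will differ.

```typescript
// Assumed event shape; the actual decision.resolved payload may differ.
interface DecisionResolvedEvent {
  type: "decision.resolved";
  idempotencyKey: string;
  outcome: "approved" | "denied";
}

interface PendingRefund {
  chargeId: string;
  amount: number;
  state: "pending" | "executed" | "denied";
}

interface EscalationDeps {
  // Stand-in for a durable table keyed by idempotency key.
  pending: Map<string, PendingRefund>;
  executeRefund(refund: PendingRefund, idempotencyKey: string): Promise<void>;
  notifyDenied(refund: PendingRefund, idempotencyKey: string): Promise<void>;
}

async function handleDecisionResolved(
  deps: EscalationDeps,
  event: DecisionResolvedEvent,
): Promise<"applied" | "ignored"> {
  const record = deps.pending.get(event.idempotencyKey);
  // Unknown key or already-resolved record: a replay, so do nothing.
  if (record === undefined || record.state !== "pending") {
    return "ignored";
  }
  if (event.outcome === "approved") {
    await deps.executeRefund(record, event.idempotencyKey);
    record.state = "executed";
  } else {
    await deps.notifyDenied(record, event.idempotencyKey);
    record.state = "denied";
  }
  return "applied";
}
```

The state check handles sequential replays. If two deliveries race past it, reusing the original idempotency key on the Stripe call should still collapse them into a single refund.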

Observability hooks

Four metrics worth exporting from the wrap: decision_latency_ms (the time from wrap entry to verdict), decision_outcome_total{verdict=approve|escalate|deny} (counter), decision_axiru_error_total{type=timeout|5xx|network} (counter), and end_to_end_refund_latency_ms (decision + Stripe round-trip).

Trace context: pass the OpenTelemetry trace context (trace ID and span ID) into the wrap and propagate it to both Axiru and Stripe. The single biggest win from this is debugging 'why was this refund slow' across three systems without grepping logs.

Logging: structured JSON, one line per decision call, including idempotency_key, verdict, policy_version, and (on ESCALATE) approver_id once resolved. Do not log the full request payload; reason codes and amounts are enough.
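The metric names and log fields above could be wired up roughly like this. MetricsSink is a hypothetical adapter over whatever metrics library the service already uses, and reason_code, amount, and decision_latency_ms as log fields are assumptions layered on top of the fields listed above.

```typescript
// MetricsSink is a hypothetical adapter, not a real library interface.
interface MetricsSink {
  observe(name: string, value: number): void;
  increment(name: string, labels: Record<string, string>): void;
}

interface WrapOutcome {
  verdict?: "approve" | "escalate" | "deny"; // absent if the decision call failed
  axiruError?: "timeout" | "5xx" | "network";
  decisionMs: number;
  stripeMs?: number; // set only when the Stripe call actually ran
}

function recordWrapMetrics(sink: MetricsSink, o: WrapOutcome): void {
  sink.observe("decision_latency_ms", o.decisionMs);
  if (o.verdict !== undefined) {
    sink.increment("decision_outcome_total", { verdict: o.verdict });
  }
  if (o.axiruError !== undefined) {
    sink.increment("decision_axiru_error_total", { type: o.axiruError });
  }
  if (o.stripeMs !== undefined) {
    sink.observe("end_to_end_refund_latency_ms", o.decisionMs + o.stripeMs);
  }
}

interface DecisionLogFields {
  idempotencyKey: string;
  verdict: "APPROVE" | "ESCALATE" | "DENY";
  policyVersion: string;
  amount: number;
  latencyMs: number;
  reasonCode?: string;
  approverId?: string; // filled in once an ESCALATE is resolved
}

// Builds one JSON line per decision call. The request payload is never
// passed in, so it cannot leak into the log.
function decisionLogLine(f: DecisionLogFields): string {
  const entry: Record<string, string | number> = {
    event: "axiru_decision",
    idempotency_key: f.idempotencyKey,
    verdict: f.verdict,
    policy_version: f.policyVersion,
    amount: f.amount,
    decision_latency_ms: f.latencyMs,
  };
  if (f.reasonCode !== undefined) entry["reason_code"] = f.reasonCode;
  if (f.approverId !== undefined) entry["approver_id"] = f.approverId;
  return JSON.stringify(entry);
}
```

Keeping the log builder's input type narrow is a cheap way to enforce the "no full payload" rule at compile time instead of in code review.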

Failure modes and graceful degradation

Axiru decision call times out: timeout budget should be tight (250ms is a sane default). On timeout, the wrap fails closed (DENY with a 'governance unavailable' reason) on Live Launch and Control tiers; on Scale, configurable fail-open is available for specific surfaces that the team has decided are lower-risk than blocking. Default is fail closed.

Axiru returns 5xx: same path as timeout. Do not retry inline; the upstream caller can retry with the same idempotency key.

Stripe call fails after APPROVE: Axiru has already recorded the decision. Your service emits the Stripe error back to the caller and the decision record stays as 'approved, execution failed'. Reconciliation is straightforward: scan for decisions in that state and either retry or mark as not-executed.

Network partition between your service and Axiru: the same as a timeout. Fail closed by default. Use the same idempotency-key path on retry.
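The timeout, 5xx, and partition cases all collapse into one code path, sketched below under some assumptions: the decide callback stands in for the real Axiru call, the governance_unavailable reason string is illustrative, and the failOpen flag models the configurable behavior described for the Scale tier.

```typescript
interface TimedDecision {
  verdict: "APPROVE" | "ESCALATE" | "DENY";
  reason?: string;
}

// Illustrative reason code; use whatever your callers already understand.
const GOVERNANCE_UNAVAILABLE = "governance_unavailable";

async function decideWithTimeout(
  decide: () => Promise<TimedDecision>, // stand-in for the Axiru decision call
  timeoutMs: number = 250,
  failOpen: boolean = false, // fail closed unless explicitly configured otherwise
): Promise<TimedDecision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error("axiru decision timed out")), timeoutMs);
  });
  try {
    // Whichever settles first wins: the decision or the timeout.
    return await Promise.race([decide(), timeout]);
  } catch {
    // Timeout, 5xx, and network errors all land here. No inline retry;
    // the upstream caller retries with the same idempotency key.
    return failOpen
      ? { verdict: "APPROVE", reason: GOVERNANCE_UNAVAILABLE }
      : { verdict: "DENY", reason: GOVERNANCE_UNAVAILABLE };
  } finally {
    // The executor above runs synchronously, so the timer is always set here.
    clearTimeout(timer);
  }
}
```

Promise.race only stops waiting; it does not cancel the underlying request. A production version would likely also abort the in-flight HTTP call, for example with an AbortController passed to the HTTP client.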

Rollout plan inside your service

Stage 1: deploy the wrap module in shadow mode. The wrap calls Axiru with the request, logs the verdict, then ignores it and calls Stripe directly the same way the code did before. Compare logs over 5 to 10 days; any 'would have been DENIED' calls are reviewed manually before moving on.

Stage 2: enable enforcement on one low-volume surface (often Connect application-fee refunds, because they are infrequent and the policy is simple). Verify the production path for APPROVE, ESCALATE, and DENY with synthetic and real traffic.

Stage 3: enable enforcement on the high-volume surface (usually charge refunds) behind a feature flag. Ramp from 10% to 100% over 3 to 5 days, watching the four metrics above.

Stage 4: enable enforcement on the remaining surfaces (payouts, transfers, dispute responses) one at a time. Each surface is its own ramp; do not bundle.
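For the percentage ramp in Stage 3, one common approach is deterministic bucketing, so a given request lands on the same side of the flag every time it is retried. The sketch below is an assumption about how a team might implement it, not something Axiru prescribes; a feature-flag service would normally do the equivalent for you.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units: small, dependency-free,
// and stable across processes. Adequate for bucketing, not for security.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Returns true when the request falls inside the enforced share of traffic.
// stableKey should be something that does not change across retries,
// for example the idempotency key.
function inEnforcementRamp(stableKey: string, rampPercent: number): boolean {
  const clamped = Math.min(100, Math.max(0, rampPercent));
  return fnv1a(stableKey) % 100 < clamped;
}
```

Because the bucket depends only on the key, raising the percentage from 10 to 50 only adds requests to the enforced set; nothing that was enforced at 10% drops back out.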

Rollback

The wrap module reads an environment variable (AXIRU_ENFORCEMENT_MODE) at request time. Values: off (call Stripe directly, no decision call), shadow (call Axiru, log verdict, ignore, call Stripe), enforce (full policy path). Toggling from enforce back to shadow or off is one environment-variable change with no code deploy, though on most platforms the new value only takes effect once the process restarts or reloads its configuration.
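The three modes can be sketched as a small dispatcher. The helper names, the explicit fallback parameter, and the callback signatures are all assumptions for illustration; the env object is passed in (for example process.env) so the value is looked up on each request.

```typescript
type EnforcementMode = "off" | "shadow" | "enforce";
type ModeVerdict = "APPROVE" | "ESCALATE" | "DENY";

// Parses AXIRU_ENFORCEMENT_MODE. Unknown or missing values use the
// caller-supplied fallback, so the default is an explicit choice.
function readEnforcementMode(
  env: Record<string, string | undefined>,
  fallback: EnforcementMode,
): EnforcementMode {
  const raw = (env["AXIRU_ENFORCEMENT_MODE"] ?? "").trim().toLowerCase();
  if (raw === "off" || raw === "shadow" || raw === "enforce") {
    return raw;
  }
  return fallback;
}

type ModeResult<T> =
  | { executed: true; result: T }
  | { executed: false; verdict: ModeVerdict };

async function runWithMode<T>(
  mode: EnforcementMode,
  decide: () => Promise<ModeVerdict>, // stand-in for the Axiru decision call
  execute: () => Promise<T>, // stand-in for the Stripe call
  logShadowVerdict: (verdict: ModeVerdict | "error") => void,
): Promise<ModeResult<T>> {
  if (mode === "shadow") {
    try {
      logShadowVerdict(await decide());
    } catch {
      // Shadow mode must never block or fail the Stripe call.
      logShadowVerdict("error");
    }
  }
  if (mode === "enforce") {
    // Errors from decide() propagate; handle them with your fail-closed path.
    const verdict = await decide();
    if (verdict !== "APPROVE") {
      return { executed: false, verdict };
    }
  }
  // off, shadow, and an enforced APPROVE all end up here.
  return { executed: true, result: await execute() };
}
```

In off mode the decision callback is never invoked, which is what makes it a true kill switch: an Axiru outage cannot affect the request at all.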

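This is the rollback story for an incident: flip the kill switch to off, investigate, fix policy or fix code, restore to shadow first, then to enforce. The decision ledger holds every decision the system made in enforce mode or would have made in shadow mode, so nothing recorded is lost; because off skips the decision call entirely, roll back to shadow instead if you need would-have-been verdicts for the incident window itself.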

Verify the rollback path before Stage 3. The first time you test a rollback should not be the first time you need one.

What integration does not require

No changes to your Stripe schema. No Stripe webhook subscriber changes (Axiru subscribes to Stripe webhooks independently; you do not need to fan webhooks to Axiru). No changes to Clerk, your auth system, or your database schema on the application side. No Prisma migrations.

If a vendor pitch tells you a refund-governance integration needs schema changes on either Stripe or your service's primary tables, ask why. The integration shape above is intentionally light because the policy layer is decoupled from the data model; that decoupling is the design intent and it is also the reason rollback is cheap.