Case Study · 005Platform rescue · handoff complete

The CTO left. Three AI agents were down. We brought them back.

Client: Arkeo AI
Sector: AI SaaS · Sales tech
Engagement: Platform rescue
Duration: Q? 2024 — handoff completeTBD
Team size: TBD StreaversTBD
Agents restored: 3 of 3 core agents

3Core AI agents diagnosed and restored

3Months from broken to stable, documented

0Documentation handed over by the prior team

TBDUptime / onboarding-completion gainTBD

01An agentic platform without the people who built it.

The product was AI agents. Three of them stopped working at once.

Product: Agentic platform for SaaS sales
Core agents: ICP · Deal Notes · Pipeline
Stack at handover: Go · Python · TypeScript · Vue
Docs at handover: Effectively none
Engagement model: Fixed-fee · milestone-based

Arkeo AI is an agentic platform that turns SaaS sales workflows into proactive, automated motion. Its product is built on AI agents — Ideal Customer Profile, Deal Notes, and Pipeline Analysis — that surface the right next move for revenue teams without a human in the middle.

When Arkeo's CTO and backend engineer departed within a short window, those three agents went dark, the frontend started failing in onboarding, and the people who understood how any of it worked were gone. There was almost no documentation to work from.

Arkeo's choice was narrow: find a partner who could walk into an unfamiliar, undocumented codebase and bring it back online — fast — or watch customer trust erode while the platform sat dark.

02The Challenge

Four problems that arrived together.

A rescue is rarely a single problem. Arkeo arrived with four — stacked, simultaneous, and already on fire. The order in which we tackled them defined whether the platform came back at all.

The people who understood the system were gone.

The CTO and the backend engineer who held most of the architectural context departed within a short window. There was no handover, no architecture document, no on-call runbook. The knowledge walked out the door, and the platform was already starting to misbehave.

Team Departure

Three production agents, all dark.

All three core agents — ICP, Deal Notes, Pipeline Analysis — were down in production. Backend idempotency and key-handling regressions on the agent side, plus a scheduler crash, meant customers who tried to onboard saw an empty product.

Broken Platform

Nothing written down, anywhere.

There was no architecture doc, no setup guide, no environment reference, no deployment runbook. Every fix had to begin with reverse-engineering the codebase from scratch — and the platform couldn't wait the months that would normally take.

Undocumented Codebase

The clock was running on customer trust.

Customers were already in onboarding when the agents went down. Every additional day the platform stayed broken was a day of trust eroding for a team that had just lost its technical leadership. The window to stabilize without losing accounts was small.

Operational Stakes

TBD — founder quote requested. The strongest version speaks to the rescue itself: platform down, team gone, foundation rebuilt in three months.

TBDTBDFounder & CEO · Arkeo AI

03Selection

Why Arkeo handed us an unfamiliar codebase.

Arkeo wasn't hiring contractors. It was choosing whoever it trusted with the platform on the worst week of the company's life. Three reasons that choice landed with us.

Rescue of undocumented systems is what we do.

Arkeo didn't need staff augmentation. It needed someone who could absorb an unfamiliar, undocumented system and start making decisions on its behalf inside the first week. Streaver had taken over inherited systems before, on similar timelines, without a long ramp.

We've shipped agentic systems before.

The agents weren't broken in a generic way — they were broken in agent-specific ways: idempotency, key-handling, scheduler failures, retry semantics. Streaver's engineers had shipped agentic systems before and could diagnose those failure modes without a three-week learning curve.

Stabilize and document in under 90 days.

The fix wasn't only to restore the agents. It was to make Arkeo independent of the very situation that triggered the engagement — so the next departure wouldn't break the platform again. Three months: stabilize, document, hand back the keys.

04The Rescue

Three agents. From dark to operational.

The spine of the engagement was simple to state and hard to do: get each of the three core agents back online, with the underlying failure repaired at the source — not patched downstream.

From dark to operational, in twelve weeks

ICP Agent
Offline · backend idempotency failureDown · Day 0
Restored · stable · monitoredLive · Week 4
Deal Notes Agent
Offline · key-handling regressionDown · Day 0
Restored · stable · monitoredLive · Week 4
Pipeline Agent
Offline · scheduler crashDown · Day 0
Restored · stable · monitoredLive · Week 4

+ Image placeholder

Architecture diagram of the agent system, before vs. after the rescue — ICP, Deal Notes, and Pipeline plotted against the scheduler, idempotency layer, and key-handling boundary that took them down. Hand-drawn-feeling vector preferred.

05Decisions

Three calls that defined the rescue.

Restore before redesign.

It was tempting to look at the broken platform and start redesigning it. We resisted that. The first job was to get the three agents back online with as few changes as possible, then earn the right to refactor. Rescue first, opinions later.

ResultICP and Deal Notes agents restored by Week 2. Pipeline Agent restored by Week 4. Zero customer-facing redesign in the rescue phase.

Write the missing documentation.

Half of the engagement was writing the docs that should have existed. Setup guides, agent architecture, scheduler behavior, environment references, deployment steps, on-call runbooks. We wrote them as we worked, not after — so each fix landed with its own paper trail.

PrincipleIn a rescue engagement, the worst outcome is becoming the new single point of failure. We documented relentlessly to make sure that didn't happen.

Harden for independence.

Restoring the agents wasn't enough — they had to survive without us. We hardened the scheduler, added throttling and error recovery on the agent loops, improved logging, and tightened the deploy path so a single bad release couldn't take the platform down again.

06Honest

What's missing. What we won't fake.

Missing · founder quote

The single most valuable addition isn't a number.

We don't have a founder quote on the page, and it's the missing piece. A first-person line from Arkeo's leadership on what the rescue felt like — and what stabilized — would carry more weight than any restored-agent count. We've asked. It's held open here until they sign off.

Qualitative · pre-baseline

We can't quote a before/after that was never measured.

There was no uptime dashboard before we arrived. Saying 'uptime went from X to Y' would be a fabrication. Before/after onboarding completion is qualitative today — agents worked again, customers could onboard again — and we're honest with Arkeo about which metrics still need a baseline before we publish them.

Reconstructed · timeline

The week-by-week is reconstructed, not logged.

The three-month timeline came out of the rescue sequence, not a logged engagement plan. The milestones below (Week 0, 2, 4, 8, 12, 16) are reconstructed for narrative clarity — directionally accurate, not commit-log precise. We'd rather flag that than dress it up.

07Outcomes

From on-fire to ready to scale.

Each outcome below is paired with the baseline we inherited. Where no baseline existed before the engagement, we say so — rather than invent a number for the after.

All three dark3 of 3 restored

Three core agents back online

ICP, Deal Notes, and Pipeline Analysis agents diagnosed, repaired, and back in production — including the backend idempotency, key-handling, and scheduler issues that were the root cause.

Unstable · unmonitored · undocumentedStable · monitored · documented

Platform stabilized end-to-end

Frontend onboarding bugs resolved, scheduler hardened, retry semantics fixed. The platform now ships changes without one bad release taking everything down.

Effectively no docsDocumented · re-onboardable

Foundation documented

Setup, architecture, scheduler, environment, deployment, and on-call docs written as the rescue happened. The platform is no longer one departure away from disaster.

Single-deploy fragilityScheduler · throttling · recovery · logging

Infrastructure hardened

Hardened the agent loops and the deploy path so a single regression can no longer dark the platform. Independence-by-design, not by hope.

Onboarding brokenCustomers onboarding again

Customer trust held

Customers in the middle of onboarding when the agents went dark were unblocked by Week 4. Trust held through the rescue window — no accounts lost to the team-departure event.

Founder operating without a tech teamFounder unblocked · hiring

Founder unblocked to hire

By handoff, Arkeo's CEO had a stable platform, the documentation to onboard new engineers against, and the room to hire deliberately — not in panic.

08The Team

A small team. An unfamiliar codebase. One on-call.

A rescue is a continuity problem first and an engineering problem second. The people below walked into a codebase they had never seen and started making decisions in the first week — by handoff, they had written down enough that the next engineer wouldn't have to.

Fede

Fractional CTO

After the CTO left, I reverse-engineered the architecture in weeks to find core issues and stabilize it, then explored improvements. Product ended in a product-to-service pivot.

How the engagement was structured

Cadence

Daily standups during the rescue window. Weekly review with Arkeo's CEO. On-call held by Streaver through handoff. Cadence relaxed once stability was confirmed.

Communication & IP

Shared Slack channel, joint repo access from day one, and every document landed in Arkeo's workspace so nothing depended on us after handoff.

Pricing

Fixed-fee, milestone-based. Three checkpoints: agents restored, documentation complete, infrastructure hardened. Renewal optional — the engagement was scoped to end.

Trust model

Full IP transfer at handoff. Every credential, every secret, every doc lives with Arkeo. We exit clean by design — that's what makes us a rescue partner instead of a custodian.

Three months, traced

WEEK 00Engagement signed · rescue beginsTook ownership of the codebase, the agent stack, and the on-call window. First job: catalog what's actually broken vs. what only looks broken.

WEEK 02Two of three agents back onlineICP and Deal Notes agents diagnosed and restored. Backend idempotency and key-handling fixed at the source, not patched downstream.

WEEK 04Three of three agents back onlinePipeline Agent restored after fixing the scheduler crash. All three core agents in production again. Onboarding starts working for new customers.

WEEK 08Documentation completeSetup guides, architecture overview, scheduler behavior, environment references, deployment steps, on-call runbooks — all live in Arkeo's workspace.

WEEK 12Infrastructure hardened · handoffThrottling, error recovery, deeper logging, and a safer deploy path landed. The platform survives a bad release without going dark.

WEEK 16Founder shipping independentlyArkeo's CEO is shipping against the new foundation, interviewing engineers against the new docs, and no longer dependent on us for day-to-day operations.

09Stack

Agentic by design. Documented by handoff.

Arkeo's stack at handover was multi-language and multi-runtime by necessity — agent code, customer-facing surfaces, and the data plane each have different needs. We didn't consolidate it. We documented it.

Languages

Gobackend services
Pythonagent loops & ML glue
TypeScriptstrict, across the frontend
Vue.jscustomer-facing surfaces

Data & Infrastructure

AWSprimary cloud
Airbytedata movement & ingest
Postgresprimary datastore

Tooling

Sentryerror tracking & alerts
GitHub ActionsCI & deploy

10What's Next

The foundation was the point.

The rescue is finished. The interesting work — agent expansion, frontend modernization, managed scheduling — is what the rescue unlocked. Three directions Arkeo can now move in without re-running this engagement.

Agent expansion · three to twenty-plus.

Arkeo's product roadmap calls for scaling the agent surface from three to twenty-plus. The hardened scheduler, retry semantics, and documentation make that expansion possible without re-running this rescue. The work we did was the foundation for that scale, not the destination.

Next.js / Vercel migration for the frontend.

The Vue.js frontend is a candidate for a Next.js + Vercel migration once the agent expansion settles. Faster shipping, edge rendering, and a hiring pool that matches the rest of the stack. Modernization where it earns its keep, not for its own sake.

Managed scheduling via Trigger.dev.

The hand-rolled scheduler that broke the Pipeline Agent is a candidate for replacement with a managed scheduler like Trigger.dev. Less code to own, less surface to break, and a clear separation between agent logic and orchestration.

Inherited a broken platform after key engineers left?

We bring the lights back on.

Streaver walks into unfamiliar, undocumented codebases after team departures and stabilizes them — then writes down what should have been written down the first time. Three months, fixed-fee, clean handoff.

Book a rescue call See more case studies

11Continue reading

Featured

Live

AIBuildRegulatedPublic Affairs · Switzerland

DELOS AG

First paying enterprise customers in sixteen weeks

91% precision · 4-agent systemRead the case study