The work
We build agentic systems for serious workflows — billing, intake, ops. You will own the agent layer: tool design, evals, prompt regression, latency budgets, fallback flows.
You should have already deployed something agentic to real users (not a demo notebook).
What we're looking for
- Production experience with Anthropic / OpenAI tool-use
- Strong evaluation discipline (golden sets, regression suites) - Comfortable with retries, idempotency, and partial-failure handling - TypeScript + Python both fluent - Bonus: published an eval framework or agent post-mortem