Engineering

Founding AI Engineer, Recent Graduate

Early-career founding AI role for a recent graduate or engineer with roughly two years of experience who has already shipped AI systems to real users and wants to grow into senior ownership across agents, retrieval, evals, observability, and document or voice pipelines.

San Francisco, CA (Onsite)
Full-time

About Matterhaul

Matterhaul is building the AI-native operating system for the physical goods supply chain.

Distributors and manufacturers run on disconnected systems, manual re-entry, and tribal knowledge that never gets captured. The software that was supposed to fix this never did. We're changing that.

Matterhaul sits above the systems these businesses already run — unifying their data, capturing the operational context legacy software misses, and deploying AI agents across quoting, order entry, procurement, dispatch, and customer updates. No rip and replace. Teams go live fast, and Matterhaul expands until it becomes the system the business runs on.

That's the wedge. The vision is bigger: a purpose-built, AI-native platform that doesn't just automate what ERPs do today — it does what they were never capable of.

We're a small team with deep roots in this space. Our founders grew up in the trades and spent careers building products for the physical world at Stripe, Verkada, and Cisco Meraki. We're based in San Francisco's SoMa/Transbay neighborhood, in-office four days a week, and we spend real time with the distributors and operators we build for.

We move fast, ship often, and build for the people who actually do the work.

Why now

Three things are true at once and rarely line up:

  • AI has stopped being an experiment. A year ago this was a pilot conversation. Today, executives are being asked why they aren't deploying AI in operations.
  • Legacy ERPs are no longer defensible. The customers know it. Their leadership teams are openly looking for a way out.
  • The supply-chain trauma of the last five years is fresh. The companies that move physical goods lived through it in a way software companies didn't. They want visibility and automation, and they aren't going back.

The window is open. Windows like this don't stay open.

Why this role exists

The AI is the product. Quoting, order intake, procurement triage, dispatch — every workflow we ship leans on a model doing real work, against messy distributor data, with real money on the line.

The AI engineering surface area is bigger than a team of senior and staff engineers can chew through: new evals to write for every agent, retrieval to tune per customer, document pipelines to harden, prompt regressions to chase down, voice flows to iterate on, model routing to refine. We want a strong early-career engineer who has already shipped AI to production during school or an early job, and is hungry to operate at the frontier alongside senior people who've been doing this for years.

You will not be making coffee. You will own real surfaces. You will also be mentored deliberately — code review, design review, eval review, on-call shadowing — because we want this role to compound into a senior AI engineer here in 2–3 years.

What you'll own

Year one, concretely:

  • Agent surfaces — Pick up ownership of one or two existing agents (e.g. quote intake, supplier matching, call summarization) end-to-end: prompts, tools, evals, observability, on-call. Ship improvements weekly.
  • Evals — Write the golden sets, the regression suites, the LLM-as-judge rubrics. This is where you'll have outsized impact early; the team that has good evals wins.
  • Retrieval tuning — Chunking, embeddings, reranking, query rewriting on real customer data. You'll see the difference good retrieval makes the first week.
  • Prompt and tool iteration — Tight loops. Many small experiments. You'll learn fast which changes move metrics and which just feel better.
  • Document & voice pipelines — Help own the extraction pipelines that turn supplier PDFs, scanned quotes, and call transcripts into structured facts the agents act on.
  • Observability — Add the traces, dashboards, and alerts that catch silent regressions. If a model degraded yesterday, you should be the one who notices today.
  • Research triage — When a new model or technique drops, you'll be one of the people who runs it against our evals and writes up whether it's worth adopting.

You will write a lot of code, run a lot of experiments, and learn the difference between "the eval went up" and "the product got better" the same way the rest of us did: by shipping it.

What we're looking for

Must have:

  • Bachelor's or Master's from a top engineering program (CMU, MIT, Stanford, Berkeley, Waterloo, Georgia Tech, UIUC, UW, UMich, UT Austin, Cornell, Princeton, Caltech, Harvey Mudd, or international equivalent). Strong CS fundamentals.
  • ~2 years of post-grad engineering experience, or equivalent serious work during school (research lab, AI-heavy internship, founded something that shipped).
  • 12+ months building and running AI systems in production — even if "production" was a research deployment, an internship project that real users hit, or a startup you built. We want to hear about something real users used, not a class project.
  • Solid software engineering chops. You write tests. You read other people's code. You can debug a failing service, not just a failing prompt.
  • Comfort with the modern LLM stack: at least one of the Anthropic, OpenAI, or Google APIs in depth, plus tool use, structured outputs, and streaming. You know what a token costs.
  • TypeScript or Python proficiency. Our AI surface is largely TypeScript; Python is welcome for evals and experimentation. You'll pick up the other one here.
  • You've built something with retrieval. Even if it was naive RAG. You've felt why naive RAG breaks.
  • You've built something with an agent loop. Tool calls, multi-step plans, something that wasn't a single-turn chatbot.
  • Hunger to operate. When the model misbehaves at 9pm, you want to know why, not file a ticket.

Nice to have:

  • Open-source contributions to AI/ML repos (LangChain, LlamaIndex, vLLM, transformers, dspy, etc.) or your own published projects.
  • Eval frameworks you've built or used seriously (not just "I ran some prompts and eyeballed the outputs").
  • Research experience: a published paper, a strong undergrad/grad thesis, or a research-lab role. We don't require it but we read it carefully.
  • Fine-tuning, LoRA, or distillation experience, even at small scale.
  • Hackathon wins, AI side-projects with real users, AI-heavy internships at a company that shipped AI to customers.
  • Familiarity with structured output / function calling at depth, prompt caching, speculative decoding, or other techniques beyond the basics.

What we're not looking for:

  • Five years of "ML engineer" titles where the work was tuning XGBoost on tabular data. This role is about LLM-driven systems in production.
  • Pure prompt-engineering portfolios with no software engineering underneath.

How we work

  • Small founding team (under ten people) building the system that distributor sales, procurement, and dispatch teams will run on. You'll be the most junior person on a team of senior engineers, by design: the ramp is steep and the mentorship is real.
  • San Francisco — 4 days a week in the office. This matters more for early-career engineers, not less. Whiteboards, pair-debugging, and the kind of feedback that's hard to type happen in person.
  • We ship to production frequently and trust each other to do it. You'll be on-call for the agents you own, with backup, within a few months.
  • We write specs (/specs) and architecture docs (AGENTS.md per directory) before big changes. We expect the same of you.

Compensation

  • Base: $150,000 – $185,000, depending on experience.
  • Equity: 0.1% – 0.3%. Early enough to matter.
  • Health / dental / vision; 401(k); commuter benefits.
  • Hardware + AI coding / API budget and a real office in SF.

Apply

Email hiring@matterhaul.com with a short note on an AI system you've shipped that real users (not just classmates) used, and one thing about it that surprised you when it hit production. A link to code, a demo, or a writeup beats a resume. Send both if you've got them.

Ready to apply?

We'd love to hear from you. Send us your resume, LinkedIn, and a note to hiring@matterhaul.com.