ENGINEERING
5 Years, 7 Engineers, No Cloud: How We Build Cancer Diagnostics on Tinyboxes
Valar Labs builds AI that predicts how a patient's cancer will respond to treatment. Our diagnostics analyze routine pathology slides — the same tissue samples already collected during standard care — and generate predictions validated across thousands of patients in peer-reviewed clinical studies. Physicians use our tests today to make real treatment decisions for bladder, prostate, and pancreatic cancer.
The engineering team behind all of this is seven people.
We don't run inference in the cloud. Our models run on local hardware in our own facility. We built our own whole-slide image viewer, our own lab operating system, our own clinical data interrogation system. We own the stack from the glass slide to the clinical report.
This isn't a manifesto about why everyone should work this way. It's a technical account of how we arrived here over five years and what the architecture actually looks like. But we think what's happening here is not unique to pathology. The gap between teams that think deeply about how they build and teams that don't is growing — by orders of magnitude — regardless of company size, stage, or headcount. The companies that look disproportionately small for what they ship are not getting lucky. They're building differently.
Why Local Compute
Building a diagnostic is not a single training run. It's hundreds of them. Between different data mixes, annotation strategy adjustments, architecture experiments, and multi-site validation, each iteration of a diagnostic might take 300-500 training runs before it's ready for clinical validation. Any individual run would cost upwards of ten thousand dollars in cloud compute — and that's after years of investment in optimizing our training pipeline and model architectures. Multiply by hundreds of runs per diagnostic, across multiple cancer types, and cloud economics don't break down at the margin. They break down fundamentally.
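The arithmetic is worth making explicit. Taking the figures above at face value (runs per diagnostic and per-run cost are the numbers in this post; everything else just follows from multiplication):

```python
# Back-of-envelope on the cloud numbers above. The run counts and
# per-run cost are the article's figures; the rest is multiplication.
runs_low, runs_high = 300, 500   # training runs per diagnostic iteration
cost_per_run = 10_000            # USD, the stated lower bound per run

per_diagnostic = (runs_low * cost_per_run, runs_high * cost_per_run)
print(per_diagnostic)  # → (3000000, 5000000): $3M-$5M per diagnostic
```

And that is for one diagnostic, before multiplying across cancer types — which is why the economics break fundamentally rather than at the margin.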
So we bought hardware. First a workstation we specced and built ourselves. Then Lambda Labs machines when we needed more capacity. And most recently, tinyboxes — which for pure price-to-performance on this kind of workload are hard to beat right now. The fleet isn't uniform and we're not religious about vendor — we buy whatever works best for the workload at the time.
The conventional wisdom in ML infrastructure is that data goes where the compute is. You upload your dataset to the cloud, spin up instances, and train. In computational pathology, this logic inverts. Whole-slide images are gigapixel scans, typically 1-3 GB each as compressed images — uncompressed, a single slide can exceed 50 GB. A training dataset for a single cancer type is thousands of slides.
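A quick sanity check on those sizes (the pixel dimensions below are illustrative of a typical gigapixel scan, not taken from our actual scanner output):

```python
# Back-of-envelope: why a single whole-slide image is so heavy.
# Dimensions are illustrative of a typical high-magnification scan.
width_px, height_px = 120_000, 80_000   # ~10 gigapixels
bytes_per_px = 3                        # uncompressed 8-bit RGB

uncompressed_gb = width_px * height_px * bytes_per_px / 1e9
print(f"{uncompressed_gb:.1f} GB uncompressed")  # → 28.8 GB

# At roughly 20:1 JPEG compression inside the slide file format,
# each slide lands in the 1-3 GB range cited above.
compressed_gb = uncompressed_gb / 20
print(f"~{compressed_gb:.2f} GB compressed")
```

Multiply by thousands of slides per cancer type and the dataset is tens of terabytes — not something you casually re-upload to a cloud region.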
When your data is this heavy, the compute must come to where the data is — not the other way around.
What we didn't build is the platform layer everyone assumes you need.
There's no MLOps stack. No experiment tracking service. No managed orchestration layer. We're a team of seven. There are no separate "researchers" and "engineers" — everyone understands the ML and the software, and anyone can work on most parts of the stack. You SSH into the cluster from your laptop, and you're training.
For larger inference jobs that need to fan out across the full fleet, we run a lightweight local Kubernetes setup. The orchestration code that manages it is a single Python CLI file. It assigns work to GPUs using weighted round-robin based on each machine's compute profile, creates pods, and tracks completion by checking which output files exist on disk. Job state is a directory on the filesystem. Pausing a job writes a file. Resuming reads the original work list, diffs against what's done, and relaunches the rest. Loss curves live in TensorBoard. When experiment data piles up, we clean it out.
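The core of that scheduler fits in a few dozen lines. Here is a minimal sketch of the weighted round-robin assignment and filesystem-based resume logic — machine names, weights, and paths are invented for illustration, not taken from our actual CLI:

```python
"""Sketch of filesystem-backed orchestration: weighted round-robin
assignment plus resume-by-diffing-output-files. All names invented."""
import itertools
import tempfile
from pathlib import Path

# Weighted round-robin: each machine appears in the rotation in
# proportion to its compute profile.
FLEET = {"tinybox-1": 3, "tinybox-2": 3, "lambda-1": 2, "workstation": 1}

def assignments(work_items):
    rotation = itertools.cycle(
        [name for name, weight in FLEET.items() for _ in range(weight)]
    )
    return [(item, next(rotation)) for item in work_items]

def remaining(work_items, out_dir: Path):
    # Completion tracking: a work item is done when its output file
    # exists on disk. Resuming a paused job is just this diff.
    return [w for w in work_items if not (out_dir / f"{w}.out").exists()]

work = [f"slide_{i:04d}" for i in range(8)]
out_dir = Path(tempfile.mkdtemp())
(out_dir / "slide_0002.out").touch()   # pretend one item already finished

todo = remaining(work, out_dir)        # 7 items left after the diff
for item, machine in assignments(todo):
    print(f"{item} -> {machine}")      # the real CLI creates a pod here
```

Because job state is just files, there is nothing to migrate, back up, or keep in sync — `ls` is the dashboard.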
If a machine goes down, someone walks over and reboots it. Everyone sits in the same room. There's no support ticket, no auto-scaling event, no waiting for cloud capacity to come back online. The failure mode is physical and the recovery is immediate.
This sounds reckless by the standards of modern ML infrastructure. It isn't. It is the result of thinking carefully about what actually slows down the cycle of building a diagnostic: not the tooling around the training, but the training itself — the quality of the data, the architecture decisions, the clinical validation design. We invested the time we would have spent on MLOps into the actual ML. Every hour not spent configuring a managed service is an hour spent improving a model that will inform real treatment decisions.
The result is a seven-person team that ships validated, peer-reviewed clinical AI diagnostics across multiple cancer types — and works harder than anybody to do it. Our competitors in this space employ engineering teams many times our size. We think this comes down to infrastructure philosophy as much as effort. When you own the hardware, own the data pipeline, strip away every layer of abstraction between the person with the idea and the GPU that runs it, and hire people who don't need to be told what to do next — the entire cycle compresses. An idea becomes a training run in minutes, not hours. A training run becomes a result the same day, not next week. And over hundreds of iterations, that compression is the difference between shipping a diagnostic and still building the platform to maybe someday develop one.
Why We Build Everything
The standard advice for startups is: don't build what you can buy. Focus on your core differentiation and outsource everything else. We tried. Early on, we bought a SaaS annotation platform for labeling pathology slides. It was expensive, slow, and produced outputs we couldn't trust. The tool treated annotation as a step in a pipeline — something that happens before the real work. We needed it to be part of the real work.
What we got right was how quickly we moved away from it.
In general, what matters in engineering is not the quality of the initial design but the speed of iteration. We've found this to be true at every level: in the models, in the infrastructure, and in the tools we build for ourselves. Our problem space is unique enough that almost nothing off the shelf fits without extensive modification, and maintaining a fork of someone else's software is worse than owning your own. So we build, ship, learn what's wrong, and rebuild — fast.
Tulkas, our whole-slide image viewer, started as a replacement for that SaaS annotation tool and evolved into something much more. A single pathology slide is a gigapixel image. Viewing it at interactive frame rates is hard. Overlaying hundreds of thousands of AI-generated cell detections and tissue segmentations at full resolution — with real-time editing — is harder. We built a WebGPU rendering engine in Rust compiled to WebAssembly, with a custom TIFF decoder for whole-slide formats, GPU-tessellated geometry rendering, and our own serialization formats optimized for the spatial data structures our models produce. A pathologist annotating training data and an engineer debugging model behavior use the same tool. This matters more than it sounds. When ML engineers are pulled away from the annotation process — when they don't see the data they're training on, rendered the way the model will see it — they stop questioning the data. That was an important failure mode for us early on.
Tulkas exists to keep the entire team close to the tissue.
Anta, our lab operating system, exists because running a CLIA-certified diagnostic laboratory involves a long tail of operational workflows that generic LIMS software handles in the most generic and expensive way possible. We started with very little automated. Now, the lab team handles all our orders with far fewer people than you'd expect. The system ingests faxes and classifies them with AI — segmenting multi-page documents into pathology reports, requisition forms, insurance facesheets, clinical notes. It extracts structured data from each: patient demographics, insurance details, diagnosis codes, specimen metadata.
It verifies insurance eligibility, submits claims, tracks denials, manages appeals, and handles the entire case lifecycle from order receipt to report delivery. But the most important thing a system like this can do well is know when to get a human involved. The automation handles the routine; when something is ambiguous, unusual, or high-stakes, the system flags it and gets out of the way. The goal is the same goal we have for engineering broadly: automate every routine task so that people spend their time only on real problems.
Aule, our clinical data system, is the newest. As we expand across cancer types, the volume of clinical data our team needs to interrogate grows faster than the team does. How many patients do we have for a given indication? What does the breakdown look like across subgroups? What do survival trends look like when we cut the data this way? Before Aule, every question like this meant a Slack message to someone who knows R, a wait, a result, a follow-up, another wait — the same back-and-forth repeating dozens of times a week across the organization. Aule is a thin wrapper around a coding model — running in a sandbox — with access to our de-identified clinical datasets and analysis scripts. Anyone on the team can ask a question in Slack and get a survival curve back in minutes. If Tulkas exists to keep the team close to the tissue, Aule exists to keep the team close to the data. It shipped a few weeks ago.
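The Aule loop can be sketched in a few lines: a model turns the question into analysis code, and a sandboxed interpreter runs it against the data. The model call below is a stub and every name is invented — this shows the shape of the system, not the system itself:

```python
"""Sketch of the Aule loop: question in, sandboxed analysis out.
The model call and dataset are stubs; no names here come from
the actual system."""
import subprocess
import sys
import tempfile
import textwrap

def generate_analysis_code(question: str) -> str:
    # Stand-in for the coding model. The real system would prompt
    # an LLM with the schema of the de-identified clinical data.
    return textwrap.dedent("""\
        counts = {"bladder": 412, "prostate": 389}   # toy numbers
        print(counts)
    """)

def run_sandboxed(code: str, timeout: int = 60) -> str:
    # Minimal isolation: a separate interpreter with a timeout.
    # A production sandbox would also restrict filesystem/network.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True,
        timeout=timeout,
    )
    return result.stdout.strip()

answer = run_sandboxed(generate_analysis_code(
    "How many patients do we have per indication?"
))
print(answer)  # in the real system, this would be posted back to Slack
```

The "thin wrapper" framing is the point: the hard parts (the datasets, the analysis scripts) already existed; Aule just removed the human dispatcher from the loop.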
The thread connecting these is not just that we enjoy building things — though we do. It's that in a company this small, every seam between systems is a tax someone pays every day. When you have seventy engineers, the cost of bridging three vendor tools with glue code is absorbed across the organization. When you have seven, that same integration work crowds out the mission. Owning the full stack eliminates that overhead. And because everyone works across the stack, improvements propagate naturally. A change to how Tulkas handles one slide format immediately informs how we build support for the next. When the same people build the viewer, the lab system, and the models, the system evolves as one.
We're aware this is an unusual set of choices. Most AI companies in our space buy a slide viewer, contract out lab operations software, and wouldn't think to build a clinical data interrogation system. We think that's part of why most of them have much larger teams and still ship less.
What Makes This Work
We didn't start here. Five years ago, this was one person and a laptop. The team grew slowly — one, then two, then four, then seven, where it's been for a while. Each system we built compounded on the ones before it. The annotation tool informed the training pipeline. The training pipeline shaped the inference infrastructure. The inference infrastructure determined what the lab system needed to handle. None of this was planned as an architecture. It accreted, one problem at a time, because the next bottleneck was always obvious and the team was small enough to just go fix it.
There's a pattern in how we make these decisions that's worth naming. When we hit a friction point — something that's slow, expensive, or fragile — we ask whether it's a problem we'll hit once or a problem we'll hit every week. If it's every week, we build. If it's once, we work around it. This filter is why we have a custom slide viewer but don't have a custom text editor. It's why we built a lab operating system but use off-the-shelf tools for accounting. The line isn't "build everything" — it's "build anything that sits in the critical path of shipping a diagnostic."
We think the default in our industry is backwards. Most companies start by building the infrastructure they think they'll need, hire the team they think that infrastructure requires, and then try to ship a product through all of it. We started with the product — the clinical question, the patient outcome — and built only what stood between us and answering it.
The infrastructure is a byproduct of shipping, not a prerequisite.
This is not a pitch for minimalism. Our systems are not minimal — Tulkas is a WebGPU rendering engine, Anta manages thousands of cases, the training infrastructure processes terabytes of imaging data. But each of them exists because a specific problem demanded it, not because a roadmap said so. When you let the problem pull the engineering forward instead of pushing the engineering ahead of the problem, you end up with a small team that builds a surprising amount — because nothing they built was wasted.
We're seven engineers. We build cancer diagnostics validated in peer-reviewed clinical studies across multiple cancer types. We built every major system in the stack. We don't think this is the only way to do it. But we think more teams could work this way if they were willing to own the hardware, own the tools, and trust that a small group of people who understand the full problem can move faster than a large group that's been told to stay in their lane.