Agentic AI for Lab Automation: Why a Lab Instrument Is Not Just Another Tool

July 4, 2026 · 11 min read · Iacob Marian

Agentic AI for lab automation fails at a boundary that ordinary tool-using agents never reach: the agent-to-instrument edge, where a single action moves real liquid, consumes a real sample, or heats a real reactor. Protocols like the Model Context Protocol (MCP) standardize how an agent calls a software tool, but a lab instrument is a stateful, safety-critical, physically embodied resource - not a stateless function you can retry for free. The teams that make agents work on the bench are the ones that treat that edge as its own engineering problem.

The money and the narrative in autonomous science are already here. Lila Sciences raised a combined $550M to build "AI Science Factories," Periodic Labs took a $300M seed, and MIT Technology Review's December 2025 assessment was blunt: AI-driven discovery must move into the real world. Moving into the real world means one thing technically - letting agents operate physical instruments reliably. This post is about the layer where that actually happens.

Key Takeaways

  • Agentic AI for lab automation has three edges - agent-to-tool, agent-to-agent, and agent-to-instrument. The first two have protocols. The third is where most projects quietly fail.
  • A lab instrument is not a tool. It is stateful, exclusively locked, safety-critical, and physically irreversible. Modeling it as a stateless function is the root cause of unsafe or brittle agent behavior.
  • The reliability problem is a software problem, not a model problem. A bigger model does not fix an agent that can drive hardware directly from free-text output.
  • MCP and SiLA 2 are complementary, not competing - MCP exposes capability to the agent, SiLA 2 defines deterministic instrument control. Neither, on its own, closes the safety gap.
  • The pattern that works: propose, validate, then actuate. Natural language never drives hardware directly; every physical action passes through a validated, structured, reviewable call.
  • Evaluation comes before autonomy. You earn each higher level of autonomy by measuring the agent against the physical world, not by trusting the demo.

The Three Edges of an Agentic Lab System

An agent that does real work touches three different kinds of counterpart, and each is a distinct engineering edge. A 2025 survey of agent interoperability protocols frames the ecosystem this way, and a June 2026 paper, LAP: An Agent-to-Instrument Protocol for Autonomous Science, sharpens it into the distinction that matters for the lab:

  • The agent-to-tool edge. The agent calls a software function - a database query, a calculation, a web search. MCP standardizes this edge. Tool calls are stateless and cheap to retry.
  • The agent-to-agent edge. The agent delegates to another agent. Protocols like Google's A2A standardize this edge.
  • The agent-to-instrument edge. The agent commands a physical machine. As the LAP authors put it, MCP standardizes the agent-to-tool edge and A2A the agent-to-agent edge, "but neither models the agent-to-instrument edge, where operations are stateful, safety-critical, exclusively locked, and physically embodied."

[Figure: the three edges of an agentic lab system - agent-to-tool via MCP, agent-to-agent via A2A, and the unsolved agent-to-instrument edge to a physical instrument]

The diagram above shows the same reasoning agent facing all three edges at once. The first two are well-trodden - the archetypal agent ChemCrow wired GPT-4 to eighteen chemistry tools back in 2024, and the field has since shifted, as one Frontiers review describes, "from single-shot prompting toward agentic chains that decompose user goals into sequences of tool calls." The third edge is where the demos stop and the engineering starts, because every team that reaches it rebuilds the link between the reasoning agent and the physical instrument from scratch.

Why a Lab Instrument Is Not a Tool

The difference is not cosmetic. Four properties separate an instrument from a tool call, and each one breaks an assumption that agent frameworks are built on.

It is stateful. A liquid handler has tips loaded or not, a deck configured or not, a reservation held or not. The same command succeeds or destroys a run depending on physical state the model cannot see in its context window.

It is exclusive. Two agents cannot pipette into the same well at the same time. Instruments need reservation and locking - a concept absent from stateless tool protocols, where parallel calls are free.

It is safety-critical. A wrong temperature, a wrong aspiration volume, a collision - these damage hardware, ruin samples, or endanger an operator. There is no free retry.

It is physically irreversible. You cannot un-consume a sample or un-mix a reagent. Where a failed tool call costs a few tokens, a failed instrument call can cost a day of wet-lab work and an irreplaceable specimen.

Any one of these would justify a dedicated control layer. Together, they mean an agent that treats an instrument as "just another tool" is not simplified - it is unsafe. This is the same lesson that comes from years of building instrument control for clinical-grade diagnostic platforms: the hard part was never the happy path, it was the state, the interlocks, and the failure modes.
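To make the first two properties concrete, here is a minimal sketch of an instrument modeled as a stateful, exclusively held resource. It is a toy, not a real driver: the `LiquidHandler` class, its fields, and `InstrumentError` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class InstrumentError(Exception):
    """Raised when a command is rejected because of the instrument's current state."""


@dataclass
class LiquidHandler:
    """Toy model of a liquid handler; every field is a hypothetical simplification."""
    tips_loaded: bool = False
    held_by: Optional[str] = None  # id of the agent holding the reservation, if any

    def reserve(self, agent_id: str) -> None:
        # Exclusive: a second agent cannot acquire an instrument that is already held.
        if self.held_by is not None and self.held_by != agent_id:
            raise InstrumentError(f"held by {self.held_by}")
        self.held_by = agent_id

    def release(self, agent_id: str) -> None:
        # Only the current holder can give the instrument back.
        if self.held_by == agent_id:
            self.held_by = None

    def load_tips(self, agent_id: str) -> None:
        self._require_holder(agent_id)
        self.tips_loaded = True

    def aspirate(self, agent_id: str, volume_ul: float) -> str:
        # Stateful: the same call is valid or invalid depending on physical state.
        self._require_holder(agent_id)
        if not self.tips_loaded:
            raise InstrumentError("no tips loaded")
        return f"aspirated {volume_ul} uL"

    def _require_holder(self, agent_id: str) -> None:
        if self.held_by != agent_id:
            raise InstrumentError("instrument not reserved by this agent")
```

Calling `aspirate` before `load_tips` raises, and calling it afterwards succeeds, even though the arguments are identical. That state lives in the instrument, not in the model's context window, which is why a stateless tool abstraction cannot represent it.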

What Actually Breaks: Free-Text Output Meets Real Hardware

The failure mode is specific. In a naive agent, the model's output is trusted to drive the next action. That is acceptable when the action is a database query. It is not acceptable when the action moves a robotic arm.

Standards exist for the deterministic half of the problem. SiLA 2, the leading open instrument-connectivity standard, is "based on open, well-established communication protocols and defines a thin domain-specific layer on top of these". Concretely, it builds on gRPC and Protocol Buffers and adds a Feature Definition Language that describes each instrument's typed capabilities. It excels at deterministic orchestration: a workflow engine can enumerate every command an instrument accepts.

But SiLA 2, like OPC UA and SCPI, was designed for a deterministic client, not a probabilistic agent. As the LAP analysis notes, its schemas "are static and defined at design time; there is no runtime capability negotiation, no mechanism for expressing physical safety limits or hazard classifications per capability." In other words, the standard tells an agent what an instrument can do, but says nothing about what is safe to do right now or how to negotiate that at runtime. That gap - between a machine-readable capability and a safe, situated action - is exactly where an agent needs a control layer, and exactly what a generic tool protocol does not provide.
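The difference between "can do" and "safe to do right now" can be sketched in a few lines. This is an illustration under stated assumptions, not SiLA 2's data model: `Capability`, `RuntimeContext`, and both check functions are hypothetical, and the temperature limits are made-up values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Capability:
    """Design-time description: what the instrument can do (hypothetical fields)."""
    name: str
    min_value: float
    max_value: float
    unit: str


@dataclass
class RuntimeContext:
    """Situated facts a static schema does not carry (hypothetical fields)."""
    sample_max_temp_c: float  # limit imposed by whatever is currently loaded
    lid_closed: bool


def allowed_by_schema(cap: Capability, value: float) -> bool:
    """The design-time question: is the value inside the declared range?"""
    return cap.min_value <= value <= cap.max_value


def safe_right_now(cap: Capability, value: float, ctx: RuntimeContext) -> Tuple[bool, str]:
    """The runtime question: is the value acceptable given the current physical state?"""
    if not allowed_by_schema(cap, value):
        return False, "outside design-time range"
    if not ctx.lid_closed:
        return False, "lid open"
    if value > ctx.sample_max_temp_c:
        return False, "exceeds limit for loaded sample"
    return True, "ok"


# Hypothetical example: a heater that accepts 4-110 degC, loaded with a sample
# that must stay at or below 42 degC.
set_temp = Capability("SetTemperature", 4.0, 110.0, "degC")
ctx = RuntimeContext(sample_max_temp_c=42.0, lid_closed=True)
```

With these values, a request for 95 degC passes `allowed_by_schema` but fails `safe_right_now`. The first check is what a static capability description supports; the second is the part a control layer has to supply.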

A Control Layer for Letting Agents Operate Instruments

You do not close this gap with a better prompt. You close it with an architecture that never lets natural language touch hardware directly. Five principles, learned the hard way, define what a control layer for the agent-to-instrument edge has to do. None of them are model-specific.

  1. Actuate only through validated, structured calls. Hardware is never driven from raw model output. The agent's free-text intent is resolved into a typed, schema-checked action, and only that validated action reaches the instrument. The LAP authors state the rule plainly: hardware is "only ever actuated through a validated structured call, never directly from model output."

  2. Propose, then confirm. A free-text goal becomes a structured, human-readable proposal before anything moves. For high-consequence steps, that proposal is a gate - a place for an operator to approve, not a decision buried inside a model's chain of thought. This is the practical form of human-in-the-loop for physical systems.

  3. Reserve and lock. Exclusive access is a first-class primitive. An agent acquires an instrument, holds it for the duration of a step, and releases it - so concurrency never turns into a collision.

  4. Return physically typed results. A measurement is not a string. It carries units, calibration context, and uncertainty, so the agent reasons about a real quantity and not a number that happens to look plausible.

  5. Evaluate before you trust. Every increase in autonomy is earned by measurement against the physical world - success rate, failure modes, recovery behavior - not by a convincing demo. A recent agent running a full loop over an experiment orchestration system through an MCP interface reported a 97% first-attempt success rate across 65 trials; the number is only meaningful because someone defined the trials and counted the failures.

This is the layer QPillars builds: the infrastructure that lets AI agents operate laboratory instruments safely and reliably, sitting between the reasoning agent and the physical machine. The principles above are the shape of the problem. How you implement reservation, validation, and evaluation for a specific instrument fleet is where the real engineering lives - and where a generic agent framework leaves you on your own. For the neighboring standards question, we have written separately on connecting AI agents to instruments with MCP and on building reliable AI agents for lab instruments.
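For the evaluation principle, the arithmetic is simple enough to show. The sketch below computes a success rate and a Wilson score interval from counted trials. `TrialSummary` and `may_promote` are hypothetical names, and the promotion threshold is an assumption for illustration. The counts in the usage note are also an assumption: the reported result gives only the rate and the number of trials, and 63 successes is the count out of 65 that rounds to 97%.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrialSummary:
    """Counted outcomes of first-attempt trials against real or simulated hardware."""
    successes: int
    trials: int

    @property
    def rate(self) -> float:
        return self.successes / self.trials

    def wilson_interval(self, z: float = 1.96) -> Tuple[float, float]:
        """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
        n, p = self.trials, self.rate
        denom = 1 + z * z / n
        centre = (p + z * z / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
        return centre - half, centre + half


def may_promote(summary: TrialSummary, min_lower_bound: float) -> bool:
    """Hypothetical policy: raise autonomy only if the interval's lower bound clears a bar."""
    lower, _ = summary.wilson_interval()
    return lower >= min_lower_bound


# Assumed counts consistent with a 97% rate over 65 trials.
summary = TrialSummary(successes=63, trials=65)
```

With these assumed counts the point estimate is about 97%, but the lower bound of the interval sits near 0.89. A policy that requires a lower bound of 0.95 would not promote the agent yet, which is the practical meaning of earning autonomy by measurement rather than by a headline number.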

Where You Are on the Autonomy Ladder

None of this requires full autonomy on day one, and it should not. Ginkgo Bioworks published a useful frame - six Levels of Laboratory Autonomy (LoLA), Levels 0 to 5, explicitly modeled on the automotive self-driving scale, and drawing a hard line between automation ("executing pre-defined protocols") and autonomy ("decision-making and adaptation").

Most credible systems today sit at Level 3, "Conditional Autonomy," where, in Ginkgo's words, "human-language AI agents and protocol debugging software allow the transfer of experimental plans to the autonomous lab without human programming." That is a realistic and valuable target: a scientist describes intent in plain language, the agent plans and executes, and a human stays in the loop for the consequential decisions. Getting there does not depend on a smarter model. It depends on the control layer at the agent-to-instrument edge doing its job. We took the same evidence-based view of the broader market in our separate post on self-driving labs in 2026: what works vs. what's hype.

Frequently Asked Questions

What is the agent-to-instrument edge?

It is the boundary where an AI agent commands a physical laboratory instrument rather than a software tool. Unlike a tool call, an instrument action is stateful, exclusive, safety-critical, and physically irreversible. It needs a dedicated control layer - reservation, validation, and safety fencing - that generic agent protocols like MCP do not provide on their own.

Can I just use MCP to let an AI agent control a lab instrument?

MCP is the right way to expose an instrument's capabilities to an agent, and it is a strong foundation. But MCP standardizes the agent-to-tool edge; it does not model instrument state, exclusive locking, or per-action safety limits. You still need a control layer that validates every action and never lets free-text model output drive hardware directly.

Is SiLA 2 enough for agentic lab automation?

SiLA 2 is excellent for deterministic instrument control and machine-readable capabilities, and it is the most mature open standard for the job. Its limitation for agents is that its schemas are static and design-time, with no runtime capability negotiation and no per-capability hazard classification. It answers what an instrument can do, not what is safe to do right now - which is what an agent needs.

Why do lab-automation agents fail even when the language model is very capable?

Because reliability at the instrument edge is an architecture problem, not a reasoning problem. A capable model still causes damage if its output can drive hardware directly, if there is no reservation to prevent collisions, or if results are untyped. The fix is a control layer that constrains actuation, not a larger model.

How do you safely keep a human in the loop for a physical AI agent?

Turn the agent's free-text intent into a structured, human-readable proposal before anything moves, and make that proposal an explicit approval gate for high-consequence steps. Approval then acts on a validated action, not on a decision hidden inside the model's reasoning - which is what makes the human-in-the-loop meaningful rather than cosmetic.


Written by Iacob Marian, Technical Lead and Co-founder at QPillars, where he builds the infrastructure that lets AI agents operate laboratory instruments safely and reliably. Published 2026-07-04.

Iacob builds intelligent software infrastructure for life sciences laboratories, with a focus on Rust for instrument control and agentic AI for lab automation.
