StellarGlyph

Not all AI projects should be solved in the same way. A refund approval process should not become an open-ended reasoning problem. A research assistant that has to decide which sources to inspect will perform better if it's given more freedom and not forced through a 12-step pipeline. These are different engineering problems and treating them as the same thing is where many AI products become unreliable.

Workflows give us dependable execution where the process is known. Agents give us adaptive (or fuzzy) behaviour where the task cannot be fully specified in advance. What we have found works best on real-world problems is to use both: workflow orchestration for control, auditability, retries, state, and hand-offs; agents for ambiguous decisions, tool selection, varied inputs, and higher-level synthesis.

LangChain's LangGraph documentation makes the same architectural split. It describes workflows as systems with predetermined code paths that operate in a set order, while agents define their own process and tool use at runtime. Standalone agents solve problems through an internal reasoning loop, while agentic workflows externalise control flow across tools, services, agents, and human steps.

What Workflows Are Good For

A workflow is a directed flow of tasks (which can be represented as nodes and edges). It has named steps, explicit transitions, and a clear state model. Each step can fail, retry, time out, or hand control to a person (referred to as a Human in the Loop). That makes workflows a good fit for deterministic processes where the organisation already knows what should happen.

Typical examples include invoice ingestion, account provisioning, stock or logistics updates, customer onboarding, returns authorisation, and content approval. The details vary, but the engineering requirement is the same. We need to know which step ran, what it received, what it returned, who approved it, and what happens next.

This is where workflows earn their place. They make control flow inspectable and easier to observe. A failed API call does not disappear inside a model transcript. A retry policy can be applied to the failing node. A human approval step can be inserted before money moves or data leaves the organisation. Agentic workflows, on the other hand, are processes with loops, decision nodes, shared working memory, and the ability to decide autonomously which tools and pathways to use.

A workflow can still contain AI. It can call a large language model (LLM) to classify a request, extract fields, draft a response, or score a document. The important point is that the model call sits inside an explicit process.

A simplified refund workflow might look like this:

workflow: refund_request
state:
  order_id: string
  customer_id: string
  refund_reason: string
  risk_score: number
  decision: enum[approve, review, reject]
 
steps:
  - id: extract_fields
    type: llm_structured_output
    schema: RefundRequest
 
  - id: fetch_order
    type: api_call
    endpoint: orders.get
 
  - id: score_risk
    type: deterministic_rule
    ruleset: refund_risk_v3
 
  - id: route_decision
    type: branch
    branches:
      approve: risk_score < 0.3
      review: risk_score >= 0.3 and risk_score < 0.7
      reject: risk_score >= 0.7
 
  - id: human_review
    type: manual_task
    when: decision == review

There is nothing exotic here. The AI step extracts structured data. The API step retrieves the order. The rule step applies policy. The branch controls risk. The human step handles the grey area. This is the right shape for a process where correctness, traceability, and business rules matter.
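As a minimal sketch, the route_decision branch above could be written as a plain Python function. The thresholds are taken from the workflow definition; the function name `route_by_risk` is an assumption made for illustration and does not belong to any particular workflow engine.

```python
def route_by_risk(risk_score: float) -> str:
    """Map a risk score to a decision using the thresholds from the workflow above."""
    if risk_score < 0.3:
        return "approve"
    if risk_score < 0.7:
        return "review"  # grey area: picked up by the human_review step
    return "reject"
```

Because the branch is ordinary code, its boundaries (0.3 and 0.7) can be unit tested like any other business rule.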

What Agents Are Good For

An agent is useful when the next action depends on context that we cannot reduce cleanly into predefined branches. The agent receives a goal, has access to tools, and decides which tool to call next. It works through a feedback loop: observe, decide, act, observe again.
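The loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real agent framework: `decide` stands in for the model call, `get_order_status` is a hypothetical tool, and the turn limit is an arbitrary choice.

```python
from typing import Callable


def get_order_status(order_id: str) -> str:
    """Hypothetical tool: a stand-in for a real order lookup."""
    return "delayed" if order_id == "A1" else "delivered"


TOOLS: dict[str, Callable[[str], str]] = {"get_order_status": get_order_status}


def decide(goal: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the model call: returns (action, argument)."""
    if not observations:
        return ("get_order_status", goal)
    return ("finish", f"Order {goal} is {observations[-1]}")


def run_agent(goal: str, max_turns: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_turns):
        action, argument = decide(goal, observations)  # decide
        if action == "finish":
            return argument
        observations.append(TOOLS[action](argument))  # act, then observe
    return "stopped: turn limit reached"
```

In a real system the `decide` step would be an LLM call, which is exactly why the turn limit and the fixed tool table sit outside it.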

LangGraph's documentation describes agents as LLMs performing actions with tools, operating in feedback loops, and making decisions about tool use and problem solving. It also notes that the available toolset and behavioural guidelines can still be defined by the developer. An agent with boundaries is an engineered component: you can leave it to run autonomously and, most importantly, safely. Without boundaries, it is only a matter of time before it does something unexpected, like deleting a database because it misunderstood an instruction to clean up a table.

Agents fit tasks such as research, support investigation, codebase exploration, sales account preparation, incident triage, and data quality investigation. In these cases, we often know the goal but not the path, or we know the goal but the interface with the real world varies (for example, documents arrive in different formats or file types).

For example, a support investigation agent might have access to these tools:

TOOLS = [
    search_docs,
    get_customer_account,
    get_recent_orders,
    inspect_delivery_events,
    create_support_summary,
]

The user asks: "Why has this customer's replacement order not arrived?"

A fixed workflow can fetch the order and show the latest delivery event. An agent can decide to inspect the original order, compare replacement shipment dates, check whether the carrier event stalled, search the internal policy for replacement SLA rules, and produce a concise case summary for a human operator.

That flexibility is the point. It is also the risk. Agents can choose the wrong tool, loop unnecessarily, over-read context, or produce an answer that looks plausible and hides a missing check. While doing all of this, agents can also burn through an order of magnitude more tokens than workflows. This is why agents should rarely own the whole production process.

The False Choice Between Workflows And Agents

The common mistake is to ask, "Should we build this as a workflow or as an agent?" For production systems, the better question is, "Which parts need fixed control, and which parts need adaptive reasoning?"

A procurement assistant gives a clear example. The user asks for help buying a replacement laptop for a new starter. Several parts of that process should be deterministic:

The employee exists in the HR system. The role maps to an approved device policy. The budget threshold controls approval. The purchase order requires a cost centre. The supplier API requires a fixed payload. The finance system needs a recorded approval.

Other parts are fuzzy. The user might describe the requirement in vague language. The device policy might include exceptions. The employee might need a non-standard configuration. The assistant might need to compare available stock, delivery dates, and role requirements.

A good architecture uses a workflow as the outer structure. It asks an agent to handle bounded ambiguity inside specific nodes. The workflow remains responsible for state, approvals, logging, retries, and final execution.

[Figure: workflow diagram (Mermaid)]

The agent does not get to create the purchase order directly. It analyses the exception and returns structured output. The workflow decides what to do with that output.

This pattern gives us a useful contract:

{
  "recommendation": "approve_with_manager_review",
  "confidence": 0.82,
  "required_device": "MacBook Pro 14",
  "policy_exception": "role requires local model testing",
  "evidence": [
    "employee role is machine learning engineer",
    "standard device policy permits exception for local GPU workloads"
  ],
  "next_action": "manager_approval"
}

The workflow can validate this object. It can reject missing fields. It can cap low-confidence recommendations. It can require a human for any exception. The agent handles interpretation. The workflow handles authority.
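A minimal sketch of that validation step, using only the standard library. The required field set, the allowed step names, and the 0.75 confidence threshold are assumptions chosen for illustration; a real system would take them from its own policy.

```python
REQUIRED_FIELDS = {"recommendation", "confidence", "evidence", "next_action"}
ALLOWED_NEXT_ACTIONS = {"manager_approval", "create_purchase_order"}  # assumed step names
MIN_CONFIDENCE = 0.75  # assumed threshold


def decide_next_step(agent_output: dict) -> str:
    """Decide what the workflow does with an agent recommendation."""
    # Reject output that does not satisfy the contract.
    if REQUIRED_FIELDS - set(agent_output):
        return "reject_invalid_output"
    if agent_output["next_action"] not in ALLOWED_NEXT_ACTIONS:
        return "reject_invalid_output"
    # Any policy exception or low-confidence recommendation goes to a person.
    if agent_output.get("policy_exception") or agent_output["confidence"] < MIN_CONFIDENCE:
        return "human_review"
    return agent_output["next_action"]
```

With the example object above, the presence of a policy exception sends the request to a human regardless of the 0.82 confidence score.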

Patterns That Sit Between The Two

More and more, rather than choosing a single methodology to build AI systems, we use both types in the same system: workflows where we need repeatability and dependability, and agents where we need to deal with fuzziness.

LangGraph's documentation lists several workflow patterns that matter in practice. Prompt chaining passes the output of one LLM call into the next and works well for tasks that can be broken into smaller verifiable steps. Parallelisation runs independent subtasks at the same time, or runs the same task multiple times to compare outputs. Routing sends an input to a specialised flow based on a classification step.

Those patterns are workflow-first. They use models, but they constrain the shape of execution.
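The three patterns can be sketched with plain functions. This is an illustration only: `fake_llm` is a stand-in for a model call so the example runs offline, and the keyword-based classifier is an assumption, not how a production router would classify input.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call so the sketch runs offline."""
    return f"<{prompt}>"


def chain(text: str) -> str:
    # Prompt chaining: the output of one call is the input of the next.
    outline = fake_llm(f"outline: {text}")
    return fake_llm(f"draft: {outline}")


def parallel(texts: list[str]) -> list[str]:
    # Parallelisation: independent subtasks run at the same time.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fake_llm, texts))


def route(text: str) -> str:
    # Routing: a classification step picks the specialised flow.
    handlers = {"refund": chain, "other": fake_llm}
    label = "refund" if "refund" in text.lower() else "other"
    return handlers[label](text)
```

In each case the shape of execution is fixed in code; only the content of each step comes from the model.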

The Orchestrator-Worker pattern adds more flexibility. An orchestrator breaks down a task, delegates subtasks to workers (sub-agents), and synthesises their outputs. The LangGraph documentation also notes that this is useful when subtasks cannot be predefined in the same way as simple parallelisation.

This is a common shape for research assistants, document analysis, and code modification tasks. Depending on how you set this up, you can cut end-to-end latency substantially, because independent subtasks can run in parallel.
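A minimal sketch of the orchestrator-worker shape, assuming stand-in functions for the model calls: `plan` plays the orchestrator's decomposition step, `worker` plays a sub-agent, and splitting on semicolons is a placeholder for real planning.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(task: str) -> list[str]:
    """Stand-in for the orchestrator's model call: split a task into subtasks."""
    return [part.strip() for part in task.split(";") if part.strip()]


def worker(subtask: str) -> str:
    """Stand-in for a sub-agent handling one subtask."""
    return f"done: {subtask}"


def orchestrate(task: str) -> str:
    subtasks = plan(task)  # the number of subtasks is only known at runtime
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(worker, subtasks))  # workers run concurrently
    return " | ".join(results)  # synthesis step
```

The difference from simple parallelisation is that the list of subtasks is produced by the planning step rather than written into the workflow.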

The Evaluator-Optimiser pattern is another useful bridge. One model generates an output, another evaluates it, and the loop continues until the output meets defined criteria or hits a limit. This is a good fit for tasks with clear success criteria where iteration is required.
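The loop can be sketched as follows. Both model roles are replaced by deterministic stand-ins so the example runs on its own: `generate` trims one word per round and `evaluate` checks a word limit. These are assumptions chosen to make the stopping behaviour visible, not realistic model behaviour.

```python
def generate(draft: str) -> str:
    """Stand-in for the generator model: shorten the draft by one word."""
    return " ".join(draft.split()[:-1])


def evaluate(draft: str, max_words: int) -> bool:
    """Stand-in for the evaluator model: check an explicit success criterion."""
    return len(draft.split()) <= max_words


def optimise(draft: str, max_words: int, max_rounds: int = 5) -> tuple[str, bool]:
    """Iterate until the criterion is met or the round limit is hit."""
    for _ in range(max_rounds):
        if evaluate(draft, max_words):
            return draft, True
        draft = generate(draft)
    return draft, evaluate(draft, max_words)
```

Returning the pass/fail flag alongside the draft lets the surrounding workflow decide what happens when the limit is reached without success.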

These patterns show why "agent" is too broad as a design label. A system can be dynamic without giving one agent total control. We can add model-driven routing, planning, evaluation, and synthesis while keeping the surrounding process explicit.

Put Control Outside The Agent

The safest default is to put the workflow around the agent, not inside it. The agent should be a component with a narrow responsibility, a typed input, a typed output, an allowed list of tools and a budget.

A production agent node should usually have:

A goal expressed in a short system instruction.
A strict tool allowlist.
A maximum number of turns.
A structured output schema.
A confidence or evidence field where relevant.
A clear failure response.
Logs for each tool call.
A surrounding workflow that decides what authority the result has.

For example:

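from typing import Literal

from pydantic import BaseModel


class InvestigationResult(BaseModel):
    # Outcome of the investigation as the agent sees it.
    status: Literal["resolved", "needs_human", "insufficient_data"]
    summary: str
    # Identifiers of the tool results that support the summary.
    evidence_ids: list[str]
    recommended_action: Literal[
        "reply_to_customer",
        "refund",
        "escalate",
        "request_more_info",
    ]
    # Self-reported confidence, assumed here to lie between 0.0 and 1.0.
    confidence: float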

The workflow can then enforce policy:

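def route_result(result: InvestigationResult) -> str:
    # Low-confidence results go to a person, whatever the agent recommends.
    if result.confidence < 0.75:
        return "human_review"
    # Refunds move money, so they need manager approval even at high confidence.
    if result.recommended_action == "refund":
        return "manager_approval"
    return "draft_customer_reply"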

This keeps the agent useful without giving it authority it should not have. The agent can investigate. The workflow can approve, reject, escalate, or ask for missing information.
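The tool allowlist, call budget, and per-call logging from the list above could be enforced by a thin wrapper that sits between the agent and its tools. The sketch below is hypothetical: `ToolGateway` and `ToolBudgetExceeded` are names invented for illustration, not part of any framework.

```python
from typing import Callable


class ToolBudgetExceeded(Exception):
    """Raised when the agent has used all of its allowed tool calls."""


class ToolGateway:
    """Hypothetical wrapper that enforces an allowlist and a call budget, and logs calls."""

    def __init__(self, tools: dict[str, Callable[..., object]], max_calls: int):
        self._tools = tools
        self._max_calls = max_calls
        self.log: list[tuple[str, tuple]] = []

    def call(self, name: str, *args):
        # Anything outside the allowlist is refused before it runs.
        if name not in self._tools:
            raise PermissionError(f"tool not allowed: {name}")
        if len(self.log) >= self._max_calls:
            raise ToolBudgetExceeded(f"budget of {self._max_calls} calls used")
        self.log.append((name, args))  # record the call before executing it
        return self._tools[name](*args)
```

Because the agent only ever sees the gateway, a misread instruction cannot reach a tool that was never registered.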

By their very nature, workflows are more traceable and audit-friendly, with explicit control flow, checkpoints, timeouts, retries, and human sign-offs. Agents are more adaptive, but this comes at the cost of predictability.

How To Decide The Boundary

When we design these systems with clients, we start by mapping the process before choosing the AI pattern. The key is to classify each part of the job by the kind of uncertainty it contains.

If the pathway and outcomes are clear and the only uncertainty is operational, use a sequence of steps (a workflow). The system needs to know whether an API failed, whether an approval was given, whether a record exists, or whether a payment has settled.

If the uncertainty is semantic, use an agent or an LLM step. The system needs to understand intent, compare evidence, decide what to inspect, or turn messy context into a structured judgement.

If the uncertainty is policy-related, keep a human or deterministic rule in the loop. The model can prepare evidence, but the workflow should hold the decision.

A practical decision map helps:

 
Process Characteristic -> Better Default
Fixed sequence of steps -> Workflow
Regulated approval -> Workflow with human review
Tool choice depends on context -> Agent inside workflow
Output must match a schema -> Workflow with structured LLM call
Unknown number of subtasks -> Orchestrator-worker pattern
Need retries, audit logs, and timeouts -> Workflow
Need exploratory investigation -> Bounded agent
Need final sign-off or authorisation -> Workflow-controlled decision

This shouldn't be considered a maturity ladder where workflows are primitive and agents are advanced. They solve different problems. A dependable AI product will contain both.

What To Test Before Shipping

Testing an AI workflow requires more than prompt evaluation. We need to test the process contract.

For workflows, test branch coverage, retry behaviour, timeout handling, idempotency, schema validation, and human hand-off. A workflow that creates duplicate tickets on retry has a conventional engineering bug. The presence of an LLM does not make that bug special.
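As an illustration of the duplicate-ticket case, the sketch below uses an idempotency key so that a retried step returns the existing ticket. `TicketStore` is an in-memory stand-in invented for this example, and the key format is an assumption.

```python
class TicketStore:
    """In-memory stand-in for a ticketing system."""

    def __init__(self):
        self._by_key: dict[str, int] = {}
        self.titles: dict[int, str] = {}

    def create_ticket(self, idempotency_key: str, title: str) -> int:
        # A retry with the same key returns the existing ticket instead of creating a new one.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        ticket_id = len(self.titles) + 1
        self.titles[ticket_id] = title
        self._by_key[idempotency_key] = ticket_id
        return ticket_id


def test_retry_does_not_duplicate_ticket():
    store = TicketStore()
    first = store.create_ticket("run-42:create_ticket", "Refund query")
    second = store.create_ticket("run-42:create_ticket", "Refund query")  # simulated retry
    assert first == second
    assert len(store.titles) == 1
```

Nothing in this test involves a model, which is the point: retry safety is a property of the workflow step, not of the prompt.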

For agents, test tool selection, stopping conditions, evidence quality, refusal behaviour, and output validity. Feed the agent cases where the right answer is "insufficient data". Feed it cases where two tools return conflicting information. Feed it cases where the user asks for an action outside the agent's authority.

The most useful tests assert behaviour at the boundary:

def test_low_confidence_refund_routes_to_review():
    result = InvestigationResult(
        status="resolved",
        summary="Carrier event is ambiguous.",
        evidence_ids=["ship_123"],
        recommended_action="refund",
        confidence=0.61,
    )
 
    assert route_result(result) == "human_review"

This is where dependable AI products improve. The model can vary. The boundary remains enforceable.

The Product Shape That Works

The best AI systems do not ask agents to be dependable in ways that agents are not designed to be. They use workflows for dependable process execution and agents for bounded reasoning inside that process. Product owners can see the business process. Engineers can observe and test each step. Compliance teams can inspect approvals and logs. Users still get flexibility where the task genuinely requires interpretation.

If you already have a process map, the next step is to mark which nodes are deterministic, which nodes are semantic, and which nodes carry authority. That map will usually tell you where the workflow ends, where the agent starts, and where a human still belongs.

Workflows Vs Agents