CrewAI: Role-Based Multi-Agent Orchestration in Production

CrewAI bet on a different abstraction than LangGraph. LangGraph models the agent as a state machine and asks you to write the control flow yourself. CrewAI models the agent system as a set of roles collaborating on tasks and hides the loop entirely. In 2024 that looked like a toy. In 2026, after the framework introduced Flows alongside Crews, it became one of the cleanest patterns for the class of agent systems that are genuinely about people-shaped collaboration: a researcher, a writer, a critic, a publisher, each with a clear job and a clear handoff.

This post covers how to use it well, when to choose it over LangGraph, and where the abstraction leaks.

The dual-layer architecture

CrewAI today is two cooperating things, not one:

Crews — the reasoning layer. A group of agents (roles) execute a sequence of tasks. The agents share context, can ask each other for help, and an optional manager agent decides who does what.
Flows — the orchestration layer. Event-driven control flow that triggers crews, branches on state, persists across runs, and chains multiple crews together.

The flow is the conductor; the crews are the band. That separation is what makes the framework workable in production: the flow handles the boring, deterministic state machine (ordering, branching, persistence), and the crews handle the messy creative reasoning.

Roles, goals, tasks

The basic vocabulary:

from crewai import Agent, Task, Crew, Process

# web_search_tool and arxiv_tool are assumed to be tool instances defined elsewhere
# (for example, tools from the crewai_tools package).

researcher = Agent(
    role="Senior research analyst",
    goal="Find primary sources on {topic} and cite them precisely",
    backstory="Trained librarian who hates secondary sources.",
    tools=[web_search_tool, arxiv_tool],
    llm="anthropic/claude-opus-4-7",
    verbose=True,
)

writer = Agent(
    role="Technical writer",
    goal="Turn the research brief into a 1200-word article in the house style",
    backstory="Veteran of three engineering blogs. Allergic to fluff.",
    llm="anthropic/claude-opus-4-7",
)

research_task = Task(
    description="Compile a research brief on {topic} with 5+ primary sources.",
    expected_output="A markdown brief with citations.",
    agent=researcher,
)

write_task = Task(
    description="Using the brief, write a 1200-word article.",
    expected_output="A markdown article, no preamble.",
    agent=writer,
    context=[research_task],  # the researcher's output is passed in as input
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,
    memory=True,  # crew-level short-term, long-term, and entity memory
)

# {topic} placeholders in goals and task descriptions are filled from these inputs.
result = crew.kickoff(inputs={"topic": "agent observability"})
print(result.raw)

A few things to notice. backstory is not decoration: it goes into the system prompt and meaningfully shapes the agent's voice and judgment. The {topic} placeholders in goals and task descriptions are interpolated from the inputs passed to crew.kickoff(). context=[research_task] hands the researcher's output to the writer task as input; the framework wires the message-passing. Process.sequential runs the tasks in list order. The other built-in is Process.hierarchical, which adds a manager agent that delegates tasks dynamically.
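To make the message-passing concrete, here is a toy model of what a sequential process does with context and {topic} placeholders. It is a minimal sketch under stated assumptions, not CrewAI's implementation: ToyTask and run_sequential are hypothetical names, and each "agent" is a plain Python function standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTask:
    """Hypothetical stand-in for crewai.Task: a description template, an agent, upstream context."""
    name: str
    description: str  # may contain {placeholders} filled from the kickoff inputs
    agent: Callable[[str, list[str]], str]  # (prompt, context outputs) -> output; stands in for an LLM call
    context: list["ToyTask"] = field(default_factory=list)


def run_sequential(tasks: list[ToyTask], inputs: dict[str, str]) -> dict[str, str]:
    """Run tasks in list order; each task sees the outputs of the tasks named in its context."""
    outputs: dict[str, str] = {}
    for task in tasks:
        prompt = task.description.format(**inputs)  # interpolate {topic} and friends
        upstream = [outputs[dep.name] for dep in task.context]  # context tasks must have run already
        outputs[task.name] = task.agent(prompt, upstream)
    return outputs


# Stub agents: deterministic functions instead of model calls.
def research_agent(prompt: str, context: list[str]) -> str:
    return f"BRIEF[{prompt}]"


def writer_agent(prompt: str, context: list[str]) -> str:
    return f"ARTICLE from {context[0]}"


research = ToyTask("research", "Compile a research brief on {topic}.", research_agent)
write = ToyTask("write", "Write a 1200-word article.", writer_agent, context=[research])

result = run_sequential([research, write], {"topic": "agent observability"})
print(result["write"])
```

The real framework also injects role, goal, and backstory into each agent's prompt and runs a tool-calling loop per task; the sketch only shows the data flow that sequential ordering plus context implies.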

When hierarchical beats sequential

Sequential is fine when the task list is fixed. Hierarchical is the right choice when:

The plan depends on the input (research a topic → if it's technical, delegate to agent A; if it's regulatory, to agent B). Switching between whole crews is a job for a Flow.
There are more workers than steps and you want load-balancing.
The manager should validate and re-route work the first agent did poorly.

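# fact_checker, editor, fact_check_task and edit_task are defined like the agents and tasks above.
crew = Crew(
    agents=[researcher, writer, fact_checker, editor],
    tasks=[research_task, write_task, fact_check_task, edit_task],
    process=Process.hierarchical,
    manager_llm="anthropic/claude-opus-4-7",  # model for the manager; hierarchical needs this or a manager_agent
)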

The manager agent is itself an LLM call per delegation. That’s a real cost — not just tokens, but a layer of indirection that adds variance. Use hierarchical when the routing decision is genuinely non-trivial; don’t use it as a default.

Flows: where the production logic lives

A Crew is a single execution of a multi-agent collaboration. A Flow is the workflow that invokes crews, branches on results, persists state, and reacts to events. This is what makes CrewAI a peer to LangGraph for the orchestration tier:

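from crewai.flow.flow import Flow, listen, start
from pydantic import BaseModel

# research_crew(), writing_crew(), approval_gate() and publish() are assumed helpers
# defined elsewhere: the first two build Crews, the last two are your own functions.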

class PostState(BaseModel):
    topic: str = ""
    brief: str = ""
    article: str = ""
    approved: bool = False

class ContentPipeline(Flow[PostState]):

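    @start()
    def receive_topic(self):
        # Keys passed to kickoff(inputs=...) that match state fields are loaded into
        # self.state before the start method runs, so the topic is already set here.
        return self.state.topic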

    @listen(receive_topic)
    def run_research(self, topic):
        crew = research_crew()
        self.state.brief = crew.kickoff(inputs={"topic": topic}).raw
        return self.state.brief

    @listen(run_research)
    def run_writing(self, brief):
        crew = writing_crew()
        self.state.article = crew.kickoff(inputs={"brief": brief}).raw
        return self.state.article

    @listen(run_writing)
    def request_approval(self, article):
        # Pause and wait for human approval (post 4 covered the pattern).
        self.state.approved = approval_gate(article)
        if self.state.approved:
            publish(article)

flow = ContentPipeline()
result = flow.kickoff(inputs={"topic": "agent observability"})

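@start() marks the entrypoint, and @listen(step) runs a method when step completes, passing it that step's return value. The state is a typed Pydantic model shared by every step in a run. Persisting it across runs is opt-in: the @persist decorator (from crewai.flow.persistence) saves it, to SQLite by default, so a later run can reload the saved state by its id. That is roughly what LangGraph's checkpointer does, with a different API. Flows can also branch with @router, whose return value is a label that downstream @listen methods subscribe to, combine triggers with or_() and and_(), and run without blocking via flow.kickoff_async().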
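The decorator semantics are easier to see in a stripped-down model. The sketch below is a hypothetical, pure-Python toy (ToyFlow and these start, listen, and router functions are not crewai.flow's implementations) that mimics the start/listen chaining above, where a listener receives the return value of the step it listens to, plus a router-style branch (CrewAI's own branching decorator is @router) whose return value is a label selecting which listener fires next.

```python
from typing import Any, Callable


def start() -> Callable:
    """Mark a method as the flow's entrypoint (toy version)."""
    def wrap(fn: Callable) -> Callable:
        fn._is_start = True
        return fn
    return wrap


def listen(trigger: Any) -> Callable:
    """Run the method after `trigger` (a method, or a router label string) completes."""
    def wrap(fn: Callable) -> Callable:
        fn._trigger = trigger if isinstance(trigger, str) else trigger.__name__
        return fn
    return wrap


def router(trigger: Callable) -> Callable:
    """Like listen, but the method's return value is a label that picks the next listener."""
    def wrap(fn: Callable) -> Callable:
        fn._trigger = trigger.__name__
        fn._is_router = True
        return fn
    return wrap


class ToyFlow:
    def kickoff(self, inputs: dict[str, Any]) -> Any:
        self.state: dict[str, Any] = dict(inputs)  # toy state: a plain dict seeded from inputs
        funcs = [getattr(type(self), name) for name in dir(type(self))]
        listeners = [f for f in funcs if hasattr(f, "_trigger")]
        queue: list[tuple[Callable, tuple]] = [
            (f, ()) for f in funcs if getattr(f, "_is_start", False)
        ]
        last = None
        while queue:
            fn, args = queue.pop(0)
            last = fn(self, *args)
            if getattr(fn, "_is_router", False):
                # Router: its return value is a label; fire listeners subscribed to that label.
                queue += [(f, ()) for f in listeners if f._trigger == last]
            else:
                # Ordinary step: fire its listeners, handing them its return value.
                queue += [(f, (last,)) for f in listeners if f._trigger == fn.__name__]
        return last  # return value of the last step that ran


class ReviewFlow(ToyFlow):
    @start()
    def draft(self):
        return f"draft about {self.state['topic']}"

    @router(draft)
    def check(self, draft_text):
        self.state["draft"] = draft_text
        # Deterministic branching lives in code, not in a prompt.
        return "needs_fact_check" if "regulation" in draft_text else "ready"

    @listen("needs_fact_check")
    def fact_check(self):
        return "fact-checked: " + self.state["draft"]

    @listen("ready")
    def publish(self):
        return "published: " + self.state["draft"]


print(ReviewFlow().kickoff({"topic": "agent observability"}))
```

The real Flow API differs in many details (typed Pydantic state, combined triggers, async execution). The toy only shows why a branch like "fact-check only regulatory topics" belongs in a router method rather than in a task description.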

What CrewAI gets right

The role abstraction reads like product requirements. Stakeholders can read a Crew definition and understand it; they generally can’t read a LangGraph.
Sequential is the right default. A surprising number of “multi-agent” systems are linear pipelines pretending otherwise. CrewAI doesn’t penalize you for being honest about that.
Built-in memory. Setting memory=True on the Crew gives its agents short-term, long-term, and entity memory with reasonable defaults. That is sufficient for most use cases without writing your own memory layer.
Output schemas via output_json / output_pydantic. Any task can demand a typed result: pass a Pydantic model and the framework converts the agent's answer into it, exposed on the task output as .pydantic (or .json_dict for output_json).
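What output_pydantic buys you is a parse-and-validate step between the model's text and your code. Below is a stdlib-only sketch of that step, assuming the agent was asked to answer in JSON; Brief and parse_brief are hypothetical names, and CrewAI itself does this with the Pydantic model you pass rather than a dataclass.

```python
import json
from dataclasses import dataclass


@dataclass
class Brief:
    """Hypothetical typed result for the research task."""
    topic: str
    sources: list[str]


def parse_brief(raw: str) -> Brief:
    """Parse the agent's raw JSON answer; reject it if fields are missing or mistyped."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("topic"), str):
        raise ValueError("topic must be a string")
    sources = data.get("sources")
    if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
        raise ValueError("sources must be a list of strings")
    return Brief(topic=data["topic"], sources=sources)


brief = parse_brief('{"topic": "agent observability", "sources": ["a", "b"]}')
print(brief.sources)
```

Failing loudly on a malformed answer is the point: a downstream task or flow step can retry instead of consuming garbage.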

Where the abstraction leaks

Sub-tasks inside an agent are invisible. CrewAI runs its own internal loop per task, so the trace can be coarser than the per-node spans LangGraph produces in LangSmith. You can recover the detail with verbose=True, but it's not as clean as LangGraph's checkpoints.
Conditional branching inside a crew is awkward. If task_2 should sometimes run and sometimes not, you push that logic up into a flow. Don’t try to express it inside the crew with cleverly-worded task descriptions.
Manager agents can hallucinate plans. Hierarchical crews where the manager re-routes work can land in loops where the manager keeps re-assigning the same task. The fix is to cap the manager (for example, by passing your own manager_agent with a low max_iter) and to set an explicit max_iter on the worker agents.
Tool calling quality depends on the model. Less-capable models can struggle with CrewAI’s tool descriptions (they’re terse). Bigger frontier models — Claude Opus, GPT, Gemini Pro — handle them well; older OSS models often need rewording.
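A toy simulation shows both the looping failure mode and why a hard cap fixes it. Everything here is hypothetical: the managers are plain functions (one pathologically re-assigns the same task forever), and max_steps plays the role of the cap. In CrewAI itself the closest knobs are max_iter on each Agent, including a custom manager_agent passed to the Crew.

```python
from typing import Callable, Optional


def run_hierarchical(
    tasks: list[str],
    manager: Callable[[list[str], dict[str, str]], Optional[str]],
    worker: Callable[[str], str],
    max_steps: int,
) -> tuple[dict[str, str], int]:
    """Let the manager pick the next task until it returns None or the step cap is hit.

    Returns the collected outputs and the number of manager decisions made.
    """
    outputs: dict[str, str] = {}
    steps = 0
    while steps < max_steps:
        steps += 1  # every delegation decision is one more (simulated) manager LLM call
        choice = manager(tasks, outputs)
        if choice is None:
            break
        outputs[choice] = worker(choice)
    return outputs, steps


def looping_manager(tasks: list[str], outputs: dict[str, str]) -> Optional[str]:
    # Pathological manager: never satisfied with "write", so it keeps re-assigning it.
    return "write"


def sane_manager(tasks: list[str], outputs: dict[str, str]) -> Optional[str]:
    # Assigns each task once, in order, then declares the work done.
    remaining = [t for t in tasks if t not in outputs]
    return remaining[0] if remaining else None


def worker(task: str) -> str:
    return f"done:{task}"


print(run_hierarchical(["research", "write"], looping_manager, worker, max_steps=5))
print(run_hierarchical(["research", "write"], sane_manager, worker, max_steps=5))
```

Even the well-behaved manager spends one decision per task plus a final "done" decision, which is the per-delegation overhead that makes hierarchical a poor default.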

Choosing between CrewAI and LangGraph

The pragmatic rule:

You want…	Use
Explicit state machine, dynamic branching, complex routing	LangGraph
Roles that map cleanly to job titles, simple sequential or hierarchical handoffs	CrewAI Crews
Long-running, event-driven workflow that triggers multiple crews	CrewAI Flows
Maximum control, minimum framework opinion	LangGraph
Fastest path from PRD to working multi-agent prototype	CrewAI

It's also legitimate to use both. A common production shape in 2026 is LangGraph for the runtime control plane, with CrewAI "research-and-write" crews invoked from individual graph nodes. The frameworks compose: a CrewAI crew's kickoff() is just a function call you can put inside a LangGraph node.
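The composition is easiest to see from the node's side. A LangGraph node is a function that takes the current state and returns a partial update, so wrapping a crew takes a few lines. This sketch uses a hypothetical StubCrew whose kickoff(inputs=...) returns an object with a .raw field, shaped like the CrewAI calls used earlier, so it runs without a model; in a real graph you would register research_node with StateGraph.add_node.

```python
from dataclasses import dataclass
from typing import TypedDict


class PipelineState(TypedDict, total=False):
    topic: str
    brief: str


@dataclass
class StubOutput:
    raw: str  # mirrors the .raw field read from a crew's kickoff result


class StubCrew:
    """Hypothetical stand-in for a CrewAI Crew; a real one would call the model here."""

    def kickoff(self, inputs: dict[str, str]) -> StubOutput:
        return StubOutput(raw=f"brief on {inputs['topic']}")


crew = StubCrew()


def research_node(state: PipelineState) -> PipelineState:
    """LangGraph-style node: read what the crew needs from state, return only the keys it updates."""
    result = crew.kickoff(inputs={"topic": state["topic"]})
    return {"brief": result.raw}


update = research_node({"topic": "agent observability"})
print(update)
```

Returning only the changed key keeps the crew ignorant of the graph: LangGraph merges the update into its state, and the crew never sees the control plane around it.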

Next week we look at Google’s Agent Development Kit, which takes a third path — an event-driven runtime that’s deliberately less prescriptive than either.