You can build your first useful AI agent in two weeks if the job is narrow, the data sources are known, and the first version stays draft-only. Do not start with a company-wide assistant. Pick one repeat workflow, write the rules, connect only the tools it needs, and measure how often humans edit the output. The goal is not autonomy on day one. The goal is a reliable assistant that makes one real job faster.
What does it mean to build your first AI agent?
Building your first AI agent means turning one repeat business workflow into a controlled software worker. The agent gets a role, context, tools, memory, and approval rules. It should start by drafting or preparing work, not by taking irreversible actions.
The two-week rule
A first agent should be small enough to understand in one meeting. If the workflow takes three departments, four edge-case policies, and a dozen tools, it is too big for the first build.
Two weeks is enough time to ship a useful first version when the scope is honest. It is not enough time to rebuild your whole company around AI. That is fine. The first agent is supposed to prove the operating pattern.
| Week | Focus | Output | Do not do |
|---|---|---|---|
| 1 | Workflow, context, prompts, source data. | A draft-only agent in a test environment. | Connect write tools or customer-facing sends. |
| 2 | Evals, guardrails, logging, deployment, review loop. | A production pilot with human approval. | Promote autonomy without correction data. |
Day 1: pick one repeat workflow
The right first workflow is boring. It happens often, has known inputs, has a human reviewer, and has a clear definition of good output.
Good examples include support reply drafting, intake triage, weekly finance briefs, stale lead follow-up, proposal prep, review response drafts, and order-risk summaries. Bad examples include "help run the business" or "act like the founder."
Write the workflow in one sentence: When this input arrives, the agent prepares this output, using these sources, for this human to approve. If that sentence gets long, the scope is too wide.
Days 2-3: gather the context
Most weak agents are not weak because the model is bad. They are weak because the context is scattered. The agent cannot follow policies it cannot see.
Collect the source material before writing clever prompts. For a support agent, that means policies, macros, product facts, order fields, escalation rules, and strong past replies. For a finance agent, it means chart of accounts, sales data, payout data, refund data, COGS, and the questions the owner asks every morning.
Keep the context small at first. A tight source pack beats a massive folder of stale documents. Version it so you know what changed when the agent gets better or worse.
Days 4-5: write the first prompt like an operating procedure
A useful agent prompt is not a poem. It is an operating procedure. It tells the agent its role, the job, the sources it may trust, what to do when data is missing, what not to do, and when to escalate.
The most important line is usually the stop rule. If the customer asks for a refund outside policy, stop. If the order data is missing, stop. If the source conflicts with the customer, stop. If confidence is low, stop.
Do not ask the agent to "be smart." Tell it exactly what useful looks like.
The first-agent prompt stack
Role, task, trusted sources, output format, examples, escalation rules, banned actions, and review checklist. That is enough for version one.
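The stack above can be sketched as a plain template. This is a minimal illustration, not a prescribed format; every field name and the stop-rule wording are placeholders you would replace with your own policies.

```python
# Hypothetical version-one prompt stack. Section labels mirror the list above.
PROMPT_TEMPLATE = """\
ROLE: {role}
TASK: {task}
TRUSTED SOURCES (use only these): {sources}
OUTPUT FORMAT: {output_format}
ESCALATION RULES: {escalation_rules}
BANNED ACTIONS: {banned_actions}
STOP RULES: If data is missing, sources conflict, the request is outside policy,
or confidence is low, stop and escalate instead of guessing.
"""

def build_prompt(role, task, sources, output_format, escalation_rules, banned_actions):
    """Assemble the operating procedure from its parts."""
    return PROMPT_TEMPLATE.format(
        role=role,
        task=task,
        sources=", ".join(sources),
        output_format=output_format,
        escalation_rules=escalation_rules,
        banned_actions=", ".join(banned_actions),
    )
```

Keeping the stop rules in the template, rather than scattered across documents, means every version of the prompt ships with them by default.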
Days 6-7: wire only the tools it needs
The first tool is usually a read tool. Let the agent pull orders, tickets, docs, notes, CRM fields, or files. Do not give it send, refund, delete, publish, or charge permissions yet.
Every tool needs a reason to exist. If the agent can do the job with three reads, do not connect ten systems. Each extra tool adds permission risk, debugging surface, and more ways to be confidently wrong.
Log every tool call. The log should show what the agent asked for, what it received, what it drafted, and what the human changed. Without logs, you cannot improve the system honestly.
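One way to make that logging automatic is to wrap every read tool so calls record themselves. A sketch, assuming an in-memory log and a toy order lookup; in production the log would be a database table and the tool a real API client.

```python
import time

TOOL_LOG = []  # stand-in for a persistent log table

def logged_tool(name, fn):
    """Wrap a read-only tool so every call records what was asked and received."""
    def wrapper(**kwargs):
        result = fn(**kwargs)
        TOOL_LOG.append({
            "tool": name,
            "requested": kwargs,
            "received": result,
            "at": time.time(),
        })
        return result
    return wrapper

# Hypothetical read tool: look up an order from a local dict.
ORDERS = {"A100": {"status": "shipped"}}
get_order = logged_tool("get_order", lambda order_id=None: ORDERS.get(order_id))
```

The draft and the reviewer's change get logged at the workflow level; this wrapper covers the tool-call half of the picture.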
Week 2: evals before confidence
Evals are not fancy. For a first agent, an eval can be a spreadsheet of twenty real examples with expected behavior. Feed each example to the agent and compare the output to what a good human would do.
Track simple categories: correct, mostly correct, missing context, wrong policy, bad tone, unsafe action, should have escalated. That gives you a punch list.
If the agent fails the same way three times, do not blame the model first. Fix the source material, prompt, examples, or tool output.
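The spreadsheet eval can be a few dozen lines of code. A sketch under simple assumptions: each case is an input plus the expected behavior, and the grading collapses to the categories above.

```python
from collections import Counter

# Stand-in for the twenty-row spreadsheet of real examples.
EVAL_CASES = [
    {"input": "Where is my order A100?", "expected": "status_reply"},
    {"input": "Refund me outside the return window", "expected": "escalate"},
]

def grade(case, agent_behavior):
    """Map one agent result into a punch-list category."""
    if agent_behavior == case["expected"]:
        return "correct"
    if case["expected"] == "escalate":
        return "should have escalated"
    return "wrong policy"

def run_evals(agent, cases):
    """Run every case and return category counts: the week's punch list."""
    return Counter(grade(c, agent(c["input"])) for c in cases)
```

Running this against each prompt or source-pack change turns "it seems better" into counted categories.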
Deployment should be boring
For the pilot, put the agent where the reviewer already works. That might be a helpdesk internal note, Slack channel, email draft folder, dashboard, or daily brief.
Do not make the reviewer open five new tools. If adoption requires a new habit, the agent has to be unusually good. Most first versions are not there yet.
Ship with a kill switch, a clear owner, and a rollback plan. The first version should be easy to turn off without breaking the workflow.
What to measure after launch
Measure the agent like an operator, not a demo viewer. Output volume does not matter by itself. Useful output matters.
- Draft acceptance rate: how often humans use the draft with light edits.
- Correction types: what the human keeps fixing.
- Escalation quality: whether the agent stops on risky work.
- Time saved: minutes removed from the workflow.
- Trust: whether the team asks for it without being pushed.
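Most of those metrics fall out of the reviewer-action log directly. A minimal sketch, assuming each review record carries the action taken and an estimated minutes-saved figure; counting "edited" toward acceptance reflects the light-edits definition above.

```python
def pilot_metrics(reviews):
    """Summarize reviewer actions into pilot metrics.

    Each review is a dict like:
      {"action": "accepted" | "edited" | "rejected" | "escalated",
       "minutes_saved": number}
    """
    total = len(reviews)
    accepted = sum(1 for r in reviews if r["action"] in ("accepted", "edited"))
    return {
        "draft_acceptance_rate": accepted / total if total else 0.0,
        "escalations": sum(1 for r in reviews if r["action"] == "escalated"),
        "minutes_saved": sum(r.get("minutes_saved", 0) for r in reviews),
    }
```

Trust and correction types stay qualitative; read the correction notes rather than trying to score them.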
The mistakes that burn the first build
The first mistake is starting too broad. The second is connecting write tools too early. The third is hiding failures because the demo looked cool.
The fourth mistake is skipping examples. A model needs to see your standard. Give it good drafts, bad drafts, edge cases, and the reason each one is good or bad.
The fifth mistake is treating the prompt as the product. The product is the workflow: data in, agent draft, human review, logs, correction loop, safer output next time.
A concrete first-agent example
Take a support reply agent. The input is a customer ticket. The agent reads the latest ticket message, order status, shipping policy, return policy, and the customer's recent history. Then it drafts an internal note for the support rep.
Version one should not send the message. It should prepare the reply, show the sources it used, state any missing data, and suggest whether the ticket is safe, sensitive, or needs escalation.
The output might include a short summary, a drafted response, source links, risk level, and one sentence explaining why the agent chose that path. That is enough for a real pilot.
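Fixing that output shape in code makes it validatable. A sketch with hypothetical field names; the three-level risk label and the required fields are assumptions you would tailor to your helpdesk.

```python
def make_agent_note(summary, draft, sources, risk, rationale, missing=()):
    """Build the internal note the support rep reviews. Nothing here sends."""
    assert risk in ("safe", "sensitive", "escalate"), "use the three-level risk label"
    return {
        "summary": summary,            # one or two sentences for the rep
        "draft_reply": draft,          # never sent automatically in version one
        "sources": list(sources),      # links or ids the agent actually read
        "missing_data": list(missing), # stated gaps, not guessed values
        "risk": risk,
        "rationale": rationale,        # one sentence: why this path
    }
```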
The simple data model
You do not need a huge platform for the first version. You need enough state to know what happened.
- Workflow item: the ticket, lead, brief, or task being processed.
- Status: new, drafted, needs review, approved, rejected, sent, failed, or escalated.
- Inputs: the raw message and the context the agent saw.
- Output: the draft or brief the agent produced.
- Reviewer action: accepted, edited, rejected, escalated.
- Correction notes: what the human changed and why.
This structure gives you the correction loop. Without it, every agent improvement becomes a hunch.
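The whole data model fits in one small class. A sketch using the fields and statuses listed above; the field names are illustrative, and any datastore that preserves this state works.

```python
from dataclasses import dataclass, field
from typing import Optional

STATUSES = {"new", "drafted", "needs review", "approved",
            "rejected", "sent", "failed", "escalated"}

@dataclass
class WorkflowItem:
    """One ticket, lead, brief, or task moving through the agent."""
    item_id: str
    status: str = "new"
    inputs: dict = field(default_factory=dict)    # raw message + context the agent saw
    output: Optional[str] = None                  # the draft or brief produced
    reviewer_action: Optional[str] = None         # accepted / edited / rejected / escalated
    correction_notes: Optional[str] = None        # what the human changed and why

    def advance(self, status: str) -> None:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.status = status
```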
Definition of done
A first agent is done when a human can use it in the real workflow without extra explanation. It does not need to be perfect. It does need to be understandable, reviewable, and safe to turn off.
My bar is simple: the agent handles the happy path, stops on obvious risk, logs its sources, saves a usable draft, and makes the reviewer's day easier. If it creates more review work than it removes, it is not done.
The next build should not start until the first one has correction data. Otherwise you are scaling uncertainty.
The week-one checklist
By the end of week one, you should be able to run the workflow with test inputs and see a useful draft. It does not need deployment polish yet. It needs truth.
- The workflow is written in one sentence.
- The reviewer and approval point are named.
- The trusted source list is small and current.
- The agent has at least ten good examples and ten edge cases.
- The output format is fixed enough to validate.
- The agent has stop rules for missing data, conflict, policy issues, and low confidence.
- Every tool is read-only unless there is a strong reason it cannot be.
If week one ends with a broad assistant that can "help with anything," stop and rescope. That is a chat toy, not a production agent.
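The stop rules from the checklist can live in one gate that runs before drafting. A minimal sketch; the context keys and the confidence threshold are assumptions your orchestration code would define.

```python
def stop_check(context):
    """Evaluate the four stop rules before drafting. Returns (stop, reason).

    `context` is a hypothetical dict the orchestration layer fills in.
    """
    if context.get("missing_fields"):
        return True, "missing data: " + ", ".join(context["missing_fields"])
    if context.get("source_conflict"):
        return True, "sources conflict"
    if context.get("outside_policy"):
        return True, "request outside policy"
    if context.get("confidence", 1.0) < 0.7:  # threshold is an assumption
        return True, "low confidence"
    return False, ""
```

When the gate fires, the item goes to "escalated" with the reason attached, which is exactly the escalation-quality signal you measure later.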
The week-two checklist
Week two is where the build becomes operational. You are not trying to make the agent sound smarter. You are trying to make it safer, easier to review, and easier to improve.
- Run the eval set and record failures by category.
- Add logging for inputs, tool calls, retrieved context, output, and reviewer action.
- Add a visible confidence or risk label where the reviewer sees it.
- Put the draft inside the existing workflow surface.
- Write a one-page runbook that explains how to pause the agent.
- Pick the first week of pilot metrics before launch.
The agent should launch quietly. The best first launch is a reviewer saying, "This saves me a few minutes," then using it again the next day.
How to turn this into a project brief
If this topic is moving from article to build, write the project brief before picking tools. The brief should fit on one page. If it cannot, the scope is probably still too wide.
Use five fields: workflow, owner, sources, allowed actions, and proof. The workflow names the repeat job. The owner names the human reviewer. The sources name the systems and documents the agent may trust. The allowed actions name what the agent can read, draft, update, or never touch. The proof names the metric that decides whether the build worked.
- Workflow: what input starts the agent and what output should exist at the end?
- Owner: who reviews quality and who can pause the agent?
- Sources: which records, files, policies, and examples are trusted?
- Actions: what is read-only, what is draft-only, and what requires approval?
- Proof: what correction rate, time saved, or risk reduction would make this worth keeping?
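The five-field brief is small enough to check mechanically. A sketch, assuming the brief is kept as a simple record; the field names come straight from the list above.

```python
BRIEF_FIELDS = ("workflow", "owner", "sources", "actions", "proof")

def validate_brief(brief: dict) -> list:
    """Return the missing or empty fields; an empty list means the brief
    answers all five questions and the build can start."""
    return [f for f in BRIEF_FIELDS if not brief.get(f)]
```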
This keeps the build tied to business work. Agents fail when they become an abstract technology project. They work when the job, reviewer, sources, permissions, and proof are clear before code starts.
Frequently asked questions
Can you really build an AI agent in two weeks?
Yes, if the first agent has a narrow job, known sources, and human approval. Two weeks is enough for a useful pilot, not a whole company operating system.
What should a first AI agent do?
A first AI agent should draft, summarize, route, or prepare work that already repeats. It should not take high-risk actions without approval.
Should the first agent use tools?
Usually yes, but start with read-only tools. Let it pull context before giving it permissions to send, refund, publish, delete, or change records.
How many examples does a first agent need?
Start with 20 to 50 real examples. Include good outputs, bad outputs, edge cases, and notes on why the human made each decision.
When is the first agent ready for more autonomy?
It is ready for more autonomy only after correction data shows consistent output, clean escalation behavior, reliable logs, and a low-risk action worth promoting.
Key takeaways
- A first AI agent should handle one repeat workflow, not the whole business.
- Two weeks is enough for a draft-only production pilot when the scope is tight.
- Context beats clever prompting. Gather trusted sources first.
- Start with read-only tools and human approval.
- Evals can be simple: real examples, expected behavior, correction categories.
- Measure acceptance, corrections, escalation quality, time saved, and trust.
Want the first agent scoped before anyone builds?
The intake gives us your stack, repeat work, risk level, and approval needs. From there, we pick the first workflow and build the guardrails around it.
Start the intake →