Andrej Karpathy:
"Remove yourself as the bottleneck. Maximize your leverage. Put in very few tokens, and a huge amount of stuff happens on your behalf."
loop engineering is the exact thing that gets you there.
in a hand-run session you do two things. you decide what the agent runs next, and you check its output before the next step. both are manual, and both are the ceiling on how far the agent gets without you.
loop engineering moves both steps into the system. the diagram below shows the operating structure that surrounds the loop:
→ a trigger decides what to run, whether that's a message, an event, or a schedule, so the agent starts without you there to kick it off.
→ the loop is the maker that produces the work, thinking, acting, and observing until it's done or the brakes stop it.
→ a separate checker grades the output, because a model grading its own work justifies what it already did instead of catching where it failed. the checker's findings return to the maker as the next instruction, and the cycle repeats until nothing is left to fix.
→ state lives on disk, not in context, since the model forgets everything between runs. a markdown file or a knowledge graph holds what's done and what's still open, so a loop can pick up again days later.
for that state layer, Zep's Graphiti is a clean open-source option, a temporal knowledge graph that invalidates stale facts and returns context through vector, full-text, and graph search in one call.
repo: https://github.com/getzep/graphiti
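a minimal sketch of how the maker, checker, and on-disk state above could fit together, assuming plain python callables stand in for the model calls. the names `run_loop`, `load_state`, `Maker`, and `Checker` are hypothetical, and the JSON file is just the simplest stand-in for the state layer (a markdown file or a knowledge graph would sit in the same place):

```python
import json
from pathlib import Path
from typing import Callable

# hypothetical signatures: maker(task, findings) -> work, checker(work) -> findings
Maker = Callable[[str, list[str]], str]
Checker = Callable[[str], list[str]]


def load_state(path: Path) -> dict:
    # state lives on disk, so a fresh process can resume where the last one stopped
    if path.exists():
        return json.loads(path.read_text())
    return {"passes": 0, "work": "", "open": [], "done": False}


def run_loop(task: str, maker: Maker, checker: Checker,
             state_path: Path, max_passes: int = 5) -> dict:
    state = load_state(state_path)
    while not state["done"] and state["passes"] < max_passes:
        # the maker produces work, using the checker's last findings as its instruction
        state["work"] = maker(task, state["open"])
        # a separate checker grades the output and returns what is still wrong
        state["open"] = checker(state["work"])
        state["passes"] += 1
        state["done"] = not state["open"]
        # persist after every pass so nothing depends on the model's context
        state_path.write_text(json.dumps(state, indent=2))
    return state
```

a trigger (a cron job, a webhook handler, an incoming message) would only need to call `run_loop` with a task and a state path. calling it again on a finished state file is a no-op, which is what lets a loop pick up days later without redoing work.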
two things decide whether an unattended loop holds up.
the exit has to be set before the loop runs, not while it's running. a loop with no stop condition burns tokens, and the cost climbs fast once sub-agents and long runs stack up. a clean exit reads like "all tests pass and lint is clean, stop after two passes."
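one way to make "set before the loop runs" concrete is to freeze the exit rules in an immutable object, so nothing inside the loop can loosen them mid-run. this is a sketch under assumptions: `ExitPolicy` and `stop_reason` are hypothetical names, the token budget is a made-up number, and "stop after two passes" is read here as a hard cap on passes:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: the loop cannot change its own limits mid-run
class ExitPolicy:
    max_passes: int = 2          # assumed reading of "stop after two passes"
    max_tokens: int = 200_000    # hypothetical budget, pick your own

    def stop_reason(self, passes: int, tokens_used: int,
                    tests_pass: bool, lint_clean: bool) -> Optional[str]:
        """Return why the loop should stop, or None to keep going."""
        if tests_pass and lint_clean:
            return "success"
        if passes >= self.max_passes:
            return "pass limit reached"
        if tokens_used >= self.max_tokens:
            return "token budget spent"
        return None
```

the loop would call `stop_reason` after every pass and exit on any non-None answer. returning the reason rather than a bare boolean means a run that hit the pass limit doesn't get mistaken for one that succeeded.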
and the checker only catches failures inside a run. the harness around the loop (the prompts, tools, and checks wrapped around the model) still drifts and breaks in production as models change. catching that needs observability on every run, not just a green checkmark.
Comet's Opik is built for that layer: an open-source tool that traces every call and lets you turn a failing production trace into a regression test, so the same break gets caught if it comes back.
repo: https://github.com/comet-ml/opik
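to show the trace-to-regression-test idea in general terms, here is a stdlib-only sketch. it is not Opik's API: `traced`, `regression_cases`, `still_failing`, and the in-memory `TRACES` list are all hypothetical stand-ins for a real trace store:

```python
import functools
from typing import Any, Callable

TRACES: list[dict] = []  # stand-in for a real trace store


def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record the inputs, output, and error of every call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"name": fn.__name__, "args": list(args), "kwargs": kwargs,
                  "output": None, "error": None}
        try:
            record["output"] = fn(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # runs on success and on failure, so every call leaves a trace
            TRACES.append(record)
    return wrapper


def regression_cases(traces: list[dict]) -> list[dict]:
    """Keep the inputs of every failed call so they can be replayed later."""
    return [{"name": t["name"], "args": t["args"], "kwargs": t["kwargs"]}
            for t in traces if t["error"] is not None]


def still_failing(fn: Callable[..., Any], cases: list[dict]) -> list[dict]:
    """Replay saved inputs against the current code; return the cases that still raise."""
    failing = []
    for case in cases:
        try:
            fn(*case["args"], **case["kwargs"])
        except Exception:
            failing.append(case)
    return failing
```

the saved cases go into the test suite, and `still_failing` returning an empty list becomes the check that a fixed break stays fixed after the next prompt or model change.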
your job stops being the hands inside the loop. it becomes designing the machine that runs without you, then watching the traces closely enough to trust it.
the model is becoming a commodity. the loop around it is where the real engineering lives now.
I wrote the full breakdown. the article is quoted below.
stay tuned for more on this!