Raindrop Workshop uses Codex to automatically find and fix bugs in your AI Agent (free and open source)


Raindrop, an AI Agent developer tool company, open-sourced its local debugger Workshop (v0.1.6) this week. It lets developers track every token output and tool call of an Agent in real time, and lets Claude Code automatically read traces, write tests, and fix issues via MCP.

Your AI Agent just produced a strange result. It chose a tool you didn't expect and returned a semantically ambiguous response. You open the logs and see a pile of API calls and token counts, but nothing tells you which decision went wrong.

On May 14, Raindrop released an open-source tool that tries to prevent this scenario: Workshop, a fully local, completely free AI Agent debugger. It lets developers track every token output and tool call of an Agent in real time, and then hand the debugging process itself over to Claude Code or Codex.

Traditional software debugging has breakpoints, complete call stacks, and deterministic execution paths. AI Agent debugging is different. An Agent's behavior is probabilistic: the same input may follow completely different paths on different runs. Its decisions are formed across multiple layers of LLM calls, and it is almost impossible to discern any logic from terminal output alone.

The essence of the problem is that you are not looking for "which line of code is wrong," but rather "at which step did the Agent make an unexpected judgment under a specific combination of contexts." Traditional debuggers cannot answer this type of question.

Existing solutions usually offer only two paths:

- First, cloud monitoring platforms, which send traces to third-party services for dashboard analysis.
- Second, stuffing custom logging logic into the code.
The former is a poor fit for developers with data privacy concerns, while the latter is time-consuming and labor-intensive, since the logging infrastructure has to be reworked with every framework upgrade. Both also share a common problem: they tell you "what happened," but they don't help you "fix it."

Workshop chooses a third path: fully local execution, no data sent to external servers, open-source, free, and with AI participating directly in the debugging loop.

Once launched, Workshop runs a visual interface locally and exposes an MCP (Model Context Protocol) Server. MCP is the "standard communication protocol that allows AI tools to call external capabilities"; it is the bridge that lets AI coding tools like Claude Code read external data. Once Workshop is connected to a supported SDK, every execution node of the Agent (every token output, every tool call, every decision branch) streams in real time to localhost:5899, with no polling or manual refreshing required. In plain terms, it is like opening a monitoring window on your own computer and watching what the AI Agent is doing as it happens, like a live stream.

The most critical design choice in Workshop is bringing coding assistants like Claude Code into the debugging loop. Because Workshop exposes an MCP Server, Claude Code can read the trace content directly, write eval tests based on those traces, run the tests, observe failed assertions, go back and modify the Agent's code, and re-run, until all tests pass. Raindrop calls this the "self-healing eval loop." The entire loop is closed locally, and the developer does not have to intervene manually at every step.

Workshop also supports a Replay feature: pulling traces from the production environment back to the local machine and re-executing them against the actual code for regression testing. This is particularly useful when errors occur in production but cannot be reproduced locally, because you can run the actual trace directly and skip the work of constructing a reproduction scenario.
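The idea of replaying a recorded trace and asserting on it can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the `Step` record, the `replay` helper, and the toy `choose_tool` functions are hypothetical and are not Workshop's trace format, SDK, or MCP interface, which the article does not describe.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """Hypothetical recorded step: the context the agent saw and the tool expected."""
    context: str
    expected_tool: str


def replay(trace: list[Step], choose_tool: Callable[[str], str]) -> list[tuple[int, str, str]]:
    """Re-run each recorded context through the current code.

    Returns (step index, expected tool, actual tool) for every mismatch,
    so an empty list means the eval passes.
    """
    mismatches = []
    for index, step in enumerate(trace):
        actual = choose_tool(step.context)
        if actual != step.expected_tool:
            mismatches.append((index, step.expected_tool, actual))
    return mismatches


# Hypothetical trace pulled from production (hand-written here).
recorded = [
    Step("weather in Taipei?", "weather_api"),
    Step("2 + 2", "calculator"),
]


def buggy_choose_tool(context: str) -> str:
    """Toy stand-in for agent logic with a bug: it always picks the same tool."""
    return "weather_api"


def choose_tool(context: str) -> str:
    """Toy stand-in for the fixed logic: digits route to the calculator."""
    return "calculator" if any(ch.isdigit() for ch in context) else "weather_api"


print(replay(recorded, buggy_choose_tool))  # one mismatch at step 1
print(replay(recorded, choose_tool))        # no mismatches
```

In the loop the article describes, a coding assistant would play the role of the person editing `choose_tool`: run the eval, read the failing step, change the code, and re-run until the mismatch list is empty.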

Original source: 動區 BlockTempo
