Raindrop Workshop uses Codex to automatically find and fix bugs in your AI Agent (free and open source)
動區 BlockTempo · 2026-05-15 01:43:44

Raindrop, a company that builds developer tools for AI Agents, open-sourced its local debugger Workshop (v0.1.6) this week. Workshop lets developers track every token output and tool call of an Agent in real time, and lets Claude Code read the resulting traces, write tests, and fix issues automatically via MCP.

You open the logs and see a pile of API calls and token counts, but nothing tells you which decision went wrong. Your AI Agent has just produced a strange result: it chose a tool you didn't expect and returned an ambiguous response.

On May 14, Raindrop released an open-source tool aimed at exactly this scenario: Workshop, a fully local, completely free AI Agent debugger. It lets developers track every token output and tool call of an Agent in real time, and then hands the debugging process itself over to Claude Code or Codex.

Traditional software debugging has breakpoints, complete call stacks, and deterministic execution paths. AI Agent debugging is different. Agent behavior is probabilistic: the same input may take completely different paths on different runs. Decisions are formed across multiple layers of LLM calls, and it is almost impossible to discern the underlying logic from terminal output alone. The real question is not "which line of code is wrong" but "at which step, and under which combination of context, did the Agent make an unexpected judgment." Traditional debuggers are not built to answer that kind of question.

Existing solutions usually offer one of two paths:
- First, cloud monitoring platforms, which send traces to third-party services for dashboard analysis.
- Second, custom logging logic stuffed into the code.
The former is a poor fit for developers with data-privacy concerns, while the latter is time-consuming and labor-intensive, because the logging infrastructure has to be reworked with every framework upgrade. Both also share a common problem: they tell you what happened, but they don't help you fix it.

Workshop takes a third path: it runs fully locally, sends no data to external servers, is open source and free, and lets AI take part directly in the debugging loop.

Once launched, Workshop serves a visual interface locally and exposes an MCP (Model Context Protocol) server. MCP is a standard protocol that lets AI tools call external capabilities; it is the bridge that lets AI coding tools such as Claude Code read external data. Once an Agent is connected through a supported SDK, every execution node (every token output, every tool call, every decision branch) is streamed in real time to localhost:5899, with no polling or manual refreshing required. In plain terms, it is like opening a monitoring window on your own machine and watching what the AI Agent is doing as it happens, like a live stream.

Workshop's most important design choice is bringing coding assistants such as Claude Code into the debugging loop. Because Workshop exposes an MCP server, Claude Code can read the trace content directly, write eval tests based on those traces, run the tests, observe the failed assertions, go back and modify the Agent's code, and re-run, repeating until all tests pass. Raindrop calls this the "self-healing eval loop." The whole loop runs locally, and the developer does not need to intervene manually at every step.

Workshop also supports a Replay feature: it pulls traces from the production environment back to the local machine and re-executes them against the actual code for regression testing.
This is particularly useful when errors occur in production but cannot be reproduced locally: you can run the actual trace directly instead of spending time constructing a reproduction scenario.
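To make the replay-and-eval idea more concrete, here is a minimal, self-contained Python sketch. It does not use Workshop's SDK or its MCP tools: the trace schema (`Trace`, `Step`), the eval assertion `eval_calls_tool_before_answer`, the `search` tool, and both toy agents are hypothetical stand-ins chosen for illustration. It only shows the general shape of the loop described above: take a recorded trace, re-run its input against the current agent code, and check an eval assertion, so that a trace which exposed a bug becomes a regression test that passes once the agent is fixed. In Workshop's loop, Claude Code or Codex would write the eval and edit the agent; here the "fix" is written by hand.

```python
"""Hypothetical sketch: replaying a recorded agent trace as a regression eval.

None of these names come from Raindrop Workshop; the trace schema, the
Step type, the "search" tool, and the toy agents are assumptions.
"""
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    kind: str     # assumed step kinds: "tool_call" or "output"
    name: str     # tool name, or "" for plain text output
    payload: str  # tool arguments or generated text


@dataclass
class Trace:
    user_input: str
    steps: List[Step] = field(default_factory=list)


# In this sketch, an "agent" is any function mapping user input to steps.
Agent = Callable[[str], List[Step]]


def replay(trace: Trace, agent: Agent) -> Trace:
    """Re-run the recorded input against the current agent code."""
    return Trace(trace.user_input, agent(trace.user_input))


def eval_calls_tool_before_answer(trace: Trace, tool: str) -> bool:
    """Eval assertion: `tool` must be called before the first text output."""
    for step in trace.steps:
        if step.kind == "tool_call" and step.name == tool:
            return True
        if step.kind == "output":
            return False
    return False


def buggy_agent(user_input: str) -> List[Step]:
    # Answers from "memory" without looking anything up.
    return [Step("output", "", f"I think the answer to {user_input!r} is 42.")]


def fixed_agent(user_input: str) -> List[Step]:
    # Calls the hypothetical search tool first, then answers.
    return [
        Step("tool_call", "search", user_input),
        Step("output", "", "Answer based on search results."),
    ]


# Stand-in for a trace pulled back from production (hard-coded here).
question = "What is the refund policy?"
recorded = Trace(question, buggy_agent(question))

for agent in (buggy_agent, fixed_agent):
    result = replay(recorded, agent)
    ok = eval_calls_tool_before_answer(result, "search")
    print(f"{agent.__name__}: {'PASS' if ok else 'FAIL'}")
```

Running it prints `buggy_agent: FAIL` followed by `fixed_agent: PASS`. Note that `replay` deliberately ignores the recorded steps and re-executes only the recorded input, mirroring the idea of running a production trace "against the actual code" rather than re-reading old output.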