Web scraping tool browse.sh: Provides AI agents with a complete skill package for operating over 500 popular websites
動區 BlockTempo · 2026-05-20 07:46:17

AI browser infrastructure company Browserbase has officially launched browse.sh, a browser CLI built specifically for AI agents that ships with more than 500 pre-written "web operation skills."

This month, browse.sh officially launched to tackle a practical problem: how AI agents can "get things done online" more quickly and accurately. Until now, the usual approach was to feed a page's entire HTML source to a language model and let it work out where to click and which fields to fill. That is slow and expensive: the HTML of a moderately complex e-commerce page alone can run to tens of thousands of characters, and sending all of it to a language model costs a significant number of tokens.

Browserbase's answer is to pre-write the operating logic for each website as a "skill," so the agent only needs to call the skill instead of reading the full HTML page every time. browse.sh is the command-line entry point to this idea and an open catalog of web skills.

The official description of browse.sh is "Browser CLI and open web skill catalog for agents." In plain English: a browser command-line tool for AI agents, plus an open store of web operation skills.

A few core concepts are worth clarifying first.

What is a CLI (Command Line Interface)? It is a tool you run by typing commands in a terminal window; npm, git, and python are all used this way. browse is the same: once installed, typing browse click "input#search" in the terminal makes the browser click a specific element.

What is a headless browser? A browser program that does not open a window on screen but otherwise behaves like a real Chrome: it executes JavaScript, handles cookies, and can get past basic anti-bot detection.
AI agents use it to "see" webpages, fill out forms, and click buttons without the user having to open anything.

What is a skill? A pre-written operating script that tells the agent "where the search bar is on this website, what the checkout button's ID is, and what the JSON returned by the API looks like." Compared with letting the agent figure this out from scratch every time, skills make the whole process faster and save tokens.

Under the hood, browse.sh is built on Stagehand, Browserbase's own open-source toolkit for letting AI operate browsers; think of it as Playwright plus an AI layer for semantic understanding. browse.sh packages Stagehand's functionality into an easier-to-use CLI and adds more than 500 ready-made skills on top.

The ecosystem has three entry points:

- https://browse.sh/: the official website, for browsing the skill catalog
- https://browse.sh/llms.txt: a concise skill index for AI agents to read (small enough to feed directly to a language model)
- https://browse.sh/llms-full.txt: the complete SKILL.md documentation, including the DOM selectors and usage instructions for each skill

The design itself is telling: browse.sh knows its users are AI rather than humans, so its index format was built for language models from the start.

Installation takes a single line:

$ npm install -g browse

Once installed, the basic commands cover the whole lifecycle of a browser session:

$ browse click "input#search"
$ browse type "Apartments in SF"
$ browse press "Enter"
$ browse screenshot
$ browse network --tail
$ browse console --tail

What is a DOM selector here? The DOM is the structure tree of a webpage, and every button, input box, and link is a node on that tree.
A DOM selector is the precise address that tells the browser which node to operate on. For example, input#search means "the input box whose ID is search," and button.submit-btn means "the button with the class submit-btn."

browse screenshot lets the agent capture the screen at any point during a task to confirm its state. browse network --tail prints every HTTP request the browser makes in real time. This is very useful for debugging, and it lets developers see directly which backend API endpoints a website calls, which makes it easier to turn those calls into API-type skills later.

Installing a skill also takes a single line, after which the agent can use the pre-written operating logic for that website directly:

$ browse skills add airbnb.com

The official end-to-end example shows how far the tool can go: Claude plans a road trip to Utah, including charging stations and campsites, and finally submits the expense reimbursement on Ramp automatically.
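The cost argument at the start of the article can be checked with back-of-envelope arithmetic. The sketch below assumes the common rule of thumb of roughly four characters per token (real tokenizers vary, and HTML markup often tokenizes less efficiently than prose) and uses invented page sizes, step counts, and a hypothetical price per million input tokens. It compares resending a page's full HTML at every step with sending a short skill description instead.

```python
import math

# Rule of thumb (assumption): about 4 characters per token for English-like text.
# Real tokenizers differ, and HTML markup is often less token-efficient than prose.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token count for a text of num_chars characters."""
    return math.ceil(num_chars / chars_per_token)


def session_tokens(context_chars: int, steps: int) -> int:
    """Tokens spent when the same context is sent to the model at every step."""
    return steps * estimate_tokens(context_chars)


def input_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to dollars at a given price per million tokens."""
    return tokens * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # Invented sizes: a 40,000-character product page vs. a 2,000-character skill,
    # over a 20-step task, at a hypothetical $3 per million input tokens.
    full_html = session_tokens(40_000, steps=20)
    with_skill = session_tokens(2_000, steps=20)
    print(full_html, with_skill)  # 200000 10000
    print(f"${input_cost_usd(full_html, 3.0):.2f} vs ${input_cost_usd(with_skill, 3.0):.2f}")
```

Under these made-up numbers the full-HTML approach sends 20 times as many input tokens; the real ratio depends entirely on actual page sizes and on how much of the page a skill still needs to read.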
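The article describes https://browse.sh/llms.txt as a compact index meant to be read by language models. The sketch below assumes the file follows the general llms.txt convention (a Markdown file whose entries are link lines of the form "- [Title](URL): note"); the sample text, titles, and URLs are invented for illustration and are not the real contents of the browse.sh index. It shows how an agent-side helper might extract entries and narrow them to one site before loading any full documentation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches Markdown link-list lines such as "- [Title](https://...): optional note".
# Assumes the general llms.txt convention; the real browse.sh file may differ.
_ENTRY = re.compile(
    r"^\s*[-*]\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass(frozen=True)
class IndexEntry:
    title: str
    url: str
    note: str


def parse_index(text: str) -> list[IndexEntry]:
    """Extract every link-list entry from an llms.txt-style Markdown index."""
    entries = []
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if m:
            entries.append(IndexEntry(m["title"], m["url"], (m["note"] or "").strip()))
    return entries


def entries_for(entries: list[IndexEntry], keyword: str) -> list[IndexEntry]:
    """Case-insensitive filter on title and note, e.g. by a site's domain."""
    k = keyword.lower()
    return [e for e in entries if k in e.title.lower() or k in e.note.lower()]


# Invented sample index, for illustration only.
SAMPLE = """# Example skill index
> Hypothetical summary line.

## Skills
- [airbnb.com](https://example.com/skills/airbnb.md): search listings
- [example-shop.com](https://example.com/skills/shop.md): product search, checkout
"""

if __name__ == "__main__":
    for entry in entries_for(parse_index(SAMPLE), "airbnb"):
        print(entry.title, entry.url)
```

Filtering a small index first, and only then fetching the detailed documentation for the matching entry, is one way an agent could keep its context small, which is the point the article makes about the index being designed for language models.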
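The article's two selector examples, input#search and button.submit-btn, can be made concrete with a small matcher built on Python's standard-library html.parser. This is a minimal sketch that understands only the tag#id and tag.class forms (plus a bare tag, #id, and .class); real browsers and tools such as Playwright implement the full CSS selector grammar and resolve selectors against the live DOM rather than raw HTML. The select function and the sample markup are hypothetical.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

# Only "tag", "tag#id", "tag.class", "#id" and ".class" are supported: a tiny CSS subset.
_SIMPLE_SELECTOR = re.compile(
    r"^(?P<tag>[A-Za-z][\w-]*)?(?:#(?P<id>[\w-]+)|\.(?P<cls>[\w-]+))?$"
)


def parse_selector(selector: str) -> tuple[str | None, str | None, str | None]:
    """Split a simple selector into (tag, id, class); reject anything else."""
    m = _SIMPLE_SELECTOR.match(selector.strip())
    if m is None or not any(m.groupdict().values()):
        raise ValueError(f"unsupported selector: {selector!r}")
    tag = m["tag"].lower() if m["tag"] else None
    return tag, m["id"], m["cls"]


class _SelectorMatcher(HTMLParser):
    """Records the attributes of every start tag that matches the selector."""

    def __init__(self, selector: str) -> None:
        super().__init__()
        self.want_tag, self.want_id, self.want_class = parse_selector(selector)
        self.matches: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        if self.want_tag and tag != self.want_tag:
            return
        if self.want_id and attributes.get("id") != self.want_id:
            return
        # A class selector matches if the name is one of the space-separated classes.
        if self.want_class and self.want_class not in attributes.get("class", "").split():
            return
        self.matches.append({"tag": tag, **attributes})


def select(html: str, selector: str) -> list[dict[str, str]]:
    """Return attribute dicts for all elements in html matching selector."""
    matcher = _SelectorMatcher(selector)
    matcher.feed(html)
    matcher.close()
    return matcher.matches


PAGE = (
    '<form><input id="search" type="text">'
    '<button class="btn submit-btn">Go</button>'
    '<button class="cancel">Cancel</button></form>'
)

if __name__ == "__main__":
    print(select(PAGE, "input#search"))  # [{'tag': 'input', 'id': 'search', 'type': 'text'}]
    print(select(PAGE, "button.submit-btn"))  # [{'tag': 'button', 'class': 'btn submit-btn'}]
```

Note that button.submit-btn still matches an element whose class attribute is "btn submit-btn", because a class selector checks membership in the space-separated class list rather than the whole attribute string.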
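To make the article's idea of a skill concrete, the sketch below models one as a small Python record holding the kinds of facts the article lists: the search box selector, the checkout button selector, and the expected shape of an API response. The SiteSkill class, its field names, and the example-shop.com values are hypothetical and do not reflect the actual SKILL.md format used by browse.sh. The methods only build argument lists for the click, type, and press commands shown in the article; following the article's example sequence, it assumes that type sends text to the element focused by the preceding click.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSkill:
    """Hypothetical record of pre-written knowledge about one website."""

    domain: str
    search_input: str  # DOM selector of the search box, e.g. "input#search"
    checkout_button: str  # DOM selector of the checkout button
    api_response_keys: tuple[str, ...]  # top-level keys expected in the site's JSON API

    def search_commands(self, query: str) -> list[list[str]]:
        """Argument lists for the browse commands used in the article's example."""
        return [
            ["browse", "click", self.search_input],
            ["browse", "type", query],
            ["browse", "press", "Enter"],
        ]

    def checkout_command(self) -> list[str]:
        return ["browse", "click", self.checkout_button]

    def looks_like_api_response(self, payload: dict) -> bool:
        """Cheap check that a JSON payload has the shape the skill expects."""
        return all(key in payload for key in self.api_response_keys)


# Invented example values; a real skill would come from the catalog.
SHOP = SiteSkill(
    domain="example-shop.com",
    search_input="input#search",
    checkout_button="button.submit-btn",
    api_response_keys=("items", "total"),
)

if __name__ == "__main__":
    for argv in SHOP.search_commands("Apartments in SF"):
        # Display only; pass argv to subprocess.run(argv, check=True) to execute it.
        print(shlex.join(argv))
```

Keeping each command as an argument list rather than a single string sidesteps shell-quoting problems with selectors like input#search; shlex.join is used here only to print a readable, correctly quoted version.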