diff --git a/docs/examples/tools/browserless.ipynb b/docs/examples/tools/browserless.ipynb new file mode 100644 index 00000000000..9d126f917f3 --- /dev/null +++ b/docs/examples/tools/browserless.ipynb @@ -0,0 +1,320 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Browserless: browser automation via MCP\n", + "\n", + "[Browserless](https://browserless.io) is a hosted browser-automation MCP server. It exposes 10 tools (search, smart scraper, map, crawl, export, performance, function, download, skill, and a multi-turn browser agent) over a single streamable-HTTP MCP endpoint. With `llama-index-tools-mcp` you can auto-import all 10 as LlamaIndex tools without writing a partner package.\n", + "\n", + "Browser sessions are pinned by `Mcp-Session-Id`, so multi-turn workflows preserve browser state across calls that share one MCP session. This notebook shows both stateless single-shot tools and the multi-turn `browserless_agent`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-07T15:45:05.406332Z", + "iopub.status.busy": "2026-05-07T15:45:05.406175Z", + "iopub.status.idle": "2026-05-07T15:45:06.263463Z", + "shell.execute_reply": "2026-05-07T15:45:06.262613Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "%pip install -q llama-index llama-index-tools-mcp llama-index-llms-anthropic" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Credentials\n", + "\n", + "Get a Browserless API token at [account.browserless.io](https://account.browserless.io). The agent examples also use Anthropic, so set `ANTHROPIC_API_KEY` to run those cells."
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-07T15:45:06.265412Z", + "iopub.status.busy": "2026-05-07T15:45:06.265252Z", + "iopub.status.idle": "2026-05-07T15:45:06.268033Z", + "shell.execute_reply": "2026-05-07T15:45:06.267529Z" + } + }, + "outputs": [], + "source": [ + "import getpass\n", + "import os\n", + "\n", + "if not os.environ.get(\"BROWSERLESS_TOKEN\"):\n", + " os.environ[\"BROWSERLESS_TOKEN\"] = getpass.getpass(\"BROWSERLESS_TOKEN:\\n\")\n", + "\n", + "if not os.environ.get(\"ANTHROPIC_API_KEY\"):\n", + " os.environ[\"ANTHROPIC_API_KEY\"] = getpass.getpass(\"ANTHROPIC_API_KEY:\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Connect and list tools\n", + "\n", + "Connect with `BasicMCPClient` and load the 10 tools as LlamaIndex tools. The default 30-second `timeout` is too short for the first call's MCP handshake plus browser warm-up \u2014 pass `timeout=120`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-07T15:45:06.269501Z", + "iopub.status.busy": "2026-05-07T15:45:06.269384Z", + "iopub.status.idle": "2026-05-07T15:45:14.364280Z", + "shell.execute_reply": "2026-05-07T15:45:14.363260Z" + } + }, + "outputs": [], + "source": [ + "import logging\n", + "\n", + "from llama_index.tools.mcp import BasicMCPClient, McpToolSpec\n", + "\n", + "# Silence an MCP SDK pydantic validation warning on server progress\n", + "# notifications. 
Harmless and unrelated to the tool call results below.\n", + "logging.getLogger().addFilter(\n", + " lambda r: \"Failed to validate notification\" not in r.getMessage()\n", + ")\n", + "\n", + "client = BasicMCPClient(\n", + " \"https://mcp.browserless.io/mcp\",\n", + " headers={\"Authorization\": f\"Bearer {os.environ['BROWSERLESS_TOKEN']}\"},\n", + " timeout=120,\n", + ")\n", + "\n", + "tools = await McpToolSpec(client=client).to_tool_list_async()\n", + "for t in tools:\n", + " print(t.metadata.name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Invoke a stateless tool directly\n", + "\n", + "Eight of the 10 tools are stateless single-shot calls (`browserless_smartscraper`, `browserless_search`, `browserless_map`, `browserless_crawl`, `browserless_export`, `browserless_performance`, `browserless_function`, `browserless_download`). Each call is independent, so you can invoke them without an LLM in the loop." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-07T15:45:14.366532Z", + "iopub.status.busy": "2026-05-07T15:45:14.366194Z", + "iopub.status.idle": "2026-05-07T15:45:17.758696Z", + "shell.execute_reply": "2026-05-07T15:45:17.757656Z" + } + }, + "outputs": [], + "source": [ + "smartscraper = next(t for t in tools if t.metadata.name == \"browserless_smartscraper\")\n", + "result = await smartscraper.acall(url=\"https://example.com\", formats=[\"markdown\"])\n", + "print(str(result)[:400])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use stateless tools in an agent\n", + "\n", + "Hand the stateless tools to a `FunctionAgent` powered by Claude. The model picks which tool to call at each step. This pattern works without any session-management gymnastics because every step is a single self-contained call." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-07T15:45:17.760943Z", + "iopub.status.busy": "2026-05-07T15:45:17.760719Z", + "iopub.status.idle": "2026-05-07T15:45:32.933839Z", + "shell.execute_reply": "2026-05-07T15:45:32.932818Z" + } + }, + "outputs": [], + "source": [ + "from llama_index.core.agent.workflow import FunctionAgent\n", + "from llama_index.llms.anthropic import Anthropic\n", + "\n", + "stateless_tools = [\n", + " t for t in tools\n", + " if t.metadata.name not in (\"browserless_agent\", \"browserless_skill\")\n", + "]\n", + "\n", + "agent = FunctionAgent(\n", + " name=\"BrowserlessAgent\",\n", + " description=\"Browses and scrapes the web via the Browserless MCP server.\",\n", + " llm=Anthropic(model=\"claude-sonnet-4-6\"),\n", + " tools=stateless_tools,\n", + " system_prompt=\"You are a research assistant. Use the Browserless tools to scrape and summarize the web.\",\n", + ")\n", + "\n", + "response = await agent.run(\n", + " \"Scrape https://news.ycombinator.com and list the top 5 headlines as a markdown bullet list.\"\n", + ")\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Multi-turn browser agent\n", + "\n", + "The `browserless_agent` tool is different: it's a stateful, multi-turn driver where each call (`goto`, `snapshot`, `click`, `type`, `text`, `evaluate`, ...) operates on a persistent browser session. To preserve that session across calls you need a single MCP `ClientSession` for the whole flow.\n", + "\n", + "**The default `McpToolSpec.to_tool_list_async()` + `tool.acall(...)` pattern does not preserve state.** `BasicMCPClient.call_tool()` opens a fresh `_run_session()` per invocation, with a fresh `Mcp-Session-Id`. 
That's fine for the stateless tools above, but a multi-turn `browserless_agent` flow gets routed to a different browser on every call: your `goto` lands on browser A, your `snapshot` lands on browser B, and you get back `about:blank`.\n", + "\n", + "To share state, use the `client._run_session()` async context manager directly and call `session.call_tool(...)` inside it. The method is underscore-prefixed, which by Python convention marks it as non-public, so it may change between `llama-index-tools-mcp` releases; it is a plain `@asynccontextmanager`, though. Until upstream ships a public `client.session()` API, this is the cleanest way to keep one MCP session alive across multiple tool calls." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-07T15:45:32.939703Z", + "iopub.status.busy": "2026-05-07T15:45:32.939477Z", + "iopub.status.idle": "2026-05-07T15:45:36.844708Z", + "shell.execute_reply": "2026-05-07T15:45:36.843094Z" + } + }, + "outputs": [], + "source": [ + "async with client._run_session() as session:\n", + " await session.call_tool(\n", + " \"browserless_agent\",\n", + " arguments={\"method\": \"goto\", \"params\": {\"url\": \"https://example.com\"}},\n", + " )\n", + " await session.call_tool(\"browserless_agent\", arguments={\"method\": \"snapshot\"})\n", + " result = await session.call_tool(\n", + " \"browserless_agent\",\n", + " arguments={\"method\": \"text\", \"params\": {\"selector\": \"h1\"}},\n", + " )\n", + " print(result.content[0].text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Drive the multi-turn agent from an LLM\n", + "\n", + "Wrap session-bound `call_tool` invocations in a `FunctionTool` the LLM can call. The whole agent run lives inside one `client._run_session()` block so every step shares the same browser."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2026-05-07T15:45:36.850121Z", + "iopub.status.busy": "2026-05-07T15:45:36.849728Z", + "iopub.status.idle": "2026-05-07T15:46:12.378373Z", + "shell.execute_reply": "2026-05-07T15:46:12.376315Z" + } + }, + "outputs": [], + "source": [ + "from llama_index.core.tools import FunctionTool\n", + "\n", + "async with client._run_session() as session:\n", + " async def browser_action(method: str, params: dict | None = None) -> str:\n", + " \"\"\"Drive the Browserless multi-turn browser agent. Methods include goto, snapshot, click, type, text, evaluate. Pass params per the method's schema (e.g. {\\\"url\\\": \\\"...\\\"} for goto, {\\\"selector\\\": \\\"...\\\"} for text).\"\"\"\n", + " result = await session.call_tool(\n", + " \"browserless_agent\",\n", + " arguments={\"method\": method, \"params\": params or {}},\n", + " )\n", + " return result.content[0].text\n", + "\n", + " browser_tool = FunctionTool.from_defaults(\n", + " async_fn=browser_action,\n", + " name=\"browserless_agent\",\n", + " description=\"Drive a stateful browser. Use methods like goto, snapshot, click, type, text, evaluate. Snapshot before extracting elements so you know what selectors are available.\",\n", + " )\n", + "\n", + " browser_agent = FunctionAgent(\n", + " name=\"BrowserlessBrowserAgent\",\n", + " description=\"Drives a real browser to navigate, click, and extract content.\",\n", + " llm=Anthropic(model=\"claude-sonnet-4-6\"),\n", + " tools=[browser_tool],\n", + " system_prompt=(\n", + " \"You drive a real browser via the browserless_agent tool. \"\n", + " \"Always snapshot the page before extracting elements so you know what selectors are available. 
\"\n", + " \"Use selector references from the snapshot when calling text/click.\"\n", + " ),\n", + " )\n", + "\n", + " response = await browser_agent.run(\n", + " \"Navigate to https://news.ycombinator.com, click the first story link, and report the title and first paragraph of that page.\"\n", + " )\n", + " print(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Resources\n", + "\n", + "- Hosted MCP server: [mcp.browserless.io](https://mcp.browserless.io)\n", + "- Runnable cookbook with all examples above: [`browserless/browserless-llamaindex`](https://github.com/browserless/browserless-llamaindex)\n", + "- LlamaIndex MCP adapter: [`llama-index-tools-mcp`](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/tools/llama-index-tools-mcp)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file