
Running AgentBrowser (Browser) Inside the Sandbox

  1. Create the sandbox template.
    1. Download the image.
      wget https://repo.openeuler.org/openEuler-24.03-LTS-SP3/docker_img/aarch64/openEuler-docker.aarch64.tar.xz
    2. Decompress the archive.
      xz -d openEuler-docker.aarch64.tar.xz
    3. Load the image.
      docker load -i openEuler-docker.aarch64.tar
    4. Start the container.
      docker run -itd --name search-image openeuler-24.03-lts-sp3:latest /bin/bash
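    The four sub-steps above can be chained into one script. The following is a minimal sketch, not part of the original procedure: the `run` helper and the `DRY_RUN` switch are hypothetical names introduced here so the commands can be previewed before anything is executed. The archive and tarball names are derived from the download URL.

    ```bash
    # Sketch: template image preparation from step 1, with a dry-run preview.
    # DRY_RUN and run() are illustrative helpers, not part of wget, xz, or docker.
    set -eu

    IMAGE_URL="https://repo.openeuler.org/openEuler-24.03-LTS-SP3/docker_img/aarch64/openEuler-docker.aarch64.tar.xz"
    ARCHIVE="${IMAGE_URL##*/}"   # file name part of the URL
    TARBALL="${ARCHIVE%.xz}"     # name left after "xz -d" strips the .xz suffix
    DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands, 0 = execute them

    run() {
      if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
      else
        "$@"
      fi
    }

    run wget "$IMAGE_URL"
    run xz -d "$ARCHIVE"
    run docker load -i "$TARBALL"
    run docker run -itd --name search-image openeuler-24.03-lts-sp3:latest /bin/bash
    ```

    By default the script only prints each command prefixed with `+`. Setting `DRY_RUN=0` executes them in order and, because of `set -eu`, stops at the first failure.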
  2. Enter the container and install the components that agent-browser requires.
    1. Enter the container.
      docker exec -it search-image bash
    2. Install the following components.
      dnf install -y nodejs
      npm install -g agent-browser
      npx playwright install
      npx playwright install chromium
      dnf install -y nss nspr atk at-spi2-atk gtk3 alsa-lib libgbm libdrm mesa-libEGL
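    Before moving on, it can help to confirm that the tools installed above are actually on `PATH`. A minimal sketch follows; `missing_bins` is a hypothetical helper name, and the list of binaries to check (`node`, `npm`, `npx`, `agent-browser`) is an assumption based on the packages installed in this step.

    ```bash
    # Sketch: print every name from the argument list that is not found on PATH.
    missing_bins() {
      for bin in "$@"; do
        command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
      done
    }

    missing="$(missing_bins node npm npx agent-browser)"
    if [ -n "$missing" ]; then
      printf 'Missing components:\n%s\n' "$missing"
    else
      printf 'All components found.\n'
    fi
    ```

    This only checks that the executables resolve. It does not verify that the Chromium build downloaded by Playwright can start.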
  3. Install the components required by the E2B sandbox, then build the image.
    1. Install the following components with yum.
      yum install -y wget systemd systemd-sysv openssh-server sudo chrony linuxptp socat curl iputils bind-utils iproute nc tcpdump passwd && yum clean all && rm -rf /var/cache/yum /var/tmp/* /tmp/*
    2. Install websocat with wget.
      wget -O /usr/local/bin/websocat https://github.com/vi/websocat/releases/latest/download/websocat.aarch64-unknown-linux-musl && chmod a+x /usr/local/bin/websocat && websocat --version
    3. Run exit to leave the container, then commit the container as an image.
      docker commit <container-name> <image-name>:<tag>
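    As a worked instance of the placeholder command above, the sketch below fills in concrete values and prints the resulting command for review. `CONTAINER` reuses the container name from step 1; `IMAGE` and `TAG` are made-up example values, not names prescribed by this guide.

    ```bash
    # Sketch: build the "docker commit" command from named values.
    CONTAINER="search-image"        # container name used in step 1
    IMAGE="agent-browser-sandbox"   # hypothetical image name
    TAG="1.0"                       # hypothetical version tag

    # Docker repository names must be lowercase, so reject uppercase early.
    case "$IMAGE" in
      *[[:upper:]]*)
        printf 'image name must be lowercase: %s\n' "$IMAGE" >&2
        exit 1
        ;;
    esac

    COMMIT_CMD="docker commit $CONTAINER $IMAGE:$TAG"
    printf '%s\n' "$COMMIT_CMD"     # review the line, then run it on the host
    ```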
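  4. Create a container, open CLAW ("龙虾", literally "Lobster"), and bind the sandbox template created above. For reference, see "Running Python Code Inside the Sandbox" and "E2B Sandbox Service Deployment (Optional)".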
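  5. The stock agent-browser SKILL (https://clawhub.ai/matrixy/agent-browser-clawdbot#files) does not account for the E2B sandbox environment, so its SKILL.md must be modified. The Agent performs this modification automatically. The result is: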
    ---
    name: Agent Browser (local-exec / E2B Sandbox)
    description: Headless browser automation via agent-browser CLI inside E2B sandbox using local-exec. Chrome path and launch args are pre-configured.
    read_when:
      - Automating web interactions inside E2B sandbox
      - Extracting structured data from pages via local-exec
      - Filling forms programmatically in sandboxed browser
      - Testing web UIs via local-exec tool
    metadata: {"clawdbot":{"emoji":"","requires":{"bins":["agent-browser","sudo"]}}}
    allowed-tools: Bash(agent-browser:*)
    ---
     
    # Browser Automation with agent-browser (E2B Sandbox)
     
    ## Execution Environment
     
    All `agent-browser` commands run inside the **E2B cloud sandbox** via `local-exec` (kind: bash).
     
    **Key constraints:**
    - Chrome is installed by Playwright at: `/root/.cache/ms-playwright/chromium-1223/chrome-linux/chrome`
    - Chrome requires `sudo` to run (sandbox user has no permission to `/root/.cache`)
    - Chrome **must** be launched with: `--no-sandbox --disable-gpu --disable-dev-shm-usage`
    - Must run in **headless** mode: `--headed false`
    - Always close existing daemon before starting with new options: `sudo agent-browser close --all`
     
    **Command template:**
     
    ```bash
    sudo agent-browser close --all 2>&1 && \
    sudo agent-browser \
      --executable-path /root/.cache/ms-playwright/chromium-1223/chrome-linux/chrome \
      --args "--no-sandbox,--disable-gpu,--disable-dev-shm-usage" \
      --headed false \
      <command> <args>
    ```
     
    ## Quick Start
     
    ### Via local-exec (tool call)
     
    ```
    local-exec(kind="bash", command="sudo agent-browser close --all 2>&1 && sudo agent-browser --executable-path /root/.cache/ms-playwright/chromium-1223/chrome-linux/chrome --args \"--no-sandbox,--disable-gpu,--disable-dev-shm-usage\" --headed false open https://www.example.com 2>&1")
    ```
     
    ### Core workflow
     
    1. **Close daemon + navigate**: `sudo agent-browser close --all && sudo agent-browser --executable-path <path> --args "<chrome-args>" --headed false open <url>`
    2. **Snapshot**: `sudo agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
    3. **Interact** using refs from the snapshot
    4. **Re-snapshot** after navigation or significant DOM changes
     
    ## Commands
     
    ### Navigation
     
    ```bash
    sudo agent-browser close --all && sudo agent-browser --executable-path /root/.cache/ms-playwright/chromium-1223/chrome-linux/chrome --args "--no-sandbox,--disable-gpu,--disable-dev-shm-usage" --headed false open <url>
    sudo agent-browser back
    sudo agent-browser forward
    sudo agent-browser reload
    sudo agent-browser close --all
    ```
     
    ### Snapshot (page analysis)
     
    ```bash
    sudo agent-browser snapshot            # Full accessibility tree
    sudo agent-browser snapshot -i         # Interactive elements only (recommended)
    sudo agent-browser snapshot -c         # Compact output
    sudo agent-browser snapshot -d 3       # Limit depth to 3
    sudo agent-browser snapshot -s "#main" # Scope to CSS selector
    ```
     
    ### Interactions (use @refs from snapshot)
     
    ```bash
    sudo agent-browser click @e1           # Click
    sudo agent-browser dblclick @e1        # Double-click
    sudo agent-browser focus @e1           # Focus element
    sudo agent-browser fill @e2 "text"     # Clear and type
    sudo agent-browser type @e2 "text"     # Type without clearing
    sudo agent-browser press Enter         # Press key
    sudo agent-browser press Control+a     # Key combination
    sudo agent-browser hover @e1           # Hover
    sudo agent-browser check @e1           # Check checkbox
    sudo agent-browser uncheck @e1         # Uncheck checkbox
    sudo agent-browser select @e1 "value"  # Select dropdown
    sudo agent-browser scroll down 500     # Scroll page
    sudo agent-browser scrollintoview @e1  # Scroll element into view
    ```
     
    ### Get information
     
    ```bash
    sudo agent-browser get text @e1        # Get element text
    sudo agent-browser get html @e1        # Get innerHTML
    sudo agent-browser get value @e1       # Get input value
    sudo agent-browser get attr @e1 href   # Get attribute
    sudo agent-browser get title           # Get page title
    sudo agent-browser get url             # Get current URL
    sudo agent-browser get count ".item"   # Count matching elements
    sudo agent-browser get box @e1         # Get bounding box
    ```
     
    ### Check state
     
    ```bash
    sudo agent-browser is visible @e1      # Check if visible
    sudo agent-browser is enabled @e1      # Check if enabled
    sudo agent-browser is checked @e1      # Check if checked
    ```
     
    ### Screenshots & PDF
     
    ```bash
    sudo agent-browser screenshot          # Screenshot to stdout
    sudo agent-browser screenshot path.png # Save to file
    sudo agent-browser screenshot --full   # Full page
    sudo agent-browser pdf output.pdf      # Save as PDF
    ```
     
    ### Wait
     
    ```bash
    sudo agent-browser wait @e1                     # Wait for element
    sudo agent-browser wait 2000                    # Wait milliseconds
    sudo agent-browser wait --text "Success"        # Wait for text
    sudo agent-browser wait --url "/dashboard"      # Wait for URL pattern
    sudo agent-browser wait --load networkidle      # Wait for network idle
    ```
     
    ### Mouse control
     
    ```bash
    sudo agent-browser mouse move 100 200      # Move mouse
    sudo agent-browser mouse down left         # Press button
    sudo agent-browser mouse up left           # Release button
    sudo agent-browser mouse wheel 100         # Scroll wheel
    ```
     
    ### Semantic locators (alternative to refs)
     
    ```bash
    sudo agent-browser find role button click --name "Submit"
    sudo agent-browser find text "Sign In" click
    sudo agent-browser find label "Email" fill "user@test.com"
    sudo agent-browser find first ".item" click
    sudo agent-browser find nth 2 "a" text
    ```
     
    ### Browser settings
     
    ```bash
    sudo agent-browser set viewport 1920 1080      # Set viewport size
    sudo agent-browser set device "iPhone 14"      # Emulate device
    sudo agent-browser set geo 37.7749 -122.4194   # Set geolocation
    sudo agent-browser set offline on              # Toggle offline mode
    sudo agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
    sudo agent-browser set media dark              # Emulate color scheme
    ```
     
    ### Cookies & Storage
     
    ```bash
    sudo agent-browser cookies                     # Get all cookies
    sudo agent-browser cookies set name value      # Set cookie
    sudo agent-browser cookies clear               # Clear cookies
    sudo agent-browser storage local               # Get all localStorage
    sudo agent-browser storage local key           # Get specific key
    sudo agent-browser storage local set k v       # Set value
    sudo agent-browser storage local clear         # Clear all
    ```
     
    ### Network
     
    ```bash
    sudo agent-browser network route <url>              # Intercept requests
    sudo agent-browser network route <url> --abort      # Block requests
    sudo agent-browser network route <url> --body '{}'  # Mock response
    sudo agent-browser network unroute [url]            # Remove routes
    sudo agent-browser network requests                 # View tracked requests
    sudo agent-browser network requests --filter api    # Filter requests
    ```
     
    ### Tabs
     
    ```bash
    sudo agent-browser tab                 # List tabs
    sudo agent-browser tab new [url]       # New tab
    sudo agent-browser tab 2               # Switch to tab
    sudo agent-browser tab close           # Close tab
    ```
     
    ### Frames
     
    ```bash
    sudo agent-browser frame "#iframe"     # Switch to iframe
    sudo agent-browser frame main          # Back to main frame
    ```
     
    ### JavaScript
     
    ```bash
    sudo agent-browser eval "document.title"   # Run JavaScript
    ```
     
    ### JSON output (for parsing)
     
    Add `--json` for machine-readable output:
     
    ```bash
    sudo agent-browser snapshot -i --json
    sudo agent-browser get text @e1 --json
    ```
     
    ## Example: Form submission
     
    ```bash
    # Step 1: Navigate
    sudo agent-browser close --all && sudo agent-browser --executable-path /root/.cache/ms-playwright/chromium-1223/chrome-linux/chrome --args "--no-sandbox,--disable-gpu,--disable-dev-shm-usage" --headed false open https://example.com/form 2>&1
     
    # Step 2: Snapshot to get refs
    sudo agent-browser snapshot -i 2>&1
    # Output: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
     
    # Step 3: Fill and submit
    sudo agent-browser fill @e1 "user@example.com"
    sudo agent-browser fill @e2 "password123"
    sudo agent-browser click @e3
    sudo agent-browser wait --load networkidle
     
    # Step 4: Check result
    sudo agent-browser snapshot -i 2>&1
    ```
     
    ## Troubleshooting
     
    | Symptom | Fix |
    |---|---|
    | `Chrome not found` | Add `--executable-path /root/.cache/ms-playwright/chromium-1223/chrome-linux/chrome` |
    | `Permission denied` | Use `sudo` — Chrome binary is under `/root/.cache` |
    | `Operation timed out` | Add `--args "--no-sandbox,--disable-gpu,--disable-dev-shm-usage"` and `--headed false` |
    | `--executable-path ignored` | Daemon already running with old config. Run `sudo agent-browser close --all` first |
    | Element not found | Run `snapshot -i` again — refs change after navigation |
     
    ## Notes
     
    - **Refs are stable per page load** but change on navigation. Always snapshot after navigating.
    - **Use `fill`** instead of `type` for input fields to clear existing text first.
    - **Always close daemon** before re-launching with different Chrome options.
    - **Command chaining** works in a single `local-exec` call via `&&`.

    The process actually used to produce this SKILL.md is described below. If you only care about how to use agent-browser, you can skip it.

    1. Ask CLAW to run the following command in the sandbox environment to open a web page.
      sudo agent-browser open www.baidu.com --executable-path /root/.cache/ms-playwright/chromium-1223/chrome-linux/chrome

      Verify that the result is as expected. If the command reports an error, ask CLAW to investigate the cause and fix it on its own.

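    2. Provide the original agent-browser SKILL.md in the workspace and ask the Agent to modify it so that it supports invocation through the local-exec sandbox.
    3. Ask the Agent to follow the SKILL strictly while completing a web browsing task, to verify that SKILL.md is correct. If it gets stuck, ask CLAW to investigate the cause on its own and update SKILL.md.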
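    One lightweight complement to the verification in sub-step 3 is to check that the modified SKILL.md still mentions every launch option the sandbox depends on. A minimal sketch; `check_skill` is a hypothetical helper, and the list of required strings is an assumption taken from the "Key constraints" section of the SKILL.md shown earlier.

    ```bash
    # Sketch: report launch options that a SKILL.md file fails to mention.
    check_skill() {
      file="$1"
      status=0
      for needle in "--executable-path" "--no-sandbox" "--disable-gpu" \
                    "--disable-dev-shm-usage" "--headed false" "close --all"; do
        # -F: fixed-string match, -q: quiet, -e: pattern may start with "-"
        if ! grep -Fq -e "$needle" "$file"; then
          printf 'missing: %s\n' "$needle"
          status=1
        fi
      done
      return "$status"
    }

    # Usage: check_skill SKILL.md && echo "SKILL.md mentions all required options"
    ```

    The function returns a non-zero status when any string is absent, so it can gate a later step. It checks wording only and cannot tell whether the Agent actually follows the instructions.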
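  6. Verify the tool. Tell CLAW: "Please use the agent-browser SKILL.md that you have already modified, then see whether you can find out which SKILL ranks 37th on clawhub, and tell me what the description in its SKILL.md says."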
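    To reproduce the check by hand, the first agent-browser call needs the launch options from the SKILL.md command template. The sketch below wraps them in a shell function; `ab_launch_cmd` is a hypothetical helper that only prints the command line, and the clawhub URL passed to it is an assumption, since the original does not say which page lists the ranking.

    ```bash
    # Sketch: compose the launch command from the SKILL.md command template.
    CHROME_PATH="/root/.cache/ms-playwright/chromium-1223/chrome-linux/chrome"
    CHROME_ARGS="--no-sandbox,--disable-gpu,--disable-dev-shm-usage"

    ab_launch_cmd() {
      # "$*" = agent-browser subcommand and its arguments, e.g. "open <url>"
      printf 'sudo agent-browser close --all 2>&1 && sudo agent-browser --executable-path %s --args "%s" --headed false %s\n' \
        "$CHROME_PATH" "$CHROME_ARGS" "$*"
    }

    ab_launch_cmd open https://clawhub.ai      # step 1: close daemon + navigate
    printf 'sudo agent-browser snapshot -i\n'  # step 2: list interactive elements
    ```

    The printed lines can be pasted into the sandbox shell or passed as the `command` of a `local-exec` call. Printing instead of executing keeps the sketch safe to run outside the sandbox.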