1. What is BrowserUse?
BrowserUse is an open-source Python library that enables AI Agents to interact with web browsers using Playwright commands. Instead of maintaining complex and brittle selectors/XPath expressions to find and interact with elements, you can simply describe the desired actions in natural language, and let the AI handle the element finding.
This allows automation QA engineers to focus on logic and business requirements rather than struggling with selectors every time the website is updated.
2. Easiest Way to Set Up BrowserUse (BrowserUse – Gemini LangChain – Python)
Install Python and uv (a fast Python package installer and resolver).
Set up a Python environment and activate it:
uv venv --python 3.11
# For Mac/Linux:
source .venv/bin/activate
# For Windows:
.venv\Scripts\activate
Install the required dependencies:
uv pip install browser-use
uv run playwright install
- Note: playwright install (referred to as patchright in some contexts) is a command typically run once during setup to:
  - Download Playwright browser drivers.
  - Modify certain sensitive parts of Chromium to reduce the likelihood of being detected as an automation bot.
Set up LLM API Keys in a .env file:
- Create a file named .env in your project root and add your API key:
GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY_HERE # Replace YOUR_GOOGLE_API_KEY_HERE with your actual Google API Key.
ANONYMIZED_TELEMETRY=false
BROWSER_USE_LOGGING_LEVEL=debug
Set up a basic Python program for an agent: Create a Python file (e.g., agent.py):
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent
from dotenv import load_dotenv
import asyncio
import os
# Load environment variables from .env file
load_dotenv()
# Initialize the LLM
# Ensure your GOOGLE_API_KEY is set in your environment or .env file
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest") # Using a generally available model
async def main():
    agent = Agent(
        task="Open https://katalon.com/ and get the pricing info",
        llm=llm,
    )
    result = await agent.run()
    print(result.final_result())

if __name__ == "__main__":
    asyncio.run(main())
Run the Python program: The program will then execute the task, and the agent’s findings will be printed to the console.
uv run python agent.py
3. Strengths of BrowserUse
Vision & HTML Extraction
- Enabled by default via the Agent(use_vision=True) parameter. This allows sending screenshots and a summarized JSON structure of the HTML to the AI Agent for analysis.
- This helps the AI Agent comprehensively understand the website’s context and make more accurate decisions for actions, even when dealing with dynamic content or JavaScript-driven elements.
Multi-tab Management
- BrowserUse allows performing tasks concurrently with multiple Agents.
- Each Agent can interact with different browser types and multiple windows simultaneously.
user_agent = Agent(
    task="...",
    llm=llm,
    browser=browser,
)
admin_agent = Agent(
    task="...",
    llm=llm,
    browser=browser,
)
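Running both agents in parallel then comes down to standard asyncio. A minimal sketch of the pattern, using placeholder coroutines instead of real Agent.run() calls so it stays self-contained (with real BrowserUse agents you would gather the agents' run() coroutines):

```python
import asyncio

# Placeholder coroutines standing in for user_agent.run() and admin_agent.run().
async def run_user_flow() -> str:
    await asyncio.sleep(0.1)  # simulate browser work
    return "user flow done"

async def run_admin_flow() -> str:
    await asyncio.sleep(0.1)
    return "admin flow done"

async def main() -> list[str]:
    # asyncio.gather runs both flows concurrently and collects their results
    return await asyncio.gather(run_user_flow(), run_admin_flow())

results = asyncio.run(main())
print(results)  # ['user flow done', 'admin flow done']
```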
Element Tracking
- Saves XPath maps and accurately re-executes LLM actions for synchronous flows.
- Why are tokens still consumed heavily even if XPaths are saved?
  - Unlike some tools (e.g., “Shortest”), BrowserUse sends a request to the LLM at each step, to verify the result of the previous action and determine the appropriate next action. It also revalidates selectors in case the old XPath is no longer active.
- Many complex tasks require logical reasoning, access to execution history, and detailed prompt samples, which can consume a significant number of tokens. Examples include selecting the item with the lowest price or logging in with a username and password defined on the login screen.
- This thoroughness is also a strength of BrowserUse, ensuring robustness (Cost implications are discussed later).
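The locator-caching idea behind element tracking can be sketched in plain Python. This is a hypothetical cache keyed by step description, not BrowserUse's actual internal structure, but it shows why a saved XPath avoids an LLM call on repeat runs:

```python
# Hypothetical sketch of an XPath cache keyed by step description;
# BrowserUse's real element-tracking internals are more involved.
selector_cache: dict[str, str] = {}

def resolve_selector(step: str, find_with_llm) -> str:
    """Return a cached XPath for a step, falling back to the LLM."""
    if step in selector_cache:
        return selector_cache[step]  # cache hit: no LLM call, no tokens
    xpath = find_with_llm(step)      # cache miss: expensive LLM lookup
    selector_cache[step] = xpath
    return xpath

# Usage: the second lookup for the same step hits the cache
calls = []
def fake_llm(step: str) -> str:
    calls.append(step)
    return "//button[@id='login']"

resolve_selector("click login", fake_llm)
resolve_selector("click login", fake_llm)
print(len(calls))  # 1 — the LLM was only consulted once
```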
Custom Actions
- You can customize existing functions or add new ones, such as saving data to files or handling API interactions.
from browser_use import Controller, ActionResult

controller = Controller()

@controller.action('Ask user for information')
def ask_human(question: str) -> ActionResult:
    answer = input(f'\n{question}\nInput: ')
    return ActionResult(extracted_content=answer)

# ... then pass the controller to the agent
agent = Agent(
    task=task,
    llm=llm,
    controller=controller,
)
Self-correcting
- The code for this mechanism is quite complex to fully detail in a blog post, but here’s the basic execution flow:
- The Agent attempts a step (e.g., clicking a button), and the step fails.
- The system will:
- Log the error.
- Mark the step status as “failed/exception”.
- Mark the flow status as “paused/needs fix”.
- The LLM is called to re-read the step’s goal. It compares the requirement with the current DOM/screenshot and infers the cause of the error. Based on this, it proposes a new version of the step (e.g., finding an alternative button).
- If a new step is proposed, the Agent retries with the new step:
- If successful, the flow continues.
- If it fails again, the system loops back to propose another new solution.
- The maximum number of retry attempts can be configured using max_failures:
agent = Agent(
    task="...",
    ...,
    max_failures=3,
)
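The retry flow described above boils down to a bounded retry loop. Here is a simplified, stdlib-only sketch of that loop (in the real BrowserUse mechanism, the "propose a fix" step is where the LLM re-reads the goal against the DOM/screenshot and suggests a repaired step):

```python
# Simplified sketch of the self-correcting loop: try a step, and on failure
# ask for a repaired version of it, up to max_failures attempts.
def run_step_with_retries(step, propose_fix, max_failures: int = 3):
    attempt = step
    for failure_count in range(max_failures + 1):
        try:
            return attempt()          # step succeeded: the flow continues
        except Exception as error:
            if failure_count == max_failures:
                raise                 # retry budget exhausted: give up
            # BrowserUse would call the LLM here to propose a new version
            # of the failed step (e.g., an alternative button to click).
            attempt = propose_fix(attempt, error)

# Usage: a step that fails twice before succeeding on the third try
state = {"tries": 0}
def flaky_step():
    state["tries"] += 1
    if state["tries"] < 3:
        raise RuntimeError("element not found")
    return "clicked"

result = run_step_with_retries(flaky_step, lambda step, err: step)
print(result)  # clicked
```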
Any LLM Support
- Most LLMs available through Langchain are supported.
- The syntax for declaring and using them is straightforward.
from langchain_openai import ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)
4. Security Considerations
Only use trusted Agents: Be cautious about the source and capabilities of the Agents you employ.
Hide sensitive information in logs and steps:
- Store sensitive data (like passwords or API keys) in environment variables.
- Pass this data to the sensitive_data field of the Agent, which BrowserUse will then recognize and mask in logs.
sensitive_data = {
    'https://*.example.com': {
        'x_password': os.getenv("X_PASSWORD"),
    },
}
agent = Agent(
    task="Log in using the password x_password",
    llm=llm,
    browser=browser,
    controller=controller,
    sensitive_data=sensitive_data,
)
Encrypt API KEYs: Do not hardcode API keys or push them to remote repositories.
Restricted URLs: Limit and only perform tasks on trusted websites. BrowserUse allows agents to interact with web interfaces with the same permissions as a user, potentially exposing important user-agent information. (Refer to the BrowserUse documentation for details on how to declare restricted URLs).
Turn off vision mode and Telemetry during test execution to prevent data leaks:
- Vision mode (use_vision=True) sends screenshots to the AI Agent, which can improve logic verification. However, this carries a risk of data leakage. Always set use_vision=False when running the framework against your project’s production assets if data sensitivity is a concern.
- Telemetry allows BrowserUse to collect anonymous usage data. While this helps improve the library, it might not be desirable in all security contexts. Ensure ANONYMIZED_TELEMETRY=false is set in your .env file to disable it.
5. Scalability and Operational Costs
5.1. Scalability
- BrowserUse can run multiple independent Agents simultaneously to perform tasks in parallel. For stable performance, it’s generally recommended to run a maximum of around 10 Agents concurrently.
- Like other Chromium-based libraries, scaling up the number of agents increases CPU, RAM, and potentially disk space usage (especially if simulating real users for scraping tasks). A typical computer might be able to handle around 20 concurrent Chromium browser instances for automation.
- Token Consumption:
- OpenAI’s GPT-4o (Tier 1) allows approximately 200,000 Tokens Per Minute (TPM).
- Gemini Flash models (Tier 1) offer higher limits, ranging from 1,000,000 TPM (Gemini 1.5 Flash) up to potentially more with newer versions.
- On average, BrowserUse might require each Agent to consume around 20,000 TPM.
- Therefore, depending on the AI Agent model used, running 10-20 concurrent browser sessions is generally feasible from a token perspective.
- Practical experience with parallel execution suggests that system performance can slow down and become unstable with more than 10 Agent sessions.
- Consider implementing caching mechanisms if you can control or predict the rate of change on the websites you interact with.
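The concurrency estimates above are simple division of a model's rate limit by per-agent consumption. A quick back-of-the-envelope check using the approximate figures quoted in this section:

```python
# Rough capacity estimate: how many agents fit within a model's TPM limit,
# using the approximate figures quoted above.
PER_AGENT_TPM = 20_000  # rough BrowserUse consumption per agent

model_tpm_limits = {
    "gpt-4o (Tier 1)": 200_000,
    "gemini-1.5-flash (Tier 1)": 1_000_000,
}

for model, tpm in model_tpm_limits.items():
    max_agents = tpm // PER_AGENT_TPM
    print(f"{model}: up to ~{max_agents} concurrent agents")
# gpt-4o (Tier 1): up to ~10 concurrent agents
# gemini-1.5-flash (Tier 1): up to ~50 concurrent agents
```

Note that even when the token budget allows far more, the practical stability limit observed above (around 10 agent sessions) is usually the binding constraint.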
5.2. Token Costs
- For up-to-date pricing on LLM tokens, you can refer to resources like: https://llm-price.com/
6. Some Feasible Ideas for Using BrowserUse
- Performing exploratory testing, smoke testing, module-based testing, and risk-based testing.
- Simulating User Acceptance Testing (UAT) scenarios for clients.
- Web scraping and data migration tasks for clients.
- Automating report generation from web-based dashboards.
- Continuously monitoring specific functionalities or reports on websites.
- Integrating with n8n to easily create automated business flows.
Bonus: Explore more interesting things with BrowserUse in:
- BrowserUse Cloud: cloud.browser-use.com
- Awesome project with BrowserUse: browser-use/awesome-projects: List of Open Source projects built on Browser Use
- Awesome prompts: browser-use/awesome-prompts: Table of awesome Browser Use prompts
- Vibe Testing: browser-use/vibetest-use
- Workflow Use: https://github.com/browser-use/workflow-use – Record once, reuse forever.