1. What is BrowserUse?
BrowserUse is an open-source Python library that enables AI Agents to interact with web browsers using Playwright commands. Instead of maintaining complex and brittle selectors/XPath expressions to find and interact with elements, you can simply describe the desired actions in natural language, and let the AI handle the element finding.
This allows automation QA engineers to focus on logic and business requirements rather than struggling with selectors every time the website is updated.
2. Easiest Way to Set Up BrowserUse (BrowserUse – Gemini LangChain – Python)
Install Python and uv (a fast Python package installer and resolver).
Set up a Python environment and activate it:
uv venv --python 3.11
# For Mac/Linux:
source .venv/bin/activate
# For Windows:
.venv\Scripts\activate
Install the required dependencies:
uv pip install browser-use
uv run playwright install
- Note: playwright install (referred to as patchright in some contexts) is a command typically run once during setup to:
  - Download Playwright browser drivers.
  - Modify certain sensitive parts of Chromium to reduce the likelihood of being detected as an automation bot.
Set up LLM API Keys in a .env file:
- Create a file named .env in your project root and add your API key:
GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY_HERE # Replace YOUR_GOOGLE_API_KEY_HERE with your actual Google API Key.
ANONYMIZED_TELEMETRY=false
BROWSER_USE_LOGGING_LEVEL=debug
Set up a basic Python program for an agent: Create a Python file (e.g., agent.py):
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent
from dotenv import load_dotenv
import asyncio
import os
# Load environment variables from .env file
load_dotenv()
# Initialize the LLM
# Ensure your GOOGLE_API_KEY is set in your environment or .env file
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest") # Using a generally available model
async def main():
    agent = Agent(
        task="Open https://katalon.com/ and get the pricing info",
        llm=llm,
    )
    result = await agent.run()
    print(result.final_result())

if __name__ == "__main__":
    asyncio.run(main())
Run the Python program: The program will then execute the task, and the agent’s findings will be printed to the console.
uv run python agent.py
3. Strengths of BrowserUse
Vision & HTML Extraction
- Enabled by default via the Agent(use_vision=True) parameter. This allows sending screenshots and a summarized JSON structure of the HTML to the AI Agent for analysis.
- This helps the AI Agent comprehensively understand the website’s context and make more accurate decisions for actions, even when dealing with dynamic content or JavaScript-driven elements.
Multi-tab Management
- BrowserUse allows performing tasks concurrently with multiple Agents.
- Each Agent can interact with different browser types and multiple windows simultaneously.
user_agent = Agent(
    task="...",
    llm=llm,
    browser=browser,
)
admin_agent = Agent(
    task="...",
    llm=llm,
    browser=browser,
)
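Running both agents in parallel then comes down to standard asyncio. A minimal sketch of the pattern, using placeholder coroutines instead of real Agent.run() calls so it stays self-contained (with real BrowserUse agents you would gather the agents' run() coroutines):

```python
import asyncio

# Placeholder coroutines standing in for user_agent.run() and admin_agent.run().
async def run_user_flow() -> str:
    await asyncio.sleep(0.1)  # simulate browser work
    return "user flow done"

async def run_admin_flow() -> str:
    await asyncio.sleep(0.1)
    return "admin flow done"

async def main() -> list[str]:
    # asyncio.gather runs both flows concurrently and collects their results
    return await asyncio.gather(run_user_flow(), run_admin_flow())

results = asyncio.run(main())
print(results)  # ['user flow done', 'admin flow done']
```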
Element Tracking
- Saves XPath maps and accurately re-executes LLM actions for synchronous flows.
- Why are tokens still consumed heavily even if XPaths are saved?
  - Unlike some tools (e.g., “Shortest”), BrowserUse sends a request to the LLM at each step, to verify the result of the previous action and determine the appropriate next action. It also revalidates selectors in case the old XPath is no longer active.
- Many complex tasks require logical reasoning, access to execution history, and detailed prompt samples, which can consume a significant number of tokens. Examples include selecting the item with the lowest price or logging in with a username and password defined on the login screen.
- This thoroughness is also a strength of BrowserUse, ensuring robustness (Cost implications are discussed later).
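The locator-caching idea behind element tracking can be sketched in plain Python. This is a hypothetical cache keyed by step description, not BrowserUse's actual internal structure, but it shows why a saved XPath avoids an LLM call on repeat runs:

```python
# Hypothetical sketch of an XPath cache keyed by step description;
# BrowserUse's real element-tracking internals are more involved.
selector_cache: dict[str, str] = {}

def resolve_selector(step: str, find_with_llm) -> str:
    """Return a cached XPath for a step, falling back to the LLM."""
    if step in selector_cache:
        return selector_cache[step]  # cache hit: no LLM call, no tokens
    xpath = find_with_llm(step)      # cache miss: expensive LLM lookup
    selector_cache[step] = xpath
    return xpath

# Usage: the second lookup for the same step hits the cache
calls = []
def fake_llm(step: str) -> str:
    calls.append(step)
    return "//button[@id='login']"

resolve_selector("click login", fake_llm)
resolve_selector("click login", fake_llm)
print(len(calls))  # 1 — the LLM was only consulted once
```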
Custom Actions
- You can customize existing functions or add new ones, such as saving data to files or handling API interactions.
from browser_use import Controller, ActionResult

controller = Controller()

@controller.action('Ask user for information')
def ask_human(question: str) -> ActionResult:
    answer = input(f'\n{question}\nInput: ')
    return ActionResult(extracted_content=answer)

# ... then pass the controller to the agent
agent = Agent(
    task=task,
    llm=llm,
    controller=controller,
)
Self-correcting
- The code for this mechanism is quite complex to fully detail in a blog post, but here’s the basic execution flow:
- The Agent attempts a step (e.g., clicking a button), and the step fails.
- The system will:
- Log the error.
- Mark the step status as “failed/exception”.
- Mark the flow status as “paused/needs fix”.
- The LLM is called to re-read the step’s goal. It compares the requirement with the current DOM/screenshot and infers the cause of the error. Based on this, it proposes a new version of the step (e.g., finding an alternative button).
- If a new step is proposed, the Agent retries with the new step:
- If successful, the flow continues.
- If it fails again, the system loops back to propose another new solution.
- The maximum number of retry attempts can be configured using max_failures:
agent = Agent(
    task="...",
    ...,
    max_failures=3,
)
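The retry flow described above boils down to a bounded retry loop. Here is a simplified, stdlib-only sketch of that loop (in the real BrowserUse mechanism, the "propose a fix" step is where the LLM re-reads the goal against the DOM/screenshot and suggests a repaired step):

```python
# Simplified sketch of the self-correcting loop: try a step, and on failure
# ask for a repaired version of it, up to max_failures attempts.
def run_step_with_retries(step, propose_fix, max_failures: int = 3):
    attempt = step
    for failure_count in range(max_failures + 1):
        try:
            return attempt()          # step succeeded: the flow continues
        except Exception as error:
            if failure_count == max_failures:
                raise                 # retry budget exhausted: give up
            # BrowserUse would call the LLM here to propose a new version
            # of the failed step (e.g., an alternative button to click).
            attempt = propose_fix(attempt, error)

# Usage: a step that fails twice before succeeding on the third try
state = {"tries": 0}
def flaky_step():
    state["tries"] += 1
    if state["tries"] < 3:
        raise RuntimeError("element not found")
    return "clicked"

result = run_step_with_retries(flaky_step, lambda step, err: step)
print(result)  # clicked
```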
Any LLM Support
- Most LLMs available through Langchain are supported.
- The syntax for declaring and using them is straightforward.
from langchain_openai import ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)
4. Security Considerations
Only use trusted Agents: Be cautious about the source and capabilities of the Agents you employ.
Hide sensitive information in logs and steps:
- Store sensitive data (like passwords or API keys) in environment variables.
- Pass this data to the sensitive_data field of the Agent, which BrowserUse will then recognize and mask in logs.
sensitive_data = {
    'https://*.example.com': {
        'x_password': os.getenv("X_PASSWORD"),
    },
}
agent = Agent(
    task="Log in using the password x_password",
    llm=llm,
    browser=browser,
    controller=controller,
    sensitive_data=sensitive_data,
)
Encrypt API KEYs: Do not hardcode API keys or push them to remote repositories.
Restricted URLs: Limit and only perform tasks on trusted websites. BrowserUse allows agents to interact with web interfaces with the same permissions as a user, potentially exposing important user-agent information. (Refer to the BrowserUse documentation for details on how to declare restricted URLs).
Turn off vision mode and Telemetry during test execution to prevent data leaks:
- Vision mode (use_vision=True) sends screenshots to the AI Agent, which can improve logic verification. However, this carries a risk of data leakage. Always set use_vision=False when running the framework against your project’s production assets if data sensitivity is a concern.
- Telemetry allows BrowserUse to collect anonymous usage data. While this helps improve the library, it might not be desirable in all security contexts. Ensure ANONYMIZED_TELEMETRY=false is set in your .env file to disable it.
5. Scalability and Operational Costs
5.1. Scalability
- BrowserUse can run multiple independent Agents simultaneously to perform tasks in parallel. For stable performance, it’s generally recommended to run a maximum of around 10 Agents concurrently.
- Like other Chromium-based libraries, scaling up the number of agents increases CPU, RAM, and potentially disk space usage (especially if simulating real users for scraping tasks). A typical computer might be able to handle around 20 concurrent Chromium browser instances for automation.
- Token Consumption:
- OpenAI’s GPT-4o (Tier 1) allows approximately 200,000 Tokens Per Minute (TPM).
- Gemini Flash models (Tier 1) offer higher limits, ranging from 1,000,000 TPM (Gemini 1.5 Flash) up to potentially more with newer versions.
- On average, BrowserUse might require each Agent to consume around 20,000 TPM.
- Therefore, depending on the AI Agent model used, running 10-20 concurrent browser sessions is generally feasible from a token perspective.
- Practical experience with parallel execution suggests that system performance can slow down and become unstable with more than 10 Agent sessions.
- Consider implementing caching mechanisms if you can control or predict the rate of change on the websites you interact with.
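The concurrency estimates above are simple division of a model's rate limit by per-agent consumption. A quick back-of-the-envelope check using the approximate figures quoted in this section:

```python
# Rough capacity estimate: how many agents fit within a model's TPM limit,
# using the approximate figures quoted above.
PER_AGENT_TPM = 20_000  # rough BrowserUse consumption per agent

model_tpm_limits = {
    "gpt-4o (Tier 1)": 200_000,
    "gemini-1.5-flash (Tier 1)": 1_000_000,
}

for model, tpm in model_tpm_limits.items():
    max_agents = tpm // PER_AGENT_TPM
    print(f"{model}: up to ~{max_agents} concurrent agents")
# gpt-4o (Tier 1): up to ~10 concurrent agents
# gemini-1.5-flash (Tier 1): up to ~50 concurrent agents
```

Note that even when the token budget allows far more, the practical stability limit observed above (around 10 agent sessions) is usually the binding constraint.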
5.2. Token Costs
- For up-to-date pricing on LLM tokens, you can refer to resources like: https://llm-price.com/
6. Some Feasible Ideas for Using BrowserUse
- Performing exploratory testing, smoke testing, module-based testing, and risk-based testing.
- Simulating User Acceptance Testing (UAT) scenarios for clients.
- Web scraping and data migration tasks for clients.
- Automating report generation from web-based dashboards.
- Continuously monitoring specific functionalities or reports on websites.
- Integrating with n8n to easily create automated business flows.
Bonus: Explore more interesting things with BrowserUse in:
- BrowserUse Cloud: cloud.browser-use.com
- Awesome project with BrowserUse: browser-use/awesome-projects: List of Open Source projects built on Browser Use
- Awesome prompts: browser-use/awesome-prompts: Table of awesome Browser Use prompts
- Vibe Testing: browser-use/vibetest-use
- Workflow Use: https://github.com/browser-use/workflow-use – Record once, reuse forever.