# Tired of Manual Testing? Web-Eval-Agent Automates Web App Debugging with AI

For web app developers embracing AI-assisted coding, a new tool promises to streamline the often-tedious testing phase. Web-Eval-Agent, recently showcased on Hacker News, aims to automate the process of verifying that changes made by AI coding agents within an IDE actually work as intended.

The developers behind Web-Eval-Agent realized that while AI could drastically improve coding speed, manually testing the results remained a significant bottleneck. The repetitive cycle of opening the app, clicking through workflows, analyzing network tabs, and copying console errors back into the editor became a time-consuming burden.

The browser integrations already available in tools like Cline and Windsurf were deemed either too slow or too unreliable. This led to the creation of Web-Eval-Agent, a tool designed to bridge the gap between AI code generation and automated validation.

**How it Works:**

Web-Eval-Agent operates as an MCP (Model Context Protocol) server that sits between your IDE agent (such as Cursor, Windsurf, Cline, or Continue) and a Playwright-powered browser agent. It functions as follows:

1. **Browser Initialization:** The server spins up a browser instance and navigates to your application based on instructions from the IDE agent.
2. **Data Collection:** It monitors the browser’s activity, sending the steps taken, console events, and network events back to the IDE agent. This provides a comprehensive view of the app’s state during testing (see the capture sketch after the sample report below).
3. **Performance Optimization:** The agent proxies Browser-use’s original Claude calls and swaps in Gemini 2.0 Flash, cutting per-step latency from approximately 8 seconds to 3 seconds (see the model-swap sketch below).
4. **Log Management:** To avoid overwhelming context limits, the system caps console and network logs at 10,000 characters and filters out irrelevant information, such as noisy XHR requests (also illustrated in the capture sketch below).
5. **Evaluation Report:** Finally, the browser agent generates a summary report detailing the outcome of the test, including steps taken, success status, and any issues encountered in the UX, console, or network activity. A sample report looks like this:

```
Web Evaluation Report for http://localhost:5173
Task: delete an API key and evaluate UX
Steps: Home → Login → API Keys → Create Key → Delete Key
Flow tested successfully; UX had problems X, Y, Z…
Console (8)… Network (13)… Timeline of events (57) …
```
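
The project’s exact instrumentation isn’t shown in the announcement, but a minimal Python sketch of steps 2 and 4 — capturing console and network events with Playwright, capping the log at 10,000 characters, and dropping noisy XHR traffic — might look like this (the `collect_events` helper and the specific filter rule are illustrative assumptions, not the project’s actual code):

```python
# Minimal sketch of steps 2 and 4 (illustrative, not the project's actual
# code): capture console and network events with Playwright, cap the combined
# log at 10,000 characters, and drop noisy XHR/fetch requests.
from playwright.sync_api import sync_playwright

MAX_LOG_CHARS = 10_000  # mirrors the cap described in step 4

def collect_events(url: str) -> list[str]:
    events: list[str] = []
    total = 0

    def record(line: str) -> None:
        nonlocal total
        if total + len(line) <= MAX_LOG_CHARS:  # silently stop at the cap
            events.append(line)
            total += len(line)

    def on_request(request) -> None:
        # Filter step: skip XHR/fetch noise, keep documents, scripts, etc.
        if request.resource_type not in ("xhr", "fetch"):
            record(f"request: {request.method} {request.url}")

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Console events: logs, warnings, and errors emitted by the app.
        page.on("console", lambda msg: record(f"console.{msg.type}: {msg.text}"))
        page.on("request", on_request)
        page.goto(url)
        page.wait_for_load_state("networkidle")
        browser.close()
    return events

if __name__ == "__main__":
    for line in collect_events("http://localhost:5173"):
        print(line)
```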
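
The step-3 model swap is similarly easy to picture. Browser-use accepts any LangChain-style chat model via its `llm` parameter, so a sketch under that assumption (the project itself performs the swap inside its proxy, not in user code) is roughly a one-line change:

```python
# Sketch of the step-3 model swap (assumes Browser-use's LangChain-style llm
# parameter; the project itself swaps models server-side in its proxy).
import asyncio

from browser_use import Agent
from langchain_google_genai import ChatGoogleGenerativeAI

async def main() -> None:
    agent = Agent(
        task="delete an API key and evaluate UX",  # task from the sample report
        llm=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),  # Gemini 2.0 Flash in place of Claude
    )
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())
```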

This detailed feedback allows the coding agent to identify and fix console errors, network issues, or user-interaction problems before presenting the final result to the user. The tool is intended to let coding agents “vibe-test” their own code as easily as they “vibe-code” it.

**Getting Started:**

Web-Eval-Agent is currently compatible with macOS and Linux and can be easily installed using a provided shell script. (A manual installation guide is available in the README for Windows users.) After installation, developers can craft prompts within their IDE (Cursor, Cline, Windsurf, or Continue) utilizing the `web_eval_agent` tool.
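
The exact prompt wording is up to the user; a hypothetical prompt handed to the IDE agent might read:

```
Use the web_eval_agent tool to evaluate my app running at http://localhost:5173.
Walk through creating and deleting an API key and report any console or network errors.
```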

**Future Developments:**

The developers have outlined several planned features for Web-Eval-Agent, including:

* Pause/go functionality for OAuth screens.
* Saving and loading browser authentication states.
* Playwright step recording for automated test creation and regression testing.
* Support for Lovable, v0, and Bolt.new sites through a web version.

**Pricing and Open Source:**

The MCP server itself is open source, allowing for community contributions and customization. Gemini API calls are routed through the developers’ proxy server, which covers token costs; a free tier is available, and heavy users can opt for a $10 plan to help offset the model billing costs.

Web-Eval-Agent presents a promising solution for developers seeking to streamline the testing process in AI-assisted web development. By automating the validation of AI-generated code, it has the potential to significantly reduce development time and improve the reliability of web applications. The developers encourage users to provide feedback and contribute to the project’s ongoing development. You can find the project on GitHub at [https://github.com/Operative-Sh/web-eval-agent](https://github.com/Operative-Sh/web-eval-agent).
