We set out to see how close a generic LLM workflow could get to a purpose-built AI test creation experience. Specifically, we tried using Claude with Playwright and Claude with Selenium to recreate the workflow of the Endtest AI Test Creation Agent.

The short version: Claude can generate useful browser automation code, but the workflow quickly becomes a loop of prompting, repairing selectors, adding waits, restructuring tests, debugging browser state, and maintaining framework glue. That is very different from using an AI agent built inside a mature test automation platform.

Lab notes

Experiment type: technical recreation, not a formal benchmark
Tools compared: Claude plus Playwright, Claude plus Selenium, and Endtest AI Test Creation Agent
Primary question: can a generic LLM plus an open-source automation framework reproduce the speed and reliability of a specialized AI test creation workflow?
Audience: SDETs, QA automation engineers, developers, and CTOs evaluating AI-generated test automation
Target workflow: describe a user scenario in plain English, get a working end-to-end test that can be executed, reviewed, edited, and maintained

This article is not an argument against Playwright or Selenium. Both are serious tools. Playwright has excellent browser automation ergonomics, auto-waiting, tracing, and modern test runner features. Selenium remains a widely adopted implementation of the WebDriver standard with broad language support and a long history in enterprise automation. If your team has strong automation engineers and wants full code ownership, both can be the right choice.

The experiment here is narrower: can Claude plus Playwright or Claude plus Selenium feel like the Endtest AI Test Creation Agent, where you describe a test and the system generates editable, platform-native test steps with assertions and stable locators, ready to run in the Endtest cloud?

Our conclusion is that the gap is not mainly about whether Claude can write code. It can. The gap is everything around the code.

AI-generated browser automation code is useful. A maintainable test asset is more than code.

What we tried to recreate

The ideal AI test creation workflow looks deceptively simple:

  1. Provide a scenario in plain English.
  2. Let the agent inspect or reason about the target application.
  3. Generate a runnable end-to-end test.
  4. Include assertions, not just clicks and typing.
  5. Use stable locators.
  6. Preserve browser state where needed.
  7. Produce something a human tester can inspect and edit.
  8. Run it reliably without building a framework from scratch.
  9. Make failures diagnosable through screenshots, logs, traces, and reports.
  10. Keep the test maintainable as the UI changes.

The important part is that a real AI test creation agent is not just a text-to-code wrapper. It needs to know how tests are represented, executed, retried, debugged, shared, and maintained.

That is where Endtest has an architectural advantage. Endtest is an agentic AI, low-code/no-code test automation platform. Its AI Test Creation Agent creates standard editable Endtest steps inside the platform. The generated test is not a blob of Playwright, Selenium, JavaScript, Python, or TypeScript source code that you now need to own. It lands in the same environment where execution, editing, reporting, scheduling, cloud browsers, and maintenance features already exist.

To understand the difference, we attempted to recreate the workflow with two common stacks: Claude with Playwright and Claude with Selenium.

We used intentionally realistic prompts, the kind an SDET or QA engineer might type while evaluating Claude, Playwright, Selenium, and AI test creation for a product team.

The sample scenario

For the experiment, we used a generic SaaS onboarding flow. The specific app does not matter as much as the test characteristics.

The scenario:

Create an end-to-end test for a web application. Open the login page, sign in as a test user, verify the dashboard loads, create a new project, invite a team member, verify the invitation appears in the members table, then log out.

This is a useful AI test creation scenario because it includes several common problems:

  • Authentication
  • Multi-step navigation
  • Dynamic data
  • Form inputs
  • A table assertion
  • Possible email or invite side effects
  • Session cleanup
  • Locators across multiple pages
  • Timing issues after form submission

It is not a trivial “click a button and check text” example, but it is also not an exotic workflow.

Attempt 1: Claude with Playwright

Playwright is often the first choice for AI-generated test automation experiments because the API is concise and readable. Claude tends to generate Playwright tests that look plausible, especially in TypeScript or JavaScript.

A first prompt might be:

Write a Playwright test in TypeScript for this scenario:
- Go to https://app.example.test/login
- Log in with QA_EMAIL and QA_PASSWORD from environment variables
- Verify the dashboard loads
- Create a project with a unique name
- Invite qa-invitee@example.test
- Verify the invitee appears in the members table
- Log out
Use stable locators and assertions.

A reasonable Claude-generated test often resembles this:

import { test, expect } from '@playwright/test';

test('user can create project and invite team member', async ({ page }) => {
  const projectName = `QA Project ${Date.now()}`;

  await page.goto('/login');
  await page.getByLabel('Email').fill(process.env.QA_EMAIL!);
  await page.getByLabel('Password').fill(process.env.QA_PASSWORD!);
  await page.getByRole('button', { name: 'Sign in' }).click();

  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();

  await page.getByRole('button', { name: 'New project' }).click();
  await page.getByLabel('Project name').fill(projectName);
  await page.getByRole('button', { name: 'Create project' }).click();

  await expect(page.getByRole('heading', { name: projectName })).toBeVisible();

  await page.getByRole('link', { name: 'Members' }).click();
  await page.getByRole('button', { name: 'Invite member' }).click();
  await page.getByLabel('Email address').fill('qa-invitee@example.test');
  await page.getByRole('button', { name: 'Send invite' }).click();

  await expect(page.getByRole('row', { name: /qa-invitee@example.test/i })).toBeVisible();

  await page.getByRole('button', { name: /account/i }).click();
  await page.getByRole('menuitem', { name: 'Log out' }).click();
  await expect(page).toHaveURL(/login/);
});

This is not bad. In fact, it is a useful starting point. The selectors are mostly accessibility-based, the test uses environment variables, and the assertions are understandable.

But the moment we move from generated text to a real repository, the hidden work begins.

The setup work Claude does not eliminate

Before this test runs, someone still needs to create or maintain:

  • playwright.config.ts
  • Environment variable loading
  • Base URL configuration
  • Browser project configuration
  • Test data strategy
  • Authentication state reuse, if desired
  • CI pipeline integration
  • HTML reports, traces, screenshots, and videos
  • Retry policy
  • Test isolation rules
  • Cleanup logic for created projects and invites

A minimal Playwright setup might look like this:

import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  retries: process.env.CI ? 2 : 0,
  use: {
    baseURL: process.env.BASE_URL || 'https://app.example.test',
    trace: 'retain-on-failure',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure'
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } }
  ]
});

This is normal engineering work, but it is not the same thing as describing a scenario and receiving a ready-to-run test inside a complete testing platform.

Claude can help write the config, but then you are prompting for the config. If the test fails, you prompt for debugging. If CI needs artifacts, you prompt again. If authentication should be stored, you prompt again. The AI becomes a coding assistant, not a test creation agent.
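One small piece of that glue is failing fast when configuration is missing, instead of letting a test die halfway through a login form. The following is a minimal Python sketch under stated assumptions: the variable names QA_EMAIL, QA_PASSWORD, and BASE_URL come from the prompt and config above, while `SuiteSettings` and `load_suite_settings` are hypothetical names invented for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Placeholder host used throughout this article, not a real environment.
DEFAULT_BASE_URL = "https://app.example.test"


@dataclass(frozen=True)
class SuiteSettings:
    """Settings every test in the suite depends on."""
    base_url: str
    email: str
    password: str


def load_suite_settings(env: Optional[Mapping[str, str]] = None) -> SuiteSettings:
    """Read settings from the environment and fail fast on missing secrets."""
    env = os.environ if env is None else env
    missing = [name for name in ("QA_EMAIL", "QA_PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    # Fall back to the placeholder host and drop any trailing slash so that
    # later path joins such as base_url + "/login" stay predictable.
    base_url = (env.get("BASE_URL") or DEFAULT_BASE_URL).rstrip("/")
    return SuiteSettings(base_url=base_url, email=env["QA_EMAIL"], password=env["QA_PASSWORD"])
```

Passing a plain dictionary instead of the real environment keeps the helper easy to check in isolation. It is the kind of detail a generated test silently assumes someone else has handled.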

The selector problem

Playwright’s locator model is excellent, and the official Playwright documentation encourages resilient locators such as roles, labels, text, and test IDs. Claude will often use these if prompted.

The problem is that Claude does not automatically know your actual DOM, accessibility tree, component behavior, or design system conventions unless you provide them. It may guess labels that are close but wrong:

await page.getByLabel('Email address').fill('qa-invitee@example.test');

But the real app might use:

```html
<input name="inviteEmail" placeholder="name@company.com" aria-label="Invitee email" />
```

Then the test fails. You inspect the DOM, copy markup into Claude, ask for a fix, rerun the test, and repeat.

A more robust locator might become:

```typescript
await page.getByRole('textbox', { name: /invitee email/i }).fill('qa-invitee@example.test');
```

Or, if the app has test IDs:

await page.getByTestId('invite-member-email').fill('qa-invitee@example.test');

That works, but notice who is doing the real work: the engineer. Claude is accelerating syntax generation, but the human still validates the UI model, locator strategy, and failure modes.
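The validation step can be made more systematic by writing the locator preference down as code. The sketch below uses only the Python standard library and encodes one possible priority order (test ID first, then accessible label, then placeholder, then the name attribute). That order is our assumption, not a rule from Playwright or Selenium, and `suggest_locator` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple


class _FirstTagAttrs(HTMLParser):
    """Collect the attributes of the first start tag in an HTML snippet."""

    def __init__(self) -> None:
        super().__init__()
        self.attrs: Optional[Dict[str, str]] = None

    def handle_starttag(self, tag, attrs):
        # Only the first tag matters; later tags are ignored.
        if self.attrs is None:
            self.attrs = {name: (value or "") for name, value in attrs}


def suggest_locator(html_snippet: str) -> Tuple[str, str]:
    """Return a (strategy, value) pair for the first element in the snippet."""
    parser = _FirstTagAttrs()
    parser.feed(html_snippet)
    attrs = parser.attrs or {}
    # Assumed priority: explicit test IDs, then accessible names, then placeholders.
    for attr, strategy in (
        ("data-testid", "test_id"),
        ("aria-label", "label"),
        ("placeholder", "placeholder"),
    ):
        if attrs.get(attr):
            return strategy, attrs[attr]
    if attrs.get("name"):
        return "css", f"[name='{attrs['name']}']"
    raise ValueError("no stable attribute found; consider adding a test ID")
```

For the invite field shown earlier, this returns the label strategy with the value "Invitee email", which matches the role-based fix above. The helper only ranks attributes that are already in the markup. Someone still has to fetch the real DOM and decide whether the ranking suits the design system.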

The wait problem

Playwright’s auto-waiting reduces many classic Selenium timing issues, but it does not remove application-level uncertainty. For example, after sending an invite, the table may update only after an API response, a websocket event, or a background job.

Claude might generate:

await page.getByRole('button', { name: 'Send invite' }).click();
await expect(page.getByRole('row', { name: /qa-invitee@example\.test/i })).toBeVisible();

If the UI shows a toast first and updates the table later, this might be fine. If the invite appears after a refetch that occasionally takes longer, the test becomes flaky. You may need a better assertion sequence:

await page.getByRole('button', { name: 'Send invite' }).click();
await expect(page.getByText(/invitation sent/i)).toBeVisible();

const membersTable = page.getByRole('table', { name: /members/i });

await expect(membersTable.getByRole('row', { name: /qa-invitee@example\.test/i })).toBeVisible({
  timeout: 15000
});

That is good Playwright code, but it emerges from knowing how the product behaves. Claude can revise the test after you explain the failure, but the trial-and-error loop remains.
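Under the hood, a longer assertion timeout is a polling loop: re-check a condition until it holds or a deadline passes. The following is a framework-independent sketch of that idea in plain Python. It is not how Playwright or Selenium implement their waits internally, and `poll_until` is a hypothetical name.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    probe: Callable[[], T],
    timeout: float = 15.0,
    interval: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call probe() until it returns a truthy value or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        result = probe()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)
```

The probe is always called at least once, so a condition that is already true returns immediately. The loop itself is simple. Choosing the right probe and timeout is the hard part, and that depends on knowing whether the table updates after an API response, a websocket event, or a background job.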

Attempt 2: Claude with Selenium

Selenium is a different experience. The API is more verbose, waits are usually more explicit, and framework structure matters more. The official Selenium documentation is thorough, but there are many ways to structure a suite, especially across Java, Python, C#, and JavaScript.

A prompt for Selenium might be:

Write a Selenium Python test for the same scenario.
Use pytest, explicit waits, environment variables, and stable selectors.

A plausible generated test:

import os
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def test_create_project_and_invite_member(driver):
    wait = WebDriverWait(driver, 15)
    project_name = f"QA Project {int(time.time())}"

    driver.get(os.getenv("BASE_URL", "https://app.example.test") + "/login")

    wait.until(EC.visibility_of_element_located((By.NAME, "email"))).send_keys(os.environ["QA_EMAIL"])
    driver.find_element(By.NAME, "password").send_keys(os.environ["QA_PASSWORD"])
    driver.find_element(By.XPATH, "//button[normalize-space()='Sign in']").click()

    wait.until(EC.visibility_of_element_located((By.XPATH, "//h1[normalize-space()='Dashboard']")))

    driver.find_element(By.XPATH, "//button[normalize-space()='New project']").click()
    wait.until(EC.visibility_of_element_located((By.NAME, "projectName"))).send_keys(project_name)
    driver.find_element(By.XPATH, "//button[normalize-space()='Create project']").click()

    wait.until(EC.visibility_of_element_located((By.XPATH, f"//h1[normalize-space()='{project_name}']")))

    driver.find_element(By.LINK_TEXT, "Members").click()
    wait.until(EC.element_to_be_clickable((By.XPATH, "//button[normalize-space()='Invite member']"))).click()
    wait.until(EC.visibility_of_element_located((By.NAME, "inviteEmail"))).send_keys("qa-invitee@example.test")
    driver.find_element(By.XPATH, "//button[normalize-space()='Send invite']").click()

    wait.until(EC.visibility_of_element_located((By.XPATH, "//tr[contains(., 'qa-invitee@example.test')]")))

Again, this is a useful draft. But Selenium exposes the AI assistant problem even more clearly.

The missing fixture layer

The test assumes a driver fixture exists. Claude might generate one if asked:

import pytest
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

@pytest.fixture
def driver():
    options = Options()
    options.add_argument("--headless=new")
    browser = webdriver.Chrome(options=options)
    browser.set_window_size(1440, 1000)
    yield browser
    browser.quit()

Now you have the beginning of a framework. But real teams soon need:

  • Remote browser execution
  • Grid or cloud provider configuration
  • Download handling
  • Screenshot capture on failure
  • Logging
  • Page objects or screen objects
  • Test data cleanup
  • Parallel execution safety
  • Browser capabilities per environment
  • CI secrets management

Claude can generate all of those pieces, but they arrive as code fragments. Someone has to design the architecture and own the maintenance.
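Test data cleanup is a good example of an architectural decision hiding behind a one-line bullet. One common pattern is to register the cleanup at the moment the data is created, so it runs even when the test body fails. The sketch below uses `contextlib.ExitStack` from the Python standard library. `FakeProjectApi` is a hypothetical in-memory stand-in for whatever API client a real suite would use, and `run_with_cleanup` is an illustrative name.

```python
from contextlib import ExitStack
from typing import Callable, Set


class FakeProjectApi:
    """Hypothetical in-memory stand-in for a real project API client."""

    def __init__(self) -> None:
        self.projects: Set[str] = set()

    def create(self, name: str) -> str:
        self.projects.add(name)
        return name

    def delete(self, name: str) -> None:
        self.projects.discard(name)


def run_with_cleanup(api: FakeProjectApi, project_name: str, test_body: Callable[[str], object]):
    """Create a project, run the test body, and always delete the project."""
    with ExitStack() as cleanup:
        api.create(project_name)
        # Registered callbacks run when the block exits, including on exceptions.
        cleanup.callback(api.delete, project_name)
        return test_body(project_name)
```

In a pytest suite the same idea usually lives in a fixture with a `yield`, but the ownership question is identical: someone has to decide where cleanup is registered and what happens when the cleanup call itself fails.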

The XPath drift problem

Claude often falls back to XPath in Selenium because it is expressive and widely supported. Some XPath is perfectly acceptable. But AI-generated Selenium tests often accumulate brittle selectors like:

driver.find_element(By.XPATH, "//div[@class='modal']//button[2]").click()

or:

wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='root']/div/main/div[2]/div/button"))).click()

These are maintenance traps. They might pass today, then fail after a harmless layout refactor.
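A team can catch some of these traps in code review with a small lint. The following is a heuristic sketch, not a real XPath parser: the three warning rules and the depth threshold of four steps are assumptions chosen for illustration, and `xpath_warnings` is a hypothetical helper.

```python
import re
from typing import List

_POSITIONAL_INDEX = re.compile(r"\[\d+\]")   # e.g. button[2]
_EXACT_CLASS = re.compile(r"@class\s*=")     # e.g. [@class='modal']
_MAX_STEPS = 4                               # assumed threshold for a "deep" path


def xpath_warnings(xpath: str) -> List[str]:
    """Return heuristic warnings for an XPath that is likely to be brittle."""
    warnings: List[str] = []
    if _POSITIONAL_INDEX.search(xpath):
        warnings.append("positional index")
    # Naive step count: split on slashes and ignore the empty pieces from '//'.
    # A slash inside a predicate string would inflate this count.
    steps = [step for step in xpath.split("/") if step]
    if len(steps) >= _MAX_STEPS:
        warnings.append("deep path")
    if _EXACT_CLASS.search(xpath):
        warnings.append("exact class match")
    return warnings
```

Run against the two selectors above, the first is flagged for its positional index and exact class match, and the second for its positional index and deep path. A text-based selector such as `//button[normalize-space()='Sign in']` passes cleanly.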

An experienced SDET would push the app toward stable attributes:

```html
<button data-testid="create-project-submit">Create project</button>
```

Then use:

```python
driver.find_element(By.CSS_SELECTOR, "[data-testid='create-project-submit']").click()
```

But this is not Claude solving test creation. This is the engineering organization adopting testability conventions, then asking Claude to produce code that follows them.

The prompt tax

The most underestimated cost of AI test creation with Claude plus Playwright or Selenium is the prompt tax.

A generic LLM needs context. For each meaningful test, you may need to provide:

  • The scenario
  • The framework preference
  • The project structure
  • The app’s locator conventions
  • Existing helper functions
  • Authentication strategy
  • Test data rules
  • Assertion style
  • Retry policy
  • CI constraints
  • Known flaky areas
  • Browser compatibility requirements
  • How to capture diagnostics
  • What not to do

A more realistic prompt quickly becomes long:

Generate a Playwright TypeScript test using our existing fixtures.
Use test.step for readability.
Use data-testid when available, otherwise use role locators.
Do not use arbitrary timeouts or waitForTimeout.
Use createUniqueProjectName from ../helpers/data.
Use authenticatedPage fixture instead of logging in through the UI.
After creating the project, clean it up through the API helper.
Use expect.poll only for async backend updates.
Follow our naming convention: feature_area.spec.ts.
Return only the test body.

This can work well for a developer who already knows the automation stack. But for a QA manager or product team member who wants to describe behavior and get coverage, it is too much machinery.

This is where Endtest’s AI Test Creation Agent is meaningfully different. The user describes behavior, and the agent generates editable Endtest steps inside a platform built for running and maintaining tests. The person creating the test does not have to decide whether expect.poll, WebDriverWait, a page object, a fixture, a trace setting, or a CSS selector is the right implementation detail.

The more context a prompt needs, the closer you are to coding assistance and the farther you are from productized test creation.

Debugging: where the generic LLM loop slows down

When a generated Playwright or Selenium test fails, the next step is not simply “ask Claude again.” You need to collect useful evidence.

For Playwright, that may mean:

npx playwright test tests/project-invite.spec.ts --headed --trace on

Then you inspect the trace:

npx playwright show-trace test-results/project-invite/trace.zip

For Selenium, you might add screenshot capture in pytest:

import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        driver = item.funcargs.get("driver")
        if driver:
            driver.save_screenshot(f"artifacts/{item.name}.png")

These are good practices. They are also signs that you are now building a test automation platform around the AI output.

The actual loop looks like this:

  1. Generate test.
  2. Run test.
  3. Failure occurs.
  4. Open trace, screenshot, logs, DOM snapshot, or CI artifact.
  5. Infer whether the issue is selector, timing, state, data, environment, or app bug.
  6. Prompt Claude with the failure details.
  7. Apply patch.
  8. Rerun.
  9. Repeat until stable.

For an SDET, this loop is familiar. For a team expecting an AI test creation agent, it is slower than it looks in demos.
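Step 5 of that loop, inferring the failure category, is often done by eye. A first-pass triage can be sketched as a lookup on the exception name in the failure text. The exception names below are real Selenium Python exception classes, but the mapping to categories is our own rough heuristic, and `triage` is a hypothetical helper.

```python
from typing import Dict

# Heuristic mapping from Selenium exception names to a likely failure category.
# The categories are a starting hypothesis, not a diagnosis.
TRIAGE_HINTS: Dict[str, str] = {
    "NoSuchElementException": "selector",
    "InvalidSelectorException": "selector",
    "TimeoutException": "timing",
    "StaleElementReferenceException": "timing",
    "ElementClickInterceptedException": "state",
    "ElementNotInteractableException": "state",
    "SessionNotCreatedException": "environment",
}


def triage(error_text: str) -> str:
    """Return a likely failure category based on the exception name in the text."""
    for exception_name, category in TRIAGE_HINTS.items():
        if exception_name in error_text:
            return category
    return "unknown"
```

A timeout can also be caused by a wrong selector, which is why the result is only a hint for what to inspect first. Including the category in the follow-up prompt can give Claude a narrower problem to reason about.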

Browser state and authentication are not side details

Many AI-generated tests start with UI login because it is easy to describe. But mature suites often avoid logging in through the UI for every test. In Playwright, you might store authenticated state:

import { test as setup, expect } from '@playwright/test';

setup('authenticate', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill(process.env.QA_EMAIL!);
  await page.getByLabel('Password').fill(process.env.QA_PASSWORD!);
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
  await page.context().storageState({ path: 'playwright/.auth/user.json' });
});

Then configure tests to reuse it:

use: {
  storageState: 'playwright/.auth/user.json'
}

This improves speed and reliability, but it introduces lifecycle questions:

  • When does the auth state expire?
  • Is the user shared across parallel tests?
  • What happens if a test mutates account settings?
  • Should setup run per environment?
  • How are secrets stored in CI?

Claude can suggest a pattern. It cannot decide your product’s isolation model without context.
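The first lifecycle question, when the auth state expires, can at least be checked mechanically. The sketch below assumes the saved file follows Playwright's storage state layout as we understand it: a top-level `cookies` array whose entries carry an `expires` value in Unix seconds, with -1 marking session cookies. The five-minute safety margin and the `auth_state_is_fresh` name are our own choices.

```python
import json
import time
from pathlib import Path
from typing import Optional


def auth_state_is_fresh(
    path: str,
    min_remaining_seconds: float = 300.0,
    now: Optional[float] = None,
) -> bool:
    """Return False if the file is missing or any persistent cookie expires too soon."""
    state_file = Path(path)
    if not state_file.exists():
        return False
    now = time.time() if now is None else now
    state = json.loads(state_file.read_text(encoding="utf-8"))
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        # -1 marks a session cookie with no fixed expiry, so it is skipped.
        if expires != -1 and expires - now < min_remaining_seconds:
            return False
    return True
```

This check is deliberately conservative and incomplete. It treats every cookie as relevant, it cannot see server-side session revocation, and it says nothing about tokens kept in local storage. A real suite would likely filter by the session cookie's name, which is exactly the product-specific context a generic model does not have.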

The same issue appears in Selenium, often with cookies or API-based login helpers. The test code is only the visible tip of a larger reliability design.

Reports, retries, and maintenance are part of the product

A generated test is valuable only if the team can operate it. That means reports, failure diagnostics, ownership, and maintenance workflows.

With Playwright, you can configure reports:

reporter: process.env.CI
  ? [['html'], ['junit', { outputFile: 'results/junit.xml' }]]
  : [['list'], ['html']]

With pytest and Selenium, you may add plugins, JUnit XML, screenshots, and CI artifact upload:

pytest tests/ --junitxml=results/junit.xml

Again, these are solvable problems. But they are not solved by the first AI-generated test.
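Producing the report file is also only half of the job, because someone still has to turn it into something the team reads. The following is a minimal sketch of summarizing failures from JUnit-style XML with the Python standard library. It assumes the common `testcase` / `failure` / `error` element layout, and `failed_tests` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def failed_tests(junit_xml: str) -> List[Tuple[str, str]]:
    """Return (test name, message) pairs for failed or errored test cases."""
    root = ET.fromstring(junit_xml)
    failures: List[Tuple[str, str]] = []
    for case in root.iter("testcase"):
        problem = case.find("failure")
        # Compare against None explicitly: an Element with no children is falsy.
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            failures.append((case.get("name", ""), problem.get("message", "")))
    return failures
```

To use it on the report generated above, read the file first, for example with `Path("results/junit.xml").read_text()`. Posting the result to chat, linking screenshots, and tracking flaky tests over time are further pieces that a platform usually bundles and a code-first suite has to build.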

This distinction matters commercially. If you are evaluating AI-generated test automation, do not compare only the first generated script. Compare the complete path from scenario to maintainable test asset.

Questions to ask:

  • Can a non-developer review and edit the generated test?
  • Does the output live in a shared test management surface?
  • Are locators generated and maintained in a test-aware way?
  • Are reports available without custom CI work?
  • Does the platform help when UI changes break selectors?
  • Can the team schedule and run tests across browsers without managing infrastructure?
  • Can the same platform support related validation needs such as cross-browser testing, accessibility testing, or Visual AI?

This is where Endtest is a stronger fit for teams that want AI test creation, not just AI-assisted coding. Beyond the AI Test Creation Agent, its platform capabilities include no-code testing, self-healing tests, cross-browser testing, accessibility testing, and Visual AI.

What Claude did well

The experiment was not a failure for Claude. It did several things well:

1. Fast first drafts

Claude can quickly produce a plausible Playwright or Selenium test. For experienced automation engineers, this is useful. It reduces blank-page time.

2. Syntax translation

If you have a Selenium Java example and want a Playwright TypeScript equivalent, Claude can often produce a reasonable conversion. The same applies for refactoring raw Selenium into page objects or moving repeated Playwright flows into helpers.

3. Explaining failures

If you provide an error message, relevant code, and DOM snippet, Claude can often identify likely causes. For example, it may explain that an element is inside an iframe, hidden behind a dialog, or matched by multiple locators.

4. Generating framework glue

Claude can write fixtures, helper functions, cleanup hooks, and CI YAML. This is useful for SDETs who already understand the desired architecture.

5. Suggesting better locators

When given HTML, Claude can suggest more resilient locators. It is especially helpful when converting brittle XPath into role-based Playwright locators or CSS selectors using stable attributes.

Where Claude struggled as an AI test creation agent

The problems were not about grammar or code formatting. They were about productized test creation.

1. It guessed application details

Without live inspection or enough DOM context, Claude guessed labels, routes, roles, table structures, and messages. Some guesses were close. Close is still failing automation.

2. It produced code, not a maintained test asset

A source file is not automatically a usable QA workflow. Someone has to put it in the right folder, connect it to fixtures, configure execution, and maintain conventions.

3. It needed repeated prompting

Every missing detail became another prompt. The interaction felt less like “create a test” and more like pair programming with a fast but context-limited assistant.

4. It overfit to the visible prompt

If the prompt said “use stable locators,” Claude tried. If the prompt forgot to mention cleanup, cleanup was often absent. If the prompt omitted CI reports, reports were not handled.

5. It did not own reliability

Retries, screenshots, traces, self-healing behavior, visual assertions, and reporting were separate tasks. The AI could generate pieces, but the user had to assemble and operate them.

The key architectural difference: code generation vs test generation

The clearest lesson from this experiment is that “AI writes browser automation code” and “AI creates a test” are not the same capability.

A Playwright or Selenium script is code. It needs a repository, runtime, dependencies, CI, reporting, maintenance patterns, and skilled ownership.

An Endtest AI Test Creation Agent test is an editable test inside an agentic AI, low-code/no-code test automation platform. The AI output becomes regular Endtest steps that can be inspected, changed, executed, and managed by the team. That is a different abstraction level.

This matters for mixed teams. Developers may be comfortable reviewing TypeScript or Python. Manual QA testers, support engineers, product managers, and founders may not be. Editable test steps lower the collaboration barrier while still producing executable coverage.

It also matters for maintenance. If a generated Selenium selector fails, you debug code. If a generated Playwright locator is wrong, you patch code. In a platform-first workflow, locator management, run history, screenshots, and test editing live together.

The real comparison is not Claude versus Endtest. It is code generation versus a complete test creation and execution workflow.

A practical comparison matrix

| Evaluation area | Claude + Playwright | Claude + Selenium | Endtest AI Test Creation Agent |
| --- | --- | --- | --- |
| First draft speed | Fast | Moderate | Fast |
| Output format | TypeScript or JavaScript code | Python, Java, C#, or JavaScript code | Editable Endtest steps |
| Framework setup | Required | Required | Built into platform |
| Locator accuracy | Depends on prompt and DOM context | Depends on prompt and DOM context | Purpose-built test creation workflow |
| Wait strategy | Strong framework support, still needs app knowledge | Explicit and verbose | Platform-managed test workflow |
| Debugging | Traces, screenshots, logs, configured by team | Screenshots and logs, configured by team | Built into platform workflow |
| Non-developer editing | Limited | Limited | Stronger, step-based editing |
| Maintenance burden | Engineering-owned | Engineering-owned | Reduced by platform abstraction |
| Best fit | Code-first SDET teams | Existing Selenium-heavy teams | Teams wanting faster AI-assisted test creation with less framework work |

This table is intentionally qualitative. Your exact experience will depend on your app, testability, team skill, and tooling constraints. But the pattern is consistent: generic LLMs help write automation code, while a specialized platform changes the workflow.

When Claude plus Playwright or Selenium is still a good choice

There are valid reasons to choose the code-first route.

Claude plus Playwright can be a strong fit if:

  • Your team already has a Playwright framework.
  • Developers own most end-to-end tests.
  • You want tests reviewed like application code.
  • You need custom fixtures, API setup, or deep integration with internal tooling.
  • You are comfortable maintaining CI, reports, and browser infrastructure.

Claude plus Selenium can be a strong fit if:

  • You already have a large Selenium suite.
  • Your organization standardizes on Java, Python, or C#.
  • You rely on existing WebDriver infrastructure.
  • You need compatibility with established enterprise tooling.
  • Your team has mature Selenium patterns and page objects.

In these cases, Claude is a productivity enhancer. It can draft, refactor, explain, and translate. Just do not confuse that with a turnkey AI test creation agent.

When Endtest is the better choice

Endtest is the better fit when the goal is to turn behavior descriptions into maintainable tests without forcing the team into the slow code-generation loop. Because the agent works inside the platform, the generated output is editable platform-native steps rather than source files your team must operationalize.

That is especially true when:

  • QA engineers and non-developers need to create or edit tests.
  • The team does not want to maintain Playwright or Selenium framework plumbing.
  • Browser infrastructure and reporting should be part of the platform.
  • Test creation speed matters, but so does long-term maintainability.
  • You want generated tests to be editable as normal test steps, not raw source code.
  • You are evaluating AI test automation as a product capability, not just as a coding assistant.

The commercial distinction is important. A generic LLM can make an automation engineer faster. Endtest can make test creation itself more accessible and operationally complete.

For teams that also care about compliance and inclusive design, it is worth reviewing Endtest’s accessibility testing product page, accessibility testing documentation, and the WCAG standards. For AI test creation details, see the Endtest AI Test Creation Agent documentation.

If your team is comparing these options, run a small proof of concept with the same scenarios across all tools.

Use three flows:

  1. A happy-path authentication and dashboard test.
  2. A multi-step CRUD flow with dynamic data.
  3. A workflow with asynchronous UI updates, such as invitations, imports, or background processing.

For Claude plus Playwright or Selenium, track:

  • Number of prompts required before the test passes.
  • Number of manual code edits.
  • Time spent on selectors.
  • Time spent on waits and timing issues.
  • Time spent on framework setup.
  • Time spent on reports and CI.
  • Whether a non-developer can review the result.

For Endtest, track:

  • Time from plain-English scenario to runnable test.
  • Quality and editability of generated steps.
  • Locator stability.
  • Ease of debugging failures.
  • How well the generated test fits into the broader suite.
  • Whether QA and product stakeholders can understand the test without reading code.

Do not evaluate only the first green run. Change a label, move a button, adjust a table, or alter a loading state. Maintenance is where test automation tools reveal their real cost.
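To keep the comparison honest, it helps to record the same fields for every flow and tool as you go. The following is a small bookkeeping sketch. The field names mirror a subset of the tracking lists above, `FlowResult` and `totals_by_tool` are hypothetical names, and the code contains no measurements, since those have to come from your own proof of concept.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class FlowResult:
    """One row of proof-of-concept bookkeeping for a single flow and tool."""
    tool: str
    flow: str
    prompts: int
    manual_edits: int
    minutes_to_green: float


def totals_by_tool(results: Iterable[FlowResult]) -> Dict[str, Dict[str, float]]:
    """Sum prompts, manual edits, and minutes to a passing run for each tool."""
    summary: Dict[str, Dict[str, float]] = {}
    for result in results:
        totals = summary.setdefault(
            result.tool,
            {"prompts": 0, "manual_edits": 0, "minutes_to_green": 0.0},
        )
        totals["prompts"] += result.prompts
        totals["manual_edits"] += result.manual_edits
        totals["minutes_to_green"] += result.minutes_to_green
    return summary
```

Recording a second set of rows after the deliberate UI changes gives a maintenance total that can be compared directly with the creation total.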

Final verdict

Claude with Playwright is impressive for generating modern browser automation code. Claude with Selenium is useful for teams that already live in the WebDriver ecosystem. Both approaches can save time for skilled automation engineers.

But recreating the Endtest AI Test Creation Agent with Claude plus Playwright or Selenium is not just a matter of writing better prompts. The generic LLM workflow leaves the user responsible for selectors, waits, framework structure, browser state, retries, screenshots, reports, debugging, CI, and maintenance. That is a lot of hidden labor.

Endtest is doing something more specialized: it uses AI inside an agentic AI, low-code/no-code test automation platform, generating editable Endtest steps rather than handing you raw source code to operationalize. For teams evaluating Claude, Playwright, Selenium, and AI test creation, that distinction should be central.

If your goal is to help SDETs write code faster, Claude plus Playwright or Selenium is worth exploring. If your goal is faster, more reliable AI test creation with less framework overhead, Endtest is the stronger and more purpose-built approach.