
Generative Automation Testing with Playwright MCP Server

As a test engineer, you cannot ignore AI trends and hyped tools. You have to give them a try.

6 min read · May 31, 2025
GitHub Copilot agent creates an autotest through the Playwright MCP server

A while ago, Playwright introduced Playwright MCP server — a tool for generating end-to-end automation tests through a vibe coding experience.

I decided to give it a try with several intentions:

  • Get to know the tool as quickly as possible — go straight from installation to working code, without diving into details;
  • Check my use case, as if I were using this tool for real work: convert an existing manual test case (user scenario) into an automated test;
  • Find out whether generated end-to-end tests are worthwhile.

What is MCP?

MCP (Model Context Protocol) is a universal interpreter between LLMs and real-world applications (in this case, Playwright). If you need to solve a well-defined problem (here, testing a web application with Playwright), MCP lets you give the LLM tasks focused on your goals, without stuffing the prompt with explanations of your tooling; the LLM can then control the application as an AI agent.

So, the Playwright MCP server handles browser integrations, chooses locators, and writes tests, as it gains access to the page’s code and already knows how Playwright works, among other things.
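
For illustration only (this exchange is not shown in the article; the tool name and payload are an assumption based on the publicly documented @playwright/mcp tool list), a single step of a scenario roughly turns into an MCP tool call like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "browser_navigate",
    "arguments": { "url": "https://visit.cern/shop" }
  }
}

The agent issues such calls step by step, reads back the page snapshot, and only then emits the Playwright code.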

Installing Playwright MCP

Before you start, you need to prepare the tooling. In my example, I have VS Code with GitHub Copilot and Node.js (for npx) installed.

First, you need to set up Playwright MCP within VS Code. This can be done by adding the MCP server config to settings.json (Settings → Open Settings (JSON), the icon in the top right corner):

{
  "chat.mcp.enabled": true,
  "mcp": {
    "servers": {
      "playwright": {
        "command": "npx",
        "args": ["@playwright/mcp@latest"]
      }
    }
  }
}
Fig. 1. This config also enables the use of MCP in the chat

Once you have added an MCP server, you can start it and see the log in the output (CMD+SHIFT+U).

Fig. 2. Running Playwright MCP server in VS Code

Then, the Playwright MCP server will be available for use within GitHub Copilot agent mode in VS Code.

Generating autotests

Before giving tasks to the LLM, you still need a basic prompt. I got one from Debbie O’Brien’s example:

- You are a playwright test generator.
- You are given a scenario and you need to generate a playwright test for it.
- DO NOT generate test code based on the scenario alone.
- DO run steps one by one using the tools provided by the Playwright MCP.
- Only after all steps are completed, emit a Playwright TypeScript test that uses @playwright/test based on message history
- Save generated test file in the tests directory
- Execute the test file and iterate until the test passes

You need to add this prompt to the chat’s context, and it will be reused during the subsequent conversation with the LLM.

Fig. 3. Adding prompt to the context

Now, you can give the LLM (in my case, Copilot) a task to write a test in natural language.

I took an ordinary website — CERN’s shop — and came up with a pretty common test case:

  1. Navigate to https://visit.cern/shop
  2. Filter shop items by category = Postcards
  3. Verify that «Atlas postcard» item is filtered
  4. Open «Atlas postcard» item
  5. Verify that the price is 1 CHF

Giving Copilot a task to write a test based on the presented test case will trigger commands, which the agent will ask your permission to execute. For simplicity, you can auto-approve these actions in the VS Code settings: "chat.tools.autoApprove": true
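
If you enable it, the settings from Fig. 1 would look roughly like this (note that, as far as I can tell, this flag auto-approves all chat tool actions, not just Playwright’s, so use it with care):

{
  "chat.mcp.enabled": true,
  "chat.tools.autoApprove": true,
  "mcp": {
    "servers": {
      "playwright": {
        "command": "npx",
        "args": ["@playwright/mcp@latest"]
      }
    }
  }
}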

Fig. 4. Ready to send prompt

When it runs, the Playwright MCP server launches the browser, examines the page’s code, determines what to do, writes the test accordingly, and provides you with a file containing the autotest. It not only creates the test file but also offers a command to execute it immediately.

Fig. 5. Playwright MCP agent execution. You can notice two unsuccessful commands — the LLM tried a few variants and eventually found the right way to pass the given scenario

Executing the suggested npx command also works for me (for some reason, it ran all the tests). At the same time, when I ran the autotest directly in the terminal, it passed on the first run — I am impressed!
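
If you want to run only the newly generated test rather than the whole suite, you can point Playwright at that single file; the file name below is hypothetical — use whatever name the agent saved into the tests directory:

npx playwright test tests/atlas-postcard.spec.ts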

Fig. 6. Generated autotest passed

Example of generated test:

import { test, expect } from '@playwright/test';

test('Atlas postcard price is 1 CHF after filtering by Postcards', async ({
  page,
}) => {
  // 1. Navigate to https://visit.cern/shop
  await page.goto('https://visit.cern/shop');

  // 2. Filter shop items by category = Postcards
  await page.getByLabel('Category').selectOption('Postcards');
  await page.getByRole('button', { name: 'Filter' }).click();

  // 3. Verify that «Atlas postcard» item is filtered
  await expect(
    page.getByRole('heading', { name: 'Atlas postcard' }),
  ).toBeVisible();

  // 4. Open «Atlas postcard» item
  // The click is intercepted, so navigate directly
  await page.goto('https://visit.cern/node/609');

  // 5. Verify that the price is 1 CHF
  await expect(page.getByText('Price')).toBeVisible();
  await expect(page.getByText('1 CHF')).toBeVisible();
});

Of course, there is plenty of room for improvement in the generated test (a possible revision is sketched after the list):

  • Split the whole test into steps;
  • Add custom expect messages;
  • Add more intermediate expectations during the test;
  • A human test engineer can definitely find a way to click on the item instead of navigating to it directly (this is the AI’s main mistake);
  • Rewrite the verification of the price.
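
A minimal sketch of such a revision, assuming the same page structure as the generated test (the item-link locator and the price assertion below are my assumptions, not verified against the live page):

import { test, expect } from '@playwright/test';

test('Atlas postcard price is 1 CHF after filtering by Postcards', async ({ page }) => {
  await test.step('Open the shop and filter by Postcards', async () => {
    await page.goto('https://visit.cern/shop');
    await page.getByLabel('Category').selectOption('Postcards');
    await page.getByRole('button', { name: 'Filter' }).click();
  });

  await test.step('Verify that the Atlas postcard is in the filtered results', async () => {
    await expect(
      page.getByRole('heading', { name: 'Atlas postcard' }),
      'Atlas postcard should be listed after filtering by Postcards',
    ).toBeVisible();
  });

  await test.step('Open the Atlas postcard item by clicking it', async () => {
    // Assumption: the item heading is (or contains) a link to the item page
    await page.getByRole('link', { name: 'Atlas postcard' }).click();
    await expect(
      page.getByRole('heading', { name: 'Atlas postcard' }),
      'Item page should show the Atlas postcard heading',
    ).toBeVisible();
  });

  await test.step('Verify that the price is 1 CHF', async () => {
    // Assumption: the price is rendered as a single "1 CHF" text node
    await expect(
      page.getByText('1 CHF'),
      'Atlas postcard should cost 1 CHF',
    ).toBeVisible();
  });
});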

Anyway, the test works; most elements are located by roles and labels, and the verification steps have expectations. For a person who is not a professional test automation engineer, this test may appear fine, because it solves the problem of increasing test coverage for regression testing.

If I tested this «server» for a longer time, on a large number of different and complicated scenarios, I would face issues. However, I do not see any obstacles to using this AI tool if the test cases are simple enough, the project is not too serious, and the team does not have dedicated automation test engineers.

In conclusion, I answered all my questions:

  • This tool works almost out of the box and requires minimal setup at the start;
  • It works for my use case — it can convert user scenarios into autotests;
  • These autotests are mediocre, but surprisingly, they are workable.

Just half a year ago, LLMs were generating awful tests. They looked fine, but in practice, they would not run and required a lot of manual refactoring. However, it is a whole different story now.

  • AI still will not replace the test automation engineer, but the same engineer may now perform twice as well, because editing tests is easier than writing them from scratch.
  • Manual testers, who never write code but write test cases, can now turn their cases into somewhat working autotests. This is a good starting point for anyone who wants to develop their skills in automation testing but has hesitated or does not know how to begin.
  • Frontend developers who do not have enough time and resources to deal with UI testing can now quickly increase the test coverage of their interfaces. Some tests are always better than nothing.

Let’s see where this promising technology will lead us in the near future.

Written by Andrey Enin

Quality assurance engineer: I test web applications and APIs, and do automation testing.
