Writing a bot to play Google's Pani Puri Doodle

Occasionally Google will publish tiny free games as their logo for the day. The "Doodle" on July 12, 2023 was unusally easy to automate.

How does the game work?

This game is a tile-matching game, similar to Same Game, Bejeweled, or Candy Crush. The theme is pani puri, a South Asian street food. Customers visit your stall to make an order, and you must find a cluster of tiles to fulfill it.

The game also has a customer satisfaction meter. It depletes when you make mistakes, and fills when you fill orders correctly. If it fills all the way, you will get a brief 10x score bonus. If it depletes all the way, that's game over.

If you'd like to give it a try, go ahead now! Because we're about to break it, badly.

How can a bot interact with the game?

Often, video games will be made by drawing pixels to a canvas. Bots will then have to resort to scanning pixels, analyzing the screen as an image, and simulating user input from the highest levels. Happily, this game was made by passing HTML elements to the browser. This means that the game state can be read with simple document queries, and user input is trivial to simulate.

The game's markup is structured (more or less) like this:


<div class="ddl-screen-root">
	<div class="ddl-receipt">
		<text class="ddl-receipt-count">1</text>
		<text class="ddl-receipt-a11y">Tamarind Puri</text>
		<text class="ddl-receipt-count">2</text>
		<text class="ddl-receipt-a11y">Potato Puri</text>
	</div>
	<table class="ddl-puri-table">
		<tr>
			<td><button aria-label="Mint Puri">
			<td><button aria-label="Mint Puri">
			<td><button aria-label="Potato Puri">
			...
		</tr>
		<tr>
			<td><button aria-label="Tamarand Puri">
			<td><button aria-label="Mint Puri">
			<td><button aria-label="Potato Puri">
			...
		</tr>
		...
	<table>
</div>
	

So we can write these JS functions to read the game state:


function zip (listA, listB) {
	return Array.from(listA).map((a, i) => [a, listB[i]]);
}

// Returns an array of { count: number, flavor: string } objects.
function getCurrentOrders () {
	return zip(
			document.querySelectorAll(".ddl-receipt-count"),
			document.querySelectorAll(".ddl-receipt-a11y")
		).map(([countElement, flavorElement]) => ({
			count: parseInt(countElement.innerHTML),
			flavor: flavorElement.innerHTML,
		}));
}

// Returns a 2D array of { flavor: string, button: HTMLElement } objects.
function getCurrentGrid () {
	return Array
		.from(document.querySelectorAll(".ddl-puri-table tr"))
		.map(row => Array
			.from(row.querySelectorAll("button"))
			.map(button => ({
				flavor: button.ariaLabel,
				button,
			}))
		);
}
	

And once we know where we want to click, we can do that with a simple button.click().

How can a bot know where to click?

The core of our bot will be the Disjoint Set Union algorithm, which can efficiently find clusters of matching elements in the grid. Each cluster will have a root element, and each element will point (perhaps indirectly) to its cluster's root.

In English, the algorithm will run like this: Start with each element as the root of its own cluster. Then examine each element in the grid, left to right and top to bottom. If it is the same flavor as its neighbor above, point it to that neighbor's root. If it is the same flavor as its neighbor left, point it to that neighbor's root. If it is the same flavor as both neighbors, point its left neighbor's root to its above neighbor's root.

In code, there are a few more details to handle:


// Returns an array of { flavor: string, elements: Object[] } objects
function findClusters (grid) {
	function findRootCoordinates (element) {
		const [x, y] = element.rootCoordinates;
		if (x === element.row && y === element.column) {
			// If an element is pointing to itself,
			// it is the root of a cluster.
			return [x, y];
		}
		// Otherwise, follow wherever it's pointing.
		return findRootCoordinates(grid[x][y]);
	}

	for (let i = 0; i < grid.length; i += 1) {
		for (let j = 0; j < grid[0].length; j += 1) {
			const element = grid[i][j];
			element.row = i;
			element.column = j;
			element.rootCoordinates = [i, j];

			const matchAbove = i > 0 && grid[i-1][j].flavor === element.flavor;
			const matchLeft = j > 0 && grid[i][j-1].flavor === element.flavor;

			if (matchAbove && matchLeft) {
				const aboveRootCoordinates = findRootCoordinates(grid[i-1][j]);
				const [leftRootRow, leftRootCol] = findRootCoordinates(grid[i][j-1]);
				const leftRoot = grid[leftRootRow][leftRootCol];

				element.rootCoordinates = aboveRootCoordinates;
				leftRoot.rootCoordinates = aboveRootCoordinates;
			}
			else if (matchAbove) {
				element.rootCoordinates = findRootCoordinates(grid[i-1][j]);
			}
			else if (matchLeft) {
				element.rootCoordinates = findRootCoordinates(grid[i][j-1]);
			}
		}
	}

	const clusters = new Map();
	for (let element of grid.flat()) {
		const key = findRootCoordinates(element).join("-");
		if (clusters.has(key)) {
			const cluster = clusters.get(key);
			cluster.elements.push(element);
		}
		else {
			const cluster = {
				flavor: element.flavor,
				elements: [element],
			};
			clusters.set(key, cluster);
		}
	}

	return Array.from(clusters.values());
}
	
If that was clear as mud, this animation might help.

Automating one turn

All that's left is to hook these functions together:


function fulfillOrder () {
	const orders = getCurrentOrders();
	const grid = getCurrentGrid();
	const clusters = findClusters(grid);
	for (let order of orders) {
		const cluster = clusters.find(c =>
			c.flavor === order.flavor
			&& c.elements.length === order.count);
		cluster.elements[0].button.click();
	}
}
	

And that's it! Each time we call fulfillOrder(), our bot will scrape the current game state, find a valid move, and make it. We're done!

...Or rather, we could be done. The algorithmically interesting challenge is solved, but I'd like to see how fast this bot can rack up high scores.

Automating a high score

The Loop

It's simple enough to play turns on repeat:


function delayMilliseconds (ms) {
	return new Promise(resolve => setTimeout(resolve, ms));
}

async function play () {
	while (true) {
		await delayMilliseconds(500);
		fulfillOrder();
	}
}
	

With this code, our bot will attempt to fulfill a customer's order every half-second. Unfortunately, it runs into a problem pretty soon: the pause menu. This game will automatically pause after a short period of inactivity, and apparently whatever our bot is doing doesn't count as activity.

The Pause Menu

One option would be to analyze the game code, to figure out what does count as activity, and make sure our bot does some of that. But the game's code consists of 150kB of minified JS, so that's easier said than done.

What's easy to do is automatically click the "Resume" button, if it's currently on the screen. That code looks like this:


function unpauseIfNeeded () {
	const button = document.querySelector(".ddl-game-screen-pause-button-paused");
	if (button) {
		button.click();
	}
}
	

And now our game loop looks like this:


async function play () {
	while (true) {
		await delayMilliseconds(500);
		unpauseIfNeeded();
		fulfillOrder();
	}
}
	

And that's it! This bot will now play the game indefinitely — in fact, there is no way to stop it without leaving the page. If you leave it open all day, it will eventually reach a million points, and more. And if you watch long enough, you'll see very large clusters form, until a customer comes in and orders 27 Mint Puri and they all disappear at once.

What's next?

Well, our bot isn't performing optimally. In between customers, new elements fall into the grid, with an animation that takes time. If that animation takes less than half a second, great. Our bot won't miss a beat. But if it takes 0.6 seconds, our bot will do nothing for a beat.

We could improve this by simply increasing the frequency: say, run the bot every tenth of a second instead. This feels inelegant to me; I would rather only call the bot when I know it has work to do. And since we know what elements we're eliminating with each click, if we knew more about this game's gravity we could estimate expected animation delays pretty accurately. What is the acceleration? Is there a terminal velocity?

There may also be room to improve by selecting clusters more carefully. A horizontal pair of elements will be replaced with a quicker animation than a vertical pair; perhaps if more than one cluster would satisfy an order, we should prefer the more horizontal one?

I'm not sure if there's any merit to this idea. Perhaps by selecting faster clusters now, we're only setting ourselves up for more delay later. And if that's the case, maybe we should instead try to avoid creating large clusters? More analysis is needed here.

Well, that's it.

Maybe someday I'll come back and analyze the inactivity timer, or experiment finer timing for the physics, or try some cluster choice optimization. But for now I'm done. I hope you enjoyed!