Vibe Coding 9 -- AI Agents That Play Taboo

One thing that I've gotten pretty interested in recently is "agent engineering" or "agentic architectures" or "agentic system design" (I'm not sure there's a canonical name for it). Of course there is a lot of interesting work on "how do I use an LLM to solve some contained inference or generation problem."

But there are also some pretty interesting (to me) problems related to "how do I architect a system of autonomous agents toward some end?" What are the patterns for communication, and for orchestration, and for self-organization, and so on?

Every framework takes some point of view on these, but as always I am partial to dspy, which (as I understand it) is architecturally pretty unopinionated beyond "use signatures" and "use modules", especially when it comes to multi-agent coordination. Which is fine; I like the flexibility and challenge of figuring out these designs myself.

Already I've been building a lot of "agentic workflows," but one thing that I hadn't built yet is a real "system of agents" where they interact with each other and do things autonomously (as opposed to moving themselves through pre-determined workflows).


So I started thinking about what might be a fun, impractical example, and I came up with the idea of Taboo:

Taboo


If you're not familiar, it's a word-guessing party game, where there's a word to guess (e.g. "APPLE") and several taboo words (e.g. "RED", "FRUIT", "PIE", "COMPUTER", "MACINTOSH") that you're not allowed to use in your clues. At a party there would typically be two teams, and they would take turns: one team would "clue" and "guess" while the other monitors them and sounds a buzzer if they say any of the taboo words.

If you squint hard enough, this looks like a bunch of "agents":

  • Cluer (generate clues)
  • Buzzer (buzz if any of the clues are taboo)
  • Guesser (try to guess the word)
  • Judge (say if any of the guessers are correct)

What's more, this is not a turn-based game but more of a free-for-all: the cluer keeps shouting out clues while the guessers keep shouting out guesses while the buzzer and judge monitor for winning or losing.

This makes it somewhat interesting as a system of agents; each operates at its own pace rather than in a predetermined sequence.

ChatGPT and I game-planned this out and came up with a pretty complicated design involving multiple message queues and so on.

(Originally, I wanted to make it a collaborative web game where one or more humans could play as cluer or guesser along with the AI, but eventually I simplified it down to just letting agents play amongst themselves.) (Maybe I'll build the web version in the future.)

As usually happens on these projects, I started with ChatGPT's PRD, found the resulting code immensely overengineered, and spent quite a while stripping it down and simplifying it. It's still complicated because of all the async pieces, but I think it's a relatively clean design for what it does.

Design

At its core, the game is a stream of (typed) messages. A menagerie of agents listens to the stream, and each of them can at any time send its own message to the stream.

So, for example, the guessing agent (or human) sends messages that look like

class GuessEvent(BaseEvent):
    role: Literal["guesser"]
    player_id: str
    guess: str
    rationale: Optional[str] = None

and similarly for the other players.
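To make "a stream of typed messages" a bit more concrete, here is a minimal sketch of what the full family of events might look like. Only GuessEvent's fields come from the snippet above; the other event classes, their field names, and the describe helper are hypothetical, and plain dataclasses stand in for whatever BaseEvent really is.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Union


@dataclass(frozen=True)
class ClueEvent:
    clue: str
    role: Literal["cluer"] = "cluer"


@dataclass(frozen=True)
class GuessEvent:
    # Same fields as the GuessEvent shown above.
    player_id: str
    guess: str
    rationale: Optional[str] = None
    role: Literal["guesser"] = "guesser"


@dataclass(frozen=True)
class BuzzEvent:
    # Hypothetical: the real buzzer event may carry different fields.
    clue: str
    role: Literal["buzzer"] = "buzzer"


@dataclass(frozen=True)
class JudgeEvent:
    # Hypothetical: the real judge event may carry different fields.
    player_id: str
    guess: str
    correct: bool
    role: Literal["judge"] = "judge"


Event = Union[ClueEvent, GuessEvent, BuzzEvent, JudgeEvent]


def describe(event: Event) -> str:
    """Render one event as a log line, dispatching on its type."""
    if isinstance(event, ClueEvent):
        return f"[cluer] {event.clue}"
    if isinstance(event, GuessEvent):
        line = f"[guesser {event.player_id}] {event.guess}"
        return f"{line} ({event.rationale})" if event.rationale else line
    if isinstance(event, BuzzEvent):
        return f"[buzzer] BUZZ on {event.clue}"
    if isinstance(event, JudgeEvent):
        verdict = "CORRECT" if event.correct else "INCORRECT"
        return f"[judge] {event.guess} by {event.player_id} -> {verdict}"
    raise TypeError(f"unknown event type: {type(event).__name__}")
```

The nice thing about giving every event a distinct type (and a literal role) is that any listener can cheaply ignore the messages it doesn't care about.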

There is a Player base class with a fair amount of shared plumbing in it, and then subclasses for each of the roles:

class Cluer(Player[ClueEvent], ABC):
    """
    Player that gives clues to help guessers guess the target word.
    """
    def __init__(self):
        super().__init__()

    async def next_clue(self) -> str:
        raise NotImplementedError

    async def play(self):
        while not self.game.is_over():
            clue = await self.next_clue()
            await self.announce(ClueEvent(role="cluer", clue=clue))

Most of the magic is in next_clue, which is not implemented in the base class, but you can also see the play loop, which just continuously generates and announces clues until the game is over. (self.game has a history of gameplay, so your implementation of next_clue can take previous clues and guesses into account.)
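As an illustration of how a subclass plugs into that loop, here is a self-contained sketch with a non-AI cluer that just reads from a script. FakeGame, this stripped-down Cluer, and ScriptedCluer are all stand-ins invented for the example; the real Player base class has more plumbing and announces ClueEvents rather than bare strings.

```python
import asyncio
from abc import ABC, abstractmethod


class FakeGame:
    """Hypothetical stand-in for the game: over after max_events announcements."""

    def __init__(self, max_events: int):
        self.max_events = max_events
        self.history: list[str] = []

    def is_over(self) -> bool:
        return len(self.history) >= self.max_events


class Cluer(ABC):
    """Simplified version of the base class above (no Player plumbing)."""

    def __init__(self, game: FakeGame):
        self.game = game

    @abstractmethod
    async def next_clue(self) -> str:
        ...

    async def announce(self, clue: str) -> None:
        # The real announce publishes a typed event; here we only record it.
        self.game.history.append(clue)

    async def play(self) -> None:
        while not self.game.is_over():
            clue = await self.next_clue()
            await self.announce(clue)


class ScriptedCluer(Cluer):
    """Cycles through a fixed list of clues, using the history to pick the next."""

    def __init__(self, game: FakeGame, clues: list[str]):
        super().__init__(game)
        self._clues = clues

    async def next_clue(self) -> str:
        await asyncio.sleep(0)  # yield to other tasks, as a real LLM call would
        already_given = len(self.game.history)
        return self._clues[already_given % len(self._clues)]


if __name__ == "__main__":
    game = FakeGame(max_events=3)
    asyncio.run(ScriptedCluer(game, ["Combining", "Folding"]).play())
    print(game.history)
```

An AI-backed cluer would have the same shape, with next_clue awaiting a model call instead of indexing into a list.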

There is a Game class that sets everything in motion (details elided):

    async def play(self) -> Dict[str, Any]:
        async def timeout():
            await asyncio.sleep(self.duration_sec)
            await self.publish(SystemMessage(role="system", event="timeout"))

        # Launch players and timeout tasks
        player_tasks: list[asyncio.Task] = [
            asyncio.create_task(p.play()) 
            for p in self.players
        ]
        timeout_task = asyncio.create_task(timeout())

        try:
            while True:
                # wait for the next batch of events
                # iterate over each one
                # - if it's a timeout event, game over, lose
                # - if it's a buzzer event, game over, lose
                # - if it's a judge correct event, game over, win
                ...
        except GameOverException:
            # stop all the agent work
            ...

(I feel like I am starting to get the hang of Python async.)
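For the curious, here is roughly what the elided parts of a loop like that could look like, as a runnable sketch. It makes some simplifying assumptions that are not from the real code: messages are plain (role, payload) tuples, there is a single asyncio.Queue that only the game loop reads from (the real design lets every agent listen to the stream), and the GameOver exception and run_round function are hypothetical names.

```python
import asyncio


class GameOver(Exception):
    """Raised inside the loop to stop the round; carries the outcome."""

    def __init__(self, outcome: str):
        super().__init__(outcome)
        self.outcome = outcome


async def run_round(players, duration_sec: float) -> str:
    """Run player coroutines until a win, a buzz, or a timeout.

    `players` is a list of async functions that each take the shared queue.
    """
    queue: asyncio.Queue = asyncio.Queue()

    async def timeout():
        await asyncio.sleep(duration_sec)
        await queue.put(("system", "timeout"))

    # Launch players and the timeout as independent tasks.
    tasks = [asyncio.create_task(player(queue)) for player in players]
    tasks.append(asyncio.create_task(timeout()))

    try:
        while True:
            role, payload = await queue.get()
            if role == "system" and payload == "timeout":
                raise GameOver("timeout")
            if role == "buzzer":
                raise GameOver("buzzed")
            if role == "judge" and payload == "correct":
                raise GameOver("correct")
            # Anything else (clues, guesses, wrong verdicts) keeps the game going.
    except GameOver as over:
        return over.outcome
    finally:
        # Stop all the agent work, and wait for the cancellations to land.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def chatty_cluer(queue: asyncio.Queue) -> None:
    while True:  # never stops on its own; relies on being cancelled
        await queue.put(("cluer", "Aura"))
        await asyncio.sleep(0.01)


async def quick_judge(queue: asyncio.Queue) -> None:
    await asyncio.sleep(0.03)
    await queue.put(("judge", "correct"))


if __name__ == "__main__":
    print(asyncio.run(run_round([chatty_cluer, quick_judge], duration_sec=1.0)))
```

The part that is easy to get wrong is the finally block: players like chatty_cluer loop forever, so the round only ends cleanly if every task is cancelled and then awaited.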

So now where does the AI come into things?

AI

The only dataset of Taboo cards I could find was in German, which was not particularly useful for my purposes. So I figured I might as well let the AI generate them. Initially I had the AI generate both the target word and the taboo words, but it kept picking "VOLCANO" all the time, so eventually I changed it so that you give it the target word and it comes up with the taboo words:

class CreateTabooWords(dspy.Signature):
    """
    You are creating a game card for the game Taboo. Each card has a target word
    and a list of taboo words. One of the players ("the cluer") will try to get
    the other players to guess the target word by giving clues, but they cannot
    use any of the taboo words in their clues, or they lose.

    For example, if the target word is "apple", the taboo words might be
    ["fruit", "red", "pie", "tree", "juice"].

    You will be given a target word. Your task is to come up with a list of 5 taboo words
    that will make the game interesting and challenging. The taboo words should ideally
    be the most obvious clues to the target word, so that the cluer has to be creative.
    """
    target: str = dspy.InputField(description="The target word for the game")
    taboo_words: list[str] = dspy.OutputField(
        description="A list of taboo words that cannot be used in clues for the target word")

create_taboo_words = dspy.ChainOfThought(CreateTabooWords)

A few things here: first, you can see that I gave this signature a more substantial docstring than I often do, because in this case there is quite a bit to its task that you can't just get from the input field and output field. (Ideally I would probably use few-shot Examples instead of putting it in the docstring, but oh well.)

But at the same time you can see that the docstring prompt is a pure description of the task. I don't try to cram in there what the input types or what the output types are. That's all still handled by the Signature machinery, so the docstring "feels like" a docstring.

I also want it to be very thoughtful about generating the taboo words, so I used dspy.ChainOfThought and (although you can't see it here) I set up this agent to use gemini-2.5-pro.

(In general for this project I had different agents use different models, depending on how hard / creative their tasks were.)

And then e.g. the guesser gets a signature that looks like

class GuessWord(dspy.Signature):
    """
    You are playing a game of Taboo. Your goal is to guess the target word based on the clues given by the Cluer.
    """

    history: list = dspy.InputField(description="The history of the game so far, including previous clues, buzzes, guesses, and judgments")
    player_id: str = dspy.InputField(description="The ID of the player making the guess")
    player_personality: str | None = dspy.InputField(description="Optional personality or background information about the player making the guess")

    guess: str = dspy.OutputField(description="The guessed word")
    rationale: str | None = dspy.OutputField(description="Optional rationale for the guess")

and has a next_guess function that just grabs the history from the game and the player information from itself and makes an async dspy call. (There is a little more ceremony around it to make the call cancelable, but that part is not super important here.)
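The cancelation ceremony is the kind of thing that's easier to show than describe. This is not the code from the repo, just one way to do it with plain asyncio: race the slow call against a "game over" event and drop the call if the event wins. The cancelable helper and the fake guess functions are hypothetical, and asyncio.sleep stands in for the model call.

```python
import asyncio


async def cancelable(coro, game_over: asyncio.Event):
    """Await `coro`, but return None if `game_over` is set before it finishes."""
    call = asyncio.ensure_future(coro)
    stop = asyncio.ensure_future(game_over.wait())
    done, _pending = await asyncio.wait(
        {call, stop}, return_when=asyncio.FIRST_COMPLETED
    )
    if call in done:
        stop.cancel()
        return call.result()
    # The game ended first: cancel the in-flight call and wait for it to unwind.
    call.cancel()
    await asyncio.gather(call, return_exceptions=True)
    return None


async def fast_guess() -> str:
    await asyncio.sleep(0.01)  # stand-in for a quick model call
    return "Vibe"


async def slow_guess() -> str:
    await asyncio.sleep(10)  # stand-in for a model call that takes too long
    return "Weave"


async def demo():
    game_over = asyncio.Event()
    first = await cancelable(fast_guess(), game_over)
    # End the game shortly after the second call starts.
    asyncio.get_running_loop().call_later(0.02, game_over.set)
    second = await cancelable(slow_guess(), game_over)
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

One caveat: cancelling the asyncio task stops your code from waiting, but whether the underlying request to the model provider is actually aborted depends on the client library.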

One other problem I ran into is that the "buzzer" and "judge" were not doing a very good job (possibly because I gave them 2.5-flash-lite to use so they would be fast); they would e.g. declare some guess a winner when it wasn't.

Eventually I added an extra "justification" field to the signature and that seemed to keep them more on the straight and narrow:

class BuzzClue(dspy.Signature):
    """
    Check if the clue is one of the taboo words (or a minor variation, like singular/plural).
    It's ok if the clue has a similar meaning to the taboo word, it just can't be the same word.
    """
    clue: str = dspy.InputField(description="The clue word or phrase given by the Cluer")
    taboo_words: list[str] = dspy.InputField(description="The taboo words that cannot be used in the clue, or you lose")

    buzz: bool = dspy.OutputField(description="Whether the clue violates the taboo words or not")
    justification: str = dspy.OutputField()

I guess that's kind of a poor man's chain-of-thought.
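Since the buzzer's job is described so narrowly (same word or a minor variation, not similar meaning), one way to sanity-check the LLM's verdicts is a dumb string-matching baseline. To be clear, this is not part of the project and not how the BuzzClue agent works; it's a hypothetical rule-based check with very naive English plural handling, returning the same (buzz, justification) pair so the two can be compared.

```python
import re


def same_word(a: str, b: str) -> bool:
    """True if a and b are equal up to case and a naive -s / -es plural."""
    a, b = a.lower(), b.lower()
    return a == b or a in (b + "s", b + "es") or b in (a + "s", a + "es")


def rule_based_buzz(clue: str, taboo_words: list[str]) -> tuple[bool, str]:
    """Return (buzz, justification) for a clue word or phrase."""
    tokens = re.findall(r"[a-z']+", clue.lower())
    for token in tokens:
        for taboo in taboo_words:
            if same_word(token, taboo):
                return True, f"'{token}' matches taboo word '{taboo}'"
    return False, "no taboo word found in clue"


if __name__ == "__main__":
    card = ["Feeling", "Energy", "Mood", "Atmosphere", "Vibrations"]
    print(rule_based_buzz("Aura", card))
    print(rule_based_buzz("good vibration", card))
```

It will miss irregular forms (e.g. "energies" vs. "energy"), which is exactly the sort of case where the LLM version earns its keep.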

And it all works. The whole thing is up on GitHub, as usual, so check it out:

https://github.com/joelgrus/taboollm

Here's one where it gets it really quick:

$ uv run python -m taboo play --target vibes
Card: target=vibes, taboo_words=['Feeling', 'Energy', 'Mood', 'Atmosphere', 'Vibrations']
[cluer] Aura

[guesser p2-friendly] Vibe — The clue "Aura" directly brings to mind "vibe" as a synonym for a distinctive atmosphere or feeling around a person or place.

[cluer] Spirit

[judge] Vibe by p2-friendly -> CORRECT

[system] end: correct, winner: p2-friendly

Round finished. Winner: p2-friendly

And here's a hard one:

$ uv run python -m taboo play --target convolution

Card: target=convolution, taboo_words=['Kernel', 'Filter', 'Image', 'Signal', 'Neural']

[cluer] Combining

[guesser p2-friendly] Mix — The clue "Combining" directly suggests the action of mixing things together.

[judge] Mix by p2-friendly -> INCORRECT

[guesser p3-sarcastic] Merging — "Combining." Wow, such a deep and insightful clue. It's almost as if things, when combined, become... *merged*. Who would've thought?

[guesser p2-friendly] Merge — "Combining" can mean bringing two or more things together to form a single entity, which is what "merge" implies, like companies merging.

[cluer] Folding

[judge] Merging by p3-sarcastic -> INCORRECT

[judge] Merge by p2-friendly -> INCORRECT

[guesser p3-sarcastic] Uniting — Since "merging" and "mixing" weren't quite abstract enough for the intellectual heights of "combining," perhaps we're going for something more profound. Like... bringing things together in a grand, unified front.

[judge] Uniting by p3-sarcastic -> INCORRECT

[guesser p2-friendly] Uniting — The clue is "Combining," and since "Mix" and "Merge" have been ruled out, I'm thinking of another way to describe bringing things together, especially in a more collective or harmonious sense. "Uniting" fits this idea well.

[judge] Uniting by p2-friendly -> INCORRECT

[guesser p1-thoughtful] MIX — The clue "Combining" directly describes the action of mixing multiple elements together. It's a very common and straightforward synonym.

[judge] MIX by p1-thoughtful -> INCORRECT

[cluer] Slide

[guesser p2-friendly] Kneading — The first clue was "Combining," and then the second clue was "Folding." When you combine ingredients and then fold them, especially in cooking, it often refers to the process of kneading dough. This combines both actions.

[judge] Kneading by p2-friendly -> INCORRECT

[guesser p3-sarcastic] Consolidate — Ah, yes, 'Combining,' followed by the subtle hint of 'Folding.' Clearly, the grand master is guiding us towards the profound concept of... consolidating. Because when you combine and fold, you obviously... consolidate. The intellectual journey is truly breathtaking.

[judge] Consolidate by p3-sarcastic -> INCORRECT

[cluer] Overlap

[guesser p1-thoughtful] Incorporate — The first clue "Combining" points to bringing things together. The second clue "Folding" suggests a specific way of combining, often by bringing something into an existing structure or mixture. "Incorporate" means to take in or include something as part of a whole, which directly relates to combining, and it also fits the idea of "folding" something into a plan, system, or mixture.

[judge] Incorporate by p1-thoughtful -> INCORRECT

[guesser p1-thoughtful] Tuck — The clues seem to build on ways of arranging or inserting things. "Combining" is a general action. "Folding" is a specific way of preparing something to be combined or inserted. "Slide" suggests a smooth, deliberate movement, often into a confined space. "Overlap" implies that one part rests on or covers another. "Tuck" fits well with all these: you combine things by tucking one into another; you often fold something before you tuck it; you slide something into place when tucking it; and the tucked element will often overlap with the material it's tucked into.

[judge] Tuck by p1-thoughtful -> INCORRECT

[guesser p2-friendly] Joint — The clue "Combining" suggests bringing different elements together. "Folding" implies a bend or crease, which is characteristic of certain types of connections. "Slide" refers to a smooth movement of one part past another. A "joint" serves as a point where things are combined, and many types of joints facilitate either a folding motion (like a hinge) or a sliding motion. It seems to connect all three clues well.

[judge] Joint by p2-friendly -> INCORRECT

[cluer] Aggregation

[guesser p1-thoughtful] Weave — The clues "Combining," "Folding," "Slide," and "Overlap" strongly suggest the process of weaving. Weaving involves combining threads, which are folded or passed over and under each other. The threads slide past each other, and they naturally overlap to form a fabric. This integrates all the clues into a single coherent action.

[judge] Weave by p1-thoughtful -> INCORRECT

[guesser p2-friendly] Weave — The clues "Combining," "Folding," "Slide," and "Overlap" all strongly point to the action of weaving. When you weave, you are combining threads or strands. The process involves threads folding over and under each other, sliding past each other, and continuously overlapping to create fabric or a pattern. It perfectly encapsulates all the actions described.

[judge] Weave by p2-friendly -> INCORRECT

[cluer] Sweep

[guesser p3-sarcastic] Collapse — Ah, the intellectual odyssey continues! We began with the majestic 'Combining,' then progressed to the intricate 'Folding,' and now, the profound 'Slide.' It's almost as if things, when combined and folded, have a tendency to, you know, just completely give up and slide into a heap. The sheer elegance of these clues points directly to something... collapsing. It's truly a masterclass in subtlety.

[judge] Collapse by p3-sarcastic -> INCORRECT

[guesser p2-friendly] Origami — The clue "Combining" is a general action, reinforced by "Aggregation." "Folding" is a very specific action, which, along with "Slide" and "Overlap," strongly suggests a process of manipulating material. Origami perfectly encompasses all these: it involves extensive folding of paper, parts of the paper slide against each other during the folding process, layers of paper constantly overlap, and the entire creation is a combination or aggregation of these folds. It ties all five clues together cohesively.

[judge] Origami by p2-friendly -> INCORRECT

[cluer] Superimpose

[guesser p1-thoughtful] Bundle — The clues provided are "Combining," "Folding," "Slide," "Overlap," and "Aggregation." "Bundle" fits all of these very well. You "combine" items to create a bundle. Often, items within a bundle, like clothes or papers, are "folded" for compactness. Items might "slide" into place as the bundle is formed or tightened. The items within a bundle will almost certainly "overlap" each other. Finally, a bundle is fundamentally an "aggregation" or collection of things brought together. This word ties all the clues together coherently, from specific actions to the overall result.

[judge] Bundle by p1-thoughtful -> INCORRECT

[guesser p3-sarcastic] Pile — Oh, the sheer brilliance! We started with the profound 'Combining,' moved to the intricate 'Folding,' then the subtle 'Slide,' followed by the illuminating 'Overlap,' and the highly academic 'Aggregation.' Now, with 'Sweep,' it's all so clear! When you combine, fold, slide, overlap, and aggregate with a sweep, you're obviously engaging in the highly complex, multi-faceted process of... creating a pile. Truly, the intellectual journey has been astounding.

[judge] Pile by p3-sarcastic -> INCORRECT

[cluer] Transform

[guesser p3-sarcastic] Collage — Oh, the sheer brilliance! We've sailed through the profound ocean of 'Combining,' navigated the intricate channels of 'Folding,' glided effortlessly across the 'Slide,' pondered the existential 'Overlap,' aggregated our thoughts on 'Aggregation,' swept aside all doubts with 'Sweep,' and finally, reached the zenith of intellectual enlightenment with 'Superimpose.' Clearly, after such a grand and complex journey, the only logical conclusion for all these deep, multi-layered concepts is the incredibly sophisticated art of... sticking various bits and pieces together. The intellectual journey has been truly breathtaking.

[judge] Collage by p3-sarcastic -> INCORRECT

[guesser p2-friendly] Embrace — The clues "Combining" and "Aggregation" both point to bringing things together. "Folding," "Slide," and "Overlap" describe specific physical actions or arrangements. "Embrace" fits all these: when you embrace, you combine and aggregate individuals, often folding your arms around them, with bodies sliding and overlapping. Crucially, the latest clue "Sweep" fits perfectly with the idiom "sweep someone into an embrace," which ties all the dynamic actions together.

[judge] Embrace by p2-friendly -> INCORRECT

[cluer] Shaping

[guesser p1-thoughtful] Overlay — The clues "Combining," "Aggregation," "Overlap," and "Superimpose" all strongly point to bringing layers or elements together, with "Overlap" and "Superimpose" being almost synonymous with the concept of an overlay. "Slide" perfectly describes the action of placing an overlay. "Folding" can relate to the material being used as an overlay (e.g., a folded map overlay) or a preparatory step, while "Sweep" could refer to the motion of applying an overlay or a broad area covered by it. All clues converge on the idea of placing one thing over another.

[judge] Overlay by p1-thoughtful -> INCORRECT

[system] timeout

[system] end: timeout

Round finished. Winner: none

This was also a nice "vibe coding" collaboration. We started out with a very complicated design (courtesy of ChatGPT), and the initial code dump was me telling codex-cli (yes, I switched from claude code) to implement the PRD. Then we iterated on it quite a few times with me asking "is this part necessary" and "could this be simpler" and "why don't we just do XYZ", which is honestly not too different from how I am as a manager / code reviewer. And then all the dspy parts I wrote myself, because those parts are fun. :) And the final codebase I think is solid and well designed (other than the fact that it doesn't have any tests 😇).