Test-Driven Development Walk-Through

Author: Nishith Sharma
Estimated reading time: 25 min

TL;DR Writing the game first “by intuition” works, but you’ll discover bugs late and refactors are scary.
In this post we’ll start with tests instead, looping Red → Green → Refactor until we ship a fully tested, production-ready command-line Tic-Tac-Toe written in Python 3. By the end you’ll have:

- a clean, object-oriented tic_tac_toe.py
- a complete test_tic_tac_toe.py suite (pytest)
- a reproducible step-by-step TDD journal you can mimic on any project.

0. The Intuitive Solution (Baseline)

Here’s the typical beginner script you may already have (abridged for readability):

tic_tac_toe_intuitive.py


   
    import os

    def clear():
        os.system('cls' if os.name == 'nt' else 'clear')

    def print_layout(positions: list):
        clear()
        print("\nWelcome to Tic Tac Toe by Nishith Sharma\n")
        print("Below are the latest positions\n")
        
        for row in positions:
            line = ''
            col = 1
            for item in row:
                if col % 3 != 0:
                    line = line + f"{item} | "
                else:
                    line = line + f"{item}"
                col += 1
            print(line)
            
    def check_win(taken_pos: list, symb: str) -> bool:
        if taken_pos[0] == [symb,symb,symb] or taken_pos[1] == [symb,symb,symb] or taken_pos[2] == [symb,symb,symb]:
            return True
        elif taken_pos[0][0] == symb and taken_pos[1][1] == symb and taken_pos[2][2] == symb:
            return True
        elif taken_pos[0][2] == symb and taken_pos[1][1] == symb and taken_pos[2][0] == symb:
            return True
        elif taken_pos[0][0] == symb and taken_pos[1][0] == symb and taken_pos[2][0] == symb:
            return True
        elif taken_pos[0][1] == symb and taken_pos[1][1] == symb and taken_pos[2][1] == symb:
            return True
        elif taken_pos[0][2] == symb and taken_pos[1][2] == symb and taken_pos[2][2] == symb:
            return True
        else:
            return False
        
    win = False
    player = 1
    valid_str = ['1','2','3','4','5','6','7','8','9']
    taken_positions = []
    layout_positions = [[1,2,3],[4,5,6],[7,8,9]]
    current_pos = 0
    players = []

    # Game initialize
    clear()
    players.append(input("Player 1, Enter your name: "))
    players.append(input("Player 2, Enter your name: "))


    print_layout(layout_positions)

    while(win == False):
        pos_str = input(f"\n{players[player - 1]}, what position do you want to play? ")
        if pos_str in valid_str:
            if pos_str in taken_positions:
                print("This position is taken, retry")
            else:
                if player == 1:
                    symbol = 'X'
                else:
                    symbol = 'O'
                
                if int(pos_str) < 4:
                    layout_positions[0][int(pos_str) % 4 - 1] = symbol
                    taken_positions.append(pos_str)
                elif int(pos_str) < 7:
                    layout_positions[1][int(pos_str) % 7 - 4] = symbol
                    taken_positions.append(pos_str)
                else:
                    layout_positions[2][int(pos_str) % 10 - 7] = symbol
                    taken_positions.append(pos_str)
                print_layout(layout_positions)
                current_pos += 1
                if check_win(layout_positions, symbol):
                    print(f"\n{players[player -1]} wins, Congratulations !\n")
                    win = True
                else: 
                    if (current_pos > 8):
                        print("\nMatch draw. Game Over !")
                        win = True
                if player == 1:
                    player = 2
                else:
                    player = 1
        else:
            print("Invalid position, retry")

It works, until you need to:

- port it to Windows & macOS (terminal-clearing quirks)
- swap the UI (e.g., web or Tkinter)
- add an AI player
- ensure that a refactor doesn’t break anything

No tests ⇒ no safety net.

1. What Is TDD, Really?

(“Red → Green → Refactor” is the visible tip of a much larger mindset iceberg.)

| Aspect | Traditional (“code-then-test”) | Test-Driven Development |
| --- | --- | --- |
| Primary driver | Feature delivery | Executable specification |
| Feedback loop | Minutes → days (manual) | Seconds (automated) |
| Design pressure | Afterthought; tests adapt to code | Code adapts to tests, ergo to requirements |
| Failure cost | Late discovery, expensive fixes | Early discovery, cheap fixes |

TDD was popularized by Kent Beck in Extreme Programming (1999). His key insight: writing a failing test first forces you to think about behaviour before implementation details, nudging the design toward decoupled, composable units.

1.2 Anatomy of the Red-Green-Refactor Cycle

Red — Specify
- Write a micro-behavioural test that captures one new requirement.
- The failure is intentional; it proves the test can detect the missing behaviour.
Green — Satisfy
- Write the minimum production code to pass all tests.
- “Minimum” curbs gold-plating; YAGNI is built-in.
Refactor — Simplify
- Now that behaviour is protected, improve structure: rename, extract, remove duplication, optimise algorithms.
- Tests must remain green, acting as safety rails.

Cadence: A healthy loop is tiny—30 seconds to 10 minutes. Longer loops often signal tests that are too broad or code that violates SRP (Single-Responsibility Principle).
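
To make the cycle concrete, here is one pass through it on a deliberately tiny requirement. The function name `is_valid_position` and the "1 to 9" rule are assumptions borrowed from the baseline script's input check, not code from the final project; treat it as a sketch of the rhythm rather than a prescribed implementation.

```python
# Red: this test is written first. At that point is_valid_position does not
# exist yet, so running the test fails with a NameError.
def test_accepts_only_positions_1_to_9():
    assert is_valid_position("1")
    assert is_valid_position("9")
    assert not is_valid_position("0")
    assert not is_valid_position("10")
    assert not is_valid_position("x")


# Green: the first version that passes, mirroring the baseline's lookup list.
# def is_valid_position(text):
#     return text in ['1', '2', '3', '4', '5', '6', '7', '8', '9']


# Refactor: same behaviour with less noise; the test above must stay green.
def is_valid_position(text: str) -> bool:
    return len(text) == 1 and text in "123456789"


# Run the test directly so the snippet works without a test runner.
test_accepts_only_positions_1_to_9()
```

The whole loop here is a few minutes of work at most, which is the scale the cadence guideline has in mind.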

1.3 Granularity: What counts as a “test”?

- Micro-tests (unit): target a single class/function; run in < 10 ms. Drive low-level design; enable fearless refactoring.
- Component/Service-tests: cross class boundaries but stay intra-process. Validate interactions and contracts.
- Integration/Contract-tests: touch I/O (DBs, HTTP, queues). Ensure wiring works; usually outside the TDD loop.
- End-to-End/Acceptance: user-visible flows; minutes to run. Specify system behaviour at the story level (often captured with BDD).

True TDD focuses on micro- and component-tests; broader tests complement but do not replace them.

1.4 Design Forces Generated by TDD

| Emergent Quality | Why TDD Encourages It |
| --- | --- |
| High cohesion / Low coupling | Hard-to-test code (many hidden deps, side effects) makes the “Red” step painful. Developers naturally refactor toward small, pure functions and injected dependencies. |
| Documented intent | Tests describe why code exists, doubling as living documentation. |
| Refactor safety-net | Once green, you can change internals at will; tests protect external behaviour. |
| Modularity & SOLID | Violating SRP or Open-Closed quickly results in awkward tests, signalling design smells early. |

1.5 Common Misconceptions

| Myth | Reality |
| --- | --- |
| “TDD is about testing.” | It’s actually a design discipline that uses tests as the design medium. |
| “You must test every getter/setter.” | TDD cares about observable behaviour, not trivial accessors. |
| “TDD slows you down.” | Initial velocity dips (learning curve), but long-term throughput rises due to fewer regressions and easier maintenance. |
| “TDD = 100 % coverage.” | Coverage is a lagging indicator. Focus on meaningful assertions, not the metric. |

1.6 When TDD Shines

- Complex, rapidly evolving domains (e-commerce rules engines, fintech pricing).
- Code requiring long-term maintenance by multiple developers.
- Critical algorithms where regressions are costly (embedded avionics, healthcare).
- Refactoring legacy code: write characterisation tests first, then improve design safely.
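
As a sketch of the last point, characterisation tests for the baseline's `check_win` simply pin down what the function does today, before anyone touches it. The function body is copied from tic_tac_toe_intuitive.py so the snippet runs on its own (importing that script directly would start its module-level game loop); the test names are hypothetical.

```python
def check_win(taken_pos: list, symb: str) -> bool:
    # Legacy logic copied as-is from tic_tac_toe_intuitive.py.
    if taken_pos[0] == [symb,symb,symb] or taken_pos[1] == [symb,symb,symb] or taken_pos[2] == [symb,symb,symb]:
        return True
    elif taken_pos[0][0] == symb and taken_pos[1][1] == symb and taken_pos[2][2] == symb:
        return True
    elif taken_pos[0][2] == symb and taken_pos[1][1] == symb and taken_pos[2][0] == symb:
        return True
    elif taken_pos[0][0] == symb and taken_pos[1][0] == symb and taken_pos[2][0] == symb:
        return True
    elif taken_pos[0][1] == symb and taken_pos[1][1] == symb and taken_pos[2][1] == symb:
        return True
    elif taken_pos[0][2] == symb and taken_pos[1][2] == symb and taken_pos[2][2] == symb:
        return True
    else:
        return False


# Each test records current behaviour; none of them prescribes new behaviour.
def test_top_row_wins():
    assert check_win([['X', 'X', 'X'], [4, 5, 6], [7, 8, 9]], 'X') is True


def test_anti_diagonal_wins():
    assert check_win([[1, 2, 'O'], [4, 'O', 6], ['O', 8, 9]], 'O') is True


def test_fresh_board_has_no_winner():
    assert check_win([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 'X') is False


def test_win_is_reported_only_for_the_symbol_asked_about():
    board = [['X', 'X', 'X'], [4, 5, 6], [7, 8, 9]]
    assert check_win(board, 'O') is False


# Run the tests directly so the snippet works without a test runner.
for test in (test_top_row_wins, test_anti_diagonal_wins,
             test_fresh_board_has_no_winner,
             test_win_is_reported_only_for_the_symbol_asked_about):
    test()
```

With these in place, the long `if`/`elif` chain can be replaced by something more compact and the tests will flag any change in behaviour.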

1.7 When It May Not Fit

- Quick-and-dirty scripts or one-off data migrations.
- Experiments where behaviour is unknown; spike solutions first, extract learning, then TDD the production variant.
- UI-heavy code without testable seams (though modern frameworks + test-IDs mitigate this).
- Teams without cultural buy-in: partial TDD can be worse than none (false sense of safety).

1.8 Relationship to BDD, ATDD & Property-Based Testing

| Discipline | Focus | Spec Style | Typical Tooling |
| --- | --- | --- | --- |
| TDD | Low-level design | Imperative assertions | JUnit, pytest |
| BDD (Behaviour-Driven Dev) | Business outcomes | Given-When-Then | Cucumber, Behave |
| ATDD (Acceptance-Test-Driven Dev) | Story acceptance | Tables / DSL | FitNesse |
| PBT (Property-Based Testing) | Universal invariants | Generated inputs | QuickCheck, Hypothesis |

They’re complementary—many teams write BDD acceptance tests for stories, then drill down with TDD micro-tests during implementation.

1.9 Practical Heuristics & Tips

- Name tests as behaviour sentences: test_balance_is_zero_for_new_account() communicates intent instantly.
- One assertion per concept: avoids brittle tests; group closely related asserts if they fail together.
- Prefer fakes over mocks: over-mocking couples tests to implementation details; aim for state verification over interaction verification.
- Refactor tests too: duplication and bad names in tests hurt future maintainers just as much.
- Keep build time < 10 s locally: quarantine slow tests (DB, network) behind explicit markers.
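
The fakes-over-mocks tip can be sketched as follows. All names here (`FakeConsole`, `announce_winner`) are hypothetical and exist only for illustration: the fake is a small working stand-in that records output, and the test asserts on the resulting state rather than on which calls were made.

```python
class FakeConsole:
    """In-memory stand-in for a console: records lines instead of printing."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def write(self, text: str) -> None:
        self.lines.append(text)


def announce_winner(console, name: str) -> None:
    """Code under test: reports the result through whatever console it is given."""
    console.write(f"{name} wins")


def test_winner_is_announced_by_name():
    console = FakeConsole()
    announce_winner(console, "Alice")
    # State verification: inspect what ended up in the fake.
    assert console.lines == ["Alice wins"]


# Run the test directly so the snippet works without a test runner.
test_winner_is_announced_by_name()
```

If `announce_winner` later builds its message differently or calls `write` through a helper, this test keeps passing as long as the observable output is the same.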

Mindset shift: We’re not “writing tests after coding.” We’re designing via tests.

2. Project Scaffold

Folder layout

```
tictactoe/
├─ tic_tac_toe.py       # production code
├─ test_tic_tac_toe.py  # pytest suite
└─ README.md            # docs / blog post
```

Why this structure?
One module + one test file keeps paths simple, import errors unlikely, and lets pytest discover tests automatically.

Spin-up steps

    python -m venv .venv        # isolate deps
    source .venv/bin/activate   # Windows: .venv\Scripts\activate
    pip install -U pip pytest   # install test runner
    pytest -q                   # reports "no tests ran" until you add some

That’s all the scaffold does: give you a clean workspace where code and tests live side-by-side, run in an isolated environment, and can be executed with a single pytest command.

3. Red-Green-Refactor Diary

Below is the unedited chronology, in commit-sized chunks. You can literally copy/paste the failing test first, watch it fail, then implement.

3.1 Iteration 1 – An Empty Board

Test (Red)

    # test_tic_tac_toe.py
    from tic_tac_toe import Board


    def test_board_initial_state():
        board = Board()
        assert board.cells == [None] * 9

Running it fails, as intended:

    pytest -q
    E   ImportError: cannot import name 'Board' ...

Production (Green)

    # tic_tac_toe.py
    class Board:
        def __init__(self):
            # Nine cells in row-major order; None marks an empty cell.
            self.cells: list[str | None] = [None] * 9

All green:

    pytest -q
    .                                                           [100%]

Refactor
Nothing to clean yet.

3.2 Iteration 2 – Place a Mark

Test

    def test_place_x_in_top_left():
        board = Board()
        board.place(0, 'X')  # 0-based index
        assert board.cells[0] == 'X'

Fail ➜ Implement

    class Board:
        ...  # __init__ as before

        def place(self, index: int, symbol: str) -> None:
            if self.cells[index] is not None:
                raise ValueError("Cell already taken")
            if symbol not in ('X', 'O'):
                raise ValueError("Invalid symbol")
            self.cells[index] = symbol

Add a negative test for occupied cell.
Run tests—green.
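
The negative test mentioned above might look like this. The `Board` class is repeated from this iteration only so the snippet runs on its own; the test names, and the extra test for an invalid symbol, are additions for illustration.

```python
import pytest


class Board:
    # Same Board as in the walkthrough, repeated so the snippet is standalone.
    def __init__(self):
        self.cells = [None] * 9

    def place(self, index, symbol):
        if self.cells[index] is not None:
            raise ValueError("Cell already taken")
        if symbol not in ('X', 'O'):
            raise ValueError("Invalid symbol")
        self.cells[index] = symbol


def test_place_on_occupied_cell_raises():
    board = Board()
    board.place(0, 'X')
    # pytest.raises fails the test if no ValueError is raised inside the block.
    with pytest.raises(ValueError, match="already taken"):
        board.place(0, 'O')
    assert board.cells[0] == 'X'  # the original mark is untouched


def test_place_rejects_unknown_symbol():
    board = Board()
    with pytest.raises(ValueError, match="Invalid symbol"):
        board.place(0, 'Z')
    assert board.cells[0] is None  # nothing was written
```

Checking the board state after the failed call is what makes this more than an "it raises" test: it also pins down that a rejected move has no side effects.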

3.3 Iteration 3 – Switching Players

We’ll need a Game wrapper.

Tests

    from tic_tac_toe import Game


    def test_players_alternate():
        g = Game()
        g.play_turn(0)  # X
        g.play_turn(1)  # O
        assert g.board.cells[:2] == ['X', 'O']

Prod

    class Game:
        def __init__(self):
            self.board = Board()
            self.current = 'X'  # X always moves first

        def play_turn(self, index: int) -> None:
            self.board.place(index, self.current)
            self.current = 'O' if self.current == 'X' else 'X'

Green.
Refactor: move the symbol toggle into a helper, _toggle_player.
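
One way the `_toggle_player` refactor could look, with the earlier `Board` repeated in condensed form so the snippet runs standalone. Behaviour is meant to be unchanged, so the existing test passes as-is; the second test is an addition that documents what happens on a rejected move.

```python
class Board:
    # Condensed Board from the earlier iterations (standalone copy).
    def __init__(self):
        self.cells = [None] * 9

    def place(self, index, symbol):
        if self.cells[index] is not None:
            raise ValueError("Cell already taken")
        if symbol not in ('X', 'O'):
            raise ValueError("Invalid symbol")
        self.cells[index] = symbol


class Game:
    def __init__(self):
        self.board = Board()
        self.current = 'X'

    def play_turn(self, index: int) -> None:
        self.board.place(index, self.current)
        self._toggle_player()  # only reached if place() did not raise

    def _toggle_player(self) -> None:
        self.current = 'O' if self.current == 'X' else 'X'


def test_players_alternate():
    g = Game()
    g.play_turn(0)  # X
    g.play_turn(1)  # O
    assert g.board.cells[:2] == ['X', 'O']


def test_rejected_move_keeps_the_turn():
    g = Game()
    g.play_turn(0)  # X takes cell 0, now O to move
    try:
        g.play_turn(0)  # O tries the same cell
    except ValueError:
        pass
    assert g.current == 'O'  # still O's turn


# Run the tests directly so the snippet works without a test runner.
test_players_alternate()
test_rejected_move_keeps_the_turn()
```

Because `place` raises before the toggle runs, an invalid move leaves the turn with the same player, which is the behaviour a retry prompt in the CLI relies on.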

3.4 Iteration 4 – Detecting Wins

Red test for a row win

    import pytest  # used by the parametrized tests added later in this iteration


    def test_row_win():
        g = Game()
        g.board.cells = ['X', 'X', 'X', None, None, None, None, None, None]
        assert g.winner() == 'X'

Implement:

    class Board:
        ...  # __init__ and place as before

        WIN_PATTERNS = [
            (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
            (0, 3, 6), (1, 4, 7), (2, 5, 8),  # cols
            (0, 4, 8), (2, 4, 6),             # diags
        ]

        def winner(self) -> str | None:  # the `str | None` syntax needs Python 3.10+
            for a, b, c in self.WIN_PATTERNS:
                # The first check skips empty lines (None == None == None).
                if self.cells[a] and self.cells[a] == self.cells[b] == self.cells[c]:
                    return self.cells[a]
            return None

Expose via Game.winner():

    class Game:
        ...

        def winner(self) -> str | None:
            return self.board.winner()

Add column & diagonal tests (parametrized with pytest.mark.parametrize). All green.
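
A sketch of what those parametrized tests could look like. The `Board` here is a condensed standalone copy of the class above, and the test names are hypothetical. The winning lines are spelled out in the test rather than read from `Board.WIN_PATTERNS`, so a typo in the production table cannot silently make the test agree with it.

```python
import pytest


class Board:
    # Condensed copy of the walkthrough's Board so the snippet is standalone.
    WIN_PATTERNS = [
        (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),  # cols
        (0, 4, 8), (2, 4, 6),             # diags
    ]

    def __init__(self):
        self.cells = [None] * 9

    def winner(self):
        for a, b, c in self.WIN_PATTERNS:
            if self.cells[a] and self.cells[a] == self.cells[b] == self.cells[c]:
                return self.cells[a]
        return None


# Stacked parametrize decorators produce every (line, symbol) combination.
@pytest.mark.parametrize(
    "line",
    [
        (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
        (0, 4, 8), (2, 4, 6),             # diagonals
    ],
)
@pytest.mark.parametrize("symbol", ["X", "O"])
def test_column_and_diagonal_wins(line, symbol):
    board = Board()
    for index in line:
        board.cells[index] = symbol
    assert board.winner() == symbol


def test_two_in_a_line_is_not_a_win():
    board = Board()
    board.cells[0] = board.cells[1] = 'X'
    assert board.winner() is None
```

Under pytest this expands to ten generated cases for the first test (five lines times two symbols), each reported separately on failure.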

3.5 Iteration 5 – A Draw

Test:

    def test_draw():
        b = Board()
        b.cells = ['X', 'O', 'X',
                   'X', 'O', 'O',
                   'O', 'X', 'X']
        assert b.is_full() and b.winner() is None

Implementation:

    class Board:
        ...

        def is_full(self) -> bool:
            return all(c is not None for c in self.cells)

Green.

3.6 Iteration 6 – CLI Loop (end-to-end)

Testing interactive I/O is trickier; pytest's capsys and monkeypatch fixtures help.

    def test_cli_first_move(monkeypatch, capsys):
        # X (Alice) takes 1, 2, 3 (the top row) and wins on the fifth move.
        inputs = iter(["Alice", "Bob", "1", "4", "2", "5", "3"])
        monkeypatch.setattr('builtins.input', lambda _: next(inputs))
        from tic_tac_toe import cli
        cli()  # runs until g.winner()
        captured = capsys.readouterr()
        assert "Alice wins" in captured.out

Implementation idea (in tic_tac_toe.py):

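    def cli():
        clear_screen()  # helper, e.g. the clear() function from the baseline script
        g = Game()
        players = [input("Player 1 name: "), input("Player 2 name: ")]
        while True:
            print_board(g.board)
            name = players[0 if g.current == 'X' else 1]
            try:
                # int() raises ValueError on non-numeric input, so parse inside the try.
                move = int(input(f"{name}, choose (1-9): ")) - 1
                if move not in range(9):
                    raise ValueError("Choose a number from 1 to 9")
                g.play_turn(move)
            except ValueError as ex:
                print(ex)
                continue
            if g.winner():
                print_board(g.board)
                # play_turn has already switched players; `name` is who just moved.
                print(f"{name} wins")
                break
            if g.board.is_full():
                print("Draw.")
                break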

We reused only the logic layer; print_board is a tiny helper that just formats board.cells.
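
A possible shape for that helper, assuming the 1-9 numbering from the prompt and the `|` separator from the baseline script. Splitting it into a pure `format_board` (a hypothetical name) plus a one-line `print_board` keeps the formatting testable without capturing stdout; this is a sketch, not necessarily how the original helper was written.

```python
class Board:
    # Stand-in with just the attribute the helper reads.
    def __init__(self):
        self.cells = [None] * 9


def format_board(board) -> str:
    """Render the board as three rows; empty cells show their 1-9 position."""
    shown = [cell if cell is not None else str(i + 1)
             for i, cell in enumerate(board.cells)]
    rows = [" | ".join(shown[r:r + 3]) for r in range(0, 9, 3)]
    return "\n".join(rows)


def print_board(board) -> None:
    print(format_board(board))


def test_format_board_shows_marks_and_free_positions():
    board = Board()
    board.cells[0] = 'X'
    board.cells[4] = 'O'
    assert format_board(board) == "X | 2 | 3\n4 | O | 6\n7 | 8 | 9"


# Run the test directly so the snippet works without a test runner.
test_format_board_shows_marks_and_free_positions()
```

Swapping the UI later then means replacing `print_board` and `cli` only; `Board` and `Game` and their tests stay as they are.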