PokeR/openspec/changes/archive/2026-05-17-bot-intelligence/design.md

## Context

The current bot implementation makes coin-flip decisions with no strategic reasoning, personality, or skill variation. The `PlayerSeat` type has basic fields (chips, status, cards) but no personality or skill attributes. Bot turns execute instantly, with no simulated thinking time. No observation tracking exists, so the player learns nothing about opponent behavior.

The game uses a functional state pattern: actions return new GameState objects. Bot decisions are invoked during turn processing in `turn.ts`.

## Goals / Non-Goals

**Goals:**
- Replace coin-flip bot decisions with personality-driven, skill-aware decision engine
- Provide 8 distinct bot archetypes across 5 skill levels with realistic play patterns
- Enable player-configurable table setup with visual seat editor
- Track opponent statistics and classify player types automatically
- Offer configurable learning assistance (info levels, feedback, coaching)
- Implement sequential decision timer with timing tells
- Keep all logic client-side with no external dependencies

**Non-Goals:**
- Multiplayer networking (single-player vs bots only)
- Machine learning-based classification (rule-based for v1)
- Bot-to-bot adaptation (bots adapt to human player only)
- Tournament mode (cash game focus)
- Persistent cross-session memory (optional toggle in v2)

## Decisions

### Decision 1: Three-layer bot decision pipeline

**Chosen**: Base Strategy → Personality Filter → Skill Noise

```
GTO-informed baseline → Archetype bias injection → Mistake/noise layer
```

**Rationale**: Separates concerns cleanly. Base strategy computed once, personality reshapes probabilities, skill injects errors. Easy to tune each layer independently and add new archetypes without rewriting core logic.

**Alternatives considered:**
- Lookup table per archetype × skill: Too many combinations (8 × 5 = 40 tables), hard to maintain
- ML model: Overkill for v1, requires training data, opaque decision process defeats teaching purpose
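
As a rough illustration of how the three layers could compose, here is a minimal sketch. Every name in it (`ActionWeights`, `BotProfile`, `baseStrategy`, `decide`, and so on) is hypothetical, the baseline is a placeholder rather than a GTO-informed strategy, and the noise layer uses a simple blend toward uniform play instead of the archetype-specific mistakes described in Decision 3.

```typescript
// All names below are hypothetical; none come from the existing codebase.
type Action = "fold" | "call" | "raise";
type ActionWeights = Record<Action, number>;

interface BotProfile {
  bias: ActionWeights; // personality multipliers, e.g. a loose caller has call > 1
  mistakeRate: number; // 0 = no noise, 1 = fully uniform random play
}

// Scale weights so they sum to 1; fall back to folding if every weight is zero.
function normalize(w: ActionWeights): ActionWeights {
  const total = w.fold + w.call + w.raise;
  if (total <= 0) return { fold: 1, call: 0, raise: 0 };
  return { fold: w.fold / total, call: w.call / total, raise: w.raise / total };
}

// Layer 1: placeholder baseline driven by a hand-strength estimate in [0, 1].
function baseStrategy(handStrength: number): ActionWeights {
  return normalize({ fold: 1 - handStrength, call: 0.5, raise: handStrength });
}

// Layer 2: the archetype bias reshapes the baseline probabilities.
function applyPersonality(base: ActionWeights, bias: ActionWeights): ActionWeights {
  return normalize({
    fold: base.fold * bias.fold,
    call: base.call * bias.call,
    raise: base.raise * bias.raise,
  });
}

// Layer 3: skill noise blends the distribution toward uniform random play.
function applySkillNoise(w: ActionWeights, mistakeRate: number): ActionWeights {
  const blend = (p: number): number => (1 - mistakeRate) * p + mistakeRate / 3;
  return { fold: blend(w.fold), call: blend(w.call), raise: blend(w.raise) };
}

// Sample one action; the rng is injectable so tests can be deterministic.
function sampleAction(w: ActionWeights, rng: () => number): Action {
  const roll = rng();
  if (roll < w.fold) return "fold";
  if (roll < w.fold + w.call) return "call";
  return "raise";
}

function decide(
  handStrength: number,
  profile: BotProfile,
  rng: () => number = Math.random,
): Action {
  const base = baseStrategy(handStrength);
  const shaped = applyPersonality(base, profile.bias);
  const noisy = applySkillNoise(shaped, profile.mistakeRate);
  return sampleAction(noisy, rng);
}
```

Because each layer is a pure function from weights to weights, a new archetype under this sketch would only need a new `bias` entry, and each layer can be tuned or tested in isolation.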

### Decision 2: Weighted scoring classification over pure rules

**Chosen**: Each archetype has a scoring function to which observed stats contribute weighted points. The highest-scoring archetype wins, and its confidence is derived from the score totals.

**Rationale**: Handles edge cases naturally (e.g., VPIP=32% is clearly LAG-adjacent but rigid rules might miss it). Produces natural confidence percentages for UI. Tunable via weight adjustments.

**Alternatives considered:**
- Rule-based if/else: Brittle at boundaries, no confidence metric, confusing when it misclassifies
- Naive Bayes classifier: More accurate but adds complexity, less transparent for teaching purposes
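
One possible shape for the weighted scoring is sketched below. The stat targets, tolerances, and weights are invented for illustration, the `pfr` stat is an assumption (only VPIP is named in this document), and reading "confidence" as the winner's share of the total score is just one interpretation. The 10-hand minimum mirrors the approximate sample size mentioned under Risks.

```typescript
// Hypothetical sketch; all targets, tolerances, and weights are illustrative only.
type Archetype = "TAG" | "LAG" | "Fish";

interface Stats {
  hands: number; // completed hands observed
  vpip: number;  // fraction of hands with money voluntarily put in the pot, 0..1
  pfr: number;   // pre-flop raise frequency, 0..1 (assumed stat)
}

interface Feature {
  stat: "vpip" | "pfr";
  target: number;    // value a textbook example of the archetype would show
  tolerance: number; // distance at which the feature contributes zero points
  weight: number;    // maximum points this feature can contribute
}

const PROFILES: Record<Archetype, Feature[]> = {
  TAG: [
    { stat: "vpip", target: 0.2, tolerance: 0.15, weight: 2 },
    { stat: "pfr", target: 0.16, tolerance: 0.15, weight: 1 },
  ],
  LAG: [
    { stat: "vpip", target: 0.35, tolerance: 0.15, weight: 2 },
    { stat: "pfr", target: 0.28, tolerance: 0.15, weight: 1 },
  ],
  Fish: [
    { stat: "vpip", target: 0.5, tolerance: 0.2, weight: 2 },
    { stat: "pfr", target: 0.05, tolerance: 0.15, weight: 1 },
  ],
};

const MIN_HANDS = 10;

type Classification =
  | { kind: "insufficient-data" }
  | { kind: "no-match" }
  | { kind: "classified"; archetype: Archetype; confidence: number };

// Points fall off linearly with distance from the target, so near misses still score.
function scoreArchetype(stats: Stats, features: Feature[]): number {
  return features.reduce((sum, f) => {
    const distance = Math.abs(stats[f.stat] - f.target);
    const closeness = Math.max(0, 1 - distance / f.tolerance);
    return sum + f.weight * closeness;
  }, 0);
}

function classify(stats: Stats): Classification {
  if (stats.hands < MIN_HANDS) return { kind: "insufficient-data" };
  const names = Object.keys(PROFILES) as Archetype[];
  let best: Archetype | null = null;
  let bestScore = 0;
  let total = 0;
  for (const name of names) {
    const score = scoreArchetype(stats, PROFILES[name]);
    total += score;
    if (score > bestScore) {
      best = name;
      bestScore = score;
    }
  }
  if (best === null) return { kind: "no-match" };
  // Confidence here is the winner's share of all points (an assumed definition).
  return { kind: "classified", archetype: best, confidence: bestScore / total };
}
```

Under these made-up weights, a player with VPIP of 32% scores highest as LAG while still earning partial points for TAG, which is the boundary behavior the rationale describes.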

### Decision 3: Archetype-specific mistake libraries

**Chosen**: Each archetype has its own set of possible mistakes; skill level controls frequency. A Novice TAG folds too much; a Novice Fish calls too much — same skill, different errors.

**Rationale**: Realistic — bad players fail in ways consistent with their style. More educational for the player to observe pattern-consistent mistakes.
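
A minimal sketch of a per-archetype mistake library follows. The `Mistake` shape, the two example mistakes, and the numeric mistake rates are assumptions made for illustration; only the Novice TAG over-folding and Novice Fish over-calling behavior comes from the text above.

```typescript
// Hypothetical sketch; the rates and the mistake catalog are illustrative only.
type Action = "fold" | "call" | "raise";
type Archetype = "TAG" | "Fish";
type Skill = "Novice" | "Ultra"; // the other three skill levels are omitted here

interface Mistake {
  name: string;
  applies: (intended: Action) => boolean; // when this mistake can replace the intended play
  result: Action;                         // the action taken instead
}

// Each archetype fails in its own characteristic way.
const MISTAKES: Record<Archetype, Mistake[]> = {
  TAG: [{ name: "overfold", applies: (a) => a === "call", result: "fold" }],
  Fish: [{ name: "overcall", applies: (a) => a === "fold", result: "call" }],
};

// Skill controls only how often a mistake fires, not which mistake it is.
const MISTAKE_RATE: Record<Skill, number> = { Novice: 0.3, Ultra: 0.02 };

function injectMistake(
  intended: Action,
  archetype: Archetype,
  skill: Skill,
  rng: () => number = Math.random,
): Action {
  if (rng() >= MISTAKE_RATE[skill]) return intended;
  const candidates = MISTAKES[archetype].filter((m) => m.applies(intended));
  const pick = candidates[Math.floor(rng() * candidates.length)];
  // No applicable mistake means the intended action stands.
  return pick ? pick.result : intended;
}
```

With this structure, a Novice TAG and a Novice Fish share the same mistake rate but draw from different libraries, so their errors stay consistent with their style.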

### Decision 4: Sequential timer with configurable duration

**Chosen**: Turns pass sequentially, and each player gets an independent countdown. The duration is configurable (5-30s). The human player can use the same timer, no timer, or a custom timer. A timeout triggers an archetype-appropriate default action.

**Rationale**: Mimics online poker flow. Configurable duration accommodates different learning paces. Timeout defaults add realism (real players sometimes fold from inaction).
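
The timeout rule can be sketched as a pure function over elapsed time, which keeps it compatible with the functional state pattern. The names, the per-archetype defaults (TAG folds, Fish calls), the "check when checking is free" rule, and clamping every configured duration to 5-30 seconds are all assumptions made for this example.

```typescript
// Hypothetical sketch; the timeout defaults per archetype are illustrative only.
type Action = "fold" | "check" | "call";
type Archetype = "TAG" | "Fish";
type TimerSetting = { kind: "none" } | { kind: "seconds"; seconds: number };

const MIN_SECONDS = 5;
const MAX_SECONDS = 30;

const TIMEOUT_DEFAULT: Record<Archetype, Action> = { TAG: "fold", Fish: "call" };

// Returns the limit in milliseconds, or null when the seat has no timer.
function limitMs(timer: TimerSetting): number | null {
  if (timer.kind === "none") return null;
  const clamped = Math.min(MAX_SECONDS, Math.max(MIN_SECONDS, timer.seconds));
  return clamped * 1000;
}

// Assumed rule: never fold when checking is free; otherwise use the archetype default.
function timeoutAction(archetype: Archetype, canCheck: boolean): Action {
  return canCheck ? "check" : TIMEOUT_DEFAULT[archetype];
}

interface TurnResult {
  action: Action | null; // null means the seat is still deciding
  timedOut: boolean;
}

function resolveTurn(
  chosen: Action | null,
  elapsedMs: number,
  timer: TimerSetting,
  archetype: Archetype,
  canCheck: boolean,
): TurnResult {
  const limit = limitMs(timer);
  const inTime = limit === null || elapsedMs < limit;
  if (chosen !== null && inTime) return { action: chosen, timedOut: false };
  if (!inTime) return { action: timeoutAction(archetype, canCheck), timedOut: true };
  return { action: null, timedOut: false };
}
```

Whether a timed-out action should count as a skill mistake is left open here, matching the open question at the end of this document.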

### Decision 5: Bet sizing tracked per street

**Chosen**: Track average bet sizes on pre-flop, flop, turn, and river separately. Flag patterns like "always bets 1/3 pot" or "polarized river betting."

**Rationale**: Different streets reveal different information. Pre-flop sizing shows aggression level, post-flop sizing reveals hand reading ability. Per-street tracking enables richer classification.
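
A small sketch of per-street tracking is shown below. It stores each bet as an exact fraction of the pot, which is only one of the two options raised under Open Questions, and the names, the 5-sample minimum, and the 0.05 tolerance are assumptions for illustration.

```typescript
// Hypothetical sketch; thresholds are illustrative only.
type Street = "preflop" | "flop" | "turn" | "river";

// Each street keeps the observed bet sizes as fractions of the pot.
type BetSizingProfile = Record<Street, number[]>;

function emptyProfile(): BetSizingProfile {
  return { preflop: [], flop: [], turn: [], river: [] };
}

// Returns a new profile instead of mutating, in line with the functional state pattern.
function recordBet(
  profile: BetSizingProfile,
  street: Street,
  betAmount: number,
  potBeforeBet: number,
): BetSizingProfile {
  if (potBeforeBet <= 0) return profile;
  return { ...profile, [street]: [...profile[street], betAmount / potBeforeBet] };
}

// Average bet size on a street as a fraction of the pot, or null with no samples.
function averageFraction(profile: BetSizingProfile, street: Street): number | null {
  const sizes = profile[street];
  if (sizes.length === 0) return null;
  return sizes.reduce((sum, x) => sum + x, 0) / sizes.length;
}

// Flags patterns such as "always bets 1/3 pot": every sample sits close to the mean.
function isFixedSizing(
  profile: BetSizingProfile,
  street: Street,
  minSamples = 5,
  tolerance = 0.05,
): boolean {
  const sizes = profile[street];
  const avg = averageFraction(profile, street);
  if (avg === null || sizes.length < minSamples) return false;
  return sizes.every((x) => Math.abs(x - avg) <= tolerance);
}
```

Detecting polarized river betting would need a different check (for example, sizes clustering at two extremes), which this sketch does not attempt.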

### Decision 6: Timing data as observable tell

**Chosen**: Record the decision time for every action. Track fast vs slow distributions per action type (call/fold/raise/check). Skill level controls timing consistency: a Novice's timing is random, while an Ultra bot randomizes its timing deliberately.

**Rationale**: Timing is a real poker skill. Teaches players to notice hesitation patterns. Adds depth without UI complexity (just track timestamps).
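
Timing tracking could be as simple as the following sketch, which buckets recorded decision times per action type. The 2000 ms fast/slow threshold and all names are assumptions, not values from this design.

```typescript
// Hypothetical sketch; the fast/slow threshold is illustrative only.
type ActionType = "call" | "fold" | "raise" | "check";

interface TimingSample {
  action: ActionType;
  ms: number; // time between the turn starting and the action being taken
}

interface TimingSummary {
  fast: number;          // decisions quicker than the threshold
  slow: number;          // decisions at or above the threshold
  meanMs: number | null; // null when no samples exist for this action type
}

function summarizeTiming(
  samples: TimingSample[],
  fastThresholdMs = 2000,
): Record<ActionType, TimingSummary> {
  const totals: Record<ActionType, { fast: number; slow: number; sumMs: number }> = {
    call: { fast: 0, slow: 0, sumMs: 0 },
    fold: { fast: 0, slow: 0, sumMs: 0 },
    raise: { fast: 0, slow: 0, sumMs: 0 },
    check: { fast: 0, slow: 0, sumMs: 0 },
  };
  for (const s of samples) {
    const t = totals[s.action];
    if (s.ms < fastThresholdMs) t.fast += 1;
    else t.slow += 1;
    t.sumMs += s.ms;
  }
  const toSummary = (t: { fast: number; slow: number; sumMs: number }): TimingSummary => {
    const n = t.fast + t.slow;
    return { fast: t.fast, slow: t.slow, meanMs: n === 0 ? null : t.sumMs / n };
  };
  return {
    call: toSummary(totals.call),
    fold: toSummary(totals.fold),
    raise: toSummary(totals.raise),
    check: toSummary(totals.check),
  };
}
```

A bot that raises quickly but calls slowly would show up here as a skewed fast/slow split between the two action types, which is the kind of pattern the player is meant to notice.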

## Risks / Trade-offs

[Complex decision engine] → Start with simplified base strategy (position-based hand ranking) rather than full GTO solver. Can be upgraded incrementally.

[Performance with many tracked stats] → Observation tracking is per-hand, not per-decision. Stats update once per completed hand, keeping computation minimal.

[Classification accuracy in early hands] → System shows "insufficient data" until minimum sample size reached (~10 hands). Confidence starts low and increases. Player learns patience.

[UI complexity from many settings] → Table setup uses progressive disclosure: basic presets first, advanced options expandable. Info/feedback settings use simple dropdowns.

[Mistake injection feeling unrealistic] → Mistakes are probabilistic, not deterministic, so the same bot plays differently each game. Human review during testing will calibrate the feel.

## Migration Plan

No migration is needed: this is new functionality added to an early-stage project (v0.0.1). Existing bot logic will be replaced entirely. The `PlayerSeat` type will be extended with personality/skill fields, but existing fields remain unchanged for backward compatibility.
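
To make the backward-compatibility point concrete, here is a hedged sketch of the extension. The `PlayerSeat` shown is a stand-in, since this document names its fields but not their types, and the added field names (`archetype`, `skillLevel`) and helper functions are hypothetical.

```typescript
// Stand-in for the existing type; the real field types are not shown in this document.
interface PlayerSeat {
  chips: number;
  status: string;
  cards: string[];
  // Proposed additions (hypothetical names), optional so existing seat objects stay valid.
  archetype?: string;
  skillLevel?: 1 | 2 | 3 | 4 | 5;
}

// A seat counts as a configured bot only when both new fields are present.
function isConfiguredBot(seat: PlayerSeat): boolean {
  return seat.archetype !== undefined && seat.skillLevel !== undefined;
}

// Returns a new seat rather than mutating, matching the functional state pattern.
function withPersonality(
  seat: PlayerSeat,
  archetype: string,
  skillLevel: 1 | 2 | 3 | 4 | 5,
): PlayerSeat {
  return { ...seat, archetype, skillLevel };
}
```

Making the new fields optional means code that constructs seats with only the original fields would continue to type-check under this sketch.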

## Open Questions

- Should the base strategy use a simplified GTO approximation or position-based hand ranking charts? Hand charts are simpler to implement and debug.
- How granular should bet sizing tracking be? (e.g., track exact percentages vs bucket into "small/medium/large")
- Should timeout actions feel like mistakes (count against skill) or neutral decisions?