
Pokémon Guessing Game

AI-powered web game using decision tree algorithms and statistical analysis to guess Pokémon through optimized question selection.

Python · Machine Learning · SQLite · Flask · Data Science · Algorithm Design

Overview

Pokémon Guessing Game is an intelligent web-based application that uses decision tree algorithms and statistical analysis to identify which Pokémon a user is thinking of through strategic questioning. Inspired by Akinator, this project demonstrates practical applications of information theory, database optimization, and algorithmic decision-making.

Built as an academic project at Université Clermont Auvergne, the system analyzes a database of 251 Pokémon (Generations 1-2) using entropy-based question selection to minimize the average number of questions needed for identification.

Technical Architecture

Core Algorithm: Information Gain Maximization

The decision engine implements a greedy algorithm that selects questions based on maximum information gain:

def calculate_proportion(remaining_pokemon, attribute):
    """Fraction of the remaining Pokémon for which a yes/no attribute holds."""
    return sum(1 for p in remaining_pokemon if p[attribute]) / len(remaining_pokemon)


def select_best_question(remaining_pokemon, attributes):
    """
    Selects the attribute that best splits the remaining candidates.
    Target: a proportion as close to 0.5 as possible, so that either answer
    eliminates roughly half of the dataset.
    """
    best_score = float('inf')
    best_attribute = None

    for attribute in attributes:
        proportion = calculate_proportion(remaining_pokemon, attribute)
        # Distance from a perfect 50/50 split (lower is better)
        score = abs(0.5 - proportion)

        if score < best_score:
            best_score = score
            best_attribute = attribute

    return best_attribute
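
A quick illustration of the greedy selection. Purely for the example, each Pokémon is represented as a dict of attribute flags; the project itself keeps candidates in the database and session rather than in this form:

# Toy dataset: each Pokémon as a dict of yes/no attributes (illustrative only)
remaining = [
    {'name': 'Pikachu', 'can_fly': False, 'is_legendary': False, 'has_tail': True},
    {'name': 'Zapdos',  'can_fly': True,  'is_legendary': True,  'has_tail': False},
    {'name': 'Pidgey',  'can_fly': True,  'is_legendary': False, 'has_tail': True},
    {'name': 'Mewtwo',  'can_fly': False, 'is_legendary': True,  'has_tail': True},
]

# 'can_fly' and 'is_legendary' both split the set 2/2, so either is an optimal
# first question; 'has_tail' (3/4) is worse.
print(select_best_question(remaining, ['can_fly', 'is_legendary', 'has_tail']))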

Algorithm Complexity:

  • Time Complexity: O(n × m) where n = remaining Pokémon, m = attributes
  • Space Complexity: O(n) for filtered dataset storage
  • Average Questions: 6-10 (log₂(251) ≈ 8 is the theoretical minimum)

Database Schema & Design

SQLite Database with normalized structure:

Column           Type         Constraint    Description
id               INTEGER      PRIMARY KEY   Pokédex number (1-251)
name             TEXT         NOT NULL      Pokémon identifier
is_legendary     BOOLEAN      NOT NULL      Legendary status
has_evolution    BOOLEAN      NOT NULL      Evolution capability
primary_color    VARCHAR(20)  NOT NULL      Dominant color
type_primary     VARCHAR(20)  NOT NULL      Main type (18 types)
type_secondary   VARCHAR(20)  NULLABLE      Secondary type
size_category    VARCHAR(10)  NOT NULL      Small/Medium/Large
weight_category  VARCHAR(10)  NOT NULL      Light/Common/Heavy
bipedal          BOOLEAN      NOT NULL      Two-legged stance
can_fly          BOOLEAN      NOT NULL      Flying capability
has_tail         BOOLEAN      NOT NULL      Tail presence
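
A minimal sketch of how this schema could be created with Python's built-in sqlite3 module. The column names and types come from the table above; the table name `pokemon` and the exact DDL are assumptions, not the project's actual script:

import sqlite3

# Illustrative DDL derived from the schema table above
conn = sqlite3.connect('data/pokemon.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS pokemon (
        id              INTEGER PRIMARY KEY,   -- Pokédex number (1-251)
        name            TEXT NOT NULL,
        is_legendary    BOOLEAN NOT NULL,
        has_evolution   BOOLEAN NOT NULL,
        primary_color   VARCHAR(20) NOT NULL,
        type_primary    VARCHAR(20) NOT NULL,
        type_secondary  VARCHAR(20),
        size_category   VARCHAR(10) NOT NULL,
        weight_category VARCHAR(10) NOT NULL,
        bipedal         BOOLEAN NOT NULL,
        can_fly         BOOLEAN NOT NULL,
        has_tail        BOOLEAN NOT NULL
    )
""")
conn.commit()
conn.close()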

Data Preprocessing:

# Continuous variable categorization
def categorize_size(height_meters):
    if height_meters < 1.0:
        return 'Small'
    elif height_meters <= 2.0:
        return 'Medium'
    else:
        return 'Large'

def categorize_weight(weight_kg):
    if weight_kg < 30:
        return 'Light'
    elif weight_kg <= 90:
        return 'Common'
    else:
        return 'Heavy'
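
These categorizers can be applied to the raw height and weight columns with pandas before the database is populated. A rough sketch, assuming raw columns named `height_m` and `weight_kg` (the actual column names in the source data may differ):

import pandas as pd

# Hypothetical raw rows with continuous attributes (column names assumed)
raw = pd.DataFrame({
    'name': ['Pikachu', 'Onix', 'Snorlax'],
    'height_m': [0.4, 8.8, 2.1],
    'weight_kg': [6.0, 210.0, 460.0],
})

raw['size_category'] = raw['height_m'].apply(categorize_size)        # Small / Medium / Large
raw['weight_category'] = raw['weight_kg'].apply(categorize_weight)   # Light / Common / Heavy
print(raw[['name', 'size_category', 'weight_category']])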

Implementation Details

Technology Stack

Backend:

  • Python 3.9+ - Core application logic
  • pandas 1.3+ - Data manipulation and analysis
  • numpy 1.21+ - Statistical calculations
  • SQLite3 - Embedded database engine
  • Flask 2.0+ - Web framework for API and routing

Frontend:

  • HTML5 - Semantic markup structure
  • CSS3 - Responsive styling with Flexbox
  • JavaScript (ES6) - Asynchronous API communication
  • Fetch API - AJAX requests for dynamic updates

Development Tools:

  • Git - Version control
  • pytest - Unit testing framework
  • Black - Code formatting
  • pylint - Code quality analysis

Statistical Analysis Engine

Three-Tier Data Processing:

  1. Binary Variables (Legendary, Evolution, Bipedal, Flies, Tail):

def analyze_binary(dataframe, column):
    """Direct proportion calculation"""
    return dataframe[column].sum() / len(dataframe)

  2. Categorical Variables (Color, Types):

def analyze_categorical(dataframe, column):
    """Frequency distribution analysis"""
    value_counts = dataframe[column].value_counts()
    return value_counts / len(dataframe)

  3. Continuous Variables (Size, Weight):

import pandas as pd

def analyze_continuous(dataframe, column, bins):
    """Histogram-based categorization"""
    categorized = pd.cut(dataframe[column], bins=bins, labels=['Low', 'Mid', 'High'])
    return categorized.value_counts() / len(dataframe)
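
A quick sketch of how these helpers might be exercised on a small DataFrame (the sample rows and the `height_m` column name are illustrative, not the project's dataset):

import pandas as pd

sample = pd.DataFrame({
    'is_legendary': [False, True, False, False],
    'primary_color': ['Yellow', 'Yellow', 'Blue', 'Red'],
    'height_m': [0.4, 1.7, 0.5, 1.1],
})

print(analyze_binary(sample, 'is_legendary'))        # 0.25
print(analyze_categorical(sample, 'primary_color'))  # Yellow 0.5, Blue 0.25, Red 0.25
print(analyze_continuous(sample, 'height_m', bins=[0, 1.0, 2.0, 15.0]))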

Web Application Architecture

Flask-based RESTful API:

from flask import Flask, request, jsonify, session

# Helper functions (get_all_pokemon, select_best_question, generate_question,
# filter_pokemon) are implemented in algorithm.py and database.py.

app = Flask(__name__)
app.secret_key = 'your-secret-key'  # placeholder; set a real secret in production

# Columns the engine can ask about (matches the schema above)
ATTRIBUTES = ['is_legendary', 'has_evolution', 'primary_color', 'type_primary',
              'type_secondary', 'size_category', 'weight_category',
              'bipedal', 'can_fly', 'has_tail']

@app.route('/api/start', methods=['POST'])
def start_game():
    """Initialize a new game session"""
    session['remaining_pokemon'] = get_all_pokemon()
    session['question_count'] = 0
    return jsonify({'status': 'started'})

@app.route('/api/question', methods=['GET'])
def get_question():
    """Return the next optimal question"""
    remaining = session.get('remaining_pokemon')
    attribute = select_best_question(remaining, ATTRIBUTES)
    question = generate_question(attribute)
    return jsonify({'question': question, 'attribute': attribute})

@app.route('/api/answer', methods=['POST'])
def process_answer():
    """Filter the candidate set based on the answer"""
    answer = request.json.get('answer')
    attribute = request.json.get('attribute')

    remaining = filter_pokemon(session['remaining_pokemon'], attribute, answer)
    session['remaining_pokemon'] = remaining
    session['question_count'] += 1

    if len(remaining) == 1:
        return jsonify({'result': remaining[0], 'questions': session['question_count']})
    else:
        return jsonify({'continue': True, 'remaining': len(remaining)})

Session Management:

  • Server-side session storage using Flask-Session (minimal setup sketched below)
  • Secure cookie-based session IDs
  • Automatic cleanup after game completion
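
A minimal sketch of the server-side session setup. The filesystem backend is an assumption for illustration; the project may use a different Flask-Session backend:

from flask import Flask
from flask_session import Session

app = Flask(__name__)
app.config['SESSION_TYPE'] = 'filesystem'   # store session data server-side on disk
app.config['SESSION_PERMANENT'] = False     # session ends when the browser closes
Session(app)                                # attach server-side sessions to the app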

Key Features

🎯 Optimal Question Selection

Uses an information-gain calculation to select the question that best partitions the remaining candidates, minimizing the number of questions needed.

📊 Statistical Data Analysis

Employs pandas and numpy for efficient data manipulation and statistical computations on Pokémon attributes.

🔄 Dynamic Database Filtering

Real-time SQL query generation based on user responses, progressively narrowing the search space.
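
A rough sketch of how such a query could be generated. This is illustrative only: the function, the `pokemon` table name, and the constraint format are assumptions, not the project's actual filtering code:

import sqlite3

ASKABLE_COLUMNS = {'is_legendary', 'has_evolution', 'bipedal', 'can_fly', 'has_tail',
                   'primary_color', 'type_primary', 'size_category', 'weight_category'}

def query_candidates(constraints, db_path='data/pokemon.db'):
    """Build a parameterized WHERE clause from the answers collected so far.

    `constraints` maps a column name to its required value,
    e.g. {'can_fly': True, 'type_primary': 'Electric'}.
    """
    # Column names are checked against a whitelist; values go through placeholders
    assert set(constraints) <= ASKABLE_COLUMNS
    clauses = ' AND '.join(f'{column} = ?' for column in constraints)
    query = f'SELECT name FROM pokemon WHERE {clauses}'
    with sqlite3.connect(db_path) as conn:
        return [row[0] for row in conn.execute(query, list(constraints.values()))]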

🌐 RESTful API Design

Clean separation between frontend and backend with JSON-based communication for scalability.
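
Because the API is plain JSON over HTTP, any client can drive the question/answer loop. Below is a throwaway command-line client, not part of the project, assuming the Flask dev server is running on localhost:5000 and using the JSON fields shown in the routes above:

import requests

BASE = 'http://localhost:5000'

with requests.Session() as s:               # keeps the session cookie between calls
    s.post(f'{BASE}/api/start')
    while True:
        q = s.get(f'{BASE}/api/question').json()
        answer = input(f"{q['question']} (yes/no) > ").strip().lower() == 'yes'
        reply = s.post(f'{BASE}/api/answer',
                       json={'attribute': q['attribute'], 'answer': answer}).json()
        if 'result' in reply:
            print(f"Guessed {reply['result']} in {reply['questions']} questions")
            break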

Performance Optimization

  • Database indexing on frequently queried columns
  • Memoization of statistical calculations (see the sketch below)
  • Lazy loading of Pokémon data
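
A sketch of the first two ideas, assuming the `pokemon` table from the schema above with BOOLEAN columns stored as 0/1; the actual index choices and caching in the project may differ:

from functools import lru_cache
import sqlite3

conn = sqlite3.connect('data/pokemon.db')

# Index the columns that are filtered on most often (illustrative choices)
conn.execute('CREATE INDEX IF NOT EXISTS idx_type_primary ON pokemon (type_primary)')
conn.execute('CREATE INDEX IF NOT EXISTS idx_primary_color ON pokemon (primary_color)')
conn.commit()

# Memoize proportion calculations: the result depends only on the candidate set and
# the attribute, so repeated game states hit the cache instead of the database.
@lru_cache(maxsize=None)
def cached_proportion(candidate_ids, attribute):
    # `attribute` must come from a whitelist of column names; ids go through placeholders
    placeholders = ','.join('?' * len(candidate_ids))
    query = f'SELECT AVG({attribute}) FROM pokemon WHERE id IN ({placeholders})'
    return conn.execute(query, tuple(candidate_ids)).fetchone()[0]

# candidate_ids must be hashable for lru_cache, hence a frozenset of Pokédex numbers
print(cached_proportion(frozenset({25, 144, 145, 146}), 'can_fly'))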

Algorithm Performance

Benchmark Results (251 Pokémon dataset):

Metric                 Value
Average Questions      7.3
Worst Case             12 questions
Best Case              4 questions
Database Query Time    <10ms
Question Generation    <5ms
Total Response Time    <50ms

Optimization Techniques:

  • Pre-computed statistical distributions
  • Indexed database queries
  • Cached question templates
  • Efficient data structures (sets for filtering)

Project Structure

pokemon-guessing-game/
├── app/
│   ├── __init__.py
│   ├── routes.py              # Flask routes
│   ├── algorithm.py           # Decision tree logic
│   ├── database.py            # SQLite operations
│   └── statistics.py          # Statistical analysis
├── data/
│   └── pokemon.db             # SQLite database
├── static/
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── game.js            # Frontend logic
├── templates/
│   ├── index.html
│   └── game.html
├── tests/
│   ├── test_algorithm.py
│   └── test_database.py
├── requirements.txt
├── config.py
└── run.py                     # Application entry point

What I Learned

Algorithm Design & Analysis

  • Implementing greedy algorithms for optimization problems
  • Analyzing time/space complexity
  • Understanding information theory and entropy

Data Science Techniques

  • Statistical analysis with pandas/numpy
  • Data preprocessing and categorization
  • Feature engineering for decision trees

Web Development

  • Building RESTful APIs with Flask
  • Session management and state handling
  • Asynchronous frontend-backend communication

Software Engineering

  • Test-driven development with pytest
  • Code organization and modularity
  • Database design and normalization

Challenges & Solutions

Challenge 1: Continuous Variable Handling

Problem: Height (0.3m - 14.5m) and weight (0.1kg - 999kg) are continuous with wide ranges.

Solution: Applied domain knowledge to create meaningful categories:

  • Size: Based on human comparison (< 1m, 1-2m, > 2m)
  • Weight: Based on carrying capacity (< 30kg, 30-90kg, > 90kg)

Challenge 2: Optimal Question Selection

Problem: Random attribute selection led to 15+ questions on average.

Solution: Implemented information gain algorithm targeting 50/50 splits, reducing average to 7.3 questions.

Challenge 3: Web Integration

Problem: Initial CGI approach had performance issues and limited scalability.

Solution: Migrated to Flask framework with proper session management and RESTful API design.

Technologies Demonstrated

This project showcases proficiency in:

  • Algorithm Design - Decision trees, greedy algorithms, optimization
  • Data Science - Statistical analysis, data preprocessing, pandas/numpy
  • Web Development - Flask, RESTful APIs, session management
  • Database Management - SQLite, query optimization, schema design
  • Software Engineering - Testing, code organization, documentation

License

Academic Project - MIT License


Academic Context: Université Clermont Auvergne | Year: 2021-2022 | Course: Database & Algorithm Design