YouTube · SethBling

AI Learns Super Mario World Through Neuroevolution

A YouTube video by SethBling — researched and verified by Depth

Credibility Score: 7/8
Verified process with strong sources and practical application
📝 What They Said

A program called MarI/O, written by SethBling in Lua using the NEAT algorithm, learned to beat Super Mario World's Donut Plains 1 level from zero knowledge in 34 generations over 24 hours — demonstrating how neuroevolution can grow neural network topology and weights simultaneously, mirroring biological evolution, to solve real-time control tasks without any hand-designed architecture.

  1. MarI/O starts with zero domain knowledge — no hardcoded rules, no pre-trained weights, no understanding that pressing right moves toward the goal.
  2. The neural network takes a 13×13 grid of tiles around Mario as input (white = solid block, black = enemy/hazard) and outputs 8 SNES controller buttons (A, B, X, Y, Up, Down, Left, Right).
  3. Generation zero networks are randomly initialized and mostly idle; the simulation is cut off if Mario stands still too long, forcing rapid cycling through candidates.
  4. Connections are signed: positive (green) connections pass the input signal through unchanged; negative (red) connections invert it — enabling conditional logic like 'jump when an enemy is detected.'
  5. Fitness is a function of rightward distance and speed; only the highest-fitness genomes are selected to breed the next generation via crossover and mutation.
  6. After 34 generations of selection, crossover, and random mutation across a 24-hour training session, MarI/O completed Donut Plains 1 without dying and achieved a fitness score above 4,000.
  7. The algorithm is NEAT (NeuroEvolution of Augmenting Topologies), published by Kenneth O. Stanley and Risto Miikkulainen in 2002, which evolves both network weights and topology simultaneously starting from minimal structure.
  8. MarI/O was written from scratch in Lua as a plugin for the BizHawk emulator; the original script (~1,200 lines) is hosted on Pastebin at https://pastebin.com/ZZmSNaHX.
🔬 What We Found

MarI/O + NEAT — What It Is

MarI/O is a Lua script written by SethBling (YouTube creator) that implements the NEAT algorithm to autonomously learn to play Super Mario World on the SNES. The original script (~1,200 lines) is at https://pastebin.com/ZZmSNaHX and runs inside the BizHawk emulator (Windows-only for Lua scripting). The video was published in 2015 and remains one of the most-watched demonstrations of neuroevolution.

NEAT (NeuroEvolution of Augmenting Topologies) is a genetic algorithm for evolving artificial neural networks, developed by Kenneth O. Stanley and Risto Miikkulainen at the University of Texas at Austin, published in Evolutionary Computation (2002, Vol. 10, No. 2, pp. 99–127). DOI: 10.1162/106365602320169811. The full paper PDF is freely available at: https://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf. Stanley's official NEAT page with all papers and implementations: https://www.cs.ucf.edu/~kstanley/neat.html.

The canonical Python implementation is neat-python by CodeReclaimers: https://github.com/CodeReclaimers/neat-python — pure Python, no dependencies beyond the standard library, licensed under 3-clause BSD, supports Python 3.8–3.14 and PyPy3.

An improved fork of MarI/O with bug fixes and enhanced fitness functions: https://github.com/mam91/neat-genetic-mario.


How It Works — Technical Detail

The three pillars of NEAT (each is necessary; ablation studies show removing any one degrades performance severely):

  1. Historical Markings (Innovation Numbers): Every new gene (connection or node) added by mutation receives a globally incrementing innovation number. This acts as a chronological tag, allowing NEAT to align genes from two differently-structured parent networks during crossover without expensive topological analysis — solving the "competing conventions" problem that plagued earlier topology-evolving algorithms.

  2. Speciation: The population is divided into species based on topological similarity (measured via a compatibility distance formula using excess genes, disjoint genes, and average weight differences). Individuals compete primarily within their own species, not against the whole population. This protects new structural innovations — which initially reduce fitness — giving them time to optimize before facing elimination. Without speciation, random-starting NEAT was 7× slower and failed to find a solution within 1,000 generations 5% of the time. (A minimal sketch of the compatibility distance follows this list.)

  3. Incremental Growth from Minimal Structure: NEAT starts every run with networks containing only input and output nodes — no hidden layers. Structure is added only when mutations insert new nodes (by splitting an existing connection) or new connections. This keeps the search space minimal and prevents bloat.
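
To make the speciation mechanics concrete, here is a minimal Python sketch of the compatibility distance. It assumes a genome is just a dict mapping innovation number to connection weight (a simplification — real NEAT genomes also carry node genes and enabled flags), and the coefficient defaults are common NEAT settings, not values from the video:

# Minimal sketch of NEAT's compatibility distance. A genome here is a dict
# mapping innovation number -> connection weight.
def compatibility(g1, g2, c1=1.0, c2=1.0, c3=0.4):
    innovs1, innovs2 = set(g1), set(g2)
    matching = innovs1 & innovs2
    cutoff = min(max(innovs1), max(innovs2))
    non_matching = innovs1 ^ innovs2
    # Excess genes lie beyond the other genome's highest innovation number;
    # disjoint genes fall inside that range but have no counterpart.
    excess = sum(1 for i in non_matching if i > cutoff)
    disjoint = len(non_matching) - excess
    n = max(len(g1), len(g2))  # the paper sets n = 1 for small genomes
    w_bar = sum(abs(g1[i] - g2[i]) for i in matching) / max(len(matching), 1)
    # Two genomes share a species when this falls below compatibility_threshold.
    return c1 * excess / n + c2 * disjoint / n + c3 * w_bar

Crossover uses the same innovation numbers for alignment: matching genes are inherited at random from either parent, while disjoint and excess genes are taken from the fitter parent.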

Two mutation types:
- Add connection: Links two previously unconnected nodes; assigned the next innovation number.
- Add node: Splits an existing connection, disabling it and inserting a new node with two new connections. The new node initially acts as an identity function, minimizing disruption.
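
As an illustration, a hedged Python sketch of the add-node mutation; the Conn record and the genome-as-list representation are simplifications for illustration, not the structures SethBling's Lua script uses:

import random
from dataclasses import dataclass

@dataclass
class Conn:
    src: int      # source node id
    dst: int      # destination node id
    weight: float
    enabled: bool
    innov: int    # globally incrementing innovation number

def add_node(genome, next_innov, next_node_id):
    """Split a random enabled connection, as in NEAT's add-node mutation."""
    old = random.choice([c for c in genome if c.enabled])
    old.enabled = False  # the split connection is disabled, not deleted
    # In-link weight 1.0 plus an out-link carrying the old weight make the
    # new node an identity function at first, minimizing behavioral disruption.
    genome.append(Conn(old.src, next_node_id, 1.0, True, next_innov))
    genome.append(Conn(next_node_id, old.dst, old.weight, True, next_innov + 1))
    return next_innov + 2, next_node_id + 1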

MarI/O specifics:
- Input grid: BoxRadius = 6, so InputSize = (6×2+1)² = 169 tiles + 1 bias = 170 inputs.
- Outputs: 8 buttons (A, B, X, Y, Up, Down, Left, Right) for SMW; 6 for SMB.
- Population: 300 genomes per generation (visible in source code).
- Fitness: rightward X-position + time bonus.
- Save state: DP1.state (Donut Plains 1 start) must be placed in both the Lua folder and BizHawk root.
- The original save/load function in SethBling's script crashes on load; community forks fix this.
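
To make the input layout concrete, a Python stand-in for the script's tile-reading loop (the original is Lua inside the BizHawk script; tile_at here is a hypothetical placeholder for the emulator memory reads, and the 1 / -1 / 0 encoding matches the script's solid/sprite/empty coding):

BOX_RADIUS = 6  # tiles in each direction; (2*6+1)^2 = 169 cells

def build_inputs(tile_at, mario_x, mario_y):
    """tile_at(x, y) returns 1 (solid block), -1 (enemy/sprite), or 0 (empty)."""
    inputs = []
    for dy in range(-BOX_RADIUS, BOX_RADIUS + 1):
        for dx in range(-BOX_RADIUS, BOX_RADIUS + 1):
            # SMW tiles are 16x16 pixels, so the window spans +/-96 px around Mario
            inputs.append(tile_at(mario_x + dx * 16, mario_y + dy * 16))
    inputs.append(1.0)  # bias term: 169 tiles + 1 bias = 170 inputs
    return inputs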

BizHawk constraint: Lua scripting is only available on Windows. Mac/Linux users must use Wine.


How To Apply This

Path A: Run MarI/O exactly as SethBling did (Windows, BizHawk + Lua)

  1. Download BizHawk from https://github.com/TASEmulators/BizHawk/releases. Run the prerequisite installer first, then extract and launch EmuHawk.exe.

  2. Obtain a legal Super Mario World ROM (Super Mario World (USA).sfc). If you own the cartridge, dump it yourself. The ROM must match the filename the script expects.

  3. Get the MarI/O script. Use the improved community fork instead of the original (which has a broken save/load): clone https://github.com/mam91/neat-genetic-mario and place the neat-mario folder inside BizHawk\Lua\SNES\.

  4. Configure the script. Open config.lua and set _M.BizhawkDir to your BizHawk installation path.

  5. Create a save state. In BizHawk, load the ROM, navigate to the start of Donut Plains 1, then go to File > Save State > Save Named and name it DP1.state. Copy this file to both the Lua\SNES\neat-mario\ folder and the BizHawk root directory.

  6. Load the script. In BizHawk, open Tools > Lua Console, then Script > Open Script, and select mario-neat.lua. The NEAT control window will appear — click Start.

  7. Let it run. Early generations will mostly idle or walk a few steps. Meaningful progress typically appears after 5–10 generations. A full run to level completion takes hours (SethBling's took 24 hours at normal emulation speed; you can increase emulation speed in BizHawk to accelerate training significantly).

  8. Save your pool periodically. The improved fork fixes the crash-on-load bug. Use the save button in the NEAT control window to checkpoint progress between sessions.

Path B: Apply NEAT to your own problem in Python

Install the library first (pip install neat-python), then:

import neat

# 1. Define your fitness function
def eval_genomes(genomes, config):
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        # Run your simulation and score this network; your_simulation is a
        # placeholder for your own evaluation loop.
        genome.fitness = your_simulation(net)  # higher is better

# 2. Load the config (copy neat-python/examples/xor/config-feedforward as a starting point)
config = neat.Config(
    neat.DefaultGenome,
    neat.DefaultReproduction,
    neat.DefaultSpeciesSet,
    neat.DefaultStagnation,
    'config-feedforward'  # path to your config file
)

# 3. Run evolution
p = neat.Population(config)
p.add_reporter(neat.StdOutReporter(True))   # prints per-generation stats to the console
p.add_reporter(neat.StatisticsReporter())
winner = p.run(eval_genomes, n=50)  # run up to 50 generations (stops early at fitness_threshold)
print('Best genome:', winner)

Start with the XOR example at neat-python/examples/xor/ — it solves in ~20 generations and is the canonical "hello world" for NEAT. Documentation: https://neat-python.readthedocs.io/
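For orientation, the heart of that XOR example's fitness function, lightly adapted from the repository: start each genome at 4.0 and subtract squared error over the four input patterns.

xor_inputs  = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_outputs = [(0.0,), (1.0,), (1.0,), (0.0,)]

def eval_genomes(genomes, config):
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        genome.fitness = 4.0
        for xi, xo in zip(xor_inputs, xor_outputs):
            output = net.activate(xi)
            genome.fitness -= (output[0] - xo[0]) ** 2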

Key config parameters to tune:
- pop_size: population size (default 150; MarI/O uses 300)
- fitness_threshold: stop when this fitness is reached
- compatibility_threshold: controls species granularity (lower = more species)
- node_add_prob / conn_add_prob: structural mutation rates
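
All four knobs live in the INI-style config file neat-python reads; an excerpt showing where each sits (values here are the XOR example's, and a complete file requires many more keys — start from examples/xor/config-feedforward):

[NEAT]
fitness_criterion     = max
fitness_threshold     = 3.9
pop_size              = 150
reset_on_extinction   = False

[DefaultGenome]
node_add_prob         = 0.2
conn_add_prob         = 0.5

[DefaultSpeciesSet]
compatibility_threshold = 3.0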


What The Creator Didn't Mention

1. NEAT's core mechanism — innovation numbers — was glossed over. The video shows speciation visually but never explains why crossover between differently-structured networks works. The answer is innovation numbers: every new gene gets a globally incrementing tag, allowing NEAT to align homologous genes across networks of different sizes without topological analysis. This is the algorithm's most important contribution.

2. The "10% of your brain" claim is neuroscience myth. SethBling uses it as an analogy for sparse network activation. The claim that humans only use 10% of their brains is scientifically false — brain imaging shows activity throughout the brain. The analogy is harmless but misleading.

3. The original script's save/load is broken. The Pastebin version crashes every time you try to load a saved pool state. Community forks (mam91/neat-genetic-mario, SngLol/NEATEvolve) fix this — use them instead.

4. Training is level-specific and does not generalize. The learned network is optimized for Donut Plains 1 only. For each new level, you must restart training from scratch with a new save state. NEAT does not learn transferable representations.

5. BizHawk Lua scripting is Windows-only. Mac and Linux users must use Wine, which adds setup complexity. An alternative is the FCEUX port (https://github.com/juvester/mari-o-fceux) which runs on Linux.

6. NEAT does not scale well to high-dimensional inputs. The major limitation of vanilla NEAT is that it evolves a single network that must simultaneously extract features and select actions. For raw pixel inputs or large state spaces, this becomes intractable. The research community has addressed this with HyperNEAT (scales to millions of connections via indirect encoding), DeepNEAT (evolves layer-level topology), and hybrid approaches like NEAT+PPO. MarI/O sidesteps this by using a hand-crafted 13×13 tile grid rather than raw pixels — a significant design choice the video doesn't highlight.

7. Alternatives to NEAT for game-playing AI:
- PPO / DQN (deep RL): Gradient-based, far more sample-efficient on complex tasks, requires GPU but handles raw pixels natively. OpenAI's PPO beat many Atari games; NEAT cannot match this at scale.
- Evolution Strategies (OpenAI ES): Gradient-free like NEAT but parallelizes trivially across CPUs; used by OpenAI to train MuJoCo locomotion policies.
- HyperNEAT: Direct descendant of NEAT, evolves large-scale geometric connectivity patterns; better for spatially structured problems.
- neat-python + Gymnasium: The modern way to apply NEAT to control tasks — includes BipedalWalker, InvertedDoublePendulum, and Hopper examples out of the box.
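
A hedged sketch of that wiring for CartPole-v1 (assumes gymnasium is installed and a config with num_inputs = 4 and num_outputs = 1; the config path and threshold choice are assumptions, not from the video):

import gymnasium as gym
import neat

def eval_genomes(genomes, config):
    env = gym.make("CartPole-v1")
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        obs, _ = env.reset(seed=0)
        fitness = 0.0
        while True:
            # One sigmoid output thresholded at 0.5 picks the discrete action
            action = 0 if net.activate(obs)[0] < 0.5 else 1
            obs, reward, terminated, truncated, _ = env.step(action)
            fitness += reward
            if terminated or truncated:  # CartPole-v1 truncates at 500 steps
                break
        genome.fitness = fitness
    env.close()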

✓ Verified Claims
The algorithm is called NEAT, which stands for NeuroEvolution of Augmenting Topologies, based on a paper by Kenneth Stanley and Risto Miikkulainen.

The paper is confirmed: Stanley and Miikkulainen, University of Texas at Austin, published in Evolutionary Computation journal in 2002, DOI 10.1162/106365602320169811.

MarI/O was written from scratch in Lua as a plug-in for an emulator called BizHawk.

Multiple GitHub forks confirm the script is Lua for BizHawk; the emulator name in the video is slightly misspelled ('Bisok') but is definitively BizHawk.

It took 34 generations before Mario was able to finish the level without dying and achieve a fitness score above 4,000.

The 34-generation figure is specific to SethBling's single run; NEAT is stochastic and results vary across runs — no published benchmark confirms this exact figure.

The neural network has inputs (simplified level view) and outputs (the eight controller buttons).

Source code confirms 8 buttons for SMW (A, B, X, Y, Up, Down, Left, Right) and a 13×13 tile grid (BoxRadius=6) as inputs, totaling 169+1=170 inputs.

NEAT builds neural networks from scratch using genetic algorithms without presupposing the best structure.

The paper explicitly states NEAT starts from minimal structure (no hidden nodes) and grows complexity only as beneficial mutations survive selection.

NEAT includes ideas for separating genomes into species, which a lot of genetic algorithms don't try to do.

Speciation is one of NEAT's three core innovations; the paper confirms it was novel in the context of topology-evolving neural networks (TWEANNs).

The training session lasted 24 hours.

No independent source confirms the 24-hour duration; it is plausible given BizHawk's default emulation speed and 34 generations with population 300, but cannot be verified.

You only use 10% of your brain (used as analogy for sparse network activation).

The '10% of your brain' claim is a well-documented neuroscience myth; brain imaging shows activity throughout the brain. SethBling uses it loosely as an analogy, not a scientific claim.

→ Suggested Actions
Effort: medium

Install BizHawk on Windows, clone https://github.com/mam91/neat-genetic-mario, obtain a legal SMW ROM, create the DP1.state save state, and run MarI/O at 4x emulation speed to observe NEAT learning firsthand within a weekend

Direct hands-on experience with the system makes abstract concepts like speciation and fitness progression concrete and observable; accelerated emulation speed cuts the 24-hour run to ~6 hours

Effort: quick

Run the neat-python XOR example locally: pip install neat-python, copy the config-feedforward file from neat-python/examples/xor/, run evolve.py, and instrument it to print species count and innovation numbers each generation to see the three NEAT pillars in action

XOR solves in ~20 generations and is the canonical NEAT hello-world; adding print statements to expose innovation numbers and species boundaries builds intuition for the algorithm's core mechanisms before tackling harder problems

Effort: medium

Read the original NEAT paper (https://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf) focusing specifically on Section 3 (historical markings), Section 4 (speciation), and Table 1 (ablation results showing 7x slowdown without speciation)

The video glosses over innovation numbers entirely; the paper's ablation study quantifies exactly what each pillar contributes, giving you the ability to make informed decisions when tuning NEAT for your own problems

Effort: medium

Apply neat-python to a Gymnasium control task: pip install neat-python gymnasium, then wire NEAT's eval_genomes to CartPole-v1 or BipedalWalker-v3, log fitness per generation, and compare convergence speed against a PPO baseline from stable-baselines3

Directly benchmarking NEAT against gradient-based RL on the same task produces concrete data on where neuroevolution is competitive versus where it falls short, informing when to choose each approach

Effort: medium

Modify MarI/O's fitness function in config.lua to add a bonus for enemy kills or coins collected (beyond rightward X-position), retrain from generation 0, and compare the learned behavior and network topology against the original fitness function

Fitness function design is the single highest-leverage variable in any evolutionary system; this experiment makes the reward-shaping tradeoff visceral and teaches you how NEAT exploits whatever objective you give it

Effort: heavy

Train MarI/O on three different SMW levels (Donut Plains 1, Yoshi's Island 1, Vanilla Dome 1) using separate save states, then compare the final network topologies and generation counts to quantify how level-specific the learned solutions are

This directly tests the generalization limitation identified in the research — that NEAT learns level-specific solutions — and produces concrete evidence for or against transfer learning claims, which is critical if you plan to use NEAT in production robotics or game AI

Effort: heavy

Implement a minimal NEAT from scratch in ~200 lines of Python (innovation counter as a global dict, compatibility distance function, speciation loop, mutation functions) without using neat-python, then verify it solves XOR

Writing NEAT from scratch forces you to resolve every ambiguity in the paper — how to handle disjoint vs excess genes, when to reset innovation numbers, how to assign offspring counts per species — producing deep algorithmic understanding that library use cannot provide

Effort: heavy

Read the HyperNEAT paper (Stanley et al. 2007, available at https://www.cs.ucf.edu/~kstanley/neat.html) and prototype a HyperNEAT experiment on a spatially structured task like a 2D maze, comparing network size and convergence against vanilla NEAT

HyperNEAT is the direct answer to NEAT's scaling limitation identified in the research; understanding indirect encoding is the necessary next step if you want to apply neuroevolution to robotics or vision tasks with large input spaces

💡 Go Deeper
- How does NEAT's compatibility distance formula (excess genes coefficient c1, disjoint genes coefficient c2, weight difference coefficient c3) interact with compatibility_threshold to determine species count, and what are principled strategies for tuning these hyperparameters for a new domain?
- What specific mechanisms do HyperNEAT and ES-HyperNEAT use to encode connectivity patterns geometrically, and under what input dimensionality does the crossover from vanilla NEAT to HyperNEAT become necessary for tractable training?
- How does OpenAI Evolution Strategies (2017) differ from NEAT in parallelization strategy, gradient estimation, and sample efficiency, and which class of problems favors each approach?
- What is the 'competing conventions problem' in neuroevolution, how did algorithms before NEAT (e.g., GNARL, ESP) fail to solve it, and why do innovation numbers specifically resolve it rather than other alignment strategies?
- Can NEAT be combined with gradient-based weight optimization (e.g., evolve topology with NEAT, then fine-tune weights with backpropagation), and what does the literature show about whether this hybrid outperforms either approach alone?
- How does MarI/O's hand-crafted 13×13 tile grid input representation compare to raw pixel input in terms of information content and search space size, and what would be required to make NEAT work directly on raw SNES pixel output?
- What are the theoretical convergence guarantees (or lack thereof) for NEAT, and how does its performance scale empirically with population size, problem dimensionality, and fitness landscape ruggedness?
- How have modern neuroevolution approaches like NEAT+PPO hybrids, DeepNEAT, and Weight Agnostic Neural Networks (Gaier & Ha 2019) extended the core NEAT ideas, and which has shown the most practical traction in robotics applications?