Deep Reinforcement Learning · Combinatorial Search

DeepCubeA Maltese Gear Cube Solver

Applying the DeepCubeA algorithm to a puzzle with ~4.9 × 10¹⁹ states — using backward induction, a learned neural heuristic, and symmetry-augmented training to achieve a 100% solve rate at near-optimal path lengths.

Python · PyTorch · CUDA · NumPy · Symmetry Groups · Weighted A* · NetworkX · Matplotlib
View on GitHub
Problem Statement

The Maltese Gear Cube is a mechanical puzzle with a state space of approximately 4.9 × 10¹⁹ configurations — orders of magnitude larger than a standard Rubik's Cube. Classical search algorithms such as BFS and IDA* become computationally intractable at this scale without a highly accurate heuristic to prune the search tree. The challenge: learn a cost-to-go function h(s) entirely through self-supervised backward induction — no human-designed move sequences, no domain-specific solver — then deploy it inside a batched weighted A* search that returns provably near-optimal solutions at 100% solve rate on any scrambled starting configuration.

Approach & Methodology
Phase 1 — Training
1
State Representation
Each Maltese Gear Cube configuration is encoded as a concatenated one-hot vector spanning all corner positions, corner orientations, edge positions, and gear-tooth orientations. This produces a high-dimensional binary input tensor that fully captures the puzzle state without ambiguity — feeding cleanly into the first linear layer of the network.
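As a concrete sketch of this encoding, with hypothetical piece counts (the real encoder uses the puzzle's actual corner, edge, and gear dimensions, which are not spelled out here):

```python
import numpy as np

# Hypothetical piece counts for illustration only; a real encoder would
# use the Maltese Gear Cube's true corner/edge/gear layout.
N_CORNERS, CORNER_ORI = 8, 3
N_EDGES = 12
N_GEARS, GEAR_ORI = 12, 6

def encode_state(corner_pos, corner_ori, edge_pos, gear_ori):
    """Concatenate one one-hot block per piece attribute into a flat vector."""
    parts = []
    for vals, size in (
        (corner_pos, N_CORNERS),   # which slot each corner occupies
        (corner_ori, CORNER_ORI),  # each corner's twist
        (edge_pos, N_EDGES),       # which slot each edge occupies
        (gear_ori, GEAR_ORI),      # each gear's tooth orientation
    ):
        block = np.zeros((len(vals), size), dtype=np.float32)
        block[np.arange(len(vals)), vals] = 1.0  # one-hot each attribute
        parts.append(block.ravel())
    return np.concatenate(parts)

# The solved state: every piece in its home slot, every orientation zero.
solved = encode_state(
    corner_pos=list(range(N_CORNERS)),
    corner_ori=[0] * N_CORNERS,
    edge_pos=list(range(N_EDGES)),
    gear_ori=[0] * N_GEARS,
)
```

Because each attribute contributes exactly one hot entry, the vector is unambiguous: two states collide only if every piece position and orientation matches.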
2
Backward Induction Data Generation
Starting from the solved state, apply sequences of k random legal moves to produce training pairs (s, d), with d = k as the cost-to-go label. Because random moves can cancel, k is strictly an upper bound on the true distance, but sampling k uniformly across all depths guarantees coverage of the full difficulty spectrum without any BFS oracle.
50M+ samples
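The scrambling loop can be sketched as follows; `apply_move` and `legal_moves` are caller-supplied stand-ins for the puzzle's actual move engine:

```python
import random

def make_training_pair(solved_state, legal_moves, apply_move, k_max, rng=random):
    """Scramble the solved state with k random moves and label it d = k.
    Note that d = k is an upper bound on the true cost-to-go, since random
    moves can cancel; sampling k uniformly covers every difficulty level."""
    k = rng.randint(1, k_max)
    s = solved_state
    for _ in range(k):
        s = apply_move(s, rng.choice(legal_moves))
    return s, k
```

The same generator works for any puzzle whose moves are invertible; below it is exercised on a toy cyclic puzzle rather than the cube itself.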
3
Neural Network Training
A deep fully-connected network — 4 hidden layers × 2048 units, batch normalisation after every layer, ReLU activations — outputs a scalar h(s) ∈ ℝ≥0. Trained end-to-end via MSE loss L = (h(s) − d)² with Adam on uniformly-sampled mini-batches until validation loss converges.
4 × 2048 FC
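A minimal PyTorch sketch of this architecture and one training step; the class defaults match the 4 × 2048 configuration, while details such as the learning-rate schedule are assumptions, not specified above:

```python
import torch
import torch.nn as nn

class CostToGoNet(nn.Module):
    """Linear -> BatchNorm -> ReLU blocks, then a scalar head h(s)."""
    def __init__(self, in_dim: int, hidden: int = 2048, n_layers: int = 4):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(n_layers):
            layers += [nn.Linear(d, hidden), nn.BatchNorm1d(hidden), nn.ReLU()]
            d = hidden
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Raw scalar output; targets d >= 1 are non-negative, and h can be
        # clamped at zero during inference if needed.
        return self.net(x).squeeze(-1)

def train_step(model, optimiser, states, dists):
    """One step of L = (h(s) - d)^2 minimised with the given optimiser."""
    model.train()
    loss = nn.functional.mse_loss(model(states), dists)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```

The scalar head plus MSE makes this a plain regression on scramble depth, which is what lets the backward-induction labels from the previous step train it directly.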
4
Symmetry-Based Data Augmentation
The Maltese Gear Cube admits 48 geometric symmetries — rotations and reflections that map valid states to equivalent valid states with identical cost-to-go. Applying all 48 transforms to every training sample multiplies the effective dataset 48×, yielding a 40% reduction in wall-clock training time at zero additional data-collection cost.
×48 aug
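One way to implement the augmentation is to represent each symmetry as a permutation of the flat state-vector indices; deriving the actual 48 permutations from the cube's geometry is puzzle-specific and omitted here, so the sketch below takes them as given:

```python
import numpy as np

def augment_batch(states, dists, sym_perms):
    """Expand a (B, D) batch by applying each symmetry, encoded as a
    permutation of the D state-vector indices. Cost-to-go labels are
    invariant under symmetry, so they are simply tiled."""
    aug_states = np.stack([states[:, p] for p in sym_perms])  # (S, B, D)
    aug_states = aug_states.reshape(-1, states.shape[1])      # (S*B, D)
    aug_dists = np.tile(dists, len(sym_perms))                # (S*B,)
    return aug_states, aug_dists
```

With the 48 cube symmetries in `sym_perms`, a batch of B labelled states becomes 48·B at the cost of an index shuffle, which is why the augmentation adds no data-collection overhead.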
Trained weights frozen — switching to inference
Neural network deployed as a static heuristic inside the search loop
Phase 2 — Inference
5
Batched Weighted A* Search
Given any scrambled input state, a Batched Weighted A* (BWAS) search runs with priority f(s) = g(s) + w·h(s), where g(s) is moves taken, h(s) is the trained neural heuristic, and weight w = 0.6 governs the optimality–speed tradeoff. Nodes are evaluated in batches of B = 1 000 per GPU forward pass, cutting per-step latency by orders of magnitude.
w = 0.6 · B = 1 000
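A self-contained sketch of BWAS, with `expand` and `h_batch` as stand-ins for the move generator and the batched GPU heuristic (in the real solver, `h_batch` is one forward pass of the trained network over the whole batch):

```python
import heapq

def bwas(start, is_goal, expand, h_batch, w=0.6, batch_size=1000, max_iters=100_000):
    """Batched Weighted A*: pop up to batch_size nodes per iteration and
    score all of their children with a single batched heuristic call."""
    counter = 0  # tie-breaker so the heap never compares states directly
    frontier = [(w * h_batch([start])[0], counter, start, 0, [])]
    best_g = {start: 0}
    for _ in range(max_iters):
        if not frontier:
            return None
        batch = [heapq.heappop(frontier)
                 for _ in range(min(batch_size, len(frontier)))]
        children = []
        for _f, _c, s, g, path in batch:
            if is_goal(s):
                return path  # move sequence from start to solved
            for move, child in expand(s):
                if child not in best_g or g + 1 < best_g[child]:
                    best_g[child] = g + 1
                    children.append((child, g + 1, path + [move]))
        if children:
            h_vals = h_batch([c for c, _, _ in children])  # one batched call
            for (child, g, path), h in zip(children, h_vals):
                counter += 1
                heapq.heappush(frontier, (g + w * h, counter, child, g, path))
    return None
```

Batching trades a small amount of expansion order strictness for throughput: every heuristic evaluation in an iteration shares one forward pass instead of paying per-node latency.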
6
Evaluation
Tested on 100 independently scrambled cubes across varying scramble depths. Solution length is compared against optimal paths computed via BFS on a reachable subgraph. The solver achieves a 100% solve rate with paths averaging just 1.08× the optimal move count — confirming near-optimality well within the theoretical w-bound.
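The BFS reference used for the optimality check can be sketched as a bounded breadth-first sweep from the solved state, yielding exact distances on the explored subgraph (function and parameter names here are illustrative, not from the project):

```python
from collections import deque

def bfs_optimal_depths(solved, expand, max_depth):
    """Exact cost-to-go for every state within max_depth of solved:
    the reference optimal lengths the learned solver is scored against."""
    depth = {solved: 0}
    queue = deque([solved])
    while queue:
        s = queue.popleft()
        if depth[s] == max_depth:
            continue  # boundary of the reachable subgraph we exhaust
        for _move, child in expand(s):
            if child not in depth:
                depth[child] = depth[s] + 1
                queue.append(child)
    return depth
```

A solver path of length L for a start state s then has optimality ratio L / depth[s], and averaging that ratio over the test set gives the reported 1.08×.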
Architecture
deepcubea-maltese-gear · architecture-diagram
Phase 1 — Training
Maltese Gear Cube State Space
~4.9 × 10¹⁹ configurations
Backward Induction Scrambler
k random moves from solved state · k ∈ [1, k_max]
Training Dataset
50M+ (state s, distance d) pairs
⟳ Symmetry Aug ×48
Deep Neural Network
4 × 2048 FC · BN · ReLU
Output: h(s) ∈ ℝ≥0
MSE Loss
L = (h(s) − d)² · Adam optimiser
✓ Trained Model Weights
Converged heuristic · weights frozen
Trained h(s)
Phase 2 — Inference
Scrambled Input State
Any arbitrary starting configuration
Batched Weighted A* (BWAS)
w = 0.6 · batch size B = 1 000 nodes / iter
Priority Queue
f(s) = g(s) + w · h(s)
g(s): moves taken · h(s): NN heuristic
Move Generator + NN Evaluator
~40 legal moves / state · GPU batch forward pass
Goal Check
Solved → return path · else → re-insert to queue
✓ Near-Optimal Solution Path
100% solve rate · avg 1.08× optimal length
Results & Impact
100%
Solve Rate
1.08×
Near-Optimal
50M+
Training States
40%
Training Speedup
100% solve rate across 100 independently scrambled Maltese Gear Cube instances at varying scramble depths — zero unsolved cases in the full evaluation set.
Solutions average 1.08× the theoretical optimal move count, verified against BFS on a reachable subgraph — well within the weighted A* bound imposed by w = 0.6.
Symmetry augmentation ×48 reduced training time to convergence by 40% with no additional data collection, exploiting all rigid-body symmetries of the Maltese Gear Cube.
BWAS with GPU batching (B = 1 000) solved all test instances within a bounded node-expansion budget, demonstrating scalability to a ~4.9 × 10¹⁹ state space without exhaustive search.