Deep Reinforcement Learning · Combinatorial Search

DeepCubeA Maltese Gear Cube Solver

Applying the DeepCubeA algorithm to a puzzle with ~4.9 × 10¹⁹ states — using backward induction, a learned neural heuristic, and symmetry-augmented training to achieve a 100% solve rate at near-optimal path lengths.

Python · PyTorch · CUDA · NumPy · Symmetry Groups · Weighted A* · NetworkX · Matplotlib
View on GitHub
Problem Statement

The Maltese Gear Cube is a mechanical puzzle with a state space of approximately 4.9 × 10¹⁹ configurations — orders of magnitude larger than a standard Rubik's Cube. Classical search algorithms such as BFS and IDA* become computationally intractable at this scale without a highly accurate heuristic to prune the search tree. The challenge: learn a cost-to-go function h(s) entirely through self-supervised backward induction — no human-designed move sequences, no domain-specific solver — then deploy it inside a batched weighted A* search that returns provably near-optimal solutions at 100% solve rate on any scrambled starting configuration.

Approach & Methodology
Phase 1 — Training
1
State Representation
Each Maltese Gear Cube configuration is encoded as a concatenated one-hot vector spanning all corner positions, corner orientations, edge positions, and gear-tooth orientations. This produces a high-dimensional binary input tensor that fully captures the puzzle state without ambiguity — feeding cleanly into the first linear layer of the network.
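As a concrete sketch of this encoding, with hypothetical piece counts (the real encoder uses the puzzle's actual corner, edge, and gear dimensions, which are not spelled out here):

```python
import numpy as np

# Hypothetical piece counts for illustration only; a real encoder would
# use the Maltese Gear Cube's true corner/edge/gear layout.
N_CORNERS, CORNER_ORI = 8, 3
N_EDGES = 12
N_GEARS, GEAR_ORI = 12, 6

def encode_state(corner_pos, corner_ori, edge_pos, gear_ori):
    """Concatenate one one-hot block per piece attribute into a flat vector."""
    parts = []
    for vals, size in (
        (corner_pos, N_CORNERS),   # which slot each corner occupies
        (corner_ori, CORNER_ORI),  # each corner's twist
        (edge_pos, N_EDGES),       # which slot each edge occupies
        (gear_ori, GEAR_ORI),      # each gear's tooth orientation
    ):
        block = np.zeros((len(vals), size), dtype=np.float32)
        block[np.arange(len(vals)), vals] = 1.0  # one-hot each attribute
        parts.append(block.ravel())
    return np.concatenate(parts)

# The solved state: every piece in its home slot, every orientation zero.
solved = encode_state(
    corner_pos=list(range(N_CORNERS)),
    corner_ori=[0] * N_CORNERS,
    edge_pos=list(range(N_EDGES)),
    gear_ori=[0] * N_GEARS,
)
```

Because each attribute contributes exactly one hot entry, the vector is unambiguous: two states collide only if every piece position and orientation matches.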
2
Backward Induction Data Generation
Starting from the solved state, apply sequences of k random legal moves to produce training pairs (s, d), with d = k as the cost-to-go label. Because random moves can cancel, k is strictly an upper bound on the true distance, but sampling k uniformly across all depths guarantees coverage of the full difficulty spectrum without any BFS oracle.
50M+ samples
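The scrambling loop can be sketched as follows; `apply_move` and `legal_moves` are caller-supplied stand-ins for the puzzle's actual move engine:

```python
import random

def make_training_pair(solved_state, legal_moves, apply_move, k_max, rng=random):
    """Scramble the solved state with k random moves and label it d = k.
    Note that d = k is an upper bound on the true cost-to-go, since random
    moves can cancel; sampling k uniformly covers every difficulty level."""
    k = rng.randint(1, k_max)
    s = solved_state
    for _ in range(k):
        s = apply_move(s, rng.choice(legal_moves))
    return s, k
```

The same generator works for any puzzle whose moves are invertible; below it is exercised on a toy cyclic puzzle rather than the cube itself.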
3
Neural Network Training
A deep fully-connected network — 4 hidden layers × 2048 units, batch normalisation after every layer, ReLU activations — outputs a scalar h(s) ∈ ℝ≥0. Trained end-to-end via MSE loss L = (h(s) − d)² with Adam on uniformly-sampled mini-batches until validation loss converges.
4 × 2048 FC
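A minimal PyTorch sketch of this architecture and one training step; the class defaults match the 4 × 2048 configuration, while details such as the learning-rate schedule are assumptions, not specified above:

```python
import torch
import torch.nn as nn

class CostToGoNet(nn.Module):
    """Linear -> BatchNorm -> ReLU blocks, then a scalar head h(s)."""
    def __init__(self, in_dim: int, hidden: int = 2048, n_layers: int = 4):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(n_layers):
            layers += [nn.Linear(d, hidden), nn.BatchNorm1d(hidden), nn.ReLU()]
            d = hidden
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Raw scalar output; targets d >= 1 are non-negative, and h can be
        # clamped at zero during inference if needed.
        return self.net(x).squeeze(-1)

def train_step(model, optimiser, states, dists):
    """One step of L = (h(s) - d)^2 minimised with the given optimiser."""
    model.train()
    loss = nn.functional.mse_loss(model(states), dists)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```

The scalar head plus MSE makes this a plain regression on scramble depth, which is what lets the backward-induction labels from the previous step train it directly.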
4
Symmetry-Based Data Augmentation
The Maltese Gear Cube admits 48 geometric symmetries — rotations and reflections that map valid states to equivalent valid states with identical cost-to-go. Applying all 48 transforms to every training sample multiplies the effective dataset 48×, yielding a 40% reduction in wall-clock training time at zero additional data-collection cost.
×48 aug
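One way to implement the augmentation is to represent each symmetry as a permutation of the flat state-vector indices; deriving the actual 48 permutations from the cube's geometry is puzzle-specific and omitted here, so the sketch below takes them as given:

```python
import numpy as np

def augment_batch(states, dists, sym_perms):
    """Expand a (B, D) batch by applying each symmetry, encoded as a
    permutation of the D state-vector indices. Cost-to-go labels are
    invariant under symmetry, so they are simply tiled."""
    aug_states = np.stack([states[:, p] for p in sym_perms])  # (S, B, D)
    aug_states = aug_states.reshape(-1, states.shape[1])      # (S*B, D)
    aug_dists = np.tile(dists, len(sym_perms))                # (S*B,)
    return aug_states, aug_dists
```

With the 48 cube symmetries in `sym_perms`, a batch of B labelled states becomes 48·B at the cost of an index shuffle, which is why the augmentation adds no data-collection overhead.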
Trained weights frozen — switching to inference
Neural network deployed as a static heuristic inside the search loop
Phase 2 — Inference
5
Batched Weighted A* Search
Given any scrambled input state, a Batched Weighted A* (BWAS) search runs with priority f(s) = g(s) + w·h(s), where g(s) is moves taken, h(s) is the trained neural heuristic, and weight w = 0.6 governs the optimality–speed tradeoff. Nodes are evaluated in batches of B = 1 000 per GPU forward pass, cutting per-step latency by orders of magnitude.
w = 0.6 · B = 1 000
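A self-contained sketch of BWAS, with `expand` and `h_batch` as stand-ins for the move generator and the batched GPU heuristic (in the real solver, `h_batch` is one forward pass of the trained network over the whole batch):

```python
import heapq

def bwas(start, is_goal, expand, h_batch, w=0.6, batch_size=1000, max_iters=100_000):
    """Batched Weighted A*: pop up to batch_size nodes per iteration and
    score all of their children with a single batched heuristic call."""
    counter = 0  # tie-breaker so the heap never compares states directly
    frontier = [(w * h_batch([start])[0], counter, start, 0, [])]
    best_g = {start: 0}
    for _ in range(max_iters):
        if not frontier:
            return None
        batch = [heapq.heappop(frontier)
                 for _ in range(min(batch_size, len(frontier)))]
        children = []
        for _f, _c, s, g, path in batch:
            if is_goal(s):
                return path  # move sequence from start to solved
            for move, child in expand(s):
                if child not in best_g or g + 1 < best_g[child]:
                    best_g[child] = g + 1
                    children.append((child, g + 1, path + [move]))
        if children:
            h_vals = h_batch([c for c, _, _ in children])  # one batched call
            for (child, g, path), h in zip(children, h_vals):
                counter += 1
                heapq.heappush(frontier, (g + w * h, counter, child, g, path))
    return None
```

Batching trades a small amount of expansion order strictness for throughput: every heuristic evaluation in an iteration shares one forward pass instead of paying per-node latency.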
6
Evaluation
Tested on 100 independently scrambled cubes across varying scramble depths. Solution length is compared against optimal paths computed via BFS on a reachable subgraph. The solver achieves a 100% solve rate with paths averaging just 1.08× the optimal move count — confirming near-optimality well within the theoretical w-bound.
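The BFS reference used for the optimality check can be sketched as a bounded breadth-first sweep from the solved state, yielding exact distances on the explored subgraph (function and parameter names here are illustrative, not from the project):

```python
from collections import deque

def bfs_optimal_depths(solved, expand, max_depth):
    """Exact cost-to-go for every state within max_depth of solved:
    the reference optimal lengths the learned solver is scored against."""
    depth = {solved: 0}
    queue = deque([solved])
    while queue:
        s = queue.popleft()
        if depth[s] == max_depth:
            continue  # boundary of the reachable subgraph we exhaust
        for _move, child in expand(s):
            if child not in depth:
                depth[child] = depth[s] + 1
                queue.append(child)
    return depth
```

A solver path of length L for a start state s then has optimality ratio L / depth[s], and averaging that ratio over the test set gives the reported 1.08×.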
Architecture
deepcubea-maltese-gear · architecture-diagram
Phase 1 — Training
Maltese Gear Cube State Space
~4.9 × 10¹⁹ configurations
Backward Induction Scrambler
k random moves from solved state · k ∈ [1, k_max]
Training Dataset
50M+ (state s, distance d) pairs
⟳ Symmetry Aug ×48
Deep Neural Network
4 × 2048 FC · BN · ReLU
Output: h(s) ∈ ℝ≥0
MSE Loss
L = (h(s) − d)² · Adam optimiser
✓ Trained Model Weights
Converged heuristic · weights frozen
Trained h(s)
Phase 2 — Inference
Scrambled Input State
Any arbitrary starting configuration
Batched Weighted A* (BWAS)
w = 0.6 · batch size B = 1 000 nodes / iter
Priority Queue
f(s) = g(s) + w · h(s)
g(s): moves taken · h(s): NN heuristic
Move Generator + NN Evaluator
~40 legal moves / state · GPU batch forward pass
Goal Check
Solved → return path · else → re-insert to queue
✓ Near-Optimal Solution Path
100% solve rate · avg 1.08× optimal length
Results & Impact
100%
Solve Rate
1.08×
Near-Optimal
50M+
Training States
40%
Training Speedup
100% solve rate across 100 independently scrambled Maltese Gear Cube instances at varying scramble depths — zero unsolved cases in the full evaluation set.
Solutions average 1.08× the theoretical optimal move count, verified against BFS on a reachable subgraph — well within the weighted A* bound imposed by w = 0.6.
Symmetry augmentation ×48 reduced training time to convergence by 40% with no additional data collection, exploiting all rigid-body symmetries of the Maltese Gear Cube.
BWAS with GPU batching (B = 1 000) solved all test instances within a bounded node-expansion budget, demonstrating scalability to a ~4.9 × 10¹⁹ state space without exhaustive search.