Goal-directed behavior in navigation and abstract tasks relies on hierarchical planning: long action sequences are organized into subgoals and temporally extended actions. While hierarchical model-based reinforcement learning (H-MBRL) formalizes this strategy, most existing neural circuit models remain largely flat, leaving open how H-MBRL could be implemented in the brain.
We propose a multi-area model in which hippocampal replay drives subgoal discovery and prefrontal circuitry performs hierarchical planning over the learned structure. We define a normative objective for hierarchy discovery, maximizing the sum of within-cluster principal eigenvalues, and derive a replay-driven neural network with a local plasticity rule that optimizes this objective. In simulations, the model learns topology-, goal-, and cue-sensitive hierarchies resembling those observed in animals. Building on this hierarchy, a prefrontal planning circuit computes subgoal values via hierarchical successor representations and selects option sequences through prospective, path-level winner-take-all dynamics to maximize expected reward. Together, these results provide a biologically grounded account of how replay can discover subgoals and how cortical interactions can implement hierarchical planning over them, offering a plausible neural realization of H-MBRL.
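As a minimal, illustrative formalization of the hierarchy-discovery objective (the notation here is our own sketch: $A$ denotes a state-adjacency or transition matrix and $\{C_k\}$ a candidate partition of states into clusters, neither of which is defined above), the objective can be read as solving
\[
\max_{\{C_k\}} \; \sum_{k} \lambda_1\!\left(A_{C_k C_k}\right),
\]
where $A_{C_k C_k}$ is the submatrix of $A$ restricted to the states in cluster $C_k$ and $\lambda_1(\cdot)$ denotes its principal (largest) eigenvalue, so the sum rewards partitions whose clusters are internally well connected.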