REMI: Reconstructing Episodic Memory During Internally Driven Path Planning

NeurIPS 2025
Zhaoze Wang1
zhaoze@seas.upenn.edu
Genela Morris2,3†
genelam@tlvmc.gov.il
Dori Derdikman4†
derdik@technion.ac.il
Pratik Chaudhari1†
pratikac@seas.upenn.edu
Vijay Balasubramanian5,6,7†
vijay@physics.upenn.edu
1 Dept. of Electrical and Systems Eng., Univ. of Pennsylvania   2 Tel Aviv Sourasky Medical Center
3 Gray Faculty of Medical and Health Sciences, Tel Aviv University   4 Rappaport Faculty of Medicine, Technion - Israel Institute of Technology
5 Dept. of Physics, Univ. of Pennsylvania   6 Santa Fe Institute   7 Rudolf Peierls Centre for Theoretical Physics, University of Oxford
† Equal contribution

TL;DR
REMI is a unified, system-level theory of how the brain uses multiple spatial representations to achieve cue-triggered goal retrieval. We tested our framework in both idealized simulations and realistic settings using Habitat Sim.
Background
Decades of research have revealed multiple types of cells that support animals' navigation and localization. For example, grid cells in the MEC fire in hexagonal patterns, while place cells in the hippocampus fire at specific locations.
Figure: example ratemaps of grid cells and place cells.

Research Question
How do these distinct spatial representations interact within a single circuit to support cue-triggered goal retrieval and planning?

A Connectivity Model of the HC-MEC Loop
To approach this problem, we first revisit known theories about spatial navigation cell types.

Figure: connectivity of the HC-MEC circuit.
We propose that a simple extension of the place cell auto-association theory can connect all these ideas and generate several testable predictions.

RNN Implementation
To test this hypothesis, we trained a unified RNN that incorporates all five cell types (sensory, grid, place, speed, and head-direction cells) within a single recurrent layer.

SMC units learn to autoencode sensory inputs, while grid cells learn to path-integrate based on speed and head direction signals, both with added masking and noise. The hippocampal module receives no direct input or output; instead, it learns to denoise activity from the MEC through recurrence. After training, we observed the spontaneous emergence of place cells in this region.
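To make this setup concrete, here is a minimal PyTorch-style sketch of a single recurrent layer partitioned into subregions. All sizes, names, the tanh nonlinearity, and the masking/noise details are illustrative assumptions, not the paper's exact implementation:

import torch
import torch.nn as nn

class HCMECRNN(nn.Module):
    """Minimal sketch of one recurrent layer partitioned into subregions.
    Sizes and names are illustrative, not the paper's exact values."""
    def __init__(self, n_smc=128, n_grid=128, n_place=128, n_speed=16, n_hd=32):
        super().__init__()
        self.n = n_smc + n_grid + n_place + n_speed + n_hd
        # Index slices locating each subregion in the shared hidden state.
        self.smc = slice(0, n_smc)
        self.grid = slice(n_smc, n_smc + n_grid)
        self.place = slice(n_smc + n_grid, n_smc + n_grid + n_place)
        self.vel = slice(self.n - n_speed - n_hd, self.n)  # speed + head direction
        self.W_rec = nn.Linear(self.n, self.n)
        self.W_sens = nn.Linear(n_smc, n_smc)                   # sensory input -> SMC
        self.W_vel = nn.Linear(n_speed + n_hd, n_speed + n_hd)  # velocity input

    def forward(self, h, sens, vel, noise_std=0.1, mask_p=0.2):
        # External drive enters only the SMC and speed/head-direction units;
        # the hippocampal (place) subregion is driven purely by recurrence.
        drive = torch.zeros_like(h)
        keep = (torch.rand_like(sens) > mask_p).float()  # random input masking
        drive[..., self.smc] = self.W_sens(sens * keep)
        drive[..., self.vel] = self.W_vel(vel)
        drive = drive + noise_std * torch.randn_like(drive)
        return torch.tanh(self.W_rec(h) + drive)

The key structural choice is that the place subregion receives no external drive, so any spatial tuning it develops must arise from recurrent interactions with the MEC subregions.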

Figure: the RNN update equation and training loss.
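A plausible form for the update and loss, assuming a standard discrete-time RNN with linear read-outs (the exact equations are given in the paper), is:

    h_{t+1} = \tanh\big( W_{\mathrm{rec}} h_t + W_{\mathrm{in}} \tilde{x}_t + \epsilon_t \big),
    \qquad
    \mathcal{L} = \| \hat{s}_t - s_t \|^2 + \| \hat{g}_t - g_t^{\mathrm{PI}} \|^2,

where \tilde{x}_t is the masked, noise-corrupted sensory and velocity input, \hat{s}_t is the SMC read-out that must reconstruct the clean sensory input s_t, and g_t^{\mathrm{PI}} is the path-integrated target for the grid subregion. The place subregion appears in neither loss term; it is shaped only by the recurrent denoising dynamics.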

We call this the HC-MEC model.

Testing Cue-Triggered Querying
After training, we sampled sensory cues at fixed locations to query the network.

Figure: definition of the cue-triggered query.
The left plot shows how subregion responses evolve toward the goal response. The surface represents the population manifold; the green dot marks the goal, and the trajectory shows the evolution of responses, visualized in 3D. The right plot shows how these responses decode into 2D locations using nearest neighbor search.

Figure: recall dynamics on the population manifold (left) and decoded 2D locations (right).
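The decoding step can be sketched as follows, assuming we keep a library of reference responses recorded at known locations (all names here are hypothetical):

import numpy as np

def decode_locations(states, ref_states, ref_locs, k=1):
    """Decode population states into 2D locations via nearest neighbor search.

    states:     (T, N) subregion responses recorded during recall
    ref_states: (M, N) reference responses sampled on a spatial grid
    ref_locs:   (M, 2) the (x, y) location of each reference response
    """
    # Euclidean distance from every recorded state to every reference state.
    d = np.linalg.norm(states[:, None, :] - ref_states[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest references
    return ref_locs[nearest].mean(axis=1)    # (T, 2) decoded trajectory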

Extending HC-MEC for Planning
The previous experiment shows how cues can trigger retrieval of goal grid cell patterns. Once the goal pattern is retrieved, prior studies suggest that planning can be more efficiently performed on the grid cell manifold. To test this, we expanded the trained HC-MEC network's recurrent matrix to include an additional planner subnetwork.

HC-MEC Weight HC-MEC Planning Weight


During planning, the planner receives the goal grid state, and the HC-MEC network is initialized with the current sensory and grid cell states. The planner projects only to the speed and head-direction regions to control HC-MEC dynamics.
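A rough sketch of the weight-matrix expansion, under assumed sizes and initialization (the paper's actual construction may differ), looks like this:

import torch

def expand_with_planner(W_hcmec, n_plan, vel_idx, scale=0.01):
    """Embed the trained HC-MEC recurrent matrix inside a larger matrix
    that adds a planner subnetwork.

    W_hcmec: (N, N) trained HC-MEC recurrent weights (left unchanged)
    n_plan:  number of planner units
    vel_idx: indices of the speed and head-direction units within [0, N)
    """
    N = W_hcmec.shape[0]
    W = torch.zeros(N + n_plan, N + n_plan)
    W[:N, :N] = W_hcmec                              # original circuit, intact
    W[N:, N:] = scale * torch.randn(n_plan, n_plan)  # planner recurrence
    # The planner projects ONLY to the speed/head-direction rows, so it can
    # steer the HC-MEC dynamics without writing to grid or place units.
    # (The goal grid state is supplied to the planner as an external input,
    # which is omitted here for simplicity.)
    W[vel_idx, N:] = scale * torch.randn(len(vel_idx), n_plan)
    return W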

Reconstructing Memory During Planning on the Grid Cell Manifold
We collected the responses of the sensory and grid cell subregions over all planning timesteps and decoded each latent state into a location using nearest neighbor search. We found that not only did the grid cell responses decode into trajectories connecting the current and goal locations, but the sensory region did as well. These decoded trajectories indicate that sensory experiences were reconstructed from the intermediate grid states.
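In terms of the decode_locations sketch above, this amounts to decoding each subregion's state sequence separately (variable names hypothetical):

# States recorded over T planning timesteps, one array per subregion.
grid_traj = decode_locations(grid_states, ref_grid_states, ref_locs)  # (T, 2)
smc_traj = decode_locations(smc_states, ref_smc_states, ref_locs)     # (T, 2)
# Both trajectories trace paths from the current location to the goal.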

Figure: decoded planning trajectories.


Navigation with Realistic Visual Inputs and a Decodable Vision Encoder (BtnkMAE)
Finally, we asked whether the framework could extend to realistic navigation tasks.

To test this, we replaced simulated sensory responses with visual feature embeddings extracted from Habitat Sim. We captured panoramic images at each spatial location with a fixed direction and passed them through a general-purpose vision encoder that maps each image to a 1024-dimensional embedding vector.
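The collection pipeline can be summarized as below; capture_panorama and encoder stand in for the Habitat Sim rendering call and the vision encoder, respectively (both names are hypothetical):

import numpy as np

def build_feature_map(locations, capture_panorama, encoder):
    """Render a panorama at every grid location and embed it to 1024-d.

    locations:        iterable of (x, y) positions on a spatial grid
    capture_panorama: callable (x, y) -> panoramic image (fixed heading)
    encoder:          callable image -> 1024-d embedding vector
    """
    feats, locs = [], []
    for (x, y) in locations:
        img = capture_panorama(x, y)   # same heading at every location
        feats.append(encoder(img))
        locs.append((x, y))
    return np.stack(feats), np.array(locs)  # (M, 1024) features, (M, 2) locations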

Figure: spatial ratemaps of the bottleneck embedding (left) and the BtnkMAE architecture (right).

For this purpose, we trained a modified masked autoencoder (MAE), which we call BtnkMAE, on ImageNet-1k. We added a bottleneck layer so that each image maps to a single 1024-dimensional vector that can still be reconstructed back into an image.
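A toy sketch of the bottleneck idea follows; the real BtnkMAE is a ViT-based MAE, so the layers, sizes, and token handling here are simplified assumptions:

import torch
import torch.nn as nn

class BottleneckAE(nn.Module):
    """Autoencoder with a single-vector bottleneck: every image is squeezed
    into one 1024-d embedding that can be expanded and decoded again."""
    def __init__(self, n_tokens=196, d_model=768, d_btnk=1024):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=2)
        self.to_btnk = nn.Linear(n_tokens * d_model, d_btnk)    # squeeze to 1024-d
        self.from_btnk = nn.Linear(d_btnk, n_tokens * d_model)  # expand for decoding

    def forward(self, tokens):                # tokens: (B, n_tokens, d_model)
        B, T, D = tokens.shape
        z = self.encoder(tokens)
        btnk = self.to_btnk(z.reshape(B, T * D))   # (B, 1024) embedding
        z = self.from_btnk(btnk).reshape(B, T, D)
        return self.decoder(z), btnk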

Using this visual feature map, we repeated the planning experiments and recorded the sensory subregion responses at each timestep. When these intermediate states were decoded with BtnkMAE's decoder, all of them reconstructed into realistic images.
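Rendering an imagined sensory state back into an image then amounts to running the bottleneck expansion and decoder from the sketch above (names hypothetical; token shape tied to the sketch's defaults):

# sensory_states: (T, 1024) SMC subregion states recorded during planning
with torch.no_grad():
    for s in sensory_states:
        z = model.from_btnk(s[None]).reshape(1, 196, 768)
        img_tokens = model.decoder(z)   # decode back toward image space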


Acknowledgments and Funding
This work was supported by NIH CRCNS grant 1R01MH125544-01 and partially by the NSF and DoD OUSD (R&E) under Agreement PHY-2229929 (The NSF AI Institute for Artificial and Natural Intelligence). Additional support came from the United States-Israel Binational Science Foundation (BSF). PC was supported by the National Science Foundation (IIS-2145164, CCF-2212519) and the Office of Naval Research (N00014-22-1-2255). Vijay Balasubramanian was supported in part by the Eastman Professorship at Balliol College, Oxford.

Cite this Paper

@inproceedings{wang2025remi,
  title={REMI: Reconstructing Episodic Memory During Internally Driven Path Planning},
  author={Zhaoze Wang and Genela Morris and Dori Derdikman and Pratik Chaudhari and Vijay Balasubramanian},
  booktitle={Proceedings of the 2025 Conference on Neural Information Processing Systems (NeurIPS)},
  year={2025}
}