REMI - Zhaoze Wang

¹ Dept. of Electrical and Systems Eng., Univ. of Pennsylvania   ² Tel Aviv Sourasky Medical Center
³ Gray Faculty of Medical and Health Sciences, Tel Aviv University   ⁴ Rappaport Faculty of Medicine, Technion - Israel Institute of Technology
⁵ Dept. of Physics, Univ. of Pennsylvania   ⁶ Santa Fe Institute   ⁷ Rudolf Peierls Centre for Theoretical Physics, University of Oxford

TL;DR

REMI is a unified, system-level theory of how the brain uses multiple spatial representations to achieve cue-triggered goal retrieval. We tested our framework in both idealized simulations and realistic settings using Habitat Sim.

Background

Decades of research have revealed multiple types of cells that support animals' navigation and localization. For example, grid cells in the MEC fire in hexagonal patterns, while place cells in the hippocampus fire at specific locations.
Research Question

Example of grid cells

Example of place cells

A Connectivity model of HC-MEC Loop

To approach this problem, we first revisit known theories about spatial navigation cell types.

Grid cells may emerge through path integration of displacement vectors.
Spatially modulated cells in the MEC encode sensory stimuli.
Place cells can auto-associate sensory inputs, potentially reflected by spatially modulated cells.
Meanwhile, grid cell population manifolds form efficient bases for spatial planning because of their context-independent nature.

We propose that a simple extension of the place cell auto-association theory can connect all these ideas and generate several testable predictions.

Place cells may associate not only sensory signals reflected by spatially modulated cells, but also sensory signals with grid cells.
Given a partial sensory cue, a place cell can reconstruct the goal grid cell pattern.
With this recalled pattern, an animal can plan along the grid cell manifold.
Planning then produces sequences of intermediate grid states, which place cells could use to reconstruct sensory experiences along the planned path.

RNN Implementation

To test this hypothesis, we trained a unified RNN that incorporates all five cell types within a single recurrent layer.

SMC units learn to autoencode sensory inputs, while grid cells learn to path-integrate based on speed and head direction signals, both with added masking and noise. The hippocampal module receives no direct input or output; instead, it learns to denoise activity from the MEC through recurrence. After training, we observed the spontaneous emergence of place cells in this region.

We call this the HC-MEC model.

Testing Cue-Triggered Querying

After training, we sampled sensory cues at fixed locations to query the network.

Query Definition

The left plot shows how subregion responses evolve toward the goal response. The surface represents the population manifold; the green dot marks the goal, and the trajectory shows the evolution of responses, visualized in 3D. The right plot shows how these responses decode into 2D locations using nearest neighbor search.

Extending HC-MEC for Planning

The previous experiment shows how cues can trigger retrieval of goal grid cell patterns. Once the goal pattern is retrieved, prior studies suggest that planning can be more efficiently performed on the grid cell manifold. To test this, we expanded the trained HC-MEC network's recurrent matrix to include an additional planner subnetwork.

During planning, the planner receives the goal grid state, and the HC-MEC network is initialized with the current sensory and grid cell states. The planner projects only to the speed and head-direction regions to control HC-MEC dynamics.

Reconstructing Memory During Planning on the Grid Cell Manifold

We collected the responses of the sensory and grid cell subregions over all planning timesteps. Each latent state was decoded into locations using nearest neighbor search. We found that not only did grid cell responses decode into trajectories connecting the two locations, but the sensory region also decoded into trajectories. This decoded trajectory indicates that sensory experiences were reconstructed from intermediate grid states.

Planning Trajectories

Navigation with Realistic Visual Inputs and a Decodable Vision Encoder (BtnkMAE)

Finally, we asked whether the framework could extend to realistic navigation tasks.

To test this, we replaced simulated sensory responses with visual feature embeddings extracted from Habitat Sim. We captured panoramic images at each spatial location with a fixed direction and passed them through a general-purpose vision encoder that maps each image to a 1024-dimensional embedding vector.

For this purpose, we trained a modified masked autoencoder (MAE) on ImageNet-1k. We added a bottleneck layer so that each image maps to a 1024-dimensional vector while remaining reconstructable back to images.

Using this visual feature map, we repeated the experiments and recorded the sensory subregion responses during training. When decoding these intermediate states with BtnkMAE's decoder, all states could be reconstructed into realistic images.

Acknowledgments and Funding

This work was supported by NIH CRCNS grant 1R01MH125544-01 and partially by the NSF and DoD OUSD (R&E) under Agreement PHY-2229929 (The NSF AI Institute for Artificial and Natural Intelligence). Additional support came from the United States-Israel Binational Science Foundation (BSF). PC was supported by the National Science Foundation (IIS-2145164, CCF-2212519) and the Office of Naval Research (N00014-22-1-2255). Vijay Balasubramanian was supported in part by the Eastman Professorship at Balliol College, Oxford.

Cite this Paper

@inproceedings{wang2025remi,
  title={REMI: Reconstructing Episodic Memory During Internally Driven Path Planning},
  author={Zhaoze Wang and Genela Morris and Dori Derdikman and Pratik Chaudhari and Vijay Balasubramanian},
  booktitle={Proceedings of the 2025 Conference on Neural Information Processing Systems (NeurIPS)},
  year={2025}
}