Time Makes Space: Emergence of Place Fields in Networks Encoding Temporally Continuous Sensory Experiences

Zhaoze Wang1†,  Ronald W. Di Tullio2†,  Spencer Rooke3,  Vijay Balasubramanian3,4,5
† Equal contribution
1 Department of Computer and Information Science,   2 Neuroscience,   and 3 Physics, University of Pennsylvania;
4 Rudolf Peierls Centre for Theoretical Physics, University of Oxford;   and 5 Santa Fe Institute
NeurIPS 2024
Abstract (Informal)
In 1971, John O'Keefe and colleagues discovered place cells in the hippocampus—neurons that fire when an animal occupies a specific location. This finding introduced the idea of a biological spatial map. Later studies revealed that these cells are not only tuned to physical position but also modulated by contextual and multi-sensory cues. Remarkably, the same population of place cells can produce different spatial maps in different environments and return to previous maps upon revisiting familiar ones—even after long delays or intervening learning. These properties suggest that the brain constructs spatial memory through multi-modal sensory integration and supports robust continual learning without catastrophic forgetting.

But how do such structured, flexible representations emerge? Can similar principles be observed in artificial networks trained on realistic experience streams? What kinds of learning objectives or inductive biases support these phenomena?

In this work, we trained a recurrent autoencoder to reconstruct high-dimensional, temporally continuous sensory signals collected by an agent traversing simulated environments. Without any explicit spatial supervision, the network developed internal representations strongly reminiscent of biological place cells: units fired in localized regions, remapped across different environments, and reverted upon returning to familiar spaces. These representations were stable over time, formed orthogonal spatial maps to prevent interference, and adapted gradually—mimicking representational drift of Cornu Ammonis area (CA3) place cells. Our results suggest that spatial coding can arise naturally as a side effect of reconstructive memory over structured sensory streams, offering insights for continual representation learning in artificial agents.
Setup
We frame the encoding of space as the task of auto-associating sensory signals (potentially from different modalities) collected during an agent's random spatial traversal. In biological systems, this kind of mechanism may improve the reliability of localization and mapping, especially in dynamic environments where subsets of sensory inputs can vary over time—making any single input insufficient on its own.

To model this process, we use a masked autoencoding objective: a recurrent neural network (RNN) is trained to reconstruct missing parts of the sensory input as the agent explores a 2D environment.
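As a minimal sketch of this objective (illustrative only: the dimensions, masking scheme, and vanilla RNN below are assumptions, not the paper's actual architecture or hyperparameters), the loss scores how well the network reconstructs the masked entries of a sensory sequence:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only
n_sense, n_hidden, T = 16, 32, 10

# Vanilla RNN parameters (in practice these would be trained by gradient descent)
W_in = rng.normal(scale=0.1, size=(n_hidden, n_sense))
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_out = rng.normal(scale=0.1, size=(n_sense, n_hidden))

def masked_autoencode_loss(x_seq, mask_frac=0.5):
    """Reconstruct a sensory sequence from a partially masked copy.

    x_seq: (T, n_sense) sensory inputs along a trajectory segment.
    Returns the mean squared error on the masked entries only.
    """
    mask = rng.random(x_seq.shape) < mask_frac   # True = hidden from the net
    x_masked = np.where(mask, 0.0, x_seq)        # zero out masked entries
    h = np.zeros(n_hidden)
    sq_err, n_masked = 0.0, 0
    for t in range(x_seq.shape[0]):
        h = np.tanh(W_rec @ h + W_in @ x_masked[t])  # recurrent update
        x_hat = W_out @ h                            # reconstruction
        sq_err += np.sum((x_hat - x_seq[t]) ** 2 * mask[t])
        n_masked += mask[t].sum()
    return sq_err / max(n_masked, 1)

x_seq = rng.normal(size=(T, n_sense))  # stand-in for real sensory signals
loss = masked_autoencode_loss(x_seq)
```

Scoring only the masked entries forces the recurrent state to carry information across time, since the missing inputs must be inferred from temporal context.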

[Figure: Setup]
Results
The RNN is trained on short sequences of sensory inputs collected along the agent's random trajectories. After training, we find that the RNN also encodes spatial structure in its internal activity: its units exhibit localized, place-like responses.

Spatial Maps

[Figure: RNN Encoding]

The left plot shows a sample unit's spatial activity map in a 2D square room (its responses averaged at each location), and the right shows a unit's spatial activity in a 3D room. These patterns are learned without any spatial supervision and reflect sparse, spatially localized responses.
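Spatial activity maps like these are computed by binning the room and averaging a unit's activity within each bin. A minimal numpy sketch (the helper name and the synthetic Gaussian "place field" are illustrative, not the paper's code):

```python
import numpy as np

def spatial_ratemap(positions, activity, n_bins=20, extent=1.0):
    """Average a unit's activity in each spatial bin of a square room.

    positions: (N, 2) agent locations in [0, extent)^2
    activity:  (N,)   the unit's activity at each timestep
    Returns an (n_bins, n_bins) map; unvisited bins are NaN.
    """
    ix = np.clip((positions[:, 0] / extent * n_bins).astype(int), 0, n_bins - 1)
    iy = np.clip((positions[:, 1] / extent * n_bins).astype(int), 0, n_bins - 1)
    total = np.zeros((n_bins, n_bins))
    count = np.zeros((n_bins, n_bins))
    np.add.at(total, (ix, iy), activity)   # unbuffered binned accumulation
    np.add.at(count, (ix, iy), 1)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

# Synthetic example: a unit with a Gaussian "place field" at the room's center
rng = np.random.default_rng(1)
pos = rng.random((5000, 2))
act = np.exp(-np.sum((pos - 0.5) ** 2, axis=1) / 0.02)
rm = spatial_ratemap(pos, act)
```

Averaging over many passes through each bin washes out trajectory-specific variability and leaves the unit's spatial tuning.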

Continual Learning of Multiple Rooms

Place-like units in the hippocampus are known to form distinct spatial maps for different environments. In neuroscience, a “map” refers to the average activity pattern of units at different locations within a space. When the same group of units changes its pattern between environments, this is called remapping.

These units can form many such maps without forgetting the old ones, and can quickly return to previously learned maps when revisiting familiar environments. This reflects a form of continual learning. We suggest that this ability does not require a specialized network architecture, but instead emerges naturally from the structure of the task.

[Figure: Remap-Theory]
That is, the autoencoding objective implicitly guides the network to switch between "learning" and "recall" depending on how familiar the current sensory experience is. Sensory inputs from each room form an \(n\)-dimensional manifold in the network's latent space, where \(n\) is the number of navigable dimensions. When the agent is at a location shared by two rooms (i.e., where the manifolds intersect), the network tends to recall; in unfamiliar regions, it learns. We also show that the structure of these sensory manifolds defines preferred directions in weight space, making learning updates naturally reversible and supporting recovery of old maps (see paper for details).
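Recall versus remapping can be quantified by correlating population spatial maps across rooms and revisits. A small sketch on synthetic data (the function and arrays are illustrative, not the paper's analysis code): a correlation near 1 indicates recall of the same map, while a correlation near 0 indicates an orthogonal map, i.e., remapping.

```python
import numpy as np

def map_correlation(maps_a, maps_b):
    """Pearson correlation between two population spatial maps.

    maps_a, maps_b: (n_units, n_bins, n_bins) ratemaps for each room,
    flattened and correlated as single vectors.
    """
    a = maps_a.ravel() - maps_a.mean()
    b = maps_b.ravel() - maps_b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(2)
room1 = rng.random((8, 20, 20))                               # maps in room 1
room1_revisit = room1 + 0.05 * rng.normal(size=room1.shape)   # slight drift
room2 = rng.random((8, 20, 20))                               # unrelated maps

same_map = map_correlation(room1, room1_revisit)   # high: map recovered
cross_map = map_correlation(room1, room2)          # near zero: remapping
```

Near-zero correlations between rooms are what make the maps mutually non-interfering: updates along one map's directions barely disturb the others.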

[Figure: Remapping]

The above plots show how the network can recover previously learned spatial maps, even after training on a new environment. T\(i\)R\(j\) means trial \(i\) in room \(j\).

Stability of Spatial Maps Over Time

Finally, we show that even after the network is continually trained as the agent explores 20 different rooms over 30 cycles, the learned spatial representations remain stable, drifting only slowly over time. This gradual change resembles representational drift rather than catastrophic forgetting.

The plot below shows the activity of sample units across repeated visits to the same room. Between each plot, the agent has explored (and the network has been trained on) all 20 rooms. We visualize the unit's response only when the agent re-enters the same environment, showing how the place-like patterns are preserved over time with minor drift.
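One simple way to quantify such drift (a synthetic illustration, not the paper's analysis) is to correlate each revisit's map with the map from the first visit. Under small per-cycle perturbations, the similarity decays slowly rather than collapsing:

```python
import numpy as np

rng = np.random.default_rng(3)

def correlate(a, b):
    """Pearson correlation between two flattened spatial maps."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Simulate slow drift: each training cycle adds a small perturbation to the map
base = rng.random((20, 20))          # map on the first visit
maps = [base]
for _ in range(30):                  # 30 cycles of intervening training
    maps.append(maps[-1] + 0.03 * rng.normal(size=base.shape))

# Similarity of each revisit to the first visit: drifts, but stays high
sims = [correlate(base, m) for m in maps]
```

A slow, roughly monotone decline in this curve is the signature of representational drift; catastrophic forgetting would instead drop the similarity to near zero after a single intervening room.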

[Figure: spatial maps of sample units across repeated visits]
Acknowledgements
We thank Dori Derdikman, Genela Morris, and Shai Abramson for many illuminating discussions in the course of this work, which was supported by NIH CRCNS grant 1R01MH125544-01 and in part by the NSF and DoD OUSD (R&E) under Agreement PHY-2229929 (The NSF AI Institute for Artificial and Natural Intelligence).

Vijay Balasubramanian was supported in part by the Eastman Professorship at Balliol College, Oxford.

We are also grateful to Pratik Chaudhari for his insightful suggestions on designing the RAE and analyzing its dynamics.
Cite this Paper

@inproceedings{wang2024time,
  title={Time Makes Space: Emergence of Place Fields in Networks Encoding Temporally Continuous Sensory Experiences},
  author={Zhaoze Wang and Ronald W. Di Tullio and Spencer Rooke and Vijay Balasubramanian},
  booktitle={Advances in Neural Information Processing Systems},
  year={2024}
}