Positional Encoding

Self-attention (see attention-mechanism) is permutation-equivariant: without positional information, reordering the input tokens only reorders the outputs in the same way, and each token's output vector is otherwise unchanged. The model therefore has no inherent notion of sequence position, so positional information must be injected separately.
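The order-blindness described above can be checked on a toy example. The following is a minimal sketch, not a real model: `self_attention` is a hypothetical helper that uses identity projections (queries, keys, and values are all the raw token vectors) and no positional signal. The token values and the permutation are made up for illustration.

```python
import math

def self_attention(tokens):
    """Single-head attention with identity projections and no positional signal."""
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        # Numerically stable softmax over the scores.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w / total * v[j] for w, v in zip(weights, tokens))
            for j in range(d)
        ])
    return outputs

tokens = [[1.0, 0.0], [0.0, 2.0], [1.5, -0.5]]
perm = [2, 0, 1]  # an arbitrary reordering of the three tokens

out = self_attention(tokens)
out_perm = self_attention([tokens[p] for p in perm])

# Each token gets the same output vector wherever it sits in the sequence.
same = all(
    abs(a - b) < 1e-12
    for i, p in enumerate(perm)
    for a, b in zip(out_perm[i], out[p])
)
print(same)
```

Shuffling the input only shuffles the output rows, so nothing in the result reveals the original order. This is the gap that positional encodings fill.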

Sinusoidal Positional Encoding (2017)

The original Transformer paper used fixed sinusoidal functions:

$PE_{(pos, 2i)} = \sin(pos / 10000^{2i/d})$

$PE_{(pos, 2i+1)} = \cos(pos / 10000^{2i/d})$
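The two formulas can be turned into a small lookup table. This is a minimal sketch using only the standard library. The function name `sinusoidal_encoding` and its parameters `num_positions`, `d_model`, and `base` are illustrative choices, not names from the original paper, and an even `d_model` is assumed.

```python
import math

def sinusoidal_encoding(num_positions, d_model, base=10000.0):
    """Return a num_positions x d_model table of sinusoidal encodings."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    table = []
    for pos in range(num_positions):
        row = [0.0] * d_model
        for i in range(d_model // 2):
            # Dimensions 2i and 2i+1 share one frequency.
            angle = pos / base ** (2 * i / d_model)
            row[2 * i] = math.sin(angle)      # even dimension: sine
            row[2 * i + 1] = math.cos(angle)  # odd dimension: cosine
        table.append(row)
    return table

pe = sinusoidal_encoding(num_positions=4, d_model=8)
print(pe[0])  # position 0: every sine is 0.0 and every cosine is 1.0
```

In the original Transformer each row is added to the token embedding at that position. Low-index dimension pairs oscillate quickly with position and high-index pairs slowly, so each position gets a distinct pattern.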

Advantages: no learned parameters, and because the encoding is defined for every position it can in principle be applied to sequences longer than those seen in training. However, the absolute position signal proved suboptimal for many tasks.

Rotary Position Embedding (RoPE) — 2026 Standard

RoPE has become the de facto standard, adopted by Llama, Mistral, Gemma, and most modern models. It encodes position by rotating the Query and Key vectors by an angle proportional to token position:

Each pair of dimensions is treated as a 2D vector and rotated
The rotation angle increases with position
Attention scores tend to decay as the distance between tokens grows
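The rotation described above can be sketched in a few lines. This is an illustration under stated assumptions, not any particular model's implementation. `rope_rotate` and `dot` are hypothetical helpers, adjacent dimensions (2i, 2i+1) are paired (some implementations pair dimensions differently), and the base of 10000 is borrowed from the sinusoidal formula. The query and key values are made up.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each adjacent pair of dimensions by an angle proportional to position."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("this sketch assumes an even dimension")
    out = [0.0] * d
    for i in range(d // 2):
        # Each pair has its own frequency; the angle grows linearly with position.
        theta = position * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same offset (3) between query and key, at very different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 103), rope_rotate(k, 100))
print(abs(score_near - score_far) < 1e-9)
```

The two scores agree up to floating-point error because only the offset between the two positions survives in the dot product. Rotation also leaves vector norms unchanged, so position is encoded without rescaling the queries or keys.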

Why RoPE won:

Provides a theoretically grounded relative position encoding (the attention score depends on the offset between tokens, not on their absolute positions)
Compatible with linear attention and KV-cache optimization
Enables context extension techniques (YaRN, NTK-aware scaling)
Works out-of-the-box with Grouped-Query Attention
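The relative-position property in the first point follows from how 2D rotations compose. As a short worked derivation for a single dimension pair, write $R(\alpha)$ for the 2D rotation by angle $\alpha$, let $\theta$ be the pair's frequency, and let $m$ and $n$ be the positions of the query $q$ and the key $k$ (notation introduced here for illustration):

$$
R(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}, \qquad R(\alpha)^{\top} R(\beta) = R(\beta - \alpha)
$$

$$
\langle R(m\theta)\, q,\; R(n\theta)\, k \rangle = q^{\top} R(m\theta)^{\top} R(n\theta)\, k = q^{\top} R\big((n - m)\theta\big)\, k
$$

The absolute positions $m$ and $n$ appear only through their difference $n - m$. Summing over all dimension pairs gives a full attention score with the same property.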

Other Approaches

ALiBi (Press et al., 2022): Adds a linear bias proportional to distance — simpler but less expressive
Learned absolute embeddings: GPT-2/3 used learned position embeddings — cannot extrapolate
RoPE + NoPE mixtures: Some models combine RoPE with no-position encodings for certain layers
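For comparison with RoPE, the ALiBi bias from the first entry above can be sketched as follows. The slope schedule is the geometric sequence described by Press et al. for a power-of-two number of heads. The helper names `alibi_slopes` and `alibi_bias` are hypothetical, and the causal masking of future positions is an assumption of this sketch rather than part of the bias itself.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., assuming n is a power of two."""
    if num_heads < 1 or (num_heads & (num_heads - 1)) != 0:
        raise ValueError("this sketch assumes a power-of-two head count")
    ratio = 2.0 ** (-8.0 / num_heads)
    return [ratio ** (h + 1) for h in range(num_heads)]

def alibi_bias(seq_len, slope):
    """Bias added to attention scores: -slope * distance, with future keys masked."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]

slopes = alibi_slopes(8)
bias = alibi_bias(seq_len=4, slope=slopes[0])
for row in bias:
    print(row)
```

The bias is added to the query-key scores before the softmax. It has no learned parameters and no rotation, only a per-head penalty that grows linearly with distance, which is why the entry above calls it simpler but less expressive.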

References

Vaswani et al. — Attention Is All You Need (sinusoidal encoding)
Jun Yu Tan — Crystallization of Transformer Architectures (RoPE adoption trajectory)

Talos Research Wiki
