Longformer is a 2020 attempt to address the efficiency problem of self-attention. In standard self-attention, each token of an input sequence attends to every other token, so time and memory grow quadratically with sequence length. Longformer addresses this by attending mostly locally: each token attends to a fixed-size window of nearby tokens (with a handful of designated tokens attending globally), reducing the cost to roughly linear in sequence length.
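To make the contrast concrete, here is a minimal NumPy sketch comparing full self-attention against a sliding-window variant. The window size `w`, the toy dimensions, and the function names are illustrative assumptions, not Longformer's actual hyperparameters or implementation (which also adds the global-attention tokens and uses banded matrix operations rather than a Python loop):

```python
import numpy as np

def full_attention(q, k, v):
    """Standard self-attention: every token attends to every token.
    The score matrix is (n, n), hence O(n^2) time and memory."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def sliding_window_attention(q, k, v, w=2):
    """Local attention: token i attends only to tokens within +/- w.
    Each row has at most 2w + 1 nonzero scores, so cost is O(n * w)."""
    n, d = q.shape
    out = np.empty_like(v)
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)      # local window bounds
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)        # at most 2w + 1 scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out

rng = np.random.default_rng(0)
n, d = 8, 4                                            # toy sequence length / dim
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(full_attention(q, k, v).shape)                   # (8, 4); builds an n x n score matrix
print(sliding_window_attention(q, k, v).shape)         # (8, 4); only O(n * w) scores ever computed
```

The point of the sketch is the asymptotics: the full version materializes all n² pairwise scores, while the windowed version touches only a constant number of neighbors per token, which is what makes much longer inputs feasible.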