Hello, blog
A quick note on what this space is for and how it's organised.
Notes on machine learning, books & ideas:
Attention is permutation-invariant — it treats 'the cat sat' identically to 'sat cat the' without help. Positional encoding is the elegant fix. Here's the sinusoidal construction and why it works.
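
For reference, a minimal sketch of the standard sinusoidal construction (sine on even dimensions, cosine on odd ones, with wavelengths in a geometric progression up to 10000·2π). The numpy layout and the function name here are illustrative, not taken from the post itself.

    import numpy as np

    def sinusoidal_positions(seq_len, d_model):
        # One d_model-dimensional vector per position: even indices use sine,
        # odd indices use cosine, wavelengths form a geometric progression.
        positions = np.arange(seq_len)[:, None]                      # (seq_len, 1)
        dims = np.arange(d_model)[None, :]                           # (1, d_model)
        angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
        angles = positions * angle_rates                             # (seq_len, d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles[:, 0::2])
        pe[:, 1::2] = np.cos(angles[:, 1::2])
        return pe

Adding these vectors to the token embeddings gives each position a unique, smoothly varying signature, which is what breaks the permutation invariance.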
One attention head is a single lens. Multi-head attention runs several lenses in parallel — each free to specialise on a different relationship type. Here's exactly how and why.
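
A compact sketch of the "several lenses in parallel" idea, assuming the usual split of the model dimension into per-head subspaces; the function name, the numpy layout, and passing the projection matrices as arguments are all illustrative choices, not the post's own code.

    import numpy as np

    def multi_head_attention(x, n_heads, w_q, w_k, w_v, w_o):
        # x: (seq_len, d_model); w_q, w_k, w_v, w_o: (d_model, d_model).
        seq_len, d_model = x.shape
        d_head = d_model // n_heads

        def split(h):
            # (seq_len, d_model) -> (n_heads, seq_len, d_head): one subspace per head.
            return h.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

        q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)          # (n_heads, seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)               # row-wise softmax per head
        heads = weights @ v                                          # (n_heads, seq_len, d_head)
        out = heads.transpose(1, 0, 2).reshape(seq_len, d_model)     # concatenate heads
        return out @ w_o                                             # mix heads back together

Each head attends over the full sequence but in its own d_head-dimensional subspace, so different heads are free to track different relationships; the final projection w_o recombines their outputs.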
Before multi-head attention, before positional encoding, before the encoder-decoder stack — there's one core idea that makes Transformers work. Let's build it from scratch.
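
That core idea is scaled dot-product attention; a bare-bones sketch, with hypothetical names, might look like this:

    import numpy as np

    def scaled_dot_product_attention(q, k, v):
        # q, k: (seq_len, d_k); v: (seq_len, d_v).
        scores = q @ k.T / np.sqrt(k.shape[-1])              # every query scored against every key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
        return weights @ v                                   # weighted mix of the values

Self-attention is just this with q, k, and v all derived from the same input sequence via learned projections.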
Kahneman's magnum opus on the two systems of thought — what still holds up, what's been replicated, and what I take away as a practitioner.