Abdalrahman Wael builds AI software.

I build production systems, agents, and research infrastructure, then write notes on language models, coding, and the parts worth understanding.

Latest

Writing

8 June 2026

GPT-2 --> Llama 3, Part 2: Optimizing for generation

A walkthrough of why generation changes transformer attention, how KV cache cuts repeated work, and why Llama-style grouped-query attention shrinks inference memory.

Paper

13 May 2026

Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching

A controlled tiny-scale pretraining study comparing dense and mixture-of-experts transformers under active-parameter and total-parameter matching.

Writing

8 May 2026

GPT-2 --> Llama 3, One Improvement At A Time

A thinking-out-loud walkthrough of moving a GPT-2-style block toward Llama 3 with RoPE, RMSNorm, and SwiGLU.