Abdalrahman Wael builds AI software.
I build production systems, agents, and research infrastructure, and I write notes on language models, coding, and the parts of both worth understanding.
Latest
Writing
8 June 2026
GPT-2 --> Llama 3, Part 2: Optimizing for generation
A walkthrough of why generation changes transformer attention, how a KV cache cuts repeated work, and why Llama-style grouped-query attention shrinks inference memory.
Paper
13 May 2026
Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching
A controlled tiny-scale pretraining study comparing dense and mixture-of-experts transformers under active-parameter and total-parameter matching.
Writing
8 May 2026
GPT-2 --> Llama 3, One Improvement At A Time
A thinking-out-loud walkthrough of moving a GPT-2-style block toward Llama 3 with RoPE, RMSNorm, and SwiGLU.