Blog
Implementing UL2 for Decoder-Only Language Models
An in-depth look at modeling considerations
How does torch.compile speed up a transformer?
A case study of kernel fusion for a vision transformer
Transformer FLOPs
How to count FLOPs and why it's useful.