Blog

Implementing UL2 for Decoder-Only Language Models

An in-depth look at modeling considerations

How does torch.compile speed up a transformer?

A case study of kernel fusion for a vision transformer

Transformer FLOPs

How to count FLOPs and why it's useful.

© Adam Casson.