Low-level systems engineer working at the boundary of hardware, GPUs and ML performance
I work close to the metal.
My background is in CPU architecture, verification and memory systems, but my day-to-day curiosity lives in low-level programming: GPU kernels, performance modeling, and how real workloads stress real hardware.
I care about understanding systems end-to-end, from cache lines and warps up to training loops and frameworks, and using that understanding to make things faster and more predictable.
CUDA in 100 days: a hands-on deep dive into GPU programming, kernel design and performance behavior.
Repo: https://github.com/theRTLmaker/CUDA_in_100_days
- CUDA and GPU kernel optimization
- GPU memory hierarchies and profiling
- Performance modeling and benchmarking
- ML training workloads and system bottlenecks
- Low-level C++ and performance-oriented Python
- CUDA programming and GPU profiling tools
- CPU microarchitecture, caches and coherency
- SystemVerilog and hardware-software interfaces
- GPU and accelerator programming
- ML systems and performance engineering
- Hardware-aware software design
- Debugging at uncomfortable layers
- Making abstractions earn their keep
Always up for conversations about GPUs, low-level systems, performance engineering or how software really hits the hardware.
