Title: C++ Low Latency: Multithreading and Hotpath Optimizations
Author: David Spuler
Publisher: Aussie AI Labs Pty Ltd
Year: 2025
Pages: 362
Language: English
Format: pdf, epub, mobi
Size: 10.1 MB
Run faster! This book is about speeding up C++ for low latency programming in multithreaded environments and sequential code in C++ backends.
Low latency programming means coding an algorithm so that it completes its task in the shortest possible time. In many cases, this is effectively the “user response time” or the “round-trip time” for a computation. The main uses of low latency programming include:
• AI kernels: latency is the time between submitting a query and starting to get the answer back.
• Embedded devices: the system must respond quickly, in real time (e.g., an autonomous self-driving car is a large embedded device).
• High-Frequency Trading (HFT): latency is the time it takes to submit, execute, and complete a trade.
• Game engines: low latency means the characters and environment move fast enough to respond to user inputs and keep up with the frame rate.
Game engines have historically been written in C++, at least for all the low-level stuff dealing with frame rates and 3D animation. Similarly, high-frequency trading systems usually run C++ at the lowest level. You can also use C, the longstanding precursor to C++. The C programming language is obviously fast, as speed was its key design point. However, C is not necessarily any faster than C++: if you used only a C-like subset of C++, the two would run at the same speed. What using C does avoid is the temptation to use some of the slower features available at the higher levels of C++.
Main applications:
• AI LLM Inference Backends
• High-Frequency Trading (HFT)
• Game Engines
Main optimization topics:
• C++ Multithreading optimizations
• General C++ efficiency tweaks
Contents:
Part I: Introduction to Low Latency
1. Low Latency Programming
2. Multithreading Optimizations
3. Hardware Acceleration
4. System Optimizations
Part II: Multithreading Optimizations
5. False Sharing
6. Branch Prediction
7. Lock Contention
8. Hotpath Optimizations
9. Slowpath Removal
10. Cache Warming
Part III: C++ Optimizations
11. Timing and Benchmarking
12. Bitwise Operations
13. Floating-Point Arithmetic
14. Arithmetic Optimizations
15. Compile-Time Optimizations
16. Pointer Arithmetic
17. Algorithm Speedups
18. Memory Optimizations
19. Loop Vectorization
20. AVX Intrinsics
21. Parallel Data Structures
22. Lookup Tables & Precomputation
Appendix 1: C++ Slug Catalog