Simplicity is the ultimate sophistication.
TL;DR: An experiment showing that a lock-free queue designed to serve only a single producer and a single consumer can complete 1B write-read transactions in just over a quarter (27%) of the time taken by a lock-based queue that supports multiple producers and multiple consumers. Replacing the modulus operation with a bit mask gives a further performance gain (24%), using the different memory models appropriately yields yet another (18%), and very cool graphs can be drawn using Plotly!
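To make the ingredients named in the TL;DR concrete, here is a minimal C++17 sketch of a single-producer/single-consumer ring buffer that indexes with a bit mask and uses acquire/release ordering. It assumes a power-of-two capacity and a 64-byte cache line. `SpscQueue`, `try_push`, `try_pop` and `transfer_and_sum` are hypothetical names chosen for illustration. This is not the code benchmarked in this post, and the lock-based multi-producer/multi-consumer queue it was compared against is not shown.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <thread>

// Hypothetical SPSC ring buffer (a sketch, not the article's implementation).
// Capacity must be a power of two so that "index & kMask" equals "index % Capacity".
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity != 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
    static constexpr std::size_t kMask = Capacity - 1;

public:
    // Must only be called from the single producer thread.
    bool try_push(const T& value) {
        // Only the producer writes write_idx_, so a relaxed load is enough.
        const std::size_t w = write_idx_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store: a slot it freed is really free.
        const std::size_t r = read_idx_.load(std::memory_order_acquire);
        if (w - r == Capacity) return false;   // queue is full
        slots_[w & kMask] = value;             // bit mask instead of modulus
        // Release makes the slot contents visible before the new write index.
        write_idx_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Must only be called from the single consumer thread.
    std::optional<T> try_pop() {
        // Only the consumer writes read_idx_, so a relaxed load is enough.
        const std::size_t r = read_idx_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store: the slot contents are visible.
        const std::size_t w = write_idx_.load(std::memory_order_acquire);
        if (w == r) return std::nullopt;       // queue is empty
        T value = slots_[r & kMask];
        // Release hands the slot back to the producer only after we have read it.
        read_idx_.store(r + 1, std::memory_order_release);
        return value;
    }

private:
    T slots_[Capacity]{};
    // Keep the two indices on separate (assumed 64-byte) cache lines to limit false sharing.
    alignas(64) std::atomic<std::size_t> write_idx_{0};
    alignas(64) std::atomic<std::size_t> read_idx_{0};
};

// Hypothetical driver: one thread pushes 0..n-1, another pops and sums them.
std::uint64_t transfer_and_sum(std::uint64_t n) {
    SpscQueue<std::uint64_t, 1024> queue;
    std::uint64_t sum = 0;  // written only by the consumer; read after join()

    std::thread consumer([&] {
        for (std::uint64_t expected = 0; expected < n;) {
            if (auto v = queue.try_pop()) {
                assert(*v == expected);  // FIFO order is preserved
                sum += *v;
                ++expected;
            }
        }
    });
    std::thread producer([&] {
        for (std::uint64_t i = 0; i < n;) {
            if (queue.try_push(i)) ++i;  // busy-wait until a slot is free
        }
    });

    producer.join();
    consumer.join();
    return sum;
}
```

The indices grow monotonically and are masked only when a slot is accessed. Because the capacity is a power of two, it divides `SIZE_MAX + 1`, so the unsigned difference `w - r` still gives the number of queued items after the counters wrap around. Calling `transfer_and_sum(1000000)` should return 499999500000, the sum of 0 through 999999.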