The improvements I made to the original code:
- Convert all raw allocation to shared_ptr to prevent memory leaks.
- Replace non-reentrant rand() with a copy of C++11 random number generator in every thread using thread_local to avoid sharing and locking.
- Replace the text-based image format with stb_image saving to PNG.
- Parallelize with C++17 parallel for_each, achieving a more than 5X performance on my 6 core CPU.