Countries of Top 200 CodeProject Members

Note: the data is taken at 2019-01-26.

In CodeProject, a place where like-minded programmers share knowledge: There are 2 main ways to gain reputation points, either by answering questions in the Quick Answers section or writing quality articles. A third way to gain points is to post a good comment/message in forum or article thread that members upvote for it to gain visibility. The top 200 individual members (not including group account) by reputation statistic is as follows.

Highest:2207598
Lowest:85178
Median:138487.5
Average:202218

In total, there are 41 countries where the top 200 CodeProject members are hailed from. They are not necessarily country of birth but rather the country of work and residence. By sheer number, United States ranked undisputedly 1st for 56 members in top 200 while the United Kingdom and India are tied for the 2nd spot of having 25 members.

Interestingly, my birth country, Singapore is at the 9th spot, tied with Sweden and Belgium, though the National University of Singapore produces only 400 computer science graduates every year, most of the IT jobs in my country are filled by foreigners.

There is a country put down as Europe by the member, Wendelius, which I am sure he wants to keep it a secret. There are 4 female members in top 200 spots. 2 of them are former and current CodeProject editors who got the reputation points by article editing, we can exclude them; we only want to count those who got to that spot by contributions, not through day job at CP. The other 2 ladies are Snesh Prajapati(India)(93rd) and Mahsa Hassankashi(Iran)(178th). Well done, ladies!

Rank Countries Top 200 Members
1 United States 56
2 United Kingdom 25
3 India 25
4 Germany 11
5 Canada 9
6 Australia 8
7 Netherlands 6
8 Iran 5
9 Singapore 4
10 Sweden 4
11 Belgium 4
12 Serbia 3
13 Israel 3
14 Italy 3
15 Norway 2
16 Scotland 2
17 Malaysia 2
18 France 2
19 Russian Federation 2
20 Pakistan 2
21 Brazil 2
22 Estonia 1
23 Wales 1
24 Chile 1
25 Ireland 1
26 Slovakia 1
27 Poland 1
28 Ukraine 1
29 Spain 1
30 Kenya 1
31 Switzerland 1
32 South Africa 1
33 Thailand 1
34 Austria 1
35 Turkey 1
36 Egypt 1
37 Czech Republic 1
38 Greece 1
39 Mexico 1
40 Bangladesh 1
41 Europe 1

Let’s change the perspective and take the country population into account (score is calculated as members count divided by population in millions), Estonia takes the 1st spot though it only got 1 member and Singapore is in 2nd place, follows by Serbia. The 1st 5 spots are taken by countries with population less than 10 million. The previous 3 winners: the United States is in the 16th place while the United Kingdom is in 6th place and India in 31st place.

Rank Countries Population(mil) Top 200 Members Score
1 Estonia 1.316 1 0.759878
2 Singapore 5.612 4 0.712758
3 Serbia 7.022 3 0.427229
4 Sweden 9.995 4 0.4002
5 Norway 5.258 2 0.380373
6 United Kingdom 66.02 25 0.378673
7 Scotland 5.373 2 0.372232
8 Belgium 11.35 4 0.352423
9 Netherlands 17.08 6 0.351288
10 Israel 8.712 3 0.344353
11 Australia 24.6 8 0.325203
12 Wales 3.099 1 0.322685
13 Canada 36.71 9 0.245165
14 Ireland 4.784 1 0.20903
15 Slovakia 5.435 1 0.183993
16 United States 325.7 56 0.171937
17 Germany 82.79 11 0.132866
18 Switzerland 8.42 1 0.118765
19 Austria 8.773 1 0.113986
20 Czech Republic 10.58 1 0.094518
21 Greece 10.77 1 0.0928505
22 Malaysia 31.62 2 0.0632511
23 Iran 81.16 5 0.0616067
24 Chile 18.05 1 0.0554017
25 Italy 60.59 3 0.0495131
26 France 67.12 2 0.0297974
27 Poland 37.97 1 0.0263366
28 Ukraine 44.83 1 0.0223065
29 Spain 46.57 1 0.0214731
30 Kenya 49.7 1 0.0201207
31 India 1339 25 0.0186707
32 South Africa 56.72 1 0.0176305
33 Thailand 69.04 1 0.0144844
34 Russian Federation 144.5 2 0.0138408
35 Turkey 79.81 1 0.0125298
36 Egypt 97.55 1 0.0102512
37 Pakistan 197 2 0.0101523
38 Brazil 209.3 2 0.00955566
39 Mexico 129.2 1 0.00773994
40 Bangladesh 164.7 1 0.00607165
41 Europe 741.4 1 0.0013488

Well done, guys, keep the contributions coming!

UWP Storage Wrapper

Introduction

UWP Storage is a persistent name-value storage API for UWP apps. This article showcases a wrapper written in C++/CX that makes it easier to use in C++/CX. Though C# usage is already easy without the wrapper, the comparison of both with and without wrapper are still shown.

C++/CX Writing and Reading Primitive Value

In this C++/CX example, the code shows how to use UWP storage directly to store and read integer.

Windows::Storage::ApplicationDataContainer^ localSettings =
    Windows::Storage::ApplicationData::Current->LocalSettings;

localSettings->Values->Insert("RawKey", 123);
bool exists = localSettings->Values->HasKey("RawKey");
if (exists)
{
    Platform::Object^ obj = localSettings->Values->Lookup("RawKey");
    int result = safe_cast(obj);
}
localSettings->Values->Clear();

In this second C++/CX example, the code shows how to use UWP storage with the wrapper to store and read integer. The code is more concise. The second parameter of GetInt32 indicates if the name exists in the storage. If not, the third parameter will be returned as the default preferred value.

UWPStorage^ storage = ref new UWPStorage(StorageType::Local);
storage->SetInt32("WrapperKey", 456);
bool exists = false;
int result = storage->GetInt32("WrapperKey", &exists, 444);
storage->Clear();

C# Writing and Reading Primitive Value

This C# example shows how to use UWP storage directly to store and read integer.

Windows.Storage.ApplicationDataContainer localSettings =
    Windows.Storage.ApplicationData.Current.LocalSettings;

localSettings.Values["RawKey"] = 123;
int result = (int)(localSettings.Values["RawKey"]);
localSettings.Values.Clear();

This C# example shows how to use UWP storage directly to store and read integer. Similarly, The out parameter of GetInt32 indicates whether the name exists, else the third parameter will be returned. The code is not more concise. So there is no advantage to using the wrapper in C# except the wrapper can catch the exception for you when the casting to primitive type fails, instead you manually try-catching it.

UWPStorage storage = new UWPStorage(StorageType.Local);
storage.SetInt32("WrapperKey", 456);
bool exists = false;
int result = storage.GetInt32("WrapperKey", out exists, 444);
storage.Clear();

C++/CX Writing and Reading Primitive Composite

Composite is a group of name-value settings. The C++/CX code to directly store and read a composite with 2 name-values. One is integer while other is string. Later the composite is deleted.

Windows::Storage::ApplicationDataContainer^ localSettings =
    Windows::Storage::ApplicationData::Current->LocalSettings;

// Create a composite setting
ApplicationDataCompositeValue^ composite =
    ref new ApplicationDataCompositeValue();
composite->Insert("intVal", dynamic_cast
    (PropertyValue::CreateInt32(1212)));
composite->Insert("strVal", dynamic_cast
    (PropertyValue::CreateString("string")));

auto values = localSettings->Values;
values->Insert("RawCompositeSetting", composite);

// Read data from a composite setting
ApplicationDataCompositeValue^ composite2 =
    safe_cast
    (localSettings->Values->Lookup("RawCompositeSetting"));

if (composite2)
{
    // Access data in composite2["intVal"] and composite2["strVal"]
    int one = safe_cast
        (composite2->Lookup("intVal"))->GetInt32();
    String^ hello = safe_cast
        (composite2->Lookup("strVal"));
}
// Delete a composite setting
values->Remove("RawCompositeSetting");

The equivalent C++/CX code using UWP storage wrapper to store and read a composite with 2 name-values. One is integer while other is string. As you can see, the code is less verbose and saves many typings.

UWPStorage^ storage = ref new UWPStorage(StorageType::Local);
// Create a composite setting
CompositeValue^ composite = ref new CompositeValue();
composite->SetInt32("intVal", 3434);
composite->SetString("strVal", "string");

storage->SetComposite("WrapperCompositeSetting", composite);

// Read data from a composite setting
bool exists = false;
CompositeValue^ composite2 = storage->GetComposite
    ("WrapperCompositeSetting", &exists);
if (exists)
{
    // Access data in composite2["intVal"] and composite2["strVal"]
    int one = composite2->GetInt32("intVal", &exists, 4444);
    String^ hello = composite2->GetString("strVal", &exists, "error");
}
// Delete a composite setting
storage->Remove("WrapperCompositeSetting");

C# Writing and Reading Primitive Composite

The C# code to directly store and read a composite with 2 name-values. One is integer while other is string. Later, the composite is deleted.

Windows.Storage.ApplicationDataContainer localSettings =
    Windows.Storage.ApplicationData.Current.LocalSettings;

// Create a composite setting
Windows.Storage.ApplicationDataCompositeValue composite =
    new Windows.Storage.ApplicationDataCompositeValue();
composite["intVal"] = 1212;
composite["strVal"] = "string";

localSettings.Values["RawCompositeSetting"] = composite;

// Read data from a composite setting
Windows.Storage.ApplicationDataCompositeValue composite2 =
    (Windows.Storage.ApplicationDataCompositeValue)
    localSettings.Values["RawCompositeSetting"];

if (composite2!=null)
{
    // Access data in composite2["intVal"] and composite2["strVal"]
    int one = (int)(composite2["intVal"]);
    string hello = composite2["strVal"].ToString();
}

// Delete a composite setting
localSettings.Values.Remove("RawCompositeSetting");

The equivalent C# code relying on wrapper to store and read a composite with 2 name-values. One is integer while other is string. The code is equally concise.

UWPStorage storage = new UWPStorage(StorageType.Local);
// Create a composite setting
CompositeValue composite = new CompositeValue();
composite.SetInt32("intVal", 3434);
composite.SetString("strVal", "string");

storage.SetComposite("WrapperCompositeSetting", composite);

// Read data from a composite setting
bool exists = false;
CompositeValue composite2 = storage.GetComposite
    ("WrapperCompositeSetting", out exists);
if (exists)
{
    // Access data in composite2["intVal"] and composite2["strVal"]
    int one = composite2.GetInt32("intVal", out exists, 4444);
    string hello = composite2.GetString("strVal", out exists, "error");
}
// Delete a composite setting
storage.Remove("WrapperCompositeSetting");

With this library, it is now possible to write a portable C++ storage class with the same interface that not only works on UWP but other OS like Linux and MacOSX, cutting the amount of work to port your UWP app to other platforms, if the need arises.

The code is hosted at Github.

 

C++11 std::div() Benchmark

Download source at Github

Update: rand() overhead in benchmark has been removed by filling the array with random values beforehand.

C++11 standard introduces std::div() and its siblings on the premise of some compiler can take advantage of the available machine code that compute quotient and remainder of division together. The C++ reference noted, and (updated) according to Petr Kobalíček, this function was never about performance but rounding direction of negative operands. We thank him for his comment.

Until C++11, the rounding direction of the quotient and the sign of the remainder in the built-in division and remainder operators was implementation-defined if either of the operands was negative, but it was well-defined in std::div.

On many platforms, a single CPU instruction obtains both the quotient and the remainder, and this function may leverage that, although compilers are generally able to merge nearby / and % where suitable.

We’ll put std::div() to test in this benchmark.

Compiler Tested

  • GCC 7.4.0 on Cygwin
  • Clang 5.0.1 on Cygwin
  • Visual C++ 15.9 Update

OS: Windows 10 Pro

CPU: Intel i76820HQ

Loops: 10 million

Benchmark code

stopwatch.start("Division and Modulus");
for (size_t i = 0; i < vec.size(); ++i)
{
    TwoNum& a = vec[i];
    result.quot = a.num / a.divisor;
    result.rem = a.num % a.divisor;
    total_result += result.quot + result.rem; // prevent optimize away
}
stopwatch.stop();

total_result = 0L;
stopwatch.start("Custom div function");
for (size_t i = 0; i < vec.size(); ++i)
{
    TwoNum& a = vec[i];
    result = my_div(a.num, a.divisor);
    total_result += result.quot + result.rem; // prevent optimize away
}
stopwatch.stop();

total_result = 0L;
stopwatch.start("std::div function");
for (size_t i = 0; i < vec.size(); ++i)
{
    TwoNum& a = vec[i];
    result = std::div(a.num, a.divisor);
    total_result += result.quot + result.rem; // prevent optimize away
}
stopwatch.stop();

Custom my_div() function is defined as below.

inline std::div_t my_div(int number, int divisior)
{
    return std::div_t{ (number / divisior), (number % divisior) };
}

Unoptimized Benchmark

GCC Unoptimized
Division and Modulus timing:108ms
Custom div function timing:150ms
std::div function timing:104ms

Clang Unoptimized
Division and Modulus timing:104ms
Custom div function timing:184ms
std::div function timing:102ms

VC++ Unoptimized
Division and Modulus timing:411ms
Custom div function timing:465ms
std::div function timing:427ms

On unoptimized GCC and Clang binary, std::div() is faster than my_div() and slightly faster than the individual division and modulus. For VC++, individual division and modulus is faster probably due to no overhead of function call.

Optimized Benchmark

GCC Optimized(O3)
Division and Modulus timing:30ms
Custom div function timing:34ms
std::div function timing:56ms

Clang Optimized(O3)
Division and Modulus timing:31ms
Custom div function timing:32ms
std::div function timing:54ms

VC++ Optimized(Ox)(Ot)
Division and Modulus timing:32ms
Custom div function timing:41ms
std::div function timing:50ms

On optimized binary, it is a shame that std::div() is consistently slower. In conclusion, today’s compiler already does a very good job of computing division and modulus together without resorting to std::div() but it still have the performance lead when it comes to unoptimized build.

Just to give a quick shoutout to my readers: you may have noticed I have not been updating this blog since August last year and I always cross post from my CodeProject articles. I promise there will be exclusive content that can only be found here!

Not Every Memory Allocation Failure is OOM

Introduction

As with many C++ programmers with C background, bring their C habits to C++ programming as shown in the below code where a massive array is allocated and pointer is then checked for failed allocation in presence of null address. It works this way for C malloc. Unfortunately, C++ new does not work like that: in the case of allocation failure, new throws a bad_alloc exception. I have seen this bad example in production code. And I am guilty myself for writing such code as you can seen in this WIC article.

// C code that works
char* ptr = (char*) malloc(0x7fffffff);
if (!ptr) // will
{
    // will come here!
    printf("Error: ptr is null!\n");
}
// C++ code that does not work as intended
char* ptr = new char[0x7fffffff];
if (!ptr)
{
    // will never come here when allocation fails because bad_alloc exception is thrown
    std::cerr << "Error: ptr is null!" << std::endl;
}

To fix the code, either instruct new not to throw exception by specifying std::nothrow after it or use a trycatch block to catch bad_alloc exception.

char* ptr = new (std::nothrow) char[0x7fffffff];
if (!ptr)
{
    std::cerr << "Error: ptr is null!" << std::endl;
}

Should You Catch bad_alloc Exception?

One school of thought clearly recommends let the bad_alloc exception bubbles up the call stack and let the process die an inglorious death by termination because there is nothing you can do in a process ran out of memory situation. Here, we come to the question of today’s topic: is every memory allocation failure OOM?

The answer is no. It could a case of a competitor hiring a hacker to cause a denial of service(DoS). In today’s world, your program or service may open a file from user or read a network packet from another service. Inside the file or network packet, there could be an array to be read. In the usual implementation, there is count field preceding the array to let the program know in advance, the array length. Hacker can just manipulate this field to a very large number to cause your program to crash repeatedly. Instead of terminating, the fix is to detect this anomaly and discard reading this file (and flag this as error) and continue business as usual.

Is There a Notorious Vulnerability Out in the Wild Exploiting OOM?

The answer to this question is yes. In the OWASP Top Ten 2017, in rank number 4: XML External Entities (XXE) is a type of vulnerability which exploits XML processors either by injection or expansion. We shan’t discuss injection as it is not relevant to our discussion. One notable XML External Entities expansion is Billion laughs attack (See below for example). Except for the first entity, every entity is 10 times expansion of previous entity, causing too big a memory allocation to handle in the end.

<?xml version="1.0"?>
<!DOCTYPE lolz [
 <!ENTITY A "A">
 <!ELEMENT lolz (#PCDATA)>
 <!ENTITY B "&A;&A;&A;&A;&A;&A;&A;&A;&A;&A;">
 <!ENTITY C "&B;&B;&B;&B;&B;&B;&B;&B;&B;&B;">
 <!ENTITY D "&C;&C;&C;&C;&C;&C;&C;&C;&C;&C;">
 <!ENTITY E "&D;&D;&D;&D;&D;&D;&D;&D;&D;&D;">
 <!ENTITY F "&E;&E;&E;&E;&E;&E;&E;&E;&E;&E;">
 <!ENTITY G "&F;&F;&F;&F;&F;&F;&F;&F;&F;&F;">
 <!ENTITY H "&G;&G;&G;&G;&G;&G;&G;&G;&G;&G;">
 <!ENTITY I "&H;&H;&H;&H;&H;&H;&H;&H;&H;&H;">
 <!ENTITY J "&I;&I;&I;&I;&I;&I;&I;&I;&I;&I;">
]>
<lolz>&J;</lolz>

The solution to this type of vulnerability is to disable External Entities in the XML parser or when this is not possible, not to use External Entities, set a limit on the maximum level of expansion.

H264 Video Encoder for OpenGL

Though this long article has more lines than the encoder library itself, this is a very simple and easy to read and understand article. If you have read my other articles before, you will be comfortable to know I do not write complicated stuff.

Table of Contents

Requirements

Video Encoder

  • Visual C++ 2015/2017
  • Windows 7/8/10

OpenGL Renderer

  • SDL2
  • SDL2 Image
  • GLEW
  • GLM
  • Tiny OBJ loader
  • Zlib

All the OpenGL Renderer required libraries are included in the repository. The focus is on Video Encoder.

Introduction

I worked on this video encoder while writing my Windows Store App, Mandy Frenzy, a photo slideshow app for ladies. Right now, I am feeling burnt out so I am taking a short hiatus. Meanwhile, I am writing a series of short articles as a way to document this app. This video encoder is a header file only (H264Writer.h), based on Microsoft Media Foundation, not the old DirectShow as Microsoft did not expose the H264 and HEVC codec on DirectShow. It is tested on Windows 10. It should work fine on Windows 7/8 as well. Do let me know if you encounter any problem on those OSes. Windows 10 used to come bundled with HEVC codec. For unknown reasons, MS has taken it out in Windows 10 Fall Creators Update for the **new** Windows 10 installation and put it up for purchase for $1.50 in the Microsoft Store. In the section below, some screenshots will show encoding artifacts present in MS HEVC video.

What Are the Pros and Cons of This Encoder Over FFmpeg?

FFmpeg is GPL and so you may not concerned if you just want to encode the output of your personal renderer. For my freemium app, I prefer to steer clear of the licensing issues. How hobbyists usually encode their frames with FFmpeg is to save all the frames in HDD first which limits the number of frames and also directly impacted video duration that can be saved depending on the free HDD space. The extra step of saving and opening the files has negative impact of the encoding speed. Of course, tight integration with FFmpeg code may eliminate the frame saving part. On the other hand, this encoder reads RGB values from the framebuffer provided. The downside is it is not portable and only works on Windows 7/8/10.

3 Rendering Modes

The same OpenGL renderer can be compiled into 3 modes: normal OpenGL display mode, Video Encoder mode and Emscripten mode. The latter two’s code sections are respectively guarded by VIDEO_ENCODER and EMSCRIPTEN macros. You can, by all means, use your own renderer with the video encoder. The default OpenGL renderer is just provided to show a working demo.

The documentation is divided into 3 main sections. The first section is get the demo up and running and on how to modify the parameters. The second section is on how to integrate it with your OpenGL framework. The demo uses a renderer framework taught in Paul Varcholik’s OpenGL Essentials LiveLessons. The original source code used GLFW and is based on OpenGL 4.4: I converted his framework to use SDL and downgrade to OpenGL 2.0. The decision is based on the lowest denominator of what WebGL and Emscripten can support. A tutorial on how to integrate with DirectX will come later. In theory, this video encoder should integrate well with other graphics API like Vulkan, afterall, all it needs to be supplied with a video buffer and some synchronization in tandem to perform its work. The third section is on the explanation of the internals of the video encoder which you can skip if you are not into the encoder internals and implementation. And the last section explains the Emscripten part required to compile into asm.js or Webassembly.

Running the Demo

Spaceship Video

Spaceship Video on Youtube

spaceship_small

All the required libraries are included in the repository. The required DLLs are copied automatically to the Release or Debug folder for Win32 post builds. x64 build is unbuildable due to the inability to find a x64 zlib lib/dll on the web; this is a linking problem that lies with the OpenGL renderer, not video encoder.

To see the OpenGL demo, open up SDL_App.sln in Visual Studio and build the SDL_App project.

To run the video encoding demo, open up H264SinkWriter.cpp and in the main function, modify the configFile, musicFile and videoFile to the paths on your machine. configFile is found in the $(SolutionDir)SDL_App folder. musicFile should be a mp3 file and if it is left blank, the final video shall not have music. videoFile is the encoded video. You can specify HVEC(aka H265) instead of H264 in the 4th parameter. HVEC is having some encoding issues where the colors bleed (See screenshots below.)

// This is the config file to be found in SDL_App project folder.
std::wstring configFile(L"D:\\GitHub\\video_encoder_for_ogl_dx\\SDL_App\\SDL_App\\config.txt");
// This is your music file
std::wstring musicFile(L"D:\\FMA.mp3");
// This is the video encoded output file.
std::wstring videoFile(L"C:\\Users\\shaov\\Documents\\video.mp4");

H264Writer writer(musicFile.c_str(), configFile.c_str(), videoFile.c_str(), VideoCodec::H264);
if (writer.IsValid())
{
    if (writer.Process())
    {
        printf("Video written successfully!\n");
        return 0;
    }
}
printf("Video write failed!\n");
getchar();

The typical config.txt is to facilitate passing of information to OpenGL renderer, has nothing to do with video encoder. If your renderer can get the information about the video it is about to encode, then just pass a dummy config.txt. The contents of a typical config.txt is shown below:

ScreenWidth=800
ScreenHeight=600
LogPath=C:\Users\shaov\Documents\log.txt
FPS=60[/code]

Now the demo does not handle aspect ratio automatically and it always sticks with 4:3 ratio. If you enter anything which is 16:9, or wider than 4:3, in screen width and height, your video will look stretched. FPS entry is for the integer number of frames per second; there is no way to enter a decimal number like 29.7777.

The demo would only encode 5 seconds of the video. Change duration in RenderingScene::Draw function.

void RenderingScene::Draw(const SceneTime& gameTime)
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    float grey = 35.0f / 255.0f;
    glClearColor(grey, grey, grey, 1.0f);

    Scene::Draw(gameTime);

    SDL_GL_SwapWindow(mWindow);

#ifdef VIDEO_ENCODER
    static GLfloat start_time = gameTime.TotalGameTime();
    GLfloat elapsed_time = gameTime.TotalGameTime() - start_time;
    // During video encoding, only run for 5 seconds.
    if(elapsed_time > 5.0f)
    {
        Scene::setVideoEnded();
    }
#endif
}

You have to experiment to find out the optimal bitrate that can encode a good quality video.

enum class VideoCodec
{
    H264,
    HVEC
};

// H264Writer constructor
H264Writer(const wchar_t* mp3_file, const wchar_t* src_file, const wchar_t* dest_file,
           VideoCodec codec, UINT32 bitrate = 4000000) :
{...}

To run the demo executable by itself, you need to copy the config.txt, Images and Models folders to the Release/Debug folder. The SDL2 DLLs would have already copied during post build.

By default, demo displays a rotating UFO saucer, to display other 3D model, just uncomment the other lines in CreateComponents().

void RenderingScene::CreateComponents()
{
    mCamera = std::unique_ptr(new
        FirstPersonCamera(*this, 45.0f, 1.0f / 0.75f, 0.01f, 100.0f));
    mComponents.push_back(mCamera.get());
    mServices.AddService(Camera::TypeIdClass(), mCamera.get());

    try
    {
        //mTexturedModelDemo = std::unique_ptr(new TexturedDemo(*this, *mCamera));
        //mComponents.push_back(mTexturedModelDemo.get());

        //mDiffuseCube = std::unique_ptr(new DiffuseCube(*this, *mCamera,
        //    "Cube.obj.txt", "Cube.mtl.txt"));
        //mComponents.push_back(mDiffuseCube.get());

        mUFOSpecularModel = std::unique_ptr(new SpecularModel(*this, *mCamera,
            "UFOSaucer3.jpg", "UFOSaucer.obj.txt.zip", "UFOSaucer.mtl.txt"));
        mComponents.push_back(mUFOSpecularModel.get());

        //mStarModel = std::unique_ptr(new StarModel(*this, *mCamera,
        //    glm::vec3(1.0f,1.0f,0.0f), "Star.obj.txt", "Star.mtl.txt"));
        //mComponents.push_back(mStarModel.get());
    }
    catch (SceneException& e)
    {
        char buf[1000];
        SPRINTF(buf, "ModelEffect exception:%s", e.what());
        gLogger.DebugPrint(buf);
    }

#ifdef __EMSCRIPTEN__
    gDownloadSingleton.DownloadAllNow();
#endif
}

HEVC Artifacts

H264 Video of Mandy Frenzy

H264 Video on Youtube

h264_small

HEVC Video (Artifacts) of Mandy Frenzy

HEVC Video on Youtube

hevc_small

As you can see, the sinewave artifacts in HEVC, not present in H264. Increasing the bitrate solves the problem at the expense of larger file size. By the way, the sinewave is rendered by triangles, not lines and not by fragment shaders. Reason being lines are usually implemented as 1 pixel wide in OpenGL ES 2.0. Using triangles allows me to control the width/height.

Integration with your OpenGL Framework

This section teaches the modification needed to integrate the video encoder into your renderer.

The fastest way to find all the encoding related code is to search for VIDEO_ENCODER macro in the source code. The encoder requires you to implement 2 global functions: check_config_file and encoder_start. See their declarations below. encoder_start is called in the worker thread.

extern int check_config_file(const wchar_t* file, int* width, int* height, int* fps);
extern int encoder_start(UINT** pixels, HANDLE evtRequest, HANDLE evtReply,
                         HANDLE evtExit, HANDLE evtVideoEnded, const WCHAR* szUrl);

They, in turn, called their DLL counterparts implemented in the Program.cpp.

int check_config_file(const wchar_t* file, int* width, int* height, int* fps)
{
    return ::check_project_file(file, width, height, fps);
}
int encoder_start(UINT** pixels, HANDLE evtRequest, HANDLE evtReply,
                  HANDLE evtExit, HANDLE evtVideoEnded, const WCHAR* szUrl)
{
    return ::encoder_main(pixels, evtRequest, evtReply, evtExit,
                          evtVideoEnded, szUrl);
}

check_project_file requires you to pass the screen width, height and FPS information from the file which is config.txt. In reality, you can just open any file which can provide you with this information, so implementation of check_project_file is unimportant. Let’s see how encoder_main is implemented.

SDL_APP_DLL_API int WINAPI check_project_file(const wchar_t* file, int* width,
                                              int* height, int* fps)
{
    ConfigSingleton config;
    const std::string afile = toAString(file);
    bool ret = config.LoadConfigFile(afile);
    if (ret)
    {
        *width = config.GetScreenWidth();
        *height = config.GetScreenHeight();
        *fps = config.GetFPS();
    }
    return ret;
}

SDL_APP_DLL_API int WINAPI encoder_main(UINT** pixels, HANDLE evtRequest,
                                        HANDLE evtReply, HANDLE evtExit,
                                        HANDLE evtVideoEnded, const WCHAR* szUrl)
{
    try
    {
        const std::string& config_file = toAString(szUrl);

        gConfigSingleton.OpenFile(config_file.c_str(),
          Library::DownloadableComponent::FileType::INI_FILE);
        if (gConfigSingleton.IsLoadSuccess() == false)
        {
            gLogger.Log("gConfigSingleton.IsLoadSuccess() failed! See log!");

            return 1;
        }

        Texture::setScreenDim(gConfigSingleton.GetScreenWidth(), gConfigSingleton.GetScreenHeight());
        std::unique_ptr renderingScene(new RenderingScene(L"Photo Montage",
            gConfigSingleton.GetScreenWidth(), gConfigSingleton.GetScreenHeight()));
        renderingScene->initVideoEncoder(pixels, evtRequest, evtReply, evtExit,
            evtVideoEnded, gConfigSingleton.GetFPS());

        renderingScene->Run();
    }
    catch (SceneException& ex)
    {
        gLogger.Log(ex.GetError().c_str());
    }
    catch (std::exception& ex)
    {
        gLogger.Log(ex.what());
    }

    return 0;
}

encoder_main is very similar to WinMain except it calls initVideoEncoder function to hand over the parameters. This is how initVideoEncoder is implemented: it just zero initialized some members and saved the parameters inside its members. These HANDLE arguments are already initialized inside the H264Writer constructor. setTimeQuandant() is to set the time increment for every frame, for example, if FPS is 30, then the time increment should be 33.3, irregardless of actual time passed. You wouldn’t want varying time rate for your video encoding.

void Scene::initVideoEncoder(UINT** pixels, HANDLE evtRequest, HANDLE evtReply,
                             HANDLE evtExit, HANDLE evtVideoEnded, int fps)
{
    mTexture = 0;
    mRenderBuffer = 0;
    mDepthBuffer = 0;
    mMultisampleTexture = 0;
    mPixels = pixels;
    mPixelBuffer = nullptr;
    mEvtRequest = evtRequest;
    mEvtReply = evtReply;
    mEvtExit = evtExit;
    mEvtVideoEnded = evtVideoEnded;

    mGameClock.setTimeQuandant((1000.0f/(float)fps)/1000.0f);
}

This is how encoder_start is called in the worker thread started by H264Writer.

DWORD WINAPI ThreadOpenGLProc(LPVOID pParam)
{
    H264Writer* pWriter = (H264Writer*)(pParam);
    return ::encoder_start((UINT**)(
        pWriter->GetImagePtr()),
        pWriter->GetRequestEvent(),
        pWriter->GetReplyEvent(),
        pWriter->GetExitEvent(),
        pWriter->GetVideoEndedEvent(),
        pWriter->GetUrl().c_str());
}

In our OpenGL renderer, we need to create RenderBuffer for our renderer to draw on.

void Scene::CreateFBO()
{
#ifdef VIDEO_ENCODER
    mPixelBuffer = new unsigned int[mScreenWidth*mScreenHeight];

    glGenTextures(1, &mMultisampleTexture);
    glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, mMultisampleTexture);
    glTexImage2DMultisample(GL_TEXTURE_2D_MULTISAMPLE, 4,
                  GL_RGBA, mScreenWidth, mScreenHeight, GL_TRUE);

    glGenRenderbuffers(1, &mRenderBuffer);
    glBindRenderbuffer(GL_RENDERBUFFER, mRenderBuffer);
    glRenderbufferStorageMultisample(GL_RENDERBUFFER, 16, GL_RGBA8, mScreenWidth, mScreenHeight);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, mRenderBuffer);

    // Create depth render buffer (This is optional)
    glGenRenderbuffers(1, &mDepthBuffer);
    glBindRenderbuffer(GL_RENDERBUFFER, mDepthBuffer);
    glRenderbufferStorageMultisample(GL_RENDERBUFFER, 16,
                       GL_DEPTH24_STENCIL8, mScreenWidth, mScreenHeight);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, mDepthBuffer);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_STENCIL_ATTACHMENT, GL_RENDERBUFFER, mDepthBuffer);

    GLuint mTexture = 0;
    glGenTextures(1, &mTexture);
    glBindTexture(GL_TEXTURE_2D, mTexture);
    unsigned int dim = determineMinDim(mScreenWidth, mScreenHeight);
    unsigned int* pixels = new unsigned int[dim * dim];
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, dim, dim, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    delete [] pixels;
    pixels = NULL;

    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, mTexture, 0);

    // Enable multisampling
    glEnable(GL_MULTISAMPLE);

    if (GL_FRAMEBUFFER_COMPLETE == glCheckFramebufferStatus(GL_FRAMEBUFFER))
    {
        OutputDebugStringA("FBO status: Complete!\n");
    }
    else
    {
        OutputDebugStringA("FBO status: Not complete!\n");
    }
#endif // VIDEO_ENCODER
}

determineMinDim() used in the above function is implemented in this way because we need a square texture to power of 2.

#ifdef VIDEO_ENCODER
unsigned int Scene::determineMinDim(unsigned int width, unsigned int height)
{
    unsigned int dim = width;
    if (height > width)
        dim = height;

    unsigned int min_dim = 32;
    if (dim > 32 && dim  64 && dim  128 && dim  256 && dim  512 && dim  1024 && dim  2048 && dim <= 4096)
        min_dim = 4096;
    else
        min_dim = 4096;

    return min_dim;
}
#endif // VIDEO_ENCODER

Lastly, we need to read the rendered buffer with glReadPixels. The code wait on mEvtRequest and mEvtExit. mEvtRequest is signaled when the encoding thread requests for new frame. Whereas the mEvtExit is signaled during exit. After copying the buffer to mPixels, mEvtReply is signaled to encoding thread that mPixels is ready.

void Scene::ReadBuffer(bool& quit)
{
#ifdef VIDEO_ENCODER
    glReadBuffer(GL_COLOR_ATTACHMENT0);

    glReadPixels(0, 0, mScreenWidth, mScreenHeight, GL_BGRA_EXT, GL_UNSIGNED_BYTE, mPixelBuffer);

    HANDLE arrHandle[2];
    arrHandle[0] = mEvtRequest;
    arrHandle[1] = mEvtExit;

    DWORD dwEvt = WaitForMultipleObjects(2, arrHandle, FALSE, INFINITE);

    if (dwEvt == WAIT_OBJECT_0 + 1)
    {
        quit = true;
    }
    while (*mPixels == NULL) { Sleep(100); }

    if(*mPixels)
        memcpy(*mPixels, mPixelBuffer, mScreenWidth*mScreenHeight*sizeof(unsigned int));

    SetEvent(mEvtReply);
#endif // VIDEO_ENCODER
}

This is how CreateFBO() and ReadBuffer() are called in my Render(). CreateFBO() and ReadBuffer() are empty when not compiled in VIDEO_ENCODER mode.

void Scene::Render(bool& quit)
{
    static bool all_init = false;
    if (all_init == false)
    {
        if (IsAllReady())
        {
            CreateFBO();
            Initialize();
            if (IsAllInitialized())
            {
                all_init = true;
                mGameClock.Reset();
            }
        }
    }
    if (all_init)
    {
        mGameClock.SetPause(mPaused);
        if (mPaused == false)
        {
            mGameClock.UpdateGameTime(mGameTime);
        }
        Update(mGameTime);
        Draw(mGameTime);
        ReadBuffer(quit);
    }
}

How Is Video Encoder Written

Too long to be put in the blog post, read it at Github Readme

Running as asm.js on Web Browser

This section focuses mainly on asm.js aspect of OpenGL framework that comes bundled with the demo. If you are only interested in the video encoder, you can safely ignore this section. If you want to run your app on the web browser with the framework, this is the section for you. The design choice of framework is based on lowest technologies mainstream WebGL and web browser environment can work with. Since WebGL is a safe subset based on OpenGL 2.0 ES in such a way that WebGL calls can be translated to OpenGL 2.0 calls with little effort. Without coincidence, this is also the maximum OpenGL version the framework can support.

Change Your URL

Change your IP address/domain/port accordingly in SRC_FOLDER in the Program.cpp. SRC_FOLDER is used to download your assets from. Switch to http if you are not using https.

#ifdef __EMSCRIPTEN__
	#define SRC_FOLDER "https://localhost:44319/"
#else
	#define SRC_FOLDER ".\\"
#endif

Recompile the Code as asm.js

Recompile the code for new URL to take effect, if you are on Windows 10 and are comfortable tinkering with Bash commands, enable Windows Subsystem for Linux and install Ubuntu from MS Store and install Emscripten. Run GNU make.

In the Makefile, change to -s WASM=0

make all

Recompile the Code as Webassembly

Remove or change to -s WASM=1 because default option is compiling to Webassembly when WASM is not specified. The reason I did not do so, because IIS do not recognise the wasm mime type added to it.

No Assimp Support

Assimp is not supported simply because there is no Emscripten port for Assimp. Instead, Tiny OBJ Loader is used to load simple 3D model in Wavefront OBJ format. OBJ file extensions ends in *.obj and *.mtl and I have modified the library to load in *.obj.txt and *.mtl.txt because I want to avoid adding new mime types in the web server.

Downloading Your Assets

For drawing anything on screen, derive your class from DrawableSceneComponent and **must** call DownloadFiles() in the constructor to download your assets from relative subfolders because asm.js must have all assets downloaded before the main rendering loop is run. DownloadFiles calls OpenFile on the files directly on a desktop app. If the code is compiled with EMSCRIPTEN macro defined, it will first download files and then open.

No Downloading of Shader

The shader code is inline with C++11 raw string literals to save downloading effort.

Reference Book

Thinking Within the Box

No, your eyes aren’t playing tricks on you. You have not misread it. The blog title is not a typo! It is, in fact, intentional. I assure you this is not a click bait. It is not thinking outside the box but thinking within the box, per se. These two approaches couldn’t be more different, the former is looking outside in versus the latter’s looking inside out. Today’s blogpost, we will discuss of a case of Thinking Within the Box at work in computer science, particularly Java language design compared to C/C++ language and how it applies and will work in today’s business world and then we attempt to apply this methodology to tackle ongoing fare evasion problem in Metro system in France. Without delay, let’s get started.

What do I mean by thinking within the box? In its simplest term, it is contemplating within the problem constraints and using its circumstances to turn the problem around on its head to attack itself! The 1st example that quickly came to my mind, is vaccination. Below is an excerpt from Live Science Journal with explanation on how vaccination work.

A healthy individual can produce millions of antibodies a day, fighting infection so efficiently that people never even know they were exposed to an antigen. … Unfortunately, the first time the body faces a particular invader, it can take several days to ramp up this antibody response. For really nasty antigens like the measles virus or whooping cough bacteria, a few days is too long. The infection can spread and kill the person before the immune system can fight back.

That’s where vaccines come in. According to the Children’s Hospital of Philadelphia Vaccine Education Center, vaccines are made of dead or weakened antigens. They can’t cause an infection, but the immune system still sees them as an enemy and produces antibodies in response. After the threat has passed, many of the antibodies will break down, but immune cells called memory cells remain in the body.

When the body encounters that antigen again, the memory cells produce antibodies fast and strike down the invader before it’s too late.

Another instance I can find in martial arts is Judo where the Judo practitioner use the oncoming force of attacker to overthrow him. In fact, in computer science, we also have a good example. Before I reveal my answer, let me give you some history and perspective on C language first. In the 70s and 80s, colleges were teaching C programming and the students had a hard time grasping C and its pointers. To be fair to students, for a C/C++ veteran like me, I, too, sometimes have difficulty knowing exactly when to dereference an address/pointer, despite having written a pointer to pointer article (which is an advanced pointer topic) in 2003. However, there is no escape from understanding pointer and memory address in CS at that time, because underneath, it is fundamentally how computer works. Coupled with confusion that pointer can point to either a variable on the stack or an allocated memory on the heap which the programmer must remember to deallocate after use, else a memory leak occurs. C language and its pointers is, no doubt to many of us, a very hard beast to subdue and tame.

Come 1995, Java, a language created by James Gosling, was unveiled to the world which mentioned no pointers at all in its specification. How did Java (or rather James Gosling) accomplish that feat? Java is a garbage collection language and has what they called reference type which is essentially opaque pointer to memory that are automatically dereferenced and deallocated during garbage collection cycle. Garbage collector keeps track of all these references, so that after the memory is collected back, memory compaction is done and the addresses kept in those affected references are updated with the new memory address. This is the reason program execution is paused during GC.

This is akin to saying, “Hey, here’s a computer language without pointers”, but in fact, it is full of pointers disguised and renamed as references.

The Nobel prize winner, American economist, Milton Friedman said in his famous paper, “The social responsibility of business is to increase its profits”, because corporations were owned by their shareholders, the only obligation of business is to maximise profits for the shareholders. Most top executives tend to achieve record-breaking profits though cost-cutting which, most of time, hurts its employees and customers and in turn affects the corporation negatively in long term.

IT centres and its operations are often labelled as cost centre (where all fats shall be trimmed) while sales department is undisputedly the profit centre. Nothing can be further from the truth. Without the IT centre, the sales department does not even have a decent product or service to sell. Small and medium IT companies often cut the essential operating cost of their IT service, causing the service unable to meet SLA, resulting customer not renewing the contract upon expiration. Even when company pay top-dollar for best sales person, profit is unlikely to be stellar.

Everybody has this horrible experience with automated hotline where you have to jump hoops to converse with a living customer support officer. My local telco came up with this idea of AI chatbot; That could be a brilliant cost-cutting move, except this chatbot isn’t exactly smart; It merely interprets and matches the asked questions to FAQ and redirects there without ability to resolve issues. I have already checked FAQ before resorting to chatbot. When any telco has good personalized service, that business shall have my money. To be honest, it takes a great leap of faith (to believe that additional business would have paid off the extra incurred cost) for any corporation to justify personalized service (for customer) where none of its competitors are currently doing.

Unless you have been living in a cave, film photography has been dead for some time and digital photography is the current trend. My friend, Marco is able to carve out a lucrative business running an analogue film processing lab, charging a premium for his service. His selling point is his photos are more authentic and his target customer are the well-to-do individuals who are willing to splurge for nostalgia. Music CD shops are closing down in droves due to rampant music piracy and easy availability of legitimate music streaming sites like Spotify and Apple Music. There are still brick and mortar music shops left, just that they are not selling CDs, but vintage vinyl records. Just because technology made the product obsolete, that doesn’t mean you cannot have a business selling them, if you know how to position your business and adapt the value proposition accordingly.

Finally, we have to come to the last part everyone has been waiting for! That is applying “thinking within the box” approach to fare dodging problem in French Metro system which has been haemorrhaging money for years. My proposed solution is actually very simple: Pass legislation to legalise fare dodging; Open 1 gate specially for them, remove its turnstiles, install CCTVs on all gates. Fare dodgers are only allowed to pass through by that free gate, no leaping over the turnstiles at other gates. It is up to the station to decide whether to hire enforcement officers to police the situation. It is perfectly fine not to hire any; that is, train station does not want to catch any cheater. No questions asked and no grudge held against the fare dodgers; they could be genuinely penniless or tourists who just got pickpocketed. But what if everyone just passes through the free gate? Then close that particular station for 3 months with reason given as insufficient funds to keep it open. Most importantly keep the train stations very clean, nobody would voluntarily pay full fares for station with urine stench.

One could argue Thinking Within the Box can be interpreted as another form of Thinking Outside the Box, in retrospect. The key takeaway, I want, for reader is: When all fails, when all the tried-and-true methods fails you, be not afraid to try opposite, go against conventional wisdom.

Note: In case the reader is puzzled why I discuss about business ideas in a CS blog about career. Knowing to code to the IT architecture isn’t necessarily enough for you to climb the corporate ladder, you have to know business side of things. If you like this blogpost and would like to read more of this stuff, click subscribe to this blog, because future career related post would not be posted to Reddit C++ Forum and C++ Enthusiasts Facebook page due to their non-C++ related nature.

Reference Material

The Garbage Collection Handbook (2011) by Richard Jones, Antony Hosking and Eliot Moss: The insights were gleaned from this book when I wanted to create my own OO scripting language.

Pull Request at GitHub

Well, they say GitHub is a social-media site for programmers. It provides a way of collobration and contribution for programmers via pull request. If you are just using GitHub to download source code, you are not using it to the fullest potential. I often have users message/email me about some small fix they did for my libraries and expect me to merge their fix manually. On the other hand, if they did a pull request, I could have it merged automatically after a code review. In this tip, I am going to show how to do a pull request on my simplebinstream library on GitHub. The source code owner and contributor are 2 accounts controlled by me.

Contributor Steps

Go to the repo you are interested to contribute. Fork the repository.

git_fork

Forking can take some time to complete if the repository is huge.

git_forking

Next, open up a command prompt and clone the repository you just forked.

D:\Github>git clone https://github.com/ZodiacAZ/simplebinstream.git
Cloning into 'simplebinstream'...
remote: Counting objects: 150, done.
remote: Compressing objects: 100% (5/5), done.
Receiving objects: 100% (150/150)   1 (delta 0), pack-reused 145Receiving objects:  99% (149/150)
ts: 100% (150/150), 36.07 KiB | 0 bytes/s, done.
Resolving deltas: 100% (97/97), done.
Checking connectivity... done.

Switch to the development branch. The development branch in our case is master. So we need not do any switching. Some repository can have other development branch name such as develop: They only merge develop into master after changes are tested and stable.

Next step is to create a new branch (called fix_endian) based on development branch. Note: it has to be a new branch. You cannot do your modification on the same branch as origin. Your new branch can have any name as long as it does not clash with any existing branch.

D:\Github>git checkout -b fix_endian

Do our modification on the new branch. Then we commit and push to our repository.

D:\Github\simplebinstream>git status
On branch fix_endian
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   TestBinStream/SimpleBinStream.h
        modified:   TestBinStream/TestBinStream.cpp

no changes added to commit (use "git add" and/or "git commit -a")

D:\Github\simplebinstream>git add TestBinStream/SimpleBinStream.h

D:\Github\simplebinstream>git add TestBinStream/TestBinStream.cpp

D:\Github\simplebinstream>git status
On branch fix_endian
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   TestBinStream/SimpleBinStream.h
        modified:   TestBinStream/TestBinStream.cpp


D:\Github\simplebinstream>git commit -m "Fix endian"
[fix_endian 0f19491] Fix endian
 2 files changed, 51 insertions(+), 19 deletions(-)

D:\Github\simplebinstream>git push origin fix_endian
Fatal: HttpRequestException encountered.
Username for 'https://github.com': ZodiacAZ
Password for 'https://ZodiacAZ@github.com':
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 796 bytes | 0 bytes/s, done.
Total 5 (delta 4), reused 0 (delta 0)
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
To https://github.com/ZodiacAZ/simplebinstream.git
 * [new branch]      fix_endian -> fix_endian

We will go to our GitHub repo webpage and switch to our branch (fix_endian) from the master branch.

git_switchbranch

The button “Compare and pull request” will appear and click that.

git_pullreq

It will bring us to this page and fill in the title and message. You can only allow to create the pull request when GitHub verifies there is no conflict. Conflict can happen if the owner already pushes some code whilst you do your modifications. If this is the case, you have to pull the latest source code and fix the conflict and commit and push the code again. Click the "Create pull request" button and your job is done. Now we can only wait patiently for the owner to review and merge our fix.

git_openpull

Project Owner Steps

The owner will see your pull request. If he clicks the "Merge pull request" button, our work is done.

git_canmerge2

Some Advice

When you want to contribute to an open source project, you will spend much time reading and understanding the source code first than doing the modification/fix. With a project with 100,000 lines of code, you can spend close to one month to understand just 1 subsystem. While reading the code, try to get a feel of its coding standards/convention used in that project. If your code does not follow the coding standards, owner is definitely going to reject your contribution.

Why would anyone want to contribute?

Assuming you are a very passionate and opinionated person, when you see something is not right with the software, you’ll want to fix it. Of course, you can file an issue on their GitHub. Sometimes the owners ignore your issue or feature request because they have other priorities or more urgent bugs to fix. To see the bug fixed in your beloved project, sometimes the best way is to roll up your sleeves and implement the change yourself.

 

Floating Point and Integer Arithmetic Benchmark

Introduction

This is not much of a tip, just a posting of benchmark result to compare integer and floating point arithmetic timing. All the integer and floating point types used in Benchmark are 64bit. Timing is based on looping 100 million times. Clarification: SmallInt and SmallDouble refers to small values (10-10000) stored in int64_t and double, not referring to the type size. Big integer and double value range from 10,000 to 1000,000, if they are any bigger, there would be overflow in 64bit integer.

Hardware Specs

  • Processor: Intel i7-6700 CPU @ 3.40GHz, 3400 Mhz, 4 Cores, 8 Logical Processors
  • RAM: 16 GB
  • Graphics Card: NVIDIA GeForce GTX 1060 6GB

CSharp x64 Benchmark

Note: x86-32 executable typically has worse integer performance than floating point (not shown here). You can build as x86-32 executable and run it to see for yourself.

Multiplication and Division Benchmark
=====================================
MulBigDouble RunTime:00:00.186
MulBigInt RunTime:00:00.157
DivBigDouble RunTime:00:00.160
DivBigInt RunTime:00:00.776
MulSmallDouble RunTime:00:00.192
MulSmallInt RunTime:00:00.191
DivSmallDouble RunTime:00:00.205
DivSmallInt RunTime:00:00.933

Addition and Subtraction Benchmark
==================================
AddBigDouble RunTime:00:00.167
AddBigInt RunTime:00:00.154
SubBigDouble RunTime:00:00.151
SubBigInt RunTime:00:00.152
AddSmallDouble RunTime:00:00.204
AddSmallInt RunTime:00:00.187
SubSmallDouble RunTime:00:00.186
SubSmallInt RunTime:00:00.218

C++ x64 Benchmark

Multiplication and Division Benchmark
=====================================
       MulBigDouble:   57ms
          MulBigInt:   49ms
       DivBigDouble:   96ms
          DivBigInt:  636ms
     MulSmallDouble:   60ms
        MulSmallInt:   68ms
     DivSmallDouble:  118ms
        DivSmallInt:  823ms

Addition and Subtraction Benchmark
==================================
       AddBigDouble:   57ms
          AddBigInt:   49ms
       SubBigDouble:   64ms
          SubBigInt:   49ms
     AddSmallDouble:   69ms
        AddSmallInt:   59ms
     SubSmallDouble:   63ms
        SubSmallInt:   59ms

Most of the time, integer performance is on par with floating point, with exception of division.

The performance of floating point arithmetic has caught up with the integer in the last 15 years. This very much removes the requirement to have our own custom fixed point type to wring last drop of performance out of processor. For those who are not familiar, fixed point is arithmetic type which is like floating point except its decimal point is fixed, does not move, hence its name. The main difference is fixed point arithmetic is executed on the integer unit, not on floating point unit. Fixed point type was relevant during the period where integer perf was crown over floating point. Source code download consists of the CSharp and C++ version of the same benchmark.

Any suggestions on how to improve the nature of benchmark or constructive criticism on what I have been doing wrong, are all welcome.

Source code is hosted at Github.

C++: Simple Permutation and Combination Parallelism

Primary Motivation

My Github repository, Boost Concurrent Permutations and Combinations on CPU, has been up since January 2017. However, download is close to none while my older single threaded next_combination has plenty of downloads. So I figured it must be the code usage is deemed too complex. Aim of this article is to show a very simple, albeit inefficient, multithreaded example to give users a guide. It should be relatively easy for user to come up with an efficient multithreaded code, once they understand the examples in this article. For each permutation and combination section, I am going to start off with a single threaded example then followed by a multi-threaded example.

Single Threaded Permutation

First I’ll show you the example on using next_permutation in single threaded scenario. The while loop display std_permuted until next_permutation returned false when std_permuted is detected to be in descending order. That is “54321”.

void usage_of_next_perm()
{
    std::string std_permuted = "12345";
    do
    {
        std::cout << std_permuted << std::endl;
    } while (std::next_permutation(std_permuted.begin(), std_permuted.end()));
}

Output is follows.

12345
12354
12435
12453
12534
12543
...
54123
54132
54213
54231
54312
54321

Multi-Threaded Permutation

Next, I’ll show you on how to finding permutation in multithreaded scenario. There is a total of 120 different permutations in collection of 5 elements. We’ll use 4 threads in my example, meaning each thread will find 120/4=30 permutations. To tell you the truth, each thread will find 30-1=29 permutations, because the 1st one is already computed by find_perm_by_idx in the main thread and passed to the thread. To keep the discussion simple, we pretend each thread computes 30 permutations. 1st thread finds 0th to 29th permutations, while 2nd thread starts from 30th to 59th permutations, and 3rd thread begins from 60th to 89 and so on. From 2nd thread and 3rd thread to start from 30th and 60th permutation respectively, we need to make use of find_perm_by_idx to find 30th and 60th permutation given their index, we’ll find the rest with next_permutation. Techincally, it is possible to use find_perm_by_idx to find all permutations by giving indices from 0 to 119 but it is simply not fleasible to do so, because find_perm_by_idx is much much slower than next_permutation. As a word of caution, index_to_find must be of type that is large enough to store the factorial(n). For instance, you are just finding the 1st 10 permutations, index_to_find type can store 0 to 10 but if it cannot store the factorial(n), find_perm_by_idx will not generate the permutation correctly. This applies to find_comb_by_idx as well. If your largest integer type is not large enough, you can consider using Boost Multiprecision Integer. And vector_type can be any collection type that supports push_back() and size() methods, like std::string. You can see I use std::string in my example.

I join my theads before starting the next thread because all_results is shared with all threads and it is not guarded with a lock, therefore not thread-safe. It is only meant to be a simple example on how to find_perm_by_idx. You can see I call the the slow find_perm_by_idx in the main thread. It is your job to try figure out on how to call find_perm_by_idx in the worker threads.

template<typename int_type, typename vector_type>
vector_type find_perm_by_idx(int_type index_to_find,
	vector_type& original_vector);

void usage_of_perm_by_idx()
{
    uint64_t index_to_find = 0;
    std::string original_text = "12345";
    std::string std_permuted = "12345";
    std::vector all_results;
    size_t repeat_times = 30;
    for (size_t i=0; i<4; ++i)
    {
        std::string permuted = concurrent_perm::find_perm_by_idx(index_to_find, original_text);

        std::thread th([permuted, &all_results, repeat_times] () mutable {

            // do your work on the permuted result, instead of pushing to vector
            all_results.push_back(permuted); 
            for (size_t j=1; j<repeat_times; ++j)
            {
                std::next_permutation(permuted.begin(), permuted.end());
                // do your work on the permuted result, instead of pushing to vector
                all_results.push_back(permuted); 
            }
        });
        th.join();

        index_to_find += repeat_times;
    }
}

I skip the output here, I have verified they are the same as generated by single threaded next_permutation.

Single Threaded Combination

Next, we have come to next_combination section. To compute the total combination count, we use compute_total_comb. total must be of type that is large enough to store the factorial(n).

void usage_of_next_comb()
{
    std::string original_text = "123456";
    std::string std_combined = "123";
    uint64_t total = 0;
    if(concurrent_comb::compute_total_comb(original_text.size(), std_combined.size(), total))
    {
        std::cout << std_combined << std::endl;
        for (uint64_t i = 1; i < total; ++i)
        {
            stdcomb::next_combination(original_text.begin(), original_text.end(), std_combined.begin(), std_combined.end());
            std::cout << std_combined << std::endl;
        }
    }
}

Output is follows. If your collection is sorted like original_text, all generated combination will be sorted as well, as you can see below.

123
124
125
126
134
135
136
145
146
156
234
235
236
245
246
256
345
346
356
456

Multi-Threaded Combination

index_to_find must be of type that is large enough to store the factorial(n) and vector_type can be any collection type that supports push_back() and size() methods, like std::string. Similar to find_perm_by_idx mentioned above, it is possible to find all combinations with find_comb_by_idx but it is much slower compared to next_combination, so we resort to next_combination to find the rest of combinations. There are total of 20 combinations, so each of the 4 thread will find 20/4=5 combinations. Actually, each thread will find 5-1=4 combination, because the 1st one is already computed by find_comb_by_idx in the main thread and passed to the thread.

template<typename int_type, typename vector_type>
vector_type find_comb_by_idx(const uint32_t subset,
	int_type index_to_find,
	vector_type& original_vector);

void usage_of_comb_by_idx()
{
    uint64_t index_to_find = 0;
    std::string original_text = "123456";
    std::string std_combined = "123";
    std::vector all_results;
    size_t repeat_times = 5;
    for (size_t i = 0; i < 4; ++i)
    {
        std::string combined = concurrent_comb::find_comb_by_idx(std_combined.size(), index_to_find, original_text);

        std::thread th([original_text, combined, &all_results, repeat_times] () mutable {

            // do your work on the combined result, instead of pushing to vector
            all_results.push_back(combined);
            for (size_t j = 1; j < repeat_times; ++j)
            {
                stdcomb::next_combination(original_text.begin(), original_text.end(), combined.begin(), combined.end());
                // do your work on the combined result, instead of pushing to vector
                all_results.push_back(combined);
            }
        });
        th.join();

        index_to_find += repeat_times;
    }
}

Output is skipped here, afterall I have verified they are the same as generated by single threaded next_combination.

 

Improved Next Combination with State

To speed up next_combination, we can store the state of generated combination so that it does not have to find which current combination elements correspond to the bigger collection. One way to do it is to store this state inside a class but this violates the design of STL algorithms. Another way to do it, is to pass this state to next_combination at every call. The declaration of next_combination and next_combination_with_state are listed below so that we can compare them side by side. The 1st one is current next_combination and 2nd one is overloaded one with 5th parameter as equality predicate and the 3rd is the new next_combination_with_state which also has 4 parameters as 1st next_combination but the last 2 parameters are of BidItIt type which is iterator whose value type is BidIt iterator. In other words, BidItIt is iterator of iterator! By storing BidIt iterator of n_begin and n_end itself, I could save some time without finding the range of r_begin and r_end that corresponds to n_begin and n_end. Reader may question if iterator of iterator violates the principle of STL algorithms as well. Well, it is one of the not-so-bad tradeoff, compared to alternative I just mentioned, we have to make for performance. We can expect performance gain of 4X to 10X, depending on how big n and r collection.

// Plain old next_combination
template <class BidIt>
inline bool next_combination(
    BidIt n_begin, 
    BidIt n_end,
    BidIt r_begin, 
    BidIt r_end);

// Plain old next_combination with equality predicate
template <class BidIt, class Prediate>
inline bool next_combination(
    BidIt n_begin, 
    BidIt n_end,
    BidIt r_begin, 
    BidIt r_end,
    Prediate Equal);

// New next_combination_with_state
// its state is stored in r_beginIT and r_endIT
// which iterators of BidIt iterators
template <class BidIt, class BidItIt>
inline bool next_combination_with_state(
    BidIt n_begin, 
    BidIt n_end,
    BidItIt r_beginIT, 
    BidItIt r_endIT);
	
// New next_combination_with_state does not have 
// version with equality predicate because it compare 
// with BidIt iterators, not elements which BidIt 
// iterator pointed to.

next_combination_with_state does not have version with equality predicate because it compare with BidIt iterators, not elements themselves.

I reproduce example of next_combination usage so that we can compare with the one of next_combination_with_state.

#include<iostream>
#include<vector>
#include<string>
#include"combination.h"

using namespace std;
using namespace stdcomb;

// for use with next_combination examples!
template<class BidIt>
void display(BidIt begin,BidIt end)
{
  for (BidIt it=begin;it!=end;++it)
      cout<<*it<<" ";
  cout<<endl;
}

//test next_combination() with iterators
int main()
{
  vector<int> ca;
  ca.push_back (1);
  ca.push_back (2);
  ca.push_back (3);
  ca.push_back (4);
  ca.push_back (5);
  ca.push_back (6);
  vector<int> cb;
  cb.push_back (1);
  cb.push_back (2);
  cb.push_back (3);
  cb.push_back (4);
   
  do
  {
    display(cb.begin(),cb.end());
  }
  while(next_combination(ca.begin (),ca.end (),cb.begin (),cb.end()) );
  
  cout<<"Complete!"<<endl;
  
  return 0;
}

The next_combination_with_state example is below. Instead of constructing a vector of integer for smaller collection, we construct cbit, a vector out of ca iterators. We also have a new display2 function to display the result, the main difference, it iterator is dereferenced twice, instead of once in display. Note, we cannot dereference first before passing to display because cbit.end() cannot be dereferenced as it is the one past the last valid iterator. Previously, I tried putting cbit.begin() and cbit.end() result back to cb, an already allocated vector. I got back the same performance, back to square one. Only use next_combination_with_state when you are comfortable with having your result as iterators of iterators. Since cbit stores ca iterators, ca must be kept alive while you still have cbit, else you got dangling iterators. next_combination_with_state requires C++17 because it uses reverse_iterator.

#include<iostream>
#include<vector>
#include<string>
#include"combination.h"

using namespace std;
using namespace stdcomb;

template<class BidItIt>
void display2(BidItIt begin, BidItIt end)
{
    for (BidItIt it = begin; it != end; ++it)
        cout << **it << " ";
    cout << endl;
}

//test next_combination_with_state() with iterators
int main()
{
  vector<int> ca;
  ca.push_back (1);
  ca.push_back (2);
  ca.push_back (3);
  ca.push_back (4);
  ca.push_back (5);
  ca.push_back (6);
  
  vector cbit;
  vector::iterator it = ca.begin();
  for(; it!=ca.end()-2; ++it)
    cbit.push_back(it);

  do
  {
    display2(cbit.begin(), cbit.end());
  }
  while(next_combination_with_state(ca.begin (),ca.end (),cbit.begin (),cbit.end () ) );
  
  cout<<"Complete!"<<endl;
  
  return 0;
}

The source code is hosted at Boost Concurrent Permutations and Combinations on CPU