C++14/20 Heterogeneous Lookup Benchmark

Download the benchmark code on GitHub.

C++14 introduced transparent (heterogeneous) lookup for ordered containers, which enables const char* and string_view lookup on map/set objects without instantiating a std::string. C++20 introduces the same lookup for unordered_map/unordered_set. We are officially in 2020, but at the time of writing the C++20 Standard is yet to be ratified by the C++ committee. However, the Visual C++ team has already implemented some of the C++20 library features. Thanks, Billy O’Neal and the VC++ team! To enable the latest C++ standard in VC++ and compile this benchmark, go to the VC++ General property page and choose std:c++latest from the C++ Language Standard dropdown, as shown below.

property_page_cpp20

To help you understand the sv suffix in the later code: whenever sv is appended to a string literal, we are telling the compiler to create a string_view from the literal. What is a string_view? It is a non-owning view of a character buffer, stored as a pointer plus a length, and the buffer is not required to be null-terminated. Care must be taken when accessing a string_view: the original std::string or character buffer it points to must still be alive.

"xxx" // char string literal
"xxx"s // create std::string from literal
"xxx"sv // create std::string_view from literal

In a normal map::find(), a const char* string literal actually creates a temporary string while string_view results in a compilation error.

std::map<std::string, size_t> mapNormal;
mapNormal.find("Susan"); // memory alloc
mapNormal.find("Susan"sv); // compile error with string_view

Let’s see a transparent map. The name "transparent lookup" makes people instantly think of opacity, which has nothing at all to do with this feature. Geez, talk about bad naming! A transparent map is created by giving it std::less<> as the 3rd template argument, which otherwise defaults to std::less<Key>. Now no string instantiation is required when const char* or string_view is used. If you want to delve deeper into how this magic works, read Bartlomiej Filipek’s excellent blog post.

std::map<std::string, size_t, std::less<>> mapTrans;
mapTrans.find("Terry"); // no memory alloc but strlen() is used
mapTrans.find("Terry"sv); // no memory alloc
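Put into a compilable form, the transparent map above could be used as in the sketch below. The alias TransMap and the helper lookup() are names made up for this illustration; they are not part of the benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> (no template argument) exposes is_transparent, which enables the
// find() overloads that accept any type comparable with std::string.
using TransMap = std::map<std::string, std::size_t, std::less<>>;

// Hypothetical helper: returns the mapped value, or `fallback` if the key is absent.
std::size_t lookup(const TransMap& m, std::string_view key, std::size_t fallback = 0)
{
    const auto it = m.find(key);  // key is compared directly, no std::string is built
    return it != m.cend() ? it->second : fallback;
}
```

Calling lookup(m, "Terry") converts the literal to a string_view (one strlen()), and the lookup itself does not allocate.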

As with a normal map::find(), a normal unordered_map::find() creates a temporary string from a const char* string literal, while string_view results in a compilation error.

std::unordered_map<std::string, size_t> unordmapNormal;
unordmapNormal.find("Susan"); // memory alloc
unordmapNormal.find("Susan"sv); // compile error with string_view

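To create a transparent unordered_map, a hash functor with a transparent_key_equal type and () operator overloads for const char* and string_view has to be defined and passed to unordered_map as the 3rd template argument. This is the interface from the C++20 draft at the time of writing; the final C++20 standard instead looks for an is_transparent member type in both the hash functor and the equality predicate.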

struct string_hash {
    using transparent_key_equal = std::equal_to<>;  // Pred to use
    using hash_type = std::hash<std::string_view>;  // just a helper local type
    size_t operator()(std::string_view txt) const { return hash_type{}(txt); }
    size_t operator()(const std::string& txt) const { return hash_type{}(txt); }
    size_t operator()(const char* txt) const { return hash_type{}(txt); }
};

std::unordered_map<std::string, size_t, string_hash> unordmapTrans;
unordmapTrans.find("Terry"); // no memory alloc but strlen() is used
unordmapTrans.find("Terry"sv); // no memory alloc
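The string_hash above follows the draft wording (transparent_key_equal). The final C++20 standard instead requires an is_transparent member type in both the hash functor and the equality predicate. Here is a minimal sketch of that final form; sv_hash, TransUnordMap and has_key() are illustrative names of my own, and the #else branch is a fallback I added for standard libraries that do not yet ship the feature.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

// Final C++20 interface: the hasher and the equality predicate both expose
// is_transparent (std::equal_to<> already does).
struct sv_hash
{
    using is_transparent = void;
    std::size_t operator()(std::string_view txt) const
    {
        return std::hash<std::string_view>{}(txt);
    }
};

using TransUnordMap =
    std::unordered_map<std::string, std::size_t, sv_hash, std::equal_to<>>;

// Hypothetical helper: true if `key` is present in the map.
bool has_key(const TransUnordMap& m, std::string_view key)
{
#if defined(__cpp_lib_generic_unordered_lookup)
    return m.find(key) != m.end();               // heterogeneous lookup, no temporary
#else
    return m.find(std::string(key)) != m.end();  // fallback: builds a temporary string
#endif
}
```

A single string_view overload is enough here because both const char* and std::string convert to string_view implicitly.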

The benchmark code is shown below. Ignore the total and grandtotal variables. Their only purpose is to prevent the compiler from optimizing away the for loops since they are not doing any useful work. Initially, I was using the associative container’s [] operator for search but it turns out that [] does not accept string_view, so find() is used.

void benchmark(
    const std::vector<std::string>& vec_shortstr, 
    const std::vector<std::string_view>& vec_shortstrview, 
    const std::map<std::string, size_t>& mapNormal,
    const std::map<std::string, size_t, std::less<>>& mapTrans,
    const std::unordered_map<std::string, size_t>& unordmapNormal,
    const std::unordered_map<std::string, size_t, string_hash>& unordmapTrans)
{
    size_t grandtotal = 0;

    size_t total = 0;

    timer stopwatch;
    total = 0;
    stopwatch.start("Normal Map with string");
    for (size_t i = 0; i < MAX_LOOP; ++i)
    {
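        for (size_t j = 0; j < vec_shortstr.size(); ++j)
        {
            const auto it = mapNormal.find(vec_shortstr[j]);
            if (it != mapNormal.cend())
                total += it->second;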
        }
    }
    grandtotal += total;
    stopwatch.stop();

    total = 0;
    stopwatch.start("Normal Map with char*");
    for (size_t i = 0; i < MAX_LOOP; ++i)
    {
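        for (size_t j = 0; j < vec_shortstr.size(); ++j)
        {
            const auto it = mapNormal.find(vec_shortstr[j].c_str());
            if (it != mapNormal.cend())
                total += it->second;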
        }
    }
    grandtotal += total;
    stopwatch.stop();

    total = 0;
    stopwatch.start("Trans Map with char*");
    for (size_t i = 0; i < MAX_LOOP; ++i)
    {
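        for (size_t j = 0; j < vec_shortstr.size(); ++j)
        {
            const auto it = mapTrans.find(vec_shortstr[j].c_str());
            if (it != mapTrans.cend())
                total += it->second;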
        }
    }
    grandtotal += total;
    stopwatch.stop();
    
    total = 0;
    stopwatch.start("Trans Map with string_view");
    for (size_t i = 0; i < MAX_LOOP; ++i)
    {
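        for (size_t j = 0; j < vec_shortstrview.size(); ++j)
        {
            const auto it = mapTrans.find(vec_shortstrview[j]);
            if (it != mapTrans.cend())
                total += it->second;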
        }
    }
    grandtotal += total;
    stopwatch.stop();
    
    total = 0;
    stopwatch.start("Normal Unord Map with string");
    for (size_t i = 0; i < MAX_LOOP; ++i)
    {
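        for (size_t j = 0; j < vec_shortstr.size(); ++j)
        {
            const auto it = unordmapNormal.find(vec_shortstr[j]);
            if (it != unordmapNormal.cend())
                total += it->second;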
        }
    }
    grandtotal += total;
    stopwatch.stop();

    total = 0;
    stopwatch.start("Normal Unord Map with char*");
    for (size_t i = 0; i < MAX_LOOP; ++i)
    {
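        for (size_t j = 0; j < vec_shortstr.size(); ++j)
        {
            const auto it = unordmapNormal.find(vec_shortstr[j].c_str());
            if (it != unordmapNormal.cend())
                total += it->second;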
        }
    }
    grandtotal += total;
    stopwatch.stop();

    total = 0;
    stopwatch.start("Trans Unord Map with char*");
    for (size_t i = 0; i < MAX_LOOP; ++i)
    {
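        for (size_t j = 0; j < vec_shortstr.size(); ++j)
        {
            const auto it = unordmapTrans.find(vec_shortstr[j].c_str());
            if (it != unordmapTrans.cend())
                total += it->second;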
        }
    }
    grandtotal += total;
    stopwatch.stop();

    total = 0;
    stopwatch.start("Trans Unord Map with string_view");
    for (size_t i = 0; i < MAX_LOOP; ++i)
    {
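        for (size_t j = 0; j < vec_shortstrview.size(); ++j)
        {
            const auto it = unordmapTrans.find(vec_shortstrview[j]);
            if (it != unordmapTrans.cend())
                total += it->second;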
        }
    }
    grandtotal += total;

    stopwatch.stop();

    std::cout << "grandtotal:" << grandtotal << " <--- Ignore this\n" << std::endl;

}

First, we run the benchmark with short text. The benchmark is built in Release x64 mode with the /Ox optimization. Short text should be faster since std::string is implemented with Short String Optimization (SSO), meaning std::string has a small internal buffer which is used, instead of allocating on the heap, whenever the text is short enough to fit in it. string_view has about the same performance as std::string, which looks about right, while const char* performs worse in the transparent map than in the normal map. That should be a bug, which I shall report to Microsoft.

Short String Benchmark
======================
          Normal Map with string timing:  652ms
           Normal Map with char* timing:  723ms
            Trans Map with char* timing:  829ms
      Trans Map with string_view timing:  608ms

This is the short text benchmark result with unordered_map. The result is as expected.

    Normal Unord Map with string timing:  206ms
     Normal Unord Map with char* timing:  506ms
      Trans Unord Map with char* timing:  296ms
Trans Unord Map with string_view timing:  211ms

This is the long text benchmark result with map. Each long text is at least 30 chars so that it exceeds the SSO buffer and forces a memory allocation.

Long String Benchmark
=====================
          Normal Map with string timing:  589ms
           Normal Map with char* timing: 2292ms
            Trans Map with char* timing: 2442ms
      Trans Map with string_view timing:  602ms

The long text benchmark result with unordered_map.

    Normal Unord Map with string timing:  738ms
     Normal Unord Map with char* timing: 2382ms
      Trans Unord Map with char* timing: 1506ms
Trans Unord Map with string_view timing:  762ms

What about GCC and Clang? I do not have access to the latest C++ compilers on Linux. You are welcome to build and run the single cpp benchmark with GCC and Clang.

If you have been reading my articles over the years, you will notice they are usually performance-focused. In the new decade, I am going to shift my focus to code safety and developer life. If you are interested in these 2 topics, stay tuned!


Succinct Guide to Floating-Point Format For C++ and C# Programmers

Floating-Point Dissector is available for download at GitHub.

Introduction

The IEEE 754 standard for floating-point format has been ubiquitous in hardware and software since it was established in 1985. Programmers have been using floating-point indiscriminately for real-number calculations. However, not many can claim to understand the floating-point format and its properties, so more than a few misunderstandings have arisen. In this new year and new decade, I throw down the gauntlet as a challenge to myself to write a concise and easy-to-digest guide that explains floating-point once and for all. Common pitfalls shall be covered.

Floating-Point Dissector

Floating-Point Dissector (with C++ and C# versions) is written for programmers to examine the bits of a floating-point number after setting its value. Classes are provided only for single (32-bit) and double (64-bit) precision. The main reason is that half (16-bit), long double (80-bit) and quad (128-bit) precision are not directly supported on the Microsoft platform. The code for the single and double precision dissectors is identical; the only difference is the constants defined at the top of each class. C++ experts should see this code as ripe for templates and type traits, but I prefer to keep things simple as there are only 2 classes, for single and double. See the C# code below on how to view the single-precision representation of the value 1 with the Display() method. The C++ code is not shown because it is similar.

FloatDissector f = new FloatDissector(1.0f);
f.Display();

The output from Display() is shown below. Adjusted Exponent is the true value. Adjusted Exponent = Raw Exponent - Exponent Bias. Exponent Bias in single precision case is 127. More on this later when we get to the floating-point format section.

Sign:Positive, Adjusted Exponent:0, Raw Exponent:01111111, 
Mantissa:00000000000000000000000, Float Point Value:1

There are also methods to set Not-a-Number (NaN) and Infinity (INF). There are 2 ways to set a NaN. The first way is to pluck the NaN from the double type and give it to the constructor. The second way is the DoubleDissector.SetNaN() method, whose first parameter is the sign bit and whose second is the mantissa, which can be any value greater than zero, because a zero mantissa with an all-ones exponent denotes infinity, not a NaN. This indirectly implies there is more than 1 NaN value.

DoubleDissector d = new DoubleDissector(double.NaN); // set NaN

d.SetNaN(DoubleDissector.Sign.Positive, 2); // NaN can be set this way as well.

Console.WriteLine("IsNaN:{0}", d.IsNaN());
Console.WriteLine("IsZero:{0}", d.IsZero());
Console.WriteLine("IsInfinity:{0}", d.IsInfinity());
Console.WriteLine("IsPositiveInfinity:{0}", d.IsPositiveInfinity());
Console.WriteLine("IsNegativeInfinity:{0}", d.IsNegativeInfinity());
Console.WriteLine("IsNaN:{0}", double.IsNaN(d.GetFloatPoint()));

By the end of the guide, the reader should be confident of implementing their own dissector.

Common Misconceptions

It is common knowledge among programmers that the floating-point format is unable to represent transcendental numbers like PI and E, whose fractional part continues indefinitely. It is not unusual to see a very high precision PI literal defined in C/C++ code snippets, where the original poster takes a leap of faith that the compiler will make its best effort to quantize the literal into IEEE 754 format and round when needed. Taking 32-bit float as an example, the intention goes as planned for PI (see diagram below).

surprising_behaviour

A surprise pops up for E. As it turns out, there are only a finite number of bits in a floating-point value with which to quantize a floating-point literal. Reality sets in when even a simple number like 0.1 cannot be represented perfectly in single precision.

float a = 0.1f;
Console.WriteLine("a: {0:G9}", a);

Output is as follows.

a: 0.100000001
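The same check can be done in C++. This is a small sketch with a helper name (to_g9) of my own choosing, using the printf-style %.9g format as the counterpart of the C# G9 specifier.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a float with 9 significant digits, enough to round-trip any
// single-precision value.
std::string to_g9(float value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%.9g", static_cast<double>(value));
    return buffer;
}
```

to_g9(0.1f) yields the same 0.100000001 as the C# output, while a value such as 0.25f, which is a power of two, comes back exactly as 0.25.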

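Mathematical identities (associative and distributive) do not hold in general.

Associative rule does not apply.

x + (y + z) != (x + y) + z

Commutative rule still applies to a single addition or multiplication, because IEEE 754 rounds each operation correctly regardless of operand order. It is the reordering of a longer chain of operations, which relies on associativity, that changes the result.

x * y == y * x

Distributive rule does not apply.

x * y - x * z != x * (y - z)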

In fact, with floating-point arithmetic the order of computation matters, and the result can change from run to run when the order changes. A programmer can be tripped up by this if he relies on the result being consistent. During C++17 standardization, the parallelized version of std::accumulate() was given a new name, std::reduce(), so as to let people know this parallelized function can return a different result.
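To see the order dependence for yourself, here is a tiny C++ sketch. The function names are mine, and it assumes the default IEEE 754 double arithmetic, i.e. no -ffast-math style flags.

```cpp
#include <cassert>

// Sums three doubles in the two possible groupings.
double sum_left(double x, double y, double z)  { return (x + y) + z; }
double sum_right(double x, double y, double z) { return x + (y + z); }
```

With 0.1, 0.2 and 0.3 the two groupings differ in the last bit. With 1e20, -1e20 and 1.0 the difference is dramatic: the left grouping gives 1 while the right grouping gives 0, because 1.0 is lost when added to -1e20.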

Division and multiplication by the reciprocal cannot be interchanged. The constant 0.1 is itself rounded when it is stored, so these 2 operations could yield slightly different results.

x / 10.0 != x * 0.1

Floating-Point Conversion to Integer

Floating-point conversion to integer can be done with an int cast. The caveat is that the cast truncates the value towards zero, which may not be desired.

float f = 2.9998f;
int num = (int)(f); // num is 2

To fix this problem, add 0.5 before casting.

float f = 2.9998f;
int num = (int)(f + 0.5f); // num is 3 now

To cater for negative value,

if(f < 0.0f)
    num = (int)(f - 0.5f);
else
    num = (int)(f + 0.5f);

A better solution for C# is to use the static Math.Round method.

int a = (int)Math.Round(4.5555);      // a == 5

Math.Round is very convenient to use. But it has to be noted that its rounding follows IEEE Standard 754, section 4. This means that if the number being rounded is halfway between two numbers, the Round operation always rounds to the even number.

int x = (int)Math.Round(1.5);      // x == 2
int y = (int)Math.Round(2.5);      // y == 2

In C++11, std::round and std::rint are available for rounding. std::rint operates according to the rounding mode set by calls to std::fesetround. If the current rounding mode is…

  • FE_DOWNWARD, then std::rint is equivalent to std::floor.
  • FE_UPWARD, then std::rint is equivalent to std::ceil.
  • FE_TOWARDZERO, then std::rint is equivalent to std::trunc.
  • FE_TONEAREST, then std::rint differs from std::round in that halfway cases are rounded to even rather than away from zero.

C++ equivalent of above C# code is below, using std::rint.

int x = (int)std::rint(1.5);      // x == 2
int y = (int)std::rint(2.5);      // y == 2
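The three behaviours discussed in this section can be put side by side in C++. This is a sketch with wrapper names of my own; std::lround and std::lrint are the integer-returning siblings of std::round and std::rint, and the default FE_TONEAREST rounding mode is assumed.

```cpp
#include <cassert>
#include <cmath>

// A plain cast truncates toward zero.
int cast_to_int(float f)        { return static_cast<int>(f); }
// Halfway cases are rounded away from zero.
long round_half_away(double d)  { return std::lround(d); }
// Halfway cases are rounded to the even neighbour (default rounding mode).
long round_half_even(double d)  { return std::lrint(d); }
```

For 2.5, round_half_away() gives 3 while round_half_even() gives 2; for 2.9998f, cast_to_int() gives 2.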

32-Bit Floating-Point Format

In this guide, we focus on single-precision (32-bit) float. Everything covered applies to the other precisions as well; the figures can easily be adjusted with the table found at the end of this section.

A single-precision format has one sign bit, an 8-bit exponent and a 23-bit mantissa. The value is negative when the sign bit is set. To get the true exponent, subtract the exponent bias (127) from the raw exponent. In the raw exponent, 0 (all zeroes) and 255 (all ones) are reserved, so the valid range is 1 to 254. The mantissa of a normalized number (as opposed to a subnormal one) has a hidden leading bit which is not stored, so the mantissa in fact has 24 bits of precision. How do we get zero when the hidden bit is always 1? Zero is a special number, encoded with both the raw exponent and the mantissa set to zero.

32bit_format

This is the formula for converting the components into a floating-point value.

formula
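For a normalized single-precision number, the components combine as (-1)^sign × (1 + mantissa / 2^23) × 2^(raw exponent - 127). A minimal C++ sketch of splitting a float into those components and putting it back together could look like this. The names FloatParts, dissect and assemble are my own for this example; they are not the Floating-Point Dissector API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

struct FloatParts
{
    bool          negative;     // sign bit
    std::uint32_t rawExponent;  // 8 bits, biased by 127
    std::uint32_t mantissa;     // 23 stored bits, hidden bit not included
};

FloatParts dissect(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bit pattern
    return { (bits >> 31) != 0, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}

// Rebuilds a normalized value from its parts.
// Zero, subnormal, infinity and NaN (raw exponent 0 or 255) are not handled here.
float assemble(const FloatParts& p)
{
    const float significand = 1.0f + static_cast<float>(p.mantissa) / 8388608.0f;
    const float magnitude =
        std::ldexp(significand, static_cast<int>(p.rawExponent) - 127);
    return p.negative ? -magnitude : magnitude;
}
```

dissect(0.25f) reports a raw exponent of 125 (01111101) and a zero mantissa, matching the dissector output in this section.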

Let’s see what the dissector displays for the exponents of 0.25, 0.5 and 1.0. Do note that the hidden mantissa bit is on.

FloatDissector f = new FloatDissector(0.25f);
f.Display();
f.SetFloatPoint(0.5f);
f.Display();
f.SetFloatPoint(1.0f);
f.Display();

Output of the above code is shown below. Take note the exponent is in radix 2, not 10: M * 2^e. So 1 * 2^-2 would give 0.25. And 1 * 2^-1 = 0.5 and 1 * 2^0 = 1.

Sign:Positive, Adjusted Exponent:-2, Raw Exponent:01111101, 
Mantissa:00000000000000000000000, Float Point Value:0.25

Sign:Positive, Adjusted Exponent:-1, Raw Exponent:01111110, 
Mantissa:00000000000000000000000, Float Point Value:0.5

Sign:Positive, Adjusted Exponent:0, Raw Exponent:01111111, 
Mantissa:00000000000000000000000, Float Point Value:1

As the number gets larger, the number of mantissa values per exponent remains constant (2^23 = 8,388,608) while each exponent covers a range twice as wide, so we can say the precision actually becomes lesser. The number of floats from 0.0

  • …to 0.1 = 1,036,831,949
  • …to 0.2 =     8,388,608
  • …to 0.4 =     8,388,608
  • …to 0.8 =     8,388,608
  • …to 1.6 =     8,388,608
  • …to 3.2 =     8,388,608

Between 0.0 and 0.1 there are more floats because that range spans many exponents and also includes the subnormals. Subnormals are small numbers very close to zero.
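These counts can be reproduced with a few lines of C++. The sketch below (floats_between is a name I made up) relies on the fact that, for non-negative floats, the bit patterns are ordered the same way as unsigned integers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Number of representable floats in [low, high), assuming 0 <= low <= high.
// Subtracting the bit patterns counts the values in between.
std::uint32_t floats_between(float low, float high)
{
    assert(low >= 0.0f && low <= high);
    std::uint32_t lowBits = 0, highBits = 0;
    std::memcpy(&lowBits, &low, sizeof lowBits);
    std::memcpy(&highBits, &high, sizeof highBits);
    return highBits - lowBits;
}
```

floats_between(0.0f, 0.1f) returns 1,036,831,949 and floats_between(0.1f, 0.2f) returns 8,388,608, the same figures as in the list above.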

Now let’s play around with the mantissa bits to see how they work. As we proceed from MSB to LSB, each bit is worth half of its preceding bit. If the MSB, the hidden bit, has the value of 1, the next bit is 1/2 and the 3rd bit is 1/4. If we set those 2 bits to one, we should get 1 + 0.5 + 0.25 = 1.75

1_75format

FloatDissector f = new FloatDissector(1.0f);
f.SetMantissa(0x3 << 21); // shift binary 11 to the left 21 times.
f.Display();

The output is as we expected.

Sign:Positive, Adjusted Exponent:0, Raw Exponent:01111111, 
Mantissa:11000000000000000000000, Float Point Value:1.75

However, the hidden bit value is not always 1. It depends on the exponent. When the exponent is -1, it is 1 * 2^-1 = 0.5!

0_875format

If we set those 2 bits again, we should get 0.5 + 0.25 + 0.125 = 0.875

FloatDissector f = new FloatDissector(0.5f);
f.SetMantissa(0x3 << 21); // shift binary 11 to the left 21 times.
f.Display();

We are right again!

Sign:Positive, Adjusted Exponent:-1, Raw Exponent:01111110, 
Mantissa:11000000000000000000000, Float Point Value:0.875

The size of fields in each floating-point type is shown together with their significant digit precision.

fp_chart

Zero

IEEE 754 floating-point has 2 distinct zeroes: positive and negative. A positive zero is indicated by all the bits, including the sign bit, being unset. That is why, in a structure of Plain Old Data (POD) integer and float members, it is possible to use memset() to quickly zero-initialize every member.

Subnormal (Underflow)

Subnormals, also known as denormalized numbers, are the IEEE 754 mechanism for dealing with very small numbers close to zero. A subnormal is indicated by an all-zero exponent and a non-zero mantissa. With a normal number, the mantissa has an implied leading 1-bit. But with a subnormal, the leading 1-bit is not assumed, so a subnormal has 1 bit less mantissa precision (at most 23 bits). With the raw exponent set to zero, subtracting the bias would give 0 - 127 = -127, but IEEE 754 fixes the true exponent of subnormals at -126, the same as the smallest normal number, so that the subnormals continue on smoothly below it. The smallest positive subnormal number has a mantissa of all zeroes except for the Least Significant Bit (LSB), which is approximately 1.4012984643 × 10^-45, while the smallest positive normal number is approximately 1.1754943508 × 10^-38.
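In C++, both limits are available from the limits header, and std::fpclassify can tell a subnormal apart. A short sketch follows; is_subnormal and the two constant names are mine, and it assumes flush-to-zero is not enabled.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True when the value is a subnormal according to the standard library.
bool is_subnormal(float value)
{
    return std::fpclassify(value) == FP_SUBNORMAL;
}

// Smallest positive subnormal (2^-149) and smallest positive normal (2^-126) float.
constexpr float kSmallestSubnormal = std::numeric_limits<float>::denorm_min();
constexpr float kSmallestNormal    = std::numeric_limits<float>::min();
```

Halving kSmallestNormal lands in the subnormal range, which is easy to confirm with is_subnormal().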

Subnormal are much slower on Intel processor

If you search on Stack Overflow for questions on subnormals, you will no doubt come across this thread, which shows subnormal calculations are much slower on Intel architecture processors because normalized number arithmetic is implemented in hardware while subnormal arithmetic has to take the slow microcode path. Unless you are dealing with very small, close-to-zero numbers, you can safely ignore this performance issue. If you are in the other camp, using C++ and SIMD, you have 2 options.

  • Use the -ffast-math compiler flag to sacrifice correctness to gain more FP performance. Not recommended.
  • Set CPU flags (Stopgap measure)
    • Flush-to-zero : treat subnormal outputs as 0
    • Denormals-to-zero : treat subnormal inputs as 0

There are 2 ways to do the 2nd option. The first is to set the control and status register (CSR) via the Intel intrinsic _mm_setcsr(), but this method requires you to remember the hexadecimal number (0x8040) to OR into the existing CSR bits.

_mm_setcsr(_mm_getcsr() | 0x8040);

A better method is to call _MM_SET_FLUSH_ZERO_MODE and _MM_SET_DENORMALS_ZERO_MODE() with _MM_FLUSH_ZERO_ON and _MM_DENORMALS_ZERO_ON respectively.

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

What about x87 Floating-Point Unit(FPU)? Unfortunately, there is no x87 FPU functionality to treat subnormal as zero.

Infinity (Overflow)

1/0

When any non-zero number is divided by zero, you get infinity, not a divide-by-zero exception. That exception is for integer divide-by-zero. Infinity is indicated by an all-ones exponent and an all-zeroes mantissa. As with zero, you have both positive and negative infinity.

You can assign infinity with .NET’s Single.PositiveInfinity or Single.NegativeInfinity field and Double.PositiveInfinity or Double.NegativeInfinity field in C#, with std::numeric_limits<T>::infinity() defined in the limits header in C++11, or with the dissector’s SetInfinity() method.

To test for infinity, if you are using C#, call Single.IsInfinity() and Double.IsInfinity(). If you are on C++11, call std::isinf().

Rules for infinity correspond to common sense. See below.

6 + infinity = infinity
6 / infinity = 0
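A quick C++ sketch of these rules follows. The divide() helper is mine; it exists only so that the division happens on function arguments rather than on literals, and the sketch assumes an IEEE 754 (IEC 559) float, which the static_assert checks.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "expects IEEE 754 floats");

// Plain floating-point division; a zero denominator yields an infinity
// instead of throwing.
float divide(float numerator, float denominator)
{
    return numerator / denominator;
}
```

divide(1.0f, 0.0f) compares equal to std::numeric_limits<float>::infinity(), and divide(6.0f, infinity) is 0.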

Not a Number

0/0

Not a Number (NaN) is what you get when you divide zero by zero or take the square root of -1. There are 2 types of NaN, quiet and signalling, but we are not covering them here. You can assign NaN with .NET’s Single.NaN and Double.NaN fields in C#, with std::nan() defined in the cmath header in C++11, or with the dissector’s SetNaN() method.

Never test for NaN by testing for equality with another NaN. When you compare 2 NaN, the equality test shall return false, even when the 2 NaN in question are exactly the same in binary. If you are using C#, call Single.IsNaN() and Double.IsNaN(). If you are on C++11, call std::isnan().

Rules for NaN also correspond to common sense. See below.

NaN + anything = NaN
NaN * anything = NaN
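The points about NaN can be checked in C++ with the sketch below. from_bits, equal and the two constants are names of my own; the bit patterns are two quiet NaNs that differ only in the lowest mantissa bit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Builds a float from a raw 32-bit pattern.
float from_bits(std::uint32_t bits)
{
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Ordinary equality test, wrapped in a function for the checks below.
bool equal(float a, float b) { return a == b; }

// Two different quiet NaN encodings: all-ones exponent, non-zero mantissa.
const float kNaN1 = from_bits(0x7FC00000u);
const float kNaN2 = from_bits(0x7FC00001u);
```

equal(kNaN1, kNaN1) is false even though both operands are bit-for-bit identical, which is why std::isnan() is the test to use.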

Summary Chart

The chart summarizes the exponent and mantissa combinations that constitute the special floating-point values (like Subnormal, NaN and Infinity).

diff_combo

Floating-Point Exception

C# as a language does not expose floating-point exceptions. Do not even think of circumventing this limitation with P/Invoke to change the FPU status, because the .NET Framework and 3rd party libraries rely on the FPU being in a certain state.

On Microsoft C++, it is possible to catch floating-point exceptions via Structured Exception Handling (SEH) after enabling them through _controlfp_s. By default, all FP exceptions are disabled. SEH-style exceptions can be translated to C++ typed exceptions through _set_se_translator. For those who shun proprietary technology and write cross-platform Standard C++ code, read on. As noted on the cppreference page on the C++11 floating-point environment, floating-point exceptions are not related to C++ exceptions. When a floating-point operation raises a floating-point exception, the status of the floating-point environment changes, which can be tested with std::fetestexcept, but the execution of a C++ program on most implementations continues uninterrupted.
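Here is a minimal sketch of the Standard C++ route with std::fetestexcept. The function name is mine, and the volatile variables are my way of keeping the compiler from folding the division at compile time; compilers differ in how strictly they honour the floating-point environment, so treat this as an illustration rather than a guarantee.

```cpp
#include <cassert>
#include <cfenv>

// Returns true if numerator / denominator raised the FE_DIVBYZERO flag.
bool raises_div_by_zero(double numerator, double denominator)
{
    std::feclearexcept(FE_ALL_EXCEPT);   // start from a clean status
    volatile double n = numerator;       // volatile forces a run-time division
    volatile double d = denominator;
    volatile double result = n / d;
    (void)result;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Note that the program keeps running after the division; the only trace of the exception is the status flag.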

Equality Test

Never test floating-point for equality. Guess the output below!

float a = 1.3f;
float b = 1.4f;
float total = 2.7f;
float sum = a + b;
if (sum == total)
    Console.WriteLine("Same");
else
{
    Console.WriteLine("Different");
    Console.WriteLine("{0:G9} != {1:G9}", sum, total);
}
Console.WriteLine("a: {0:G9}", a);
Console.WriteLine("b: {0:G9}", b);

They are not equal because single precision cannot represent 1.3 and 1.4 perfectly, as the output below shows. Although changing the type to double precision solves the problem in this case, all floating-point types inherently have this imprecision, therefore they should not be used as keys in a dictionary (or a std::map in the C++ case). The only time an equality test poses no problem is when the numbers involved are not the result of any computation but are directly assigned from literal constants.

Different
2.69999981 != 2.70000005
a: 1.29999995
b: 1.39999998

Always test floating-point for nearness. Choose an epsilon value a bit larger than your largest error. There is no one-size-fits-all epsilon to use for all cases. For example, if the epsilon is set to 0.0000001 and the 2 numbers are actually 0.000000008 and 0.000000005, the test will always pass, so clearly the chosen epsilon is wrong! A nearness test does not have the transitive property that an equality test has. For instance, a==b and b==c does not necessarily mean a==c. See the example below.

float epsilon = 0.001f;
float a = 1.5000f;
float b = 1.5007f;
float c = 1.5014f;
if (Math.Abs(a - b) < epsilon)
    Console.WriteLine("a==b");
else
    Console.WriteLine("a!=b");

if (Math.Abs(b - c) < epsilon)
    Console.WriteLine("b==c");
else
    Console.WriteLine("b!=c");

if (Math.Abs(a - c) < epsilon)
    Console.WriteLine("a==c");
else
    Console.WriteLine("a!=c");

The output is as follows.

a==b
b==c
a!=c

For C++, do not use std::numeric_limits<T>::epsilon(). For C#, do not use Single.Epsilon and Double.Epsilon. Why? In C# these constants are the smallest positive value the type can hold, and in C++ the constant is the gap between 1.0 and the next representable value. Most of the time your largest error is larger than either, so your nearness test would fail most of the time with those standard epsilons.

Whenever possible, turn == into a >= or <= comparison; that would be more ideal.
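One common way to write a nearness test in C++ is to combine a relative tolerance with an absolute one. The sketch below is only an illustration: nearly_equal and its default tolerances are assumptions of mine, and, as said above, the right values depend on your own largest error.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b differ by at most `relTol` relative to
// the larger magnitude, or by at most `absTol` (which matters near zero).
bool nearly_equal(float a, float b, float relTol = 1e-5f, float absTol = 1e-8f)
{
    const float diff  = std::fabs(a - b);
    const float scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(absTol, relTol * scale);
}
```

With these defaults, 1.3f + 1.4f is near 2.7f. The tiny pair 0.000000008 and 0.000000005 is also reported as near, because the absolute tolerance is larger than both numbers; passing 0 as absTol makes the test tell them apart.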

Other Types

Fixed-Point

When the exponent is fixed at a certain negative number, there is no need to store it. The position of the decimal point is fixed, thus the name: fixed-point as opposed to floating-point. In the same application, it is not unusual to implement different fixed-point types to accommodate the precision requirements of certain types of computation. In such an application, converting a floating-point result, from a trigonometric or arithmetic function such as sqrt(), to fixed-point is a common practice, as it is not feasible to reimplement all the floating-point math in the standard library in fixed-point. Because of its limited range, fixed-point is not made for calculations dealing with very large quantities, like the total number of particles in the universe, or with very small numbers, like those involved in quantum mechanics.

fixed_point
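As an illustration, here is a bare-bones 16.16 fixed-point type in C++, i.e. the exponent is fixed at -16. The type and function names are made up for this sketch, and overflow handling is left out.

```cpp
#include <cassert>
#include <cstdint>

// Q16.16: 16 integer bits and 16 fractional bits stored in a 32-bit integer.
struct Fixed16
{
    std::int32_t raw;
};

// Converts from floating-point, rounding to the nearest representable step.
constexpr Fixed16 from_double(double value)
{
    return { static_cast<std::int32_t>(value * 65536.0 + (value < 0 ? -0.5 : 0.5)) };
}

constexpr double to_double(Fixed16 f) { return f.raw / 65536.0; }

// Addition works directly on the raw integers.
constexpr Fixed16 add(Fixed16 a, Fixed16 b) { return { a.raw + b.raw }; }

// The 64-bit intermediate holds the full product before the extra 16
// fractional bits are dropped.
constexpr Fixed16 multiply(Fixed16 a, Fixed16 b)
{
    return { static_cast<std::int32_t>(
        (static_cast<std::int64_t>(a.raw) * b.raw) / 65536) };
}
```

Multiplying 1.5 by 2.25 this way gives exactly 3.375, since all three values fit in 16 fractional bits.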

Decimal

In C#, the Decimal type in the .NET Base Class Library (BCL) can represent radix-10 numbers perfectly (because its exponent is in radix 10), which makes it most suitable for storing currency in financial calculations without loss of precision. However, it must be noted that calculations with decimal are orders of magnitude slower than floating-point because it is a 128-bit type with no hardware support.

cpp_float

In the C++ world, Boost is the de facto peer-reviewed, high quality library collection to go for whenever a C++ task needs to get done. The Boost.Multiprecision library can be used for computations requiring precision exceeding that of the standard built-in types such as float, double and long double. For extended-precision calculations, Boost.Multiprecision supplies a template data type called cpp_dec_float. The number of decimal digits of precision is fixed at compile time via a template parameter.

Posits

Posits are a new approach to floating-point that promises better performance and accuracy with fewer bits. Posits could cut the number of bits required by half, leaving more cache and memory for other information. In posits, there is only one zero and one NaN value, leaving no ambiguity. And the format is meant to be a drop-in replacement for IEEE floating-point with no modification needed to the code. At the time of writing, there is no commodity hardware implementation of posits, but that could change in the near future. It is an interesting development to keep our eyes on.

IEEE 754 and Posits Compared

ieee754_and_posit_comparisons


Lee Algorithm Mazesolver in Direct2D

screenshot

Introduction

In 1999, my team won first prize in the Obstacle Avoidance Robot (OAR) category of the Singapore Robotics Games (SRG). That was 20 years ago. I do not have a photo of the OAR, but it looks very much like a micromouse. The micromouse maze is constructed with walls while the OAR maze is constructed with obstacles. Both the micromouse and OAR teams in my school made use of the Lee algorithm to solve the maze. The Lee algorithm was used to route single-layer printed circuit boards (PCB) in the 1960s and was a historical footnote until Google used it as a technical interview test. I must point out that the algorithm and heat map described on the Wikipedia page seem fishy based on my rusty memory of the algorithm; by right, the heat map should change around obstacles. The Lee algorithm is a very simple algorithm where the robot travels from a higher value cell to a lower value cell. Each cell value is the minimum value of its 4 neighboring cells plus one. Diagonal cell values are not taken into consideration.

The green cell is the starting cell while the yellow one is the destination. The program cannot see the entire map of obstacles. It updates its map with obstacles as it explores. When the start button is clicked, there are 2 runs: the first is maze solving mode and the second is optimized path mode. The optimized path might not be the shortest path because, as I said, the program does not have the entire map; the optimized path is based on the cells it has explored before.

The original simulator, written in Turbo C++ 20 years ago, ran in DOS. This time, I wrote a Windows version with MFC and Direct2D. Direct2D turned out to be surprisingly intuitive to use. The first half of the article explains the Lee algorithm while the second half covers the Direct2D code.

Lee Algorithm for Mazesolving

Choose a maximum number, m, to represent an obstacle, say 0xFF; the default value for an unoccupied cell is m-1. Why 0xFF? The maximum number depends on the cell dimensions of your maze (or grid, if talking about PCB routing). A maze of 16×16 cells has 16*16=256 (0x100) cells. In other words, choose a maximum value which is impossible to exceed in any circumstance. The choice of 0xFF is due to an embedded hardware limitation: the 2-dimensional array is made up of unsigned char because of limited RAM on the motherboard.

  1. Place the robot in cell[0][15].
  2. Set the value of the destination cell[12][3] to 0.
  3. Before travelling to the next cell, do the steps below to determine it.
  4. Initialize all empty cells, i.e., those not occupied by an obstacle, to m-1.
  5. Set the changed variable to false.
  6. Do the following for every empty cell, i.e., skip the obstacle-occupied cells. Calculate the value of the current cell by finding the minimum value of the neighboring cells (up, down, left, right) and adding one. If the new value is not equal to the current value, set changed to true. Set the new value as the cell value.
  7. If changed is true, repeat steps 5 to 7, else do step 8.
  8. Travel to the neighbouring cell with the minimum value.
  9. If the current cell is the destination cell[12][3], stop, else repeat from step 3 onwards.

There are some default behaviours not documented by the pseudo-code above. If the front cell and the left or right cell have the same minimum value, favor travelling straight instead of making a turn. If the left and right cells have the same minimum value, favor making a right turn. There is no right or wrong with these defaults. It depends on the maze. A right turn could lead to a dead end while a left turn could lead to the destination. Whatever your choice of default behaviour, stick with it.

This is a partial weightage map (the full map is too big to be shown) in which all cells other than the destination are initialized to 0xFE while the destination is set to 0x00.

FE|FE|FE|FE|FE|FE|FE
FE|FE|FE|FE|FE|FE|FE
FE|FE|FE|FE|FE|FE|FE
FE|FE|FE|00|FE|FE|FE
FE|FE|FE|FE|FE|FE|FE
FE|FE|FE|FE|FE|FE|FE
FE|FE|FE|FE|FE|FE|FE

After the map is initialized, the weightage of each cell is calculated by finding the least value among its neighbouring cells and adding one to it. The (partial) weightage map looks like this.

06|05|04|03|04|05|06
05|04|03|02|03|04|05
04|03|02|01|02|03|04
03|02|01|00|01|02|03
04|03|02|01|02|03|04
05|04|03|02|03|04|05
06|05|04|03|04|05|06
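The weightage calculation can be sketched in C++ as below, on the same 7×7 partial map. This is not the mazesolver's actual source: the names are mine, and I made two assumptions that the pseudo-code leaves open, namely that the destination cell is never recalculated and that a value is capped at 0xFE so that an unreachable cell never turns into an obstacle (0xFF).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSize = 7;                  // partial 7x7 map, as in the text
constexpr std::uint8_t kObstacle = 0xFF;  // m
constexpr std::uint8_t kEmpty    = 0xFE;  // m - 1
using Grid = std::array<std::array<std::uint8_t, kSize>, kSize>;

// All cells empty except the destination, which is 0.
Grid make_grid(int destRow, int destCol)
{
    Grid grid{};
    for (auto& row : grid) row.fill(kEmpty);
    grid[destRow][destCol] = 0;
    return grid;
}

// Sweeps the map repeatedly until no cell value changes.
void compute_weights(Grid& grid, int destRow, int destCol)
{
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (int r = 0; r < kSize; ++r)
        {
            for (int c = 0; c < kSize; ++c)
            {
                if (grid[r][c] == kObstacle || (r == destRow && c == destCol))
                    continue;                 // skip obstacles and the destination
                std::uint8_t least = kEmpty;  // obstacles (0xFF) never win the minimum
                if (r > 0)         least = std::min(least, grid[r - 1][c]);
                if (r < kSize - 1) least = std::min(least, grid[r + 1][c]);
                if (c > 0)         least = std::min(least, grid[r][c - 1]);
                if (c < kSize - 1) least = std::min(least, grid[r][c + 1]);
                // Assumption: cap at kEmpty so the value never reaches 0xFF.
                const std::uint8_t value =
                    least < kEmpty ? static_cast<std::uint8_t>(least + 1) : kEmpty;
                if (value != grid[r][c])
                {
                    grid[r][c] = value;
                    changed = true;
                }
            }
        }
    }
}
```

Running compute_weights() on make_grid(3, 3) reproduces the map above, with 06 in the corners. If the destination is completely walled off by obstacles, every other cell simply stays at 0xFE.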

If there is any maze which this mazesolver cannot solve, save it and send it to me or message me in this forum. Of course, if you cordon off the yellow destination cell with obstacles completely, it will keep running forever until you stop it. Below is the weightage map when the destination cell is surrounded by obstacles (represented by 0xFF).

FE|FE|FE|FE|FE|FE|FE
FE|FE|FE|FE|FE|FE|FE
FE|FE|FF|FF|FF|FE|FE
FE|FE|FF|00|FF|FE|FE
FE|FE|FF|FF|FF|FE|FE
FE|FE|FE|FE|FE|FE|FE
FE|FE|FE|FE|FE|FE|FE

Important Notes

This article differs from the classic Lee algorithm literature in that the start and target points are interchanged. The classic paper sets the start point, not the target point, to zero. This is why I use the name destination cell rather than target cell, to avoid confusion. In PCB routing, there is no real notion of a start and a target point; the algorithm is just trying to connect 2 endpoints.

Direct2D Code

In this section, we'll examine the Direct2D code used. Direct2D, DirectWrite and Windows Imaging Component (WIC) are designed from the ground up to replace the ageing GDI+. Their method calls are just as intuitive; my only complaint is that the class names are sometimes very verbose, a side effect of being descriptive. Direct2D and DirectWrite are based on the Component Object Model (COM), but CoInitialize() and CoUninitialize() calls are not required. This exception does not apply to WIC, which needs the COM runtime. In this mazesolver application, we only use Direct2D because text and image rendering are not required. To use Direct2D, we'll need the ComPtr smart pointer from the Windows Runtime C++ Template Library (WRL). It is similar to ATL's CComPtr (note the extra C prefix): both keep a COM object alive by incrementing its reference count through the IUnknown interface's AddRef(), and the object is destroyed once Release() decrements the reference count to zero. Below is the list of headers the application needs.

#include <wrl.h>              // Windows Runtime Library for its ComPtr
#include <d2d1.h>             // Direct2D v1 header
#include <stdexcept>          // C++ exception class

#include "ComException.h"
#include "FactorySingleton.h" // Singleton to get Direct2D factories

Next, we link against the Direct2D import library.

// Direct2D import libraries
#pragma comment(lib, "d2d1")

FactorySingleton restricts the application to a single ID2D1Factory instance.

using namespace D2D1;
using namespace Microsoft::WRL;

class FactorySingleton
{
public:
    static ComPtr<ID2D1Factory> GetGraphicsFactory();
private:
    static ComPtr<ID2D1Factory> m_GraphicsFactory;
};

FactorySingleton::GetGraphicsFactory() creates the factory if m_GraphicsFactory is nullptr.

#include "FactorySingleton.h"

ComPtr<ID2D1Factory> FactorySingleton::m_GraphicsFactory;

ComPtr<ID2D1Factory> FactorySingleton::GetGraphicsFactory()
{
    if (!m_GraphicsFactory)
    {
        D2D1_FACTORY_OPTIONS fo = {};

#if defined(DEBUG) || defined(_DEBUG) // Visual C++ debug builds define _DEBUG
        fo.debugLevel = D2D1_DEBUG_LEVEL_INFORMATION;
#endif

        HR(D2D1CreateFactory(D2D1_FACTORY_TYPE_SINGLE_THREADED,
            fo,
            m_GraphicsFactory.GetAddressOf()));

    }
    return m_GraphicsFactory;
}

Coming up next is the declaration of the ComException and TestResult.

#include <Windows.h>
#include <string>

void TestResult(const char* expression, HRESULT hr);

struct ComException
{
    HRESULT const hr;
    std::string where;
    // Initializers are listed in member declaration order (hr, then where)
    ComException(const char* expression, HRESULT const value) : hr(value), where(expression) {}
    std::string Message() const;
};

Next, the bodies of TestResult() and ComException::Message() are defined. TestResult() is used by the HR macro to check for a failed HRESULT.

#include "ComException.h"
#include <cstdio>  // sprintf_s

void TestResult(const char* expression, HRESULT hr)
{
    if (FAILED(hr)) throw ComException(expression, hr);
}

std::string ComException::Message() const
{
    char buf[800] = {};
    // HRESULT is a long: cast it to match the %lx format specifier
    sprintf_s(buf, "ComException hr:%lx, Where:%s",
        static_cast<unsigned long>(hr), where.c_str());
    return buf;
}

// Macro to test HRESULT and throw ComException for failed HRESULT.
// No trailing semicolon: the caller supplies it, as in HR(...);
#define HR(expression) TestResult(#expression, (expression))
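To see what the # stringizing operator contributes, here is a portable sketch of the same pattern that compiles without Windows.h. Everything in it is a stand-in invented for illustration: Status replaces HRESULT, a negative value plays the role of FAILED(hr), and StatusError, CheckStatus(), CHECK, OpenDevice() and CreateBrush() are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for HRESULT: a negative value means failure.
using Status = long;

// Carries the failed expression's source text plus the status code.
struct StatusError : std::runtime_error
{
    Status status;
    StatusError(const char* expression, Status value)
        : std::runtime_error(expression), status(value) {}
};

inline void CheckStatus(const char* expression, Status status)
{
    if (status < 0) throw StatusError(expression, status);
}

// #expression turns the macro argument's source text into a string literal,
// so the exception can report which call failed.
#define CHECK(expression) CheckStatus(#expression, (expression))

// Hypothetical operations: one succeeds, one fails.
inline Status OpenDevice()  { return 0; }
inline Status CreateBrush() { return -5; }
```

CHECK(OpenDevice()); passes silently, while CHECK(CreateBrush()); throws a StatusError whose what() is the text "CreateBrush()" and whose status is -5.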

Next, we declare the render target (RT) member objects and the brushes as ComPtr smart pointers. To draw something in Direct2D, we need at least one render target. But why 2 render targets? m_BmpTarget is an offscreen RT backed by a bitmap. If drawing were done directly on m_DCTarget, the user could see incomplete or partial drawing. Therefore, everything is drawn on the offscreen m_BmpTarget, whose bitmap is then drawn by m_DCTarget with DrawBitmap().

ComPtr<ID2D1DCRenderTarget> m_DCTarget;
ComPtr<ID2D1BitmapRenderTarget> m_BmpTarget;

ComPtr<ID2D1SolidColorBrush> m_BmpBlackBrush;
ComPtr<ID2D1SolidColorBrush> m_BmpWhiteBrush;
ComPtr<ID2D1SolidColorBrush> m_BmpYellowBrush;
ComPtr<ID2D1SolidColorBrush> m_BmpGreenBrush;
ComPtr<ID2D1SolidColorBrush> m_BmpRedBrush;

Direct2D device-dependent resources such as brushes are tied to the RT that created them. You cannot use a resource created from one RT on another RT. CreateDeviceResources() is a function that creates a brush. ReleaseAndGetAddressOf() releases the COM object currently held by the ComPtr and returns the address of its encapsulated raw pointer, which is passed as the pointer-to-pointer argument to receive the newly created brush.

void CreateDeviceResources(
    ID2D1RenderTarget* target, 
    ComPtr<ID2D1SolidColorBrush>& brush, 
    D2D1_COLOR_F color)
{
    HR(target->CreateSolidColorBrush(color,
        brush.ReleaseAndGetAddressOf()));
}

Next, the creation of m_DCTarget and m_BmpTarget is shown. We need to specify the pixelFormat as DXGI_FORMAT_B8G8R8A8_UNORM in order to be compatible with the format of GDI's device context (DC), since m_DCTarget renders to a DC. Get() returns the raw RT pointer encapsulated by the ComPtr smart pointer.

// Create a pixel format and initialize its format
// and alphaMode fields.
// https://docs.microsoft.com/en-gb/windows/win32/direct2d/
// supported-pixel-formats-and-alpha-modes#supported-formats-for-id2d1devicecontext
D2D1_PIXEL_FORMAT pixelFormat = D2D1::PixelFormat(
    DXGI_FORMAT_B8G8R8A8_UNORM,
    D2D1_ALPHA_MODE_PREMULTIPLIED
);

D2D1_RENDER_TARGET_PROPERTIES props = D2D1::RenderTargetProperties();
props.pixelFormat = pixelFormat;

HR(FactorySingleton::GetGraphicsFactory()->CreateDCRenderTarget(&props,
    m_DCTarget.ReleaseAndGetAddressOf()));

m_DCTarget->BindDC(pDC->GetSafeHdc(), &rect);

HR(m_DCTarget->CreateCompatibleRenderTarget(m_BmpTarget.ReleaseAndGetAddressOf()));

m_BmpTarget->SetAntialiasMode(D2D1_ANTIALIAS_MODE_PER_PRIMITIVE);
m_BmpTarget->SetTextAntialiasMode(D2D1_TEXT_ANTIALIAS_MODE_CLEARTYPE);

CreateDeviceResources(m_BmpTarget.Get(), m_BmpBlackBrush, COLOR_BLACK);
CreateDeviceResources(m_BmpTarget.Get(), m_BmpWhiteBrush, COLOR_WHITE);
CreateDeviceResources(m_BmpTarget.Get(), m_BmpYellowBrush, COLOR_YELLOW);
CreateDeviceResources(m_BmpTarget.Get(), m_BmpGreenBrush, COLOR_GREEN);
CreateDeviceResources(m_BmpTarget.Get(), m_BmpRedBrush, COLOR_RED);

In case you have been wondering how the colors for the brushes are defined, they are listed below. The D2D_COLOR_F channels are in RGBA order and each channel is a float.

D2D_COLOR_F const COLOR_WHITE = { 1.0f,  1.0f,  1.0f,  1.0f };
D2D_COLOR_F const COLOR_BLACK = { 0.0f,  0.0f,  0.0f,  1.0f };
D2D_COLOR_F const COLOR_YELLOW = { 0.99f, 0.85f, 0.0f,  1.0f };
D2D_COLOR_F const COLOR_GREEN = { 0.0f,  1.0f,  0.0f,  1.0f };
D2D_COLOR_F const COLOR_RED = { 1.0f,  0.0f,  0.0f,  1.0f };

To draw a white line, DrawLine() is called with one start point, one end point and the white brush.

D2D1_POINT_2F p0{ 0.0f, sum };
D2D1_POINT_2F p1{ 50.0f, sum };
m_BmpTarget->DrawLine(p0, p1, m_BmpWhiteBrush.Get());

To draw a white obstacle, FillRectangle() is called with a rectangle and a white brush.

auto rectTarget = RectF(0.0f, 0.0f, 10.0f, 10.0f);
m_BmpTarget->FillRectangle(&rectTarget, m_BmpWhiteBrush.Get());

The obstacle avoidance robot is represented by a green circle. To draw an ellipse as a circle, D2D1_ELLIPSE needs to be initialized with a center point and equal radiusX and radiusY values. Then pass the D2D1_ELLIPSE object to FillEllipse() with its brush.

D2D1_ELLIPSE ell;
ell.point = Point2F(5.0f, 5.0f);
ell.radiusX = 3.0f;
ell.radiusY = 3.0f;
m_BmpTarget->FillEllipse(ell, m_BmpGreenBrush.Get());

Next, we are ready to implement the OnPaint() function. If m_DCTarget is nullptr, CreateDCTarget() creates both m_DCTarget and m_BmpTarget. BeginDraw() must be called before any drawing and EndDraw() must be called after all drawing calls are completed. DrawMap() and DrawObstacles() are performed on m_BmpTarget. Then m_DCTarget is bound to the DC, which has to be done on every OnPaint() call. Between m_DCTarget's BeginDraw() and EndDraw(), m_DCTarget draws m_BmpTarget's internal bitmap. If EndDraw() fails with D2DERR_RECREATE_TARGET, m_DCTarget is reset (meaning destroyed) and Invalidate() sends a WM_PAINT message so that OnPaint() is called again with a null m_DCTarget, which is then created again.

CPaintDC dc(this);
if (!m_DCTarget)
{
    CreateDCTarget(&dc);
}

ComPtr<ID2D1Bitmap> bitmap;
m_BmpTarget->GetBitmap(bitmap.GetAddressOf());

CRect rectClient;
GetClientRect(&rectClient);

RECT rect;

rect.top = 0;
rect.bottom = rectClient.Height();
rect.left = SHIFT_LEFT;
rect.right = rectClient.Width();

m_BmpTarget->BeginDraw();
DrawMap();
DrawObstacles();
m_BmpTarget->EndDraw();

m_DCTarget->BindDC(dc.GetSafeHdc(), &rect);

m_DCTarget->BeginDraw();

m_DCTarget->DrawBitmap(bitmap.Get());

if (D2DERR_RECREATE_TARGET == m_DCTarget->EndDraw())
{
    m_DCTarget.Reset();
    Invalidate();
    return;
}

The minimum OS is Windows 7 because no Direct2D feature that requires Windows 8 or above is used. Have fun playing with the program!

The code is hosted on GitHub.

History

  • 2020-01-01: Added "Important Notes" section
  • 2019-12-22: First release

C# 7: ref returns and locals

Introduction

C# 7 introduced ref-local and ref-return functionality to allow safe direct memory access to value variables. Before C# 7, this could only be done in unsafe code, but now it is available in a safe way. Below is an example of a ref-local variable referring to the variable a. b behaves like an alias of a; note the use of the ref keyword on both sides of the b initialization!

int a = 10;
ref int b = ref a;
b = 20;
Console.WriteLine("{0}", a); // display 20

I visualize the equivalent C++ code as such. b is a C++ reference. Like a C++ reference, a C# 7.0 ref-local variable cannot be reassigned to refer to another variable after initialization (C# 7.3 later relaxed this by adding ref reassignment).

int a = 10;
int& b = a;
b = 20;
printf("%d", a); // display 20

ref-return on Value Member

For the ref-return on a property, we’ll use the types below for our example. Please note that Point is a structure, therefore a value type, as opposed to a reference type like Coordinate. We can only ref-return a field from a property of a reference type because structure methods cannot ref-return their instance fields. Anyone who needs a refresher on the difference between .NET reference and value types can refer to this useful link.

struct Point
{
    public int x;
    public int y;
}
    
class Coordinate
{
    private Point _Point;
    public Point Pt
    {
        set { this._Point = value; }
        get { return this._Point; }
    }
    public ref Point RefPt
    {
        get { return ref this._Point; }
    }
}

As we can see, the normal Pt property has a setter while the RefPt ref-return property doesn’t. The reason is that the RefPt getter directly exposes _Point for outside modification once it is ref-returned. Let’s first see how the Pt property is normally used.

Coordinate cd = new Coordinate();
Point pt = cd.Pt; // a copy
pt.x = 10;
pt.y = 20;
Console.WriteLine("{0},{1}", cd.Pt.x, cd.Pt.y); // display 0,0
cd.Pt = pt;
Console.WriteLine("{0},{1}", cd.Pt.x, cd.Pt.y); // display 10,20

Next, we’ll see how the RefPt property is normally used.

Coordinate cd = new Coordinate();
ref Point pt = ref cd.RefPt;
pt.x = 10;
pt.y = 20;
Console.WriteLine("{0},{1}", cd.Pt.x, cd.Pt.y); // display 10,20

If the ref keyword is (accidentally) omitted from both sides of the initialization, the ref-returned value is copied into the pt variable instead, so _Point remains unmodified.

Coordinate cd = new Coordinate();
Point pt = cd.RefPt;
pt.x = 10;
pt.y = 20;
Console.WriteLine("{0},{1}", cd.Pt.x, cd.Pt.y); // display 0,0
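In the same spirit as the earlier C++ comparison, the difference between binding a ref-return and copying it can be visualized in C++. This is only an analogy sketched for illustration: the member functions Pt(), SetPt() and RefPt() and the helpers moveByReference() and moveByCopy() are hypothetical, and C++ references do not carry the safety rules that C# enforces on ref-returns.

```cpp
struct Point
{
    int x = 0;
    int y = 0;
};

class Coordinate
{
public:
    Point  Pt() const { return point_; }          // value return: a copy
    void   SetPt(const Point& pt) { point_ = pt; }
    Point& RefPt() { return point_; }             // reference return: an alias
private:
    Point point_;
};

// Binding to Point& aliases the member, like "ref Point pt = ref cd.RefPt;"
inline void moveByReference(Coordinate& cd)
{
    Point& pt = cd.RefPt();
    pt.x = 10;
    pt.y = 20;
}

// Omitting '&' makes a copy, like "Point pt = cd.RefPt;"
inline void moveByCopy(Coordinate& cd)
{
    Point pt = cd.RefPt();
    pt.x = 10;
    pt.y = 20; // only the local copy changes
}
```

After moveByCopy(cd), cd.Pt() is still 0,0; after moveByReference(cd), it is 10,20.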

Assuming the _Point member is changed to public accessibility so that you can access and modify it directly, the code will look like this:

Coordinate cd = new Coordinate();
cd._Point.x = 10;
cd._Point.y = 20;
Console.WriteLine("{0},{1}", cd.Pt.x, cd.Pt.y); // display 10,20

Benchmark

We’ll benchmark the ref-return against value property access and direct public member access. Below is the benchmark code, which loops over 10 million Coordinate objects.

Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();

for(int i=0; i < list.Count; ++i)
{
    Point pt = list[i].Pt;
    pt.x = 10;
    pt.y = 20;
    list[i].Pt = pt;
}

stopWatch.Stop();
DisplayElapseTime("Value Return RunTime:", stopWatch.Elapsed);

Stopwatch stopWatch2 = new Stopwatch();
stopWatch2.Start();

for (int i = 0; i < list.Count; ++i)
{
    ref Point pt = ref list[i].RefPt;
    pt.x = 10;
    pt.y = 20;
}

stopWatch2.Stop();
DisplayElapseTime("Ref Return RunTime:", stopWatch2.Elapsed);

Stopwatch stopWatch3 = new Stopwatch();
stopWatch3.Start();

for(int i=0; i < list.Count; ++i)
{
    list[i]._Point.x = 10;
    list[i]._Point.y = 20;
}

stopWatch3.Stop();
DisplayElapseTime("Public Member access RunTime:", stopWatch3.Elapsed);

The benchmark result is below. The ref-return loop takes roughly half the time of the traditional value return loop (19 ms versus 40 ms) and about 24% less time than public member access (19 ms versus 25 ms)! The difference in timing is more pronounced when more fields are added to Coordinate class.

Value Return RunTime:00:00.040
Ref Return RunTime:00:00.019
Public Member access RunTime:00:00.025

Conclusion

After seeing ref-return in action, I must stress that the rules for ref-return functionality must be followed.

  • The result of a regular method return value cannot be assigned to a ref local variable. However, ref-return values can be implicitly copied into non-ref variables.
  • You cannot return a ref of a local variable because the actual memory must persist beyond the local scope to avoid invalid memory access.
  • In C# 7.0, a ref variable cannot be reassigned to a new memory location after initialization.
  • Struct methods cannot ref-return instance fields.
  • This functionality cannot be used with async methods.

This feature is most useful for the situations I described below:

  • Modifying fields in a property-exposed struct
  • Directly accessing an array location
  • Repeated access to the same memory location

Code examples are hosted on GitHub.

Reference

  • Writing High-Performance .NET Code, 2nd Edition by Ben Watson. A must-read book for .NET code efficiency aficionados. I am not affiliated with Amazon, meaning I get no kickback when you buy from that link, so feel free to browse that webpage. Please note that there are some typos in the ref-return code examples in the book, so if you copy the code and it fails to compile, be sure to check the official C# guide.

History

  • 16th December, 2019: First release

 

Bring Your C++ OpenGL Code to the Web

Introduction

Prior to reading this article, if you have not set up Emscripten, you have to read this article: Bring Your C++ Code to the Web. Let me be clear: this is not a tutorial on OpenGL! It can take up to 100 pages of an OpenGL textbook to get to displaying a triangle, so it is a stretch to cover the basics of OpenGL in this short article. It only covers the changes needed to modify your OpenGL ES 2.0 application to run on the web. OpenGL ES 2.0 is a subset of OpenGL 2.0 and corresponds to WebGL 1.0. Every function in OpenGL ES 2.0 can be easily mapped to its WebGL equivalent, which makes porting to Emscripten a walk in the park.

Render function

In every OpenGL application, there is a render or draw function that is called repeatedly in a main loop. In Emscripten, we have to set up the render function to be called by JavaScript’s requestAnimationFrame() by passing it to emscripten_set_main_loop. The second argument is the fps; setting it to 0 lets the browser's requestAnimationFrame() pace the calls. The 3rd argument is simulate_infinite_loop; setting it to zero makes emscripten_set_main_loop return immediately, so the code after the call continues to run.

emscripten_set_main_loop(render, 0, 0);
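If the same source also has to build for desktop, the call can be guarded with the __EMSCRIPTEN__ macro. Below is a minimal sketch of that arrangement, not the demo's actual code: g_framesRendered, runMainLoop() and its desktopFrames parameter are hypothetical and exist only to keep the desktop branch short and bounded.

```cpp
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Hypothetical frame counter so the sketch has something observable to do.
static int g_framesRendered = 0;

// One iteration of the main loop; real code would issue OpenGL calls here.
void render()
{
    ++g_framesRendered;
}

// desktopFrames only bounds the desktop loop of this sketch; a real
// application would loop until the window is closed.
void runMainLoop(int desktopFrames)
{
#ifdef __EMSCRIPTEN__
    // fps = 0: let requestAnimationFrame() pace the calls.
    // simulate_infinite_loop = 0: return to the caller immediately.
    emscripten_set_main_loop(render, 0, 0);
    (void)desktopFrames;
#else
    for (int i = 0; i < desktopFrames; ++i)
        render();
#endif
}
```

On desktop, runMainLoop(3) calls render() three times and returns; in the browser, it returns right away and the browser keeps calling render() afterwards.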

Setting up OpenGL with SDL2

This is standard SDL 2 code to set up the window and OpenGL 2.0. Next, we set up VSync. GLEW is next. For those who are not familiar with it, GLEW stands for OpenGL Extension Wrangler Library, a cross-platform C/C++ library that helps in loading OpenGL functions. In the final setup step, we initialize the vertices and shaders in initGL().

//The window we'll be rendering to
SDL_Window* gWindow = NULL;

//OpenGL context
SDL_GLContext gContext;

//Initialize SDL
if (SDL_Init(SDL_INIT_VIDEO) < 0)
{
    printf("SDL could not initialize! SDL Error: %s\n", SDL_GetError());
    success = false;
}
else
{
    //Use OpenGL 2.0
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 2);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);

    //Create window
    gWindow = SDL_CreateWindow("SDL Tutorial", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, SCREEN_WIDTH, SCREEN_HEIGHT, SDL_WINDOW_OPENGL | SDL_WINDOW_SHOWN);
    if (gWindow == NULL)
    {
        printf("Window could not be created! SDL Error: %s\n", SDL_GetError());
        success = false;
    }
    else
    {
        //Create context
        gContext = SDL_GL_CreateContext(gWindow);
        if (gContext == NULL)
        {
            printf("OpenGL context could not be created! SDL Error: %s\n", SDL_GetError());
            success = false;
        }
        else
        {
            //Use Vsync
            if (SDL_GL_SetSwapInterval(1) < 0)
            {
                printf("Warning: Unable to set VSync! SDL Error: %s\n", SDL_GetError());
            }

            GLenum err = glewInit();
            if (GLEW_OK != err)
            {
                printf("GLEW init failed: %s!\n", glewGetErrorString(err));
                success = false;
            }

            //Initialize OpenGL
            if (!initGL(userData))
            {
                printf("Unable to initialize OpenGL!\n");
                success = false;
            }
        }
    }
}

Setting up OpenGL with Emscripten

The above SDL 2 setup code used to work unmodified for Emscripten. I do not know which commit broke the SDL 2 implementation on Emscripten. Now you have to use the code below. In emscripten_set_canvas_element_size, we specify the HTML5 canvas selector and its width and height. The majorVersion and minorVersion should be 1 and 0 because we are targeting WebGL 1.0. Next, we create the WebGL context and make it the current one. Like the above SDL 2 code, we initialize GLEW and OpenGL objects like vertices and shaders. We guard this code with the __EMSCRIPTEN__ macro so that it is compiled only in the Emscripten build.

emscripten_set_canvas_element_size("#canvas", SCREEN_WIDTH, SCREEN_HEIGHT);
EmscriptenWebGLContextAttributes attr;
emscripten_webgl_init_context_attributes(&attr);
attr.alpha = attr.depth = attr.stencil = attr.antialias = 
    attr.preserveDrawingBuffer = attr.failIfMajorPerformanceCaveat = 0;
attr.enableExtensionsByDefault = 1;
attr.premultipliedAlpha = 0;
attr.majorVersion = 1;
attr.minorVersion = 0;
EMSCRIPTEN_WEBGL_CONTEXT_HANDLE ctx = emscripten_webgl_create_context("#canvas", &attr);
emscripten_webgl_make_context_current(ctx);

GLenum err = glewInit();
if (GLEW_OK != err)
{
    printf("GLEW init failed: %s!\n", glewGetErrorString(err));
    success = false;
}

//Initialize OpenGL
if (!initGL(userData))
{
    printf("Unable to initialize OpenGL!\n");
    success = false;
}

OpenGL Shader Precision

In OpenGL ES 2.0, we have to specify the floating point precision before the shader code begins. highp, mediump and lowp are the available options. mediump is a nice tradeoff between precision and performance. For me, lowp is too low in precision to display the image correctly. Insert the code below as the 1st line in your vertex and fragment shaders only when compiling for Emscripten. Remember to remove the line when compiling for desktop.

"precision mediump float;     \n"

Inline your Shader Code

I recommend keeping shader code inline rather than storing it in files so that in Emscripten, you need not download the shaders to load them. There are 2 ways to inline the code: consecutive string literals or C++11 raw string literals. The former requires you to insert a newline at the end of each line for readability. All the consecutive string literals are concatenated into a single string literal. The vertex and fragment shaders below use consecutive string literals.

const char vShaderStr [] =
    "precision mediump float;     \n"
    "uniform mat4 WorldViewProjection;\n"
    "attribute vec3 a_position;   \n"
    "attribute vec2 a_texCoord;   \n"
    "varying vec2 v_texCoord;     \n"
    "void main()                  \n"
    "{                            \n"
    "   gl_Position = WorldViewProjection * vec4(a_position, 1.0); \n"
    "   v_texCoord = a_texCoord;  \n"
    "}                            \n";

const char fShaderStr [] =
    "precision mediump float;     \n"
    "varying vec2 v_texCoord;                            \n"
    "uniform sampler2D s_texture;                        \n"
    "void main()                                         \n"
    "{                                                   \n"
    "  gl_FragColor = texture2D( s_texture, v_texCoord );\n"
    "}                                                   \n";

Load Asset

There are 2 ways to load assets such as 3D models and images for textures. One is to preload the files from a folder and specify its location in the Makefile. The other method is asynchronous download. Preloading is nice if your assets never change between runs of your application, like game assets. I am doing a slideshow which changes according to the photos the user uploads, so I’ll use asynchronous download. With emscripten_async_wget, the 1st argument is the download URL, the second is the destination filename, and the 3rd and 4th arguments are the load and error callbacks for the successful and failed download events respectively. For Emscripten, remember to change the URL below to your localhost and local port before building, and to copy the assets to the web server.

#ifdef __EMSCRIPTEN__
    emscripten_async_wget("http://localhost:16564/yes.png", IMG_FILE, load_texture, load_error);
#endif

void load_texture(const char * file)
{
    gUserData.textureId = init_texture(file);
    ++gUserData.images_loaded;
}

void load_error(const char * file)
{
    printf("File download failed: %s", file);
}

In the Makefile, make sure to set these options for OpenGL ES 2.0, asm.js, no memory initialization file, SDL 2 window and SDL 2 Image. You can specify -s WASM=1 for WebAssembly, but make sure your web server can serve wasm files. If not, consult your web server documentation on how to add the MIME type for wasm.

-s FULL_ES2=1 -s WASM=0 --memory-init-file 0 -s USE_SDL=2 -s USE_SDL_IMAGE=2

When you run the accompanying source code, you should see this image moving forward and backward.
app_demo

The demo code is hosted on GitHub.

C++: Size Matters in Platform Compatibility

Introduction

For file storage and data communication to work interoperably, the width of each data type must stay invariant across platforms. This tip discusses the pitfalls of platform-dependent data widths and their solutions. Endianness, deserving a tip of its own, is not covered here.

time_t

time_t stores the number of seconds since 1st January 1970. It is a 32-bit integer on 32-bit Linux, where it runs out in year 2038, a Y2K-equivalent crisis for Linux, whereas it is 64-bit on 64-bit Linux. On modern Visual C++, time_t is 64-bit on both the x86 and x64 platforms. time_t is not guaranteed to be interoperable between platforms, so it is best to store time as text and convert it to time_t when needed.
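One way to store time as text is sketched below. The text layout (an ISO 8601 style UTC timestamp) and the function names timeToText(), daysFromCivil() and textToTime() are choices made for this sketch, not something prescribed by this tip. Parsing returns a 64-bit count of seconds so that the result does not depend on the platform's time_t width; leap seconds are ignored.

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Format seconds since the epoch as UTC text, e.g. "1970-01-01T00:00:00Z".
// std::gmtime uses shared static storage, so this sketch is not thread-safe.
std::string timeToText(std::time_t t)
{
    const std::tm* utc = std::gmtime(&t);
    if (!utc) return std::string();
    char buf[32];
    const std::size_t n =
        std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%SZ", utc);
    return std::string(buf, n);
}

// Days between 1970-01-01 and the given Gregorian date (may be negative).
std::int64_t daysFromCivil(std::int64_t y, unsigned m, unsigned d)
{
    y -= m <= 2; // treat January and February as months of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Parse the text back into 64-bit seconds since the epoch.
bool textToTime(const std::string& text, std::int64_t& seconds)
{
    int y, mo, d, h, mi, s;
    if (std::sscanf(text.c_str(), "%d-%d-%dT%d:%d:%dZ",
                    &y, &mo, &d, &h, &mi, &s) != 6)
        return false;
    if (mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || s < 0 || s > 59)
        return false;
    seconds = daysFromCivil(y, static_cast<unsigned>(mo),
                            static_cast<unsigned>(d)) * 86400
            + h * 3600 + mi * 60 + s;
    return true;
}
```

A timestamp one second past the 32-bit limit, "2038-01-19T03:14:08Z", parses to 2147483648, which fits comfortably in the 64-bit result.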

wchar_t

The wchar_t type that holds a Unicode character is UTF-16 on Windows but UTF-32 on Linux/macOS, so the two are incompatible with each other. A UTF-16 character takes 2 or 4 bytes depending on its code point, while a UTF-32 character is always 4 bytes, which is a colossal waste of memory since most commonly used Unicode characters can be expressed in 2 bytes. UTF-8 takes 1 byte for ASCII and multiple bytes for other Unicode characters. For interoperability between Windows and other OSes, the solution is to store the text in UTF-8 and convert to wchar_t upon loading. Another solution is to use fixed-width character types such as char16_t or char32_t, introduced in C++11.
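To make the variable width of UTF-8 concrete, here is a minimal encoder sketch that appends one code point to a UTF-8 string. appendUtf8() is a name invented for this illustration; production code would more likely rely on an established conversion library.

```cpp
#include <string>

// Append one Unicode code point to a UTF-8 string.
// Returns false for surrogates and for values above U+10FFFF.
bool appendUtf8(std::string& out, char32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

    if (cp < 0x80)            // 1 byte: ASCII
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)      // 2 bytes
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)    // 3 bytes: rest of the Basic Multilingual Plane
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else                      // 4 bytes: code points above U+FFFF
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}
```

For instance, 'A' (U+0041) takes 1 byte, the euro sign (U+20AC) takes 3 bytes, and an emoji such as U+1F600 takes 4 bytes.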

Integer Types

size_t and its signed counterpart, ptrdiff_t, whose widths vary between the x86 and x64 platforms, should always be avoided in storage and communication packets. Types of undetermined width, like long, should be avoided as well. Use the fixed-width integer types introduced in C++11, such as uint32_t and int32_t.
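As a small sketch of this advice, the record below uses only fixed-width fields, and an in-memory size_t is range-checked before it is narrowed for storage. RecordHeader and toStoredCount() are hypothetical names, and the layout is only an example.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stored record: every field has the same width on x86 and x64.
struct RecordHeader
{
    std::uint32_t itemCount;
    std::int32_t  offset;
};
static_assert(sizeof(RecordHeader) == 8, "unexpected padding in RecordHeader");

// Narrow an in-memory count to the stored width; fails if it does not fit.
bool toStoredCount(std::size_t count, std::uint32_t& stored)
{
    if (count > std::numeric_limits<std::uint32_t>::max())
        return false;
    stored = static_cast<std::uint32_t>(count);
    return true;
}
```

The explicit check turns a silent truncation on x64 into an error the caller can handle.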

Pointer Types

Pointer width varies according to x86 or x64 mode. Pointers are sometimes used as an opaque index/identity; Windows SDK’s DWORD_PTR is one such example. A pointer-derived identity can be temporarily stored in a database, file storage or network packets thanks to the distinctness of memory addresses. This poses a problem when the code is recompiled in x64 mode from the original x86 mode: a 64-bit value gets sliced off in a 32-bit field, say a database column type. If it has to be done, then use the largest pointer width as the data width. If not, it is best to derive your identity through other means like a GUID or truly random number generation.
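If a pointer-derived identity really has to be stored, a sketch of "use the largest pointer width" could look like this. toStoredId() is a hypothetical helper; the identity is only meaningful while the object is alive in the same process.

```cpp
#include <cstdint>

// The stored field is 64 bits wide, so it can hold an x86 or an x64 pointer.
static_assert(sizeof(std::uintptr_t) <= sizeof(std::uint64_t),
              "pointer does not fit in the 64-bit stored field");

// Convert a pointer to a fixed-width identity value.
inline std::uint64_t toStoredId(const void* p)
{
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
}
```

The database column or packet field then stays 64-bit regardless of whether the code is built in x86 or x64 mode.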

 

Bring Your Animations to H264/HEVC Video

 

Introduction

Last year, I introduced a single-header, Windows-based software video encoder for OpenGL that works on Windows 7 and above. See the above video demo! I have since decoupled it from the OpenGL thread to make it simpler to encode a 2D frame. All you need to do is fill in the frame buffer, frame by frame, to create your animations. In this article, I use GDI+ since I am most familiar with it, but you are welcome to use your favourite graphics library; the video encoder is not coupled with GDI+. The HEVC codec used to come bundled with Windows 10, but Microsoft has since removed it and put it on sale in the Microsoft Store. That HEVC codec has a quality issue where a higher bitrate has to be given to maintain the same quality as H264-encoded video. Make sure the video file is not opened or locked by a video player before you begin to write to it. The new H264Writer constructor is as follows:

H264Writer(const wchar_t* mp3_file, const wchar_t* dest_file, VideoCodec codec, 
    int width, int height, int fps, int duration /*in milliseconds*/, 
    std::function<bool(int, int, int, int, UINT32*)> renderFunction,
    UINT32 bitrate = 4000000);

The mp3_file parameter is an MP3 file path (which can be empty if you do not want any audio) and the dest_file parameter is the path of the resultant video file. The codec parameter can be either H264 or HEVC. The width and height parameters refer to the video width and height. The fps parameter is the frames per second of the video, which I usually specify as 30 or 60. The duration parameter is the video duration in milliseconds; it can be set to -1 to make the video duration the same as the MP3's. The renderFunction parameter is the render function to be called for every frame. The bitrate parameter is the video bitrate in bits per second. Remember to set the bitrate higher for high-resolution video and HEVC. The render function signature is as follows. width and height are the video dimensions. fps is the frames per second, while frame_cnt is the frame count, which increments on every frame. The pixels parameter is the one-dimensional array to be filled with your bitmap data. The return value should be false on a catastrophic error, in which case encoding is stopped.

bool render(int width, int height, int fps, int frame_cnt, UINT32* pixels);

Red Video

For our first example, I keep it simple. We just render a red video.

red_video

This is the main function, in which H264Writer.h is included, H264Writer is instantiated and Process() is called to encode the video. Process() calls the given renderFunction, which is renderRedImage().

#include "../Common/H264Writer.h"

bool renderRedImage(int width, int height, int fps, int frame_cnt, UINT32* pixels);

int main()
{
    std::wstring musicFile(L"");
    std::wstring videoFile(L"C:\\temp\\RedVideo.mp4");

    std::function<bool(int, int, int, int, UINT32*)> renderFunction = renderRedImage;
    H264Writer writer(musicFile.c_str(), videoFile.c_str(), 
                    VideoCodec::H264, 640, 480, 30, 5000, renderFunction);
    if (writer.IsValid())
    {
        if (writer.Process())
        {
            printf("Video written successfully!\n");
            return 0;
        }
    }
    printf("Video write failed!\n");
    return 1; // report the failure to the caller
}

Below is the renderRedImage() body. It only renders when frame_cnt is zero, meaning on the first frame: since pixels remains unchanged afterwards, there is no need to fill it up again on every frame.

// render a red image once!
bool renderRedImage(int width, int height, int fps, int frame_cnt, UINT32* pixels)
{
    if (frame_cnt == 0)
    {
        for (int col = 0; col < width; ++col)
        {
            for (int row = 0; row < height; ++row)
            {
                int index = row * width + col;
                pixels[index] = 0xffff0000;
            }
        }
    }
    return true;
}

The pixel format is Alpha, Red, Green, Blue (ARGB). For example, if you want a blue video, just change the assignment to pixels[index] = 0xff0000ff;
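To show how frame_cnt and fps can drive an animation, here is a sketch of a render function that fades from black to red over the first second. makeARGB() and renderFadeToRed() are hypothetical names, and std::uint32_t stands in for the Windows UINT32 typedef so that the sketch compiles anywhere; it is assumed that both name the same 32-bit unsigned type when building with Visual C++.

```cpp
#include <cstdint>

// Pack four 8-bit channels into one 0xAARRGGBB pixel.
constexpr std::uint32_t makeARGB(std::uint8_t a, std::uint8_t r,
                                 std::uint8_t g, std::uint8_t b)
{
    return (static_cast<std::uint32_t>(a) << 24) |
           (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8)  |
            static_cast<std::uint32_t>(b);
}

// Fade from black to full red during the first second, then stay red.
bool renderFadeToRed(int width, int height, int fps, int frame_cnt,
                     std::uint32_t* pixels)
{
    if (!pixels || width <= 0 || height <= 0 || fps <= 0 || frame_cnt < 0)
        return false; // treated as a catastrophic error by the caller

    // Red level grows linearly from 0 to 255 over fps frames.
    const int level = (frame_cnt >= fps) ? 255 : frame_cnt * 255 / fps;
    const std::uint32_t color =
        makeARGB(0xff, static_cast<std::uint8_t>(level), 0, 0);

    // Row-major fill: consecutive writes touch consecutive memory.
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col)
            pixels[row * width + col] = color;
    return true;
}
```

Unlike renderRedImage(), this function has to refill the buffer on every frame because the color keeps changing until the fade completes.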

One JPEG Video

For our second example, we load a JPEG image with GDI+ and render once, that is when frame_cnt is zero.

yes_video

Because we are using GDI+ now, we have to include the Gdiplus.h header and link gdiplus.lib, and also initialize and shut down GDI+ with GdiplusStartup() and GdiplusShutdown() respectively. Otherwise, the main function is unchanged, except that renderFunction is now set to renderJPG.

#include "../Common/H264Writer.h"
#include <Gdiplus.h>
#pragma comment(lib, "gdiplus.lib")

bool renderJPG(int width, int height, int fps, int frame_cnt, UINT32* pixels);

int main()
{
    std::wstring musicFile(L"");
    std::wstring videoFile(L"C:\\temp\\JpgVideo.mp4");

    std::function<bool(int, int, int, int, UINT32*)> renderFunction = renderJPG;

    // Initialize GDI+ so that we can load the JPG
    Gdiplus::GdiplusStartupInput m_gdiplusStartupInput;
    ULONG_PTR m_gdiplusToken;

    Gdiplus::GdiplusStartup(&m_gdiplusToken, &m_gdiplusStartupInput, NULL);

    H264Writer writer(musicFile.c_str(), videoFile.c_str(), 
               VideoCodec::H264, 640, 480, 30, 10000, renderFunction);
    if (writer.IsValid())
    {
        if (writer.Process())
        {
            printf("Video written successfully!\n");
            Gdiplus::GdiplusShutdown(m_gdiplusToken);
            return 0;
        }
    }
    printf("Video write failed!\n");
    Gdiplus::GdiplusShutdown(m_gdiplusToken);
    return 1; // report the failure to the caller
}

renderJPG() is straightforward for developers familiar with GDI+. It loads “yes.jpg” with the Bitmap class. bmp is a Bitmap with the same dimensions as the video. We fill bmp with black using FillRectangle(). Then we calculate the aspect ratios of the JPEG file and the video frame. If w_ratio_jpg is greater than w_ratio_bmp, the image is wider than the video, so you will see 2 horizontal black bars at the top and bottom of the video; otherwise, you will see 2 vertical black bars on the 2 sides of the video. In other words, we render the image to cover as much of the video as possible while maintaining its original aspect ratio. To get the bmp pixel pointer, we must call LockBits(), and call UnlockBits() after use. Notice that in the double for loop, the image is copied vertically upside down so that it appears correctly in the video output.

// render a jpg once!
bool renderJPG(int width, int height, int fps, int frame_cnt, UINT32* pixels)
{
    using namespace Gdiplus;
    
    if (frame_cnt == 0)
    {
        Bitmap bmp(width, height, PixelFormat32bppARGB);
        Bitmap jpg(L"image\\yes.jpg", TRUE);
        Graphics g(&bmp);

        SolidBrush brush(Color::Black);
        g.FillRectangle(&brush, 0, 0, bmp.GetWidth(), bmp.GetHeight());

        float w_ratio_bmp = bmp.GetWidth() / (float)bmp.GetHeight();
        float w_ratio_jpg = jpg.GetWidth() / (float)jpg.GetHeight();

        if (w_ratio_jpg >= w_ratio_bmp)
        {
            int width2 = bmp.GetWidth();
            int height2 = (int)((bmp.GetWidth() / (float)jpg.GetWidth()) * jpg.GetHeight());
            g.DrawImage(&jpg, 0, (bmp.GetHeight() - height2) / 2, width2, height2);
        }
        else
        {
            int width2 = (int)((bmp.GetHeight() / (float)jpg.GetHeight()) * jpg.GetWidth());
            int height2 = bmp.GetHeight();
            g.DrawImage(&jpg, (bmp.GetWidth() - width2) / 2, 0, width2, height2);
        }

        BitmapData bitmapData;
        Rect rect(0, 0, bmp.GetWidth(), bmp.GetHeight());

        bmp.LockBits(
            &rect,
            ImageLockModeRead,
            PixelFormat32bppARGB,
            &bitmapData);

        UINT* pixelsSrc = (UINT*)bitmapData.Scan0;

        if (!pixelsSrc)
            return false;

        int stride = bitmapData.Stride >> 2;

        for (int col = 0; col < width; ++col)
        {
            for (int row = 0; row < height; ++row)
            {
                int indexSrc = (height-1-row) * stride + col;
                int index = row * width + col;
                pixels[index] = pixelsSrc[indexSrc];
            }
        }

        bmp.UnlockBits(&bitmapData);

    }
    return true;
}
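
The letterbox calculation in renderJPG() can be looked at in isolation. The sketch below repeats the same arithmetic in a standalone function, without any GDI+ types; FitRect and fitInside are hypothetical names introduced here for illustration.

```cpp
#include <cassert>

// Destination rectangle of the image inside the video frame.
struct FitRect
{
    int x;
    int y;
    int width;
    int height;
};

// Scales the image to the largest size that fits the frame while keeping
// its aspect ratio, and centres it (same arithmetic as renderJPG()).
FitRect fitInside(int frameW, int frameH, int imageW, int imageH)
{
    assert(frameW > 0 && frameH > 0 && imageW > 0 && imageH > 0);

    float frameRatio = frameW / (float)frameH;
    float imageRatio = imageW / (float)imageH;

    FitRect r{};
    if (imageRatio >= frameRatio)
    {
        // Image is wider than the frame: bars at the top and bottom.
        r.width = frameW;
        r.height = (int)((frameW / (float)imageW) * imageH);
        r.x = 0;
        r.y = (frameH - r.height) / 2;
    }
    else
    {
        // Image is taller than the frame: bars on the left and right.
        r.width = (int)((frameH / (float)imageH) * imageW);
        r.height = frameH;
        r.x = (frameW - r.width) / 2;
        r.y = 0;
    }
    return r;
}
```

For a 640x480 frame, a 1280x720 image is drawn at 640x360 with 60-pixel bars above and below, while a 480x960 image is drawn at 240x480 with 200-pixel bars on each side.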

Two JPEG Video

For the third example, we display the first image and slowly alpha-blend it with the second image until the latter is fully shown. You can see the effect by looking at the video.

The main function is exactly the same as the previous one, except that renderFunction is set to render2JPG.

#include "../Common/H264Writer.h"
#include <Gdiplus.h>
#pragma comment(lib, "gdiplus.lib")

// render 2 jpg
bool render2JPG(int width, int height, int fps, int frame_cnt, UINT32* pixels);
inline UINT Alphablend(UINT dest, UINT source, BYTE nAlpha, BYTE nAlphaFinal);

int main()
{
    std::wstring musicFile(L"");
    std::wstring videoFile(L"C:\\temp\\TwoJpgVideo.mp4");

    std::function<bool(int, int, int, int, UINT32*)> renderFunction = render2JPG;

    // Initialize GDI+ so that we can load the JPG
    Gdiplus::GdiplusStartupInput m_gdiplusStartupInput;
    ULONG_PTR m_gdiplusToken;

    Gdiplus::GdiplusStartup(&m_gdiplusToken, &m_gdiplusStartupInput, NULL);

    H264Writer writer(musicFile.c_str(), videoFile.c_str(), 
                      VideoCodec::H264, 640, 480, 30, 3000, renderFunction);
    if (writer.IsValid())
    {
        if (writer.Process())
        {
            printf("Video written successfully!\n");
            Gdiplus::GdiplusShutdown(m_gdiplusToken);
            return 0;
        }
    }
    printf("Video write failed!\n");
    Gdiplus::GdiplusShutdown(m_gdiplusToken);
    return 1; // report the failure to the caller
}

render2JPG() is very similar to renderJPG(), except that it loads two JPEGs with the Bitmap class. The alpha variable is 0 (fully transparent) when the elapsed time is less than or equal to 1000 milliseconds and 255 (fully opaque) when it is greater than or equal to 2000 milliseconds; in between, alpha is interpolated linearly. A note about frame_duration = 1000 / fps: it is imprecise because it is an integer division. For example, when fps is 30, 1000 / 30 gives 33 milliseconds, but 30 * 33 yields only 990 milliseconds, not the original 1000. Be warned that render2JPG() can take a long time because it opens the two JPEG files and renders them on every frame, unlike the previous two examples, which render only once on the first frame.
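
The timing arithmetic can be checked on its own. In the sketch below, elapsedMsTruncated and alphaFromElapsed follow the article's integer formulas, while elapsedMsExact is a hypothetical alternative (not in the original code) that multiplies before dividing so the truncation error does not accumulate per frame.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed time as the article computes it: the per-frame duration is
// truncated to whole milliseconds first, then multiplied.
int elapsedMsTruncated(int frame_cnt, int fps)
{
    assert(fps > 0);
    int frame_duration = 1000 / fps; // 33 for 30 fps
    return frame_cnt * frame_duration;
}

// Hypothetical alternative: multiply first, divide last.
int elapsedMsExact(int frame_cnt, int fps)
{
    assert(fps > 0);
    return (int)((long long)frame_cnt * 1000 / fps);
}

// Alpha ramp used by render2JPG(): 0 up to 1000 ms, 255 from 2000 ms,
// linear in between.
uint8_t alphaFromElapsed(int elapsed_ms)
{
    if (elapsed_ms <= 1000)
        return 0;
    if (elapsed_ms >= 2000)
        return 255;
    return (uint8_t)((elapsed_ms - 1000) * 255 / 1000);
}
```

At 30 fps, frame 30 lands on 990 ms with the truncated formula and on 1000 ms with the exact one; at frame 45 the two give alpha values of 123 and 127 respectively.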

// render 2 jpg
// This function takes a long time.
bool render2JPG(int width, int height, int fps, int frame_cnt, UINT32* pixels)
{
    using namespace Gdiplus;

    Bitmap bmp(width, height, PixelFormat32bppARGB);
    Bitmap bmp2(width, height, PixelFormat32bppARGB);
    // Warning JPG1 and JPG2 must have the same dimensions
    Bitmap jpg1(L"image\\first.jpg", TRUE);
    Bitmap jpg2(L"image\\second.jpg", TRUE);
    Graphics g(&bmp);
    Graphics g2(&bmp2);

    BYTE alpha = 0;
    int frame_duration = 1000 / fps;
    if (frame_cnt * frame_duration <= 1000)
        alpha = 0;
    else if (frame_cnt * frame_duration >= 2000)
        alpha = 255;
    else
        alpha = ((frame_cnt * frame_duration) - 1000) * 255 / 1000;

    float w_ratio_bmp = bmp.GetWidth() / (float)bmp.GetHeight();
    float w_ratio_jpg = jpg1.GetWidth() / (float)jpg1.GetHeight();

    SolidBrush brush(Color::Black);
    g.FillRectangle(&brush, 0, 0, bmp.GetWidth(), bmp.GetHeight());

    if (w_ratio_jpg >= w_ratio_bmp)
    {
        int width2 = bmp.GetWidth();
        int height2 = (int)((bmp.GetWidth() / (float)jpg1.GetWidth()) * jpg1.GetHeight());
        g.DrawImage(&jpg1, 0, (bmp.GetHeight() - height2) / 2, width2, height2);
        g2.DrawImage(&jpg2, 0, (bmp2.GetHeight() - height2) / 2, width2, height2);
    }
    else
    {
        int width2 = (int)((bmp.GetHeight() / (float)jpg1.GetHeight()) * jpg1.GetWidth());
        int height2 = bmp.GetHeight();
        g.DrawImage(&jpg1, (bmp.GetWidth() - width2) / 2, 0, width2, height2);
        g2.DrawImage(&jpg2, (bmp2.GetWidth() - width2) / 2, 0, width2, height2);
    }

    BitmapData bitmapData;
    BitmapData bitmapData2;
    Rect rect(0, 0, bmp.GetWidth(), bmp.GetHeight());

    bmp.LockBits(
        &rect,
        ImageLockModeRead,
        PixelFormat32bppARGB,
        &bitmapData);

    bmp2.LockBits(
        &rect,
        ImageLockModeRead,
        PixelFormat32bppARGB,
        &bitmapData2);

    UINT* pixelsSrc = (UINT*)bitmapData.Scan0;
    UINT* pixelsSrc2 = (UINT*)bitmapData2.Scan0;

    if (!pixelsSrc || !pixelsSrc2)
        return false;

    int stride = bitmapData.Stride >> 2;

    for (int col = 0; col < width; ++col)
    {
        for (int row = 0; row < height; ++row)
        {
            int indexSrc = (height - 1 - row) * stride + col;
            int index = row * width + col;
            pixels[index] = Alphablend(pixelsSrc2[indexSrc], pixelsSrc[indexSrc], alpha, 0xff);
        }
    }

    bmp.UnlockBits(&bitmapData);
    bmp2.UnlockBits(&bitmapData2);

    return true;
}

pixels[index] is determined by the Alphablend() function below:

inline UINT Alphablend(UINT dest, UINT source, BYTE nAlpha, BYTE nAlphaFinal)
{
    BYTE nInvAlpha = ~nAlpha;

    BYTE nSrcRed = (source & 0xff0000) >> 16;
    BYTE nSrcGreen = (source & 0xff00) >> 8;
    BYTE nSrcBlue = (source & 0xff);

    BYTE nDestRed = (dest & 0xff0000) >> 16;
    BYTE nDestGreen = (dest & 0xff00) >> 8;
    BYTE nDestBlue = (dest & 0xff);

    BYTE nRed = (nSrcRed * nAlpha + nDestRed * nInvAlpha) >> 8;
    BYTE nGreen = (nSrcGreen * nAlpha + nDestGreen * nInvAlpha) >> 8;
    BYTE nBlue = (nSrcBlue * nAlpha + nDestBlue * nInvAlpha) >> 8;

    return nAlphaFinal << 24 | nRed << 16 | nGreen << 8 | nBlue;
}
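
To see what Alphablend() does to a single colour channel, the sketch below isolates the per-channel arithmetic. blendShift mirrors the article's formula; blendExact is a hypothetical variant, not in the original code, that divides by 255 with rounding instead of shifting right by 8. The shift is cheaper, but it means a channel value of 255 comes out as 254 even when the alpha is at either extreme.

```cpp
#include <cassert>
#include <cstdint>

// One channel of Alphablend(): weights sum to 255 but the result is divided by 256.
uint8_t blendShift(uint8_t src, uint8_t dst, uint8_t alpha)
{
    uint8_t inv = (uint8_t)~alpha; // same as 255 - alpha
    return (uint8_t)((src * alpha + dst * inv) >> 8);
}

// Hypothetical variant: divide by 255 and round to nearest.
uint8_t blendExact(uint8_t src, uint8_t dst, uint8_t alpha)
{
    assert(alpha <= 255);
    return (uint8_t)((src * alpha + dst * (255 - alpha) + 127) / 255);
}
```

For example, blending source 200 over destination 100 at alpha 128 gives 149 with the shift and 150 with the exact division, a difference that is rarely visible in video.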

Text Animation Video

For our last example, we show two prerendered text images emerging from the middle of the video. See the video below for an example.

Because the main function remains essentially unchanged, I shall only show the render function, renderText(). A thin white rectangle expands progressively across the middle of the frame. renderbmp is the bitmap onto which the top half of bmp and the bottom half of bmp2 are drawn. bmp is rendered with jpg1 progressively moving up, while bmp2 is rendered with jpg2 progressively moving down. The names jpg1 and jpg2 are misnomers, since the images loaded are actually PNGs; the Bitmap class can load both JPEG and PNG. JPEG is best for storing photographs, while PNG is better suited to illustrations.

// render text
bool renderText(int width, int height, int fps, int frame_cnt, UINT32* pixels)
{
    using namespace Gdiplus;

    Bitmap renderbmp(width, height, PixelFormat32bppARGB);

    Bitmap bmp(width, height, PixelFormat32bppARGB);
    Bitmap bmp2(width, height, PixelFormat32bppARGB);
    Bitmap jpg1(L"image\\Mandy.png", TRUE);
    Bitmap jpg2(L"image\\Frenzy.png", TRUE);
    Graphics render_g(&renderbmp);

    Graphics g(&bmp);
    Graphics g2(&bmp2);

    float rectProgress = 0.0f;
    float textProgress = 0.0f;
    float frame_duration = 1000.0f / fps;
    float total_duration = frame_cnt * frame_duration;

    SolidBrush brush(Color::Black);
    render_g.FillRectangle(&brush, 0, 0, width, height);
    g.FillRectangle(&brush, 0, 0, width, height);

    int rectHeight = 4;

    int rectWidth = (int)(width * 0.8f);
    if (total_duration >= 1000.0f)
        rectProgress = 1.0f;
    else
        rectProgress = total_duration / 1000.0f;


    if (total_duration >= 2000.0f)
        textProgress = 1.0f;
    else if (total_duration <= 1000.0f)
        textProgress = 0.0f;
    else
        textProgress = (total_duration - 1000.0f) / 1000.0f;

    g.DrawImage(&jpg1, (width - jpg1.GetWidth()) / 2, 
    (height / 2) - (int)(jpg1.GetHeight() * textProgress), jpg1.GetWidth(), jpg1.GetHeight());
    g.FillRectangle(&brush, 0, height / 2 - 4, width, height / 2 + 4);
    render_g.DrawImage(&bmp, 0, 0, width, height);

    g2.DrawImage(&jpg2, (width - jpg2.GetWidth()) / 2, 
    (int)((height / 2 - jpg2.GetHeight()) + (int)(jpg2.GetHeight() * textProgress)), 
    jpg2.GetWidth(), jpg2.GetHeight());
    g2.FillRectangle(&brush, 0, 0, width, height / 2 + 4);
    render_g.DrawImage(&bmp2, 0, height / 2 + 4, 0, height / 2 + 4, 
                       width, height / 2 - 4, Gdiplus::UnitPixel);

    SolidBrush whitebrush(Color::White);
    int start_x = (width - (int)(rectWidth * rectProgress)) / 2;
    int pwidth = (int)(rectWidth * rectProgress);
    render_g.FillRectangle(&whitebrush, start_x, 
                          (height - rectHeight) / 2, pwidth, rectHeight);

    BitmapData bitmapData;
    Rect rect(0, 0, width, height);

    renderbmp.LockBits(
        &rect,
        ImageLockModeRead,
        PixelFormat32bppARGB,
        &bitmapData);

    UINT* pixelsSrc = (UINT*)bitmapData.Scan0;

    if (!pixelsSrc)
        return false;

    int stride = bitmapData.Stride >> 2;

    for (int col = 0; col < width; ++col)
    {
        for (int row = 0; row < height; ++row)
        {
            int indexSrc = (height - 1 - row) * stride + col;
            int index = row * width + col;
            pixels[index] = pixelsSrc[indexSrc];
        }
    }

    renderbmp.UnlockBits(&bitmapData);

    return true;
}
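
The animation in renderText() is driven by two progress values derived from the elapsed time. The sketch below pulls that timeline out into small functions so the numbers can be inspected without GDI+; all function names are hypothetical and only the arithmetic is taken from the code above.

```cpp
#include <cassert>

float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

// Elapsed time for a frame, as renderText() computes it.
float elapsedMs(int frame_cnt, int fps)
{
    assert(fps > 0);
    return frame_cnt * (1000.0f / fps);
}

// The white rectangle grows during the first second.
float rectProgressAt(float total_ms)
{
    return clamp01(total_ms / 1000.0f);
}

// The text images move during the second second.
float textProgressAt(float total_ms)
{
    return clamp01((total_ms - 1000.0f) / 1000.0f);
}

// Top edge of the upper text image: starts at the middle and moves up.
int upperImageY(int frameHeight, int imageHeight, float textProgress)
{
    return (frameHeight / 2) - (int)(imageHeight * textProgress);
}

// Top edge of the lower text image: starts above the middle and moves down.
int lowerImageY(int frameHeight, int imageHeight, float textProgress)
{
    return (frameHeight / 2 - imageHeight) + (int)(imageHeight * textProgress);
}
```

With a 480-pixel-high frame and a 100-pixel-high text image, the upper image travels from y = 240 to y = 140 and the lower image from y = 140 to y = 240; the black rectangles drawn afterwards hide whatever is on the wrong side of the middle line.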

The code is hosted on GitHub. Remember to copy the image folder to the Debug or Release folder before running the executable. Have fun converting your cool animations to H264/HEVC video to share with others and to keep for posterity!

What Web Developers Need to Know About Content Security Policy

Introduction

Content Security Policy (CSP) is a computer security standard introduced by the World Wide Web Consortium (W3C) to prevent cross-site scripting (XSS) and clickjacking attacks. Explained simply, CSP is a whitelist of content origins that are allowed to load or execute on a webpage. We’ll look at the three versions of CSP and the relevant features of each, though it’s important to note that CSP Level 3 has not yet been ratified as a W3C Recommendation; it is still a working draft and subject to change before standardization. The differences between the versions are pointed out as we go along.

What is Cross-Site Scripting?

Cross-Site Scripting (XSS) attacks are a type of code injection, in which malicious scripts are injected into trusted websites. A good example could occur on an ecommerce site: a buyer posts a product review with malicious code that is saved on the server. For every customer who views the product review, malicious code gets executed.

CSP in Action

CSP can be specified in an HTTP response header. When a web client, such as a web browser, requests a resource from a web server, it sends an HTTP request whose headers carry information for the server. If the request is successful, the web server replies with the resource together with response headers telling the browser how to handle the response. In the case of CSP, the header specifies the trusted sources from which the page’s content may be fetched. On CSP 2 capable browsers, we have the additional option of specifying the CSP in an HTML meta tag. This is exactly what we use in our examples; taking a web-framework-agnostic approach keeps things simple. All you need to follow the examples is a text editor and a modern web browser.

Anatomy of CSP

A CSP header begins with the header name Content-Security-Policy, which is followed by one or more directives. Directives are separated by semicolons. Each directive can have zero or more values, separated by whitespace. More often than not, a value is simply a trusted source URI.

Content-Security-Policy: <directive> <value>;

This is an example of a one-directive CSP. The default-src directive with a 'self' value instructs the web browser to only trust content from the same origin as the webpage.

Content-Security-Policy: default-src 'self';

The equivalent CSP in a meta tag is shown below:

<meta http-equiv="Content-Security-Policy" content="default-src 'self';">

Take note that the meta tag has to be placed in the head section of the HTML, not the body. One big downside to be cautious of: with the meta tag approach, CSP rules are not enforced until the meta tag has been read and processed.
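
The anatomy described above can be sketched in a few lines of C++. The helper below is purely illustrative: Directive, buildPolicy, asHttpHeader and asMetaTag are hypothetical names, and a real server would normally set the header through its web framework rather than by building strings by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One directive with its whitespace-separated values.
struct Directive
{
    std::string name;
    std::vector<std::string> values;
};

// Joins directives into a policy string such as "default-src 'self';".
std::string buildPolicy(const std::vector<Directive>& directives)
{
    std::string policy;
    for (const Directive& d : directives)
    {
        assert(!d.name.empty());
        if (!policy.empty())
            policy += ' ';
        policy += d.name;
        for (const std::string& v : d.values)
        {
            policy += ' ';
            policy += v;
        }
        policy += ';';
    }
    return policy;
}

// The policy as an HTTP response header line.
std::string asHttpHeader(const std::string& policy)
{
    return "Content-Security-Policy: " + policy;
}

// The same policy as a meta tag for the head section.
std::string asMetaTag(const std::string& policy)
{
    return "<meta http-equiv=\"Content-Security-Policy\" content=\"" + policy + "\">";
}
```

Both forms carry the same policy string; only the wrapper differs.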

This is the HTML that loads the image from CodeProject without CSP. You can copy and paste the code in an empty HTML file and save it locally.

<html>
<head>
<title>CSP in Action</title>
</head>
<body>
<p><img src="https://www.codeproject.com/App_Themes/CodeProject/Img/logo250x135.gif" /></p>
</body>
</html>

Open the HTML file in a browser by double-clicking it in File Explorer; the image is downloaded from CodeProject and displayed.

image001

Let’s add a CSP meta tag.

<html>
<head>
<meta http-equiv="Content-Security-Policy" content="default-src 'self';">
<title>CSP in Action</title>
</head>
<body>
<p><img src="https://www.codeproject.com/App_Themes/CodeProject/Img/logo250x135.gif" /></p>
</body>
</html>

Now try viewing the page in a browser:

image002

Bam! A broken-image icon is now shown because the image was not fetched: https://www.codeproject.com is not the same origin as the page. Press F12 in the browser to open the developer tools and navigate to the Console tab. Chrome shows this error in red.

Refused to load the image 'https://www.codeproject.com/App_Themes/CodeProject/Img/logo250x135.gif' because it violates the following Content Security Policy directive: "default-src 'self'". Note that 'img-src' was not explicitly set, so 'default-src' is used as a fallback.

What we have effectively done with the default-src directive is to restrict all the content to the same origin with the ‘self’ keyword, as explained previously.

Let’s append the CodeProject URI to default-src, separated by whitespace. For simplicity, I show only the updated meta tag, as the rest of the HTML remains unchanged.

<meta http-equiv="Content-Security-Policy" content="default-src 'self'
https://www.codeproject.com;">

View the page again. Now the CodeProject image is shown. Note: The self keyword has to be enclosed in single quotes, while the URI must not be.

image001

Since the GIF is an image resource, let’s do some refactoring and put the CodeProject URI under the img-src directive.

<meta http-equiv="Content-Security-Policy" content="default-src 'self'; img-src 
https://www.codeproject.com;">

image001

View the page in the browser again. The image still appears. Before this change, img-src was not specified, so what was its value? The answer is that a fetch directive such as img-src falls back to default-src when it is not specified. Note: If your URI redirects to a URI on another domain, that domain has to be in the CSP as well.
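
The fallback rule can be sketched as a lookup. The code below is a simplified model and not how any browser is actually implemented: Policy and lookupSources are hypothetical names, the list of fetch directives is hard-coded, and the intermediate fallbacks that some directives have in CSP 3 are ignored.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Directive name -> list of source expressions.
using Policy = std::map<std::string, std::vector<std::string>>;

// Finds the source list that governs the given directive.
// Returns false when the policy places no restriction on it.
bool lookupSources(const Policy& policy, const std::string& directive,
                   std::vector<std::string>& sources)
{
    assert(!directive.empty());

    // An explicitly specified directive wins; it replaces default-src
    // rather than being merged with it.
    auto it = policy.find(directive);
    if (it != policy.end())
    {
        sources = it->second;
        return true;
    }

    // Only fetch directives fall back to default-src.
    static const std::set<std::string> fetchDirectives = {
        "child-src", "connect-src", "font-src", "frame-src", "img-src",
        "media-src", "object-src", "script-src", "style-src", "worker-src"
    };
    if (fetchDirectives.count(directive) != 0)
    {
        auto def = policy.find("default-src");
        if (def != policy.end())
        {
            sources = def->second;
            return true;
        }
    }
    return false;
}
```

One consequence worth noting: with default-src 'self'; img-src https://www.codeproject.com; images from the page's own origin are no longer allowed, because img-src replaces the fallback instead of extending it. Add 'self' to img-src if both are needed.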

CSP Directives

CSP directives mostly cover the content type whose source(s) can be specified. This article covers most of the directives. All the directives that fall back to the default-src are shown on the hierarchy below.

image003

  • default-src: The main fallback for the other fetch directives when they are not explicitly specified
  • child-src: Lists the trusted sources for web workers and nested browsing contexts loaded using elements such as <frame> and <iframe>. This directive is deprecated in CSP 3; to list trusted sources for nested browsing contexts and workers, use the frame-src and worker-src directives respectively.
  • script-src: Lists trusted sources for JavaScript
  • object-src: Lists trusted sources for the <object>, <embed>, and <applet> elements
  • style-src: Lists trusted sources for stylesheets (CSS)
  • img-src: Lists trusted sources of images and favicons
  • media-src: Lists trusted sources for loading media using the <audio>, <video> and <track> elements
  • frame-src: Lists trusted sources for nested browsing contexts loaded using elements such as <frame> and <iframe>
  • font-src: Lists trusted sources for fonts loaded using @font-face
  • connect-src: Limits the URLs which can be loaded using script interfaces. Script interfaces include <a> ping, Fetch, XMLHttpRequest, WebSocket and EventSource
  • worker-src: Lists trusted sources for Worker, SharedWorker, or ServiceWorker scripts
  • base-uri: Limits the URLs which can be used in a document’s <base> element
  • plugin-types: Limits the set of plugins that can be embedded into a document by limiting the types of resources which can be loaded. For example, to allow Flash, specify its mime type: application/x-shockwave-flash in this directive
  • sandbox: Puts the resource under a sandbox similar to the <iframe> sandbox attribute
  • form-action: Limits the URLs which can be used as the target of form submissions from a given context
  • frame-ancestors: Limits valid parents that may embed a page using <frame>, <iframe>, <object>, <embed>, or <applet>
  • report-uri: Lists the URL to which the web browser reports Content Security Policy violations. These violation reports consist of JSON documents sent via an HTTP POST request to the specified URI. Deprecated in CSP 3, but still widely supported
  • report-to: In CSP 3, report-uri (mentioned above) has been renamed to report-to and report-uri is deprecated. However, at the time of writing, not a single browser supports report-to. It is perfectly fine to specify both report-uri and report-to to future-proof your CSP
  • block-all-mixed-content: Forbids loading any assets using HTTP when the page is loaded using HTTPS
  • upgrade-insecure-requests: Instructs web browser to treat all of a site’s insecure URLs (those served over HTTP) as though they have been replaced with secure URLs (those served over HTTPS). This directive is intended for web sites with large numbers of insecure legacy URLs that need to be rewritten.
  • require-sri-for: Requires the use of Subresource Integrity (SRI) for external scripts or styles on the page

CSP Values

Each directive name is followed by its values, separated by whitespace. The acceptable values fall into two main categories: keywords and URIs.

All keywords, except the wildcard, must be enclosed in single quotes:

  • 'self': Restricts sources to the same origin
  • 'none': No source is allowed
  • *: Wildcard
  • 'unsafe-inline': Allows inline JavaScript code and stylesheets
  • 'unsafe-eval': Allows dynamic JavaScript through eval()

One very common reason to specify a trusted external source URI in addition to the same origin is the need to load resources from a Content Delivery Network (CDN), a geographically distributed network of proxy servers that store commonly downloaded content.

URIs must not be enclosed in single quotes!

In CSP 1, only the scheme (http or https), domain and port number are allowed in the URI.

https://example.com:80/

Whereas in CSP 2, subdomains and paths are allowed. This URI allows all files in the js folder:

https://example.com:80/js/

This URI treats js as a file, not a folder, as it does not end with a forward slash:

https://example.com:80/js

To allow all subdomains, use an asterisk as a wildcard.

https://*.example.com:80/
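
How the subdomain wildcard behaves can be illustrated with a small matcher for the host part only. This is a simplified sketch, not the full source-matching algorithm: hostMatches is a hypothetical name, both arguments are assumed to be lowercase already, and scheme, port and path are ignored.

```cpp
#include <cassert>
#include <string>

// Returns true when host is allowed by the host pattern.
// "*.example.com" matches any subdomain of example.com,
// but not example.com itself.
bool hostMatches(const std::string& pattern, const std::string& host)
{
    assert(!pattern.empty());

    if (pattern.compare(0, 2, "*.") == 0)
    {
        // Keep the leading dot so that "badexample.com" does not match.
        const std::string suffix = pattern.substr(1); // ".example.com"
        return host.size() > suffix.size() &&
               host.compare(host.size() - suffix.size(), suffix.size(), suffix) == 0;
    }
    return pattern == host;
}
```

Because the wildcard does not cover the bare domain, a policy that should allow both example.com and its subdomains needs to list both.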

unsafe-inline

Sometimes a webpage comes with inline JavaScript or stylesheets, and the amount of work involved makes it infeasible to move them into separate files. This is where unsafe-inline comes into the picture. For this example, we have an HTML page that displays the time periodically.

<html>

<head>
<title>unsafe-inline in Action</title>
</head>

<body>

<p id="time"></p>

<script>
function displayTime()
{
    var d = new Date();
    var n = d.toLocaleTimeString();
    document.getElementById('time').innerHTML = n;
    setTimeout(function () {
            displayTime()
        }, 500);
}
displayTime();
</script>

</body>

</html>

Copy the HTML, save it in an HTML file and open that file in a web browser. The time is displayed; yours will most likely differ from mine.

6:32:32 PM

Let’s add a CSP <meta> tag in the <head> section.

<meta http-equiv="Content-Security-Policy" content="default-src 'self';">

Bam! The time is no longer displayed and we now have an error. This is the error I got in Chrome.

Refused to execute inline script because it violates the following Content Security Policy directive: "default-src 'self'". Either the 'unsafe-inline' keyword, a hash ('sha256-TVjy1frkE+v+8vB4X884wNJ7xy5bKc32l3WYqLZZ44o='), or a nonce ('nonce-…') is required to enable inline execution. Note also that 'script-src' was not explicitly set, so 'default-src' is used as a fallback.

Let’s enable our inline code with unsafe-inline.

<meta http-equiv="Content-Security-Policy" content="default-src 'self'; 
      script-src 'unsafe-inline';">

This time around, the HTML displays the time.

6:35:32 PM

Nonce and Hash to the Rescue

unsafe-inline is an all-or-nothing solution that leaves much to be desired. When unsafe-inline is enabled, there is a risk that we are also enabling maliciously injected code.

Nonces and hashes were introduced in CSP 2 to address this gaping security hole exposed by unsafe-inline. They work by allowing only those inline JavaScript or CSS sections that carry the matching nonce value, or whose content matches the given cryptographic hash, to run. Both nonce and hash have to be enclosed in single quotes. Remember that a nonce is a used-only-once, base64-encoded number that needs to be regenerated on every page fetch. As long as the nonce in the CSP matches the one on the script/style section, the JavaScript or CSS is allowed. Below are the script-src nonce and style-src nonce examples.

<meta http-equiv="Content-Security-Policy" content="default-src 'self'; 
      script-src 'nonce-2726c7f26c';">

<script nonce="2726c7f26c">
// code remains unchanged, so it is not shown.
</script>

<meta http-equiv="Content-Security-Policy" content="default-src 'self'; 
      style-src 'nonce-5823c7f85c';">

<style nonce="5823c7f85c">
/* CSS code not shown */
</style>
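
Since the nonce has to change on every page fetch, it is normally generated on the server for each response. The sketch below shows one way this could look in C++; makeNonce, nonceSource and scriptOpenTag are hypothetical names. It hex-encodes the random bytes (hex digits are all valid base64 characters) and uses std::random_device, which is not guaranteed to be cryptographically secure on every platform, so a production server should draw the bytes from the operating system's secure random source instead.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Returns byteCount random bytes, hex-encoded (two characters per byte).
std::string makeNonce(std::size_t byteCount = 16)
{
    assert(byteCount > 0);
    static const char hex[] = "0123456789abcdef";

    std::random_device rd; // assumption: backed by a good entropy source
    std::string nonce;
    nonce.reserve(byteCount * 2);
    for (std::size_t i = 0; i < byteCount; ++i)
    {
        unsigned int byte = rd() & 0xffu;
        nonce += hex[byte >> 4];
        nonce += hex[byte & 0x0fu];
    }
    return nonce;
}

// The value that goes into script-src or style-src, quotes included.
std::string nonceSource(const std::string& nonce)
{
    return "'nonce-" + nonce + "'";
}

// The opening tag of the inline script that is allowed to run.
std::string scriptOpenTag(const std::string& nonce)
{
    return "<script nonce=\"" + nonce + "\">";
}
```

The same nonce string must be written into both the policy and the tag of a given response, and a fresh one generated for the next response.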

Cryptographic hashing works by calculating the message digest of the inline code, whitespace included, and then encoding the digest in base64. If you are a Chrome user, you are in luck, because Chrome calculates this hash for you when it shows the error in the developer console. Here is the above error again.

Refused to execute inline script because it violates the following Content Security Policy directive: "default-src 'self'". Either the 'unsafe-inline' keyword, a hash ('sha256-TVjy1frkE+v+8vB4X884wNJ7xy5bKc32l3WYqLZZ44o='), or a nonce ('nonce-…') is required to enable inline execution. Note also that 'script-src' was not explicitly set, so 'default-src' is used as a fallback.

All you need to do to fix the error is copy the SHA-256 hash into the script-src directive.

<meta http-equiv="Content-Security-Policy" content="default-src 'self'; 
      script-src 'sha256-TVjy1frkE+v+8vB4X884wNJ7xy5bKc32l3WYqLZZ44o=';">

<script>
// code remains unchanged, so it is not shown.
</script>

The same concept works for the style section.

<meta http-equiv="Content-Security-Policy" content="default-src 'self'; 
      style-src 'sha256-pkvqLyskjufPOv5VOGnLcoqyD2oDwsfaPxxvXCQdq9Y=';">

<style>
/* CSS code not shown */
</style>

Voilà, the error is gone and the time display is back!
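
If you are not using Chrome, the hash can be computed by hand. The sketch below is a self-contained illustration of the steps described above (SHA-256 digest of the inline code, then base64); sha256, base64 and cspHashSource are hypothetical helper names, and in a real project you would more likely call an existing cryptography library than carry your own SHA-256.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

static uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// SHA-256 as specified in FIPS 180-4; returns the 32-byte digest.
std::vector<uint8_t> sha256(const std::string& msg)
{
    static const uint32_t K[64] = {
        0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
        0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
        0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
        0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
        0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
        0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
        0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
        0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 };
    uint32_t h[8] = { 0x6a09e667,0xbb67ae85,0x3c6ef372,0xa54ff53a,
                      0x510e527f,0x9b05688c,0x1f83d9ab,0x5be0cd19 };

    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (big-endian).
    std::vector<uint8_t> data(msg.begin(), msg.end());
    const uint64_t bitLen = (uint64_t)msg.size() * 8;
    data.push_back(0x80);
    while (data.size() % 64 != 56) data.push_back(0);
    for (int i = 7; i >= 0; --i) data.push_back((uint8_t)(bitLen >> (i * 8)));

    for (std::size_t off = 0; off < data.size(); off += 64)
    {
        uint32_t w[64];
        for (int i = 0; i < 16; ++i)
            w[i] = (uint32_t)data[off + 4*i] << 24 | (uint32_t)data[off + 4*i + 1] << 16 |
                   (uint32_t)data[off + 4*i + 2] << 8 | (uint32_t)data[off + 4*i + 3];
        for (int i = 16; i < 64; ++i)
        {
            uint32_t s0 = rotr(w[i-15], 7) ^ rotr(w[i-15], 18) ^ (w[i-15] >> 3);
            uint32_t s1 = rotr(w[i-2], 17) ^ rotr(w[i-2], 19) ^ (w[i-2] >> 10);
            w[i] = w[i-16] + s0 + w[i-7] + s1;
        }
        uint32_t v[8]; // working variables a..h
        for (int i = 0; i < 8; ++i) v[i] = h[i];
        for (int i = 0; i < 64; ++i)
        {
            uint32_t S1 = rotr(v[4], 6) ^ rotr(v[4], 11) ^ rotr(v[4], 25);
            uint32_t ch = (v[4] & v[5]) ^ (~v[4] & v[6]);
            uint32_t t1 = v[7] + S1 + ch + K[i] + w[i];
            uint32_t S0 = rotr(v[0], 2) ^ rotr(v[0], 13) ^ rotr(v[0], 22);
            uint32_t maj = (v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]);
            uint32_t t2 = S0 + maj;
            for (int j = 7; j > 0; --j) v[j] = v[j-1]; // h=g, g=f, ..., b=a
            v[4] += t1;      // e = d + t1 (v[4] holds the old d here)
            v[0] = t1 + t2;  // a = t1 + t2
        }
        for (int i = 0; i < 8; ++i) h[i] += v[i];
    }

    std::vector<uint8_t> digest;
    for (int i = 0; i < 8; ++i)
        for (int s = 24; s >= 0; s -= 8) digest.push_back((uint8_t)(h[i] >> s));
    assert(digest.size() == 32);
    return digest;
}

// Standard base64 with '=' padding.
std::string base64(const std::vector<uint8_t>& in)
{
    static const char tbl[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 3)
    {
        uint32_t n = (uint32_t)in[i] << 16;
        if (i + 1 < in.size()) n |= (uint32_t)in[i + 1] << 8;
        if (i + 2 < in.size()) n |= in[i + 2];
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += (i + 1 < in.size()) ? tbl[(n >> 6) & 63] : '=';
        out += (i + 2 < in.size()) ? tbl[n & 63] : '=';
    }
    return out;
}

// The value that goes into script-src or style-src, quotes included.
std::string cspHashSource(const std::string& inlineCode)
{
    return "'sha256-" + base64(sha256(inlineCode)) + "'";
}
```

The string passed to cspHashSource must be exactly the text between the opening and closing tags, including every newline and space, or the browser's hash will not match.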

Nonce versus Cryptographic Hash

Given the choice between a nonce and a cryptographic hash, which is the preferred approach? A hash has to be recalculated whenever the code is updated, while a nonce requires a carefully designed random generation policy to ensure that it is not easily guessable.

Cryptographic Hashing for External JS and CSS

The same cryptographic hashing approach can be applied to external JavaScript and CSS files with Subresource Integrity (SRI). Subresource Integrity is a security feature that enables browsers to verify that fetched resources (for example, from a CDN) are delivered without modification. To use SRI, compute the hash of the file, encode it in base64 and add it to the integrity attribute of the script or link tag. Remember to enable the require-sri-for directive for JS or CSS respectively, as shown below:

Content-Security-Policy: require-sri-for script;

<script src="https://mysite.com/example.js"
        integrity="sha256-…"
        crossorigin="anonymous"></script>

Content-Security-Policy: require-sri-for style;

<link href="https://mysite.com/example.css" rel="stylesheet" type="text/css" 
        integrity="sha256-tbqu6h2Qu6rhJtNtkUI6XbYtkzEby9zQFP4DlGIqYdQ="
        crossorigin="anonymous">

When the script or stylesheet doesn’t match its integrity value, the browser shall refuse to execute the script or apply the stylesheet.

unsafe-eval

Sometimes a legacy library that cannot be easily modified uses eval() to generate JavaScript code dynamically. In this case, the resolution is either to replace the library, if feasible, or, as a last resort, to allow dynamic JavaScript code through the unsafe-eval keyword.

Clickjacking Prevention

Clickjacking is a malicious technique of tricking a user into clicking on something different (usually invisible) from what the user can see. For instance, a web page is overlaid with an iframe whose opacity is set to zero; when users click what looks like a legitimate link, they are, unbeknownst to them, clicking a link or button in that invisible iframe. CSP 2 introduces the frame-ancestors directive to whitelist the URL(s) permitted to embed your webpage.

Upgrade Requests from HTTP to HTTPS

By setting the upgrade-insecure-requests directive, the web browser is instructed to fetch all resources using the HTTPS scheme. Another directive, block-all-mixed-content, forbids loading any assets using HTTP when the page is loaded using HTTPS. In practice, you only need to set either upgrade-insecure-requests or block-all-mixed-content, but not both.

Zero Risk CSP: Report-Only

There is an inherent risk in CSP’s whitelisting approach: a legitimate source of content may be overlooked and omitted from the CSP, causing some functionality to break. This is simply unacceptable. In CSP 2, enforcement can be turned off and switched to report-only mode by renaming Content-Security-Policy to Content-Security-Policy-Report-Only; remember to add the report-uri and report-to directives for the report destination. Note that report-uri and report-to can also be added to a normal, violation-blocking Content-Security-Policy.

Why is there a need to specify two directives that point to the same report destination? To keep a long story short, report-uri has been renamed to report-to in CSP 3, but at the time of writing, no web browser supports the report-to directive yet. To future-proof your CSP, it is better to specify report-to in addition to report-uri. A point for developers to note is that these report directives are not supported in the <meta> element, meaning they have to be specified in the CSP response header. Violation reports are sent in JSON format via the HTTP POST method. A violation report can mean one of two things: a trusted source has not been whitelisted, or the webpage is under an XSS attack.

 

 

Bring Your C++ Code to the Web

 

Introduction

asm.js, WebAssembly’s predecessor, is a low-level subset of JavaScript to which C/C++ and Rust code can be compiled to run in the web browser. The unveiling of WebAssembly in March 2017 enabled code, native and managed alike, to be compiled down to a binary instruction format for a stack-based virtual machine, making possible performance improvements ranging from a modest 20% to 600% over JavaScript, a feat undreamed of a decade ago. Today’s article provides step-by-step instructions for installing the Emscripten toolset and getting a “Hello World” program running in the web browser of your choice in less than 20 minutes. Your timing may vary depending on your internet speed and quality.

Installing Windows Subsystem For Linux

In order to use Emscripten, we need to install Windows Subsystem for Linux (WSL), a feature available only on Windows 10, which rules out older Windows versions. If you are an adept Linux user, you are welcome to use your favourite Linux distribution instead. You can also download the Emscripten Windows installer if you do not want to go through the hassle of installing a Linux OS. The Windows installers never worked for me, but that was a few years ago and the situation may have improved, so there is no harm in trying one before going the Linux route. The Windows installers are usually a few versions behind the latest release, which may be fine if you do not need to live on the bleeding edge.

Why WSL?

Other virtual machine options include Oracle’s VirtualBox and Microsoft’s Hyper-V. VirtualBox may offer only 32-bit guest OSes (this happens when hardware virtualization is not available to it). The Emscripten toolchain can only be built with more than 4GB of RAM (more precisely, the linking step needs it, not compilation). More than 4GB of RAM means a 64-bit OS, which rules out a 32-bit-only VirtualBox guest. Hyper-V supports 64-bit guests but only comes with Windows 10 Pro, whereas home users typically have the Windows Home edition. That makes WSL the most attractive choice for our case.

Enabling WSL on Windows 10

Before we can install Ubuntu from the Microsoft Store, we must first enable Developer mode and WSL. To enable Developer mode, head to Settings > Update & Security > For Developers and select “Developer mode”.

dev_mode

WSL itself can be enabled either from PowerShell or from Windows Features. For the PowerShell route, launch PowerShell as administrator, type the following and press the Enter key.

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

For the Windows Features route, type “Turn Windows Features On or Off” in the Windows search bar and select that option. Scroll all the way down and check the “Windows Subsystem for Linux” option as shown.

enable_wsl

Installing Ubuntu 18.04

Next, launch Microsoft Store by clicking its button on the taskbar. Search for “Ubuntu” and click “Install” on the Ubuntu 18.04 in the search result. It takes about 5 minutes.

win_store_button

Ubuntu First Launch Important Notes

After Ubuntu is installed, Windows asks you to launch it. Stop! Do not touch the Launch button! You have to launch Ubuntu with administrative rights by right-clicking it and choosing “Run as administrator”. After the message “Installing, this may take a few minutes…” appears and disappears, Ubuntu prompts you for a username and password during the first launch. Now we can start installing Emscripten.

Installing Emscripten

Updating Ubuntu Packages and Installing Python 2.7

Since this is a fresh Ubuntu installation, we have to update the packages prior to installing Python. Run these three commands. They may take quite a while, but do not cancel midway.

sudo apt update
sudo apt upgrade
sudo apt install python2.7 python-pip

You may have to install Git. Run the command below:

sudo apt-get install git-core

Next, clone the Emscripten SDK (emsdk) repository from GitHub.

git clone https://github.com/emscripten-core/emsdk.git

Finally, we are ready for Emscripten installation. Run the commands below to download and install a precompiled toolchain.

# change to the newly cloned emsdk directory.
cd emsdk

# Download and install the latest SDK tools.
./emsdk install latest

# Set up the compiler configuration to point to the "latest" SDK.
./emsdk activate latest

# Activate PATH and other environment variables in the current terminal
source ./emsdk_env.sh

The last step has to be run in every new Ubuntu terminal session to set the environment variables and path before you use the Emscripten toolset.

Compiling the Toolchain for Unsupported Linux Distributions or Just for Fun

Readers with unsupported Linux distributions can build the Emscripten toolchain with these commands. The build can take up to 3 hours depending on your storage type (SSD or HDD), number of CPU cores and amount of RAM.

git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
./emsdk install --build=Release sdk-incoming-64bit binaryen-master-64bit
./emsdk activate --build=Release sdk-incoming-64bit binaryen-master-64bit

Hello World Programs

In this section, we build a C “hello world” program followed by the C++ version and compare their output file sizes.

Hello World C Program

This is the C source we use. There is a quirk with this program that I have to let you know about: the printf() format string must end with a newline, which serves as a flag to flush the console output, or else you get no output.

#include <stdio.h>

int main() 
{
  printf("Hello, world!\n");
}

We saved the source in hello.c. This is the command to build hello.c. It takes about 5 minutes as it builds the required C static libs, which have the bc file extension. Subsequent builds should be snappy.

emcc hello.c -s WASM=1 -o hello.html

The outputs of the compilation are hello.html, hello.js and lastly hello.wasm, which is the WebAssembly file.

Hello World C++ Program

#include <iostream>

int main() 
{
  std::cout << "Hello, world!" << std::endl;
}

We saved the C++ source in hello2.cpp. This is the command to build hello2.cpp. As with the C version, it takes a while to build the C++ static libs.

emcc hello2.cpp -s WASM=1 -o hello2.html

The outputs of the compilation are hello2.html, hello2.js and lastly hello2.wasm. Single file compilation is fine for one file. To build multiple source files with the make command, just open your existing Makefile in a text editor and replace all the “gcc” or “g++” occurrences with “emcc”. The C++ output files are 400KB larger than the C ones due to template bloat brought in merely by including the iostream header.

file_size

To view the HTML, open Visual Studio 2019, create an ASP.NET project and add the HTML, JavaScript and wasm files just mentioned to the newly created project. Right-click hello.html to view it in the browser. Do the same for hello2.html. The Visual Studio 2017 development web server has problems serving a wasm file, so use VS2019 or another web server.

For other web servers, you may have to add the MIME type for WebAssembly (Content-Type: application/wasm) to the configuration file if it hasn’t been added already. The exact method differs for each web server, so be sure to check its manual or instruction guide. If you do not wish to set the MIME type and still want to see the HTML output, try setting WASM=0 during the emcc build to generate asm.js code. An asm.js file is a legitimate JavaScript file that all web servers have no problem serving. asm.js is in maintenance mode, meaning all the new shiny features shall only come to WebAssembly.

For the next 2 sections, we explore C/C++ interaction with JavaScript.

Calling JavaScript Code from C/C++

To call JavaScript from C++, put your JavaScript code within EM_ASM(). Since EM_ASM() isn’t legal C++ code, you have to guard the code with a macro to stop other C++ compilers from parsing it. In my case, I declare __EMSCRIPTEN__ in my Emscripten Makefile (emcc also predefines this macro on its own). The example below finds the audio element with the id MyMusic and calls its play method to play the MP3.

#ifdef __EMSCRIPTEN__
EM_ASM(
    document.getElementById("MyMusic").play(); 
);
#endif

To pass arguments to the EM_ASM JavaScript snippet, call EM_ASM_ (with an underscore suffix) with your arguments: $0 is the placeholder for the first argument, $1 for the second and so on.

EM_ASM_({
    console.log('I received: ' + $0);
}, 100);

To return an integer or double value from a JavaScript snippet, call EM_ASM_INT or EM_ASM_DOUBLE.

int x = EM_ASM_INT({
    console.log('I received: ' + $0);
    return $0 + 1;
}, 100);
printf("%d\n", x);

Calling C/C++ Code from JavaScript

In order for JavaScript to call a C function, you have to export the C function during compilation and call Module.cwrap() in JavaScript to put it in a callable wrapper. Below is the signature of gen_enum_conv. Since all C++ compilers mangle (change) function names, we must declare extern "C" before the function signature so that the name remains intact for JavaScript to find it.

extern "C" const char* gen_enum_conv(const char* cs);

This is the Makefile with the exported functions main and gen_enum_conv. Both names are preceded by an underscore, which is a naming convention.

CC=emcc
SOURCES:=~/EnumStrConv.cpp
SOURCES+=~/ParseEnum.cpp
LDFLAGS=-O2 --llvm-opts 2
OUTPUT=~/EnumConvGen.html
EMCC_DEBUG=1

all: $(SOURCES) $(OUTPUT)

$(OUTPUT): $(SOURCES) 
	$(CC) $(SOURCES) --bind -s NO_EXIT_RUNTIME=1 \
	    -s EXPORTED_FUNCTIONS="['_main', '_gen_enum_conv']" \
	    -s ALLOW_MEMORY_GROWTH=1 -s DEMANGLE_SUPPORT=1 -s ASSERTIONS=1 \
	    -D__EMSCRIPTEN__ -std=c++11 $(LDFLAGS) -o $(OUTPUT)

clean:
	rm $(OUTPUT)

The JavaScript code to wrap this function and call it with a button click is shown below. To peruse more of the code, please go to the EnumConvGen GitHub repository. EnumConvGen is a C++ project that generates C++ enum-to-string conversion functions and vice versa.

var gen_enum_conv_func;
function btnClick()
{
    var input_str = document.getElementById("InputTextArea").value;
    document.getElementById("OutputTextArea").value = gen_enum_conv_func(input_str);
}
$( document ).ready(function() {
    gen_enum_conv_func = Module.cwrap('gen_enum_conv', 'string', ['string']);
    $('[name="GenButton"]').click(btnClick);
});

What About Calling C++ Member Function?

What I have just shown you is calling a C function. What about calling a C++ member function from JavaScript? You have to use embind to accomplish that; refer to its documentation on how to do that. I do not use embind myself because of its complexity. What I usually do is encapsulate all my C++ calls inside a C function body. I haven’t encountered a situation that specifically calls for embind.

Cross Platform

Emscripten is restricted to calling portable functions from the C++ Standard Library and the C/C++ libraries ported to Emscripten (see the list below). OS-specific functions such as Win32 and functions containing assembly code are out of the question. The libraries, mainly in the areas of graphics, audio, networking and fonts, are geared towards enabling game programming, as you can see.

  • SDL2
  • regal
  • HarfBuzz
  • SDL2_mixer
  • SDL2_image
  • Cocos2d
  • FreeType
  • asio
  • SDL2_net
  • SDL2_ttf
  • Vorbis
  • Ogg
  • Bullet
  • libpng
  • zlib

That’s all, folks! Stay tuned for the second installment of the “Bring your XXX” article series! Meanwhile, have fun with WebAssembly!

C++23: fullptr to replace nullptr

Coming in C++ 2023 and detailed in the AF0401 proposal, fullptr_t is fully recommended as the invalid pointer type to replace the current nullptr_t, which was first standardized in C++11.

nullptr is defined as a pointer value with all its bits set to zero while fullptr has all its bits set to one (the address of the last addressable byte). nullptr is an invalid unused address at address zero, and any memory access to this particular address causes a running process to die from a segmentation fault. This wastes memory, as memory allocation typically starts a few million addresses above zero. In a process running on a 32-bit operating system, the entire addressable 4GB of memory cannot be fully utilized because of this limitation. Adding up all the underutilized memory from every process running in the system, the total is sizable.

This is where fullptr comes into the picture: fullptr is used for invalid pointer checks but is also fully addressable. Let me give you an example. Assuming that on a 32-bit platform fullptr is defined to be at 0xFFFFFFFF, if an integer is located at address 0xFFFFFFFC, then its 4 bytes occupy 0xFFFFFFFC to 0xFFFFFFFF. Since fullptr is fully addressable, it is okay to read/write the integer.

int* a = new int;

a can be accessed when its address is at 0xFFFFFFFC.

char* b = new char[4];

To illustrate my point of a fully addressable fullptr, b[3] can be accessed when b’s address is at 0xFFFFFFFC, though the address of b[3] is fullptr.

Q: How can a valid address be used for invalidation checks?

A: The maximum memory fullptr can hold is exactly 1 byte, so that excludes the 99.9999% of objects and arrays that are larger than 1 byte. Moreover, all memory allocations have to be boundary aligned.

Q: What if I allocate exactly 1 byte? I am adamant (that) you answer this!!

A: It beats me why anyone wants to allocate 1 byte. Don’t worry, I have you fully covered. The memory allocator must take care never to allocate 1 byte at the last address location. Allocating at the last byte address is plainly undoable for a sophisticated allocator that pads canaries at both ends of the memory for buffer overrun detection.

In making C++ a better language, fullptr is fully poised to replace nullptr without objection from the C++ committee. Bjarne Stroustrup has already given his nod of approval. Resistance is futile.  Exterminate nullptr and embrace fullptr!

bjarne

“fullptr is unquestionably the most revolutionary improvement to C++ since its creation.” – Bjarne Stroustrup

herb_sutter

“C++ committee gets overwhelming requests from developers to backport fullptr to C++98/11/14/17; I’m putting metaclasses work on hold for this.” – Herb Sutter

andrei

“fullptr is the finest thing to come to C++ since typelist.” – Andrei Alexandrescu

scott_meyers

“fullptr makes me want to come out of C++ retirement, back to teaching C++.” – Scott Meyers

dennis_ritchie

Dennis Ritchie is turning in his grave and vows to get fullptr into C language by hook or by crook.

anders

“C# (is) not to be outdone. C# 10 to get fullptr in 2022 before C++23” – Anders Hejlsberg

walter_bright

“C++ has finally upped the ante! I am ditching D language and pledging full allegiance to C++ for fullptr.” – Walter Bright

Before anyone attempts to mount any rigorous effort to refute my points, take a good look at the date of this blog post first.

Disclaimer: None of the abovementioned persons made the comments listed in this blog. The comments are fully fictitious.

For those who like this 2019 April Fools’ post, be sure to check out the 2020 April Fools’ post: C++23: Mutable string_view