C++: Size Matters in Platform Compatibility

Introduction

For file storage and data communication to be interoperable, the width of each data type must stay the same across platforms. This tip discusses the pitfalls of platform-dependent data widths and how to avoid them. Endianness, which deserves a tip of its own, is not covered here.

time_t

time_t stores the number of seconds since 1st January 1970 (UTC). On 32-bit Linux it has traditionally been a 32-bit integer, which overflows in January 2038, the Linux equivalent of the Y2K crisis; on 64-bit Linux it is 64-bit. On modern Visual C++, time_t is 64-bit by default on both x86 and x64. Because its width is not guaranteed to match between platforms, it is best to store time as text and convert it to time_t on loading.
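As a minimal sketch of the "store time as text" advice, the helpers below write a time_t as an ISO 8601 UTC string and read it back into a 64-bit count of seconds. The function names (to_iso8601_utc, from_iso8601_utc, days_from_civil) are made up for this illustration, and the choice of ISO 8601 is an assumption; any unambiguous text format would do. The parser does not validate field ranges.

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Days between 1970-01-01 and the given proleptic Gregorian date.
// Avoids the non-standard timegm()/_mkgmtime() functions.
std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= (m <= 2);  // treat January and February as part of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Format a time_t as text, e.g. "1970-01-01T00:00:00Z".
// Returns an empty string if the value cannot be converted.
std::string to_iso8601_utc(std::time_t t) {
    const std::tm* utc = std::gmtime(&t);  // note: not thread-safe
    if (utc == nullptr) return std::string();
    char buf[32];
    if (std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc) == 0) return std::string();
    return buf;
}

// Parse the text back into seconds since the epoch, held in a fixed 64-bit type.
// The caller narrows to time_t only after checking that the value fits.
bool from_iso8601_utc(const std::string& s, std::int64_t& seconds) {
    int y = 0, mo = 0, d = 0, h = 0, mi = 0, se = 0;
    if (std::sscanf(s.c_str(), "%d-%d-%dT%d:%d:%dZ", &y, &mo, &d, &h, &mi, &se) != 6)
        return false;
    seconds = days_from_civil(y, static_cast<unsigned>(mo), static_cast<unsigned>(d)) * 86400
            + h * 3600 + mi * 60 + se;
    return true;
}
```

Parsing into int64_t rather than directly into time_t means a timestamp past 2038 is read correctly even on a platform whose time_t is still 32-bit; the narrowing, and any range check, happens explicitly at the call site.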

wchar_t

wchar_t, the type used to hold Unicode characters, is 16 bits wide (UTF-16) on Windows but 32 bits wide (UTF-32) on Linux/macOS, so the two are incompatible with each other. A UTF-16 character takes 2 or 4 bytes depending on its code point, while a UTF-32 character always takes 4 bytes, which is a colossal waste of memory since most commonly used Unicode characters can be expressed in 2 bytes. UTF-8 uses 1 byte for ASCII and 2 to 4 bytes for all other characters. For interoperability between Windows and other OSes, the solution is to store the text in UTF-8 and convert to wchar_t upon loading. Another solution is to use the fixed-width character types char16_t and char32_t introduced in C++11.
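One way to do the "store as UTF-8, convert upon loading" step is sketched below. The function names utf8_to_utf32 and utf32_to_wide are hypothetical, and the decoder is deliberately simplified: malformed bytes become U+FFFD, but overlong encodings and out-of-range values are not rejected, so treat it as an illustration rather than a validating converter.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into UTF-32 code points (char32_t has the same width everywhere).
std::u32string utf8_to_utf32(const std::string& in) {
    const char32_t replacement = static_cast<char32_t>(0xFFFD);
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(replacement); ++i; continue; }  // stray byte

        bool ok = (i + extra < in.size());  // false if the sequence is truncated
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;   // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(replacement); ++i; }
    }
    return out;
}

// Convert code points to the platform's wchar_t: copied as-is where wchar_t is
// 32-bit, split into UTF-16 surrogate pairs where it is 16-bit.
std::wstring utf32_to_wide(const std::u32string& in) {
    std::wstring out;
    for (char32_t cp : in) {
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With these, the same UTF-8 bytes loaded from a file yield a one-element wide string for an emoji on Linux/macOS and a two-element surrogate pair on Windows, which is exactly the difference that makes raw wchar_t data unsafe to exchange.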

Integer Types

size_t and its signed counterpart, ptrdiff_t, vary in width between x86 and x64 platforms and should always be avoided in storage and communication packets. Types of unspecified width, such as long, should be avoided as well. Use the fixed-width integer types introduced in C++11, such as uint32_t and int32_t.
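To illustrate, here is a small sketch of a record header built only from fixed-width types, with a checked conversion from the in-memory size_t to the stored 32-bit field. RecordHeader, its fields, checked_size and make_header are invented for this example and do not come from any particular file format. Byte order is left aside, as in the rest of this tip.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical on-disk header: every field has the same width on x86 and x64.
struct RecordHeader {
    std::uint32_t payload_size;  // NOT size_t
    std::int32_t  version;       // NOT long
    std::uint64_t record_id;
};

// Fail the build if a compiler inserts padding and changes the layout.
static_assert(sizeof(RecordHeader) == 16, "RecordHeader must be exactly 16 bytes");

// Narrow an in-memory size to the stored 32-bit field, refusing values
// that would be silently truncated.
std::uint32_t checked_size(std::size_t n) {
    if (n > std::numeric_limits<std::uint32_t>::max())
        throw std::length_error("payload too large for 32-bit size field");
    return static_cast<std::uint32_t>(n);
}

RecordHeader make_header(const std::vector<char>& payload, std::uint64_t id) {
    RecordHeader h;
    h.payload_size = checked_size(payload.size());
    h.version = 1;
    h.record_id = id;
    return h;
}
```

size_t stays the natural type for sizes inside the program; the fixed-width type is used only at the boundary where data leaves the process.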

Pointer Types

Pointer width varies between x86 and x64 modes. Pointers are sometimes used as an opaque index or identity; the Windows SDK's DWORD_PTR is one such example. Because memory addresses are distinct, a pointer-derived identity may be temporarily stored in a database, file storage or network packets. The problem arises when code originally built for x86 is recompiled in x64 mode: the 64-bit value is truncated when it is written to a 32-bit field, such as a database column. If this has to be done, use the largest pointer width as the data width. Otherwise, it is best to derive the identity through other means, such as a GUID or truly random number generation.
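If a pointer-derived identity really must be stored, a sketch of the "largest pointer width" approach is to always widen it to 64 bits. The helper names pointer_to_id and id_to_pointer are hypothetical, and the code assumes a platform that provides uintptr_t, which is optional in the standard but present on mainstream x86 and x64 compilers.

```cpp
#include <cstdint>

// Refuse to build on a platform whose pointers would not fit in 64 bits.
static_assert(sizeof(std::uintptr_t) <= sizeof(std::uint64_t),
              "pointers are wider than 64 bits on this platform");

// Widen a pointer to a 64-bit identity, the same stored width in x86 and x64 builds.
std::uint64_t pointer_to_id(const void* p) {
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
}

// Recover the pointer. Only meaningful inside the process that produced the id,
// and only while the object is still alive.
const void* id_to_pointer(std::uint64_t id) {
    return reinterpret_cast<const void*>(static_cast<std::uintptr_t>(id));
}
```

The stored column or packet field is then declared as a 64-bit integer from the start, so recompiling in x64 mode does not change the data layout.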

 
