Bring Your Animations to H264/HEVC Video

 

Introduction

Last year, I introduced a single-header, Windows-based software video encoder for OpenGL that works on Windows 7 and above. See the video demo above! I have since decoupled it from the OpenGL thread and made it simpler to encode 2D frames: all you need to do is fill in the frame buffer, frame by frame, to create your animations. In this article, I use GDI+ since I am most familiar with it, but you are welcome to use your favourite graphics library; the video encoder is not coupled to GDI+. The HEVC codec used to come bundled with Windows 10, but Microsoft has since removed it and put it on sale in the Microsoft Store. That HEVC codec also has a quality issue: a higher bitrate has to be given to maintain the same quality as H264-encoded video. Make sure the video file is not open or locked by a video player before you begin writing to it. The new H264Writer constructor is as follows:

H264Writer(const wchar_t* mp3_file, const wchar_t* dest_file, VideoCodec codec, 
    int width, int height, int fps, int duration /*in milliseconds*/, 
    std::function<bool(int, int, int, int, UINT32*)> renderFunction,
    UINT32 bitrate = 4000000);

The mp3_file parameter is an MP3 file path (which can be empty if you do not want any audio) and dest_file is the resultant video file. codec can be either H264 or HEVC. The width and height parameters are the video width and height. fps is the frames per second of the video, which I usually specify as 30 or 60. duration is the video duration in milliseconds, and can be set to -1 to indicate that the video duration should match the MP3's. renderFunction is the render method called for every frame. bitrate is the video bitrate in bits per second. Remember to set the bitrate higher for high-resolution video and for HEVC. The render function signature is as follows. The width and height are the video dimensions. fps is the frames per second, while frame_cnt is the frame count, which auto-increments on every frame. The pixels parameter is the one-dimensional array to be filled with your bitmap data. The return value should be false on a catastrophic error, upon which encoding is stopped.

bool render(int width, int height, int fps, int frame_cnt, UINT32* pixels);

Red Video

For our first example, I keep it simple. We just render a red video.

(Video demo: red video)

This is the main function whereby H264Writer.h is included and H264Writer is instantiated and Process() is called to encode the video. Process() calls the given renderFunction() which is renderRedImage().

#include "../Common/H264Writer.h"

bool renderRedImage(int width, int height, int fps, int frame_cnt, UINT32* pixels);

int main()
{
    std::wstring musicFile(L"");
    std::wstring videoFile(L"C:\\temp\\RedVideo.mp4");

    std::function<bool(int, int, int, int, UINT32*)> renderFunction = renderRedImage;
    H264Writer writer(musicFile.c_str(), videoFile.c_str(), 
                    VideoCodec::H264, 640, 480, 30, 5000, renderFunction);
    if (writer.IsValid())
    {
        if (writer.Process())
        {
            printf("Video written successfully!\n");
            return 0;
        }
    }
    printf("Video write failed!\n");
}

Below is the renderRedImage() body. It only renders when frame_cnt is zero, that is, on the first frame: since pixels remains unchanged between frames, there is no need to fill it again on every frame.

// render a red image once!
bool renderRedImage(int width, int height, int fps, int frame_cnt, UINT32* pixels)
{
    if (frame_cnt == 0)
    {
        for (int col = 0; col < width; ++col)
        {
            for (int row = 0; row < height; ++row)
            {
                int index = row * width + col;
                pixels[index] = 0xffff0000;
            }
        }
    }
    return true;
}

The pixel format is Alpha, Red, Green, Blue (ARGB). For example, if you want a blue video, just change the line to pixels[index] = 0xff0000ff;
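If juggling the hex literals is error-prone, a small helper (my own, not part of H264Writer.h) can pack the four channels into that layout:

```cpp
#include <cstdint>

typedef uint32_t UINT32; // stand-in for the Windows typedef

// Pack four 8-bit channels into the 0xAARRGGBB layout the encoder expects.
inline UINT32 MakeARGB(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((UINT32)a << 24) | ((UINT32)r << 16) | ((UINT32)g << 8) | b;
}
```

For instance, MakeARGB(0xff, 0x00, 0x00, 0xff) produces the opaque blue 0xff0000ff used above.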

One JPEG Video

For our second example, we load a JPEG image with GDI+ and render it once, that is, when frame_cnt is zero.

(Video demo: one-JPEG video)

Because we are using GDI+ now, we have to include the Gdiplus.h header, link against Gdiplus.lib, and initialize and shut down GDI+ with GdiplusStartup() and GdiplusShutdown() respectively. Otherwise, the main function is unchanged, except that renderFunction is now set to renderJPG.

#include "../Common/H264Writer.h"
#include <Gdiplus.h>
#pragma comment(lib, "gdiplus.lib")

bool renderJPG(int width, int height, int fps, int frame_cnt, UINT32* pixels);

int main()
{
    std::wstring musicFile(L"");
    std::wstring videoFile(L"C:\\temp\\JpgVideo.mp4");

    std::function<bool(int, int, int, int, UINT32*)> renderFunction = renderJPG;

    // Initialize GDI+ so that we can load the JPG
    Gdiplus::GdiplusStartupInput m_gdiplusStartupInput;
    ULONG_PTR m_gdiplusToken;

    Gdiplus::GdiplusStartup(&m_gdiplusToken, &m_gdiplusStartupInput, NULL);

    H264Writer writer(musicFile.c_str(), videoFile.c_str(), 
               VideoCodec::H264, 640, 480, 30, 10000, renderFunction);
    if (writer.IsValid())
    {
        if (writer.Process())
        {
            printf("Video written successfully!\n");
            Gdiplus::GdiplusShutdown(m_gdiplusToken);
            return 0;
        }
    }
    printf("Video write failed!\n");
    Gdiplus::GdiplusShutdown(m_gdiplusToken);
}

renderJPG() is straightforward for developers familiar with GDI+. It loads "yes.jpg" with the Bitmap class. bmp is a Bitmap with the same dimensions as the video. We fill bmp with black using FillRectangle(). Then we calculate the aspect ratios of the JPEG file and the video frame. If w_ratio_jpg is greater than w_ratio_bmp, the image is wider than the video, so you will see two horizontal black bars at the top and bottom of the video; otherwise, you will see two vertical black bars on the two sides. In other words, we render the image as large as possible within the video frame while maintaining its original aspect ratio. To get the bmp pixel pointer, we must call LockBits(), and call UnlockBits() afterwards when done. Notice that in the double for loop, the image is copied vertically flipped, so that it appears the right way up in the video output.
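The aspect-ratio branch can be isolated into a small sketch (FitKeepAspect and DestRect are my names, not part of the library) that computes the destination rectangle for either the letterbox or pillarbox case:

```cpp
struct DestRect { int x, y, w, h; };

// Compute the destination rectangle that fits a source image inside the video
// frame while preserving aspect ratio, mirroring the branch in renderJPG():
// wider images get horizontal bars, taller images get vertical bars.
DestRect FitKeepAspect(int frameW, int frameH, int imgW, int imgH)
{
    float frameRatio = frameW / (float)frameH;
    float imgRatio = imgW / (float)imgH;
    DestRect r;
    if (imgRatio >= frameRatio) // image wider: bars at top and bottom
    {
        r.w = frameW;
        r.h = (int)((frameW / (float)imgW) * imgH);
        r.x = 0;
        r.y = (frameH - r.h) / 2;
    }
    else // image taller: bars at left and right
    {
        r.h = frameH;
        r.w = (int)((frameH / (float)imgH) * imgW);
        r.y = 0;
        r.x = (frameW - r.w) / 2;
    }
    return r;
}
```

For a 640x480 frame, a 640x360 image fills the width and leaves 60-pixel bars at the top and bottom.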

// render a jpg once!
bool renderJPG(int width, int height, int fps, int frame_cnt, UINT32* pixels)
{
    using namespace Gdiplus;
    
    if (frame_cnt == 0)
    {
        Bitmap bmp(width, height, PixelFormat32bppARGB);
        Bitmap jpg(L"image\\yes.jpg", TRUE);
        Graphics g(&bmp);

        SolidBrush brush(Color::Black);
        g.FillRectangle(&brush, 0, 0, bmp.GetWidth(), bmp.GetHeight());

        float w_ratio_bmp = bmp.GetWidth() / (float)bmp.GetHeight();
        float w_ratio_jpg = jpg.GetWidth() / (float)jpg.GetHeight();

        if (w_ratio_jpg >= w_ratio_bmp)
        {
            int width2 = bmp.GetWidth();
            int height2 = (int)((bmp.GetWidth() / (float)jpg.GetWidth()) * jpg.GetHeight());
            g.DrawImage(&jpg, 0, (bmp.GetHeight() - height2) / 2, width2, height2);
        }
        else
        {
            int width2 = (int)((bmp.GetHeight() / (float)jpg.GetHeight()) * jpg.GetWidth());
            int height2 = bmp.GetHeight();
            g.DrawImage(&jpg, (bmp.GetWidth() - width2) / 2, 0, width2, height2);
        }

        BitmapData bitmapData;
        Rect rect(0, 0, bmp.GetWidth(), bmp.GetHeight());

        bmp.LockBits(
            &rect,
            ImageLockModeRead,
            PixelFormat32bppARGB,
            &bitmapData);

        UINT* pixelsSrc = (UINT*)bitmapData.Scan0;

        if (!pixelsSrc)
            return false;

        int stride = bitmapData.Stride >> 2;

        for (int col = 0; col < width; ++col)
        {
            for (int row = 0; row < height; ++row)
            {
                int indexSrc = (height-1-row) * stride + col;
                int index = row * width + col;
                pixels[index] = pixelsSrc[indexSrc];
            }
        }

        bmp.UnlockBits(&bitmapData);

    }
    return true;
}

Two JPEG Video

For the third example, we display the first image and slowly alpha-blend in the second image until it fully appears. You can see the effect in the video.

The main function is exactly the same as the previous one, except that renderFunction is set to render2JPG().

#include "../Common/H264Writer.h"
#include <Gdiplus.h>
#pragma comment(lib, "gdiplus.lib")

// render 2 jpg
bool render2JPG(int width, int height, int fps, int frame_cnt, UINT32* pixels);
inline UINT Alphablend(UINT dest, UINT source, BYTE nAlpha, BYTE nAlphaFinal);

int main()
{
    std::wstring musicFile(L"");
    std::wstring videoFile(L"C:\\temp\\TwoJpgVideo.mp4");

    std::function<bool(int, int, int, int, UINT32*)> renderFunction = render2JPG;

    // Initialize GDI+ so that we can load the JPG
    Gdiplus::GdiplusStartupInput m_gdiplusStartupInput;
    ULONG_PTR m_gdiplusToken;

    Gdiplus::GdiplusStartup(&m_gdiplusToken, &m_gdiplusStartupInput, NULL);

    H264Writer writer(musicFile.c_str(), videoFile.c_str(), 
                      VideoCodec::H264, 640, 480, 30, 3000, renderFunction);
    if (writer.IsValid())
    {
        if (writer.Process())
        {
            printf("Video written successfully!\n");
            Gdiplus::GdiplusShutdown(m_gdiplusToken);
            return 0;
        }
    }
    printf("Video write failed!\n");
    Gdiplus::GdiplusShutdown(m_gdiplusToken);
}

render2JPG() is almost identical to renderJPG(), except that it loads two JPEGs with the Bitmap class. The transparency stored in the alpha variable is zero (fully transparent) when the elapsed time is less than or equal to 1000 milliseconds, and 255 (fully opaque) when it is greater than or equal to 2000 milliseconds. Between 1000 and 2000 milliseconds, alpha is interpolated linearly. A note about frame_duration = 1000 / fps: it is imprecise because it is an integer division. For example, at 30 fps, 1000/30 gives 33 milliseconds, but 30 * 33 only yields 990 milliseconds, not the original 1000. Just to warn you, render2JPG() can take a long time, because it opens the two JPEG files and renders on every frame, unlike the previous two examples, which only render once on the first frame.
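The alpha ramp can be sketched on its own (FadeAlpha is a hypothetical helper, not in the source) using the same integer arithmetic, rounding quirk included:

```cpp
typedef unsigned char BYTE; // stand-in for the Windows typedef

// Alpha ramp used by render2JPG(): fully transparent up to 1000 ms, fully
// opaque from 2000 ms, linear in between. Uses the same integer
// frame_duration = 1000 / fps that the article warns is imprecise.
BYTE FadeAlpha(int frame_cnt, int fps)
{
    int frame_duration = 1000 / fps; // 33 ms at 30 fps, not exactly 1000/30
    int t = frame_cnt * frame_duration;
    if (t <= 1000)
        return 0;
    if (t >= 2000)
        return 255;
    return (BYTE)((t - 1000) * 255 / 1000);
}
```

At 30 fps, frame 30 computes to 990 ms and is still fully transparent, which is exactly the integer-truncation drift described above.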

// render 2 jpg
// This function takes a long time.
bool render2JPG(int width, int height, int fps, int frame_cnt, UINT32* pixels)
{
    using namespace Gdiplus;

    Bitmap bmp(width, height, PixelFormat32bppARGB);
    Bitmap bmp2(width, height, PixelFormat32bppARGB);
    // Warning JPG1 and JPG2 must have the same dimensions
    Bitmap jpg1(L"image\\first.jpg", TRUE);
    Bitmap jpg2(L"image\\second.jpg", TRUE);
    Graphics g(&bmp);
    Graphics g2(&bmp2);

    BYTE alpha = 0;
    int frame_duration = 1000 / fps;
    if (frame_cnt * frame_duration <= 1000)
        alpha = 0;
    else if (frame_cnt * frame_duration >= 2000)
        alpha = 255;
    else
        alpha = ((frame_cnt * frame_duration) - 1000) * 255 / 1000;

    float w_ratio_bmp = bmp.GetWidth() / (float)bmp.GetHeight();
    float w_ratio_jpg = jpg1.GetWidth() / (float)jpg1.GetHeight();

    SolidBrush brush(Color::Black);
    g.FillRectangle(&brush, 0, 0, bmp.GetWidth(), bmp.GetHeight());

    if (w_ratio_jpg >= w_ratio_bmp)
    {
        int width2 = bmp.GetWidth();
        int height2 = (int)((bmp.GetWidth() / (float)jpg1.GetWidth()) * jpg1.GetHeight());
        g.DrawImage(&jpg1, 0, (bmp.GetHeight() - height2) / 2, width2, height2);
        g2.DrawImage(&jpg2, 0, (bmp2.GetHeight() - height2) / 2, width2, height2);
    }
    else
    {
        int width2 = (int)((bmp.GetHeight() / (float)jpg1.GetHeight()) * jpg1.GetWidth());
        int height2 = bmp.GetHeight();
        g.DrawImage(&jpg1, (bmp.GetWidth() - width2) / 2, 0, width2, height2);
        g2.DrawImage(&jpg2, (bmp2.GetWidth() - width2) / 2, 0, width2, height2);
    }

    BitmapData bitmapData;
    BitmapData bitmapData2;
    Rect rect(0, 0, bmp.GetWidth(), bmp.GetHeight());

    bmp.LockBits(
        &rect,
        ImageLockModeRead,
        PixelFormat32bppARGB,
        &bitmapData);

    bmp2.LockBits(
        &rect,
        ImageLockModeRead,
        PixelFormat32bppARGB,
        &bitmapData2);

    UINT* pixelsSrc = (UINT*)bitmapData.Scan0;
    UINT* pixelsSrc2 = (UINT*)bitmapData2.Scan0;

    if (!pixelsSrc || !pixelsSrc2)
        return false;

    int stride = bitmapData.Stride >> 2;

    for (int col = 0; col < width; ++col)
    {
        for (int row = 0; row < height; ++row)
        {
            int indexSrc = (height - 1 - row) * stride + col;
            int index = row * width + col;
            pixels[index] = Alphablend(pixelsSrc2[indexSrc], pixelsSrc[indexSrc], alpha, 0xff);
        }
    }

    bmp.UnlockBits(&bitmapData);
    bmp2.UnlockBits(&bitmapData2);

    return true;
}

pixels[index] is determined by the Alphablend() function below:

inline UINT Alphablend(UINT dest, UINT source, BYTE nAlpha, BYTE nAlphaFinal)
{
    BYTE nInvAlpha = ~nAlpha;

    BYTE nSrcRed = (source & 0xff0000) >> 16;
    BYTE nSrcGreen = (source & 0xff00) >> 8;
    BYTE nSrcBlue = (source & 0xff);

    BYTE nDestRed = (dest & 0xff0000) >> 16;
    BYTE nDestGreen = (dest & 0xff00) >> 8;
    BYTE nDestBlue = (dest & 0xff);

    BYTE nRed = (nSrcRed * nAlpha + nDestRed * nInvAlpha) >> 8;
    BYTE nGreen = (nSrcGreen * nAlpha + nDestGreen * nInvAlpha) >> 8;
    BYTE nBlue = (nSrcBlue * nAlpha + nDestBlue * nInvAlpha) >> 8;

    return nAlphaFinal << 24 | nRed << 16 | nGreen << 8 | nBlue;
}
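Two details of the blend are worth noting, sketched below with a hypothetical single-channel helper of my own: for a BYTE, ~nAlpha equals 255 - nAlpha, and >>8 divides by 256 rather than 255, so even a full-alpha blend lands one step low (a source channel of 255 blends to 254). The error is at most one level per channel, which is invisible in a fading video.

```cpp
// Blend one 8-bit channel the same way Alphablend() does, to show the
// rounding behaviour of >>8 (divide by 256) versus an exact divide by 255.
int blend_channel(int src, int dst, int alpha)
{
    int invAlpha = (unsigned char)~alpha; // equals 255 - alpha for 0..255
    return (src * alpha + dst * invAlpha) >> 8;
}
```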

Text Animation Video

For our last example, we show two prerendered text images emerging from the middle of the video. See the video below for an example.

Because the main function essentially remains unchanged, I shall only show the renderFunction, named renderText(). A thin white rectangle expands progressively. The renderbmp variable is where the top half of bmp and the bottom half of bmp2 are composed. bmp is rendered with jpg1 progressively moving up, while bmp2 is rendered with jpg2 progressively moving down. jpg1 and jpg2 are misnomers, since the images loaded are actually PNGs; the Bitmap class can load both JPEG and PNG. JPEG is best for storing photographs, while PNG is best for illustrations.

// render text
bool renderText(int width, int height, int fps, int frame_cnt, UINT32* pixels)
{
    using namespace Gdiplus;

    Bitmap renderbmp(width, height, PixelFormat32bppARGB);

    Bitmap bmp(width, height, PixelFormat32bppARGB);
    Bitmap bmp2(width, height, PixelFormat32bppARGB);
    Bitmap jpg1(L"image\\Mandy.png", TRUE);
    Bitmap jpg2(L"image\\Frenzy.png", TRUE);
    Graphics render_g(&renderbmp);

    Graphics g(&bmp);
    Graphics g2(&bmp2);

    float rectProgress = 0.0f;
    float textProgress = 0.0f;
    float frame_duration = 1000.0f / fps;
    float total_duration = frame_cnt * frame_duration;

    SolidBrush brush(Color::Black);
    render_g.FillRectangle(&brush, 0, 0, width, height);
    g.FillRectangle(&brush, 0, 0, width, height);

    int rectHeight = 4;

    int rectWidth = (int)(width * 0.8f);
    if (total_duration >= 1000.0f)
        rectProgress = 1.0f;
    else
        rectProgress = total_duration / 1000.0f;


    if (total_duration >= 2000.0f)
        textProgress = 1.0f;
    else if (total_duration <= 1000.0f)
        textProgress = 0.0f;
    else
        textProgress = (total_duration - 1000.0f) / 1000.0f;

    g.DrawImage(&jpg1, (width - jpg1.GetWidth()) / 2, 
    (height / 2) - (int)(jpg1.GetHeight() * textProgress), jpg1.GetWidth(), jpg1.GetHeight());
    g.FillRectangle(&brush, 0, height / 2 - 4, width, height / 2 + 4);
    render_g.DrawImage(&bmp, 0, 0, width, height);

    g2.DrawImage(&jpg2, (width - jpg2.GetWidth()) / 2, 
    (int)((height / 2 - jpg2.GetHeight()) + (int)(jpg2.GetHeight() * textProgress)), 
    jpg2.GetWidth(), jpg2.GetHeight());
    g2.FillRectangle(&brush, 0, 0, width, height / 2 + 4);
    render_g.DrawImage(&bmp2, 0, height / 2 + 4, 0, height / 2 + 4, 
                       width, height / 2 - 4, Gdiplus::UnitPixel);

    SolidBrush whitebrush(Color::White);
    int start_x = (width - (int)(rectWidth * rectProgress)) / 2;
    int pwidth = (int)(rectWidth * rectProgress);
    render_g.FillRectangle(&whitebrush, start_x, 
                          (height - rectHeight) / 2, pwidth, rectHeight);

    BitmapData bitmapData;
    Rect rect(0, 0, width, height);

    renderbmp.LockBits(
        &rect,
        ImageLockModeRead,
        PixelFormat32bppARGB,
        &bitmapData);

    UINT* pixelsSrc = (UINT*)bitmapData.Scan0;

    if (!pixelsSrc)
        return false;

    int stride = bitmapData.Stride >> 2;

    for (int col = 0; col < width; ++col)
    {
        for (int row = 0; row < height; ++row)
        {
            int indexSrc = (height - 1 - row) * stride + col;
            int index = row * width + col;
            pixels[index] = pixelsSrc[indexSrc];
        }
    }

    renderbmp.UnlockBits(&bitmapData);

    return true;
}

The code is hosted on GitHub. Remember to copy the image folder into the Debug or Release folder before running the executable. Have fun converting your cool animations to H264/HEVC video to share with others and keep for posterity!
