C++: Quick and Dirty Log Extraction and Replay

The New Text Stream library is hosted on GitHub.

Log Extraction

Hackers can stay infiltrated and undetected for up to 200 days! Nowadays, hackers do not even bother to cover their tracks because most logs are never checked for unusual activity. This tip aims to help change that with the New Text Stream library. Unfortunately, I gave it the name New Text, and the name has stuck. Talk about bad naming! The New Text Stream library has write-read symmetry: the write and read operations are exact mirror images of each other. See the pseudo-code below, and the New Text Stream tutorial for more information.

// Writing
ostm << name << age;

// Reading
istm >> name >> age;

In this tip, we focus only on the input stream. For it to work, each log entry must fit on a single line, because the library parses one line at a time. The timestamp at the beginning of each log line must also be removed first, because it differs from line to line and cannot be matched. In the example below, the log records a new user registration, and UserName and DoB are extracted.

match() has a second parameter, process, which defaults to true. Set it to false if you are not sure whether the input line will match, so that no time is wasted processing (tokenizing) it.

The matching is not foolproof. For example, if ", DoB=" appears inside the UserName you are trying to extract, the second extraction (DoB) will fail. The principle is simple: for the first input ({0}), the input stream extracts whatever text lies between "NEW_USER UserName=" and ", DoB=", and the last string extraction ({1}) is whatever text follows ", DoB=". This method cannot extract a variable-length array; as a workaround, extract the array as a single string and split it on its delimiter.

try
{
    new_text::ifstream is;

    std::string log_line = "2020-01-16 18:23:56.020 NEW_USER UserName=Kelly, DoB=1999-01-02";
    log_line = log_line.substr(24); // get rid of the log timestamp
    is.str(log_line);
    if (is.match("NEW_USER UserName={0}, DoB={1}"))
    {
        std::string name = "";
        std::string date = "";
        is >> name >> date;
        std::cout << name << ":" << date << std::endl;
    }
}
catch (const std::exception& e)
{
    std::cout << "Exception thrown:" << e.what() << std::endl;
}

The output is as follows.

Kelly:1999-01-02
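
To make the matching principle concrete, here is a minimal sketch in plain standard C++. It is not how New Text Stream is implemented; extract_two is a hypothetical helper, and it assumes that the first occurrence of the separator ends the first field. It only illustrates why a separator that appears inside the user name spoils the extraction.

```cpp
#include <string>

// Hypothetical helper, NOT part of New Text Stream.
// Extracts the text between `prefix` and `separator` into `first`, and
// everything after `separator` into `second`. Returns false if the line
// does not have that shape.
bool extract_two(const std::string& line,
                 const std::string& prefix,
                 const std::string& separator,
                 std::string& first,
                 std::string& second)
{
    // The line must start with the literal prefix.
    if (line.compare(0, prefix.size(), prefix) != 0)
        return false;

    // Assumption: the first occurrence of the separator ends field {0}.
    const std::string::size_type sep = line.find(separator, prefix.size());
    if (sep == std::string::npos)
        return false;

    first = line.substr(prefix.size(), sep - prefix.size());
    second = line.substr(sep + separator.size());
    return true;
}
```

Called with "NEW_USER UserName=" as the prefix and ", DoB=" as the separator, the helper yields Kelly and 1999-01-02 for the log line above. If the user name itself were "Ke, DoB=lly", the first field would be cut short at "Ke" and the second field would no longer be a date, which is the kind of mismatch described earlier.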

Next, to split the date into year, month, and day, use the match pattern below. :t tells the library to trim the string afterward, and :02 tells New Text Stream that the number is always two digits and may carry a leading zero, which is trimmed away. Read the New Text Stream tutorial for more information.

try
{
    new_text::ifstream is;

    std::string log_line = "2020-01-16 18:23:56.020 NEW_USER UserName=Kelly, DoB=1999-01-02";
    log_line = log_line.substr(24); // get rid of the log timestamp
    is.str(log_line);
    if (is.match("NEW_USER UserName={0}, DoB={1:t}-{2:02}-{3:02}")) // Parse date in YYYY-MM-DD
    {
        std::string name = "";
        int year = 0, month = 0, day = 0;
        is >> name >> year >> month >> day;
        std::cout << name << ":" << year << "-" << month << "-" << day << std::endl;
    }
}
catch (const std::exception& e)
{
    std::cout << "Exception thrown:" << e.what() << std::endl;
}

The output is as follows.

Kelly:1999-1-2
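
For the variable-length array workaround mentioned earlier, the array can be extracted as one string and then split with standard C++. The sketch below assumes a hypothetical log field such as Roles=admin,editor,viewer whose value has already been extracted into a std::string; split is an illustrative helper, not a library function.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative helper: splits `text` on a single-character delimiter.
// Empty items in the middle are kept ("a,,b" gives "a", "", "b").
// An empty input gives an empty vector, and a trailing delimiter does
// not produce a trailing empty item.
std::vector<std::string> split(const std::string& text, char delimiter)
{
    std::vector<std::string> items;
    std::istringstream stream(text);
    std::string item;
    while (std::getline(stream, item, delimiter))
        items.push_back(item);
    return items;
}
```

For example, split("admin,editor,viewer", ',') returns three items. If the items are numbers, each one can then be converted with std::stoi or a similar function.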

Fuzzing

Fuzzing is a technique for finding runtime problems or vulnerabilities by trying to crash your application with unpredictable input on its attack surface. There are two types of fuzzing, black-box and white-box. Black-box fuzzing is truly random, and the fuzzer is not aware of the data type being fuzzed. White-box fuzzing does know the type: for example, if it knows the field is a date, it tries an invalid date such as a 13th month or the 30th of February. The number one frustration with fuzzing is that, most of the time, we do not have access to the random input that caused the crash, which leaves the developer unable to troubleshoot. We can log every random input and then replay the commands with the input extracted by New Text Stream. Remember to clear the logs after a successful fuzzing run so that they do not accumulate.
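
One way to make sure the crashing input is never lost is to write it to the log, and flush, before handing it to the code under test. The sketch below is only an illustration under assumptions: the FUZZ_DATE line format, the DateInput struct, and the function names are all hypothetical, and the generator is a crude white-box style one that deliberately produces out-of-range months and days.

```cpp
#include <ostream>
#include <random>
#include <sstream>
#include <string>

// Hypothetical input type for a function that takes a date.
struct DateInput { int year; int month; int day; };

// Generates dates whose month and day may fall outside the valid range
// (month 0 or 13, day 0 or 32, 30th of February, and so on).
DateInput random_date(std::mt19937& rng)
{
    std::uniform_int_distribution<int> year(1900, 2100);
    std::uniform_int_distribution<int> month(0, 13);
    std::uniform_int_distribution<int> day(0, 32);
    DateInput d;
    d.year = year(rng);
    d.month = month(rng);
    d.day = day(rng);
    return d;
}

// Formats the input as a single log line (hypothetical format) so that
// it can be matched and extracted again later.
std::string to_log_line(const DateInput& d)
{
    std::ostringstream out;
    out << "FUZZ_DATE Year=" << d.year
        << ", Month=" << d.month
        << ", Day=" << d.day;
    return out.str();
}

// Logs the input first, then calls the code under test. std::endl
// flushes the stream, so the line has been handed off by the log stream
// before the code under test gets a chance to crash.
template <typename Fn>
void fuzz_once(std::ostream& log, std::mt19937& rng, Fn function_under_test)
{
    const DateInput d = random_date(rng);
    log << to_log_line(d) << std::endl;
    function_under_test(d);
}
```

With a std::ofstream as the log, the last line of the file after a crash is the input that triggered it, and a pattern along the lines of "FUZZ_DATE Year={0}, Month={1}, Day={2}" could be used to read it back.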

Log Replay

When the application crashes at a customer site, we may not have the information needed to reproduce the crash. If we log each command and its input, we can replay them, provided the input is neither huge nor made up of complex composite data types. We do not have to log every function call: if you have an engine with 10,000 functions and only 100 of them are publicly accessible, you only need to implement logging for those 100. Log replay is much harder to implement for a GUI application that involves mouse clicks and keyboard input. However, if the GUI has an underlying engine that can be compiled and linked to run standalone, log replay can still be used on the standalone engine to debug the issue.

 
