- Primary Motivation
- Single Threaded Permutation
- Multi-Threaded Permutation
- Single Threaded Combination
- Multi-Threaded Combination

## Primary Motivation

My Github repository, Boost Concurrent Permutations and Combinations on CPU, has been up since January 2017. However, download is close to none while my older single threaded `next_combination`

has plenty of downloads. So I figured it must be the code usage is deemed too complex. Aim of this article is to show a very simple, albeit inefficient, multithreaded example to give users a guide. It should be relatively easy for user to come up with an efficient multithreaded code, once they understand the examples in this article. For each permutation and combination section, I am going to start off with a single threaded example then followed by a multi-threaded example.

## Single Threaded Permutation

First I’ll show you the example on using `next_permutation`

in single threaded scenario. The `while`

loop display `std_permuted`

until `next_permutation`

returned `false`

when `std_permuted`

is detected to be in descending order. That is “54321”.

void usage_of_next_perm() { std::string std_permuted = "12345"; do { std::cout << std_permuted << std::endl; } while (std::next_permutation(std_permuted.begin(), std_permuted.end())); }

Output is follows.

12345 12354 12435 12453 12534 12543 ... 54123 54132 54213 54231 54312 54321

## Multi-Threaded Permutation

Next, I’ll show you on how to finding permutation in multithreaded scenario. There is a total of 120 different permutations in collection of 5 elements. We’ll use 4 threads in my example, meaning each thread will find 120/4=30 permutations. To tell you the truth, each thread will find 30-1=29 permutations, because the 1st one is already computed by `find_perm_by_idx`

in the main thread and passed to the thread. To keep the discussion simple, we pretend each thread computes 30 permutations. 1st thread finds 0th to 29th permutations, while 2nd thread starts from 30th to 59th permutations, and 3rd thread begins from 60th to 89 and so on. From 2nd thread and 3rd thread to start from 30th and 60th permutation respectively, we need to make use of `find_perm_by_idx`

to find 30th and 60th permutation given their index, we’ll find the rest with `next_permutation`

. Techincally, it is possible to use `find_perm_by_idx`

to find all permutations by giving indices from 0 to 119 but it is simply not fleasible to do so, because `find_perm_by_idx`

is much much slower than `next_permutation`

. As a word of caution, `index_to_find`

must be of type that is large enough to store the factorial(n). For instance, you are just finding the 1st 10 permutations, `index_to_find`

type can store 0 to 10 but if it cannot store the factorial(n), `find_perm_by_idx`

will not generate the permutation correctly. This applies to `find_comb_by_idx`

as well. If your largest integer type is not large enough, you can consider using Boost Multiprecision Integer. And `vector_type`

can be any collection type that supports `push_back()`

and `size()`

methods, like `std::string`

. You can see I use `std::string`

in my example.

I join my theads before starting the next thread because `all_results`

is shared with all threads and it is not guarded with a lock, therefore not thread-safe. It is only meant to be a simple example on how to `find_perm_by_idx`

. You can see I call the the slow `find_perm_by_idx`

in the main thread. It is your job to try figure out on how to call `find_perm_by_idx`

in the worker threads.

template<typename int_type, typename vector_type> vector_type find_perm_by_idx(int_type index_to_find, vector_type& original_vector); void usage_of_perm_by_idx() { uint64_t index_to_find = 0; std::string original_text = "12345"; std::string std_permuted = "12345"; std::vector all_results; size_t repeat_times = 30; for (size_t i=0; i<4; ++i) { std::string permuted = concurrent_perm::find_perm_by_idx(index_to_find, original_text); std::thread th([permuted, &all_results, repeat_times] () mutable { // do your work on the permuted result, instead of pushing to vector all_results.push_back(permuted); for (size_t j=1; j<repeat_times; ++j) { std::next_permutation(permuted.begin(), permuted.end()); // do your work on the permuted result, instead of pushing to vector all_results.push_back(permuted); } }); th.join(); index_to_find += repeat_times; } }

I skip the output here, I have verified they are the same as generated by single threaded `next_permutation`

.

## Single Threaded Combination

Next, we have come to `next_combination`

section. To compute the total combination count, we use `compute_total_comb`

. `total`

must be of type that is large enough to store the factorial(n).

void usage_of_next_comb() { std::string original_text = "123456"; std::string std_combined = "123"; uint64_t total = 0; if(concurrent_comb::compute_total_comb(original_text.size(), std_combined.size(), total)) { std::cout << std_combined << std::endl; for (uint64_t i = 1; i < total; ++i) { stdcomb::next_combination(original_text.begin(), original_text.end(), std_combined.begin(), std_combined.end()); std::cout << std_combined << std::endl; } } }

Output is follows. If your collection is sorted like `original_text`

, all generated combination will be sorted as well, as you can see below.

123 124 125 126 134 135 136 145 146 156 234 235 236 245 246 256 345 346 356 456

## Multi-Threaded Combination

`index_to_find`

must be of type that is large enough to store the factorial(n) and `vector_type`

can be any collection type that supports `push_back()`

and `size()`

methods, like `std::string`

. Similar to `find_perm_by_idx`

mentioned above, it is possible to find all combinations with `find_comb_by_idx`

but it is much slower compared to `next_combination`

, so we resort to `next_combination`

to find the rest of combinations. There are total of 20 combinations, so each of the 4 thread will find 20/4=5 combinations. Actually, each thread will find 5-1=4 combination, because the 1st one is already computed by `find_comb_by_idx`

in the main thread and passed to the thread.

template<typename int_type, typename vector_type> vector_type find_comb_by_idx(const uint32_t subset, int_type index_to_find, vector_type& original_vector); void usage_of_comb_by_idx() { uint64_t index_to_find = 0; std::string original_text = "123456"; std::string std_combined = "123"; std::vector all_results; size_t repeat_times = 5; for (size_t i = 0; i < 4; ++i) { std::string combined = concurrent_comb::find_comb_by_idx(std_combined.size(), index_to_find, original_text); std::thread th([original_text, combined, &all_results, repeat_times] () mutable { // do your work on the combined result, instead of pushing to vector all_results.push_back(combined); for (size_t j = 1; j < repeat_times; ++j) { stdcomb::next_combination(original_text.begin(), original_text.end(), combined.begin(), combined.end()); // do your work on the combined result, instead of pushing to vector all_results.push_back(combined); } }); th.join(); index_to_find += repeat_times; } }

Output is skipped here, afterall I have verified they are the same as generated by single threaded `next_combination`

.