# C++: Simple Permutation and Combination Parallelism

## Primary Motivation

My GitHub repository, Boost Concurrent Permutations and Combinations on CPU, has been up since January 2017. However, it has close to no downloads, while my older single-threaded `next_combination` has plenty. So I figured the code must be seen as too complex to use. The aim of this article is to show a very simple, albeit inefficient, multithreaded example as a guide. Once you understand the examples here, it should be relatively easy to come up with efficient multithreaded code of your own. For each of the permutation and combination sections, I start with a single-threaded example and follow it with a multithreaded one.

First, I’ll show how to use `next_permutation` in a single-threaded scenario. The `do`-`while` loop displays `std_permuted` until `next_permutation` returns `false`, which happens when `std_permuted` is detected to be in descending order, that is, “54321”.

```cpp
#include <algorithm>
#include <iostream>
#include <string>

void usage_of_next_perm()
{
    std::string std_permuted = "12345";
    do
    {
        std::cout << std_permuted << std::endl;
    } while (std::next_permutation(std_permuted.begin(), std_permuted.end()));
}
```

The output is as follows.

```
12345
12354
12435
12453
12534
12543
...
54123
54132
54213
54231
54312
54321
```
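The listing starts at “12345” and stops at “54321” because `std::next_permutation` only moves forward in lexicographic order. The small sketch below illustrates this by counting how many permutations are visited from a given starting point; `count_permutations_from` is a hypothetical helper written for this article, not part of any library. This forward-only behaviour is what lets a thread start in the middle of the sequence and continue from there.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: counts the permutations visited
// when iterating with std::next_permutation, beginning at `start`.
std::size_t count_permutations_from(std::string start)
{
    std::size_t count = 0;
    do
    {
        ++count; // "visit" the current permutation
    } while (std::next_permutation(start.begin(), start.end()));
    return count;
}
```

Starting from the sorted string “12345” visits all 120 permutations, while starting from “54321” visits only that one, so a collection should be sorted first if every permutation is wanted.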

Next, I’ll show how to find permutations in a multithreaded scenario. A collection of 5 elements has 5! = 120 different permutations. My example uses 4 threads, so each thread finds 120/4 = 30 permutations. To tell the truth, each thread finds only 30-1 = 29 permutations, because the first one is already computed by `find_perm_by_idx` in the main thread and passed to the thread. To keep the discussion simple, we pretend each thread computes 30. The 1st thread finds the 0th to 29th permutations, the 2nd thread the 30th to 59th, the 3rd thread the 60th to 89th, and so on. For the 2nd and 3rd threads to start from the 30th and 60th permutations respectively, we use `find_perm_by_idx` to find the permutation at that index, then find the rest with `next_permutation`. Technically, it is possible to find all permutations with `find_perm_by_idx` by passing every index from 0 to 119, but that is simply not feasible, because `find_perm_by_idx` is much slower than `next_permutation`.

As a word of caution, the type of `index_to_find` must be large enough to store factorial(n). For instance, even if you are only finding the first 10 permutations, so that the indices you pass would fit in a small type, `find_perm_by_idx` will not generate the permutation correctly if that type cannot store factorial(n). This applies to `find_comb_by_idx` as well. If your largest integer type is not large enough, consider using Boost Multiprecision integers. `vector_type` can be any collection type that supports the `push_back()` and `size()` methods, such as `std::string`, which is what my example uses.

In the code below, I join each thread before starting the next one because `all_results` is shared by all threads and is not guarded with a lock, so it is not thread-safe. This means the threads actually run one after another; the code is only meant as a simple example of how to use `find_perm_by_idx`. You can also see that I call the slow `find_perm_by_idx` in the main thread. It is your job to figure out how to call `find_perm_by_idx` in the worker threads.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>
// The library's own header is needed as well.

// Signature of the library function (it lives in namespace concurrent_perm).
template<typename int_type, typename vector_type>
vector_type find_perm_by_idx(int_type index_to_find,
    vector_type& original_vector);

void usage_of_perm_by_idx()
{
    uint64_t index_to_find = 0;
    std::string original_text = "12345";
    std::vector<std::string> all_results;
    size_t repeat_times = 30;
    for (size_t i = 0; i < 4; ++i)
    {
        // Starting permutation for this thread, computed in the main thread
        std::string permuted = concurrent_perm::find_perm_by_idx(index_to_find, original_text);

        std::thread th([permuted, &all_results, repeat_times] () mutable {

            // do your work on the permuted result, instead of pushing to vector
            all_results.push_back(permuted);
            for (size_t j = 1; j < repeat_times; ++j)
            {
                std::next_permutation(permuted.begin(), permuted.end());
                // do your work on the permuted result, instead of pushing to vector
                all_results.push_back(permuted);
            }
        });
        // Joined immediately because all_results is not protected by a lock
        th.join();

        index_to_find += repeat_times;
    }
}
```

I skip the output here; I have verified that it is the same as the output generated by the single-threaded `next_permutation`.
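Because the example above joins each thread right away, the work is never actually overlapped. One possible way to let the threads run concurrently without a lock is to give each thread its own result bucket and join only after all threads have started. The sketch below is just that, a sketch: `perm_at` is a hypothetical stand-in for the library's `find_perm_by_idx` (so that the code compiles on its own), and `all_permutations_parallel` is likewise a name invented for this article. It assumes a sorted input without duplicates and that `total` equals n!.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for find_perm_by_idx: permutation at position `index`.
std::string perm_at(std::uint64_t index, std::string items)
{
    std::uint64_t block = 1; // becomes (n - 1)!
    for (std::size_t i = 2; i < items.size(); ++i) block *= i;
    std::string out;
    for (std::size_t left = items.size(); left > 0; --left)
    {
        const std::size_t pick = static_cast<std::size_t>(index / block);
        index %= block;
        out.push_back(items[pick]);
        items.erase(pick, 1);
        if (left > 1) block /= (left - 1); // step down to the next factorial
    }
    return out;
}

// Hypothetical driver: splits [0, total) into one index range per thread.
std::vector<std::string> all_permutations_parallel(const std::string& sorted_items,
                                                   std::uint64_t total,
                                                   unsigned thread_count)
{
    // One private bucket per thread, so no lock is needed.
    std::vector<std::vector<std::string>> buckets(thread_count);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < thread_count; ++t)
    {
        // Also copes with totals that do not divide evenly.
        const std::uint64_t begin = total * t / thread_count;
        const std::uint64_t end = total * (t + 1) / thread_count;
        workers.emplace_back([&buckets, &sorted_items, begin, end, t] {
            if (begin == end) return;
            // The slow index lookup now happens inside the worker thread.
            std::string permuted = perm_at(begin, sorted_items);
            buckets[t].push_back(permuted);
            for (std::uint64_t idx = begin + 1; idx < end; ++idx)
            {
                std::next_permutation(permuted.begin(), permuted.end());
                buckets[t].push_back(permuted);
            }
        });
    }
    for (std::thread& w : workers) w.join(); // join only after all have started

    // Concatenate the buckets in thread order to restore lexicographic order.
    std::vector<std::string> all_results;
    for (const auto& bucket : buckets)
        all_results.insert(all_results.end(), bucket.begin(), bucket.end());
    return all_results;
}
```

Calling `all_permutations_parallel("12345", 120, 4)` should give the same 120 strings as the single-threaded loop. On some platforms, the program has to be linked with the threading library (for example `-pthread` with GCC).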

Next, we come to the `next_combination` section. To compute the total number of combinations, we use `compute_total_comb`. The type of `total` must be large enough to store factorial(n).

```cpp
#include <cstdint>
#include <iostream>
#include <string>
// The library's own headers are needed as well.

void usage_of_next_comb()
{
    std::string original_text = "123456";
    std::string std_combined = "123";
    uint64_t total = 0;
    if (concurrent_comb::compute_total_comb(original_text.size(), std_combined.size(), total))
    {
        // The first combination is the initial value of std_combined
        std::cout << std_combined << std::endl;
        for (uint64_t i = 1; i < total; ++i)
        {
            stdcomb::next_combination(original_text.begin(), original_text.end(), std_combined.begin(), std_combined.end());
            std::cout << std_combined << std::endl;
        }
    }
}
```
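For 6 items taken 3 at a time, the loop above runs over 6!/(3!·3!) = 20 combinations. The sketch below shows one common way to compute such a count with the multiplicative formula. `n_choose_r` is a hypothetical helper written for this article, not the library's `compute_total_comb`, which may compute the count differently, so the sizing advice for `total` still applies when calling the library.

```cpp
#include <cstdint>

// Hypothetical helper (not the library's compute_total_comb):
// number of ways to choose r items out of n, order ignored.
std::uint64_t n_choose_r(std::uint64_t n, std::uint64_t r)
{
    if (r > n) return 0;
    if (r > n - r) r = n - r; // C(n, r) == C(n, n - r); use the smaller r

    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= r; ++i)
    {
        // Before this step result == C(n - r + i - 1, i - 1), so the
        // product below is always divisible by i and the division is exact.
        // The intermediate product can still overflow for large inputs.
        result = result * (n - r + i) / i;
    }
    return result;
}
```

With this helper, `n_choose_r(6, 3)` gives 20, matching the number of lines in the output of the example.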

The output is as follows. If your collection is sorted, like `original_text`, all generated combinations are sorted as well, as you can see below.

```
123
124
125
126
134
135
136
145
146
156
234
235
236
245
246
256
345
346
356
456
```

The type of `index_to_find` must be large enough to store factorial(n), and `vector_type` can be any collection type that supports the `push_back()` and `size()` methods, such as `std::string`. As with `find_perm_by_idx` above, it is possible to find all combinations with `find_comb_by_idx`, but it is much slower than `next_combination`, so we use `next_combination` to find the rest of the combinations. There are 20 combinations in total, so each of the 4 threads finds 20/4 = 5 combinations. Actually, each thread finds only 5-1 = 4 combinations, because the first one is already computed by `find_comb_by_idx` in the main thread and passed to the thread.

```cpp
#include <cstdint>
#include <string>
#include <thread>
#include <vector>
// The library's own headers are needed as well.

// Signature of the library function (it lives in namespace concurrent_comb).
template<typename int_type, typename vector_type>
vector_type find_comb_by_idx(const uint32_t subset,
    int_type index_to_find,
    vector_type& original_vector);

void usage_of_comb_by_idx()
{
    uint64_t index_to_find = 0;
    std::string original_text = "123456";
    std::string std_combined = "123";
    std::vector<std::string> all_results;
    size_t repeat_times = 5;
    for (size_t i = 0; i < 4; ++i)
    {
        // Starting combination for this thread, computed in the main thread
        std::string combined = concurrent_comb::find_comb_by_idx(std_combined.size(), index_to_find, original_text);

        std::thread th([original_text, combined, &all_results, repeat_times] () mutable {

            // do your work on the combined result, instead of pushing to vector
            all_results.push_back(combined);
            for (size_t j = 1; j < repeat_times; ++j)
            {
                stdcomb::next_combination(original_text.begin(), original_text.end(), combined.begin(), combined.end());
                // do your work on the combined result, instead of pushing to vector
                all_results.push_back(combined);
            }
        });
        // Joined immediately because all_results is not protected by a lock
        th.join();

        index_to_find += repeat_times;
    }
}
```

The output is skipped here; after all, I have verified that it is the same as the output generated by the single-threaded `next_combination`.
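For readers curious how an index can be mapped to a combination, here is a minimal sketch that walks the sorted collection and skips whole groups of combinations at a time. `kth_combination` and `choose` are hypothetical helpers written for this article; they are not the library's `find_comb_by_idx`, which may be implemented differently. The sketch assumes the input is sorted, has no duplicates, and that `index` is smaller than the total number of combinations.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: number of ways to choose r items out of n.
static std::uint64_t choose(std::uint64_t n, std::uint64_t r)
{
    if (r > n) return 0;
    if (r > n - r) r = n - r;
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= r; ++i)
        result = result * (n - r + i) / i; // exact at every step
    return result;
}

// Hypothetical helper (not the library's find_comb_by_idx): returns the
// combination of `subset` items at zero-based lexicographic position `index`.
// Precondition: index < choose(sorted_items.size(), subset).
std::string kth_combination(std::size_t subset,
                            std::uint64_t index,
                            const std::string& sorted_items)
{
    const std::size_t n = sorted_items.size();
    std::string result;
    std::size_t pos = 0; // next candidate position in sorted_items

    for (std::size_t need = subset; need > 0; --need)
    {
        for (;;)
        {
            // Combinations that use sorted_items[pos] as their next element:
            // choose the other (need - 1) items from the (n - pos - 1) after it.
            const std::uint64_t count = choose(n - pos - 1, need - 1);
            if (index < count) break; // the target lies within this group
            index -= count;           // skip the whole group
            ++pos;
        }
        result.push_back(sorted_items[pos]);
        ++pos;
    }
    return result;
}
```

With “123456” and a subset size of 3, indices 0, 5, 10 and 15 give “123”, “135”, “234” and “256”, which are the starting combinations of the 4 threads in the example above.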