# Balancing Partitions

I never liked Dynamic Programming.

In fact, in programming contests, where the pressure is always on, I would feel immensely stupid for not getting it. Anyway, after reading up a bit about how it works from the CLR, I ended up understanding it a bit better.

This post is about a problem given in this month’s USACO contest (Silver Division), called **divgold**, where you are tasked with solving the Balanced Partition problem. It’s a fairly well-known problem, but I was unfamiliar with it. You can read all the problems by viewing the January contest on the USACO Contestgate. You need to register though.

## Balanced-Partition

You are given $N$ numbers, let’s call them $A_1, A_2, \ldots, A_N$, and you must find the number of ways to partition these numbers into two groups $P_1$ and $P_2$ such that:

- $P_1 \cup P_2 = A$ and $P_1 \cap P_2 = \emptyset$;
- $|S_1 - S_2|$ is minimized, where $S_1 = \sum_{x \in P_1} x$ and $S_2 = \sum_{x \in P_2} x$.

And you must find the minimum difference itself, $m = \min |S_1 - S_2|$.

Let’s make some notes before actually talking about the solution (which – surprise! – involves dynamic programming). First of all, it is NP-Complete. This can of course be proven quite easily, and we will actually do it, since the proof contains the key idea that helps solve the problem.

**Warning! Theoretical stuff ahead. If you just want to learn how to solve the problem, read the official analysis!**

So, when proving that Balanced-Partitions is NP-Complete, we’ll first prove that Balanced-Partitions is in NP by devising a non-deterministic algorithm that solves it.

Note however that the problem wants to find the number of ways to partition the numbers. It is **not** a decision problem. We’ll instead redefine it to ask whether you can get a **difference of exactly $m$** between the two sets. So we’re completely ignoring the number of ways to get that difference; we’re only interested in whether getting it is possible or not. We’ll see how to actually compute the number of ways later.

### Balanced-Partitions is in NP

We’ll use **choice**, **success** and **fail** to describe the non-deterministic algorithm. I think that the simplest way to build the actual partition is to generate all possible 0/1 assignments and get the sums of the two resulting sets of numbers (those that were given a 0, and those that were given a 1).

```
M-Balanced-Partition(A, N, m):
    S1 = S2 = 0
    for i = 1 .. N:
        if choice(0, 1) == 1:
            S1 += A[i]
        else:
            S2 += A[i]
    if abs(S1 - S2) == m:
        success
    fail
```

So, what this simple algorithm does is assign each element of $A$ a value of either 0 or 1. If the value is 1, the element is in the first set; otherwise, it’s in the second set. By generating all of the possibilities using **choice** we will clearly determine whether or not obtaining the difference we need is possible.

The actual complexity of the algorithm itself is $O(N)$, since we partition the array in one go (lines 3-7). Since $O(N)$ is polynomial, we conclude that Balanced-Partitions is in NP.
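To actually run this on a deterministic machine, **choice** can be simulated by enumerating all $2^N$ assignments (exponential, of course). A minimal Python sketch of my own, not from the official analysis:

```python
from itertools import product

def balanced_partition_decision(A, m):
    """Exponential-time check: can A be split into two groups
    whose sums differ by exactly m?"""
    for bits in product((0, 1), repeat=len(A)):  # every possible 0/1 assignment
        s1 = sum(a for a, b in zip(A, bits) if b == 1)
        s2 = sum(A) - s1
        if abs(s1 - s2) == m:
            return True   # some sequence of choices succeeds
    return False          # every branch fails

print(balanced_partition_decision([3, 1, 4, 2], 0))  # True: {3, 2} vs {1, 4}
```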

### Balanced-Partitions is NP-Hard

Let’s suppose that we want to know whether we can get a certain difference, $m$. Let’s first note that if the two sums are $S_1$ and $S_2$ (note: we’ll also use $S_1$ and $S_2$ when referring to the sets themselves!) and we call $S_1$ the smaller sum, then $S_1 + S_2 = S$, the total sum, and $S_2 - S_1 = m$, and so therefore $S_1 = (S - m)/2$ and $S_2 = (S + m)/2$. In fact, no matter what we call these two sets, if we can form a set of sum $(S - m)/2$, we’ll clearly have obtained the other set as well (it’s this set’s complement!).

So, in fact, finding whether or not Balanced-Partitions has a solution is equivalent to finding whether we can get a certain sum, let’s call it $Q = (S - m)/2$. This, however, is the Q-Sums problem which is *known* to be NP-Complete. So, we’ll try reducing Q-Sums to Balanced-Partitions. If we manage that, then Balanced-Partitions is NP-Hard, because it means that for any problem $X \in NP$, $X \le_p$ Q-Sums (Q-Sums is also $NP-Hard$!). So, by the transitive nature of the polynomial reduction relationship (that’s what the funny $\le_p$ symbol is) it follows that $X \le_p$ Balanced-Partitions too.

**Warning! Possible rambling ahead. Feel free to skip the following paragraph!** : )

It’s important to note that the key element is reducing a problem that is already known to be $NP-Hard$ to our problem. That means, intuitively, that our problem is **at least as hard as** the original problem. If we did it in the opposite direction, it would be meaningless! Why? Well, consider sorting an array of integers (not a decision problem, I know, but I can’t think of a better example; maybe a helpful comment, someone?). This clearly has lots of great polynomial time algorithms that solve it (my favorite being quicksort with a randomized pivot). But we could equally well sort an array of numbers by generating all possible permutations of the numbers and outputting the first one that we generate that is sorted. This would run in $O(N! \cdot N)$ time though, since a set of $N$ elements has $N!$ permutations (to be taken with a grain of salt, since it’s not actually a decision problem). So, **reducing** a problem we know nothing about **to a difficult problem achieves nothing**.
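Just for fun, the deliberately awful permutation sort from the paragraph above, as a Python sketch of my own:

```python
from itertools import permutations

def permutation_sort(xs):
    """Sort by trying every ordering: O(N! * N) time."""
    for p in permutations(xs):
        # return the first ordering that happens to be nondecreasing
        if all(p[i] <= p[i + 1] for i in range(len(p) - 1)):
            return list(p)

print(permutation_sort([3, 1, 2]))  # [1, 2, 3]
```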

Ok, so, if you’re still following at this point, let’s get on with it and try to do the reduction itself. We must prove that we can solve Q-Sums using Balanced-Partitions and that there can be no false positives, i.e. for an input $x \in I_Q$, $I_Q$ being the set of inputs for Q-Sums, and a polynomial time algorithm/function $f : I_Q \to I_B$, where $I_B$ is the set of inputs for Balanced-Partitions, $x \in \text{Q-Sums} \iff f(x) \in \text{Balanced-Partitions}$.

If we want to find a set of sum $Q$, we’ve seen that there existing a difference of $m$ is equivalent to there being a set of sum $(S - m)/2$. So, we’ll just map $m = S - 2Q$ and solve Balanced-Partitions for that value of $m$. It’s pretty clear that the two problems are equivalent, I would say; to prove it, we just “go back” and forth through the relationship between $m$ and $Q$.
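As a sanity check of the mapping, a small Python sketch of my own; the brute-force oracle here stands in for any Balanced-Partition solver:

```python
from itertools import combinations

def has_partition_with_difference(A, m):
    """Stand-in Balanced-Partition decision oracle (brute force over subsets)."""
    S = sum(A)
    return any(abs(S - 2 * sum(c)) == m
               for r in range(len(A) + 1)
               for c in combinations(A, r))

def q_sums(A, Q):
    """Reduce Q-Sums to Balanced-Partition: a subset of sum Q exists
    iff some partition has difference |S - 2Q|."""
    return has_partition_with_difference(A, abs(sum(A) - 2 * Q))

print(q_sums([1, 2, 4], 3))  # True: {1, 2}
```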

Now, we’re finally content. Balanced-Partition is both in NP and NP-Hard, and we can finally conclude that it is NP-Complete.

### Back to the *initial problem*

Why go through all the trouble proving that it’s $NP-Complete$ anyway? Well, as a nifty exercise for one, but it also provides the essential insight that can help solve the initial problem from the USACO contest, **divgold**. There, we were tasked with finding the minimum difference as well as the number of ways to obtain it. Well, this illustrates the connection between this kind of optimization problem and its associated decision problem pretty clearly. To find the minimum difference, call it $m$, we’ll essentially start with $m = 0$ and keep increasing the difference until we find one that can actually be obtained.

How far are we supposed to go anyway? Well, for one, the difference between the two sets can be at most $S$, the total sum, for a very loose upper bound (but I really think it should be something much tighter, like perhaps the maximum value from $A$, but I didn’t manage to prove this, not at this late hour anyway; I’m not even sure whether or not it’s true to be honest). Anyway, the point is we’ll stop at some point, in at most $S$ steps.

How to find out whether we can get a particular difference though? Well, here the insight from the NP-Hardness proof comes in handy, since the transformation we’ve done between $m$ and $Q$ is bijective, and that means intuitively that we can also use it to solve Balanced-Partition using Q-Sums. In English, we want to know whether it’s possible to obtain a sum of $(S - m)/2$. What would happen if $S - m$ is odd? Well, we’d certainly not be able to get a fractional sum out of integers. That doesn’t bother us one bit though, as we’ll soon see.

This is where dynamic programming comes in. Q-Sums is a problem with a pseudo-polynomial time algorithm. That simply means that there exists an algorithm that finds the answer in time polynomial in the numeric value of the input. That numeric value is, however, exponential in the number of bits used to represent it!

(neat fact: Problems that can be solved by such algorithms are called *weakly NP-Complete*. These are, as far as I understand, the only NP-Complete problems where dynamic programming can be applied. If, however, the numbers in the array had been real numbers, such a solution would not have been possible. And by *real numbers* I mean actual real numbers, not IEEE 754 floating point, which are in fact rational.)

To use dynamic programming, we need to define an optimal sub-solution’s structure. This structure will be used to determine the larger sub-solutions, until we finally have the whole solution! If I mess up the explanation, feel free to use the extremely helpful tutorials of Brian Dean. He actually has many more dynamic programming examples there you might find interesting.

So, we’re going to solve Q-Sums. We need the following structure (usually a matrix that adequately describes what exactly a sub-problem is with a couple of indices):

$ways[i][j]$ = the number of ways to get the sum $j$ using the first $i$ numbers. To actually solve it, we need a recurrence relation. We need to think how to get from one state to the next, either with a top-down or a bottom-up approach (i.e. either forward or backward). Let’s look “back”, at $ways[i-1]$. We’re at the $i$-th number and we can either choose to add it to the sum or not.

If we don’t add it, then we need to get the same sum $j$ without the current number, $A_i$, so we add $ways[i-1][j]$. Otherwise, to get $j$ in total, we’ll need $ways[i-1][j - A_i]$:

Recursion formula – $ways[i][j] = ways[i-1][j] + ways[i-1][j - A_i]$

The base case being – $ways[0][0] = 1$ (and $ways[0][j] = 0$ for all $j > 0$)

Note that we actually count the number of ways to get the desired sum here. It could have been the same decision problem we talked about if we used a Boolean data type and interpreted the ‘+’ operation to mean logical OR (note that taking the counts modulo 2 would *not* do: a sum reachable in an even number of ways would look unreachable).

At any rate, for an array $A$ of length $N$, the answer to Q-Sums will be in $ways[N][Q]$. To find the minimum difference, we check each candidate difference $j$ in a loop starting from $0$; also, $S - j$ needs to be even:

```
for (j = 0; (s - j) / 2 >= 0; ++j)
    if ((s - j) % 2 == 0 && ways[n][(s - j) / 2] > 0)
        break;
```

And so, the minimum difference will be $j$ and the number of ways to get it will be $ways[n][(s - j)/2]$.
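Putting the pieces together, a Python sketch of my own of the whole approach (Python’s big integers sidestep the overflow a C implementation would have to deal with):

```python
def min_difference_and_ways(A):
    """Minimum achievable |S1 - S2| and the number of subsets realizing it.

    ways[i][j] = number of ways to get sum j using the first i numbers.
    """
    n, S = len(A), sum(A)
    ways = [[0] * (S + 1) for _ in range(n + 1)]
    ways[0][0] = 1  # base case: the empty set gives sum 0
    for i in range(1, n + 1):
        for j in range(S + 1):
            ways[i][j] = ways[i - 1][j]                  # leave A[i-1] out
            if j >= A[i - 1]:
                ways[i][j] += ways[i - 1][j - A[i - 1]]  # put A[i-1] in
    # scan candidate differences j = 0, 1, 2, ... for the first achievable one
    for j in range(S + 1):
        if (S - j) % 2 == 0 and ways[n][(S - j) // 2] > 0:
            return j, ways[n][(S - j) // 2]

print(min_difference_and_ways([3, 1, 4, 2]))  # (0, 2): sum-5 subsets {3, 2} and {1, 4}
```

Note that this counts *subsets*, so each unordered partition shows up twice (once per side); whether that matches the contest’s notion of “ways” depends on whether the two groups are distinguishable.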

### Final remarks

You can in fact optimize this solution space-wise a lot (although this completely escaped my mind in the contest and I didn’t get max for this problem 😦). Notice that we don’t use all of the rows in the matrix at once; we only use the final two, and really, we only need one row, since we’re only interested in the last one and all of the updates can be done in place.

So, we can finally write a much better recurrence:

$ways[j] \leftarrow ways[j] + ways[j - A_i]$,

where $j$ goes from $S$ down to $A_i$ (downwards, so that each update reads values that do not yet include $A_i$),

and where $ways[j]$ is the number of ways to get a subset of sum $j$.
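The one-row version, again as a Python sketch of my own; the downward loop over $j$ is what keeps each number from being used twice:

```python
def count_subset_sums(A):
    """ways[j] = number of subsets of A with sum j, in O(S) space."""
    S = sum(A)
    ways = [0] * (S + 1)
    ways[0] = 1  # the empty subset
    for a in A:
        # iterate j downwards so ways[j - a] still holds the value
        # from before this number was considered
        for j in range(S, a - 1, -1):
            ways[j] += ways[j - a]
    return ways

print(count_subset_sums([3, 1, 4, 2])[5])  # 2: the subsets {3, 2} and {1, 4}
```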

You can find all of the implementations here.

Also, *dear reader*, I need your help with a couple of questions:

- What happens (in the NP-Hardness proof) when Balanced-Partition is taken to require the partitioning into two subsets whose sums have an absolute difference of **at most** $m$? Basically, replace the equality with an inequality.

  I was thinking about calling Balanced-Partition twice, with $m$ and $m - 1$, to see if you get different answers. If you do, then you might not get the exact sum in Q-Sums you were looking for. But does that mean it doesn’t exist?

- Is it true that the maximum difference between the two partitions in the optimum case can be no larger than the maximum number in the array $A$?

I feel that this is true, but can’t think of any good reason right now. Assistance appreciated 😉

Re question 2: Yes, it’s true (assuming all numbers are non-negative, figure out what has to change if negative numbers are allowed).

Suppose A is sorted in decreasing order. Put the first (largest) number in set 1.

Put the next numbers in set 2 until either A is exhausted or the sum of set 2 exceeds the sum of set 1. In the first case, the difference is at most A[0], otherwise it’s at most A[k] if that’s the last number you put in set 2.

If there are numbers left, put the next numbers in set 1 until either A is exhausted or the sum of set 1 exceeds the sum of set 2.

If there are still numbers left, switch again and put the next numbers in set 2 until you-know-what.

Iterate until A is exhausted. If the largest number is at most half the sum, the minimal difference is at most the smallest number in the array.

Arrgh, not the smallest in the array; the smallest number you put in the set with the larger sum, d’oh.

Good idea. I see how this works for a 3-number set, but I’ve failed to prove that it works in the more general case with induction.

I’ll hopefully look into it at some point. For 3 numbers though, the resulting difference is indeed smaller than the smallest possible element in the array. It’s like interspersing pluses and minuses and asking how to do that so you get the smallest difference.

This is the actual greedy algorithm for an approximation I think. If it’s true for this case, it surely applies for the optimal one, although I still need to figure out how to actually prove it.

Thanks for the help dafis!

Oh, and I gave up on Criterion for measuring the performance. It turns out that I just want the runtime of an operation and Data.Time can easily do that 🙂