810:153 Substitution Programming Project
Knapsack Problem with Branch and Bound Pruning

Problem Overview/Discussion

The goal of the knapsack problem is to choose a subset of given items that maximizes total benefit while staying within the limit of a prescribed total cost. A decision tree can be used to organize and guide the examination of items for selection.

A sample decision tree that uses five items shows that taking one branch of the tree (e.g., left) implies selecting the item at that level and taking the other branch (right) implies not selecting that item. The leaf nodes show all possible sets of items that might be chosen. Note that at any given node in the tree, we can have a history of the selection or non-selection of items higher in the tree (if we keep track of that).

A brute force approach to this problem could use either a depth-first examination of the tree or a breadth-first examination. Either approach could examine all possible selections—2N possibilities. Note that, when a depth is reached where the limit on cost is exceeded, an algorithm could cut off any deeper examination. We refer to cutting of further examination as pruning the tree.

So, we can stop looking down a branch of a tree when the cost is too much. What would be really nice at this point in time is to be able to also prune branches of the tree because of benefit—that it is or will be too little.

We can! Doing so requires that we two added notions. First, we must select the best possible items first. Second, we need some mechanism for determining a theoretical best benefit if we proceed down a branch of a tree. We can do that!

If we sort the items according to their benefit/cost ratio with the highest benefit per cost first, we could calculate a theoretical best set. If we were allowed to select part of an item (rather than all or none), we would choose the item with highest benefit/cost, then the one with the next highest benefit/cost, etc. Eventually, we would approach the maximum cost and not be able to select the next item. But if we were able to take the part of it that would get us to the maximum cost, then we would have the maximum benefit and exactly the maximum cost.

We cannot actually select items that way, but we can use that theoretical best calculation to help guide our selection process. At any node we can calculate the theoretical best that is possible if we proceed down the tree from that point. Knowing the theoretical best will allow us to avoid going down branches of the tree if we cannot get a better total benefit than we have already seen. The theoretical best is an upper bound on the value that we can achieve for benefit. We sometimes just call it the bound.

So, we will be able to prune branches when their total cost gets too big and when their total benefit will not get any bigger than the best benefit we have seen so far.

There is one additional item to consider. Which branches of the tree should we examine first?

Rather than thinking about branches, it may be useful to think about being at particular nodes and deciding which node to "expand" first. Expanding a node means checking the two descendants of the the node. The two descendants consist of 1) adding the next item and 2) not adding the next item—going left and going right down the tree. We could then add those two nodes to a list of nodes that we might check further or expand. At each point we would, of course, ignore checking further if adding the next item would make the cost too much or if the bound was not greater than whatever best we have seen so far.

Now back to the question about selecting branches to pursue (nodes to expand). The greedy approach suggests that we might want to select the node that is best. What does "best" mean. Two obvious choices are highest value for benefit and highest value for bound. Whichever of these two (or some other choice we make), choosing the best node each time we have a choice should get us to a good answer quicker than not choosing the best. Getting to a good answer sooner means that we can more often ignore branches with bounds less than the best benefit we have found so far, i.e., we get to prune the tree more often.

Pruning the tree is what saves time and thus allows us to solve bigger problems than we might be able to when other algorithms are used.
Note:
The selection tree is never actually stored. One merely uses it as a mental framework for examining the problem. A program can create or follow the tree structure on the fly by knowing the level of the current node being added and the two branches are created by adding and not adding the next item. Thus, it will be important to keep track of the level number and perhaps items selected prior to the current level.

An Algorithm

An algorithm for branch and bound pruning of the knapsack problem is given below. It assumes you have a priority queue that will allow for insertions of nodes at will and for removals of the "best" node. Best node is not defined.

Note that the algorithm is not intended to specify how the items are stored or nodes are stored. It does suggest, however, that each needs to have access to:
  •  level
  •  a set of items selected
  •  a benefit value
  •  a cost value
  •  a bound value
Each node will need its own version of these values.

The algorithm is:
  • sort items in non-increasing order by benefit/cost
  • initialize bestSoFar ( 0, {}, —INFINITY, 0, 0 )
  • initialize a_node ( 0, {}, 0, 0, calculatedBound( bestSoFar ) )
  • enqueue a_node into your priority queue

  • while the priority queue is not empty
    • dequeue the "top" item (currNode)

    • if currNode's bound > bestSoFar's benefit

      • create a node nextAdded equal to currNode with the next item added
      • if nextAdded's cost <= MAX_COST
        • if nextAdded's benefit > bestSoFar's benefit
          • set bestSoFar equal to nextAdded
          end-if

        • if nextAdded's bound > bestSoFar's benefit
          • enqueue nextAdded
          end-if
        end-if

      • create a node nextNotAdded equal to currNode without the next item added
      • set nextNotAdded's bound = calculatedBound( nextNotAdded )
      • if nextNotAdded's bound > bestSoFar's benefit
        • enqueue nextNotAdded
        end-if
      end-if
    end-while

Calculating a bound involves accumulating values for additional-benefit and additional-cost until the MAXCOST value is reached. The additional-benefit and additional-cost are calculated by starting with the next "item" in the sorted list of items and adding on each's benefit and cost until an item is reached for which the
    actual-cost-so-far + additional-cost > MAXCOST
Call this item partial-item. At this point a fraction of partial-item's cost could be added to reach MAXCOST. So could a fraction of the partial-item's benefit be added to the benefit. The fractional-benefit is
      MAXCOST  -  additional-cost
      ---------------------------  X  benefitOfPartialItem
           costOfPartialItem
The bound then has three components. The first is the actual benefit that has been accumulated at a node. The second component is the additional-benefit accumulated as desribed above by adding the whole benefit of as many items as possible. The third component is the fractional-benefit of the partial-item. When totalled, these values provide the bound.

Notice that calculating the bound is not supposed to have any effect on the cost or benefit of any node.