CS 3530 Session 6

Describing Complexity

CS 3530
Design and Analysis of Algorithms

Let's Color a Picture!

In the old days... Printed maps... Number of colors affected total cost. Now, on-line... But colors still matter: usability.

Consider this map of a very square continent:

map from Levitin, Problem 1.3.8

Your tasks:

Color the picture so that no adjacent countries have the same color -- but using the fewest colors possible.

Draw a graph to represent the map.

Give an algorithm for assigning colors to the nodes of the graph that would enable us to color any map so represented.

You may, of course, do the steps in any order you like.

Exploring the Exercise

Here is my coloring for the map:

map colored with four hues

Question: What is the minimum number of colors needed to color this map?

Four. We can economize early in the labeling in order to save colors. After coloring A blue and C green, I colored both B and E red. Alas, D and F need different colors -- they both touch a red state and a green state, but also each other!
It has been proven that four is enough colors for all planar maps.

My graph representing the map looks like this:

a graph to represent the map above

In this graph, each node corresponds to a state on the map. There is an edge between two nodes if the corresponding states touch one another. Whenever you create a graph to model a problem, be sure you understand what each node and edge means.

This graph captures exactly the same "neighbor" relationships found in the map. So, four colors is all we need to color the graph, too. In general, though, four is not enough colors for all graphs, because a graph can have a higher connectivity than regions in a plane.

Finally, here is an algorithm for coloring a graph:

Let C = V.

While C is not empty,
1. Pick a vertex v from C.
2. Give v a color different from any of v's neighbors' colors. Use a new color only if necessary.
3. Remove v from C.
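The loop above can be sketched in Python. This is a minimal sketch under two assumptions that are not in the notes: "use a new color only if necessary" is read as "take the smallest-numbered color no neighbor has yet", and the six-vertex adjacency list is a hypothetical stand-in (a hub C joined to a five-vertex cycle), not a transcription of the map.

```python
def greedy_coloring(graph, order=None):
    """Color vertices one at a time with the smallest color no neighbor uses."""
    # The algorithm leaves the order open; default to dictionary order.
    order = list(graph) if order is None else list(order)
    color = {}
    for v in order:
        # Colors already taken by neighbors that have been colored so far.
        taken = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in taken:  # a new color appears only when all others are taken
            c += 1
        color[v] = c
    return color


# Hypothetical graph (assumption): hub C plus the cycle A-B-D-F-E-A.
graph = {
    "A": ["B", "C", "E"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D", "E", "F"],
    "D": ["B", "C", "F"],
    "E": ["A", "C", "F"],
    "F": ["C", "D", "E"],
}

coloring = greedy_coloring(graph)
num_colors = len(set(coloring.values()))
```

Colors are represented as integers 0, 1, 2, ...; mapping them to actual hues is a separate, cosmetic step.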

Notice...

Representing the map as a graph gives us a vocabulary for describing the process more clearly and with less ambiguity...
Other parts of the original problem, such as the colors, also take on formal representations. The colors become labels on the vertices.

Question: Is there any advantage to choosing the vertices in a particular order?

As written, this algorithm is nondeterministic. It does not specify the order in which we are to select nodes to be colored. As a result, it could have different behaviors for different executions.
For this graph, there is no advantage to choosing the vertices in a particular order. It requires four colors, no matter how we color them.

Can you design a graph where choosing the wrong vertex first matters?

Sometimes, processing vertices with higher degree first can help. The degree of a vertex is the number of edges entering or exiting the vertex. In the map above, vertex C has a degree of 5, and the remaining vertices have a degree of 3.
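Computing degrees and a highest-degree-first order takes only a few lines. The sketch below assumes an undirected graph stored as an adjacency list, so a vertex's degree is the length of its neighbor list. The graph itself is a hypothetical one chosen to match the degrees just described (one vertex of degree 5, five of degree 3); it is not read off the map.

```python
# Hypothetical adjacency list (assumption): hub C plus the cycle A-B-D-F-E-A.
graph = {
    "A": ["B", "C", "E"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D", "E", "F"],
    "D": ["B", "C", "F"],
    "E": ["A", "C", "F"],
    "F": ["C", "D", "E"],
}

# In an undirected adjacency list, degree = number of listed neighbors.
degree = {v: len(neighbors) for v, neighbors in graph.items()}

# Highest degree first; Python's sort is stable, so ties keep their
# original relative order.
largest_first = sorted(graph, key=lambda v: degree[v], reverse=True)
```

An order like `largest_first` could then be handed to the coloring loop in place of an arbitrary choice of v.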

Question: What is the time complexity of the algorithm above?

Let D = maximum degree d(v) of any vertex v.

It makes n = |V| passes through the loop.

For each vertex v, Step 2 examines v's neighbors, so its inner loop makes d(v) ≤ D passes.

So, O( nD ). Assuming no "loops", or edges of the form (x,x), the worst case for D is n-1, which gives a complexity of O( n(n-1) ) → O( n² ).
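The same bound can be written out as a sum. This is a sketch that assumes a simple undirected graph and charges one unit of work per vertex picked plus one unit per neighbor inspected; T(n) is a name introduced here for that total.

```latex
\begin{align*}
T(n) &= \sum_{v \in V} \bigl(1 + d(v)\bigr)
      = n + \sum_{v \in V} d(v)
      = n + 2|E| \\
     &\le n + nD
      \le n + n(n-1)
      = n^2
\end{align*}
```

The middle step uses the fact that every edge contributes to the degree of exactly two vertices. It also shows that the O(nD) bound can be loose: the work is really proportional to n + |E|, which is much smaller than nD when only a few vertices have high degree.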

Graph coloring is one of the classic graph problems. (Which kind of problem is it?) It has important applications in scheduling and other resource allocation problems.

Basics of Describing Complexity

The primary goal of algorithm analysis is to describe how much of a resource the algorithm uses. Different resources are important in different problem domains, but generally we will concern ourselves with the most broadly important: time and space. Time is the quintessential limiting resource. Space also limits many algorithms in fundamental ways, though as technology develops the scale of space's limitations changes.

The act of analyzing an algorithm requires that we find a way to measure the use of the resource in a general way. We then cast this measurement in terms of how much resource usage grows as the size of the problem instance grows. For example, the size of a problem instance might be:

in the End Game, the length of the list of numbers
in our game scoring exercise, the total number of points scored

O, pronounced Big Oh, expresses an upper bound. It is a function that bounds the growth of the resources used from above. We ignore constant factors and lower-order terms because, as n grows, the highest-order term "dominates".

Ω, or Omega, expresses a lower bound. It is a function that bounds the growth of the resources used from below.

Θ, or Theta, is a function that combines Big Oh and Omega. It bounds resource usage of the algorithm from above and below using the same function, though perhaps with different constants.

a graph bounded from above and below
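For reference, the three bounds can be stated formally. This is the usual textbook formulation, written here for nonnegative functions f and g; the symbols c and n₀ are the constants that the descriptions above allude to.

```latex
\begin{align*}
f(n) \in O(g(n))      &\iff \exists\, c > 0,\ n_0 \ge 0 :\ f(n) \le c\, g(n) \ \text{for all } n \ge n_0 \\
f(n) \in \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 \ge 0 :\ f(n) \ge c\, g(n) \ \text{for all } n \ge n_0 \\
f(n) \in \Theta(g(n)) &\iff f(n) \in O(g(n)) \ \text{and}\ f(n) \in \Omega(g(n))
\end{align*}
```

The constants c and n₀ may differ between the upper and lower bounds, which is what "the same function, though perhaps with different constants" means.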

Questions: Why would we want to know Big Oh? Ω? Θ?

Question: How do we show that 4n² + 6n + 4 is Θ(n²)?

    10n² ≥ (4n² + 6n + 4) for all n ≥ 2

    (4n² + 6n + 4) ≥ 4n² for all n ≥ 0

The 2 and 0 at the ends of these statements are the n₀ you see in textbook definitions. They show that, once the size of the problem gets big enough, the algorithm's fundamental performance characteristics determine the consumption of the resource more than any external factors.
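A quick numeric check makes the role of n₀ concrete. The sketch below uses the polynomial as it appears in the two inequalities, 4n² + 6n + 4, and tests both bounds over a finite range. A finite check is only a sanity test, not a proof; the proof of the upper bound is the algebra 10n² - (4n² + 6n + 4) = 6n² - 6n - 4, which is nonnegative from n = 2 onward.

```python
def f(n):
    """The polynomial from the two inequalities above."""
    return 4 * n**2 + 6 * n + 4


def upper_holds(n):
    """Upper bound with constant c = 10."""
    return f(n) <= 10 * n**2


def lower_holds(n):
    """Lower bound with constant c = 4."""
    return f(n) >= 4 * n**2


# Values of n in a finite test range where each bound fails.
upper_failures = [n for n in range(0, 1000) if not upper_holds(n)]
lower_failures = [n for n in range(0, 1000) if not lower_holds(n)]
```

The upper bound fails only at n = 0 and n = 1, which is why its n₀ is 2, while the lower bound never fails, which is why its n₀ can be 0.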

Question: Why do we care about the values of n₀ on these definitions?

If the algorithm will be applied only or primarily to large problem instances, then n₀ tells us what counts as "large". For n ≥ n₀, the bounds are meaningful. In some domains, problem instances are always large, or are usually large, or may be large...

If the algorithm will never be applied to "large" problems, where n₀ tells us what counts as "large", then the bounds are not meaningful. An algorithm with a nominally worse bounding function may perform better on such small data sets!
Can you think of an example? (Files in a directory. Names on a class list.)

Know your problem domains. Know your implementations.

A Counting Exercise

You own five pairs of socks, one per day. You do your laundry on the weekend to get ready for the next week. One weekend at the laundromat, you lose two socks.

Your job:

What's the best case scenario? 4 complete pairs
What's the worst case scenario? 3 complete pairs
What's the average case? ...

There are C(10, 2) = 45 equally likely ways to lose 2 socks out of 10. Only five of them are best-case, one for each pair that could be lost whole, which gives a probability of 5/45 = 1/9. Every other outcome is the worst case, with a probability of 40/45 = 8/9. So the "expected value" of the number of complete pairs is (1/9)*4 + (8/9)*3 = 28/9 = 3 1/9.
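The count can be double-checked by brute force. This sketch assumes every pair of lost socks is equally likely; the sock labels (pair number, sock number) are made up for the enumeration.

```python
from fractions import Fraction
from itertools import combinations

# Ten socks: five pairs, two socks each (labels are illustrative only).
socks = [(pair, sock) for pair in range(5) for sock in range(2)]

# Every way to lose two of the ten socks.
outcomes = list(combinations(socks, 2))


def complete_pairs(lost):
    """Number of pairs still complete after the given socks are lost."""
    broken = {pair for pair, _ in lost}
    return 5 - len(broken)


# Tally how many outcomes leave 4 complete pairs and how many leave 3.
counts = {}
for lost in outcomes:
    k = complete_pairs(lost)
    counts[k] = counts.get(k, 0) + 1

# Expected number of complete pairs, kept exact with Fraction.
expected = sum(Fraction(k * c, len(outcomes)) for k, c in counts.items())
```

Using `Fraction` keeps the result exact, so it can be compared directly with 28/9 rather than with a rounded decimal.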

Wow. Discrete Structures matters. Probability and other math help, too.

Interlude: Strange Behavior

Some algorithms perform differently than you might expect under certain circumstances. Knowing about these cases can make a big difference in performance.

Quicksort is one of the best sorts, given its O(n log n) average-case performance, low constants c and n₀, ease of understanding, and relative ease of implementation.

For most cases of most problems, it vastly outperforms all but much more complex algorithms.

But with a naive pivot choice, such as always taking the first or last element, an already-sorted or nearly-sorted input degrades Quicksort's behavior to O(n²).
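The degradation is easy to observe by counting comparisons. The sketch below is one simple Quicksort variant, chosen here for illustration: it always uses the first element as the pivot and builds new lists rather than partitioning in place. Other pivot rules behave differently on sorted input.

```python
import random


def quicksort(items, counter):
    """Sort with the first element as pivot; counter[0] tallies comparisons."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller, larger = [], []
    for x in rest:
        counter[0] += 1  # one comparison against the pivot per element
        if x < pivot:
            smaller.append(x)
        else:
            larger.append(x)
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)


def count_comparisons(items):
    """Return the sorted list and the number of comparisons made."""
    counter = [0]
    result = quicksort(items, counter)
    return result, counter[0]


n = 200  # small enough to stay well under Python's recursion limit
sorted_input = list(range(n))
random.seed(0)
shuffled_input = random.sample(sorted_input, n)

sorted_result, sorted_cost = count_comparisons(sorted_input)
shuffled_result, shuffled_cost = count_comparisons(shuffled_input)
```

On the sorted input every pivot is the minimum, so each call peels off just one element and the comparison count is n(n-1)/2. On the shuffled input the count is far smaller.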

Understand the algorithms you study.

Basics of Analyzing Complexity

Find the basic operation: the one that is performed most often, or the one that dominates the algorithm's resource usage for some other reason, such as the underlying implementation. (Example: RAM versus file system.)

Often, this is straightforward. Consider this simple sequential search algorithm:

    search(list L, item T)

    1.  for i = 1 to n
        a.  if L[i] == T return i
    2.  fail

The basic operation here is the comparison L[i] == T. It determines whether the algorithm stops or not. For a non-empty list, the comparison could run as few as 1 time, if T is the first item, or as many as n = |L| times, if T is the last item or is not in the list at all.
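Instrumenting the search to count its basic operation shows the range directly. This is a minimal Python rendering of the pseudocode above, with two liberties: indices are 0-based, as is idiomatic in Python, and the function returns the comparison count alongside the result, with None standing in for "fail".

```python
def sequential_search(items, target):
    """Return (index or None, number of comparisons performed)."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1  # the basic operation: item == target
        if item == target:
            return i, comparisons
    return None, comparisons  # "fail": target is not in the list


data = [7, 3, 9, 4]
first = sequential_search(data, 7)    # target is the first item
last = sequential_search(data, 4)     # target is the last item
missing = sequential_search(data, 5)  # target is absent
empty = sequential_search([], 5)      # nothing to compare against
```

The best case costs one comparison, and both the last-position case and the failure case cost n comparisons.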

Some common basic operations when analyzing algorithms are:

comparisons
swaps
multiplications and other arithmetic operations
assignments

Some sorting algorithms have differently shaped complexity curves for comparisons and swaps, so they are best compared using both metrics.

Know your problem domain.

References

Recall the graph challenge above: Can we design a graph where choosing to color the wrong vertex first gives a less than optimal result?

A bad case for this algorithm is a bipartite graph -- a graph in which the vertices can be partitioned into two subsets so that every edge joins a vertex in one subset to a vertex in the other. A greedy coloring of a bipartite graph can give especially bad behavior. The best way to color the graph is to give the same color to all vertices in each subset, resulting in using only two colors. My algorithm can give such a coloring if it selects the vertices in the right order. If it selects them in the wrong order, then on some bipartite graphs it can use as many as |V|/2 colors!
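One family of graphs that exhibits this behavior is the "crown" graph: two rows of k vertices, with each vertex joined to every vertex in the other row except the one directly opposite it. The sketch below builds such a graph and runs a first-fit greedy coloring in two different orders. The names `crown_graph`, `by_subset`, and `interleaved` are introduced here for illustration, and first-fit (smallest unused color) is an assumed reading of the algorithm's Step 2.

```python
def greedy_coloring(graph, order):
    """Color vertices in the given order with the smallest unused color."""
    color = {}
    for v in order:
        taken = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color


def crown_graph(k):
    """Bipartite graph: u_i is adjacent to w_j exactly when i != j."""
    graph = {}
    for i in range(k):
        graph[("u", i)] = [("w", j) for j in range(k) if j != i]
        graph[("w", i)] = [("u", j) for j in range(k) if j != i]
    return graph


k = 4
graph = crown_graph(k)  # |V| = 2k = 8

# Right order: finish one subset before starting the other.
by_subset = [("u", i) for i in range(k)] + [("w", i) for i in range(k)]
# Wrong order: alternate u_0, w_0, u_1, w_1, ...
interleaved = [v for i in range(k) for v in (("u", i), ("w", i))]

good = len(set(greedy_coloring(graph, by_subset).values()))
bad = len(set(greedy_coloring(graph, interleaved).values()))
```

In the interleaved order, u_i and w_i are not adjacent to each other, so they share a color, but both touch every earlier color, so each new opposite pair forces a new color. That gives k = |V|/2 colors instead of 2.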

Wrap Up

Reading -- Follow the links scattered throughout the notes above. Ask questions!

Homework -- Homework 1 is due today.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... January 30, 2014