CS 3140 Session 7

Session 7

An Introduction to Relational Algebra

CS 3140
Database Systems

Opening Exercise

Debugged instances of the Movie relation from Exam 1, Problem 8 are tables containing the favorite movies of Alice and Kurt:

   name                 year   length   genre
   --------------------------------------------     Alice
a1 The Big Chill        1983   105      drama
a2 Blade Runner         1982   117      scifi
a3 The Princess Bride   2020   98       fantasy
a4 Lethal Weapon        1987   109      action
a5 Black Panther        2018   134      action
a6 The Princess Bride   1987   98       fantasy
a7 The Replacements     2000   118      comedy
a8 Star Trek            1979   132      scifi
   --------------------------------------------     Kurt
k1 Black Panther        2018   134      action
k2 The Princess Bride   1987   98       fantasy
k3 The Replacements     2000   118      comedy
k4 Star Trek            1979   132      scifi
k5 Lethal Weapon        1987   109      action
k6 Return of the Jedi   1983   131      scifi
k7 48 Hrs.              1982   96       comedy
k8 The Big Chill        1983   105      drama

Identify...

the movies that both Alice and Kurt like
the movies that Kurt likes, but not Alice
the genres that Alice likes
the movies longer than two hours that Kurt likes

You can use the tuple names for 1, 2, and 4.

Opening Solution

The movies that both Alice and Kurt like.
a1 and a4 through a8 are on both lists.

The movies that Kurt likes, but not Alice.
With so many favorites in common, there are only two that each likes but not the other. k6 and k7 are unique to Kurt.

The genres that Alice likes.
There are five: drama, scifi, fantasy, action, and comedy.

The movies longer than two hours that Kurt likes.
Only three: k1, k4, and k6.

Could we use SQLite to help us find our answers?

To find out, I created a database containing the two tables, using this SQL script.

Even with our limited experience, we can write queries to solve the third and fourth problems. For #4:

    sqlite> SELECT * FROM Kurt WHERE length > 120;
    Black Panther|2018|134|action
    Star Trek|1979|132|scifi
    Return of the Jedi|1983|131|scifi

If we want only the names, we can select only the names:

    sqlite> SELECT name FROM Kurt WHERE length > 120;
    Black Panther
    Star Trek
    Return of the Jedi

For #3, we can select the genre from Alice's set without a condition:

    sqlite> SELECT genre from Alice;
    drama
    scifi
    fantasy
    action
    action
    fantasy
    comedy
    scifi

That doesn't seem right... Relations are sets. When we select the genres for Alice, we should get a set, but we don't. This is one of those places where SQL does not implement the relational algebra correctly. We need to use a new modifier, DISTINCT:

    sqlite> SELECT DISTINCT genre from Alice;
    drama
    scifi
    fantasy
    action
    comedy

We'll occasionally learn a little new SQL because SQL does not behave the way we expect it to. Welcome to programming.

Where Are We?

Last week, we reviewed the basics of SQL and SQLite, which are the tools we will use to implement our database models in executable code. Then we learned the basic terms of relational algebra, a special-purpose algebra that enables use to construct new relations from existing relations. It is the theory that underlies SQL, the language we will use to query our relational databases.

Today we go a bit deeper with relational algebra, exploring its operators and ways we can express them in SQL.

Review: The Operators of Relational Algebra

Relational algebra is limited yet surprisingly powerful language for expressing relational queries. Its limitations both increase programmer productivity the run-time performance of our database management systems. It consists of a set of set operations over the domain of relations, which are a particular kind of set. We will use these operations to construct simple and compound expressions known as queries.

Last time, we saw that there are four broad classes of relational operators:

basic set operators
operators that remove parts of a relation
operators that combine two relations
an operator that renames objects

Let's consider each in more detail and begin to think about how the help us express queries.

Basic Set Operators

We learn three basic set operations in grade school math:

R ⋃ S : union, the set of all items that are in R, in S, or both
R ⋂ S : intersection, the set of all items that are both in R and in S
R – S : difference, the set of items that are in R but not in S

Consider these two small instances of the Movie relation:

    name                 year   length   genre
    --------------------------------------------     R
    The Big Chill        1983   105      drama
    Blade Runner         1982   117      scifi
    --------------------------------------------     S
    Blade Runner         1982   117      scifi
    The Princess Bride   2020   98       fantasy

What are (R ⋃ S), (R ⋂ S), (R – S), (S – R)?

The first two questions on the opening exercise involve these basic operations: intersection and difference, respectively. What question would a union answer?

Perhaps: the movies liked by either Alice or Kurt. This is the home library containing everyone's favorites.

Operations that Remove Information

Relations are two-dimensional tables. Rows are tuples, and columns are attributes. The two basic ways to remove information from a relation are to eliminate rows or eliminate columns.

In relational algebra, projection takes one relation and returns a new relation containing a subset of its columns. We use lowercase pi, π, as the projection operator:

     π_a1,...(R)

For example, using the relations R and S above, π_name,length(R) produces:

    name                 length
    ---------------------------
    The Big Chill        105   
    Blade Runner         117

The third question on the opening exercise asked for a projection: π_genre(Alice). In the relational algebra, all relations are sets. The result is {drama, scifi, fantasy, action, comedy}.

Selection is an operation that takes a relation R and returns a new relation containing a subset of its rows. The tuples in the new relation satisfy some condition C involving R's attributes.

We use lowercase sigma, σ, as the selection operator:

     σ_C(R)

C is a conditional expression of the sort you use in traditional programming languages. It returns true or false when applied to a tuple. In relational algebra, a condition can refer only to constants or to attributes in the relation.

For example, using the relations R and S above, σ_{genre='scifi'}(S) produces:

    name                 year   length   genre
    --------------------------------------------
    Blade Runner         1982   117      scifi

The same selection applied to the Alice relation from our exercise returns a relation with two tuples:

    name                 year   length   genre
    --------------------------------------------
    Blade Runner         1982   117      scifi
    Star Trek            1979   132      scifi

The fourth question on the opening exercise asked for a selection: σ_length>120(Kurt).

We can also write compound conditions. To find the set of scifi movies longer then two hours that Kurt likes, we could say σ_{length>120 AND genre='scifi'}(Kurt).

The little bit of SQL we learned last week gave us tools for implementing simple projections and selections:

To project, we can write SELECT attribute with no WHERE clause.
To select, we can write SELECT * WHERE and express the condition C as the condition on the WHERE clause.

We will be extending our knowledge of how to express relational algebra using SQL in the coming sessions. We already extended it once today: to ensure that a projection returns a set, we need to use the DISTINCT clause.

Operations that Combine Relations

Back in school, you may have learned another set operation that combines two sets: R ⨯ S, the product of two sets. This is also called the "Cartesian product" or the "cross product". R ⨯ S creates a set of all possible pairs where the first item in the pair is in R and the second item is in S. For example, the product of {a, b} and {c, d, e} is {(a,c), (a,d), (a,e), (b,c), (b,d), (b,e)}.

In the relational algebra, the sets are relations and work much the same. Rather than create an ordered pair, though, the product creates a longer tuple, containing all the attributes of R and all the attributes of S.

To see how this works, consider these relations:

    A B C           C D
    -----  R        ---  S
    1 2 3           7 8
    4 5 6           3 9 
                    6 0

The result of R ⨯ S will be this relation:

    A    B   R.C  S.C   D
    ---------------------  R ⨯ S
    1    2    3    7    8
    1    2    3    3    9
    1    2    3    6    0
    4    5    6    7    8
    4    5    6    3    9 
    4    5    6    6    0

Note that this relation pairs each tuple from R with every tuple from S. Both R and S have an attribute named C, so the relation has to invent new names. We will usually disambiguate conflicts of this sort with member-style dot notation.

Note also that this relation has six tuples and five attributes. There are two tuples in R paired with each of the three tuples in S, for |A| * |B| = 6 rows. R has three attributes and S two, which gives us |A attributes| + |B attributes| columns.

We saw a free-range example of the product at the end of Week 5's second session. Here is a simpler version:

    sqlite> SELECT * FROM R, S;
    1    2    3    7    8
    1    2    3    3    9
    1    2    3    6    0
    4    5    6    7    8
    4    5    6    3    9 
    4    5    6    6    0

We'll have to create SQL tables for R and S to see this result!

We could do the same thing with Alice and Kurt's favorite movies:

    sqlite> SELECT * FROM Alice, Kurt;
    The Big Chill|1983|105|drama|Black Panther|2018|134|action
    The Big Chill|1983|105|drama|The Princess Bride|1987|98|fantasy
    ...
    Star Trek|1979|132|scifi|48 Hrs.|1982|96|comedy
    Star Trek|1979|132|scifi|The Big Chill|1983|105|drama

That's a long list. How many tuples does the new relation contain? How many attributes?

We will learn about other ways to combine relations in the next few sessions. You will learn about an expecially valuable one in this week's online session: the natural join.

Summary: Expressing Relational Algebra Operators in SQL

At this point, we have seen simple "atoms" of SQL for expressing three of the relational algebra operators we learned today:

    π_a1,...(R)   ~= SELECT a1,... from R;
    σ_C(R)       ~= SELECT * FROM R WHERE C;
    R ⨯ S       ~= SELECT * FROM R, S;

What about those basic set operators, union, intersection, and difference? SQL has specific operators for all three!

It would be nice if it were as simple as Alice ⋂ Kurt, but SQL doesn't allow us to refer to relations that way. We have to use SELECT expressions to create a relation to operate on:

    sqlite> SELECT * FROM Alice
            UNION
            SELECT * FROM Kurt;
    48 Hrs.|1982|96|comedy
    Black Panther|2018|134|action
    Blade Runner|1982|117|scifi
    Lethal Weapon|1987|109|action
    Return of the Jedi|1983|131|scifi
    Star Trek|1979|132|scifi
    The Big Chill|1983|105|drama
    The Princess Bride|1987|98|fantasy
    The Princess Bride|2020|98|fantasy
    The Replacements|2000|118|comedy

    sqlite> SELECT * FROM Alice
            INTERSECT
            SELECT * FROM Kurt;
    Black Panther|2018|134|action
    Lethal Weapon|1987|109|action
    Star Trek|1979|132|scifi
    The Big Chill|1983|105|drama
    The Princess Bride|1987|98|fantasy
    The Replacements|2000|118|comedy

The operator for set difference draws more on natural language mathematical terminology:

    sqlite> SELECT * FROM Alice
            EXCEPT
            SELECT * FROM Kurt;
    Blade Runner|1982|117|scifi
    The Princess Bride|2020|98|fantasy

    sqlite> SELECT * FROM Kurt
            EXCEPT
            SELECT * FROM Alice;
    48 Hrs.|1982|96|comedy
    Return of the Jedi|1983|131|scifi

So:

    π_a1,...(R)   ~= SELECT a1,... from R;
    σ_C(R)       ~= SELECT * FROM R WHERE C;
    R ⨯ S       ~= SELECT * FROM R, S;

    R ⋃ S       ~= SELECT * FROM R  UNION      SELECT * FROM S;
    R ⋂ S       ~= SELECT * FROM R  INTERSECT  SELECT * FROM S;
    R – S       ~= SELECT * FROM R  EXCEPT     SELECT * FROM S;

We won't use some of these expressions often, but perhaps they will help you as you learn and use relational algebra.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... September 29, 2020