Debugged instances of the Movie relation from Exam 1, Problem 8 are tables containing the favorite movies of Alice and Kurt:
name year length genre -------------------------------------------- Alice a1 The Big Chill 1983 105 drama a2 Blade Runner 1982 117 scifi a3 The Princess Bride 2020 98 fantasy a4 Lethal Weapon 1987 109 action a5 Black Panther 2018 134 action a6 The Princess Bride 1987 98 fantasy a7 The Replacements 2000 118 comedy a8 Star Trek 1979 132 scifi -------------------------------------------- Kurt k1 Black Panther 2018 134 action k2 The Princess Bride 1987 98 fantasy k3 The Replacements 2000 118 comedy k4 Star Trek 1979 132 scifi k5 Lethal Weapon 1987 109 action k6 Return of the Jedi 1983 131 scifi k7 48 Hrs. 1982 96 comedy k8 The Big Chill 1983 105 drama
Identify...
Could we use SQLite to help us find our answers?
To find out, I created a database containing the two tables, using this SQL script.
Even with our limited experience, we can write queries to solve the third and fourth problems. For #4:
sqlite> SELECT * FROM Kurt WHERE length > 120;
Black Panther|2018|134|action
Star Trek|1979|132|scifi
Return of the Jedi|1983|131|scifi
If we want only the names, we can select only the names:
sqlite> SELECT name FROM Kurt WHERE length > 120;
Black Panther
Star Trek
Return of the Jedi
For #3, we can select the genre from Alice's set without a condition:
sqlite> SELECT genre from Alice;
drama
scifi
fantasy
action
action
fantasy
comedy
scifi
That doesn't seem right... Relations are sets. When we select
the genres for Alice, we should get a set, but we don't. This is
one of those places where SQL does not implement the relational
algebra correctly. We need to use a new modifier,
DISTINCT:
sqlite> SELECT DISTINCT genre from Alice;
drama
scifi
fantasy
action
comedy
We'll occasionally learn a little new SQL because SQL does not behave the way we expect it to. Welcome to programming.
Last week, we reviewed the basics of SQL and SQLite, which are the tools we will use to implement our database models in executable code. Then we learned the basic terms of relational algebra, a special-purpose algebra that enables use to construct new relations from existing relations. It is the theory that underlies SQL, the language we will use to query our relational databases.
Today we go a bit deeper with relational algebra, exploring its operators and ways we can express them in SQL.
Relational algebra is limited yet surprisingly powerful language for expressing relational queries. Its limitations both increase programmer productivity the run-time performance of our database management systems. It consists of a set of set operations over the domain of relations, which are a particular kind of set. We will use these operations to construct simple and compound expressions known as queries.
Last time, we saw that there are four broad classes of relational operators:
Let's consider each in more detail and begin to think about how the help us express queries.
We learn three basic set operations in grade school math:
Consider these two small instances of the Movie relation:
name year length genre
-------------------------------------------- R
The Big Chill 1983 105 drama
Blade Runner 1982 117 scifi
-------------------------------------------- S
Blade Runner 1982 117 scifi
The Princess Bride 2020 98 fantasy
What are (R ⋃ S), (R ⋂ S), (R – S), (S – R)?
The first two questions on the opening exercise involve these basic operations: intersection and difference, respectively. What question would a union answer?
Perhaps: the movies liked by either Alice or Kurt. This is the home library containing everyone's favorites.
Relations are two-dimensional tables. Rows are tuples, and columns are attributes. The two basic ways to remove information from a relation are to eliminate rows or eliminate columns.
In relational algebra, projection takes one relation and returns a new relation containing a subset of its columns. We use lowercase pi, π, as the projection operator:
πa1,...(R)
For example, using the relations R and S above, πname,length(R) produces:
name length
---------------------------
The Big Chill 105
Blade Runner 117
The third question on the opening exercise asked for a projection: πgenre(Alice). In the relational algebra, all relations are sets. The result is {drama, scifi, fantasy, action, comedy}.
Selection is an operation that takes a relation R and returns a new relation containing a subset of its rows. The tuples in the new relation satisfy some condition C involving R's attributes.
We use lowercase sigma, σ, as the selection operator:
σC(R)
C is a conditional expression of the sort you use in traditional programming languages. It returns true or false when applied to a tuple. In relational algebra, a condition can refer only to constants or to attributes in the relation.
For example, using the relations R and S above, σgenre='scifi'(S) produces:
name year length genre
--------------------------------------------
Blade Runner 1982 117 scifi
The same selection applied to the Alice relation from our exercise
returns a relation with two tuples:
name year length genre
--------------------------------------------
Blade Runner 1982 117 scifi
Star Trek 1979 132 scifi
The fourth question on the opening exercise asked for a selection: σlength>120(Kurt).
We can also write compound conditions. To find the set of scifi movies longer then two hours that Kurt likes, we could say σlength>120 AND genre='scifi'(Kurt).
The little bit of SQL we learned last week gave us tools for implementing simple projections and selections:
Back in school, you may have learned another set operation that combines two sets: R ⨯ S, the product of two sets. This is also called the "Cartesian product" or the "cross product". R ⨯ S creates a set of all possible pairs where the first item in the pair is in R and the second item is in S. For example, the product of {a, b} and {c, d, e} is {(a,c), (a,d), (a,e), (b,c), (b,d), (b,e)}.
In the relational algebra, the sets are relations and work much the same. Rather than create an ordered pair, though, the product creates a longer tuple, containing all the attributes of R and all the attributes of S.
To see how this works, consider these relations:
A B C C D
----- R --- S
1 2 3 7 8
4 5 6 3 9
6 0
The result of R ⨯ S will be this relation:
A B R.C S.C D
--------------------- R ⨯ S
1 2 3 7 8
1 2 3 3 9
1 2 3 6 0
4 5 6 7 8
4 5 6 3 9
4 5 6 6 0
Note that this relation pairs each tuple from R with every tuple
from S. Both R and S have an attribute named C, so the relation
has to invent new names. We will usually disambiguate conflicts
of this sort with member-style dot notation.
Note also that this relation has six tuples and five attributes. There are two tuples in R paired with each of the three tuples in S, for |A| * |B| = 6 rows. R has three attributes and S two, which gives us |A attributes| + |B attributes| columns.
We saw a free-range example of the product at the end of Week 5's second session. Here is a simpler version:
sqlite> SELECT * FROM R, S;
1 2 3 7 8
1 2 3 3 9
1 2 3 6 0
4 5 6 7 8
4 5 6 3 9
4 5 6 6 0
We'll have to create SQL tables for R and S to see this result!
We could do the same thing with Alice and Kurt's favorite movies:
sqlite> SELECT * FROM Alice, Kurt;
The Big Chill|1983|105|drama|Black Panther|2018|134|action
The Big Chill|1983|105|drama|The Princess Bride|1987|98|fantasy
...
Star Trek|1979|132|scifi|48 Hrs.|1982|96|comedy
Star Trek|1979|132|scifi|The Big Chill|1983|105|drama
That's a long list. How many tuples does the new relation contain?
How many attributes?
We will learn about other ways to combine relations in the next few sessions. You will learn about an expecially valuable one in this week's online session: the natural join.
At this point, we have seen simple "atoms" of SQL for expressing three of the relational algebra operators we learned today:
πa1,...(R) ~= SELECT a1,... from R;
σC(R) ~= SELECT * FROM R WHERE C;
R ⨯ S ~= SELECT * FROM R, S;
What about those basic set operators, union, intersection, and difference? SQL has specific operators for all three!
It would be nice if it were as simple as Alice ⋂ Kurt, but SQL doesn't allow us to refer to relations that way. We have to use SELECT expressions to create a relation to operate on:
sqlite> SELECT * FROM Alice
UNION
SELECT * FROM Kurt;
48 Hrs.|1982|96|comedy
Black Panther|2018|134|action
Blade Runner|1982|117|scifi
Lethal Weapon|1987|109|action
Return of the Jedi|1983|131|scifi
Star Trek|1979|132|scifi
The Big Chill|1983|105|drama
The Princess Bride|1987|98|fantasy
The Princess Bride|2020|98|fantasy
The Replacements|2000|118|comedy
sqlite> SELECT * FROM Alice
INTERSECT
SELECT * FROM Kurt;
Black Panther|2018|134|action
Lethal Weapon|1987|109|action
Star Trek|1979|132|scifi
The Big Chill|1983|105|drama
The Princess Bride|1987|98|fantasy
The Replacements|2000|118|comedy
The operator for set difference draws more on natural language
mathematical terminology:
sqlite> SELECT * FROM Alice
EXCEPT
SELECT * FROM Kurt;
Blade Runner|1982|117|scifi
The Princess Bride|2020|98|fantasy
sqlite> SELECT * FROM Kurt
EXCEPT
SELECT * FROM Alice;
48 Hrs.|1982|96|comedy
Return of the Jedi|1983|131|scifi
So:
πa1,...(R) ~= SELECT a1,... from R;
σC(R) ~= SELECT * FROM R WHERE C;
R ⨯ S ~= SELECT * FROM R, S;
R ⋃ S ~= SELECT * FROM R UNION SELECT * FROM S;
R ⋂ S ~= SELECT * FROM R INTERSECT SELECT * FROM S;
R – S ~= SELECT * FROM R EXCEPT SELECT * FROM S;
We won't use some of these expressions often, but perhaps they will help you as you learn and use relational algebra.