Session 3: An Introduction to HTML
HTML as a Markup Language
HTML is the first of three languages that we will learn this semester for creating web sites. It is the easiest of the three to learn, because it has only a few rules and a manageable vocabulary. Most importantly, as its name says, it is a markup language, not a programming language.
Consider this simple file. We say that it is a plaintext file, a loose term for data that contains only characters that humans can read. If we look at it in a web browser, it does not appear to have any structure, and all the data appears the same.
If we "mark up" the text with some special characters, though, the new file looks different in the browser. Some of the text displays as larger, more important. Some text is in a bullet list, and some is italicized. The document has structure.
... download file or use devtools to see source code
This is the idea of a markup language: we add text to a file that is about the file, to communicate information beyond the file's content.
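To make the idea concrete, here is a small sketch of marked-up text. The heading and list items are placeholders invented for illustration, not the contents of the file we opened in class.

```html
<!-- Hypothetical fragment: ordinary text with markup added around it. -->
<h1>Web Development</h1>
<p>This course covers <em>three</em> languages:</p>
<ul>
  <li>HTML</li>
  <li>CSS</li>
  <li>JavaScript</li>
</ul>
```

The words are the content. The tags around them tell the browser which words form a heading, which are list items, and which should be emphasized.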
HTML is not the only markup language, nor was it the first or last. In the early 2000s, XML became a popular way to structure data for the web and even gave rise to a version of HTML known as XHTML. These days, perhaps the most common markup language other than HTML is Markdown.
Markdown is even more "human" than HTML. It uses a few simple tokens such as # to indicate headings and lets people write text that looks almost like plaintext.
... create or open a file using Markdown
People have written computer programs to translate Markdown into other formats, including HTML. The popular online hosting site GitHub automatically renders Markdown into HTML for display. VS Code also knows how to render Markdown:
... use the shortcut shift‑cmd‑v to see the rendered page
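As a rough sketch of what such a translation produces, here is the HTML that a typical Markdown renderer would generate for three short lines. The Markdown source appears in the comments. The sample text is made up, and the exact output can vary a little from tool to tool.

```html
<!-- Markdown source:  # Shopping List -->
<h1>Shopping List</h1>

<!-- Markdown source:  - apples
                       - *ripe* bananas -->
<ul>
  <li>apples</li>
  <li><em>ripe</em> bananas</li>
</ul>
```

The Markdown is shorter to type, but the HTML states the structure explicitly: one heading, one list, two items, one emphasized word.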
If there are simpler markup languages and tools that can generate HTML, why do we learn to write HTML? Because HTML is a more expressive way to describe the structure of a document. It balances power and simplicity. Other languages and tools are more limited in what they can do, because otherwise they would become too complex. More importantly, HTML drives the web, so a deep knowledge of HTML is essential to creating web pages and being able to read and modify existing code.
Most computer languages are formally defined, which means there is a set of rules that governs how the language works. These rules are either decreed by an authority in charge of the language or agreed upon by a community of users. HTML is a communal standard, maintained by the WHATWG in collaboration with the World Wide Web Consortium (W3C) and agreed upon by all the major tool creators. If you google "HTML5 standard", you will find the standard online at https://html.spec.whatwg.org.
The standard states that every HTML document must include two required parts:
- a <!doctype> declaration
- an <html> element
These elements are the backbone of the basic HTML5 web page from this week's reading.
The Vocabulary of HTML
As I said last week, we won't work through every line of the week's reading in class. Let's highlight some key points and expand on some practical matters of writing HTML code. We can use our opening HTML file of the day as a specimen.
The main unit of HTML markup is the tag, a named item between < and > characters. Some tags, such as <br>, work solo. Others, such as <h1>, are paired with a closing tag that contains a slash: </h1>.
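Here is a minimal sketch of both kinds of tag. The text inside the tags is placeholder content.

```html
<!-- <h1> is paired with a closing tag, </h1>. -->
<h1>Session 3</h1>

<!-- <br> works solo: it has no closing tag. -->
<p>
  First line<br>
  Second line
</p>
```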
An element consists of everything from the opening tag to the closing tag, including the tags themselves. An element can include text, other elements, or both. In our example, the <h1> and <li> elements contain only text, while the <body> and <p> elements contain other elements.
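The fragment below sketches the three possibilities: an element holding only text, one holding only other elements, and one holding both. It is an illustrative fragment, not the specimen file from class.

```html
<!-- This <p> element contains both text and another element, <em>. -->
<p>HTML is a <em>markup</em> language.</p>

<!-- This <ul> element contains only other elements. -->
<ul>
  <!-- Each <li> element contains only text. -->
  <li>tags</li>
  <li>elements</li>
</ul>
```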
An HTML element is text that has meaning, both structure and content. HTML is about structure, not about how a page looks (that's what CSS is for) or how it behaves (that's what JavaScript is for).
An attribute is a name or a name/value pair inside a tag that provides information to specialize the tag. If we wanted to be able to refer to the last line of the file as naming the instructor of the course, we could add an "id" attribute with a value of "instructor", id="instructor", to the <p> tag.
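A short sketch of both forms of attribute follows. The paragraph text is a placeholder, and the button is a hypothetical example chosen only because its attribute needs no value.

```html
<!-- A name/value pair: the id attribute with the value "instructor". -->
<p id="instructor">Instructor: (name goes here)</p>

<!-- A name-only attribute: disabled needs no value. -->
<button disabled>Submit</button>
```

Attributes always go inside the opening tag, after the tag name.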
Some items cannot be written as they are. There may not be a key on the keyboard for the character, such as — (the em-dash) or the © symbol. For others, there is a key, but the character means something in HTML. Think about the "<" and ">" characters we use to write tags! If I write the characters <symbol> in my document, the browser will see that as a tag named "symbol".
An entity is a construct to express a character that cannot otherwise be expressed in HTML. We write entities as &NAME;, where the & starts the entity and the ; ends it. For example, we can write &copy; for the © symbol, and &amp; for the & symbol. To write the angle brackets that delimit our tags, we use &lt; and &gt;, respectively. In this way I can add <symbol> to my document using &lt;symbol&gt;.
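Here is a small sketch of these entities as they would appear in a source file. The sentences themselves are invented for illustration.

```html
<!-- Each comment shows the text the browser displays. -->
<p>Copyright &copy; CS 1100</p>      <!-- Copyright © CS 1100 -->
<p>Tags &amp; elements</p>           <!-- Tags & elements -->
<p>Use the &lt;h1&gt; tag.</p>       <!-- Use the <h1> tag. -->

<!-- &mdash; displays as an em-dash. -->
<p>HTML &mdash; a markup language</p>
```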
I only know a few entities by heart, including the ones above. You can look them up when you need them.
Whitespace is the name for all the blank spaces, tabs, and newline characters in an HTML document. When it displays text, the browser treats any run of whitespace as a single space. We use whitespace in our HTML files to help people (including ourselves) read and understand the document. We should never use whitespace to format the document for display in the browser. That is the job of CSS.
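As a quick sketch of this behavior, the two paragraphs below are written very differently in the source but look the same in the browser. The text is a placeholder.

```html
<!-- These two paragraphs display identically in the browser. -->
<p>Hello, world!</p>

<p>
  Hello,
        world!
</p>
```

The extra spaces and the line break between the two words collapse to one space, so spacing in the source is only for the benefit of human readers.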
Likewise, we use comments to make our HTML files more readable for humans. The browser ignores them entirely. We often use them to explain or set off a section of code.
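A comment starts with <!-- and ends with -->. The sketch below shows the two uses mentioned above, setting off a section and leaving an explanatory note. The page content is hypothetical.

```html
<!-- Course information: shown at the top of the page -->
<h1>CS 1100</h1>

<!-- TODO: add the office hours once they are set -->
<p>Office hours will be posted soon.</p>
```

Only the heading and the paragraph appear in the browser. The comments are still visible to anyone who views the page source.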
You will find over time that even though the browser ignores whitespace and comments, we will want to use them judiciously to create HTML files that can be understood and modified over time.
A Basic HTML5 Document
Now that we've looked at the basic units out of which we build web pages, let's consider the structure of the basic HTML5 web page from the reading.
It contains the two parts required by the standard: a <!doctype> declaration and an <html> element. The latter is the root element of the page.
Notice the use of separate lines and indentation (whitespace) to communicate the document's structure to human readers. Even better, VS Code is savvy about HTML and can help us with indentation whenever we paste code or create new tags.
... show how VS Code can collapse multi-line elements...
The root element contains two sub-elements:
- <head> — contains information about the page. This is sometimes called metadata.
- <body> — contains the content of the page.
... review HTML vocabulary we just learned ...
The head defines the <title> of the page, which is used by the browser to label tabs. It also contains metadata about the character set used in the file, UTF-8. We will include this metadata and a page-specific title in each of our documents this semester.
The body here is simple, a single <h1> element. Each page should have exactly one <h1> element. More useful web pages will include more content in their bodies.
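The page from the reading is not reproduced in these notes, so here is a sketch of what a document matching this description looks like. The title and heading text are placeholders.

```html
<!doctype html>
<html>
  <head>
    <!-- Metadata: the character set used in the file -->
    <meta charset="utf-8">
    <!-- The browser uses the title to label the tab -->
    <title>Page Title</title>
  </head>
  <body>
    <!-- The one and only h1 on the page -->
    <h1>Page Heading</h1>
  </body>
</html>
```

Saving this as a file ending in .html and opening it in a browser is a handy starting point for any new page.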
Creating a Simple HTML5 Document via Copy and Paste
I selected some text from a rendered web page (*), pasted it into a new "empty" file, and slowly turned it into a valid HTML document by adding tags and required elements. We saw how VS Code could help us with automatic indentation and completion of tags.
(*) say, from Session 3, the heading and first two paragraphs of The Vocabulary of HTML, or the text on this movie page!
Closing
Now that you have learned the basic vocabulary of HTML and a few of its tags, attributes, and entities, the best way to learn more is by reading and writing web pages. Look for web pages you like, and use devtools to examine the HTML source code. Check out the source code for the pages on the CS 1100 website: the session notes, the homework assignments, and any other pages you see. They demonstrate patterns of good HTML structure.
The reading for this week includes a Recommended Reading section. These are references, not textbooks. You don't need to read them like a text or a novel. Instead, browse the pages, familiarizing yourself with what's available and learning a few things along the way. You will want to refer back to these resources during the semester whenever you want to find a tag or attribute you need, or want to refresh your memory on how something works.
There are also links embedded in the readings and the session notes. You don't have to follow those; they are usually optional pointers to more detailed information about the topic.

Homework 1 is available and due on Friday. Its goal is to let you get experience using the various tools we need for the rest of the course. Let me know if you have any questions!
Bonus: We can add extensions to VS Code that make our editing experience even better. Click on the extensions icon in the sidebar (see image at right) to bring up the extensions panel.
Search for items you might like. Here are two useful extensions:
- Code Spell Checker, which checks spelling for both text and code
- Live Server, a web server that refreshes the page in the web browser automatically when you save the file in your editor