Learning Objectives
By the end of this topic, you should be able to:
- Explain what a decision tree is, how it classifies data, and how it is built from labeled training examples.
- Explain what overfitting is and why it is a problem in machine learning.
Learning Activities
To help you meet the learning objectives, we have prepared three readings.
Readings
- Reading 1 - What Is a Decision Tree? — the structure of a decision tree, how to read and trace one, and a fully worked classification example
- Reading 2 - Building a Tree from Data — where decision trees come from, what makes a good split, and the overfitting problem
- Reading 3 - Decision Trees in the World — where decision trees appear in real systems, their strengths and limits, and the bridge to supervised learning
These readings intentionally build on each other, so please complete them in order.
Checking for Understanding
Review the Learning Objectives at the top of this page. The questions below will help you check your understanding before moving on to Topic 6D.
Reading and Tracing Trees
- Consider this simple decision tree for recommending whether a student should attend after-school tutoring:
  - Root split: Is the student's current grade below 70%?
    - If Yes → Is the student currently missing more than two assignments?
      - If Yes → Recommend tutoring
      - If No → Schedule a check-in conversation
    - If No → No action needed
- Trace the tree above for a student whose current grade is 65% and who is missing one assignment. Which recommendation does the tree produce?
- What is the difference between an internal node and a leaf node in a decision tree? What kind of information does each one represent?
- A decision tree produces a classification for every input, even inputs that look nothing like anything in its training data. Is that a strength or a weakness? Explain your reasoning.
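The tutoring tree above can also be traced in code. Here is a minimal sketch as nested conditionals; the function and parameter names (`recommend`, `grade_percent`, `missing_assignments`) are illustrative, not from the readings:

```python
def recommend(grade_percent: float, missing_assignments: int) -> str:
    """Trace the tutoring decision tree for one student."""
    if grade_percent < 70:                         # root split (internal node)
        if missing_assignments > 2:                # second internal node
            return "Recommend tutoring"            # leaf
        return "Schedule a check-in conversation"  # leaf
    return "No action needed"                      # leaf

# Every input reaches exactly one leaf:
print(recommend(65, 4))  # → Recommend tutoring
print(recommend(65, 1))  # → Schedule a check-in conversation
print(recommend(85, 0))  # → No action needed
```

Notice that the internal nodes correspond to the `if` tests (questions about attributes) and the leaf nodes to the `return` values (classifications) — the distinction the first question above asks about.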
Building Trees and Overfitting
- A loan officer is building a decision tree to classify loan applications as approve or deny. The available attributes are: credit score, annual income, existing debt, employment status, loan amount, and loan purpose (car, home, personal). Which attribute would you expect to produce the best first split, and why? Which attribute would you expect to produce the worst first split?
- A teacher builds a decision tree from three years of student data to predict which students will struggle on the final exam. The tree perfectly classifies every student in those three years. When applied to this year's students, it performs no better than a coin flip. What is the most likely explanation? What does this tell us about what "learning" means for an AI system?
- In your own words, explain why a simpler decision tree often generalizes better to new data than a very complex one. How is this similar to the way a good teacher writes a test — testing understanding rather than memorized facts?
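The teacher scenario above can be sketched numerically: compare a "tree" that memorizes every training example with a single split on the one attribute that matters. Everything here — the synthetic data, the 70% rule, the names — is invented for illustration:

```python
import random

random.seed(0)

def make_students(n):
    # Each student is (grade, birthday_month). Grade alone determines
    # the label; birthday month is pure noise.
    data = []
    for _ in range(n):
        grade, month = random.randint(40, 100), random.randint(1, 12)
        data.append(((grade, month), grade < 70))
    return data

train, test = make_students(200), make_students(200)

# "Complex tree": memorizes every (grade, month) pair it has seen,
# and falls back to a coin flip on anything unseen.
memory = dict(train)
def complex_tree(student):
    return memory.get(student, random.choice([True, False]))

# "Simple tree": one split on the attribute that actually matters.
def simple_tree(student):
    return student[0] < 70

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print("train:", accuracy(complex_tree, train), accuracy(simple_tree, train))
print("test: ", accuracy(complex_tree, test), accuracy(simple_tree, test))
```

Both models are perfect on the training data, but on new students the memorizer only succeeds where it happens to have seen the exact same (grade, month) pair before, while the simple rule keeps working — it captured the signal rather than the noise.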
Real-World Connections
- Credit scoring, medical diagnosis support, spam filtering, and student early warning systems all use decision trees or related structures. Pick one of these and describe what it is classifying, what attributes it likely uses to split, and what the leaf node outcomes represent.
- Decision trees are an example of supervised learning. In one or two sentences, explain what "supervised" means in this context — what role do the labeled training examples play in producing the tree?
It is completely fine to revisit the readings as you work through these questions.
Extend Your Learning
These optional topics go beyond the core learning goals but are rich avenues for deeper understanding.
- Random forests — a random forest builds many decision trees on random subsets of the training data and combines their votes. The result is dramatically more accurate and less prone to overfitting than any single tree — at the cost of being much harder to interpret.
- Information gain and entropy — the mathematical framework behind choosing the best split, measuring how much a question reduces uncertainty. These concepts come from information theory and underpin the ID3 and C4.5 algorithms that build decision trees automatically.
- Decision trees in education technology — early warning systems, adaptive learning platforms, and graduation risk models all use decision trees or tree-based methods. The Week 5 SEC scenario about graduation risk scores connects directly here.
- Explainable AI — decision trees are one of the few AI models that can explain their own reasoning in plain language. The emerging field of explainable AI asks how to give other, more powerful models the same property — a challenge that turns out to be surprisingly hard.
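The "information gain and entropy" idea above can be made concrete with a small calculation. Here is a sketch, using a tiny made-up loan dataset and illustrative function names:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting the rows on attribute index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Tiny made-up loan data: (credit, purpose) -> approve?
rows = [("good", "car"), ("good", "home"), ("bad", "car"), ("bad", "personal")]
labels = [True, True, False, False]

print(information_gain(rows, labels, 0))  # credit predicts the label perfectly: 1.0 bit
print(information_gain(rows, labels, 1))  # purpose is mostly noise: 0.5 bit
```

An algorithm like ID3 simply picks the attribute with the highest gain as the next split — one way to think about the loan-officer question in the Checking for Understanding section.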