06 Decision Trees and Random Forests
In this block we cover:
- Decision Trees
  - The Classification and Regression Tree (CART) approach
  - Decision loss functions: information gain (ID3) vs Gini impurity
  - Pruning trees to reduce overfitting
  - Regression trees
- Random Forests
  - Ensembles of trees
  - Bagging features
  - Forests vs Boosted Decision Trees
  - Feature importance
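As a small illustration of the two split criteria named in the outline, the sketch below computes the Gini impurity (commonly used by CART) and the entropy-based information gain (used by ID3) for a single candidate split. The function names and the toy labels are hypothetical, chosen only for this example; it is a minimal sketch, not the workshop code.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return the fraction of samples falling in each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Toy labels (made up for illustration): a node holding 4 "a" and 4 "b"
# samples, split into a pure left child and a mixed right child.
parent = ["a", "a", "a", "a", "b", "b", "b", "b"]
left = ["a", "a", "a"]
right = ["a", "b", "b", "b", "b"]

gini_gain = impurity_decrease(parent, left, right, gini)
info_gain = impurity_decrease(parent, left, right, entropy)
print(f"Gini decrease:    {gini_gain:.3f}")
print(f"Information gain: {info_gain:.3f}")
```

A tree-growing algorithm would evaluate a score like this for every candidate split and keep the one with the largest decrease. The two criteria often rank splits similarly, though not always identically.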
Lectures:
Workshop:
The workshop is split into two sections. The first is in R and generates the data, so you should ideally run it first. The second is in Python and compares its results to the R content. Note that the content is exported to the DST GitHub and the code below grabs it from there, so it is possible to run the sections out of order.
- 6.2.1 Workshop on Random Forests in R (.Rmd)
- 6.2.1 Workshop on Random Forests in R (.html)
- 6.2.2 Workshop on Random Forests in Python
Assessments:
- Portfolio 06 of the full Portfolio.
- Block06 on Noteable via Blackboard:
References:
- Tree methods:
- Section 9.2 of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie, Tibshirani and Friedman).
- Penn State University, Applied Data Mining and Statistical Learning: How to prune trees
- Decision Tree Algorithms: Deep Math ML
- Regression Trees:
- Karalic A, “Employing Linear Regression in Regression Tree Leaves” (1992) ECAI-92
- Boosted Decision Trees:
- J. Elith, J. Leathwick, and T. Hastie, “A working guide to boosted regression trees” (2008). Journal of Animal Ecology (British Ecological Society).
- CART:
- CART = Classification and Regression Trees. Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees.
- Wei-Yin Loh’s 2011 Review is popular.
- ID3: Quinlan, J. R. 1986. Induction of Decision Trees. Mach. Learn. 1, 1 (Mar. 1986), 81-106.
- Random Forests:
- Chapter 15 of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie, Tibshirani and Friedman).
- Implement a Random Forest From Scratch in Python
- A Gentle Introduction to Random Forests at CitizenNet
- DataDive on Selecting good features
- Cosma Shalizi on Regression Trees
- Gilles Louppe PhD Thesis: Understanding Random Forests
- Kroese et al’s Data Science & Machine Learning free ebook looks pretty helpful.