04 Non-parametrics and Missing Data
In Block 4 we cover:
- Non-parametric statistics:
  - Transforms and their uses:
    - Fourier Transform
    - Hadamard Transform
  - Kernel Density Estimation
  - k-Nearest Neighbour Density Estimation
  - The Kernel Trick
  - Kernel PCA as an example of the Kernel Trick
- Handling Outliers and Missing Data:
  - Outlier detection and/or removal
  - Robust algorithms
  - Classes of missing data
  - Approaches for filtering data based on missingness
  - Approaches for imputing missing data
  - How to understand the consequences of these choices
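As a rough illustration of the last two points in the list above, the sketch below shows one consequence of a common imputation choice. It is not taken from the course material: the simulated data, the 40% missingness rate, and the helper name `mean_impute` are all assumptions made for the example. Values of `y` are removed completely at random and then filled with the observed mean, and the variance and correlation are compared before and after.

```python
import numpy as np


def mean_impute(values):
    """Replace NaNs with the mean of the observed entries (hypothetical helper)."""
    filled = values.copy()
    filled[np.isnan(filled)] = np.nanmean(values)
    return filled


rng = np.random.default_rng(0)
n = 5000

# Simulated data (an assumption for illustration): y depends linearly on x.
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)

# Missing completely at random: each y is dropped independently with prob. 0.4.
missing = rng.random(n) < 0.4
y_obs = np.where(missing, np.nan, y)

y_imp = mean_impute(y_obs)

var_cc = np.nanvar(y_obs)    # variance of the observed (complete-case) values
var_imp = y_imp.var()        # variance after mean imputation
corr_full = np.corrcoef(x, y)[0, 1]
corr_imp = np.corrcoef(x, y_imp)[0, 1]

print(f"variance: complete cases {var_cc:.3f}, mean-imputed {var_imp:.3f}")
print(f"corr(x, y): full data {corr_full:.3f}, mean-imputed {corr_imp:.3f}")
```

Because every imputed value sits exactly at the observed mean, the imputed variance equals the complete-case variance multiplied by the fraction of observed values, and the correlation with `x` is attenuated. Simply dropping the missing rows would avoid this shrinkage here, but only because the missingness in this simulation does not depend on the data.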
Lectures:
- Non-parametric Statistics:
- Missing Data:
Worksheets:
Workshop:
Reference material:
Non-parametric Statistics:
- Transforms:
  - Nonparametric Statistics by Eduardo García Portugués
  - Basis Expansions: Chapter 5 of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Friedman, Hastie and Tibshirani).
- Density Estimation:
  - Kernel Smoothing: Chapter 6 of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Friedman, Hastie and Tibshirani).
  - For kNN: Yen-Chi Chen’s notes on kNN and the Basis
- The Kernel Trick and its applications:
  - For the Kernel Trick: Dave Krebs’ Intro to Kernels
  - For Kernel PCA: Rita Osadchi’s Kernel PCA notes
  - Hofmann, Schoelkopf, & Smola (2008) “Kernel Methods in Machine Learning” (Ann. Stat.)
  - Schoelkopf, Smola, & Mueller (1998) “Nonlinear component analysis as a kernel eigenvalue problem”
- Outlier detection:
  - “A Survey of Outlier Detection Methodologies” by Victoria Hodge & Jim Austin, Artificial Intelligence Review 22:85–126 (2004).
  - Outlier Analysis by Charu C. Aggarwal. NB: Not freely available.
  - Chapter 10 of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Friedman, Hastie and Tibshirani) discusses the robustness of various methods to outliers.
Missing data:
- Chapter 9.6 of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Friedman, Hastie and Tibshirani).
- I would recommend Andrew Gelman’s Missing Data Notes.