03 Latent Structures, PCA, and Clustering
In Block 3 we cover:
- Motivation for latent structures
- Principal Components Analysis
  - How to calculate PCA
  - What PCA is good for
  - Relationship to SVD and other Spectral Embeddings
- Clustering
  - Algorithmic Clustering
  - Hierarchical Clustering
  - Model based clustering
- Implementations in R:
  - Spectral Clustering as a pipeline element for classification of Cyber security data
Lectures:
- 3.1 Latent Structures and PCA (41:12)
- 3.2.1 Clustering Part 1 (32:34)
- 3.2.2 Clustering Part 2 (34:32)
Worksheets:
Workshop:
Assessments:
- Assessment 1 will be set this week; see Assessments. This is a summative assessment (i.e. it contributes to your grade) and is due in Week 12.
Reference material:
For PCA:
- Cosma Shalizi’s Advanced Data Analysis, Lecture 18
- Boyd and Vandenberghe: Convex Optimization is an excellent and thorough resource.
- I showed Kalman: Leveling with Lagrange: An Alternate View of Constrained Optimization
For Clustering:
- Tibshirani’s Data Mining lecture notes (Lecture 2 and Lecture 5)
- 5 clustering algorithms you need to know
- The fastcluster package for R and Python implements the “fastest” \(O(N^2)\) versions of hierarchical clustering.
- Python resources comparing hdbscan
- Scikit Learn Diagram