Host: Bristol Mathematics
Lecturer: Dr Daniel Lawson

Data Science Toolbox

12 Parallel Infrastructure and Spark

This Block is unassessed except where it overlaps with other blocks. You may find the Parallel Data lecture helpful for Block 10 on parallel algorithms.

In this block we cover:

Big Data
Streaming
Hadoop Distributed File System (HDFS)
Hadoop MapReduce
Spark overview
Resilient Distributed Datasets (RDDs)
Spark
Accessing Spark through pyspark

Lectures

Worksheets:

Worksheet 12.1 Parallel Infrastructure

Workshop:

The workshop this week involves considerable setup.
- You are advised to complete the setup first; it is discussed in the first video.
All Workshop Content is accessed via the GitHub repository:
- 12.2 Python Notebook: Workshop on Coding Parallel Algorithms
Workshop Videos:

References

General parallel algorithms:
- Streaming and Sketching
- Parallel algorithms for dense matrix multiplication
Map Reduce
References:
Laws governing data science:
Privacy:
- The Algorithmic Foundations of Differential Privacy by Dwork and Roth (2014).
- ONS policy on disclosure control.
- Sweeney 1997. Weaving technology and policy together to maintain confidentiality. Journal of Law, Medicine & Ethics, 25:98–110.
- Narayanan and Shmatikov 2008. Robust de-anonymization of large sparse datasets (how to break anonymity of the Netflix Prize dataset). IEEE Sec. and Priv.
- Statistical Disclosure Attacks by George Danezis.
Interpretability and fairness:
- Book: “Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.” Christoph Molnar 2019.
- Algorithmic Bias Tutorial by Francesco Bonchi with Slides from KDD 2016
- Hardt, Price and Srebro 2016. Equality of Opportunity in Supervised Learning, explored in https://blog.acolyer.org/2018/05/07/equality-of-opportunity-in-supervised-learning/.
Spark:

