12 Parallel Infrastructure and Spark
This Block is unassessed except where it overlaps with other blocks. You may find the Parallel Data lecture helpful for Block 10 on parallel algorithms.
In this block we cover:
- Big Data
- Streaming
- Hadoop Distributed File System (HDFS)
- Hadoop MapReduce
- Spark overview
- Resilient Distributed Datasets (RDDs)
- Spark
- Accessing Spark through pyspark
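The MapReduce model can be tried out without a cluster. Below is a minimal single-process sketch, assuming a word-count task. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and are not Hadoop's API. The sketch shows the three stages that Hadoop MapReduce distributes across machines: map emits key-value pairs, the shuffle groups the pairs by key, and reduce combines each group into a single result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every whitespace-separated word (lower-cased)
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key (on a cluster the framework does this step)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine every value that shares a key into one result
    return key, sum(values)


def word_count(lines):
    # Apply map to each input record independently, so this step parallelises trivially
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["spark builds on mapreduce", "mapreduce maps then reduces"]
    print(word_count(docs))
```

In pyspark the same pipeline is usually written against an RDD. Assuming an existing `SparkContext` named `sc`, one way to write it is `sc.parallelize(docs).flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. Here Spark performs the shuffle between the `map` and `reduceByKey` steps.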
Lectures
Workshop:
- The workshop this week involves considerable setup.
- You are advised to complete the setup first; it is discussed in the first video.
- All workshop content is accessed via the GitHub repository:
- Workshop Videos:
References
- General parallel algorithms:
- Map Reduce
Worksheets (unassessed)