11 Parallel Infrastructure and Spark
In this block we cover:
- Big Data
- Streaming
- Hadoop Distributed File System (HDFS)
- Hadoop MapReduce
- Spark overview
- Resilient Distributed Datasets (RDDs)
- Spark
- Accessing Spark through pyspark
Lectures
- Parallel data with MapReduce and Spark
Worksheets:
Workshop:
- The workshop this week involves considerable setup, and the use of BlueCrystal Phase 4 (or setting up your own environment…).
- You are advised to do this first - it is discussed in the first video.
- There are two ways to complete the workshop. The first is to get Spark working on your local machine so that you can access
- All workshop content is accessed via the GitHub repository:
- Video Lectures:
References
- General parallel algorithms:
- MapReduce:
- Spark: