Jump to Block: (About) 01 02 03 04 05 06 07 08 09 10 11 12 (Assessments)
09 Algorithms for Data Science
In this block we cover:
- Algorithmic Complexity
- Big Oh/ Omega/ Theta notation
- Time complexity
- Space complexity
- Communication complexity
- Analysing Algorithms
- Approximate Algorithms
- Complexity classes
- Turing Machines
- P vs NP
- Important dynamic data structures, including:
- Stacks
- Queues
- Linked Lists
- Binary Trees
- Hash Tables
- Heaps
- Important Algorithms:
- Sampling
- Hash functions
- Filtering
- Bloom Filter
- Sketching (count-min-sketch)
- MinHash
Lectures
- 09.1 Analysing Algorithms: Complexity, Algorithms and Turing Machines
- 09.2 Algorithms for Data Science
Workshop:
- 09.3 Workshop on the Empirical evaluation of computational complexity
- Block09 on Noteable via Blackboard:
Assessment:
- Portfolio 09 of the full Portfolio.
References
General textbooks
- Cormen et al 2010 Introduction to Algorithms is very accessible and is the recommended starting point, especially for Data Structures.
- Arora and Barak 2007 Computational Complexity: A Modern Approach is also useful but more advanced.
Communication Complexity
- Toniann Pitassi Lecture on Communication Complexity: Applications and New Directions
- Raznorov 2015 Communication Complexity Lecture
- (Arora and Barak give it some time too)
Turing Machines and details on complexity
- Hopcroft and Ullman Introduction to Automata Theory, Languages, and Computation
- Specific algorithm
- Interesting further detail:
- Annie Raymond’s Lecture notes on bipartite matching
- Fan et al 2010 Graph Pattern Matching: From Intractable to Polynomial Time
- Stack Exchange Polynomial Time algorithms with huge exponent
Algorithms for Big Data
- The Mining of Massive Datasets book and course.
- Open DSA - Data Structures and Algorithms
- Bill Mill’s excellent Bloomfilter practical
- Chris McCormick’s Minhash tutorial
- Risto Tuomainen Data Sampling for Big Data, covering sampling, filtering, sketching, etc.
- Streaming histogram implementation
- Python inplementation of Count Min Sketch by Rafael Carrascosa (part of PyPI)
- Leo Martel notes on Streaming Data Algorithms which is notes on the paper
- Cormode’s notes on Count-Min Sketch
- Chakrabarti’s Lecture Notes on Data Stream Algorithms
- Broder & Mitzenmacher “Network Applications of Bloom Filters: A Survey” (2003) Internet Mathematics 1:485-509
- Geravand & Ahmadi “Bloom filter applications in network security: A state-of-the-art survey” (2013) Computer Networks 57:4047-4064
- Goyal, Daume & Cormode “Sketch Algorithms for Estimating Point Queries in NLP” (2012) Proc. EMNLP.