> "... This book will be a great resource for both readers looking to implement existing algorithms in a scalable fashion and readers who are developing new, custom algorithms using Spark. ..."
>
> — Dr. Matei Zaharia, original creator of Apache Spark, from the Foreword
This directory contains the code for all chapters of "Data Algorithms with Spark".
- Chapter 01: Introduction to Data Algorithms
- Chapter 02: Transformations in Action
- Chapter 03: Mapper Transformations
- Chapter 04: Reductions in Spark
- Chapter 05: Partitioning Data
- Chapter 06: Graph Algorithms
- Chapter 07: Interacting with External Data Sources
- Chapter 08: Ranking Algorithms
- Chapter 09: Fundamental Data Design Patterns
- Chapter 10: Common Data Design Patterns
- Chapter 11: Join Design Patterns
- Chapter 12: Feature Engineering in PySpark
The following directories contain bonus chapters:
| Bonus Chapter | Description |
|---|---|
| Word Count | Multiple solutions to the word count problem using the reduceByKey() and groupByKey() reducers |
| Anagrams | Finding words that are anagrams of each other: multiple solutions using the reduceByKey(), groupByKey(), and combineByKey() reducers |
| Lambda Expressions | How to use lambda expressions in PySpark programs |
| TF-IDF | Term Frequency - Inverse Document Frequency |
| K-mers | K-mers for DNA Sequences |
| Correlation | All vs. All Correlation |
| mapPartitions() | Complete example of the mapPartitions() transformation |
| UDF | User-Defined Function Example |
| DataFrames Transformations | Examples of creating and transforming DataFrames |
| DataFrames Tutorials | DataFrames tutorials: creating DataFrames from collections and CSV text files |
| Join Operations | Examples of joining RDDs |
| PySpark Tutorial 101 | Examples of using PySpark RDDs and DataFrames |
| Physical Data Partitioning | Tutorial on physical data partitioning |
| Monoid: Design Principle | The monoid as a design principle |
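The Word Count entry contrasts reduceByKey() and groupByKey(). As a rough plain-Python sketch of the semantics (not the book's PySpark code, and without Spark's partitioned, distributed execution): reduceByKey() merges values per key as it goes, while groupByKey() first collects all values per key and aggregates afterward. The helper names below are illustrative, not from the repository.

```python
from collections import defaultdict

def reduce_by_key(pairs, fn):
    """Stand-in for RDD.reduceByKey(fn): merge values per key with fn as pairs arrive."""
    acc = {}
    for k, v in pairs:
        acc[k] = fn(acc[k], v) if k in acc else v
    return acc

def group_by_key(pairs):
    """Stand-in for RDD.groupByKey(): collect every value per key into a list."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)

lines = ["fox jumped", "fox ran", "fox jumped high"]
pairs = [(word, 1) for line in lines for word in line.split()]

counts_reduce = reduce_by_key(pairs, lambda a, b: a + b)
counts_group = {k: sum(vs) for k, vs in group_by_key(pairs).items()}
assert counts_reduce == counts_group == {"fox": 3, "jumped": 2, "ran": 1, "high": 1}
```

In Spark itself, reduceByKey() is usually preferred for aggregations because it combines values within each partition before shuffling, whereas groupByKey() ships every value across the network.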
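The Anagrams entry relies on mapping each word to a canonical key so that anagrams collide. A minimal plain-Python sketch of that keying idea, assuming the common sorted-letters signature (the function name is illustrative, not from the repository):

```python
from collections import defaultdict

def anagram_groups(words):
    """Group words by their sorted-letter signature; in Spark this signature
    would be the key fed to reduceByKey()/groupByKey()/combineByKey()."""
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word))].append(word)
    # keep only signatures shared by more than one word: true anagram sets
    return [sorted(members) for members in groups.values() if len(members) > 1]

words = ["listen", "silent", "enlist", "google", "banana"]
assert anagram_groups(words) == [["enlist", "listen", "silent"]]
```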