Title: Spark in Action, Second Edition (Final)
Author: Jean-Georges Perrin
Publisher: Manning Publications
Year: 2020
Format: true PDF
Pages: 577
Size: 16.7 MB
Language: English
Spark in Action, Second Edition is an entirely new book that teaches you everything you need to create end-to-end analytics pipelines in Spark. Rewritten from the ground up with lots of helpful graphics, it explains the roles of DAGs and dataframes, the advantages of “lazy evaluation”, and ingestion from files, databases, and streams. By working through carefully designed Java-based examples, you’ll delve into Spark SQL, interface with Python, and cache and checkpoint your data. Along the way, you’ll learn to interact with common enterprise data technologies like HDFS and file formats like Parquet, ORC, and Avro.

You’ll also discover interesting Spark use cases, like interactive reporting, machine learning pipelines, and even monitoring players in online games. You’ll even get a quick look at machine learning techniques you can apply without a PhD in mathematics! All examples are available on GitHub for you to explore and adapt as you learn. Demand for Spark-savvy developers is so high that they’re among the highest paid in the industry today!

What's inside:
Lots of examples based on the Spark Java APIs, using real-life datasets and scenarios
Examples based on Spark v3.0
Ingestion through files, databases, and streaming
Building a custom ingestion process
Querying distributed datasets with Spark SQL
Deploying Spark applications
Caching and checkpointing your data
Interfacing with data scientists using Python
Applied machine learning
Spark use cases including Lumeris, CERN, and IBM