Dataproc Cookbook: Running Spark and Hadoop Workloads in Google Cloud
Title: Dataproc Cookbook: Running Spark and Hadoop Workloads in Google Cloud
Authors: Narasimha Sadineni, Anuyogam Venkataraman
Publisher: O’Reilly Media, Inc.
Year: 2025
Pages: 500
Language: English
Format: epub
Size: 19.1 MB
Want to build Big Data solutions in Google Cloud? Dataproc Cookbook is your hands-on guide to mastering Dataproc and the essential GCP fundamentals (networking, security, monitoring, and cost optimization) that apply across Google Cloud services. Learn practical skills that not only fast-track your Dataproc expertise but also help you succeed with a wide range of GCP technologies.
Written by data experts Narasimha Sadineni and Anu Venkataraman, this cookbook tackles real-world use cases like serverless Spark jobs, Kubernetes-native deployments, and cost-optimized data lake workflows. You'll learn how to create ephemeral and persistent Dataproc clusters, run secure data science workloads, implement monitoring solutions, and plan effective migration and optimization strategies.
Distributed data processing has evolved from the constraints of single VMs, through specialized Massively Parallel Processing (MPP) systems, to the breakthrough of Hadoop running on clusters of commodity hardware, a shift that fundamentally redefined the scale of data we could handle. Technologies like Apache Hadoop (MapReduce, HDFS, Hive) let us tackle data problems at a previously unimaginable scale, and within practical time frames. Spark, with its in-memory processing, pushed the boundaries even further, enabling many large-scale data operations to complete in seconds.
Google Cloud Dataproc sits right at the heart of this exciting intersection. It is a managed service that lets you run familiar Hadoop and Spark workloads (as well as other tools such as Flink and Presto) on GCP’s robust infrastructure. You can migrate existing applications with minimal or no code changes, shed the burden of infrastructure management, and focus instead on extracting value from your data. Dataproc makes it straightforward to harness the power and flexibility of the cloud for big data workloads, and that is something to be genuinely excited about.
Until now, practical, consolidated resources beyond the official documentation have been scarce, and this book aims to be the definitive resource. Packed with practical, tested recipes, it is your go-to guide for exploring the real-world power of Dataproc. While Dataproc is our primary focus, the Google Cloud fundamentals explored here, including resource organization, IAM, logging, monitoring, and security, provide transferable knowledge that applies across the GCP ecosystem. Let’s dive into harnessing the capabilities of Google Cloud Dataproc for your data.
With this book, you will learn how to:
- Create Dataproc clusters on Compute Engine and Kubernetes Engine
- Run data science workloads on Dataproc
- Execute Spark jobs on Dataproc Serverless
- Optimize Dataproc clusters for cost and performance
- Monitor Spark jobs in various ways
- Orchestrate various workloads and activities
- Use different methods to migrate data and workloads from existing Hadoop clusters to Dataproc
Who Should Read This Book
This handy cookbook on Dataproc will help you accelerate your Hadoop migration and Dataproc learning journey and optimize your workloads. It is designed for data engineers, data scientists, cloud architects, and more:
- Data engineers: Professionals responsible for designing, building, and maintaining data processing pipelines using Dataproc. This book will help you learn about the various features, best practices, and optimization techniques for managing big data workflows.
- Data scientists: Researchers and analysts who work with large datasets and need to perform advanced analytics and machine learning tasks. This book will help you understand how to leverage Dataproc’s capabilities to process and analyze data effectively.
- Cloud architects: Professionals responsible for designing and implementing data processing solutions on Google Cloud Platform. This book will help you understand how to integrate Dataproc with other services and architectures to create scalable and efficient data processing systems.
- Data analysts: Individuals who work with data to derive insights and make informed business decisions. This book will help you learn how to leverage Dataproc’s capabilities to process and transform data for analysis and reporting.
- Students and researchers: People studying data engineering, data science, or related fields who want to gain a comprehensive understanding of data processing technologies and how to use Dataproc effectively.
- IT managers and decision makers: Executives and managers responsible for making decisions regarding data infrastructure and processing solutions. This book will help you understand the benefits, costs, and use cases of adopting Dataproc for your organization.