MLOps with Ray: Best Practices and Strategies for Adopting Machine Learning OperationsКНИГИ » ПРОГРАММИНГ
Название: MLOps with Ray: Best Practices and Strategies for Adopting Machine Learning Operations Автор: Hien Luu, Max Pumperla, Zhe Zhang Издательство: Apress Год: 2024 Страниц: 342 Язык: английский Формат: pdf (true), epub (true) Размер: 11.8 MB
Understand how to use MLOps as an engineering discipline to help with the challenges of bringing Machine Learning models to production quickly and consistently. This book will help companies worldwide to adopt and incorporate Machine Learning into their processes and products to improve their competitiveness.
The book delves into this engineering discipline's aspects and components and explores best practices and case studies. Adopting MLOps requires a sound strategy, which the book's early chapters cover in detail. The book also discusses the infrastructure and best practices of Feature Engineering, Model Training, Model Serving, and Machine Learning Observability. Ray, the open source project that provides a unified framework and libraries to scale Machine Learning workload and the Python application, is introduced, and you will see how it fits into the MLOps technical stack.
Machine Learning (ML) has proven to be a very powerful tool to learn and extract patterns from data. The ability to generate, store, and process a large amount of data, and easily access computing power in the last decade has contributed to many advancements in the ML field, such as image recognition, language translation, and large language models (LLMs), that is, BERT, DALLE, ChatGPT, and more.
Once data scientists have access to the needed dataset or available features, they will start analyzing them and evaluating whether they are suitable for the ML task at hand. To perform medium- to large-scale data analysis, they will need access to compute resources beyond their laptop so those data crunching needs will be completed in a short amount of time. This is where distributed data computation engines come into the picture. Examples of these engines are Apache Spark, Dask, and Ray. All three are popular distributed computing frameworks that are widely adopted for diverse data processing needs. Spark is the most mature one and has a large ecosystem. It provides a robust unified platform for large-scale data processing with high-level APIs in multiple languages, such as Java, Scala, and Python. Dask excels in parallel computing and seamlessly integrating with popular Python libraries, making it well suited for scalable and complex data computations. Ray, on the other hand, is a compute framework to enable efficient distributed execution of Python and AI workloads, boasting a simple programming model and automatic parallelization.
A few specific tools that are tremendously valuable to improve the model development experience are the TensorBoard from the TensorFlow project and the model agnostic methods to understand model interpretability, such as LIME, SHAP, and ICE Plots. TensorBoard is a widely popular and powerful visualization tool for machine learning experimentation provided by TensorFlow, an open source Machine Learning project developed by Google. Many popular machine learning libraries now provide integration with TensorBoard, such as Pytorch, XGBoost, Ray Train, HuggingFace, and more.
This book is intended for Machine Learning practitioners, such as machine learning engineers, and data scientists, who wish to help their company by adopting, building maps, and practicing MLOps.
What You'll Learn: Gain an understanding of the MLOps discipline Know the MLOps technical stack and its components Get familiar with the MLOps adoption strategy Understand feature engineering
Who This Book Is For: Machine learning practitioners, data scientists, and software engineers who are focusing on building machine learning systems and infrastructure to bring ML models to production
Скачать MLOps with Ray: Best Practices and Strategies for Adopting Machine Learning Operations