Title: Data Science for Public Policy
Author: Jeffrey C. Chen, Edward A. Rubin
Publisher: Springer
Series: Springer Series in the Data Sciences
Year: 2021
Pages: 365
Language: English
Format: pdf (true)
Size: 19.6 MB
This textbook presents the essential tools and core concepts of data science to public officials, policy analysts, and economists among others in order to further their application in the public sector. An expansion of the quantitative economics frameworks presented in policy and business schools, this book emphasizes the process of asking relevant questions to inform public policy. Its techniques and approaches emphasize data-driven practices, beginning with the basic programming paradigms that occupy the majority of an analyst’s time and advancing to the practical applications of statistical learning and machine learning. The text considers two divergent, competing perspectives to support its applications, incorporating techniques from both causal inference and prediction. Additionally, the book includes open-sourced data as well as live code, written in R and presented in notebook form, which readers can use and modify to practice working with data.
What do programming languages mean for data science? Data science tends to have the greatest payoff when learning patterns at a large scale. To achieve broad impact, programming is a must. Two languages have proven themselves to be the workhorses of data science: R and Python. R is a common language among statisticians, epidemiologists, social scientists, and natural scientists, as it is designed for mathematical computation. The language is optimized specifically for all things math and statistics, ranging from matrix algebra to more complex optimization. It is also quite extensible: thousands of code libraries (sets of functions that work together) extend R’s usefulness from visualizations to machine learning. The language does not naturally lend itself to creating standalone web applications, but recent advancements have vastly improved R’s ability to support production-grade use cases.
Which one language should a beginner choose first? The decision rests on the tasks you need to accomplish. For settings where data informs strategy, such as identifying which levers can be pulled to effect change, learning R is a sound choice. It is more than able to conduct data analyses, build sophisticated prediction models, visualize insights, and develop interactive tools. For cases where data-driven applications need to be scaled to a large number of users (like streaming media, social media ads, or financial services), starting with Python makes more sense.
Contents:
1. An Introduction
2. The Case for Programming
3. Elements of Programming
4. Transforming Data
5. Record Linkage
6. Exploratory Data Analysis
7. Regression Analysis
8. Framing Classification
9. Three Quantitative Perspectives
10. Prediction
11. Cluster Analysis
12. Spatial Data
13. Natural Language
14. The Ethics of Data Science
15. Developing Data Products
16. Building Data Teams