Title: Advances and Innovations in Statistics and Data Science
Authors: Wenqing He, Liqun Wang, Jiahua Chen, Chunfang Devon Lin
Publisher: Springer
Series: ICSA Book Series in Statistics
Year: 2022
Pages: 338
Language: English
Format: pdf (true), epub
Size: 29.3 MB
This book covers a variety of topics, including methodology development in Data Science, such as methodology in the analysis of high dimensional data, feature screening in ultra-high dimensional data and natural language ranking; statistical analysis challenges in sampling, multivariate survival models and contaminated data, as well as applications of statistical methods. With this book, readers can make use of frontier research methods to tackle their problems in research, education, training and consultation.
Automatically ranking comments by relevance plays an important role in text mining. In Chapter 5, Yuyang Zhang and Hao Yu present a new text digitization method, the bag of word clusters model, which groups semantically related words into clusters using pre-trained word2vec word embeddings and represents each comment as a distribution over word clusters. This method extracts both semantic and statistical information from texts. They then propose an unsupervised ranking algorithm that identifies relevant comments by their distance to the “ideal” comment. This “ideal” comment is the maximum general entropy comment with respect to the global word cluster distribution. The “ideal” comment highlights the aspects of a product that many other comments mention and is therefore treated as the standard for judging a comment’s relevance to that product.
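The idea of ranking comments by their distance to a reference distribution over word clusters can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `WORD_TO_CLUSTER` mapping is hand-written rather than derived from word2vec embeddings, the pooled cluster distribution is used as a simple stand-in for the chapter's maximum general entropy “ideal” comment, and Kullback-Leibler divergence is one possible choice of distance, not necessarily the authors'.

```python
import math

# Hypothetical word -> cluster mapping. In the chapter, clusters come from
# grouping pre-trained word2vec embeddings; that step is not reproduced here.
WORD_TO_CLUSTER = {
    "battery": 0, "charge": 0, "power": 0,
    "screen": 1, "display": 1,
    "price": 2, "cheap": 2, "cost": 2,
}
N_CLUSTERS = 3


def cluster_counts(comment):
    """Count how many words of a comment fall into each word cluster."""
    counts = [0] * N_CLUSTERS
    for word in comment.lower().split():
        cluster = WORD_TO_CLUSTER.get(word)
        if cluster is not None:
            counts[cluster] += 1
    return counts


def normalize(counts, smoothing=1e-6):
    """Turn counts into a probability distribution (smoothed to avoid zeros)."""
    raw = [c + smoothing for c in counts]
    total = sum(raw)
    return [v / total for v in raw]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def rank_comments(comments):
    """Sort comments from closest to farthest from the reference distribution."""
    all_counts = [cluster_counts(c) for c in comments]
    # Assumption: the pooled cluster distribution stands in for the "ideal" comment.
    pooled = [sum(column) for column in zip(*all_counts)]
    reference = normalize(pooled)

    def distance(counts):
        if sum(counts) == 0:
            return math.inf  # no known cluster words: treated as least relevant
        return kl_divergence(normalize(counts), reference)

    order = sorted(range(len(comments)), key=lambda i: distance(all_counts[i]))
    return [comments[i] for i in order]
```

Calling `rank_comments(["battery screen price", "cheap cheap cheap", "lovely weather today"])` places the comment that touches all three clusters first and the comment with no product-related words last.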
In Chapter 8, Yanqing Sun and Fang Fang study several profile estimation methods for the generalized semiparametric varying-coefficient additive model for longitudinal data that make use of the within-subject correlations. The model is flexible: it allows time-varying effects for some covariates and constant effects for others, and it permits different link functions, so it can be used to analyze both discrete and continuous longitudinal responses. The authors investigate the profile generalized estimating equation (GEE) approaches and the profile quadratic inference function (QIF) approach. The profile estimators use local linear smoothing to estimate the time-varying effects. Several approaches that incorporate the within-subject correlations are investigated, including the quasi-likelihood (QL), the minimum generalized variance (MGV), the quadratic inference function, and the weighted least squares (WLS). The proposed estimation procedures can accommodate flexible sampling schemes and provide a unified approach that works well for both discrete and continuous longitudinal responses.
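For readers unfamiliar with local linear smoothing, the following is a minimal generic sketch of the technique on its own. It is not the chapter's profile GEE or QIF estimator: it ignores within-subject correlation, covariates, and link functions, and the Gaussian kernel and the function name `local_linear` are assumptions chosen for illustration.

```python
import math


def local_linear(t0, times, values, bandwidth):
    """Local linear estimate of a smooth function at t0.

    Fits a kernel-weighted straight line around t0 and returns its intercept,
    using the closed-form solution of the weighted least squares problem.
    """
    s0 = s1 = s2 = r0 = r1 = 0.0
    for t, y in zip(times, values):
        d = t - t0
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        r0 += w * y
        r1 += w * d * y
    denom = s0 * s2 - s1 * s1
    if abs(denom) < 1e-12:
        raise ValueError("not enough distinct observation times near t0")
    # Intercept of the weighted line, i.e. the fitted value at t0.
    return (s2 * r0 - s1 * r1) / denom
```

A useful property of the local linear fit is that it reproduces exactly linear data: for observations generated as `2 + 3 * t`, the estimate at `t0 = 0.5` equals 3.5 up to floating-point error, whatever the bandwidth.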
Contents:
Part I. Methodology Development in Data Science
Part II. Challenges in Statistical Analysis