Machine Learning
PWP-CCPR-2019-001
Abstract
Machine learning is a statistical and computational approach to extracting important patterns and trends from data. This entry provides an overview of machine learning methods for social science research. Supervised learning methods are discussed, including generalized linear models, support vector machines, k-nearest neighbor, artificial neural networks and deep learning, decision trees, and ensemble methods based on decision trees. Several important considerations relevant to supervised learning algorithms are noted, including the use of training and test data and cross-validation, loss optimization and evaluation metrics, the bias-variance tradeoff, and overfitting and regularization strategies. Unsupervised learning methods are also discussed, including k-means clustering, hierarchical clustering, network community detection, principal component analysis, and t-distributed stochastic neighbor embedding. A section on text analysis covers supervised and unsupervised learning from documents and the use of neural networks. New developments at the intersection of machine learning methods and causal inference are discussed. A discussion of key limitations and considerations for adopting these methods in empirical social science research concludes the entry.