QMSS provides students with a basis in quantitative skills for social science research through its core curriculum, with the flexibility to engage in interdisciplinary pursuits through their elective selections. Students can select these classes from the elective offerings by QMSS, or draw from the wider university through cross-registration in the many departments across Columbia with which QMSS has developed strong relationships over the years. Below are descriptions of the courses offered by QMSS as well as a selection of courses around the university that may be of interest. Students are advised to work with the program to determine a course of study that suits their goals and to receive guidance on their options for study throughout the university.
Core QMSS Courses
The five courses below or their equivalents are completed by ALL QMSS Students.
QMSS Theory and Methodology (QMSS GR5010)
This interdisciplinary course, taken in the fall semester, is a comprehensive introduction to quantitative research in the social sciences. The course focuses on foundational ideas of social science research, including strengths and weaknesses of different research designs, interpretation of data drawn from contemporary and historical contexts, and strategies for evaluating evidence. Most of the course consists of two-week units, each examining a particular research design through a set of scholarly articles that use that design. Topics include: the “science” of social science and the role of statistical models, causality and causal inference, concepts and measurement, understanding human decision making, randomization and experimental methods, observation and quasi-experimentation, sampling, survey research, and working with archival data.
QMSS Data Analysis Requirement (QMSS GR5015)
The data analysis course covers specific statistical tools used in social science research using the statistical program R. Topics to be covered include statistical data structures and basic descriptives, regression models, multiple regression analysis, interactions, polynomials, Gauss-Markov assumptions and asymptotics, heteroskedasticity and diagnostics, models for binary outcomes, naive Bayes classifiers, models for ordered data, models for nominal data, first difference analysis, factor analysis, and a review of models that build upon OLS. Prerequisite: introductory statistics course that includes linear regression.
ECON W4412: Advanced Econometrics (Economics Concentration Only)
Students who are planning on pursuing the economics concentration are required to take this course. This course is intended for students who already have a firm grasp of introductory-level econometrics and are interested in advanced topics, including asymptotic theory. The prerequisites are linear algebra, intermediate microeconomics, macroeconomics, and econometrics. Topics to be covered include OLS in matrix form, finite sample and asymptotic properties, hypothesis testing, GLS, maximum likelihood, endogeneity, stationary time series, nonstationary time series, panel data, and discrete choice models.
STAT W5701: Probability and Statistics (Data Science Concentration Only)
Students who are planning on pursuing the data science concentration are required to take this course. This course is a self-contained introduction to probability and statistics with a focus on data science. The topics covered include fundamentals of probability theory and statistical inference, including: probabilistic models, random variables, useful distributions, expectations, the law of large numbers, the central limit theorem, point and confidence interval estimation, maximum likelihood methods, hypothesis tests, and linear regression (as time permits).
QMSS Seminar Series (QMSS GR5021 & GR5022)
This course is designed to expose students in the QMSS degree program to different methods and practices of social science research. QMSS students participate in a weekly seminar, with presentations on a wide range of topics given by faculty from Columbia and other New York City universities, as well as by researchers from the numerous corporate, government, and non-profit settings where quantitative research tools are used. Topics have included: Now-Casting and the Real-Time Data-Flow; Art, Design & Science in Data Visualization; Educational Attainment and School Desegregation: Evidence from Randomized Lotteries; Practical Data Science: North American Oil and Gas Drilling Data.
QMSS Thesis (GR5999)
All students must complete an MA thesis, which involves original statistical analysis, under the supervision of the student's advisor and the QMSS program director. Students should register for this course in the last semester of their program.
QMSS Students typically take between 4 and 6 elective courses. Any 4000-level or above course offered by QMSS, Computer Science, Economics, Statistics, Political Science, Sociology, Mathematics, or SIPA will satisfy one of these requirements. 4000-level courses outside these departments MAY satisfy an elective requirement but require approval by the Director of QMSS.
Each concentration has its own guidelines regarding elective distribution, so be sure to read your Degree Requirements worksheet carefully.
Some popular elective courses are listed below. Be aware that course listings are always subject to change. You should always check the Columbia Directory of Classes for the most up-to-date information.
Traditional Track QMSS Students must take TWO Research Methods Electives. We strongly encourage them to fulfill this requirement through QMSS Department Electives.
Time Series, Panel Data, and Forecasting (QMSS GR5016)
This course will introduce students to the main concepts and methods behind regression analysis of temporal processes and highlight the benefits and limitations of using temporally ordered data. Students study the complementary areas of time series data and longitudinal (or panel) data. There are no formal prerequisites for the course, but a solid understanding of the mechanics and interpretation of OLS regression will be assumed (we will briefly review it at the beginning of the course). Topics to be covered include regression with panel data, probit and logit regression of pooled cross-sectional data, difference-in-difference models, time series regression, dynamic causal effects, vector autoregressions, cointegration, and GARCH models. Statistical computing will be carried out in R.
Advanced Analytic Techniques (QMSS GR5018)
This course is meant to train students in advanced quantitative techniques in the social sciences. Statistical computing will be carried out in R. Topics include: review of multiple/linear regression, review of logistic regression, generalized linear models, models with limited dependent variables, first differences analysis, fixed effects, random effects, lagged dependent variables, growth curve analysis, instrumental variable and two stage least squares, natural experiments, regression discontinuity, propensity score matching, multilevel models or hierarchical linear models, and text-based quantitative analysis.
Introduction to Missing Data (QMSS GR5059)
The goal of this course is to provide students with a basic knowledge of the potential implications of missing data for their data analyses as well as potential solutions. Students will look at different types of mechanisms that can generate missing data. This will lay the groundwork for discussions of what types of missing data scenarios can be accommodated by each missing data method discussed subsequently. Finally, students will learn how to deal with missing data in Stata. More advanced techniques will be covered in a course later on, using R and Stan. Any QMSS student is presumed to have sufficient background. Any non-QMSS students interested in taking this course should have sufficient background in regression modeling of discrete variables. Topics to be covered are probability theory, endogenous selection, mechanisms of missing data, single imputation methods, multiple imputation methods, multivariate normal imputation, conditional imputation, and post-imputation diagnostics.
Social Network Analysis (QMSS GR5062)
The course is designed to teach students the foundations of network analysis including how to manipulate, analyze and visualize network data themselves using statistical software. We will focus on using the statistical program R for most of the work. Topics will include measures of network size, density, and tie strength, measures of network diversity, sampling issues, making ego-nets from whole networks, distance, dyads, homophily, balance and transitivity, structural holes, brokerage, measures of centrality (degree, betweenness, closeness, eigenvector, beta/Bonacich), statistical inference using network data, community detection, affiliation/bipartite networks, clustering and small worlds; positions, roles and equivalence; visualization, simulation, and network evolution over time.
Data Visualization (QMSS GR5063)
Bayesian Statistics for the Social Sciences (QMSS GR5065)
An introduction to Bayesian statistical methods with applications to the social sciences. Considerable emphasis will be placed on regression modeling and model checking. The primary software used will be Stan, which students do not need to be familiar with in advance. Students in the course will access the Stan library via R, so some experience with R would be helpful but is not required. Any QMSS student is presumed to have sufficient background. Any non-QMSS students interested in taking this course should have a comparable background to a QMSS student in basic probability. Topics to be covered are a review of calculus and probability, Bayesian principles, prediction and model checking, linear regression models, Bayesian data collection, Bayesian calculations, Stan, the BUGS language and JAGS, hierarchical linear models, nonlinear regression models, missing data, stochastic processes, and decision theory.
Experimentation in the Social Sciences (QMSS GR5068)
The course is designed to provide students with a basic introduction to the use of experimental methods in political and social sciences. Students will be exposed to methodological, theoretical and practical aspects of experimentation. No prior knowledge of experimental methods is required. Topics to be covered are causal inference, randomization and validity; reporting experimental research; experimental design and analysis; ethics, human subjects research and the IRB; laboratory experiments; survey experiments; field experiments; quasi-experimentation, natural experiments and regression discontinuity designs; and integrating experimental research.
Applied Data Science for Social Science (QMSS GR5069)
In his now classic Venn diagram, Drew Conway described Data Science as sitting at the intersection between good hacking skills, math and statistics knowledge, and substantive expertise. By training, social scientists possess a fluid combination of all three, but also bring an additional layer to the mix. We have acquired slightly divergent training, skills and expertise tailored to understand human behavior and to explain why things happen the way they do. Social scientists are, thus, a particular kind of data scientist. This course is not intended to teach you how to code, create visualizations, or estimate models. It presumes you have learned that in other classes. This course is intended to take you to the next level in becoming a data scientist.
GIS Spatial Analysis (QMSS GR5070)
This course introduces students to basic spatial analytic skills. It covers introductory concepts and tools in Geographic Information Systems (GIS) and database management. The course also introduces students to the process of developing and writing an original spatial research project. Topics to be covered include: social theories involving space, place and reflexive relationships; social demography concepts and databases; visualizing social data using geographic information systems; exploratory spatial data analysis of social data and spatially weighted regression models; spatial regression models of social data; and space-time models. Use of open-source software (primarily the R software package) will be taught as well.
Advanced GIS Spatial Analysis (QMSS GR5071)
This course builds upon foundational spatial analysis concepts and skills built in the introductory GIS course through the application of advanced spatial statistical modeling tools. Topics covered include 1) Graphical and quantitative description of spatial data, 2) Kriging, block kriging and cokriging, 3) Common variogram models, 4) Spatial autoregressive models, estimation and testing, 5) Spatial non-stationarity and associated modeling procedures and 6) Spatial sampling procedures. Use of open-source software (primarily the R software package) with emphasis on analysis of real data from the environmental and social sciences will be the substantive focus of the class. Students will do a series of in-class labs and develop a final research project from these labs or an independent project.
Modern Data Structures (QMSS GR5072)
This course is intended to provide a detailed tour on how to access, clean, “munge” and organize data, both big and small. (It should also give students a flavor of what would be expected of them in a typical data science interview.) Each week will have simple, moderate and complex examples in class, with code to follow. Students will then practice additional exercises at home. The end point of each project would be to get the data organized and cleaned enough so that it is in a data-frame, ready for subsequent analysis and graphing. Therefore, no analysis or visualization (beyond just basic tables and plots to make sure everything was correctly organized) will be taught; and this will free up substantial time for the “nitty-gritty” of all of this data wrangling.
Machine Learning for Social Sciences (QMSS GR5073)
This course will provide a comprehensive overview of machine learning as it is applied in a number of domains. Comparisons and contrasts will be drawn between this machine learning approach and more traditional regression-based approaches used in the social sciences. Emphasis will also be placed on opportunities to synthesize these two approaches. The course will start with an introduction to Python, the scikit-learn package and GitHub. After that, there will be some discussion of data exploration, visualization in matplotlib, preprocessing, feature engineering, variable imputation, and feature selection. Supervised learning methods will be considered, including OLS models, linear models for classification, support vector machines, decision trees and random forests, and gradient boosting. Calibration, model evaluation and strategies for dealing with imbalanced datasets, non-negative matrix factorization, and outlier detection will be considered next. This will be followed by unsupervised techniques: PCA, discriminant analysis, manifold learning, clustering, mixture models, cluster evaluation. Lastly, we will consider neural networks, convolutional neural networks for image classification and recurrent neural networks. This course will primarily use Python. Previous programming experience will be helpful but is not required. Prerequisites: basic probability and statistics, basic linear algebra, and calculus.
Internship (QMSS GR5050 & QMSS GR5051)
Students enrolled in the Quantitative Methods in the Social Sciences MA program have a number of opportunities for internships with various organizations in New York City. All internships will be graded on a pass/fail basis.
An internship must meet the following criteria:
- It is related to the core issues of concern to the MA Program in Quantitative Methods in the Social Sciences.
- The work is substantive (although students may perform some administrative tasks, we want to ensure that they receive experience in substantive research).
- It is a practical, professional experience.
Practicum in Data Analysis (QMSS GR5052)
This practicum course is meant to offer valuable training to students. Specifically, this practicum will mimic the typical conditions that students would face in an internship in a large data-intense institution. The practicum will focus on four core elements involved in most internships: (1) Developing the intuition and skills to properly scope ambiguous project ideas; (2) practicing organizing and accessing a variety of large-scale data sources and formats; (3) conducting basic and advanced analysis of big data; and (4) communicating and “productizing” results and findings from the earlier steps, in things like dashboards, reports, interactive graphics, or apps. The practicum will also give students time to reflect on their work, and how it would best translate into corporate, non-profit, start-up and other contexts.
Independent Study (QMSS GR5998)
Students develop a course of study under the supervision of a faculty member. Please see the QMSS program coordinator for more details.
COMS W4170x User interface design
Prerequisites: COMS W3137. Introduction to the theory and practice of computer user interface design, emphasizing the software design of graphical user interfaces. Topics include basic interaction devices and techniques, human factors, interaction styles, dialogue design, and software infrastructure. Design and programming projects are required.
COMS W4705x Natural language processing 3 pts.
Prerequisites: COMS W3133, or W3134, or W3137, or W3139, or the instructor's permission. Computational approaches to natural language generation and understanding. Recommended preparation: some previous or concurrent exposure to AI or Machine Learning. Topics include information extraction, summarization, machine translation, dialogue systems, and emotional speech. Particular attention is given to robust techniques that can handle understanding and generation for the large amounts of text on the Web or in other large corpora. Programming exercises in several of these areas.
ECON G4301x Economic Growth and Development
Prerequisites: Econ W3211 and W3213. Empirical findings on economic development, theoretical development models; problems of efficient resource allocation in a growing economy; balanced and unbalanced growth in closed and open economic systems; the role of capital accumulation and innovation in economic growth.
ECON W4415: Game Theory
Prerequisites: ECON W3211 and W3213.
Introduction to the systematic treatment of game theory and its applications in economic analysis.
ECON W4020: Economics of Uncertainty and Information
Prerequisites: ECON W3211, W3213 and STAT 1201.
Topics include behavior under uncertainty, expected utility hypothesis, insurance, portfolio choice, principal-agent problems, screening and signaling, and information theories of financial intermediation.
MATH W4061x Introduction to Modern Analysis
Prerequisites: MATH V1202 or the equivalent and V2010. The second term of this course may not be taken without the first.
Real numbers, metric spaces, elements of general topology. Continuous and differentiable functions. Implicit functions. Integration. Change of variables. Function spaces.
MATH W5010 Introduction to the Mathematics of Finance
Prerequisites: MATH V1202, MATH V3027, STAT W5203, SIEO W3001, or their equivalents.
The mathematics of finance, principally the problem of pricing of derivative securities, developed using only calculus and basic probability. Topics include mathematical models for financial instruments, Brownian motion, normal and lognormal distributions, the Black-Scholes formula, and binomial models.
POLS W4700x Mathematical Methods for Political Science
Provides students of political science with a basic set of tools needed to read, evaluate, and contribute in research areas that increasingly utilize sophisticated mathematical techniques.
POLS W4710x Principles of Quantitative Political Research
Introduction to the use of quantitative techniques in political science and public policy. Topics include descriptive statistics and principles of statistical inference and probability through analysis of variance and ordinary least-squares regression. Computer applications are emphasized.
POLS W4714x Multivariate Political Analysis
Prerequisite: basic data analysis through multiple regression (e.g., POLS W4910) and knowledge of basic calculus and matrix algebra. More mathematical treatment of topics covered in POLS W4911. Examines problems encountered in multivariate analysis of cross-sectional and time-series data.
STAT W3026 Applied Data Mining
Data Mining is a dynamic and fast-growing field at the interface of Statistics and Computer Science. The emergence of massive datasets containing millions or even billions of observations provides the primary impetus for the field. Such datasets arise, for instance, in large-scale retailing, telecommunications, and astronomy, and they present computational and statistical challenges. This course will provide an overview of current practice in data mining. Specific topics covered will include databases and data warehousing, exploratory data analysis and visualization, descriptive modeling, predictive modeling, pattern and rule discovery, text mining, Bayesian data mining, and causal inference. The use of statistical software will be emphasized.
STAT W4291 Advanced Data Analysis
This is a course on getting the most out of data. The emphasis will be on hands-on experience, involving case studies with real data and using common statistical packages. The course covers, at a very high level, exploratory data analysis, model formulation, goodness of fit testing, and other standard and non-standard statistical procedures, including linear regression, analysis of variance, nonlinear regression, generalized linear models, survival analysis, time series analysis, and modern regression methods. Students will be expected to propose a data set of their choice for use as case study material.
STAT W4282 Linear Regression and Time Series Methods
A one-semester course covering: Simple and multiple regression, including testing, estimation, and confidence procedures, modeling, regression diagnostics and plots, polynomial regression, collinearity and confounding, model selection, geometry of least squares, linear time series models, auto-regressive, moving average and ARIMA models, estimation and forecasting with time series models, confidence intervals, and prediction error. Students may not receive credit for more than two of STAT W4315, W4437, and W4440.
We encourage students to explore course offerings outside GSAS. Some popular options are listed below. Check each school's website for comprehensive listings.
Be aware that these schools have their own distinct registration procedures. Visit the Registration page for full instructions.
School of International and Public Affairs
INAF U6045x or y International Capital Markets 3 pts.
The course will acquaint you with modern international capital markets. You can expect to learn a substantial amount of up-to-date detail and some useful theory. Specifically, we will survey global markets for credit, equity, foreign exchange, foreign exchange derivatives, futures, interest rate swaps, credit default swaps, and asset-backed securities. In each case, students will learn the highlights of payments and settlement, documentation, regulation, applications for end-users, related economic theory, and pricing models. The class will cover options and asset pricing theory; however, the treatment will be informal and designed to help develop intuition. One lecture each will be devoted to international banking (with an emphasis on changing capital regulation), investment banks, and hedge funds.
EDPA 4050 001 Logic and Design of Research in Education Policy and Social Analysis
An introduction to understanding, designing, and conducting empirical research for education policy and the social sciences. Students explore philosophical foundations of research, the relationship between theory and evidence in research, and the mechanics of designing and conducting research, including strategies for sampling, data collection, and analysis. Quantitative, qualitative, and mixed methods approaches to research are addressed. This course is appropriate for students with little prior exposure to social science research.
Mailman School of Public Health
BIST P6104 - Introduction to Biostatistical Methods
Course pre-requisites: Placement exam required, and the instructor's permission
Enrollment priorities: Priority given to BIO students
Like many fields of learning, biostatistics has its own vocabulary often seen in medical and public health literature. Phrases like "statistical significance", "p-value less than 0.05", "95% confident", and "margin of error" can have enormous impact in a world that relies on statistics to make decisions: Should Drug A be recommended over Drug B? Should a national policy on X be implemented? Does Vitamin C truly prevent colds? However, do we really know what these terms and phrases mean? Understanding the theory and methodology behind study design, estimation and hypothesis testing is crucial to ensuring that findings and practices in public health and biomedicine are supported by reliable evidence. This course covers the basic tools for the collection, analysis, and presentation of data. Central to these skills is assessing the impact of chance and variability on the interpretation of research findings and subsequent recommendations for public health practice and policy. Topics covered include: general principles of study design; estimation; hypothesis testing; several methods for comparison of discrete and continuous data, including the chi-square test of independence, t-test, ANOVA, correlation, regression and logistic regression. This introductory course is a Core Course for the Biostatistics Department and is mandatory for all MS in Biostatistics students.
Columbia Business School
B8131-001: Sports Analytics
Sports analytics refers to the use of data and quantitative methods to measure performance and make decisions to gain advantage in the competitive sports arena. This course builds on the Business Analytics core course and is designed to help students to develop and apply analytical skills that are useful in business, using sports as the application area. These skills include critical thinking, mathematical modeling, statistical analysis, predictive analytics, game theory, optimization and simulation. These skills will be applied to sports in this course, but are equally useful in many areas of business. There will be three main topics in the course: (1) measuring and predicting player and team performance, (2) decision-making and strategy in sports, and (3) fantasy sports and sports betting. Typical questions addressed in sports analytics include: How to rank players or teams? How to predict future performance of players or teams? How much is a player on a team worth? How likely are extreme performances, i.e., streaks? Are there hot hands in sports performances? Which decision is more likely to lead to a win (e.g., attempt a stolen base or not in baseball, punt or go for it on fourth down in football, dump and chase or not in hockey, pull the goalie or not in hockey)? How to form lineups in daily fantasy sports? How to manage money in sports betting? How to analyze various "prop" bets?