Bayesian Non-Parametric Modelling and Case Studies in Bayesian Data Science

Overview
Bayesian methods offer an approach to inference, prediction and decision-making that allows you to synthesize all relevant sources of information in drawing conclusions and making decisions in the presence of uncertainty: you can bring together information both internal to, and external to, your data set to create a logically consistent summary of all that's known about the problem you're studying.
This course will:

Introduce you to the discipline of data science, which (broadly speaking) aims to help people find meaningful and verifiable facts and relationships in data sets of enormous size
Provide a brief introduction to Bayesian non-parametric methods, which (a) offer a flexible approach to model specification that may permit you to avoid false modelling assumptions and (b) have recently been the subject of research that permits their use on very large data sets
Illustrate a variety of best-practice data science methods in a series of real-world case studies ranging in size up to experiments with tens of millions of observations.

Who Should Attend?
Statisticians, biostatisticians, epidemiologists, data analysts, data-miners, machine-learning specialists and data scientists who wish to broaden and deepen:

Their understanding of Bayesian methods and
Their toolkits for using Bayesian models to find meaningful patterns, arrive at statistically sound inferences and predictions, and make better decisions.

Some graduate coursework in statistics (or an allied field such as biostatistics, epidemiology or machine learning) will provide sufficient mathematical background for participants. To get the most out of the course, participants should be comfortable with hearing the course presenter discuss:

Differentiation and integration of functions of several variables and
Discrete and continuous probability distributions (joint, marginal, and conditional) for several variables at a time.

However, all necessary concepts will be approached in a sufficiently intuitive manner that rustiness on these topics will not prevent understanding of the key ideas.

Participants should ideally have had exposure to the ideas covered in the three previous courses in this series: Bayesian Modelling, Inference, Prediction and Decision-Making; Bayesian Hierarchical Modelling; and Bayesian Model Specification.

How You Will Benefit
You will:

Gain a deeper understanding of maximum-likelihood-based methods and when they can be expected to behave in a sub-optimal manner;
Broaden and deepen your facility in the fitting and interpretation of Bayesian models to solve important problems in science, public policy and business; and
Learn how to write your own programs in WinBUGS and R to fit Bayesian models in your own work.

What Do We Cover?

The history of data science and "Big Data" from 1944 to the present
A review of the terminology for data set size, from bytes to yottabytes (10^24 bytes, or 2^80 bytes in the binary convention)
Bayesian non-parametrics: placing prior distributions on curves, such as cumulative distribution functions (CDFs) and regression surfaces, and following through to posterior distributions on those curves, with particular attention to Dirichlet process priors and how they may be fitted in a computationally efficient way, even with gigabyte data sets

and a subset of the following topics (as time permits):

Large-scale A/B testing, which is the data-science term for randomized controlled trials on customers and web sites: how to do this well, and how to do it badly
Design and analysis of very-large-scale observational studies (because you can’t always randomize) and
Methods for simultaneous forecasting of tens of millions of related time series
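
To give a flavour of what "placing a prior distribution on a CDF" means, here is a minimal sketch that is not taken from the course materials. It assumes a Dirichlet process prior DP(alpha, F0) on an unknown CDF F, with a standard Normal base distribution F0, a concentration parameter alpha = 2 and a small made-up data set; all of these choices, and the function names, are illustrative assumptions. Under this prior the posterior mean of F(t) is a weighted average of the prior guess F0(t) and the empirical CDF of the data. The sketch is written in Python purely for illustration; the course's practical work uses R and WinBUGS.

```python
import math


def normal_cdf(t, mean=0.0, sd=1.0):
    """CDF of a Normal(mean, sd^2) distribution, used here as the base CDF F0."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2.0))))


def dp_posterior_mean_cdf(t, data, alpha, base_cdf=normal_cdf):
    """Posterior mean of F(t) under a DP(alpha, F0) prior, given i.i.d. data.

    The posterior is again a Dirichlet process, and its mean CDF is
    (alpha * F0(t) + number of observations <= t) / (alpha + n).
    """
    n = len(data)
    count_at_or_below_t = sum(1 for y in data if y <= t)
    return (alpha * base_cdf(t) + count_at_or_below_t) / (alpha + n)


# Hypothetical data and prior strength, chosen only for illustration.
data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, 2.2]
alpha = 2.0

for t in (-1.0, 0.0, 1.0):
    print(t, round(dp_posterior_mean_cdf(t, data, alpha), 4))
```

A small alpha lets the data dominate (alpha = 0 recovers the empirical CDF), while a large alpha keeps the estimate close to the prior guess F0.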

Software
Practical work will be done in a mixture of R and WinBUGS.
Note: For practical work, participants must bring their own laptop with fully licensed versions of the software.

Extra Information
Related Courses: This course is part of a series of Bayesian Modelling courses, presented by Professor David Draper:

Bayesian Modelling, Inference, Prediction and Decision-Making (2 days)
Bayesian Hierarchical Modelling (1 day)
Bayesian Model Specification (1 day)
Bayesian Non-Parametric Modelling and Case Studies in Bayesian Data Science (1 day)

Course Dates
Next run to be announced

Duration: 1 day
Price: £TBC
An academic discount is available for this course

Please note that there will not be any printed notes for this course. Materials for 2017 can be seen at the following links:

Bayesian Modeling, Inference, Prediction and Decision-Making (external link)
Bayesian Hierarchical Modeling (external link)
Bayesian Model Specification (external link)
Bayesian Non-Parametric Modelling and Case Studies in Bayesian Data Science (external link)

Guest Presenter (last updated September 2017): David Draper
David Draper is a Professor of Statistics in the Department of Applied Mathematics and Statistics at the University of California, Santa Cruz (USA). He has also been a Senior Principal Research Scientist at both eBay Research Labs and Amazon Research, where he developed new methods for Bayesian analysis of very large data sets in the discipline of data science, and he is currently in a consultative role as Senior Analyst at the social finance company SoFi.
He is a Fellow of the American Association for the Advancement of Science, the American Statistical Association (ASA), the Institute of Mathematical Statistics (IMS), and the Royal Statistical Society (RSS); from 2001 to 2003 he served as the President-Elect, President, and Past President of the International Society for Bayesian Analysis (ISBA).
He is the author or co-author of about 150 contributions to the methodological and applied statistical literature, including articles in the Journal of the Royal Statistical Society (Series A, B and C), the Journal of the American Statistical Association, the Annals of Applied Statistics, Bayesian Analysis, Statistical Science, the New England Journal of Medicine, and the Journal of the American Medical Association; his 1995 JRSS-B article on assessment and propagation of model uncertainty has been cited more than 1,600 times, and taken together his publications have been cited about 14,000 times.
His research is in the areas of Bayesian inference and prediction, model uncertainty and empirical model-building, hierarchical modelling, Markov chain Monte Carlo methods, Bayesian nonparametric methods and data science, with applications mainly in medicine, health policy, education, environmental risk assessment and eCommerce.
His short courses have received Excellence in Continuing Education Awards from the American Statistical Association on two occasions, corresponding to days 1 and 2 of this week of courses (20-24 November 2017). He has won or been nominated for major teaching awards everywhere he has taught (the University of Chicago; the RAND Graduate School of Public Policy Studies; the University of California, Los Angeles; the University of Bath (UK); and the University of California, Santa Cruz).
He has a particular interest in the exposition of complex statistical methods and ideas in the context of real-world applications.