Statistics and Statistical Data Mining

This module covers the key statistical concepts and techniques you will need to interpret the results of data analysis.

The areas covered include probability theory, likelihood, common distributions, confidence intervals, and hypothesis testing, including parametric and non-parametric tests.

Upon successful completion of this module, you will be able to:

  • critically appraise and evaluate mathematical and statistical techniques for a given empirical or data analysis.
  • understand the physical significance of a given mathematical or statistical technique.
  • use optimisation techniques in decision making.
  • draw and use statistically significant conclusions from sample data.

Topics covered

  • Exploratory Data Analysis (EDA)
  • Data Pre-processing, Correlation and Probability Overview
  • Sampling and Hypothesis Tests
  • Significance Tests
  • Linear Regression
  • Logistic Regression (LR)
  • Extreme Gradient Boosting (XGBoost)
  • Working with Imbalanced Data
  • Unsupervised Learning and Feature Selection
  • Machine Learning on the Cloud (AWS as an example)

Credits

15 (150 hours)

Assessment

  • Summative coursework (50%)
  • Written examination (50%)