It also provides practical skills in working with Big Data computing resources, as well as the conceptual context of how Big Data relates to methods and technologies in statistics and computing. The module complements Machine Learning and other modules.
By taking this module, you will gain an in-depth understanding of the technology and methods used for Big Data analysis and how they relate to concepts in statistics and computing more generally. The technologies covered include distributed file systems and SQL and NoSQL databases. You will learn what the key challenges in Big Data analysis are, how they relate to privacy and security, and how current technologies address them. You will work in Python, a modern language widely used in Big Data analysis. You will write queries to extract data, then design data processing and analysis pipelines to analyse it. You will learn how to apply these techniques to data in business and scientific applications.
- Introduction to Big Data
- Big Data in context: statistical methods and computing technologies
- Data privacy and security
- Environment for Big Data Programming
- Distributed File Systems & MapReduce
- SQL and NoSQL Databases
- Big Data Pipelines
- Examples of Big Data Applications
- Cluster, Grid & Cloud Computing
- Optimisation Techniques
15 (150 hours)
- Summative coursework (30%)
- Written examination (70%)