Programming for data science ST2195

In the last decade, demand for programming skills related to managing and visualising data has grown remarkably. Python, R and SQL feature consistently among the top skills listed in job advertisements for data scientists and data analysts. Knowing how to write efficient code to handle and visualise data is an essential skill for any modern data scientist.

Topics covered:

This course covers the main principles of computer programming, with a focus on data science applications. It follows the entire pathway from raw data to databases, data wrangling and visualisation, and machine learning frameworks, through to software development. Students will gain knowledge of the main principles of programming in a data science context and develop the ability to handle and visualise data. The course assumes no prior programming knowledge and provides training in state-of-the-art tools, e.g. SQL, Python, R and Git. Students will apply computational thinking in various application domains and learn to communicate data analysis results to stakeholders.

Learning outcomes:

At the end of the course and having completed the essential reading and activities students should be able to:

  • convert raw data into relational databases that can be queried with SQL
  • import data into Python and R, and manipulate and visualise it
  • program in Python and R
  • develop software using version control via Git

Assessment:

This course is assessed by an individual case-study coursework (50%) and a two-hour unseen written examination (50%).

Essential reading:

Download the course information sheets from the LSE website.