Data Analytics using R

Short name: DAR
SITS code: BUCI042H7
Credits: 15
Level: 7
Module leader: Cen Wan
Lecturer(s): Cen Wan
Online material: https://moodle.bbk.ac.uk/

Module outline

This module covers the principal concepts and techniques of data analytics and how to apply them to real-life data sets. Students develop the core skills and expertise needed by data scientists, including the use of techniques such as linear regression, classification and clustering. The module will show you how to use R, a popular and powerful data analysis language and environment, to solve practical problems based on use cases drawn from real domains.

Aims

To study advanced aspects of data analytics, applying appropriate machine learning techniques to analyse data sets, assessing the statistical significance of data analytical results, and using the open-source tool R to perform basic data analytical tasks on real-life data.

Syllabus

  • Introduction to data analytics: data and dataset overview, data pre-processing, concepts of supervised and unsupervised learning.
  • Basic statistics: mean, median, standard deviation, variance, correlation, covariance, graphical and tabular summaries.
  • Linear regression: simple linear regression, introduction to multiple linear regression.
  • Classification: logistic regression, decision trees, support vector machines (SVM).
  • Ensemble methods: bagging, random forests.
  • Clustering: K-means, hierarchical clustering.
  • Principal Component Analysis.
  • Evaluation and validation: cross-validation, assessing the statistical significance of data analysis results.
  • Real-life case studies.
  • Tools: R.

Prerequisites

Experience with a modern programming language.

Timetables

Indicative timetables can be found in the handbooks available on programme pages. Personalised teaching timetables for students are available via My Birkbeck.

Coursework

Two practical exercises involving learning from and analysing data sets using R.

Assessment

Coursework (20%). Examination (80%).

Recommended reading

  • An Introduction to Statistical Learning: with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.