Natural Language Processing and Information Retrieval

Short name: NLP
SITS code: COIY064H7
Credits: 15
Level: 7
Module leader: Dell Zhang
Lecturer(s): Dell Zhang

Module outline

Due to the explosive growth of digital information in recent years, modern Natural Language Processing (NLP) and Information Retrieval (IR) systems such as search engines have become increasingly important in almost everyone's work and life (e.g. see the phenomenal rise of Google). NLP & IR is one of the most active areas of research and development in academia as well as industry. This module will convey the basic principles of modern NLP & IR systems to students.


The aim of this module is to introduce modern NLP & IR concepts and techniques, from basic text indexing to advanced text analysis. Both theoretical and practical aspects of NLP & IR systems will be presented and the most recent issues in the field of NLP & IR will be discussed. This will give students an insight into how modern search engines work and are developed.


  • Boolean Retrieval
  • The Term Vocabulary and Postings Lists
  • Regular Expressions and Text Normalization
  • Dictionaries and Tolerant Retrieval
  • Edit Distance
  • Index Compression
  • Scoring, Term Weighting and the Vector Space Model
  • Evaluation in Information Retrieval
  • Probabilistic Information Retrieval
  • Language Models for Information Retrieval
  • Language Modeling with N-Grams
  • Spelling Correction and the Noisy Channel
  • Text Classification, Naive Bayes, and Sentiment Analysis
  • Vector Space Classification
  • Logistic Regression
  • Matrix Decompositions and Latent Semantic Indexing
  • Vector Semantics
  • Neural Nets and Neural Language Models
  • Sequence Processing with Recurrent Networks

Indicative timetables can be found in the handbooks available on programme pages. Personalised teaching timetables for students are available via My Birkbeck.


The coursework includes two assignments.


Coursework (20%). Examination (80%).

Recommended reading