Natural Language Processing and Information Retrieval
Due to the explosive growth of digital information in recent years, modern Natural Language Processing (NLP) and Information Retrieval (IR) systems such as search engines have become increasingly important in almost everyone's work and life (consider, for example, the phenomenal rise of Google). NLP & IR research and development is one of the most active areas in both academia and industry. This module will convey the basic principles of modern NLP & IR systems to students.
The aim of this module is to introduce modern NLP & IR concepts and techniques, from basic text indexing to advanced text analysis. Both theoretical and practical aspects of NLP & IR systems will be presented, and the most recent issues in the field will be discussed. This will give students an insight into how modern search engines work and how they are developed.
- Boolean Retrieval
- The Term Vocabulary and Postings Lists
- Regular Expressions and Text Normalization
- Dictionaries and Tolerant Retrieval
- Edit Distance
- Index Compression
- Scoring, Term Weighting and the Vector Space Model
- Evaluation in Information Retrieval
- Probabilistic Information Retrieval
- Language Models for Information Retrieval
- Language Modeling with N-Grams
- Spelling Correction and the Noisy Channel
- Text Classification, Naive Bayes, and Sentiment Analysis
- Vector Space Classification
- Logistic Regression
- Matrix Decompositions and Latent Semantic Indexing
- Vector Semantics
- Neural Nets and Neural Language Models
- Sequence Processing with Recurrent Networks
The coursework includes two assignments.
Coursework (20%). Examination (80%).
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
- Dan Jurafsky and James H. Martin, Speech and Language Processing, 3rd ed. draft.