
Latent Dimension Discovery from Large-Scale Natural Language Text Collections

  • Speaker: Dr Shoaib Jameel, School of Computing, University of Kent
  • Date: Tuesday, 20 March 2018 from 14:00 to 15:00
  • Location: Room 403

Broadly speaking, my work has centred on learning
low-dimensional representations of natural language text on a large
scale. Among other things, I have developed a variety of probabilistic topic
models, which have seen applications in text mining and information
retrieval, as well as vector space embeddings, which have shown
promising results in tasks such as knowledge base completion and
commonsense reasoning.
In the context of machine learning from text representations, a common
approach is to use a cascading framework, where e.g. in a first step
some latent features are computed, and these features are then used as
the input to some classifier. Similarly, when different types of
latent features need to be learned, often these are learned
independently, and then simply concatenated into a single
high-dimensional feature vector at the end. For example, this is the
approach taken by most learning-to-rank methods for information
retrieval. Such approaches are rarely optimal as errors from earlier
steps are propagated and amplified in later steps, and correlations
between different aspects of the input (e.g. term frequencies, link
structure, social network tags) are not taken into account. Most of
the models I have proposed in the past instead aim to solve such
problems in a unified manner. By formulating novel mathematical
models, it is indeed often possible to integrate the step of learning
latent representations with the step of learning the resulting
classifier. In this way, the labelled data that is available to train
the classifier can implicitly help to learn more task-specific latent
representations.

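To make the cascading baseline concrete, here is a minimal sketch of the two-step approach critiqued above: an unsupervised step first learns latent features, and a classifier is then trained on those frozen features, so labels never influence the representation. The data, dimensions, and learning rates below are illustrative assumptions, not the speaker's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 50))            # stand-in for a term-document matrix
y = rng.integers(0, 2, size=100)     # stand-in for binary document labels

# Step 1: unsupervised latent features (rank-10 truncated SVD),
# computed without ever looking at the labels y.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = U[:, :10] * s[:10]               # latent document representations

# Step 2: a logistic-regression classifier trained on the frozen
# features Z. Errors made in step 1 cannot be corrected here.
w = np.zeros(Z.shape[1])
for _ in range(500):                 # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(Z @ w)))
    w -= 0.1 * Z.T @ (p - y) / len(y)

accuracy = np.mean(((Z @ w) > 0) == y)
```

A unified model would instead let the gradient of the classification loss flow back into the representation itself, which is the kind of joint formulation the abstract advocates.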
In my work on vector space embeddings, I have similarly aimed to
develop models that are closely matched to downstream applications.
For example, I have developed an entity embedding model that is
interpretable, in the sense that properties of entities have a direct
geometric representation. Such embeddings have shown a lot of
potential for entity retrieval, as query terms can be directly
interpreted in the vector space. Another approach which I have
explored is to represent words or entities as probability
distributions over vectors. In this way, our certainty about the
vector space embedding is explicitly modelled, which avoids "guessing"
vectors for rare words or entities, thus preventing subsequent error
propagation in downstream applications.
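As an illustration only (the speaker's actual models are not reproduced here), the distribution-over-vectors idea can be caricatured as a diagonal Gaussian per word: the mean is the embedding and the variance encodes certainty, shrinking as more contexts are observed. All names and constants below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

def gaussian_embedding(context_vectors, prior_var=1.0):
    """Toy Gaussian embedding from observed context vectors.

    The mean is the average context vector; the variance shrinks
    exactly as prior_var / (1 + n) with n observations, so uncertainty
    is explicit and remains highest for rarely seen words.
    """
    n = len(context_vectors)
    mean = np.mean(context_vectors, axis=0) if n else np.zeros(dim)
    var = prior_var / (1.0 + n)
    return mean, var

frequent = [rng.normal(size=dim) for _ in range(1000)]  # common word
rare = [rng.normal(size=dim) for _ in range(2)]         # rare word

_, var_frequent = gaussian_embedding(frequent)
_, var_rare = gaussian_embedding(rare)
# var_rare stays large: rather than "guessing" a confident point
# vector for the rare word, the model keeps its uncertainty visible.
```

A downstream application can then weight or discard high-variance embeddings, which is how this representation avoids propagating errors from poorly estimated vectors.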

Short Biography: Shoaib Jameel is currently a Lecturer in the School
of Computing, University of Kent. He obtained his Bachelor's
degree in Computer Science and Engineering in India. Based on his
academic and research accomplishments at the undergraduate level, he
gained admission into the PhD programme directly in the Department of
Systems Engineering and Engineering Management, The Chinese University
of Hong Kong. He graduated with a PhD in the year 2014 under the
supervision of Prof. Wai Lam. His research interests include
probabilistic topic models, Bayesian nonparametric statistics,
artificial intelligence, vector space embeddings, and information
retrieval. Shoaib has collaborated with researchers from both
industry and academia, including Microsoft Research, Carnegie
Mellon University, NTT Communications (Japan), and the Institute for
Infocomm Research (Singapore). Shoaib has consistently published in
highly selective conferences and journals such as SIGIR, TOIS,
CIKM, ECAI, and AAAI. He has also served on the programme committees
of several top-tier conferences and as a reviewer for journals.
Shoaib has over five years of teaching experience, including tutoring
his own batchmates during his undergraduate studies. He has
co-supervised two PhD students and an MPhil student.