VVZ API is not affiliated with ETH Zurich. Data might be outdated or incorrect. Please view the official ETHZ Vorlesungsverzeichnis for binding information.
Information Retrieval
Last Updated: 2026-02-05 16:22:56
Abstract
This course gives an introduction to information retrieval with a focus on text documents and unstructured data. Main topics comprise document modelling, various retrieval techniques, indexing techniques, query frameworks, optimization, evaluation and feedback.
Objective
We keep accumulating data at an unprecedented pace, much faster than we can process it. While Big Data techniques contribute solutions for structured or semi-structured shapes such as tables, trees, graphs and cubes, the study of unstructured data is a field of its own: Information Retrieval. After this course, you will have an in-depth understanding of broadly established techniques to model, index and query unstructured data (that is, text), including the vector space model, boolean queries, terms, posting lists, and ways of dealing with errors and imprecision. You will know how to make queries faster and how to make queries work on very large datasets. You will be capable of evaluating the quality of an information retrieval engine. Finally, you will also have knowledge of alternate models (structured data, probabilistic retrieval, language models) as well as basic search algorithms on the web such as Google's PageRank.
Content
1. Introduction
2. Boolean retrieval: the basics of how to index and query unstructured data.
3. Term vocabulary: pre-processing the data prior to indexing, building the term vocabulary and posting lists.
4. Tolerant retrieval: dealing with spelling errors.
5. Index construction: scaling up to large datasets.
6. Index compression: how to improve performance by compressing the index in various ways.
7. Ranked retrieval: how to rank results with scores and the vector space model.
8. Scoring in a bigger picture: taking ranked retrieval to the next level with various improvements, including inexact retrieval.
9. Probabilistic information retrieval: how to leverage Bayesian techniques to build an alternate, probabilistic model for information retrieval.
10. Language models: another alternate model based on languages, automata and document generation.
11. Evaluation: precision, recall and various other measurements of quality.
12. Web search: PageRank.
13. Wrap-up.

The lecture structure will follow the pedagogical approach of the book (see material). The field of information retrieval also encompasses machine learning aspects. However, we will make a conscious effort to limit overlap with, and be complementary to, the Introduction to Machine Learning lecture.
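As an illustration of the boolean retrieval and posting-list topics listed above, the following is a minimal sketch and not course material. It assumes whitespace tokenization and lowercasing as the only pre-processing, and the names `build_index`, `intersect` and `boolean_and`, as well as the sample documents, are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to a sorted list of document IDs (its posting list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted posting lists, keeping only IDs present in both."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, *terms):
    """Answer a conjunctive query by intersecting posting lists, shortest first."""
    postings = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


# Hypothetical three-document collection, used only for illustration.
docs = {
    1: "information retrieval with posting lists",
    2: "boolean retrieval and ranked retrieval",
    3: "ranked results with the vector space model",
}
index = build_index(docs)
print(boolean_and(index, "ranked", "retrieval"))  # prints [2]
```

Processing the shortest posting list first keeps intermediate results small, which is one common reason conjunctive queries are ordered by list length.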
Resources
Literature
C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press.
Learning Materials (Links)
General Information
- Language
- English
- Levels
- BSC, DZ, SHE
- Frequency
- Yearly recurring
Examination
- Type
- session examination
- Mode
- written 180 minutes
- Aids
- General dictionaries are allowed!
- Digital
- The exam takes place on devices provided by ETH Zurich.
Course Components
| Type | Title | Time & Place | Hours |
|---|---|---|---|
| lecture | Information Retrieval | | 2 h weekly |
| exercise | Information Retrieval. Groups are selected in myStudies. | | 1 h weekly |
Offered In
- Electives (Students may also choose courses from the Master's program in Computer Science. It is their responsibility to make sure that they meet the requirements and conditions for these courses.)
- Computer Science TC (Detailed information on the programme at: )
- Computer Science Teaching Diploma (More information at: )