MA-INF 1319 - Cluster Analysis

News

Date of the 2nd exam: The re-exams will take place on the 11th and 12th of March in room 2.074. Please contact Clemens Rösner to schedule the time of your exam.

Things that are NOT relevant for the exam: Chapter 2.7 (outliers) will not be discussed in the oral exams. The same is true for proofs that were deferred to the tutorials, and tutorial questions. These are there to improve the general understanding of the material, but they will not be discussed in the oral exams.

Exams: The exams are oral exams. We offer times for oral exams on the 4th, 6th and 8th of February. After enrolling for the exam in BASIS, please contact Christiane Andrade to schedule the time of your exam.

Lecture Notes: The lecture notes have last been updated on the 25th of February and cover the whole lecture!

General Information

The content of this lecture is the theoretical analysis of approximation algorithms for cluster analysis, in particular covering the following areas:

  • Approximation algorithms for k-center, k-median and k-means
  • Different techniques for approximation algorithms, including ILP-based techniques and local search
  • Clustering of Big Data and in Data Streams
  • Analysis of common clustering heuristics
  • Practically efficient methods with theoretical guarantees

Lecture

When Where Start Lecturer
Tuesday, 10:15-11:45 INF / Room 2.078 October 09th Schmidt

Tutorials

When Where Start Lecturer
Wednesday, 12:30-14:00 INF / Room 2.050 October 17th Rösner
Friday, 14:00-15:30 INF / Room 2.050 October 19th Rösner

Problem Sets

  • Problem Set 1 (hand in until October 16th, to be discussed October 17th and October 19th)
  • Problem Set 2 (hand in until October 23th, to be discussed October 24th and October 26th)
  • Problem Set 3 (hand in until October 30th, to be discussed October 31st and November 2nd)
  • Problem Set 4 (hand in until November 6th, to be discussed November 7th and November 9th)
  • November 14th and November 16th tutorials were for repetition and questions and without a new exercise sheet
  • Problem Set 5 (hand in until November 20th, to be discussed November 21st and November 23rd)
  • Problem Set 6 (hand in until November 27th, to be discussed November 28th and November 30th)
  • Problem Set 7 (hand in until December 4th, to be discussed December 5th and December 7th)
  • Problem Set 8 (hand in until December 11th, to be discussed December 12th and December 14th)
  • Problem Set 9 (hand in until December 18th, to be discussed December 19th and December 21th)
  • Problem Set 10 (hand in until January 8th, to be discussed January 9th and January 11th)
  • Problem Set 11 (hand in until January 15th, to be discussed January 16th and January 18th)
  • Problem Set 12 (hand in until January 22nd, to be discussed January 23th and January 25th)

Schedule

Date
October 09 1 Introduction
2 The happy world of k-center
2.1 Definition
2.2 A simple and elegant 2-approximation
October 16 2.3 A matching lower bound
2.4 Incremental and hierarchical clustering
October 23 2.4 continued: Proof
2.5 Another elegant 2-approximation
October 30 2.6 A streaming algorithm for k-center
November 6 –cancelled due to illness–
November 13 2.7 The k-center problem with outliers
2.8 Fair k-center
November 20 2.8 Fair k-center (ctd)
3 The exciting world of k-means
3.1 Definition
3.2 Lloyd's algorithm
November 27 3.3 The k-means++ algorithm
3.3.1 D2-sampling as a bicriteria approximation
December 4 3.3.2 A glimpse on the analysis of k-means++
3.4 Dimensionality reduction
December 11 3.4.1 The Johnson-Lindenstrauss Lemma
December 18 3.4.2 The Singular Value Decomposition
January 8 3.5 A Coreset Construction
January 15 3.5 A Coreset Construction (ctd)
3.6 Merge&Reduce
January 22 3.6 Merge&Reduce (ctd)

Material

The lecture notes cover the content of the lecture.


Page Tools