Grades
- First draft for final grades
- Draft for final grades after exam review
- Final grades before oral examinations (students marked with Oral should contact us)
Exams
- 14/01/2019 Exam A and Exam B
- 14/01/2019 Exam solutions
- 14/01/2019 Exam grades (answers)
- Exam 28/01/2019
- NEW 30/01/2019 2nd Exam grades
Office Support Schedule
Office hours before second exam to be released soon
NEW Prof. Cláudia Antunes: Exams Season
- [new] second season: Alameda: 25th January at 11:00 @ Informática building II, office 1
- [new] second season: Tagus: 25th January at 09:00 @ room 2N3.15
- Alameda: 7th and 11th January at 11:00 @ Informática building III, room 0.09
- Tagus: 7th and 11th January at 14:30 @ office 2N1.25
- [new] second season: Alameda 22nd January at 11:00 @ Informática building III, room 0.09
- [new] second season: Tagus 23rd January at 11:00 @ office 2N1.25
Bibliography Support
- Mohammed J. Zaki, Wagner Meira, Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. 2014 Cambridge University Press
- Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, 2011
- Top 10 algorithms in data mining
- Naren Ramakrishnan. C4.5
- Dan Steinberg. CART: Classification and Regression Trees
Exercises
- NEW Exercises Book Part I by Cláudia Antunes and Rui Henriques
- NEW Exercises Book Part II v4 by Rui Henriques and Cláudia Antunes (stable version uploaded 24/12)
- Changes in version 5 (Book Part II v5 uploaded 7/Jan)
- FSelection 2.1:(5), Clustering 2.1:(3), Regression 1.2:(1),
- Biclustering 4:(2) and x8=<0,1,2,0.5>, Forecasting 1:(3), 3:(4)
- Important note on temporal data mining question 1
- distance matrix using traditional DTW (the one required for the exam) is [[0,22,3,11],[22,0,11,8],[3,11,0,9],[11,8,9,0]]
- the R-variant of DTW (not required for the exam) was considered here; to answer 1.1 and 1.2, assume the distance matrix is [[0,8,3,5],[8,0,5,6],[3,5,0,5],[5,6,5,0]] (where [[d(x1,x1),d(x1,x2)...]...])
- Answer key for version 4:
- FSelection: 1.1:(2), 1.2:(4), 2.1:the incorrect is (5), 2.2:(2&4), 3.1:(3), 3.2:(1&3&4)
- Clustering: 1.1:(4), 1.2:(2), 1.3:(2), 2.1:(1&3), 2.2:(3), 3:(3)
- PCA: 1:(2), 2:(3), 3:(4), 4:(4)
- Regression: 1.1.b:(3), 1.2:(1&3), 1.3:(6), 2.1:(3), 2.2:(3), 3:(4)
- Biclustering: 1:(4), 2:(3&4&5&7), 3:(3), 4:(1&2) and x8=<0,1,2,0.5>
- Sequential pattern mining: 1:(2), 2:(3)
- Time series forecasting: 1:(2&3), 2:(5), 3:(1&4)
- Time series representations: 1.1:(3&4), 1.2:(5), 1.3:(4), 2.1:(4), 3:(1&2&4)
- Temporal data mining: 1.1:(1&3), 1.2:(2), 2.1:(3), 2.2:(all)
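For the note on temporal data mining question 1, the following is a minimal sketch of how a pairwise distance matrix can be built with traditional DTW. It assumes an absolute-difference local cost and the usual three-way recurrence (insertion, deletion, match); the exact cost used in the exercise book is not stated here, so treat that choice as an assumption. The function names `dtw_distance` and `distance_matrix` and the example series are hypothetical and are not the series x1..x4 from the exercise.

```python
from math import inf


def dtw_distance(a, b):
    """Traditional DTW distance between two numeric sequences.

    Assumption: local cost is |a_i - b_j| and no warping window is used.
    """
    n, m = len(a), len(b)
    # D[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # step in a only
                                 D[i][j - 1],      # step in b only
                                 D[i - 1][j - 1])  # step in both
    return D[n][m]


def distance_matrix(series):
    """Pairwise matrix laid out as [[d(x1,x1), d(x1,x2), ...], ...]."""
    return [[dtw_distance(s, t) for t in series] for s in series]


# Hypothetical series, for illustration only
example = [[1, 2, 3], [1, 2, 2, 3], [0, 0]]
matrix = distance_matrix(example)
```

With this cost the resulting matrix is symmetric with a zero diagonal, which matches the shape of the two matrices given in the note.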
Slides
- T1a - Introduction and organization
- T1b - Data Science (Han 1, Zaki 1)
- T2 - Data Exploration (Han 2, Zaki 2 & 3)
- T3 - Classification: Analogizers (Han 9.5 & 9.3, Zaki 18.3 & 21.1)
- T4 - Classification: Bayesian (Han 8.3 & 8.5, Zaki 18)
- T5 - Evaluation, Data Balancing (Han 12, Zaki 22)
- T6,7,8 - Classification: Decision Trees (Han 8.2, Zaki 19)
- T9 - Classification: Ensemble Methods (Han 8.6)
- T10 - Classification: other approaches (Han 9.2)
- T11 - Pattern Mining (Han 6 & 7, Zaki 8 & 9 & 10 & 12)
- T12 - Clustering (Han 10 & 11.1, Zaki 13-17)
- T13 - Dimensionality and Data Reduction (Han 3.4 & 3.5)
- T14 - Data Transformation (Zaki 6 & 7)
- T15,16 - Social Network Analysis
- T17 - Regression (not covered in bibliography, Gelman&Hill part 1A)
- T18 - Biclustering (not covered in bibliography, Madeira&Oliveira, Henriques&Madeira)
- T19 - Time Series Description and Forecasting [p.46 updated] (not covered in bibliography, Bisgaard&Kulahci)
- T20a - Time Series Representations (not covered in bibliography, Lin&Keogh)
- T20b - Fourier and Wavelet Transform (not covered in bibliography, Boggess&Narcowich 3 & 4 & 5)
- T21 - Temporal Data Mining (not covered in bibliography, Esling&Agon, Mörchen)
- T22 - Outlier Analysis (not covered in bibliography, Aggarwal)
- T23 - Biomedical Data Analysis (covers principles from all lectures)
- T24 - Complex Data Mining (not covered in bibliography, Atluri&Karpatne&Kumar and Dzeroski)
- T25 - Notes on Big Data (not covered in bibliography, Zaki and Aggarwal)
- T26 - Closing remarks
Exam preparation
- T12 - slides 4,12-36,38-69,72,80,84-95,101-103,109-111
- T13 - slides 4-31,38-39
- T14 - slides 5-25,29-34,43-44
- T17 - slides 4-29,34-49
- T18 - slides 4-36,47-53
- T19 - slides 3-21,31-28,42,44,46,51-53,57-58,61,64,66,69,72,76,83-85
- T20a - slides 3-5,8-32
- T20b - slides 3-10,16-21,23-29,31,33
- T21 - slides 4-18,20-35,37,38,40,48-51
- T22 - slides 3-13,15,20,22
- T23 - slides 10,46,62,98,103
- T24 - slides 4,8-10,18,21-23,25,34,54,63,66-67,71-72
- T25 - slides 5,7,11-14,20,22
- Remaining lectures: full PPT as study reference
- Final exam will not ask for definitions or a memory-based listing of slide contents