Bibliography Support
- Mohammed J. Zaki and Wagner Meira, Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, 2014
- Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, 2011
GDPR.
Doing good data science, by DJ Patil, Hilary Mason and Mike Loukides. O'Reilly. July 10, 2018.
The ethical side of data science and AI, by Shreyas S. Medium. Oct 21, 2018
Physiognomy's New Clothes, by Blaise Agüera y Arcas. Medium. May 7, 2017
How big data is unfair - Understanding unintended sources of unfairness in data driven decision making, by Moritz Hardt. Medium. Sep 26, 2014
Slides
- T1a - Introduction and organization
- T1b - Data Science (Han 1, Zaki 1)
- T2 - Data Exploration (Han 2, Zaki 2 & 3)
- T3 - Data Visualization (Few)
- T4 - Classification: Analogizers (Han 9.5 & 9.3, Zaki 18.3 & 21.1)
- T5 - Classification: Bayesians (Han 8.3 & 8.5, Zaki 18)
- T6 - Overfitting, Training strategies, Data Balancing, ROC charts (Han 12, Zaki 22)
- T7,8 - Classification: Decision Trees (Han 8.2, Zaki 19, Wu&Kumar 10)
- T9 - Classification: Ensemble Methods (Han 8.6)
- T10 - Classification: Connectionists (Han 9.2)
- T11 - Classification: Genetic Algorithms
- T12 - Pattern Mining (Han 6 & 7, Zaki 8 & 9 & 10 & 12)
- T13,14 - Clustering (Han 10 & 11.1, Zaki 13-17)
- T15 - Data Reduction: Feature Selection (Han 3.4 & 3.5, Zaki 6 & 7)
- T16 - Data Reduction: Principal Component Analysis (check T15 references)
- T17 - Regression (not covered in bibliography, Gelman&Hill part 1A)
- T18 - Biclustering (not covered in bibliography, Madeira&Oliveira, Henriques&Madeira)
- T19 - Outlier Analysis (not covered in bibliography, Aggarwal)
- T20 - Network Data Analysis (Aggarwal2011-1&2, Wu&Kumar 6)
- T21 - Temporal Data Mining (not covered in bibliography, Esling&Agon, Mörchen)
- T22 - Time Series Description and Forecasting (not covered in bibliography, Bisgaard&Kulahci)
- T23 - Complex Data Mining (not covered in bibliography, Atluri&Karpatne&Kumar and Dzeroski)
- T24 - Notes on Big Data Analysis (not covered in bibliography, Zaki and Aggarwal)
- T25 - Privacy and Ethical Concerns
- T26 - Closing remarks (no slides)
Exercises
- Exercises Book: Part I
- Exercises Book: Part II
- PAA, SAX, SDFT and Wavelet transformations will not be covered in the final exam
Tips about Rui's classes for exam preparation
- T13,14 (clustering): slides 5,6,12,14,15,18,20,22-40,49-71,83-98,108-112,115,120,121
- T15 (data reduction): slides 4-32,36-43
- T16 (PCA): slides 5-27,33,34,45,46
- T17 (regression): slides 3-43,49-52
- T18 (biclustering): slides 3-35,49,53,54
- T19 (outlier analysis): slides 3-11,14,19,21,23,26,27
- T21 (temporal data mining): slides 4-20,24-30,34-45,48
- T22 (time series forecasting): slides 4-6,10-15,18-37,40-51,53,55-63,66-73
- T23 (complex data mining): 4,12-14,17,20,21,24-29,32,34-36,39,41,43,44,48,50-54,60,61
- T24 (big data analysis): 3-13,15,18,21,22,31,33
- Remaining lectures: full PPT as study reference
- Final exam will not ask for definitions or a memory-based listing of slide contents
Exams
- 2019/2020 - 27 January 2020
- 2019/2020 - 13 January 2020
- 2018/2019 - 14 January 2019
- I.5 False (per fold iteration)
- 2018/2019 - 28 January 2019
- III.3 False