Last Lab
- Quiz to be submitted here by December 20th at 23:59 (one quiz per team)
- PDF with the quiz questions (in case you are unable to see the questions in advance)
- Data to explore
Labs Schedule
Project review form
Schedule for attending labs
Students can opt to be evaluated weekly.
To do so, each group must present its results (summary charts) for both project datasets at its lab session, covering the topics listed for each week.
- Week 2 (Sep 28th) - Project schedule and resource allocation - not graded - submit the Schedule Form on Google Classroom by the following Monday
- Week 3 (Oct 5th) - Data profiling - dimensionality, distribution and granularity
- Week 4 (Oct 12th) - Data profiling - sparsity and correlation
- Week 5 (Oct 19th) - Naive Bayes and KNN
- Week 6 (Oct 26th) - Decision trees and Overfitting
- Week 7 (Nov 2nd) - Random forests, Feature engineering, Balancing and Overfitting
- Week 8 (Nov 9th) - Regression, Gradient Boosting
- Week 9 (Nov 16th) - Clustering and Feature extraction
- Week 10 (Nov 23rd) - Pattern Mining
- Week 12 (Dec 7th) - Anomaly detection and SNA
- Week 13 (Dec 14th) - Time series and Forecasting
Guides
- DSLabs - Python Tutorial for Data Science by Cláudia Antunes
Packages
Python
- NumPy/SciPy for scientific computing
- pandas for data manipulation
- scikit-learn for machine learning
- matplotlib and Seaborn to visualize data
- tslearn for time series data distances, representations and mining
- imblearn (http://imbalanced-learn.org/en/stable/) for handling imbalanced datasets
- apyori for pattern mining
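
As a rough illustration of how some of the Python packages above fit together, the following minimal sketch profiles a dataset with pandas and then compares Naive Bayes and KNN with scikit-learn (the Week 3 and Week 5 topics). The dataset is synthetic, generated with `make_classification`, and the column names (`var0`, ..., `target`), the 70/30 split and `n_neighbors=5` are assumptions made for the example, not project requirements.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a project dataset (hypothetical, not the course data).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
data = pd.DataFrame(X, columns=[f"var{i}" for i in range(6)])
data["target"] = y

# Data profiling: dimensionality and missing values per variable.
n_records, n_variables = data.shape
missing_per_variable = data.isna().sum()
print(f"{n_records} records, {n_variables} variables")

# Hold out 30% of the records for testing, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(columns="target"),
    data["target"],
    test_size=0.3,
    stratify=data["target"],
    random_state=0,
)

# Fit both classifiers and record their accuracy on the test set.
models = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, accuracy in scores.items():
    print(f"{name}: accuracy = {accuracy:.2f}")
```

The same skeleton should carry over to the project datasets by replacing the synthetic frame with one loaded through `pd.read_csv`.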
R
- dplyr, plyr and data.table to easily manipulate data
- stringr to manipulate strings
- zoo to work with regular and irregular time series
- ggvis, lattice and ggplot2 to visualize data
- caret for machine learning: train, createDataPartition
- smotefamily.SMOTE for balancing, RWeka.J48 and rpart for decision trees