Optional Evaluation
Each group should present its results (summary charts) on the topics listed for each week, for the first dataset in the project, during its lab session. Evaluation points:
- Week 3 (Sep 30th) - Univariate analysis
- Week 4 (Oct 7th) - Multivariate analysis
- Week 5 (Oct 14th) - Naive Bayes and KNN
- Week 6 (Oct 21st) - Decision trees
- Week 7 (Oct 28th) - Ensembles
- Week 8 (Nov 4th) - Pattern Mining
- Week 9 (Nov 11th) - Clustering
The evaluations in the last three labs will cover the topics exercised during class:
- Week 11 (Nov 25th) - Regression
- Week 12 (Dec 2nd) - Forecasting
- Week 13 (Dec 9th) - Time series data analysis
- Extra mark - Biclustering (if you missed any previous point)
Guides
Packages
Python
- NumPy/SciPy for scientific computing
- pandas for data manipulation
- scikit-learn for machine learning (a usage sketch follows this list)
- model_selection: train_test_split, StratifiedKFold
- neighbors.KNeighborsClassifier
- naive_bayes.GaussianNB
- naive_bayes.MultinomialNB
- tree.DecisionTreeClassifier
- ensemble.RandomForestClassifier
- metrics: confusion_matrix, roc_curve, auc
- matplotlib and Seaborn to visualize data
- tslearn for time series data distances, representations and mining
- imblearn (http://imbalanced-learn.org/en/stable/) for resampling imbalanced data (a short sketch follows this list)
- under_sampling: RandomUnderSampler
- over_sampling: RandomOverSampler, SMOTE
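As a rough orientation, the sketch below strings together the scikit-learn pieces listed above on a synthetic binary dataset; make_classification only stands in for the project data, and the model parameters are arbitrary. MultinomialNB would replace GaussianNB for count-valued features.

```python
# Minimal classification sketch with the scikit-learn pieces listed above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_curve, auc

# Synthetic stand-in for the project dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Stratified hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    # Cross-validated accuracy on the training split.
    scores = cross_val_score(model, X_train, y_train, cv=cv)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    # ROC curve and AUC from the positive-class probabilities.
    fpr, tpr, _ = roc_curve(y_test, model.predict_proba(X_test)[:, 1])
    print(name, round(scores.mean(), 3), round(auc(fpr, tpr), 3))
    print(cm)
```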
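For imbalanced targets, the imblearn samplers listed above share a fit_resample interface. The fragment below only illustrates the call pattern, reusing the X_train/y_train split from the previous sketch; resampling should be applied to the training data only, never to the test set.

```python
# Minimal imblearn resampling sketch (training data only).
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler, SMOTE

for sampler in (RandomUnderSampler(random_state=0),
                RandomOverSampler(random_state=0),
                SMOTE(random_state=0)):
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    # Class counts after resampling.
    print(type(sampler).__name__, Counter(y_res))
```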
R
- dplyr, plyr and data.table to easily manipulate data
- stringr to manipulate strings
- zoo to work with regular and irregular time series
- ggvis, lattice and ggplot2 to visualize data
- caret for machine learning: train, createDataPartition
- smotefamily.SMOTE, RWeka.J48, rpart
Tutorials
Python
R