Optional Evaluation

Each group should present its results (summary charts) on the topics listed for each week, using the 1st dataset in the project, during its lab session. Evaluation points:
  • Week 3 (Sep 30th) - Univariate analysis
  • Week 4 (Oct 7th) - Multivariate analysis
  • Week 5 (Oct 14th) - Naive Bayes and KNN
  • Week 6 (Oct 21st) - Decision trees
  • Week 7 (Oct 28th) - Ensembles
  • Week 8 (Nov 4th) - Pattern Mining
  • Week 9 (Nov 11th) - Clustering
The evaluations in the last three labs will cover the topics exercised during the class:
  • Week 11 (Nov 25th) - Regression
  • Week 12 (Dec 2nd) - Forecasting
  • Week 13 (Dec 9th) - Time series data analysis
  • Extra mark - Biclustering (if you miss any previous point)

Guides

Packages

Python
  • NumPy/SciPy for scientific computing
  • pandas for data manipulation
    • read_csv
  • scikit-learn for machine learning
    • model_selection : train_test_split, StratifiedKFold
    • neighbors.KNeighborsClassifier
    • naive_bayes.GaussianNB
    • naive_bayes.MultinomialNB
    • tree.DecisionTreeClassifier
    • ensemble.RandomForestClassifier
    • metrics: confusion_matrix, roc_curve, auc
  • matplotlib and seaborn to visualize data
  • tslearn for time series data distances, representations and mining
  • imblearn (http://imbalanced-learn.org/en/stable/)
    • under_sampling: RandomUnderSampler
    • over_sampling: RandomOverSampler, SMOTE
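As a rough illustration of how the scikit-learn pieces listed above fit together for the Naive Bayes and KNN week, here is a minimal sketch. It assumes a synthetic binary dataset built with make_classification as a stand-in for the project data (with the real dataset you would load it with pandas read_csv instead). The parameter values (test size, number of neighbours, number of folds) are arbitrary choices for the example, not course requirements.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the project dataset (assumption for this sketch).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)

# Hold-out split that preserves the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "GaussianNB": GaussianNB(),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Probability of the positive class, used to build the ROC curve.
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    results[name] = {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "auc": auc(fpr, tpr),
    }
    print(name)
    print(results[name]["confusion_matrix"])
    print("AUC:", round(results[name]["auc"], 3))

# Stratified 5-fold cross-validation accuracy for Naive Bayes.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_accuracy = [
    GaussianNB().fit(X[train_idx], y[train_idx]).score(X[test_idx], y[test_idx])
    for train_idx, test_idx in skf.split(X, y)
]
print("Naive Bayes CV accuracy per fold:", [round(a, 3) for a in cv_accuracy])
```

The confusion matrices and the fpr/tpr arrays can be passed to matplotlib or seaborn to produce the summary charts; DecisionTreeClassifier and RandomForestClassifier can be added to the models dictionary in the same way.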

R
  • dplyr, plyr and data.table to easily manipulate data
  • stringr to manipulate strings
  • zoo to work with regular and irregular time series
  • ggvis, lattice and ggplot2 to visualize data
  • caret for machine learning: train, createDataPartition
  • smotefamily (SMOTE), RWeka (J48) and rpart

Tutorials

  • Python
  • R