Last Lab

  • Data to explore

Labs Schedule

Project review form


Schedule for attending labs

Students can opt to be evaluated weekly.

To do so, each group must present its results (summary charts) for both project datasets at its lab session, covering the topics listed for each week.

  • Week 2 (Sep 28th) - Project schedule and resource allocation (not graded); deliver the Schedule Form on Google Classroom by the following Monday
  • Week 3 (Oct 5th) - Data profiling - dimensionality, distribution and granularity
  • Week 4 (Oct 12th) - Data profiling - sparsity and correlation
  • Week 5 (Oct 19th) - Naive Bayes and KNN
  • Week 6 (Oct 26th) - Decision trees and Overfitting
  • Week 7 (Nov 2nd) - Random forests, Feature engineering, Balancing and Overfitting
  • Week 8 (Nov 9th) - Regression, Gradient Boosting
  • Week 9 (Nov 16th) - Clustering and Feature extraction
  • Week 10 (Nov 23rd) - Pattern Mining
  • Week 12 (Dec 7th) - Anomaly detection and SNA 
  • Week 13 (Dec 14th) - Time series and Forecasting
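
As an illustration of the kind of summary the data profiling weeks ask for, here is a minimal sketch using pandas. The toy table and the `profile` helper are hypothetical and only stand in for a project dataset; the measures shown (record and variable counts, missing values, descriptive statistics, correlation) are one possible reading of the topics above, not the course's required output.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for one of the project datasets.
data = pd.DataFrame({
    "age": [23, 35, 41, np.nan, 52, 29],
    "income": [1200.0, 2500.0, 3100.0, 2800.0, np.nan, 1500.0],
    "city": ["Lisbon", "Porto", "Lisbon", "Faro", "Porto", "Lisbon"],
})


def profile(df):
    """Collect a few basic profiling measures for a DataFrame."""
    numeric = df.select_dtypes(include="number")
    return {
        # Dimensionality: number of records and variables.
        "n_records": df.shape[0],
        "n_variables": df.shape[1],
        # Missing values per variable.
        "missing_per_variable": df.isna().sum().to_dict(),
        # Distribution: descriptive statistics of numeric variables.
        "distribution": numeric.describe(),
        # Correlation: pairwise Pearson correlation of numeric variables.
        "correlation": numeric.corr(),
    }


summary = profile(data)
print(summary["n_records"], "records,", summary["n_variables"], "variables")
print(summary["missing_per_variable"])
print(summary["distribution"])
print(summary["correlation"])
```

The tables returned here could then be turned into summary charts with matplotlib or seaborn.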

Guides

  • DSLabs - Python Tutorial for Data Science by Cláudia Antunes

Packages

Python
  • NumPy/SciPy for scientific computing
  • pandas for data manipulation
  • scikit-learn for machine learning
  • matplotlib and Seaborn to visualize data
  • tslearn for time series data distances, representations and mining
  • imblearn (http://imbalanced-learn.org/en/stable/) for balancing imbalanced datasets
  • apyori for pattern mining
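
To show how some of these packages fit together, here is a minimal sketch that trains the two classifiers from Week 5 (Naive Bayes and KNN) with scikit-learn. It assumes the iris dataset bundled with scikit-learn as a stand-in for the project data; the 70/30 split, `k=5` and the fixed random seed are illustrative choices, not course requirements.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset (assumption): iris, shipped with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the records for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Naive Bayes": GaussianNB(),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model on the training set and score it on the test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

The `scores` dictionary can be plotted as a bar chart with matplotlib to compare the models.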

R
  • dplyr, plyr and data.table to manipulate data
  • stringr to manipulate strings
  • zoo to work with regular and irregular time series
  • ggvis, lattice and ggplot2 to visualize data
  • caret for machine learning: train, createDataPartition
  • smotefamily::SMOTE, RWeka::J48 and rpart

Tutorials

  • Python
  • R