**Homeworks:**

**H1**: Model evaluation and Bayesian learning - deadline: October 18th, 23h59

**H2**: Decision trees and regression models - deadline: October 27th, 23h59

**H3**: Neural networks - deadline: November 5th, 23h59

**H4**: Clustering and VC dimension - deadline: November 14th, 23h59
**Materials:**

- kin8nm dataset
- breast cancer dataset
- report template guidelines in .doc and .pdf

**FAQ**

Please always consult the FAQ before posting questions to the faculty hosts.

**H4**

- For Question 1(a), you only need one iteration.
- For Question 3, you only need an approximate estimate of the VC dimension; you do not need to prove it.
- Part II: normalization is not essential, since all the input variables share the same domain.

**H3**

**1.** Correction on the statement: "consider a 5-CV with a fixed zero seed to answer ~~(3)~~ (2) and ~~(4)~~ (3)"

**2.** In questions (2) and (3), l2 regularization can be simulated by increasing alpha (the regularization term): https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_alpha.html

**3.** Which learning rate should I fix in Part II? *Answer*: Please use the sklearn default, i.e. learning_rate_init=0.001
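The two points above can be combined in a short sketch (not the homework solution; synthetic data and the network sizes are placeholders): l2 strength is controlled through `alpha` in sklearn's MLP, evaluated under a 5-fold CV with a zero seed and the default learning rate.

```python
# Sketch: comparing l2 regularization strengths via MLPClassifier's
# `alpha` parameter, under 5-CV with a fixed zero seed.
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Placeholder data standing in for the homework dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # 5-CV, zero seed

for alpha in (0.0001, 1.0):  # larger alpha -> stronger l2 penalty
    clf = MLPClassifier(alpha=alpha,
                        learning_rate_init=0.001,  # sklearn default
                        max_iter=500, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"alpha={alpha}: mean CV accuracy {scores.mean():.3f}")
```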

**H2**

**1.** Programming guidelines for selecting features using mutual information in sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html

**2.** Which evaluation schema should we use in Part II? *Answer*: We suggest preserving the CV from the first delivery, since you have to compare results across settings.

**3.** On Part II: you may also show two separate plots: one for 5i (feature selection impact) and another for 5ii (tree depth impact).
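Following the linked guidelines, a minimal sketch of mutual-information feature selection (using sklearn's built-in Wisconsin breast cancer data as a stand-in for the homework dataset, and an arbitrary k=5):

```python
# Sketch: keep the k features with highest mutual information
# with the class label, via SelectKBest.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset

selector = SelectKBest(score_func=mutual_info_classif, k=5)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # only 5 columns are kept
```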

**H1**

**1.** The Breast cancer dataset has 16 observations with a missing value. How should I proceed? *Answer*: You can simply remove these observations from the dataset.

**2.** Which observations should we use to train and test the Bayesian classifier (Part I)? *Answer*: Training is to be performed on the 10 training observations. Testing should also use the same 10 training observations, hence "confusion matrix with the *training* observations" and "*training* F1 score".

**3.** *Hint*: in contrast with probability mass functions, probability density functions can take values higher than 1, e.g. y ~ Uniform(0.5, 1).
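The hint in point 3 can be checked numerically: for y ~ Uniform(0.5, 1) the density is 1/(1 - 0.5) = 2 everywhere on the support, yet the total probability still integrates to 1.

```python
# Illustration: a probability *density* can exceed 1.
from scipy.stats import uniform

dist = uniform(loc=0.5, scale=0.5)  # Uniform(0.5, 1)

print(dist.pdf(0.75))  # density inside the support: 2.0
print(dist.cdf(1.0))   # total probability still sums to 1.0
```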