Please always consult the FAQ before posting questions to the faculty hosts.
  • For Question 1 (a), one iteration suffices.
  • For Question 3, an approximate estimate of the VC dimension suffices; you do not need to prove it.
  • Part II: normalization is not essential, since all input variables share the same domain.

  • 1. Correction to the statement: the question references should read "(2)" and "(3)", i.e. "consider a 5-fold CV with a fixed zero seed to answer (2) and (3)".
  • 2. In questions (2) and (3), l2 regularization can be simulated by increasing alpha (the l2 penalty term): https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_alpha.html
  • 3. Which learning rate should I fix in Part II? Please use the sklearn default, i.e. learning_rate_init=0.001.
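The two answers above can be combined into a minimal sketch: vary alpha to change the strength of l2 regularization in MLPClassifier, keep the learning rate at the sklearn default, and score under a fixed 5-fold CV with seed zero. The use of sklearn's built-in breast cancer loader and the specific alpha values are illustrative assumptions, not part of the assignment statement.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Stand-in data; replace with the course dataset
X, y = load_breast_cancer(return_X_y=True)

# Fixed 5-fold CV with a zero seed, as per the correction above
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Increasing alpha strengthens the l2 penalty; learning_rate_init is
# left at the sklearn default of 0.001
for alpha in (0.0001, 5.0):
    clf = MLPClassifier(alpha=alpha, learning_rate_init=0.001,
                        hidden_layer_sizes=(10,), max_iter=300,
                        random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"alpha={alpha}: mean CV accuracy={scores.mean():.3f}")
```

Because the folds and seeds are fixed, any score difference between the two runs is attributable to the regularization setting alone.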

  • 1. Programming guidelines for selecting features using mutual information in sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html
  • 2. Which evaluation schema should we use in Part II?
    Answer: We suggest preserving the CV setup from the first delivery, since you have to compare results across settings.
  • 3. On Part II: you can also show two separate plots: one for 5i (feature selection impact) and another for 5ii (tree depth impact).
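Following the SelectKBest link above, a short sketch of mutual-information feature selection; sklearn's built-in breast cancer loader and k=5 are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Stand-in data; replace with the course dataset
X, y = load_breast_cancer(return_X_y=True)

# Keep the k features with the highest mutual information with the target
selector = SelectKBest(score_func=mutual_info_classif, k=5)
X_new = selector.fit_transform(X, y)

print(X_new.shape)                          # → (569, 5)
print(selector.get_support(indices=True))   # indices of the selected features
```

To study the feature-selection impact for 5i, refit the selector for several values of k and score each reduced dataset under the same CV from the first delivery.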

  • 1. The Breast cancer dataset has 16 observations with a missing value. How should I proceed?
    Answer: You can simply remove these observations from the dataset.
  • 2. Which observations should we use to train and test the Bayesian classifier (Part I)?
    Answer: The training is to be performed on the 10 training observations. The testing should also consider the 10 training observations, hence "confusion matrix with the training observations" and "training F1 score".
  • 3. Hint: in contrast with probability mass functions, probability density functions can take values greater than 1, e.g. y~Uniform(0.5,1).
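For the missing-value answer above, a minimal sketch of removing incomplete observations, assuming the dataset is loaded into a pandas DataFrame with missing entries encoded as NaN (the toy frame and column names below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the loaded dataset; real columns will differ
df = pd.DataFrame({
    "clump_thickness": [5, 3, np.nan, 8],
    "class": ["benign", "benign", "malignant", "malignant"],
})

# Drop every observation that contains a missing value
df_clean = df.dropna()
print(len(df_clean))  # → 3
```

If the source file encodes missing values as "?" (as the original UCI breast cancer file does), pass `na_values="?"` to `pd.read_csv` so they become NaN before `dropna` is applied.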
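The density hint can be checked numerically: a Uniform(0.5, 1) variable has density 1/(1-0.5) = 2 on its support, yet still integrates to 1.

```python
from scipy.stats import uniform

# Uniform(0.5, 1) in scipy's parametrization: loc=0.5, scale=0.5
dist = uniform(loc=0.5, scale=0.5)

print(dist.pdf(0.7))                  # → 2.0 (a density value above 1)
print(dist.cdf(1.0) - dist.cdf(0.5))  # → 1.0 (total probability is still 1)
```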