Solutions to some of the pen-and-paper exercises from the labs.

LAB 2 - The TF-IDF representations of the four documents (1 to 4) and of the query (Q) are as follows. Document 2 is the most relevant to the query.

1: < 0.3 , 0.2 , 0.6 , 0.3 , 0.0 , 0.3 , 0.0 , 0.0 , 0.0 , 0.0 >
2: < 0.0 , 0.1 , 0.0 , 0.0 , 0.0 , 0.0 , 0.6 , 0.6 , 0.3 , 0.1 >
3: < 0.3 , 0.1 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.3 , 0.3 , 0.1 >
4: < 0.0 , 0.0 , 0.0 , 0.3 , 0.0 , 0.3 , 0.0 , 0.0 , 0.0 , 0.1 >
Q: < 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 0.0 , 1.0 , 0.0 , 1.0 >
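The ranking can be checked with a short script. This is a minimal sketch that assumes relevance is measured by cosine similarity between each document vector and the query vector; the exercise may use a different scoring function, and the names `docs`, `query` and `cosine` are only illustrative.

```python
import math

# TF-IDF vectors copied from the solution above.
docs = {
    1: [0.3, 0.2, 0.6, 0.3, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0],
    2: [0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.6, 0.6, 0.3, 0.1],
    3: [0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.3, 0.1],
    4: [0.0, 0.0, 0.0, 0.3, 0.0, 0.3, 0.0, 0.0, 0.0, 0.1],
}
query = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Score every document against the query (assumed scoring: cosine).
scores = {doc_id: cosine(vec, query) for doc_id, vec in docs.items()}

# Document ids sorted from most to least similar.
ranking = sorted(scores, key=scores.get, reverse=True)

for doc_id in ranking:
    print(doc_id, round(scores[doc_id], 3))
```

Under this assumption document 2 ranks first, closely followed by document 3. A plain dot product with the query gives the same order.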

LAB 3 - The classifier would label the individual as not a subscriber. Without smoothing, the probability for the "not subscriber" class would be 0.04.

LAB 4 - The solutions for each of the questions are as follows.

a) The total probability would be 0.003046

b) The occurrence probability would be 0.0009

c) The most likely sequence of states would be 1212

LAB 5 - The solutions for each of the questions are as follows.

1a) 0.60

1b) 0.50

1c) The precision at each of the 11 recall points would be 1.00 ; 1.00 ; 0.67 ; 0.60 ; 0.57 ; 0.56 ; 0.55 ; 0.54 ; 0.53 ; 0.53 ; 0.53

2a) Confusion matrix:

M = |TN FP| = |4 1|
    |FN TP|   |2 3|

2b) accuracy=0.7 ; precision=0.75 ; recall=0.6 ; F1=0.67
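
The values in 2b) follow directly from the four cells of the confusion matrix. A minimal sketch using the standard definitions (the variable names are only illustrative):

```python
# Cells of the confusion matrix from 2a).
tn, fp, fn, tp = 4, 1, 2, 3

total = tn + fp + fn + tp

accuracy = (tp + tn) / total          # correct predictions over all predictions
precision = tp / (tp + fp)            # predicted positives that are truly positive
recall = tp / (tp + fn)               # true positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))
```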

3a) The values are as follows:

Precision_ORG = 1
Recall_ORG = 2/4
F1_ORG = 0.67

Precision_POS = 1
Recall_POS = 1
F1_POS = 1

Precision_PER = 0.5
Recall_PER = 0.75
F1_PER = 0.6

3b) MacroAvgPr = 0.83 ; MicroAvgPr = 0.73
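
The per-class F1 values and the macro-averaged precision can be recomputed from the precision and recall values listed in 3a). This is a minimal sketch with illustrative names. The micro-averaged precision is not recomputed here, because it needs the per-class counts of true and false positives from the exercise, which are not listed in this file.

```python
# (precision, recall) per class, copied from 3a).
per_class = {
    "ORG": (1.0, 2 / 4),
    "POS": (1.0, 1.0),
    "PER": (0.5, 0.75),
}


def f1(p, r):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# F1 for each class.
f1_scores = {label: f1(p, r) for label, (p, r) in per_class.items()}

# Macro average: unweighted mean of the per-class precisions.
macro_precision = sum(p for p, _ in per_class.values()) / len(per_class)

for label, score in f1_scores.items():
    print(label, round(score, 2))
print("MacroAvgPr", round(macro_precision, 2))
```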