Época Especial Project Description

Projects should be done by teams of 3 students, but students may also work alone or in teams of 2. In all cases, teams (even those with just one student) have to enrol and deliver the project report through Fenix.

Project Description

Projects have to be done by teams of 3 students. The exception is "Trabalhadores-estudantes" (working students), who may work alone or in teams of 2 or 3. In all cases, teams (even those with just one student) have to enrol and deliver the project report through Fenix.

FAQ

Please always consult the FAQ before posting questions to the faculty hosts. We encourage you to use the forum for any question about the project. This medium will be prioritised over email communication.

1. Is there any preference on the template to use (LaTeX vs Word)? No. Students just have to deliver a PDF file with their report, and there is no preference for either template. As usual, it is strongly recommended that students use whichever is easiest for them.

2. How do we deal with missing values in scikit-learn? scikit-learn doesn't deal with missing values, which means that we cannot measure the performance obtained when nothing is changed. To serve as a baseline, you should replace missing values with a new value (such as UNKNOWN or NA) and record the performance achieved. Note that the same transformation is required for the test dataset.
3. How do we run pattern mining? As you have seen, pattern mining explodes in the presence of a large number of items and low supports. There are some tricks to make the runs more robust:
- use FP-growth instead of Apriori: they return exactly the same patterns;
- increase the minimum support;
- limit the maximum pattern length;
- reduce the memory required to represent each item, for example by translating each binary variable represented by a number into a boolean.
4. What about creativity? This kind of project requires a systematic analysis of the data, exploring different preparation techniques and parameterisations of the learning methods, which isn't a very creative task. However, during that exploration it is usual to find a subproblem that could benefit from a different approach, involving some combination of techniques or a particular manipulation. In short, we are expecting a somewhat "out-of-the-box" solution. Merely exploring one more learning technique won't be rewarded.
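
As an illustration of question 2, the baseline for missing values could be sketched as below. This is only a minimal sketch under stated assumptions: the toy data, the variable names and the choice of a decision tree are hypothetical and not part of the project statement. It uses scikit-learn's SimpleImputer with a constant fill value and applies the transformation learned on the training data to the test data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two categorical variables, np.nan marks a missing value.
X_train = np.array(
    [["red", "small"], ["blue", np.nan], [np.nan, "large"], ["red", "large"]],
    dtype=object,
)
y_train = np.array([0, 1, 1, 0])
X_test = np.array([[np.nan, "small"], ["blue", "large"]], dtype=object)

# Replace every missing value with the new category "UNKNOWN",
# one-hot encode the categories, then train a classifier (arbitrary choice).
baseline = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="UNKNOWN"),
    OneHotEncoder(handle_unknown="ignore"),
    DecisionTreeClassifier(random_state=0),
)
baseline.fit(X_train, y_train)

# The pipeline applies the same imputation to the test set before predicting.
predictions = baseline.predict(X_test)

# Inspect the imputed test data to confirm the transformation was applied.
X_test_filled = baseline.named_steps["simpleimputer"].transform(X_test)
print(X_test_filled)
```

Wrapping the imputation in a pipeline is one way to make sure the test dataset goes through exactly the same transformation as the training dataset.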
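
As an illustration of question 3, the effect of the minimum support and of the maximum pattern length can be seen in a small level-wise search. This is a sketch under stated assumptions, not the implementation you are expected to use: it is a plain Apriori-style search (not FP-growth) written with the standard library only, and the function name, parameter names and toy transactions are hypothetical. Pattern mining libraries usually expose similar parameters.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len):
    """Apriori-style search bounded by a minimum support and a maximum length."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # candidates of size 1
    result = {}
    k = 1
    while current and k <= max_len:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k + 1}
        # Prune candidates that have an infrequent k-subset.
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k))
        ]
        k += 1
    return result


# Hypothetical toy transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

low = frequent_itemsets(transactions, min_support=0.25, max_len=3)
high = frequent_itemsets(transactions, min_support=0.5, max_len=2)
print(len(low), len(high))
```

Raising the minimum support or lowering the maximum length shrinks the set of candidates that has to be counted at each level, which is why both tricks keep the search tractable.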