Your partner, a psychology student (let’s say she is a woman, to keep the writing simple), has received a grant to research relationships. She has built a fairly large collection of 12933 questionnaires describing the personalities of people in long-term relationships and of pairs of people whose relationships have ended.
Because your girlfriend knows you work with data modeling, she has a small after-hours job for you: build a classifier that will help her promote the research. Do the two people match?
At the beginning of the project:
You trained your very first model. The errors (100% - accuracy) are:
Training set | 14.0% |
Dev set | 15.5% |
Which sentence do you agree with?
You ask your girlfriend whether there is something you can define as “human-level performance.” She says she can try to classify the questionnaire pairs using her own knowledge and intuition. How many examples should she classify?
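One way to reason about the number of examples is to look at how precisely an error rate can be estimated from n labeled pairs. The sketch below uses the normal approximation to a binomial proportion. The function names, the guessed error rate of 10%, and the target margin of 2 percentage points are all assumptions made for illustration, not values from the study.

```python
import math


def error_margin(error_rate: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% half-width of an error-rate estimate from n examples."""
    return z * math.sqrt(error_rate * (1.0 - error_rate) / n)


def examples_needed(error_rate: float, margin: float, z: float = 1.96) -> int:
    """Smallest n for which the approximate half-width is at most `margin`."""
    return math.ceil(z ** 2 * error_rate * (1.0 - error_rate) / margin ** 2)


# Hypothetical guess: the human error rate is around 10%.
for n in (100, 500, 1000):
    print(n, round(error_margin(0.10, n), 4))

print(examples_needed(0.10, 0.02))
```

Under these assumptions, a few hundred to about a thousand classified pairs already gives a usable estimate; classifying the whole collection is not necessary.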
Based on your girlfriend's score, and still working on the model, you have:
Human-level performance | 7.0% |
Training set | 12.0% |
Dev set | 12.5% |
Which two of the following options are the most promising?
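A common way to read such a table is to split the total error into avoidable bias (training error minus human-level error) and variance (dev error minus training error). The sketch below applies that split to the numbers above. It assumes human-level error is a reasonable proxy for the best achievable error, and the function name `diagnose` is made up for this example.

```python
def diagnose(human: float, train: float, dev: float) -> dict:
    """Split the error (in percentage points) into avoidable bias and variance."""
    avoidable_bias = train - human  # how far the model is from human level
    variance = dev - train          # how much worse it gets on unseen data
    focus = "avoidable bias" if avoidable_bias > variance else "variance"
    return {
        "avoidable_bias": avoidable_bias,
        "variance": variance,
        "focus": focus,
    }


result = diagnose(human=7.0, train=12.0, dev=12.5)
print(result)
```

Here the avoidable bias is 5.0 percentage points and the variance is 0.5, so options that reduce bias look more promising than options that reduce variance.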
You also evaluate your model on the test set, and find the following:
Human-level performance | 7.0% |
Training set | 12.0% |
Dev set | 12.5% |
Test set | 20.5% |
A friend calls you on the phone to ask for your advice. Surprisingly, he has a very similar problem. His results on some dataset are as follows:
Human-level performance | 1.0% |
Training set | 1.2% |
Dev set | 1.2% |
Test set | 0.8% |
What’s the best advice for him?
Your girlfriend found a similar experiment on the internet, but only some of the open questions are the same. You want to use the data. Where can it be added?
After further work, adding the new data to the training set, and creating new, bigger splits, you get the following results:
Human-level performance | 7.0% |
Training set | 7.1% |
Dev set | 12.4% |
Test set | 12.5% |
Based on the table from the previous question, your girlfriend thinks that the training dataset is easier than the dev and test sets. Do you agree?
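When the training set comes from a different source than the dev and test sets, a gap between training and dev error can come from variance, from data mismatch, or from both. One way to separate them is to hold out a "train-dev" set: data drawn from the training distribution that the model never trains on. The sketch below is a minimal illustration; the function names, the 10% fraction, and the seed are assumptions.

```python
import random


def carve_train_dev(train_examples, fraction=0.1, seed=0):
    """Hold out a random slice of the training data as a train-dev set."""
    rng = random.Random(seed)
    shuffled = list(train_examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    # Returns (remaining training data, train-dev data).
    return shuffled[cut:], shuffled[:cut]


def split_gap(train_err: float, train_dev_err: float, dev_err: float) -> dict:
    """Attribute the train-to-dev gap to variance and to data mismatch."""
    return {
        "variance": train_dev_err - train_err,
        "data_mismatch": dev_err - train_dev_err,
    }


train, train_dev = carve_train_dev(range(1000), fraction=0.1)
print(len(train), len(train_dev))
```

If the train-dev error stays close to the training error, the gap to the dev set is mostly data mismatch; if it is close to the dev error, the gap is mostly variance. The table alone does not say which case applies.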
After working further on the problem, you’ve noticed that some questionnaires in the test set have a lot of empty or almost empty answers. You think that you should delete these questionnaires from the test set. Should you also delete incorrectly filled questionnaires from the dev or train set?
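Whatever cleaning rule is chosen, applying it through one shared function keeps the dev and test sets on the same distribution. The sketch below is one possible filter. The questionnaire representation (a dict mapping question to answer), the 50% threshold, and the sample rows are hypothetical.

```python
def filled_fraction(answers: dict) -> float:
    """Share of questions with a non-empty answer."""
    if not answers:
        return 0.0
    filled = sum(1 for v in answers.values() if v is not None and str(v).strip())
    return filled / len(answers)


def drop_mostly_empty(split: list, min_filled: float = 0.5) -> list:
    """Keep only questionnaires with at least `min_filled` of answers present."""
    return [q for q in split if filled_fraction(q) >= min_filled]


# Hypothetical rows; the same rule is applied to every split it is given.
splits = {
    "dev": [{"q1": "yes", "q2": "often"}, {"q1": "", "q2": None}],
    "test": [{"q1": "no", "q2": "  "}, {"q1": "", "q2": ""}],
}
cleaned = {name: drop_mostly_empty(rows) for name, rows in splits.items()}
print({name: len(rows) for name, rows in cleaned.items()})
```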
You and your girlfriend have filled in the questionnaire together, and you have the final score: unfortunately, the model says you don’t match, giving you only a 28% probability of being a good fit. You and your girlfriend feel differently, though. What does it mean?