Pairing lovers story

Your partner (to keep the writing fluent, let’s say it’s a woman), a psychology student, got a grant for research on relationships. She collected a fairly large set of 12,933 questionnaires describing the personalities of people in long-term relationships, as well as of pairs of people whose relationships have broken up.

Because your girlfriend knows you work in data modeling, she has a small after-hours job for you: build a classifier that will help her promote the research. Do the two people match?

 

At the beginning of the project:

You train your very first model. Its errors (error = 100% - accuracy) are:

Training set             14.0%
Dev set                  15.5%

Which sentence do you agree with?

You ask your girlfriend whether there’s something you can define as “human-level performance.” She says she can try to classify the pairs’ questionnaires based on her knowledge and intuition. How many examples should she classify?
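
One way to reason about the sample size is to ask how precisely n classified pairs pin down her error rate. The sketch below uses the normal (Wald) approximation to the binomial: the standard error of an error rate p measured on n examples is sqrt(p(1 - p)/n). The 7% error rate and the two margins are assumptions chosen for illustration, and `standard_error` and `examples_needed` are hypothetical helper names.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an error rate p estimated from n labeled examples."""
    return math.sqrt(p * (1 - p) / n)

def examples_needed(p: float, margin: float, z: float = 1.96) -> int:
    """Smallest n whose ~95% interval (z = 1.96) has half-width <= margin.

    Uses the normal (Wald) approximation to the binomial distribution.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Assumed human error rate of 7%; the margins are illustrative choices.
for margin in (0.02, 0.01):
    print(f"+/-{margin:.0%}: {examples_needed(0.07, margin)} examples")
```

Halving the margin roughly quadruples the number of pairs she has to label by hand, which is the trade-off between a precise human-level estimate and her time.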

Based on your girlfriend's score, and after further work on the model, you now have:

Human-level performance   7.0%
Training set             12.0%
Dev set                  12.5%

Which two of the following options are the most promising?
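
A common way to read this table treats human-level performance as a proxy for the best achievable (Bayes) error. The gap from human level to training error is then "avoidable bias," and the gap from training to dev error is "variance." A minimal sketch, assuming all errors are given in percent. `diagnose` is a hypothetical helper name.

```python
def diagnose(human: float, train: float, dev: float) -> dict:
    """Return avoidable bias and variance, in percentage points."""
    return {
        # Gap between training error and the human-level proxy for Bayes error.
        "avoidable_bias": round(train - human, 2),
        # Gap between dev error and training error.
        "variance": round(dev - train, 2),
    }

gaps = diagnose(human=7.0, train=12.0, dev=12.5)
print(gaps)                     # {'avoidable_bias': 5.0, 'variance': 0.5}
print(max(gaps, key=gaps.get))  # avoidable_bias
```

Here avoidable bias (5.0 points) is ten times the variance (0.5 points). Under this reading, ideas that reduce bias look more promising than ideas that reduce variance.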

You also evaluate your model on the test set, and find the following:

Human-level performance   7.0%
Training set             12.0%
Dev set                  12.5%
Test set                 20.5%
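
Assume the dev and test sets were drawn from the same distribution. A jump of this size between them is then commonly read as a sign that the model, or the tuning done against the dev set, has overfit the dev set. The usual remedy is a larger dev set. The sketch below only quantifies the two gaps so they can be compared. `dev_test_gap` is a hypothetical helper name.

```python
def dev_test_gap(dev: float, test: float) -> float:
    """Percentage-point increase from dev error to test error."""
    return round(test - dev, 2)

errors = {"human": 7.0, "train": 12.0, "dev": 12.5, "test": 20.5}
train_to_dev = round(errors["dev"] - errors["train"], 2)
dev_to_test = dev_test_gap(errors["dev"], errors["test"])
print(train_to_dev, dev_to_test)  # 0.5 8.0
```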

 

A friend calls you and asks for advice. Surprisingly, he’s working on a very similar problem. His results on his dataset are as follows:

Human-level performance   1.0%
Training set              1.2%
Dev set                   1.2%
Test set                  0.8%

What’s the best advice for him?
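
Comparing each split with the friend's human-level estimate shows how close his model already is. Every gap is a fraction of a percentage point, and the test error even falls below the human-level figure. When that happens, the human-level number may no longer be a reliable stand-in for the best achievable error. A small test set can also produce this kind of fluctuation. The sketch only computes the differences. `gaps_to_human` is a hypothetical helper name.

```python
def gaps_to_human(human: float, splits: dict) -> dict:
    """Percentage-point difference between each split's error and human-level error."""
    return {name: round(err - human, 2) for name, err in splits.items()}

diff = gaps_to_human(1.0, {"train": 1.2, "dev": 1.2, "test": 0.8})
print(diff)  # {'train': 0.2, 'dev': 0.2, 'test': -0.2}
print([name for name, d in diff.items() if d < 0])  # ['test']
```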

Your girlfriend found a similar experiment on the internet, but only some of its open questions are the same as hers. You want to use that data. Where can it be added?
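
A widely used guideline is to draw the dev and test sets only from the distribution you actually care about, which here is her own questionnaires. Data from a somewhat different source then goes into the training set only.

The sketch below assumes the external questionnaires have already been mapped onto the shared questions. The 10% dev and test fractions, the 5,000-item external set size, and the name `make_splits` are all hypothetical.

```python
import random

def make_splits(target, external, dev_frac=0.1, test_frac=0.1, seed=0):
    """Dev/test come only from the target data; external data goes to train only."""
    rng = random.Random(seed)
    shuffled = list(target)
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    # Remaining target examples plus all external examples form the training set.
    train = shuffled[n_dev + n_test:] + list(external)
    return train, dev, test

target = [("target", i) for i in range(12933)]     # the original questionnaires
external = [("external", i) for i in range(5000)]  # placeholder size, not given in the text
train, dev, test = make_splits(target, external)
print(len(train), len(dev), len(test))  # 15347 1293 1293
```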

After further work, adding the new data to the training set, and creating new, larger splits, you get the following results:

Human-level performance   7.0%
Training set              7.1%
Dev set                  12.4%
Test set                 12.5%
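
Listing the gap between each pair of consecutive stages shows where most of the error now comes from. Compared with the earlier table, the bottleneck has moved from human-to-train to train-to-dev.

Because the training set now includes the external questionnaires, the 5.3-point train-to-dev gap cannot be labeled pure variance from these numbers alone. `stage_gaps` is a hypothetical helper name.

```python
def stage_gaps(errors: dict) -> dict:
    """Gap in percentage points between each pair of consecutive stages."""
    names = list(errors)
    return {
        f"{a}->{b}": round(errors[b] - errors[a], 2)
        for a, b in zip(names, names[1:])
    }

gaps = stage_gaps({"human": 7.0, "train": 7.1, "dev": 12.4, "test": 12.5})
print(gaps)  # {'human->train': 0.1, 'train->dev': 5.3, 'dev->test': 0.1}
```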

 

Based on the table from the previous question, your girlfriend thinks that the training dataset is easier than the dev and test sets. Do you agree?
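
The table alone cannot separate "the dev/test distribution is harder" from ordinary overfitting, because the training set now mixes in the external questionnaires. A commonly used check is a train-dev set: a slice of the training distribution that is held out from training and evaluated like a dev set.

In the sketch below, the train and dev errors come from the table. The two train-dev values are hypothetical and only show how the two readings would differ. `split_train_dev_gap` is a hypothetical helper name.

```python
def split_train_dev_gap(train: float, train_dev: float, dev: float) -> dict:
    """Attribute the train->dev gap (percentage points) to variance vs. data mismatch.

    train_dev is the error on a held-out slice of the training distribution
    (including the external questionnaires) that the model never trained on.
    """
    return {
        # Same distribution as training, but unseen examples.
        "variance": round(train_dev - train, 2),
        # Different distribution (assumes dev comes only from the original questionnaires).
        "data_mismatch": round(dev - train_dev, 2),
    }

# Hypothetical train-dev errors; only train=7.1 and dev=12.4 come from the table.
for train_dev in (7.3, 12.0):
    print(train_dev, split_train_dev_gap(train=7.1, train_dev=train_dev, dev=12.4))
```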

After working further on the problem, you notice that some questionnaires in the test set have many empty or almost empty answers. You think you should delete these questionnaires from the test set. Should you also delete incorrectly filled-in questionnaires from the dev or training set?
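
A commonly cited rule of thumb is to apply the same cleaning to the test set and the dev set, so the two keep coming from the same distribution. The training set is often said to tolerate more random noise, so whether to clean it is a separate cost/benefit decision.

The sketch below assumes each questionnaire is a list of free-text answers. It treats a questionnaire as "mostly empty" when more than half of its answers are blank. Both the data layout and the 0.5 threshold are hypothetical, as are the toy questionnaires.

```python
from typing import List

Questionnaire = List[str]  # hypothetical layout: one free-text answer per question

def is_mostly_empty(answers: Questionnaire, max_empty_frac: float = 0.5) -> bool:
    """True if more than max_empty_frac of the answers are blank."""
    if not answers:
        return True
    empty = sum(1 for a in answers if not a.strip())
    return empty / len(answers) > max_empty_frac

def clean(split: List[Questionnaire]) -> List[Questionnaire]:
    """Drop mostly-empty questionnaires from one split."""
    return [q for q in split if not is_mostly_empty(q)]

# Toy questionnaires; the same filter is applied to dev and test.
dev = [["yes", "", "often"], ["", "", " "]]
test = [["no", "never", ""], ["", " ", "maybe"]]
dev_clean, test_clean = clean(dev), clean(test)
print(len(dev_clean), len(test_clean))  # 1 1
```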

You and your girlfriend fill in the questionnaire together and get the final score: unfortunately, the model says you don’t match, giving only a 28% probability that you fit together. You and your girlfriend feel differently, though. What does this mean?
