A chain of shoe stores asked you to create a clothing recommendation system based on their own database. They have the pictures of shoes of people buying the shoes (standing at the checkout), and the model of the shoes they bought.
A person standing at the checkout:
Shoes he is wearing:
Shoes he bought:
False False True False False
The company got 400 examples of real data from one shop with the system installed. Installation of the cameras is expensive, and the project is risky, so the company decided to use artificial data they already have.
You’ve been given 400 real examples and 2000 images of THE COMPANY WORKERS shoes and their liking. The company thinks it is possible to recommend the shoe from the current offer, based on such artificial dataset.
What metrics seem to be most sensible for the task?
Estimating human level performance will let you set up your goals.
On your order to estimate human level performance the experiment has been made. Salesmen were told to guess what will the customer buy just after he emerged in the shop. The accuracy was 65%!
The company says you’ll be getting 1000 more artificial examples each week. You are thinking about the validation. As a rule of thumb, quality of the system is estimated better using:
In the presented problem, it’s better to use
What split should you do?
After getting more and more data, you collected 400 real examples, and 10000 artificial examples.
The chain of stores is happy with your work because there’s a lot of talk about Neural Networks in media. They decided to gather some more real data for you, and now you have 10 000 of artificial examples, 400 real examples you had, and 10 000 real examples from the current offer.
It turns out ShoeMakers has hired one of your competitors to build a system as well. Your system has higher accuracy! However, when ShoeMakers tries out your and your competitor’s systems, they conclude they actually like your competitor’s system better, because even though you have higher overall accuracy, you have more false positives (happening to recommend definitely wrong shoe). What should you do?
You’ve handily beaten your competitor, and your system is now deployed, and the performance of your system slowly degrades because your data is being tested on a new type of data.
ShoeMakers are very happy with your work. They think you could use your system to recommend the hats for the befriended company.