Adding buildings to Google Maps story – Machine Learning Stories

Your goal is to create a classifier that transcribes house numbers from pictures. You have 30,000,000 annotated images and billions of images for your system to tag, each with an automatically detected number region.

The test set is prepared to imitate production, i.e. its images are cropped automatically by another classifier.

Can you create the transcription system that will allow hundreds of millions of addresses to be transcribed both faster and at a lower cost than would have been possible via human effort?

Choosing the metric is often the first necessary step for building your ML system because:

	Your error metric will guide all your future actions
	It will show you if your classifier is underfitting or overfitting
	At the beginning, you can choose any metric you want, because there are more important things to work on.

What seems to be a general principle for choosing the metric?

	Choose a metric that can easily be expressed as an accuracy between 0 and 100%, because business people like to understand what they see.
	Choose the metric that you can achieve the biggest score on.
	Tailor the choice of metric to the business goals for the project.

What quality of the system do you expect?

	You should drive your system’s error to zero so it can be the best.
	If you have a huge amount of data, you can create a perfect system, so zero error is possible.
	You can expect that your error will never reach zero because of some characteristics of the data.

For the Street View task, the goal of the project was to reach human-level transcription accuracy. How would you define human-level performance?

	Based on human performance on the test set
	Based on human performance on the development set
	Based on human performance on the train set

Human-level performance was estimated to be 98% on the test set. It’s time to create the first model.

Your colleague says that it would be nice to try a brand new, fancy Recurrent Boomerang Network that he found in a paper yesterday. It’s 30% better than the previous SOTA on a similar task.

	You should implement the best solution possible to start with high accuracy, so Recurrent Boomerang Network is a good choice
	It’s better to take a simple model like logistic regression to check if it’s sufficient for the problem. Moreover, it’s easy to interpret the predictions
	You could download some well-understood architectures or pretrained models that perform well in similar tasks, to create a solid baseline as fast as possible.

With a first model in place, you are thinking of changing the hyperparameters. There are some sensible guesses about how learning could be made faster. For now, you are using SGD with a batch size of 1024. You could increase the batch size and lower the learning rate. Which decision is most likely to speed up learning?

	Lower the learning rate and increase the batch size.
	Try changing each parameter one by one.
	Changing the learning rate is more promising than changing batch size.
	Changing batch size is more promising than changing the learning rate.

After trying a few models, the score is still below expectations. The team’s guess is that the network may be overfitting the data. Train and test losses were printed:

	The network is underfitting
	The network is overfitting

A lot of effort went into increasing the capacity of the model: the network is bigger and deeper, and there’s less regularization. After a bunch of experiments, the score is still below expectations. Train and test losses were printed:

	The network is underfitting
	The network is overfitting
	Training data problem
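
Reading a pair of train and test errors can be sketched roughly as below. This is only an illustrative heuristic, not the procedure used in the story: the function name `diagnose`, the `target_error` argument, and the `gap_tolerance` threshold are all assumptions made for the example.

```python
def diagnose(train_error, test_error, target_error, gap_tolerance=0.02):
    """Rough reading of a train/test error pair.

    All thresholds are hypothetical; pick values that suit your own task.
    """
    gap = test_error - train_error
    if gap > gap_tolerance:
        # Training looks much better than test.
        return "overfitting"
    if train_error > target_error:
        # Even the training data is not fit well and test error tracks it
        # closely, so the cause is either low capacity or the data itself.
        return "underfitting or training data problem"
    return "on target"


print(diagnose(train_error=0.10, test_error=0.11, target_error=0.02))
print(diagnose(train_error=0.01, test_error=0.10, target_error=0.02))
```

Note that the numbers alone cannot separate underfitting from a training data problem; that distinction needs a look at the examples themselves.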

You decided to visualize the behavior of the model. Which of the following is best to visualize?

	All the dev data and the predictions so we can debug all the problems in the model
	All the train data and the predictions so we can debug all the problems in the model
	The worst mistakes made on the train set, e.g. the most confident network outputs that gave an incorrect answer.
	The worst mistakes made on the dev set, e.g. the most confident network outputs that gave an incorrect answer.
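
Picking out "the most confident network outputs that gave an incorrect answer" could look like the sketch below, for whichever split you choose to inspect. It simplifies the task to single-label classification (the real system transcribes whole digit sequences), and the name `most_confident_mistakes` and its arguments are hypothetical.

```python
def most_confident_mistakes(probs, labels, k=10):
    """Return indices of the k wrong predictions with the highest confidence.

    probs  -- one list of class probabilities per example (assumed format)
    labels -- the true class index of each example
    """
    mistakes = []
    for i, (row, label) in enumerate(zip(probs, labels)):
        pred = max(range(len(row)), key=row.__getitem__)  # argmax
        if pred != label:
            mistakes.append((row[pred], i))  # (confidence, example index)
    # Highest confidence first; ties keep their original order.
    mistakes.sort(key=lambda item: -item[0])
    return [i for _, i in mistakes[:k]]


probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.05, 0.95]]
labels = [1, 1, 1, 0]
print(most_confident_mistakes(probs, labels, k=2))  # [3, 0]
```

The returned indices can then be used to pull up the corresponding images and inspect them by eye.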

Error analysis showed that the mistakes mostly consisted of examples where the input image had been cropped too tightly, with some of the digits of the address removed by the cropping operation. For example, a photo of an address “169” might be cropped too tightly, with only the “69” remaining visible. After visualizing the most confident mistakes, it became clear where the errors come from. What next?

[Image not available. Reference: 169]

	Keep working on the model so that eventually you will get a high score
	Expand the width of the crop region to be systematically wider than the address number detection system predicted
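
Widening a detected crop region could be sketched as follows. The box layout `(left, top, right, bottom)`, the function name `widen_crop`, and the default `scale` of 1.3 are assumptions for illustration only; the story does not say how much wider the crop was made.

```python
def widen_crop(box, image_width, scale=1.3):
    """Widen a (left, top, right, bottom) box around its horizontal center.

    The result is clipped to the image so it never extends past the edges.
    `scale` is a hypothetical widening factor, not a value from the project.
    """
    left, top, right, bottom = box
    center = (left + right) / 2
    half_width = (right - left) * scale / 2
    new_left = max(0, int(round(center - half_width)))
    new_right = min(image_width, int(round(center + half_width)))
    return (new_left, top, new_right, bottom)


# A 100-pixel-wide detection widened by 50% inside a 640-pixel-wide image.
print(widen_crop((100, 50, 200, 90), image_width=640, scale=1.5))  # (75, 50, 225, 90)
```

Only the width changes here, which matches the failure described above: digits lost at the left or right edge of the crop.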

This simple decision added ten percentage points to the transcription system’s accuracy. Moreover, the data now seems much more reasonable to work with. Is there any room for improvement? Let’s plot the loss again:

Should you:

	Make the model larger
	Make the model smaller

It is necessary to enlarge the model. However, the computational cost is already so high that management asks you not to exceed the limits they set. Taking this into consideration, it’s best to:

	Increase the number of hidden units in every layer
	Increase the number of hidden layers with the same number of hidden units
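
To get a feel for how the two options affect cost, one can count parameters for a toy fully connected network. This is only a back-of-the-envelope sketch: the layer sizes are made up, the helper `dense_param_count` is hypothetical, and the real Street View model was not a plain dense network.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected net with the given layer sizes."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical sizes: 1024 inputs, two hidden layers of 512 units, 10 outputs.
base = dense_param_count([1024, 512, 512, 10])
wider = dense_param_count([1024, 1024, 1024, 10])      # double every hidden layer
deeper = dense_param_count([1024, 512, 512, 512, 10])  # add one hidden layer

print(base, wider, deeper)  # 792586 2109450 1055242
```

In this toy setting, widening every layer grows the parameter count much faster than adding one layer of the same width, so the two options consume a fixed compute budget at very different rates.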

Congratulations! You have reached your goal! Take a look at the summary from the Deep Learning Book:

Overall, the transcription project was a great success and allowed hundreds of millions of addresses to be transcribed both faster and at a lower cost than would have been possible via human effort. We hope that the design principles described in this chapter will lead to many other similar successes.

Did you like the story? Leave your opinion!
