HERE Technologies powers mapping, navigation, and location solutions for enterprise customers. Their mission is to create a digital representation of reality that radically improves the way everyone and everything lives, moves, and interacts. Their maps, data, and products power solutions for self-driving cars, truck fleets, logistics, government, and many more. With the goal of creating three-dimensional maps that are accurate down to a few centimeters, HERE has remained an innovator in the space since the mid-1980s, giving hundreds of businesses and organizations detailed, precise, and actionable location data and insights.
Mapping at the level of accuracy HERE Technologies strives for requires multiple different approaches and machine learning models. One of those methods leverages road signs.
One of HERE's stated goals is to create an understanding of every road sign on earth. That includes both what those signs actually say (turn warnings, speed limits, deer crossings, and so on) and, just as importantly, where those signs actually are. Understanding a sign's precise location relative to the road can be crucial: it allows a vehicle to pinpoint exactly where it is, as well as exactly where certain laws (like the speed limit) change.
HERE does this in an innovative way. They use cars outfitted with four separate cameras: one front-facing, one rear-facing, one left-facing, and one right-facing, with each camera capturing a video feed. Their models are looking for something they call “sign fusion,” where several cameras can see the sign and understand that it’s a single object. Essentially, HERE is trying to identify the physical location of individual signs to take measurements and aggregate their locations for increased map and model accuracy.
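The fusion idea can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not HERE's actual pipeline: the `Detection` class, the greedy grouping, and the 2-meter radius are all hypothetical. The sketch groups detections of what may be the same physical sign, seen from different cameras and projected into rough world coordinates, and averages each group into a single location estimate.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    camera: str   # which camera saw the sign (e.g. "front", "left")
    x: float      # estimated world position, meters east
    y: float      # estimated world position, meters north

def fuse_detections(detections, radius=2.0):
    """Greedily group detections within `radius` meters of a group's
    centroid, then report each group's averaged position as one sign."""
    groups = []  # each group is [centroid_x, centroid_y, members]
    for det in detections:
        for group in groups:
            gx, gy, members = group
            if (det.x - gx) ** 2 + (det.y - gy) ** 2 <= radius ** 2:
                members.append(det)
                # recompute the centroid over all members
                group[0] = sum(m.x for m in members) / len(members)
                group[1] = sum(m.y for m in members) / len(members)
                break
        else:
            groups.append([det.x, det.y, [det]])
    return [(x, y, [m.camera for m in members]) for x, y, members in groups]

# Two cameras see the same sign near (10, 5); a distant sign stays separate.
signs = fuse_detections([
    Detection("front", 10.0, 5.0),
    Detection("left", 10.5, 5.2),
    Detection("rear", 50.0, 5.0),
])
```

A real system would also have to reconcile sign class and orientation across cameras, but the core idea is the same: many per-camera observations collapse into one physical sign with one location.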
Teaching models to identify signs as vehicles travel the road improves the overall accuracy of those models, and thus of the maps they power. The problem is that achieving the level of precision HERE strives for requires video data, both to train their sign-detection algorithms and to fine-tune their performance. But annotating individual frames from a video is time-consuming and costly: a single minute of video can contain thousands of still frames, and annotating each one is expensive. That's why, when Appen told HERE Technologies about its Machine Learning Assisted Video Object Tracking tool, they were eager to give it a try.
HERE has an ambitious goal of annotating tens of thousands of kilometers of driven roads for the ground truth data that powers their sign-detection models. Parsing videos into images for that goal, however, is simply untenable.
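A rough back-of-envelope calculation shows why. The figures below (frame rate, camera count) are illustrative assumptions, not HERE's actual numbers:

```python
# Back-of-envelope estimate of how many frames one minute of driving
# produces. All figures are illustrative assumptions.
fps = 30                  # assumed frame rate per camera
cameras = 4               # front, rear, left, and right feeds
seconds_per_minute = 60

frames_per_minute = fps * cameras * seconds_per_minute
print(frames_per_minute)  # 7200 frames for a single minute of driving
```

At that rate, even one hour of driving yields hundreds of thousands of frames, before multiplying by tens of thousands of kilometers of road.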
Our Machine Learning Assisted Video Object Tracking tool was a perfect fit for this lofty ambition, because it combines human intelligence with machine learning to drastically increase the speed of video annotation. It works like this:
On the first frame of a video, a human labeler annotates the objects in question. For HERE Technologies, that means a contributor is placing bounding boxes over various signs.
Functionally, this step is like a typical image annotation workflow. What’s next is where the machine learning kicks in. Using a deep learning ensemble model, the tool predicts where the annotated object has moved in the next frame. The labels persist, even if there are dozens of instances of the same class. Instead of relabeling the entire image from scratch, a human labeler simply corrects the annotation if necessary, dragging or resizing the persisted label to squarely fit around the annotated object. This carries on for the length of the video, until the final frame.
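The loop described above can be sketched in a few lines. The function names and box format below are hypothetical, not Appen's actual API: the point is the human-in-the-loop structure, where a model proposes boxes for each new frame and a human review step corrects any that have drifted.

```python
def annotate_video(frames, initial_boxes, predict_next, correct):
    """Human-in-the-loop video annotation (illustrative sketch).

    initial_boxes -- human-drawn boxes on frame 0, {label: (x, y, w, h)}
    predict_next  -- model step: propagates boxes onto the next frame
    correct       -- human step: drags/resizes any drifted boxes
    Returns one box dictionary per frame.
    """
    annotations = [initial_boxes]
    boxes = initial_boxes
    for frame in frames[1:]:
        boxes = predict_next(frame, boxes)  # machine learning proposes
        boxes = correct(frame, boxes)       # human only fixes what's wrong
        annotations.append(boxes)
    return annotations

# Toy usage: a stub "model" shifts every box 5 px right per frame,
# and the "human" accepts every prediction unchanged.
frames = ["frame0", "frame1", "frame2"]
initial = {"speed_limit_1": (100, 50, 30, 30)}
shift = lambda frame, boxes: {
    k: (x + 5, y, w, h) for k, (x, y, w, h) in boxes.items()
}
accept = lambda frame, boxes: boxes
annotations = annotate_video(frames, initial, shift, accept)
```

The economics come from the `correct` step: instead of drawing every box on every frame from scratch, the labeler only adjusts the small fraction of predictions that drift off target.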
HERE hopes these labels will help them train their sign-detection algorithm and understand its performance: which signs are hard to predict, whether it produces too many false positives, what training data is needed to confidently identify new classes, and more.
Though HERE Technologies has only recently begun using our Machine Learning Assisted Video Object Tracking tool, they believe it promises to massively speed up the collection of ground truth data for their models. Frame-by-frame annotation of hundreds of thousands of signs over hours and hours of video is nearly impossible; with video object tracking, HERE can create actionable data for the researchers and developers who fine-tune their maps in far less time than ever before.