
Test your machine learning IQ

How much do you really know about machine learning?
Answer 10 quick questions to find out now.

  • Answer: e. None of the above, they all rely on machine learning

    Of all the possible responses, it’s tempting to decide that GPS systems don’t really use machine learning, but the reality is that many do! Machine learning helps your GPS system perform predictive analysis and proactively suggest that you avoid locations along your route that are typically congested at your time of travel. Some systems also use machine learning to recommend a nearby gas station when you’ve traveled far enough to need to refuel.

    Reference: 9 Applications of Machine Learning from Day-to-Day Life


What is machine learning?

Machine learning is the process of teaching machines how to learn by providing them with guidance that helps them develop logic on their own and giving them access to data you want them to explore. The result is some form of artificial intelligence, or AI.

“Despite its name, there is nothing ‘artificial’ about this technology – it is made by humans, intended to behave like humans and affects humans. So if we want it to play a positive role in tomorrow’s world, it must be guided by human concerns.”


Fei-Fei Li on “human-centered AI”, New York Times


How does machine learning work?

Machines, most often computers, are given rules to follow known as algorithms. They are also given an initial set of data to explore when they first begin learning. That data is called training data.

Computers start to recognize patterns and make decisions based on algorithms and training data. Depending on the type of machine learning being used, they are also given targets to hit or they receive rewards when they make the right decision or take a positive step towards their end goal.

As they build this understanding or “learn”, they work through a series of steps to transform new inputs into outputs which may consist of brand-new data, labeled data, decisions, or even actions.

The idea is that they learn enough to operate without any human intervention. In this way they start to develop and demonstrate what we call artificial intelligence. Machine learning is one of the main ways artificial intelligence is created.

Other examples of artificial intelligence include robotics, speech recognition, and natural language generation, all of which also require some element of machine learning.

There are many different reasons to implement machine learning and ways to go about it. There are also a variety of machine learning algorithms and types and sources of training data.


Why is machine learning growing so quickly?

In recent years, three developments have contributed to the widespread interest in machine learning.

  1. Growth in all types of data
  2. Declining cost of storage
  3. Massive improvements in computing power

As with anything, there is evidence of other contributing factors and business drivers, but these three advances have clearly been dominant in terms of paving the way for accelerated use of machine learning and new and innovative applications of artificial intelligence.

2018 Machine Learning Facts & Figures

Market Growth
Spending in AI and ML
International Data Corporation (IDC) forecasts that spending on AI and ML will grow from $12B in 2017 to $57.6B by 2021.
Source: IDC
# of Projects
Deloitte Global predicts the number of machine learning pilots and implementations will double in 2018 compared to 2017, and double again by 2020.
Source: Deloitte
# of Patents
Machine learning patents grew at a 34% Compound Annual Growth Rate (CAGR) between 2013 and 2017, the third-fastest growing category of all patents granted.
Source: IFI Claims
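The compound annual growth rate (CAGR) used in the patent figure above is the constant yearly rate that turns a starting value into an ending value over n years: CAGR = (end / start)^(1/n) - 1. As a minimal sketch, the snippet below applies that formula to IDC’s forecast figures ($12B in 2017 to $57.6B in 2021, treated as four years of growth). The function name `cagr` is our own, and the result is only the rate implied by those two endpoints, not a figure IDC published.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate that grows
    start_value into end_value over the given number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# IDC forecast quoted above: $12B (2017) -> $57.6B (2021), i.e. 4 years of growth.
implied_rate = cagr(12.0, 57.6, 2021 - 2017)
print(f"Implied CAGR: {implied_rate:.1%}")  # Implied CAGR: 48.0%
```

In other words, spending would have to grow by roughly 48% every year for the forecast to hold.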
Key Enablers
Endless amounts of data
90% of the data that exists in the world was generated in the past 2 years.
Source: Forbes

Experts predict a 4,300% increase in annual data production by 2020.
Source: Accenture

From online retail transactions to videos to social media to internet searches, the volume is hard to fathom.

Every MINUTE of every day in 2018:

  • Amazon ships 1,111 packages
  • 400 hours of video are uploaded to YouTube
  • Google conducts 3,877,140 searches
  • 2,083,333 snaps are sent on Snapchat

Source: Domo

Exponential growth in computing power
The world’s most powerful supercomputer in 2017 was about 200x as powerful as the most powerful one in 2007

2007 – 478 trillion, 200 billion calculations per second (478,200,000,000,000)

2017 – 93 quadrillion calculations per second (93,000,000,000,000,000)
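A quick arithmetic check of the “200x” headline, using the numeric figures listed above (478,200,000,000,000 calculations per second for 2007 and 93,000,000,000,000,000 for 2017). The variable names are ours; the snippet only divides the two values.

```python
# Figures from the comparison above, in calculations per second.
calcs_2007 = 478_200_000_000_000       # 478.2 trillion
calcs_2017 = 93_000_000_000_000_000    # 93 quadrillion

ratio = calcs_2017 / calcs_2007
print(f"2017 vs. 2007: {ratio:.0f}x")  # 2017 vs. 2007: 194x (roughly 200x)
```

The ratio of about 194 is consistent with the rounded “200x” figure.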

Precipitous decline in the cost of storage
Hard drive cost per gigabyte (US$/GB)
2000 – $10
2005 – $1
2010 – $0.10 (a dime)
Today – $0.02 (2 pennies)
Source: mkomo
Top & Trending Use Cases

  • Automated Customer Service Agents
  • Diagnostic and Treatment Systems
  • Intelligent Processing Automation
  • Automated Preventative Maintenance
  • Expert Shopping Advisors
  • Public Safety and Emergency Response

Source: IDC

Case Studies in the News
Netflix saved $1 billion as a result of its machine learning algorithm, which recommends personalized TV shows and movies to subscribers.
Source: StatWolf
Same-day shipping from Amazon is available because of machine learning. In fact, Amazon’s current ML algorithm has improved its ‘click-to-ship’ time by 225%.
Source: Forbes
Uber uses machine learning to improve arrival times and pick-up locations, and to improve UberEATS’ delivery estimations by 26%.
Source: Red Pixie
Machine Learning Jobs
In 2020, AI will become a positive net job motivator, creating 2.3M jobs while only eliminating 1.8M jobs.
Source: Gartner
One AI specialist on staff will cost you a salary worth one 2017 Rolls-Royce Ghost Series II, or more. With salary and company stock, AI specialists can fetch compensation of $300,000 to $500,000.

Why invest in machine learning?


Organizations in both the public and private sector are investing in machine learning because it allows them to improve in the following ways:

  • Speed
    Get answers and perform sophisticated calculations faster.
  • Power
    Process more data and conduct more complex analytics than ever before.
  • Intelligence
    Uncover new insights by tapping into data previously indecipherable.
  • Efficiency
    Conduct more analysis with fewer human resources.

No matter what industry you are in, you will probably find a solid use case for machine learning and be able to justify the investment through its anticipated return to top-line revenue and/or bottom-line profit.

Machine learning has been shown to reduce and even eliminate manual data entry, detect spam, fight fraud, and recommend products to consumers and even to your R&D team! It can predict when maintenance will be needed on all sorts of equipment and infrastructure, tell you more about your customers than you’ve ever known before, improve customer satisfaction, and help diagnose a myriad of health and medical issues.

If you haven’t already invested in machine learning, you need to ask yourself why not? What is preventing you from getting started?


What is machine learning used for?


Retail and eCommerce

Artificial intelligence and machine learning are being used to boost conversion rates, improve customer experience, deliver personalization and more

  • Search relevance

    Online shoppers don’t have the luxury of asking a salesperson where they can find a product. Your onsite search engine fulfills that role. By interpreting search queries, assessing user intent, and using that information to train your search algorithm, you can make results more relevant, which leads to higher purchase conversion.

  • Personalization

    Customers seek a more personalized experience when shopping online. Providing product recommendations or search results based on shoppers’ past behavior can help create stronger user engagement and retention.

  • Enhanced customer service

    Chatbots act as virtual shopping assistants. Like employees, they need to be trained to know not only what you sell, but also the terminology people use for the many products on your site, so make sure they have the training data they need.
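One simple way to base recommendations on shoppers’ past behavior is item co-occurrence (“shoppers who bought X also bought Y”). The sketch below is a minimal illustration under stated assumptions: purchase histories are available as sets of hypothetical product IDs, and the function names are our own. It is not a description of any particular retailer’s system; production recommenders use far richer models.

```python
from collections import Counter
from itertools import permutations


def build_co_occurrence(orders):
    """Count how often each pair of products appears in the same order."""
    co_counts = {}
    for order in orders:
        for a, b in permutations(order, 2):
            co_counts.setdefault(a, Counter())[b] += 1
    return co_counts


def recommend(product, co_counts, k=2):
    """Return up to k products most often bought together with `product`."""
    return [other for other, _ in co_counts.get(product, Counter()).most_common(k)]


# Hypothetical purchase histories: one set of product IDs per order.
orders = [
    {"tent", "sleeping_bag", "lantern"},
    {"tent", "sleeping_bag"},
    {"tent", "sleeping_bag", "stove"},
    {"tent", "lantern"},
    {"lantern", "batteries"},
]
co_counts = build_co_occurrence(orders)
print(recommend("tent", co_counts))  # ['sleeping_bag', 'lantern']
```

Because the counts come entirely from past orders, the recommendations improve as more purchase data accumulates, which is the same data-driven loop described above.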

Tech

Search engines and other leading technology companies use machine learning to deliver innovative products and improve user experience

  • Search relevance

    Search engine algorithms use machine learning to drive stronger user engagement. By interpreting queries and assessing user intent, they make search results more relevant, which creates higher user satisfaction.

  • Personalization

    Analyzing user activity and preferences can help search engine and social media providers personalize content feeds and recommendations, enhancing online customer experiences.

  • Natural language processing (NLP)

    NLP can analyze language patterns to understand text, for example on social media. This technology can be used to track customer sentiment and develop engagement strategies.
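To show how sentiment can be learned from labeled examples rather than hand-written rules, here is a minimal multinomial naive Bayes classifier trained on a handful of hypothetical labeled posts. This is a sketch, not a production approach: the whitespace tokenizer is deliberately naive, the function names are our own, and real sentiment systems rely on much larger datasets and usually more capable models.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(examples):
    """Count words per label and documents per label from (text, label) pairs."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in examples:
        tokens = tokenize(text)
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, word_counts, doc_counts, vocab):
    """Pick the label with the highest log-probability, using add-one
    (Laplace) smoothing so unseen words do not zero out a label."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # prior
        for token in tokenize(text):
            score += math.log((word_counts[label][token] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled social media posts.
posts = [
    ("love this phone great battery", "positive"),
    ("great service fast delivery", "positive"),
    ("terrible battery very slow", "negative"),
    ("awful service never again", "negative"),
]
model = train_naive_bayes(posts)
print(classify("great battery love it", *model))  # positive
print(classify("slow and awful", *model))         # negative
```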

Financial Services

Leaders in financial services use machine learning and artificial intelligence to improve customer acquisition, retention and overall experience

  • Risk management

    Anti-money laundering (AML), Know Your Customer (KYC), and fraud detection programs require sophisticated tools to spot potential threats. Relying solely on employees to spot patterns in financial records can be both time-consuming and costly. Machine learning and artificial intelligence allow financial institutions to sift through data and find anomalies quickly, helping to prevent illegal activity and avoid potential company losses.

  • Revenue generation

    Machine learning algorithms are now being leveraged by financial institutions to create investment strategies, freeing up financial advisors to engage more with their clients.

  • Enhanced customer experiences

    With today’s expectation for on-demand customer service, chatbots have a crucial role to fill. Chatbots help to delight customers with real-time feedback and a streamlined experience.
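The risk-management point above depends on finding anomalies in financial records. Anomaly detection takes many forms; as a minimal illustration, and not how any particular institution’s AML or fraud system works, the sketch below flags transactions whose amounts sit unusually far from an account’s typical behavior. It uses a robust “modified z-score” based on the median absolute deviation (MAD). The 3.5 cutoff is a commonly used convention, and the transaction amounts are hypothetical.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold.

    The modified z-score, 0.6745 * |x - median| / MAD, measures distance from
    the median in units of the median absolute deviation (MAD). Unlike a
    mean/standard-deviation z-score, a few extreme values barely move it.
    """
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        return []  # simplification: no spread, so this sketch flags nothing
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - center) / mad > threshold
    ]


# Hypothetical card transactions for one account (USD).
amounts = [42.0, 38.5, 45.0, 40.0, 39.9, 41.2, 4800.0, 43.1]
print(flag_anomalies(amounts))  # [6]
```

A learned model would typically replace this single statistic with many features per transaction, but the goal is the same: surface the few records worth a human reviewer’s time.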

Automotive

Accelerate machine learning with training data for self-driving cars and improve speech-recognition systems, in-car navigation and user experience with more accurate field testing

  • Autonomous vehicles

    Self-driving cars are extremely complex machines, and their artificial intelligence (AI) is powered by machine learning. As the car moves, it processes a large amount of visual data, just as a driver does when looking through the windshield. The system needs to assign meaning to large volumes of image data, such as identifying a tree or a pedestrian, and then feed that information back into the car’s AI to teach it.

  • Voice recognition

    Traditional dashboards, and more recently mobile devices, take a driver’s hands and eyes off the road. Speech interfaces don’t. Connected cars need access to large-scale speech data collections to train their speech interfaces and provide consumers around the world with the best user experience.

  • Predicting behavior

    Advances in voice recognition, along with cameras that can help track driver emotions, are an important next step for the Human Machine Interface. They give cars the ability to identify speakers’ emotions as well as their words, so they can tell when users get frustrated and respond accordingly.

Government

Improve emergency response, defense initiatives and law enforcement with secure data services

  • Defense

    Using social media monitoring, computer vision, and data annotation, government agencies are now able to extract information to aid with terrorist surveillance, monitor national security threats, and more.

  • National emergencies

    Emergencies like natural disasters or coordinated attacks can happen without a moment’s notice. When lives are at stake, responding immediately and with coordination is key. Using translation, voice recognition, and text data collection, emergency responders around the world can communicate efficiently with those in harm’s way.

  • Law enforcement

    Secure transcription helps law enforcement accomplish many objectives, including capturing files from Body Worn Video, keeping official records, and providing archival record solutions.

Healthcare

Exciting uses of artificial intelligence (AI) and machine learning in healthcare are transforming patient care

  • Predictive analysis

    Evaluate trends, anticipate outbreaks, and forecast patient needs.

  • Chatbots and virtual healthcare

    Provide faster and better customer service.

  • Advances in underwriting

    Use machine learning to build stronger underwriting models based on a wide variety of data points.

What are the top machine learning methods?

“Most of human and animal learning is unsupervised learning.
If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don’t know how to make the cake. We need to solve the unsupervised learning problem before we can even think of getting to true AI.”

Yann LeCun, Director of AI Research, Facebook

 

Supervised Learning

Supervised learning is probably the simplest and most accurate form of machine learning.

Machines are given labeled training data that contains both inputs and target outputs. The machine’s task is to learn the logic that produces the target outputs by following the examples it was given. Once it has derived that logic from the labeled training data, it can arrive at the targets or outputs for new inputs on its own.
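To make this concrete, here is a minimal sketch of learning from labeled examples: a perceptron, one of the simplest supervised learning algorithms, learns a yes/no rule from (input, target) pairs and then applies it to inputs it has not seen. The two numeric features per example are hypothetical stand-ins for whatever measurements describe the data, and the function names are our own. A perceptron is only guaranteed to settle on a rule when the two classes can be separated by a straight line, as they can in this toy data.

```python
def perceptron_predict(x, weights, bias):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Learn weights and a bias from labeled examples (labels are 0 or 1)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - perceptron_predict(x, weights, bias)  # -1, 0 or +1
            # Adjust the decision boundary only when the prediction was wrong.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical labeled training data: two numeric features per example,
# target 1 = "in the class", 0 = "not in the class".
samples = [(2.0, 3.0), (3.0, 3.5), (3.5, 2.5), (0.5, 1.0), (1.0, 0.5), (0.2, 0.8)]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_perceptron(samples, labels)
print(perceptron_predict((3.0, 3.0), weights, bias))  # 1
print(perceptron_predict((0.4, 0.6), weights, bias))  # 0
```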

Two common types of supervised machine learning are classification and regression.

  • Classification

    Classification is the easier of the two to understand. The data is evaluated to determine which class it falls into. For example, a machine learning model might ask a machine to determine whether a picture is of a horse or not. That’s a simple yes/no response and an example of binary classification. After being given enough labeled pictures of horses and non-horses to learn the distinguishing characteristics of a horse, the machine will be able to look at a new picture on its own and tell you whether it’s a horse.

  • Regression

    Regression is slightly different. Instead of assigning the data to a class, the machine is asked to predict a numeric response or output, based on the relationship between inputs and responses it learned from the training data.
     
    An easy-to-follow example: if the training inputs 3 and 5 had a target of 8, and the other training examples followed the same pattern, the learned logic would be to add the two inputs. The model would then use that learned relationship to predict a target of 10 for inputs of 4 and 6. Supervised learning is task-oriented: “Find me target XYZ.”
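The addition example above can be reproduced with ordinary least-squares linear regression. This is a minimal sketch under stated assumptions: a few hypothetical training pairs that all follow the “add the two inputs” pattern (a single example such as 3 and 5 → 8 is not enough to pin down the rule), and a model with two weights and no intercept. It solves the 2-by-2 normal equations directly; the function name is our own.

```python
def fit_two_input_regression(samples, targets):
    """Least-squares weights (w1, w2) for target ~ w1*x1 + w2*x2 (no intercept).

    Solves the 2x2 normal equations (X^T X) w = X^T y with Cramer's rule.
    """
    s11 = sum(x1 * x1 for x1, _ in samples)
    s12 = sum(x1 * x2 for x1, x2 in samples)
    s22 = sum(x2 * x2 for _, x2 in samples)
    t1 = sum(x1 * y for (x1, _), y in zip(samples, targets))
    t2 = sum(x2 * y for (_, x2), y in zip(samples, targets))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("training inputs do not determine a unique fit")
    w1 = (s22 * t1 - s12 * t2) / det
    w2 = (s11 * t2 - s12 * t1) / det
    return w1, w2


# Hypothetical training pairs that all follow the "add the two inputs" pattern.
samples = [(3, 5), (1, 2), (10, 4), (7, 7)]
targets = [8, 3, 14, 14]

w1, w2 = fit_two_input_regression(samples, targets)
prediction = w1 * 4 + w2 * 6
print(w1, w2, prediction)  # 1.0 1.0 10.0
```

The learned weights of 1.0 for each input are exactly the “add the two inputs” logic, and applying them to 4 and 6 gives the predicted target of 10.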


Semi-Supervised Learning

As you may have guessed, semi-supervised learning is a hybrid model. Algorithms using semi-supervised learning are trained on a combination of labeled and unlabeled data. This approach can be more practical because it can be expensive to have a data scientist or data engineer label data. In other cases, the dataset is so massive that labeling all of it would be a herculean task. Another reason teams take a hybrid approach is to reduce the human bias that can creep in during data labeling.

“It is a capital mistake to theorize before one has data. Insensibly, one begins to twist the facts to suit theories, instead of theories to suit facts.”

Sherlock Holmes

 
With semi-supervised learning, your model may benefit, and be able to work faster, by having some targets or labeled data, while the work it does to make sense of the unlabeled data may reveal insights and provide you with outputs you hadn’t discovered yet. It’s a win-win in many scenarios and a commonly used approach.
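One common semi-supervised technique is self-training, also called pseudo-labeling: fit a model on the labeled data, let it label the unlabeled points it is most confident about, and refit. The sketch below is a minimal illustration under stated assumptions: a nearest-centroid classifier on hypothetical one-dimensional data, with a confidence rule (the nearest class centroid must beat the runner-up by at least `margin`) that we chose for simplicity. The function names and parameters are our own.

```python
def centroids(points, labels):
    """Mean of the (1-D) points assigned to each class label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(labeled, labels, unlabeled, margin=2.0, max_rounds=5):
    """Self-training: repeatedly pseudo-label the unlabeled points that are
    clearly closer to one class centroid than to any other, then refit."""
    points, point_labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, point_labels)
        remaining = []
        for x in pool:
            # (distance, label) pairs, nearest centroid first.
            dists = sorted((abs(x - c), y) for y, c in cents.items())
            if len(dists) > 1 and dists[1][0] - dists[0][0] >= margin:
                points.append(x)                   # confident: accept the
                point_labels.append(dists[0][1])   # model's own label
            else:
                remaining.append(x)                # too ambiguous for now
        if len(remaining) == len(pool):
            break  # nothing new was pseudo-labeled this round
        pool = remaining
    return centroids(points, point_labels), pool


# Hypothetical 1-D data: only two points come with labels.
cents, leftover = self_train(
    labeled=[1.0, 9.0],
    labels=["low", "high"],
    unlabeled=[2.0, 3.0, 4.8, 5.0, 7.5, 8.0, 10.0],
)
print(cents)     # {'low': 2.0, 'high': 8.625}
print(leftover)  # [4.8, 5.0]
```

Points near the boundary stay unlabeled. A looser margin would pseudo-label more of the data, at the risk of the model reinforcing its own mistakes.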


Reinforcement Learning

Reinforcement learning is the most abstract approach. It is based entirely on the machine, often referred to as the “learning agent”, learning through trial and error. The agent determines which actions to take in order to maximize its performance in a given environment, based on the definition of reward it has been given. Trying out actions to discover what rewards they earn is called exploration. Using the knowledge it has gained by choosing the actions it already knows earn the most reward is called exploitation.

Through exploration and exploitation of its environment, the learning agent, fueled by advanced machine learning algorithms, ultimately gains enough knowledge to begin to demonstrate almost human-like levels of artificial intelligence.

Robotics provides one of the best examples of reinforcement learning. The use of robots in factories relies heavily on their ability to use reinforcement learning to adapt to their environment as needed and to complete human-like tasks and behaviors with continually decreasing error rates.
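Exploration versus exploitation is easiest to see in the multi-armed bandit problem, a simplified reinforcement learning setting with a single state and several possible actions. The sketch below uses the epsilon-greedy strategy as a minimal illustration. With probability `epsilon`, the agent explores by trying a random action. Otherwise, it exploits by picking the action with the best average reward so far. The reward probabilities, `epsilon`, and step count are hypothetical, and a robot learning a real task would face many states, not one.

```python
import random


def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate each action's value by trial and error and return the
    estimates plus how often each action was chosen."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        # Hypothetical environment: reward 1 with the action's probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: new_avg = old_avg + (reward - old_avg) / n
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(counts)), key=lambda a: counts[a])
print(f"Most-chosen action: {best}, estimated values: {[round(v, 2) for v in values]}")
```

Exploring keeps the agent’s estimates honest, while exploiting lets it collect reward. After enough steps, the action with the highest true reward rate ends up chosen most often.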

What kind of data do you need for machine learning?


“Machine learning can only be as good as the data you use to train it.”

Daniel Tunkelang, who led machine learning projects at Endeca, Google, and LinkedIn

There is no end to the number of articles that speak to the importance of making sure you have enough of the right data to support your machine learning projects.

In the article Machine Learning: 10 Facts Everyone Needs to Understand, Tunkelang, quoted above, goes on to explain that “you can have machine learning without sophisticated algorithms, but not without good data.”

So what kind of data do you need? It depends.

Structured vs. Unstructured Data



Structured Data

Structured data is logically organized and easy for a computer to read and understand. It could be machine-generated transactional data pulled from an ERP or CRM system, or simple time-stamped data about actions coming from sensors. It could also be human-generated data entered into a spreadsheet. This type of data is most often used in supervised learning, and it can typically be processed very quickly, even in very large volumes.



Unstructured data

According to industry leaders, more than 80% of the data in the world is unstructured, and the amount of it is growing exponentially. Unstructured data is everywhere. Human-generated unstructured data includes Microsoft Word and other text files, presentations, videos, images, audio, social media posts, and much more. Examples of machine-generated unstructured data include surveillance footage, satellite imagery and different types of scientific data. Supervised and reinforcement learning are incredible tools that can be applied to gain insights and do more with unstructured data than ever before.



How much data is required for machine learning?

The short answer is, you need a lot of data. The best algorithm in the world will struggle to yield the right results with insufficient data.

“AI techniques require models to be retrained to match potential changing conditions, so the training data must be refreshed frequently. In one-third of the cases, the model needs to be refreshed at least monthly, and almost one in four cases requires a daily refresh.”

McKinsey Global Institute, Notes from the AI Frontier

Why? Greater volume drives greater accuracy.

There are many reasons for that. One is that for most machine learning models, you are trying to get a computer to make sense of a dataset with an enormous amount of variation.

As an example, consider voice recognition applications and variation in speech caused by differences in gender, age, dialects and more. Some experts say that a model needs at least 10,000 hours of audio to deliver outputs with modest accuracy levels. Others say that while the total volume of data required depends on the complexity of the model or the problem, 100,000 instances is a minimum requirement for most models.


Does “quality” matter?

Yes! Maybe even more than quantity.

“More data beats clever algorithms, but better data beats more data.”

Peter Norvig, Computer Scientist, Google and Industry Leader

What makes data “bad”? It could be irrelevant to your problem, inaccurately annotated, misleading, or incomplete. In those cases, it will require some data cleaning or preparation.

If your model is tasked with classifying data, your training data may have to be properly labeled first. Sometimes formatting is an issue. For example, if you are working with image data, the images may need to be resized so that the model analyzes vectors of the same length.
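A minimal sketch of the resizing step, assuming grayscale images stored as nested lists of pixel values. Each image is resampled to a fixed size using nearest-neighbor sampling and then flattened row by row, so images of any size become vectors of the same length. The 4×4 target size and the function names are hypothetical. Real pipelines typically use an image library and smoother interpolation.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D grid of pixel values with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_feature_vector(image, size=(4, 4)):
    """Resize to a fixed size, then flatten row by row into one vector."""
    resized = resize_nearest(image, *size)
    return [pixel for row in resized for pixel in row]


small = [[0, 255], [255, 0]]                                    # 2x2 image
large = [[(r + c) % 256 for c in range(6)] for r in range(8)]   # 8x6 image

print(len(to_feature_vector(small)), len(to_feature_vector(large)))  # 16 16
```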

Any data you use will require some level of clean-up. Experts report that the work doesn’t end with extracting, transforming and loading (ETL) the data. Even after ETL, the clean-up required to make data suitable for data science typically accounts for about 80% of the total workload in a machine learning project.

“You can have the most appropriate algorithm, but if you train your machine on bad data, then it will learn the wrong lessons.”

Appen, An Introduction to Machine Learning Data

 

Additional Resources

As use cases continue to expand, you’ll want to stay up to speed on all the ways you can improve your models and create better products for your customers.

  • McKinsey Global Institute, Notes from the AI Frontier: Insights from Hundreds of Use Cases
  • Real-World Use Cases for Human Annotated Data
  • Importance of Data Strategy (white paper)
  • A Future That Works: Automation, Employment, and Productivity
  • How Much Training Data is Required for Machine Learning?


Get the Data You Need

By now you know that your machine learning and AI will only be as good as the data used to train them. Let us help. We’ve been in the data business for over 20 years and have developed deep experience working with leading global technology companies, governments, and other organizations across a variety of data types.

Appen collects and annotates speech, sound, image, video, and text data, which is used to fuel our clients’ many machine learning projects. We also review and annotate data from live products to improve products and their user experience.

