What is machine learning?
Machine learning is the process of teaching machines how to learn by giving them guidance that helps them develop logic on their own, along with access to the data you want them to explore. The result is some form of artificial intelligence, or AI.
“Despite its name, there is nothing ‘artificial’ about this technology – it is made by humans, intended to behave like humans and affects humans. So if we want it to play a positive role in tomorrow’s world, it must be guided by human concerns.”
Fei-Fei Li on “human-centered AI”, New York Times
How does machine learning work?
Machines, most often computers, are given rules to follow, known as algorithms. They are also given an initial set of data to explore when they first begin learning. That data is called training data.
Computers start to recognize patterns and make decisions based on algorithms and training data. Depending on the type of machine learning being used, they are also given targets to hit or they receive rewards when they make the right decision or take a positive step towards their end goal.
As they build this understanding, or “learn”, they work through a series of steps to transform new inputs into outputs, which may consist of brand-new data, labeled data, decisions, or even actions.
The idea is that they learn enough to operate without any human intervention. In this way they start to develop and demonstrate what we call artificial intelligence. Machine learning is one of the main ways artificial intelligence is created.
Other examples of artificial intelligence include robotics, speech recognition, and natural language generation, all of which also require some element of machine learning.
There are many different reasons to implement machine learning and ways to go about it. There are also a variety of machine learning algorithms and types and sources of training data.
Why is machine learning growing so quickly?
In recent years, three things have contributed to the widespread interest in machine learning:
- Growth in all types of data
- Declining cost of storage
- Massive improvements in computing power
As with anything, there is evidence of other contributing factors and business drivers, but these three advances have clearly been dominant in terms of paving the way for accelerated use of machine learning and new and innovative applications of artificial intelligence.
2018 Machine Learning Facts & Figures
Every MINUTE of every day in 2018:
- Amazon ships 1,111 packages
- 400 hours of video are uploaded to YouTube
- Google conducts 3,877,140 searches
- 2,083,333 snaps are sent on Snapchat
478.2 trillion calculations per second (478,200,000,000,000)
93 quadrillion calculations per second (93,000,000,000,000,000)
- Automated Customer Service Agents
- Diagnostic and Treatment Systems
- Intelligent Processing Automation
- Automated Preventative Maintenance
- Expert Shopping Advisors
- Public Safety and Emergency Response
Why invest in machine learning?
Organizations in both the public and private sectors are investing in machine learning because it allows them to:
- Get answers and perform sophisticated calculations faster.
- Process more data and conduct more complex analytics than ever before.
- Uncover new insights by tapping into previously indecipherable data.
- Conduct more analysis with fewer human resources.
No matter what industry you are in, you will probably find a solid use case for machine learning and be able to justify the investment through its anticipated impact on top-line revenue, bottom-line profit, or both.
Machine learning is already used to reduce or even eliminate manual data entry, detect spam, fight fraud, and recommend products to consumers (and to your R&D team!). It can predict when maintenance will be needed on all sorts of equipment and infrastructure, tell you more about your customers than you’ve ever known before, improve customer satisfaction, and help diagnose a myriad of health and medical issues.
If you haven’t already invested in machine learning, you need to ask yourself why not? What is preventing you from getting started?
What is machine learning used for?
Retail and eCommerce
Artificial intelligence and machine learning are being used to boost conversion rates, improve customer experience, deliver personalization and more
Online shoppers don’t have the luxury of asking a salesperson where they can find a product. Your onsite search engine fulfills that role. When you interpret search queries, assess user intent, and use that information to train your search algorithm, results become more relevant, which leads to higher purchase conversion.
Customers seek a more personalized experience when shopping online. Providing product recommendations or search results based on each shopper’s past behavior can help create stronger user engagement and retention.
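As a rough illustration of recommendations based on past behavior, here is a minimal item co-occurrence sketch in Python: it counts how often products appear in the same order and suggests the items most often bought alongside what a shopper already has. The basket data and the `build_cooccurrence` and `recommend` names are hypothetical, and production recommenders are usually far more sophisticated.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = {}
    for basket in baskets:
        # Deduplicate and sort so each unordered pair is counted once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(history, co, k=2):
    """Score unseen items by how often they co-occur with the shopper's history."""
    scores = Counter()
    for item in history:
        for other, count in co.get(item, Counter()).items():
            if other not in history:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]


# Hypothetical order history.
baskets = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["lantern", "batteries"],
]
co = build_cooccurrence(baskets)
print(recommend(["tent"], co, k=1))  # ['sleeping bag']
```

Counting co-occurrences keeps the idea visible: the "past behavior" here is simply which products other shoppers bought together.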
Enhanced customer service
Chatbots act as virtual shopping assistants. Like employees, they need to be trained to know not only what you sell but also the terminology people use for the many products on your site, so make sure they have the training data they need.
Search engines and other leading technology companies use machine learning to deliver innovative products and improve user experience
Search engine algorithms use machine learning to drive stronger user engagement. By interpreting queries and assessing user intent, these algorithms return more relevant results, which creates higher user satisfaction.
Analyzing user activity data and preferences can help search engine and social media providers personalize content feeds and recommendations, enhancing online customer experiences.
Natural language processing (NLP)
NLP can analyze language patterns to understand text, for example on social media. This technology can be used to track customer sentiment and develop engagement strategies.
Leaders in financial services use machine learning and artificial intelligence to improve customer acquisition, retention and overall experience
Anti-money laundering (AML), Know Your Customer (KYC), and fraud detection programs require sophisticated tools to spot potential threats. Relying solely on employees to spot patterns in financial records can be both time-consuming and costly. Machine learning and artificial intelligence allow financial institutions to sift through data and find anomalies quickly, preventing illegal activity and reducing potential losses.
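To make "find anomalies" concrete, here is a minimal Python sketch of one simple way to flag unusual transactions: the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). It is a statistical baseline rather than a trained model, the 3.5 cutoff is a commonly cited rule of thumb, and the amounts and the `flag_anomalies` name are hypothetical; real AML and fraud systems combine many more signals.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which a single
    extreme value cannot drag around the way it can a mean and standard deviation.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        # Most values are identical; treat anything different as unusual.
        return [i for i, x in enumerate(amounts) if x != med]
    return [
        i for i, x in enumerate(amounts)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]


# Hypothetical card transactions for one customer.
amounts = [42.0, 38.5, 40.0, 41.2, 39.8, 40.5, 950.0]
print(flag_anomalies(amounts))  # [6]
```

The median-based score is used instead of a plain mean and standard deviation because, in a small sample, one large outlier inflates the standard deviation enough to hide itself.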
Machine learning algorithms are now being leveraged by financial institutions to create investment strategies, freeing up financial advisors to engage more with their clients.
Enhanced customer experiences
With today’s expectation for on-demand customer service, chatbots have a crucial role to fill. Chatbots help to delight customers with real-time feedback and a streamlined experience.
Accelerate machine learning with training data for self-driving cars and improve speech-recognition systems, in-car navigation and user experience with more accurate field testing
Self-driving cars are extremely complex machines, and their artificial intelligence (AI) is powered by machine learning. As the car moves forward, it processes a lot of visual data, just like a driver does when looking out the windshield. Vehicles need to assign meaning to large volumes of image data, such as identifying a tree or a pedestrian, and then feed that information back into the car’s AI to teach it.
Traditional dashboards—and more recently, mobile devices—take a driver’s hands and eyes off the road. Speech interfaces don’t. Connected cars need access to large-scale speech data collections to train the speech interface, providing consumers around the world with the best user experience.
Advances in voice recognition, along with cameras that can help track driver emotions, are an important next step in the human-machine interface. They give cars the ability to identify speakers’ emotions as well as their words, so the car can tell when users get frustrated and respond accordingly.
Improve emergency response, defense initiatives and law enforcement with secure data services
Using social media monitoring, computer vision, and data annotation, government agencies are now able to extract information to aid in the surveillance of terrorist activity, monitor national security threats, and more.
Emergencies like natural disasters or coordinated attacks can happen without a moment’s notice. When lives are at stake, responding immediately and with coordination is key. Using translation, voice recognition, and text data collection, emergency responders around the world can communicate efficiently with those in harm’s way.
Secure transcription allows law enforcement to accomplish many objectives, including capturing content from body-worn video, keeping official records, and maintaining archival records.
Exciting uses of artificial intelligence (AI) and machine learning in Healthcare are transforming patient care
Evaluate trends, anticipate outbreaks, and forecast patient needs.
Chatbots and virtual healthcare
Provide faster and better customer service.
Advances in underwriting
Use machine learning to build stronger underwriting models based on a wide variety of data points.
What are the top machine learning methods?
“Most of human and animal learning is unsupervised learning.
If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don’t know how to make the cake. We need to solve the unsupervised learning problem before we can even think of getting to true AI.”
Yann LeCun, Director of AI Research, Facebook
Supervised learning is probably the simplest form of machine learning, and often the most accurate.
Basically, machines are given labeled training data that has both inputs and target outputs. The machine’s task is to learn the logic used to derive the target outputs by following the examples it was given. Eventually, after deriving the logic from the labeled training data, when it gets new inputs it will arrive at the targets or outputs independently.
Two common types of supervised learning tasks are classification and regression.
Classification is the easier of the two to understand. The data is evaluated to determine which class it falls into. For example, a machine learning model might be asked to determine whether a picture shows a horse. That’s a simple yes/no response and an example of binary classification. After it has seen enough labeled pictures of horses and non-horses to learn the distinguishing characteristics of a horse, the machine will be able to look at a new picture on its own and tell you whether it’s a horse.
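The horse example can be sketched in Python as a nearest-centroid classifier, one of the simplest binary classifiers: it averages the labeled examples of each class and assigns a new example to the closest average. This is an illustrative assumption, not how image models are actually built; it pretends each picture has already been reduced to two hypothetical numeric features, and the training data and the `train_centroids` and `classify` names are made up.

```python
import math


def train_centroids(examples):
    """Average the feature vectors of each class to get one centroid per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(features, centroids):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical two-number summaries of labeled pictures.
training_data = [
    ((0.9, 0.8), "horse"), ((0.8, 0.9), "horse"), ((0.85, 0.75), "horse"),
    ((0.1, 0.2), "not horse"), ((0.2, 0.1), "not horse"), ((0.3, 0.25), "not horse"),
]
centroids = train_centroids(training_data)
print(classify((0.7, 0.9), centroids))   # horse
print(classify((0.25, 0.3), centroids))  # not horse
```

Training here is just computing the two averages; prediction on a new picture needs no further labels.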
Regression is slightly different. Instead of assigning data to a class, the machine is asked to predict a numeric response or output, based on the relationship between inputs and outputs it learned from the training data.
Here is an easy-to-follow example: if the inputs 3 and 5 have a target of 8, the learned logic would be to add the two inputs. The model would then use regression analysis to predict a target of 10 for the inputs 4 and 6. Supervised learning is task-oriented: “Find me XYZ target.”
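A minimal Python sketch of the 3 + 5 = 8 example, assuming a linear model with no intercept, y = w1·x1 + w2·x2. A single example cannot pin down two weights, so a few hypothetical extra pairs that follow the same rule are added, and the weights are found by ordinary least squares (solving the 2 x 2 normal equations). The `fit_two_weights` and `predict` names are illustrative.

```python
def fit_two_weights(examples):
    """Least-squares fit of y = w1*x1 + w2*x2 (no intercept) via the normal equations."""
    s11 = sum(x1 * x1 for (x1, x2), y in examples)
    s12 = sum(x1 * x2 for (x1, x2), y in examples)
    s22 = sum(x2 * x2 for (x1, x2), y in examples)
    s1y = sum(x1 * y for (x1, x2), y in examples)
    s2y = sum(x2 * y for (x1, x2), y in examples)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("examples do not pin down both weights")
    # Solve the 2x2 system with Cramer's rule.
    w1 = (s1y * s22 - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2


def predict(weights, x1, x2):
    w1, w2 = weights
    return w1 * x1 + w2 * x2


# The article's example (3, 5) -> 8, plus hypothetical extra pairs following the same rule.
examples = [((3, 5), 8), ((1, 2), 3), ((2, 7), 9), ((6, 1), 7)]
weights = fit_two_weights(examples)
print(weights)                 # (1.0, 1.0)
print(predict(weights, 4, 6))  # 10.0
```

The model never sees the word "add"; it recovers weights of 1 and 1 from the examples, which is the "learned logic" the paragraph describes.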
As you may have guessed, semi-supervised learning is a hybrid model. Algorithms using semi-supervised learning are trained on a combination of labeled and unlabeled data. This approach can be more practical because it can be expensive to have a data scientist or data engineer label data. Other times it’s taken because the data set is so massive that labeling all of it would be a herculean task. Another reason teams take a hybrid approach is to reduce the human bias that can creep in during data labeling.
“It is a capital mistake to theorize before one has data. Insensibly, one begins to twist the facts to suit theories, instead of theories to suit facts.”
Sherlock Holmes, in Arthur Conan Doyle’s “A Scandal in Bohemia”
With semi-supervised learning, your model may benefit and be able to work faster by having some targets or labeled data, and the work it does to make sense of the unlabeled data may reveal insights and provide you with outputs you hadn’t discovered yet. It’s a win-win in many scenarios and an often-used approach.
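One common semi-supervised technique is self-training, also called pseudo-labeling: fit a model on the few labeled examples, let it label the unlabeled point it is most confident about, refit, and repeat. The Python sketch below uses a deliberately tiny one-dimensional nearest-centroid model; the data, the labels "A" and "B", and the `self_train` name are hypothetical, and it assumes at least two labels.

```python
def self_train(labeled, unlabeled):
    """Self-training (pseudo-labeling) with a tiny 1-D nearest-centroid model.

    labeled: dict mapping each label to a list of values whose label is known
             (at least two labels are assumed).
    unlabeled: list of values with no label.
    Returns a dict mapping each unlabeled value to the pseudo-label it received.
    """
    groups = {label: list(values) for label, values in labeled.items()}
    remaining = list(unlabeled)
    pseudo = {}
    while remaining:
        # Refit: one centroid per label from everything labeled so far.
        centroids = {label: sum(v) / len(v) for label, v in groups.items()}

        def guess(x):
            # Closest label, plus a confidence score: how much closer it is
            # than the runner-up label.
            ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
            margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[ranked[0]])
            return ranked[0], margin

        # Pseudo-label only the single most confident point, then repeat.
        x = max(remaining, key=lambda value: guess(value)[1])
        label, _ = guess(x)
        pseudo[x] = label
        groups[label].append(x)
        remaining.remove(x)
    return pseudo


labeled = {"A": [1.0], "B": [9.0]}
unlabeled = [2.0, 3.0, 7.5, 8.0, 4.0]
pseudo = self_train(labeled, unlabeled)
print(pseudo)  # 2.0, 3.0 and 4.0 get "A"; 7.5 and 8.0 get "B"
```

Labeling one confident point at a time lets each refit benefit from the previous pseudo-labels, which is how the unlabeled data shapes the model.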
Reinforcement learning is the most abstract approach and is based entirely on the machine, often referred to as the “learning agent”, learning through trial and error. The machine determines which actions to take in order to maximize its performance in a given environment, based on a definition of reward it has been given. That kind of trial-and-error activity is called exploration. Using what it has learned about which actions earn rewards to choose its next actions is called exploitation.
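Exploration and exploitation can be illustrated with an epsilon-greedy agent on a multi-armed bandit, a standard toy reinforcement learning problem (a simplification: there is no changing environment state here). Most of the time the agent exploits the action with the best average reward so far; a small fraction of the time it explores a random action. The reward probabilities, the step count, the 10% exploration rate, and the `epsilon_greedy` name are all hypothetical choices.

```python
import random


def epsilon_greedy(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays off best through exploration and exploitation.

    reward_probs: hypothetical chance that each action earns a reward of 1.
    Returns (estimated value per action, number of times each action was taken).
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions
    counts = [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = estimates.index(max(estimates))  # exploit: best action so far
        reward = 1 if rng.random() < reward_probs[action] else 0
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = epsilon_greedy([0.1, 0.5, 0.9])
print([round(e, 2) for e in estimates])
print(counts)  # the last action should be taken far more often than the others
```

Setting `epsilon` to 0 would make the agent purely greedy, so it could lock onto whichever action happened to pay off first; the small exploration rate is what lets it discover the better one.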
Through exploration and exploitation of its environment, the learning agent, fueled by advanced machine learning algorithms, ultimately gains enough knowledge to begin to demonstrate almost human-like levels of artificial intelligence.
Robotics provides one of the best examples of reinforcement learning. Robots used in factories rely heavily on reinforcement learning to adapt to their environment as needed and to complete human-like tasks and behaviors with continually decreasing error rates.
What kind of data do you need for machine learning?
“Machine learning can only be as good as the data you use to train it.”
Daniel Tunkelang, who led machine learning projects at Endeca, Google, and LinkedIn
There is no end to the number of articles that speak to the importance of making sure you have enough of the right data to support your machine learning projects.
As Tunkelang, quoted above, goes on to explain in the article “Machine Learning: 10 Facts Everyone Needs to Understand”, “you can have machine learning without sophisticated algorithms, but not without good data.”
So what kind of data do you need? It depends.
Structured vs. Unstructured Data
Structured data is logically organized and easy for a computer to read and understand. It could be machine-generated transactional data pulled from an ERP or CRM system, or simple time-stamped data about actions coming from sensors. It could also be human-generated data entered into a spreadsheet. This type of data is most often used in supervised learning, and it can typically be processed very quickly, even in incredibly large volumes.
According to industry leaders, more than 80% of the data in the world is unstructured, and the amount of it is growing exponentially. Unstructured data is everywhere. Human-generated unstructured data includes Microsoft Word and other text files, presentations, videos, images, audio, social media posts, and much more. Examples of machine-generated unstructured data include surveillance footage, satellite imagery, and different types of scientific data. Supervised and reinforcement learning are incredible tools that can be applied to gain insights and do more with unstructured data than ever before.
How much data is required for machine learning?
The short answer is, you need a lot of data. The best algorithm in the world will struggle to yield the right results with insufficient data.
“AI techniques require models to be retrained to match potential changing conditions, so the training data must be refreshed frequently. In one-third of the cases, the model needs to be refreshed at least monthly, and almost one in four cases requires a daily refresh.”
McKinsey Global Institute, Notes from the AI Frontier
Why? Greater volume drives greater accuracy.
There are many reasons for that. One reason is that, for most machine learning models, you are trying to get a computer to make sense of a data set with an incredible amount of variation.
As an example, consider voice recognition applications and variation in speech caused by differences in gender, age, dialects and more. Some experts say that a model needs at least 10,000 hours of audio to deliver outputs with modest accuracy levels. Others say that while the total volume of data required depends on the complexity of the model or the problem, 100,000 instances is a minimum requirement for most models.
Does “quality” matter?
Yes! Maybe even more than quantity.
“More data beats clever algorithms, but better data beats more data.”
Peter Norvig, Computer Scientist, Google and Industry Leader
What makes data “bad”? It could be irrelevant to your problem, inaccurately annotated, misleading, or incomplete. In that case, it will require some data cleaning or preparation.
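A minimal Python sketch of a few of these clean-up steps, applied to hypothetical labeled text records: it drops incomplete records, records whose label falls outside an assumed label set, and duplicates that differ only in case or surrounding whitespace. The field names `text` and `label`, the allowed labels, and the `clean_records` function are assumptions for illustration.

```python
def clean_records(records, allowed_labels):
    """Drop incomplete, duplicate, or off-schema records and normalize label text."""
    cleaned, seen = [], set()
    for record in records:
        text = (record.get("text") or "").strip()
        label = (record.get("label") or "").strip().lower()
        if not text or not label:
            continue  # incomplete: missing text or label
        if label not in allowed_labels:
            continue  # annotation outside the expected label set
        key = (text.lower(), label)
        if key in seen:
            continue  # duplicate after normalizing case and whitespace
        seen.add(key)
        cleaned.append({"text": text, "label": label})
    return cleaned


raw = [
    {"text": "Great product, fast shipping", "label": "Positive"},
    {"text": "great product, fast shipping ", "label": "positive"},
    {"text": "Arrived broken", "label": "negative"},
    {"text": "", "label": "negative"},
    {"text": "Not sure yet", "label": "maybe"},
    {"text": "Too small", "label": None},
]
cleaned = clean_records(raw, {"positive", "negative"})
print(cleaned)
# [{'text': 'Great product, fast shipping', 'label': 'positive'},
#  {'text': 'Arrived broken', 'label': 'negative'}]
```

Checks like these catch incomplete and off-schema records, but inaccurate or misleading annotations usually still need human review.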
If your model is tasked with classifying data, your training data may have to be properly labeled first. Sometimes formatting is an issue. For example, if you are working with image data, the images may need to be resized so the model analyzes vectors of the same length.
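The resizing step might look like this minimal Python sketch, which assumes grayscale images stored as nested lists of pixel values and uses nearest-neighbor sampling (a deliberately simple choice; image libraries offer better interpolation). Images of any size come out as vectors of the same length. The `resize_nearest` and `to_vector` names and the 4 x 4 target size are hypothetical.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2-D grid of pixels by picking the nearest source pixel."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


def to_vector(image, size=(4, 4)):
    """Resize to a fixed size, then flatten row by row into one feature vector."""
    resized = resize_nearest(image, *size)
    return [pixel for row in resized for pixel in row]


small = [[1, 2],
         [3, 4]]
wide = [[0, 10, 20, 30, 40],
        [50, 60, 70, 80, 90],
        [5, 15, 25, 35, 45]]
print(len(to_vector(small)), len(to_vector(wide)))  # 16 16
print(to_vector(small))
# [1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4]
```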
Any data that you use will require some level of clean-up. Experts report that the work that needs to be done doesn’t end with the extracting, transforming and loading (ETL) of data. Even after that, the clean-up required to make it suitable for data science typically represents an average of 80% of the total workload in any machine learning project.
“You can have the most appropriate algorithm, but if you train your machine on bad data, then it will learn the wrong lessons.”
Want to read more? Download our white paper
As use cases continue to expand, you’ll want to stay up to speed on all the ways you can improve your models and create better products for your customers.
McKinsey Global Institute, Notes from the AI Frontier: Insights from Hundreds of Use Cases
Real-World Use Cases for Human Annotated Data
Importance of Data Strategy white paper
A Future That Works: Automation, Employment, and Productivity
How Much Training Data is Required for Machine Learning?
Glossary of Terms
Keep up to date with the latest in machine learning by visiting the Appen Blog.
Get the Data You Need
By now you know that your machine learning and AI will only be as good as the data you use to train them. Let us help. We’ve been in the data business for over 20 years and have developed deep experience working with leading global technology companies, governments, and other organizations, across a variety of data types.
Appen collects and annotates speech, sound, image, video, and text data, which is used to fuel our clients’ many machine learning projects. We also review and annotate data from live products to improve products and their user experience.