AutoML

What it is, why you need it, and best practices. This guide provides definitions and practical advice to help you understand and practice modern automated machine learning.

What is AutoML?

AutoML (short for automated machine learning) refers to the tools and processes that make it easier to build, train, deploy, and serve custom machine learning models. AutoML gives both ML experts and citizen data scientists a simple, code-free way to generate high-quality models, make predictions, and test business scenarios. This lets you apply machine learning quickly across your organization.

Why is it important?

You can use automated machine learning in a variety of applications, such as natural language processing, voice recognition, and recommendation engines. It can also support your BI and analytics needs: AutoML models can analyze historical data, find key drivers and patterns across large sets of business metrics, and make business predictions based on those patterns.

Citizen data scientists benefit from AutoML tools and processes by quickly developing baseline models and acting on their results. ML experts spend less time on the traditional trial-and-error workflow and more on customizing models and notebooks.

Here are the high-level benefits of automated machine learning that apply to both types of users:

Quickly apply machine learning across your organization. AutoML lets non-experts use machine learning models and helps experienced developers and data scientists produce solutions faster. These solutions are often simpler than hand-coded models and sometimes perform better.

Focus effort on higher-impact work. AutoML eliminates much of the time-intensive, repetitive coding throughout the machine learning workflow, from preprocessing and cleaning the data to selecting an algorithm, tuning hyperparameters, and monitoring models. Training a model to identify content can also reduce errors and save many hours of manually curating tables, text, images, and videos.

Improve business performance. It makes it faster and easier to give your analytics teams the power of predictive analytics, which can significantly improve business performance. Specific applications include detecting fraud, giving consumers more personalized experiences, and better managing inventory through improved demand forecasting.

Automated machine learning can be applied to advanced artificial intelligence applications, such as deep learning models, or to simple business problems that you don't have the time or expertise to tackle.

How it works

Automated machine learning typically maps to the traditional machine learning workflow. As with other data science or data analytics projects, you should first clearly define the question you’re trying to answer or the problem you’d like to solve. This critical step will inform your data requirements.

Depending on your specific use case and type of data (structured, image, video, or language), the details of the AutoML process will vary. Below is a high-level overview to get you started.

Dataset. First you gather the appropriate data and prepare your dataset. Key actions include:

  • Ensuring your dataset is correctly labeled and formatted.

  • Avoiding data leakage and training-serving skew.

  • Cleaning up data that is incomplete, missing, or inconsistent.

  • Reviewing the dataset after you’ve imported it into your automated machine learning platform to ensure accuracy.
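
The leakage and skew bullets come down to one habit: split the data first, learn any preprocessing statistics from the training split only, and reuse exactly that transform on the test set and at serving time. A minimal sketch in plain Python, assuming a small hypothetical list of `(logins_1m, churn_1y)` pairs; the helper names are illustrative and not part of any AutoML platform's API:

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def fit_scaler(values):
    """Learn mean and standard deviation from training values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero for a constant column
    return mean, std


def apply_scaler(values, params):
    """Standardize values with previously learned parameters."""
    mean, std = params
    return [(v - mean) / std for v in values]


# Hypothetical (logins_1m, churn_1y) rows; None marks a missing value.
raw = [(12, 1), (25, 0), (None, 1), (31, 0), (8, 1), (22, 0), (40, 0), (5, 1), (18, 1)]

clean = [row for row in raw if row[0] is not None]   # drop incomplete records
train, test = train_test_split(clean)                # split BEFORE fitting anything
params = fit_scaler([x for x, _ in train])           # statistics come from train only
train_x = apply_scaler([x for x, _ in train], params)
test_x = apply_scaler([x for x, _ in test], params)  # same transform reused for test/serving
```

Fitting the scaler on all eight clean rows before splitting would let test-set statistics leak into training, and recomputing it from live traffic at serving time would introduce training-serving skew.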

Train and Evaluate. Now you’re ready to train your model. AutoML commonly employs techniques such as hyperparameter tuning, data preprocessing, meta-learning, feature engineering, and neural architecture search. We describe the key types of automated machine learning in the next section.

  • Make sure you understand every feature column you include, and leave out columns that aren't relevant to your analysis and would only add noise.

  • Once training is complete, your tool should provide a metrics report on how your trained model performed on held-out test data. These evaluation metrics help you gauge whether your model is ready to use. They include forecasting and regression metrics (such as mean absolute error and observed quantile) and classification metrics (such as prediction outcomes at a given score threshold).

  • In addition to this metrics report, the best AutoML tools let you use explainable AI (XAI) to understand the rationale behind your ML model's output. You can also evaluate your model further by running additional tests on new data and checking whether the predictions match what you expect.
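
As a rough illustration of what such a report computes, the sketch below calculates mean absolute error for a regression or forecast, plus the confusion-matrix counts, precision, and recall a classifier yields at a chosen score threshold. The labels, scores, and thresholds are hypothetical, and real tools report many more metrics:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_at_threshold(y_true, scores, threshold=0.5):
    """Count prediction outcomes when scores >= threshold are labeled positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1:
            counts["fp"] += 1
        elif label == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(counts):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical held-out test results.
mae = mean_absolute_error([10, 20, 30], [12, 18, 33])
churned = [1, 0, 1, 1, 0]
churn_scores = [0.9, 0.4, 0.35, 0.8, 0.6]
for t in (0.5, 0.3):
    print(t, precision_recall(confusion_at_threshold(churned, churn_scores, t)))
```

In this toy data, lowering the threshold from 0.5 to 0.3 raises recall from about 0.67 to 1.0 while precision drops from about 0.67 to 0.6, which is the trade-off a score threshold controls.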

Deploy and Serve. Once you're confident in the performance of your model, you can make it available for use, either in a one-time project or as part of an ongoing production process.

  • For one-time projects, an asynchronous batch prediction approach is probably most appropriate.

  • If your model will be integral to a larger AI analytics process in which other applications depend on fast predictions, you should consider a synchronous, real-time deployment.

  • The best AutoML tools let you publish your data to other cloud platforms and integrate your models directly into BI and analytics tools for full interactive analysis, which supports deeper insights and data-driven decisions.
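
The difference between the two deployment styles is mostly about who waits for the answer. A rough sketch, assuming a CSV export with hypothetical `CustomerID` and `Logins_1M` columns and a made-up logistic scoring function standing in for a trained model: the batch path scores a whole file in one run, while the synchronous path returns one prediction to a waiting caller. In production the synchronous function would typically sit behind an API endpoint; that wiring is omitted here.

```python
import csv
import io
import math


def churn_score(record):
    """HYPOTHETICAL stand-in for a trained model: churn probability from first-month logins."""
    logins = float(record["Logins_1M"])
    return 1.0 / (1.0 + math.exp(0.2 * (logins - 20)))


def batch_predict(in_file, out_file):
    """Batch path: score every row of a CSV in one run and write the results out."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=list(reader.fieldnames) + ["churn_score"])
    writer.writeheader()
    for row in reader:
        row["churn_score"] = f"{churn_score(row):.3f}"
        writer.writerow(row)


def predict_one(record):
    """Real-time path: a caller waits on a single prediction."""
    return churn_score(record)


# Usage with an in-memory CSV standing in for an exported table.
batch_out = io.StringIO()
batch_predict(io.StringIO("CustomerID,Logins_1M\nA1,20\nA2,5\n"), batch_out)
print(batch_out.getvalue())
print(predict_one({"Logins_1M": "30"}))
```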

AutoML types

To build predictive models, even experienced data scientists and ML engineers must take several steps, such as formulating hypotheses, collecting the right dataset, visualizing data, engineering features, tuning model hyperparameters, and designing optimal deep neural network architectures. Here we describe how automated machine learning can bring speed and transparency to your data science pipeline.

Automated hyperparameter optimization

Hyperparameters govern how your model learns and how it is structured, through settings such as the learning rate, the number of hidden layers in a neural network, and the regularization strength. They are set before training rather than learned from the data, and they can significantly influence your model's performance. However, tuning them by hand can be a challenging and time-intensive process.

AutoML systems automatically search for the best-performing combination of hyperparameters for a specific ML model. This involves training the model with various hyperparameter settings and then assessing each configuration's performance, typically on a validation set.
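
As a minimal illustration of this loop, the sketch below runs an exhaustive grid search over two hyperparameters (learning rate and number of epochs) for a one-feature linear model trained by gradient descent, scoring each configuration on a held-out validation set. The data and grid are hypothetical, and AutoML tools often use more sample-efficient strategies, such as random search or Bayesian optimization, instead of a full grid:

```python
import itertools


def train_linear(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b with full-batch gradient descent on mean squared error."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(model, xs, ys):
    """Mean squared error of a (w, b) model."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, grid):
    """Train once per hyperparameter combination; keep the lowest validation error."""
    best = None
    for lr, epochs in itertools.product(grid["learning_rate"], grid["epochs"]):
        model = train_linear(*train, learning_rate=lr, epochs=epochs)
        score = mse(model, *valid)
        if best is None or score < best[0]:
            best = (score, {"learning_rate": lr, "epochs": epochs})
    return best


# Hypothetical data following y = 2x + 1.
xs_train = [i / 10 for i in range(10)]
ys_train = [2 * x + 1 for x in xs_train]
xs_valid = [0.05, 0.55, 0.95]
ys_valid = [2 * x + 1 for x in xs_valid]
grid = {"learning_rate": [0.001, 0.01, 0.1, 0.5], "epochs": [10, 100, 1000]}
best_score, best_params = grid_search((xs_train, ys_train), (xs_valid, ys_valid), grid)
print(best_params, best_score)
```

Because every combination requires a full training run, the cost grows multiplicatively with each hyperparameter added to the grid, which is why automating the search pays off.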

Automated feature engineering

Feature engineering involves transforming input data into features that machine learning models can use, and it can significantly affect model performance. Automated feature engineering (AFE) explores feature combinations systematically rather than manually. Manual feature engineering demands a major time investment: creating a single feature can take hours, and reaching acceptable accuracy can require hundreds of features. AutoML streamlines this process by automating feature space exploration, which can cut the time required from many hours to a few minutes.
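
One simple way to explore feature combinations systematically is to generate candidates mechanically and score each one against the target. This sketch builds pairwise products and ratios of numeric columns and ranks the candidates by absolute Pearson correlation. The columns, the target, and the correlation-based scoring are illustrative assumptions, not a description of how any particular AFE tool works:

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def generate_features(columns):
    """Systematically build candidates: raw columns plus pairwise products and ratios."""
    features = dict(columns)
    for (a, xa), (b, xb) in itertools.combinations(columns.items(), 2):
        features[f"{a}*{b}"] = [x * y for x, y in zip(xa, xb)]
        if all(y != 0 for y in xb):  # skip ratios that would divide by zero
            features[f"{a}/{b}"] = [x / y for x, y in zip(xa, xb)]
    return features


def rank_features(features, target, top_k=3):
    """Keep the top_k candidates by absolute correlation with the target."""
    scored = sorted(((abs(pearson(v, target)), name) for name, v in features.items()), reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical first-month usage columns and a target they jointly determine.
columns = {"Logins_1M": [5, 10, 20, 30, 8], "Avg_min_log_1M": [10, 3, 12, 2, 7]}
total_minutes = [a * b for a, b in zip(columns["Logins_1M"], columns["Avg_min_log_1M"])]
candidates = generate_features(columns)
print(rank_features(candidates, total_minutes, top_k=2))
```

Here the product `Logins_1M*Avg_min_log_1M` ranks first because the hypothetical target is exactly that product, a relationship neither raw column captures on its own.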

Neural architecture search (NAS)

The most intricate and time-consuming aspect of deep learning is designing the neural architecture. Your data science team would need to invest significant effort in choosing the number, type, and size of layers, decisions that training cannot make on its own because training only adjusts the model weights. Neural architecture search (NAS) aims to automate this design process, effectively using neural networks to design other neural networks.

NAS repeatedly selects candidate architectures, evaluates each one against a performance metric, and uses the results to guide the next selection. Common approaches include random search, which works well for small sets of architectures, and more efficient gradient-based methods. Your data science team can also explore evolutionary algorithms, which start from randomly sampled architectures and iteratively improve the most successful ones.
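
The evolutionary approach can be sketched in a few dozen lines: sample random architectures, then repeatedly mutate a strong one and keep the best performers. The search space and `proxy_score` below are hypothetical; a real NAS run would train and validate every candidate network, which is what makes the search expensive:

```python
import random

# Hypothetical search space: each architecture is a dict of these choices.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
}


def sample_architecture(rng):
    """Draw one architecture uniformly at random from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def mutate(arch, rng):
    """Copy an architecture and re-sample one randomly chosen setting."""
    child = dict(arch)
    name = rng.choice(list(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child


def evolutionary_search(evaluate, population_size=8, generations=20, seed=0):
    """Start from random architectures, then repeatedly mutate strong ones."""
    rng = random.Random(seed)
    population = [sample_architecture(rng) for _ in range(population_size)]
    scored = [(evaluate(arch), arch) for arch in population]
    for _ in range(generations):
        # Tournament selection: the better of two random members becomes the parent.
        parent = max(rng.sample(scored, 2), key=lambda pair: pair[0])[1]
        child = mutate(parent, rng)
        scored.append((evaluate(child), child))
        # Keep only the best population_size candidates.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        scored = scored[:population_size]
    return scored[0]


def proxy_score(arch):
    """HYPOTHETICAL stand-in for training a network and returning validation accuracy."""
    penalty = abs(arch["num_layers"] - 3) + abs(arch["units"] - 64) / 32
    return (0.5 if arch["activation"] == "relu" else 0.0) - penalty


best_score, best_arch = evolutionary_search(proxy_score)
print(best_score, best_arch)
```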

AutoML example

To illustrate automated machine learning in action, let’s imagine you’re running a SaaS company selling monthly subscriptions to an online platform. Below we look at how you can use automated machine learning in a BI tool to evaluate customer behavior.

Evaluating customer churn

AutoML applied to tabular data can help you understand the patterns and drivers behind past customer churn. It can also use those same patterns to predict which current customers are at the highest risk of leaving.

The first 12 rows of a dataset of historical customers might look like this:

A table displays customer data, including ID, gender, age, zip code, plan type, logins in the first month, average minutes logged in the first month, and churn status in the first year.

Each row in the table above represents a unique historical customer, and each column represents an attribute of that customer.

Some attributes are known the moment a person becomes a customer, for example CustomerID, Gender, Age, Zip, and Plan_Type. Others become available later in the customer journey, for example Logins_1M (the number of times the customer logged into the site during month one), Avg_min_log_1M (the average time, in minutes, the customer spent on the site during month one), and Churn_1Y (whether the customer quit the platform within a year of joining). Churn_1Y is the column of interest, because you want to predict whether a given customer is likely to leave the platform during the first 12 months.

Close inspection of the table reveals three key patterns in the dataset:

  1. Customers over the age of 70 rarely churn during their first year.

  2. Female customers in their 40s on a "Family" plan rarely churn during their first year.

  3. Male customers in their 30s on a "Personal" plan are more complicated. They’re likely to churn during their first year if they logged in fewer than 20 times during month one. If, however, they logged in more than 20 times during their first month, they’re not likely to leave.

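
Written out by hand, the three patterns amount to a small rule set, the kind of structure a decision tree can learn from this sort of data. A sketch assuming `Gender` is stored as `"Female"`/`"Male"` and `Plan_Type` as `"Family"`/`"Personal"` (the table's actual encoding isn't shown here); the function name is hypothetical, and because the example leaves exactly 20 logins unspecified, the sketch returns `"unknown"` in that case:

```python
def churn_risk(gender, age, plan_type, logins_1m):
    """Hand-written version of the three example patterns (hypothetical encodings)."""
    # Pattern 1: customers over 70 rarely churn in year one.
    if age > 70:
        return "low"
    # Pattern 2: women in their 40s on a Family plan rarely churn.
    if gender == "Female" and 40 <= age < 50 and plan_type == "Family":
        return "low"
    # Pattern 3: men in their 30s on a Personal plan depend on first-month logins.
    if gender == "Male" and 30 <= age < 40 and plan_type == "Personal":
        if logins_1m < 20:
            return "high"
        if logins_1m > 20:
            return "low"
    # No pattern from the example applies (including exactly 20 logins).
    return "unknown"


print(churn_risk("Male", 34, "Personal", 12))
print(churn_risk("Female", 45, "Family", 3))
```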
Automated machine learning excels at finding patterns such as these. It can discover far more complex patterns across a large number of columns, predicting how combinations of values in the feature columns affect the value in the target column.

AutoML takes a dataset (like the one in the example) and lets you specify a target field (e.g., Churn_1Y). It then finds key drivers and patterns in the data that are often difficult for a human to visualize or detect. You can then refine and finalize the model and use it to make predictions on new data and for scenario planning.
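
To make "specify a target field, then find key drivers" concrete, the sketch below implements one of the simplest possible driver searches: a decision stump that tries every column and every observed threshold and keeps the split with the lowest weighted Gini impurity on the target. It is a deliberately tiny stand-in for the richer model families an AutoML tool would search, and the six rows are hypothetical male customers in their 30s on a Personal plan:

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, target, features):
    """Return (weighted_gini, feature, threshold) for the best single numeric split."""
    best = None
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            left = [row[target] for row in rows if row[feature] <= threshold]
            right = [row[target] for row in rows if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical rows: male customers in their 30s on a Personal plan.
rows = [
    {"Age": 31, "Logins_1M": 5, "Churn_1Y": 1},
    {"Age": 35, "Logins_1M": 12, "Churn_1Y": 1},
    {"Age": 38, "Logins_1M": 18, "Churn_1Y": 1},
    {"Age": 33, "Logins_1M": 25, "Churn_1Y": 0},
    {"Age": 36, "Logins_1M": 30, "Churn_1Y": 0},
    {"Age": 32, "Logins_1M": 41, "Churn_1Y": 0},
]
print(best_split(rows, "Churn_1Y", ["Age", "Logins_1M"]))
```

The stump selects `Logins_1M <= 18`. With so few rows the cutoff falls on an observed value rather than the 20 logins described in the example; tree-based models build on this idea by fitting many such splits and combining them.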

AutoML tool features

Open-source Python libraries for automated machine learning exist, but the best machine learning automation platforms include the following end-to-end automation capabilities:

  • Allow you to quickly preprocess, clean, and connect your data.

  • Provide a simple, code-free interface to easily auto-generate and refine ML models, make predictions, and test business scenarios.

  • Automatically score and rank multiple ML models to select the best-performing model for your dataset.

  • Help you influence predicted outcomes by providing record-level prediction-influencer data that explains each outcome, along with full explainability data.

  • Allow you to easily publish the data and/or directly integrate your models via APIs into BI and analytics tools to build interactive visualizations and dashboards with predictive insights that give transparency on which metrics drive results.
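
Scoring and ranking candidates is commonly done with k-fold cross-validation: each model is trained on k-1 folds and scored on the remaining one, and its average score decides its rank. A minimal sketch with three hypothetical candidates (a majority-class baseline, 1-nearest-neighbor, and a single login cutoff) on made-up `(logins_1m, churn_1y)` rows:

```python
def majority_model(train):
    """Baseline: always predict the most common label (ties go to 0)."""
    labels = [y for _, y in train]
    guess = 1 if 2 * sum(labels) > len(labels) else 0
    return lambda x: guess


def nearest_neighbor_model(train):
    """1-nearest-neighbor on the single feature."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]


def threshold_model(train):
    """Predict churn (1) when logins <= the cutoff that is most accurate on the training rows."""
    def correct(cutoff):
        return sum(int(x <= cutoff) == y for x, y in train)
    best_cutoff = max((x for x, _ in train), key=correct)
    return lambda x: int(x <= best_cutoff)


def cross_val_accuracy(make_model, data, k=3):
    """Average accuracy over k folds; each fold is held out once."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = make_model(train)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(scores) / k


def rank_models(candidates, data, k=3):
    """Return (score, name) pairs, best first."""
    results = [(cross_val_accuracy(make, data, k), name) for name, make in candidates.items()]
    return sorted(results, reverse=True)


# Hypothetical (logins_1m, churn_1y) rows.
data = [(5, 1), (25, 0), (12, 1), (30, 0), (18, 1), (41, 0), (8, 1), (22, 0), (15, 1)]
leaderboard = rank_models(
    {"majority": majority_model, "nearest_neighbor": nearest_neighbor_model, "threshold": threshold_model},
    data,
)
for score, name in leaderboard:
    print(f"{name}: {score:.2f}")
```

The majority baseline lands at the bottom of the leaderboard, which is the point of including one: a candidate that cannot beat it is not learning anything useful.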
