Google Vertex AI is a managed machine learning (ML) platform provided by Google Cloud. Vertex AI unifies Google Cloud’s existing ML offerings into a single environment for efficiently building and managing the lifecycle of ML projects.
It provides tools for every step of the machine learning workflow, across different model types and for varying levels of expertise. It lets us reuse existing pre-trained ML models or train custom ML models with either a no-code or a code-based approach.
In this article, we’ll introduce Vertex AI AutoML, which automates the ML workflow (data preparation, model training, and model serving) with no code required.
ML Workflow, Pipeline, Vertex AI, AutoML
The ML workflow consists of data preparation, model training, and model serving. As an analogy, we can compare this flow to how a chef creates recipes, cooks ingredients, and serves dishes to their customers.
- Data preparation is like collecting the raw ingredients and prepping them, exploring their flavors to build the foundation of your dish. In machine learning, data preparation makes sure the data is relevant and of good quality so that it can be fed into the algorithm that trains your ML model.
- Model training is like testing different cooking methods and flavor combinations to find the most efficient and exciting way to execute your recipe. In machine learning, this involves choosing the right ML model, training it to meet the defined objectives, tuning hyperparameters, and validating the model’s performance to get the best result.
- Model serving is like presenting your final dish to your customers. In machine learning, it involves hosting the model and making it accessible so that it can be used for making predictions.
This ML workflow can be manifested as a traditional ML pipeline as illustrated in the diagram below.
- Data readiness: Get the data ready.
- Feature engineering: Transform raw data into features the model can learn from.
- Training/HP-tuning: Train your model and tune its hyperparameters.
- Model Serving: Serve your model.
- Understanding/tuning: Understand and interpret the predictions your model makes, and tune the model if needed.
- Edge: Deploy the model to edge devices.
- Model monitoring, Model Management: Monitor and manage your model.
Vertex AI AutoML, one of the features Vertex AI provides (Figure 1), automates the following components of the machine learning pipeline: data readiness, feature engineering, training and hyperparameter tuning, model serving, understanding and interpretability, and deployment to edge devices.
Exploring the AutoML No-Code Approach
The first step is to use Vertex AI to create a dataset (choosing the type of data we’ll use to train our ML model) and then add data to it (choosing the data source). Vertex AI datasets hold the data for your machine learning workloads: structured data (CSV files or BigQuery tables) or unstructured data such as images, text, and video. Note that Vertex AI datasets only reference your original data; the data is not duplicated.
We can choose which type of data we’ll use to train our ML model: tabular, image, text, or video.
We can then select a data source and add data to our dataset. The available sources depend on the data type:
For Tabular data:
- CSV file: Upload a file from your computer or select one on Cloud Storage.
- BigQuery: Select a table or view from BigQuery.
For Image data:
- Upload images: Recommended if you don’t have labels.
- Import files: Recommended if you already have labels. An import file is a list of Cloud Storage URIs to your images and optional data, like labels.
For Text data:
- Upload text documents: Recommended if you don’t have labels.
- Import files: Recommended if you already have labels. An import file is a list of Cloud Storage URIs to your text documents and optional data, like labels.
For Video data:
- Upload videos: Recommended if you don’t have labels.
- Import files: Recommended if you already have labels. An import file is a list of Cloud Storage URIs to your videos and optional data, like labels.
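As a rough sketch of what an import file contains, the snippet below writes one for single-label image classification. It assumes the CSV layout described in the Vertex AI documentation, with one image per row as `[ML_USE,]GCS_URI,LABEL`, where the optional `ML_USE` column (`training`, `validation`, or `test`) pins a row to a data split. The function name, bucket, and file names are hypothetical, and the exact schema for each data type should be checked against the current docs.

```python
import csv
import io


def build_image_import_file(rows):
    """Return the text of a single-label image classification import file.

    Each row is (ml_use, gcs_uri, label). ml_use may be None to let Vertex AI
    assign the split; otherwise it is written as the first column.
    Assumed layout (verify against the Vertex AI docs): [ML_USE,]GCS_URI,LABEL
    """
    allowed = {"training", "validation", "test"}
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for ml_use, gcs_uri, label in rows:
        if ml_use is not None and ml_use.lower() not in allowed:
            raise ValueError(f"unexpected ML_USE value: {ml_use!r}")
        if not gcs_uri.startswith("gs://"):
            raise ValueError(f"expected a Cloud Storage URI, got {gcs_uri!r}")
        fields = [gcs_uri, label] if ml_use is None else [ml_use.lower(), gcs_uri, label]
        writer.writerow(fields)
    return buf.getvalue()


# Hypothetical bucket and labels, for illustration only.
print(build_image_import_file([
    ("training", "gs://my-example-bucket/cats/001.jpg", "cat"),
    (None, "gs://my-example-bucket/dogs/007.jpg", "dog"),
]))
```

The resulting file would be uploaded to Cloud Storage and selected in the "Import files" option. Rows without an `ML_USE` value leave the split decision to Vertex AI.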
After the data has loaded, we can generate statistics, such as the number of missing values and distinct values in each column (Figure 2), to get a sense of whether our data is ready for model training.
If it isn’t, we need to improve the data before starting model training.
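To make the statistics step concrete, here is a small local sketch that computes the same two per-column numbers, missing values and distinct values, for a CSV file before it is uploaded. It uses only the Python standard library and treats empty cells as missing. It is not how Vertex AI computes its statistics, and the sample columns are made up.

```python
import csv
import io


def column_stats(csv_text):
    """Return {column: (missing_count, distinct_count)} for CSV text.

    Empty or absent cells count as missing; distinct counts ignore missing cells.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    missing = {name: 0 for name in columns}
    distinct = {name: set() for name in columns}
    for row in reader:
        for name in columns:
            value = (row.get(name) or "").strip()
            if value == "":
                missing[name] += 1
            else:
                distinct[name].add(value)
    return {name: (missing[name], len(distinct[name])) for name in columns}


sample = "age,city,income\n34,Toronto,72000\n,Toronto,\n51,Ottawa,88000\n"
for column, (n_missing, n_distinct) in column_stats(sample).items():
    print(f"{column}: {n_missing} missing, {n_distinct} distinct")
```

A column with many missing values, or a supposedly categorical column with almost as many distinct values as rows, is the kind of signal that suggests cleaning the data before training.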
When our data is ready, we can click the “TRAIN NEW MODEL” button to launch the model training wizard. We can choose the objective and training method.
For Tabular data, we have objectives such as:
- Classification: Predict a category from a fixed number of categories. Examples include predicting whether an email is spam or not, or which classes a student might be interested in attending.
- Regression: Predict a numeric value. For example, predicting home prices or consumer spending.
Then we select the target column, the data column we want to train our model to predict. We can also specify how the data is split into training, validation, and test sets.
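The idea behind a random fraction split can be sketched in a few lines. The 80/10/10 fractions below follow what the Vertex AI documentation describes as the default random split for tabular data, but treat them as an assumption worth double-checking. The function itself is a hypothetical local illustration, not the service's implementation.

```python
import random


def fraction_split(rows, train=0.8, valid=0.1, test=0.1, seed=0):
    """Randomly split rows into training, validation, and test lists.

    The 80/10/10 defaults are an assumption for illustration.
    """
    if abs(train + valid + test - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],  # the remainder goes to the test set
    )


train_rows, valid_rows, test_rows = fraction_split(range(100))
print(len(train_rows), len(valid_rows), len(test_rows))
```

Putting the rounding remainder in the test set guarantees that every row lands in exactly one split.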
If needed, we can customize the transformation applied to each column. Otherwise, the automatic option applies the most relevant transformation type to each column: Categorical, Text, Timestamp, or Numeric.
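To give a feel for what a transformation does, here is a toy illustration of two common ones. It applies z-score scaling to a numeric column and one-hot encoding to a categorical column. These are standard techniques chosen for illustration; AutoML’s exact internal transformations for each type are not described in this article and may differ.

```python
from statistics import mean, pstdev


def zscore(values):
    """Numeric column: rescale to zero mean and unit (population) std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Categorical column: one 0/1 indicator per distinct category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for v in values:
        vector = [0] * len(categories)
        vector[index[v]] = 1
        vectors.append(vector)
    return categories, vectors


print(zscore([2, 4, 4, 4, 5, 5, 7, 9]))
print(one_hot(["Ottawa", "Toronto", "Ottawa"]))
```

Both transformations turn raw column values into numbers on a comparable scale, which is what most learning algorithms expect as input.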
We can also select which columns to include or exclude as features; each feature is a measurable property of the data that the model can use as input. We can also choose the optimization objective, that is, the loss function or metric that model training optimizes.
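For tabular regression, the Vertex AI documentation lists objectives such as RMSE, MAE, and RMSLE. The sketch below computes the three quantities on toy values to show how they differ. RMSE penalizes large errors more heavily, MAE weights all errors linearly, and RMSLE compares log-scaled values, so it cares about relative rather than absolute error. The numbers are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmsle(y_true, y_pred):
    """Root mean squared log error; assumes non-negative values (log1p(x) = ln(1 + x))."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


y_true = [3, 5, 2]
y_pred = [2.5, 5, 4]
print(f"RMSE={rmse(y_true, y_pred):.3f}  MAE={mae(y_true, y_pred):.3f}  RMSLE={rmsle(y_true, y_pred):.3f}")
```

Here the single large miss (predicting 4 instead of 2) pushes RMSE well above MAE, which is why RMSE is the safer choice when large errors are costly.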
Finally, we set the training budget as a maximum number of node hours, which determines the estimated training cost.
Once that step is complete, click “START TRAINING”.
Training is usually the most laborious part of an ML project. With Vertex AI AutoML, this work is handled for us: behind the scenes, AutoML does the heavy lifting, such as feature encoding, algorithm selection, hyperparameter tuning, model training and tuning, model evaluation, pipeline orchestration, and resource provisioning.
Training can take anywhere from a few minutes to hours, days, or longer, depending on the training data and the node-hour budget. Vertex AI emails you when training is complete, so you can focus on other work in the meantime.
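The article does not detail how AutoML searches, so the following is only a toy illustration of what hyperparameter tuning means. It tries several values of one hyperparameter, scores each on the validation split, and keeps the best. The hyperparameter here is `k` in a hypothetical 1-D k-nearest-neighbour regressor. Real AutoML searches over model families and many hyperparameters at once.

```python
def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points (1-D)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def validation_mse(train, valid, k):
    """Mean squared error of k-NN predictions on the validation set."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in valid]
    return sum(errors) / len(errors)


def grid_search(train, valid, candidate_ks):
    """Try each candidate k and keep the one with the lowest validation error."""
    scores = {k: validation_mse(train, valid, k) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Toy data: y = 2x, with training and validation points interleaved.
train = [(x, 2 * x) for x in range(0, 20, 2)]
valid = [(x, 2 * x) for x in range(1, 20, 4)]
best_k, scores = grid_search(train, valid, [1, 2, 3, 4, 5])
print(best_k, scores)
```

Scoring on the validation split rather than the training data is what keeps the search from simply picking the setting that memorizes the training set.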
Then we can evaluate the trained model using metrics such as loss, accuracy, precision and recall, area under the ROC curve (AUC), the confusion matrix, and feature importance.
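To show what several of these metrics measure, here is a small sketch for binary classification. It computes the confusion matrix and precision and recall at a 0.5 threshold, plus ROC AUC from the model’s scores. AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, which is equivalent to the area under the ROC curve. The labels and scores are made up.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """P(random positive outscores random negative); ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print((tp, fp, fn, tn), precision_recall(tp, fp, fn), roc_auc(y_true, scores))
```

Precision and recall depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds. This is why evaluation reports usually show both.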
Based on the evaluation results, we could choose to retrain the model as needed or deploy the model and serve requests.
We can use our trained model for:
- Edge-Optimized: Export the trained model as a TensorFlow SavedModel to run in a Docker container, which can then be served at edge locations.
- Online Prediction: Deploy the model to a Vertex AI endpoint, which serves synchronous online prediction requests. Endpoints are useful for timely predictions for many users (for example, in response to an application request). When deploying, you can also configure the number of nodes, the machine type that hosts the model, traffic splitting, model monitoring, and more.
Models used in production require continuous monitoring to ensure that they perform as expected. Use model monitoring to track training-serving skew or prediction drift, then set up alerts to notify you when thresholds are crossed.
Model monitoring supports AutoML tabular and custom-trained models and incurs additional charges.
- Batch Prediction: No deployment is needed; batch prediction runs directly against the trained model. It takes a group of prediction requests and writes the results to a location you specify. Use batch prediction when you don’t need an immediate response and want to process accumulated data with a single request. Batch prediction works with both AutoML models and custom-trained models.
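The training-serving skew and prediction drift checks mentioned under model monitoring come down to comparing two feature distributions and alerting when their distance crosses a threshold. The Vertex AI documentation describes using L-infinity distance for categorical features and Jensen-Shannon divergence for numerical ones. The sketch below implements only the categorical case, locally. The 0.3 threshold and the sample values are illustrative assumptions, not Vertex AI settings.

```python
from collections import Counter


def distribution(values):
    """Relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in any single category's frequency."""
    p, q = distribution(baseline), distribution(current)
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def check_drift(baseline, current, threshold=0.3):
    """Return (distance, alert); alert is True when distance exceeds threshold.

    The 0.3 default is an illustrative assumption, not a Vertex AI setting.
    """
    distance = l_infinity_distance(baseline, current)
    return distance, distance > threshold


training_city = ["Toronto"] * 6 + ["Ottawa"] * 4  # feature values seen in training
serving_city = ["Toronto"] * 2 + ["Ottawa"] * 8   # feature values seen in production
print(check_drift(training_city, serving_city))
```

Comparing serving data against the training data corresponds to skew detection. Comparing recent serving data against earlier serving data corresponds to drift detection. The distance computation is the same in both cases.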
In summary, this article gave a high-level overview of ML workflows, generic ML pipelines, and the features Vertex AI offers.
We then walked through Vertex AI AutoML to explore its no-code approach to training an ML model. Vertex AI AutoML democratizes machine learning. It is a good starting point for learning ML model training without extensive machine learning knowledge, and it can also be used for quick model prototyping before moving on to code-based custom-trained models.
Curious about how AI can transform your enterprise applications and drive your business objectives forward?
For a limited time, my team at Architech Innovation Labs is hosting a FREE 3-hour AI Ideation session where you and your team will ideate, evaluate, strategize, and test AI use cases custom-built to meet your unique business needs.
Don’t wait, our sessions are filling up! — Act now to explore the art of the possible with Architech. www.architech.ca/ai-workshop