Google Cloud’s AutoML first look

Brian Ray
Apr 23, 2019


Background image credit: comingsoon.net

I’m not new to automated machine learning, so it should be no surprise I have something to say about Google Cloud’s “Cloud AutoML” [Beta], which covers 5 of the 29 announcements made last week at Google Cloud Next ’19. I’m cautiously a fan of AutoML, although I also believe it’s more of a highway to get there than a final destination.

As a bit of background on me: I was the one who led Deloitte to earn its global ML Competency from AWS last year. Likewise, I have brought automated ML to the firm and to dozens of clients using AWS SageMaker (not currently considered AutoML, but it does have automatic tuning [3]), DataRobot, and H2O.ai. The firm I recently joined (Maven Wave Partners, as Managing Director of Data Science) was named a 2019 Google Services Partner of the Year. I have colleagues who have published articles from Next, such as [4]. Maven Wave is a great place to learn, but a reminder: these thoughts are my own.

While all these products are very different from one another, they all attempt to answer one question:

How do we accelerate machine learning to make it useful to the organization without losing explainability or accuracy [or the ability to put it into production]?

With all that said, I am a complete beginner when it comes to Cloud AutoML. This is my “first look” and an attempt to add some clarity around the value, myths, and shortcomings. Please note that AutoML is currently in Beta. I’m more than happy to revise this rather informal blog post in any way that helps. Also, feel free to contact me, comment, re-post, complain, praise … I’m writing to share but also to learn.

In this blog I will do the following:

  1. Define Cloud AutoML
  2. Walk through an example with real data. For illustrative purposes, I planned to run through some public data on Emergency Room visits [1], but I later dropped this example because I could not get it to work.
  3. Do some analysis of the results
  4. Do a comparison to some other AutoML ideas/products
  5. Provide some final feedback

In this post, I will mark things I think are not good as 👎🏽, good things as 👍🏽, and things that could use some external feedback as 👂🏼.

1. What is Cloud AutoML anyway?

AutoML in the traditional sense, general supervised classification and regression, is AutoML Tables in the GCP world. That is, the *normal* jumping-off point: you take a flattened set of labeled data with various features and let AutoML do all the magic, including cross-validation and accuracy testing, to train a model. I call this table-based ML: traditional machine learning that leans more toward explainability and repeatability, as opposed to the flavors that handle other sorts of problems, including video tagging, image tagging, and time series.
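For concreteness, the kind of flattened, labeled table AutoML Tables expects might look like this (toy rows in the spirit of the census data used later in this post, with target as the label column):

    age,workclass,hours_per_week,target
    39,State-gov,40,<=50K
    52,Self-emp-not-inc,45,>50K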

To solve this gap in types of models, much in a Google way, there are several other more specific flavors of AutoML:

  • AutoML Natural Language — handling things like domain specific sentiment analysis and more
  • AutoML Vision Object Detection — bounding box smart multi-object detection, basically Google Vision API on steroids
  • AutoML Vision Edge — the IoT version of Vision detection for edge devices
  • AutoML Video — Video media tagging

All five products were in Beta at the time this blog post was written. Here, we will focus on AutoML Tables, as it is closest to what we generally mean when we compare to other uses of the term in the past 👂🏼.

2. Walk-through of AutoML Tables

Loading data

Much like SageMaker supports s3:// (AWS S3 buckets), you can load data from Google Cloud Storage or BigQuery. Access to the AutoML panel, though, is through a web interface. The two together seemed a bit like a contradiction 👎🏽:

Should AutoML be web-based self-service or service-based web-accessible? 👂🏼

You will do it the cloud way!

When uploading my data, it took me several tries to get the bucket access correct, and I was never able to use a public bucket where my open data sources were stored (the product does not seem to like reading from public buckets). 👎
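One workaround is to copy the source file into a bucket your own project owns. A minimal sketch with the google-cloud-storage Python client (gsutil cp would do the same; BUCKET_NAME is a placeholder, and the file shown is the census sample used later in this post):

    from google.cloud import storage

    client = storage.Client()
    src = client.bucket("cloud-samples-data")
    dst = client.bucket("BUCKET_NAME")  # placeholder: a bucket your project owns

    # Copy the public sample file into your own bucket so AutoML can read it.
    blob = src.blob("ml-engine/census/data/adult.data.csv")
    src.copy_blob(blob, dst, "census/adult.data.csv")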

  • Does not like “.” in column names
  • Does not like empty column names (though this error message could use a little work!)

But after a couple of fixes, I’m presented with a very handy listing where AutoML guesses the type of each feature 👍🏽.

How many did AutoML guess correctly? It got 46 of 50 correctly mapped; there were a few mis-mappings 👎.

Mappings easily fixed

Part of the Analyze step (some call it EDA, Exploratory Data Analysis) gives more detail on each feature. The sorting by feature type seems a little silly 👂🏼, but there is a column called “Correlation with target”, which uses Cramér’s V [5] and is very handy for base-lining feature importance before modeling 👍🏽.
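For the curious, Cramér’s V is easy to compute yourself. A minimal sketch with pandas and scipy; this is my own illustration of the statistic, not AutoML’s internal code:

    import pandas as pd
    from scipy.stats import chi2_contingency

    def cramers_v(x: pd.Series, y: pd.Series) -> float:
        # Contingency table of the two categorical columns.
        table = pd.crosstab(x, y)
        chi2 = chi2_contingency(table)[0]
        n = table.to_numpy().sum()
        r, c = table.shape
        # V = sqrt(chi2 / (n * (min(r, c) - 1))); 0 = no association, 1 = perfect.
        return (chi2 / (n * (min(r, c) - 1))) ** 0.5

Running this on each feature against the target gives the same kind of pre-model baseline AutoML shows.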

Now the fun part: training.

You can pick the metric to optimize; in our case, a classification model, only one option was provided: “Log Loss”. Presumably there will be other metrics to optimize on later 👂🏼.
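For reference, log loss heavily penalizes confident wrong predictions. A quick sketch with scikit-learn (my illustration; AutoML does not expose this code):

    from sklearn.metrics import log_loss

    y_true = [1, 0, 1, 1]
    y_prob = [0.9, 0.2, 0.7, 0.6]  # predicted probability of class 1
    print(log_loss(y_true, y_prob))  # ≈ 0.30; lower is better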

waiting…

waiting … 👂🏼

Is it weird that I want more feedback? Logs? CPU/GPU/TPU usage?

getting coffee ….

waiting ….

One suggestion from my team was to check the Stackdriver logs to see what is going on. Unfortunately, I only see two logs, and they aren’t helpful 👎🏽:
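For anyone else digging, this is the kind of Stackdriver query I was running, via the google-cloud-logging Python client; the service-name filter is my guess, not a documented one:

    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client()
    flt = 'protoPayload.serviceName="automl.googleapis.com"'  # guessed filter
    for entry in client.list_entries(filter_=flt, page_size=10):
        print(entry.timestamp, entry.payload)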

so, back to waiting …

Happy to hear suggestions on what to do when you are training your model 👂🏼…

After 8 hours or so, my models failed, and there was no good reason given why.

Yay! I was able to train a model on different data.

The data I used was the CloudML Census data [6]:

gs://cloud-samples-data/ml-engine/census/data/adult.data.csv

I did need to add a header:

age,workclass,fnlgt,education,education_num,maritial_status,occupation,relationship,race,sex,capital_gain,capital_loss,hours_per_week,native_country,target
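If you would rather script that step than edit the file by hand, a minimal sketch, assuming pandas plus gcsfs so pandas can read gs:// paths directly (BUCKET_NAME is a placeholder for your own bucket):

    import pandas as pd

    COLUMNS = ["age", "workclass", "fnlgt", "education", "education_num",
               "maritial_status", "occupation", "relationship", "race", "sex",
               "capital_gain", "capital_loss", "hours_per_week",
               "native_country", "target"]

    # The public sample file has no header row, so name the columns ourselves.
    df = pd.read_csv(
        "gs://cloud-samples-data/ml-engine/census/data/adult.data.csv",
        names=COLUMNS, skipinitialspace=True)
    df.to_csv("gs://BUCKET_NAME/census/adult_with_header.csv", index=False)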

In GCP, the data looks like this:

I set the Budget to 24 hours. Please note: I am now trying the same thing with the other data; could it be that the 1-hour budget was causing the other models to fail?

92.9% AUC for this model, and a very clean/useful display of results.

Production

Going to production was a breeze:

This reminds me a bit of SageMaker, where the endpoint is just ready.

Side note while we are waiting: it may be possible to use the Python API (yay, Python) following this example [2] from @torryyang.
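A rough sketch of what an online prediction might look like, based on [2] and the google-cloud-automl beta client; the project and model IDs are placeholders, and the Tables payload format should be checked against the linked example:

    from google.cloud import automl_v1beta1 as automl

    PROJECT_ID = "your-project-id"  # placeholder
    MODEL_ID = "TBL0000000000"      # placeholder

    client = automl.PredictionServiceClient()
    model_name = client.model_path(PROJECT_ID, "us-central1", MODEL_ID)

    # One value per feature column, in training-column order (this is the
    # first record of the census file, minus the target).
    row = ["39", "State-gov", "77516", "Bachelors", "13", "Never-married",
           "Adm-clerical", "Not-in-family", "White", "Male", "2174", "0",
           "40", "United-States"]
    response = client.predict(model_name, {"row": {"values": row}})
    print(response)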

4. Do a comparison to some other AutoML ideas/products

Since these are the products I know best, I will compare:

  • H2O.ai FLOW AutoML
  • DataRobot
  • GCP Cloud AutoML Tables
  • AWS SageMaker

Usability

With this metric the answer is always “it depends”: who is the user? I can think of three major users: the advanced data scientist, the junior data scientist / data engineer, and the highly technical business user.

H2O.ai FLOW AutoML

  • Advanced Data Scientists: 👍🏽 Very Good
  • JR Data Scientist / Data Engineer: Great
  • Highly technical business user: OK

DataRobot

  • Advanced Data Scientists: Good
  • JR Data Scientist / Data Engineer: 👍🏽 Very Good
  • Highly technical business user: 👍🏽 Very Good

GCP Cloud AutoML Tables

  • Advanced Data Scientists: OK (needs more logs) [perhaps I need to use the Python API 👂🏼]
  • JR Data Scientist / Data Engineer: OK (needs more logs)
  • Highly technical business user: OK 👎🏽 (the errors and the long training wait time really worked against this; however, provided they fix this, I would give it a 👍🏽 Very Good)

AWS SageMaker

  • Advanced Data Scientists: Very Good
  • JR Data Scientist / Data Engineer: Great
  • Highly technical business user: OK

Accuracy

Amazingly, the top results were all within fractions of each other; basically, the results were all the same. This may not hold for every problem. It also shows that a lot of model accuracy really depends on the data and the feature engineering; high-performing models have become a commodity.

  • H2O.ai FLOW AutoML: StackedEnsemble_BestOfFamily_AutoML_20190423_142023, 0.929324653437722 AUC
  • DataRobot: eXtreme Gradient Boosted Trees Classifier with Early Stopping, 0.929 AUC
  • GCP Cloud AutoML Tables: [Unknown Model], 0.929 AUC
  • AWS SageMaker: TBD

Explainability

The problem with chasing accuracy is that oftentimes the winning model lacks transparency, or explainability as some like to call it. GCP AutoML has very little built-in tooling for explainability. Perhaps I have more to learn here, but I could only find a boiled-down list of informative features.

  • H2O.ai FLOW AutoML: Good
  • DataRobot: 👍🏽 Great. DataRobot shines in this space. It comes with a lot of very pretty graphs and ways to explore and compare results.
  • GCP Cloud AutoML Tables: OK
  • AWS SageMaker: Good

Production-worthiness

To go to production often means hosting inferences or predictions.

  • H2O.ai FLOW AutoML: means writing code. There is some code required and it is largely hands-on; the good news is the dependencies are usually handy JAR files (MOJO / POJO: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html). There is certainly more I need to learn about building inference pipelines in H2O 👂🏼; see the sketch after this list.
  • DataRobot: has a couple of different ways, including a dedicated prediction server, a shared server, or an interesting feature with DR Prime called Approximation Models (exporting static Python or Java files).
  • GCP Cloud AutoML Tables: going to production was super easy and seamless; basically, the endpoint is auto-created. Purely SaaS means no true export.
  • AWS SageMaker: also makes it very easy to host a prediction endpoint. Purely SaaS means no true export, but on the other hand you can import models to host.
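As for the H2O sketch promised above: a minimal path from an AutoML run to an exported MOJO, assuming the h2o Python package (FLOW exposes the same artifacts through its UI; the file path is a placeholder):

    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    train = h2o.import_file("adult_with_header.csv")  # placeholder path
    train["target"] = train["target"].asfactor()      # classification target

    aml = H2OAutoML(max_runtime_secs=600)
    aml.train(y="target", training_frame=train)

    # Export the leading model as a MOJO, deployable with the h2o-genmodel JAR.
    aml.leader.download_mojo(path=".")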

Please note, I did not cover retraining models; they all have their own ways of doing so, and what is needed really depends on the use case and the organization.

Summary on Cloud AutoML

The summary is:

It’s too soon to say

There are some problems I experienced, and I still do not know why. The philosophy of not knowing which model delivered which result escapes me. There are not enough tools for explainability.

On the plus side, my feeling was that some parts of AutoML are simple and useful in their own right. Very much in a Google way, what was done was done well, with the exception of useful logging and more transparency into what is really going on.

My suggestion is to keep an eye on this one. Perhaps a grown-up version can lower the barrier to entry for a business audience to start using ML. I’m happy to hear others’ feedback as well 👂🏼.

References

[1] BioMed Research International, Volume 2014, Article ID 781670, 11 pages. http://dx.doi.org/10.1155/2014/781670

[2] Torry Yang’s pipeline example, https://github.com/GoogleCloudPlatform/python-docs-samples/blob/302a6950a4090f7e6f80385a4e1205339fcf7c1d/tables/automl/pipeline/README.md, or https://cloud.google.com/automl-tables/docs/models?_ga=2.6483185.-256339275.1549464869 from the API docs

[3] “Amazon SageMaker Automatic Model Tuning: Using Machine Learning for Machine Learning” by Randall Hunt, 07 Jun 2018. https://aws.amazon.com/blogs/aws/sagemaker-automatic-model-tuning/

[4] Stephen Darling, “OAuth2 Authentication with Google Cloud Run”. https://medium.com/@stephen.darling/oauth2-authentication-with-google-cloud-run-700015a092c2

[5] Cramér’s V correlation https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V

[6] The Census Income Data Set used for training is hosted by the UC Irvine Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Census+Income
