This article is written by Rita Ribeiro, Data Scientist at Devoteam G Cloud Portugal
In this article, Data Scientist Rita Ribeiro shares her vision of Google Cloud products in the AI space, aiming to give you a roadmap of the possibilities within the AI domain.
We are living in the age of Big Data. To give you some perspective on this claim: according to Statista, the volume of data estimated to be produced in 2022 is around 97 Zettabytes, which is equivalent to 97 billion Terabytes of data produced, captured, copied and consumed by humankind in this year alone. Of this data, only about two percent is estimated to be stored for future usage, which is still 1.94 Zettabytes (1.94 billion Terabytes).
With this incredible amount of data being created, exciting new uses for this information appear every day and the AI/ML space is more relevant than ever. So, I want to share with you an overview of the Google products that will help you on this amazing journey into the AI space, regardless of whether you are starting your career in Data Science and exploring new tools, or you are already a Data Pro looking for new capabilities and resources to enhance your day-to-day work.
Google Colab: If you’re kicking off, this is your starting point
If you don’t have any prior experience with Cloud environments and usually run all your experiments on your local machine, this is definitely where you want to start.
Google Colab allows you to write code and run experiments directly from your browser, and offers a free tier where you can run experiments on a limited (yet pretty decent) machine with both CPU and GPU available. You can also connect to your Google Drive and manage your data folder directly from the interface, share your work with your colleagues very easily from the “Share” button, and save your notebook to a Google Drive folder of your choosing so that you can resume your work at any moment. Since Google Colab is built on top of Jupyter Notebooks, you can work with all your familiar Python libraries; and if Python is not your main language and you usually use R, Google Colab also offers an R kernel.
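For instance, mounting your Google Drive inside a Colab notebook only takes a couple of lines; the “datasets” folder below is just an example path, not something Colab creates for you:

```python
# Runs inside a Colab notebook: mount Google Drive so that your data
# folder becomes available on the notebook's file system.
from google.colab import drive

drive.mount('/content/drive')

# Example only: list the contents of an (assumed) "datasets" folder in My Drive.
import os
print(os.listdir('/content/drive/MyDrive/datasets'))
```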
Tip: if you decide to give Google Colab a try, I would highly recommend that you enable Corgi or Cat mode (or even both!). It won’t do anything special other than make you smile, but in my opinion it is absolutely necessary, and I never use Colab without these modes enabled.
BigQuery ML: Reliable models inside your Data Warehouse
BigQuery ML was designed to offer a democratised solution that allows you to create and deploy well-known models for well-known use cases within your Data Warehouse using standard SQL. This solution is, in my opinion, best suited for you if SQL is your main programming language and your data is tabular. BigQuery ML has a very comprehensive set of models that will help you with early experimentation and, the best part is, it eliminates the need to move data.
If your company is in the early stages of experimentation with predictive analytics and BigQuery is your Data Warehouse, BigQuery ML is the best product to help you create and deploy very good baseline models that showcase the potential of predictive analytics while leveraging your existing Data Warehouse and skills.
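To give you an idea of what this looks like in practice, here is a minimal sketch that trains and evaluates a logistic regression model with BigQuery ML through the BigQuery Python client; the project, dataset, table, and column names are placeholders, not real resources:

```python
# A minimal sketch using the BigQuery Python client; all resource names
# below (project, dataset, table, columns) are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project id

# BigQuery ML: train a logistic regression model with standard SQL,
# directly on the data already sitting in the warehouse.
train_query = """
CREATE OR REPLACE MODEL `my-project.my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_charges, churned
FROM `my-project.my_dataset.customers`
"""
client.query(train_query).result()  # waits for training to finish

# Evaluate the model without moving any data out of BigQuery.
eval_query = "SELECT * FROM ML.EVALUATE(MODEL `my-project.my_dataset.churn_model`)"
for row in client.query(eval_query).result():
    print(dict(row))
```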
Vertex AI: the managed platform of your dreams
Vertex AI is a relatively recent Google product (it launched in 2021) but, in my opinion, it is one of the best and most comprehensive AI platforms available on the market. Long gone are the days when the Data Scientist needed to be a one (wo)man IT army and carry out the entire Data Science process, from Exploratory Data Analysis to Model Monitoring, in a painstaking way due to the complexity of the underlying infrastructure. It is often said that most machine learning models never make it to production, and those that do can accumulate considerable technical debt, so it is more important than ever to have a platform that not only provides several services and integrations, but is also built to centralise and provide visibility into your Data Science artefacts (if you want to know more about technical debt in ML, I would highly recommend you read the Google paper “Hidden Technical Debt in Machine Learning Systems”). Vertex AI and all its incredible features will empower you to create end-to-end ML products with much less friction and, hopefully, less complexity.
In the next paragraphs, I will present some modules that will help you take an ML product from a first prototype all the way to production, but let me note that this is only a small subset of Vertex AI capabilities, described from a Data Scientist’s perspective. Vertex AI is a very comprehensive platform and there is a high likelihood that it already has a solution you can use to expedite your process.
Vertex AI Workbench
- This is your control centre. From this single development environment, you can manage the entirety of your Data Science project. In the Workbench you can set up all the requirements for your project and start your notebooks, and you will find several notebook integrations, from data exploration using SQL within notebook cells to quick model integration with established MLOps workflows, without the need to write new code or new flows. Note that you can import your BigQuery ML models into Vertex AI and use them from this interface.
AutoML
- If you need to establish a quick and simple baseline model with little to no code involved, this is the feature for you. You can use AutoML for a multitude of use cases, from tabular to image or video data. Once training is done, you will have access to the model’s performance metrics, feature importance, and hyperparameters, in case you wish to replicate the experiment in the future. You can also choose to deploy the model, which can easily be done within Vertex AI.
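As a rough illustration, this is what an AutoML tabular classification job could look like with the Vertex AI Python SDK; the project, region, bucket, and column names below are assumptions:

```python
# A hedged sketch of AutoML on tabular data with the Vertex AI Python SDK.
# Project, location, bucket and column names are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west1")

# Create a tabular dataset from a CSV file in Cloud Storage (assumed path).
dataset = aiplatform.TabularDataset.create(
    display_name="churn-data",
    gcs_source="gs://my-bucket/churn.csv",
)

# Launch an AutoML training job for a classification target column.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-automl",
    optimization_prediction_type="classification",
)
model = job.run(
    dataset=dataset,
    target_column="churned",
    budget_milli_node_hours=1000,  # roughly one node hour of training
)

# Optionally deploy the trained model to an endpoint in one call.
endpoint = model.deploy(machine_type="n1-standard-4")
```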
Vertex AI Vizier
- If you choose to write a custom model, you will surely have a hyperparameter tuning step, which is well known for its complexity and time-consuming nature. Vertex AI offers an incredibly powerful, yet very friendly, service that will run this search for you and return the best parameters it finds.
Disclaimer: Vizier is a black-box optimisation engine. It shouldn’t make much of a difference for most use cases, but you might want to keep this detail in mind.
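As a hedged sketch, a hyperparameter search (backed by Vizier) can be launched with the Vertex AI SDK along these lines; the training container image, metric name, and parameter ranges are assumptions, and your training code would need to report the metric itself (for example with the cloudml-hypertune library):

```python
# A sketch of a Vertex AI hyperparameter tuning job; all names are placeholders.
from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(
    project="my-project",
    location="europe-west1",
    staging_bucket="gs://my-bucket",  # assumed staging bucket
)

# The custom training job that each trial will run (assumed container image).
worker_pool_specs = [{
    "machine_spec": {"machine_type": "n1-standard-4"},
    "replica_count": 1,
    "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},
}]
custom_job = aiplatform.CustomJob(
    display_name="churn-trainer",
    worker_pool_specs=worker_pool_specs,
)

# The search space and the metric the trials report back.
tuning_job = aiplatform.HyperparameterTuningJob(
    display_name="churn-hp-search",
    custom_job=custom_job,
    metric_spec={"val_accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        "max_depth": hpt.IntegerParameterSpec(min=3, max=10, scale="linear"),
    },
    max_trial_count=20,
    parallel_trial_count=4,
)
tuning_job.run()
```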
Vertex AI Feature Store
- During your Exploratory Data Analysis you have very likely done some form of Feature Engineering. These features might be applicable to other models or projects and, by using Feature Store, you can keep them in a centralised repository. This service enables sharing, discovery, and reusability of features at scale, which will, hopefully, help teams accelerate the development and deployment of new ML applications.
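A minimal sketch of registering features with the Vertex AI SDK could look like this; the featurestore id, entity type, and feature names are just examples:

```python
# A hedged sketch of creating a feature store and registering features;
# all ids and value types below are examples, not real resources.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west1")

# A featurestore holds entity types (e.g. "customer"), which in turn hold features.
fs = aiplatform.Featurestore.create(
    featurestore_id="my_featurestore",
    online_store_fixed_node_count=1,  # enables online serving
)
customer = fs.create_entity_type(
    entity_type_id="customer",
    description="Features describing a single customer",
)
customer.create_feature(feature_id="tenure_months", value_type="INT64")
customer.create_feature(feature_id="monthly_charges", value_type="DOUBLE")
```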
Vertex AI Pipelines
- As the name suggests, this service lets you manage your entire end-to-end pipeline, from analysis to deployment. Since doing these steps manually, one by one, can be very cumbersome and time-consuming, this service acts as a platform where you can orchestrate all of these stages as a pipeline, helping to reduce complexity and time-to-product. It will also help you manage ML artefacts, dependencies and lineage (which are well-known sources of technical debt).
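To make this more concrete, here is a toy pipeline defined with the Kubeflow Pipelines SDK and submitted to Vertex AI Pipelines; the component logic, project, and bucket are placeholders:

```python
# A minimal Kubeflow Pipelines (v2) sketch submitted to Vertex AI Pipelines;
# the component does nothing useful, and the project/bucket are assumptions.
from kfp import dsl, compiler
from google.cloud import aiplatform


@dsl.component
def train_model(learning_rate: float) -> str:
    # Placeholder training step; in practice this would fit and save a model.
    return f"trained with lr={learning_rate}"


@dsl.pipeline(name="toy-training-pipeline")
def pipeline(learning_rate: float = 0.01):
    train_model(learning_rate=learning_rate)


# Compile the pipeline definition to a local JSON file.
compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")

# Submit it to Vertex AI Pipelines for execution.
aiplatform.init(project="my-project", location="europe-west1")
job = aiplatform.PipelineJob(
    display_name="toy-training-pipeline",
    template_path="pipeline.json",
    pipeline_root="gs://my-bucket/pipeline-root",  # assumed bucket
)
job.run()
```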
Vertex AI Model Monitoring
- As you might have experienced before, once a model hits production it often loses performance at some point due to model or data drift. With this in mind, this service acts as a system that automatically monitors the input data of your deployed model, issues a warning when data drift or training-serving skew is detected, and lets you re-train the model when necessary.
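As a hedged example, drift monitoring on an already deployed endpoint can be configured with the Vertex AI SDK roughly like this; the endpoint id, e-mail address, feature name and threshold are assumptions:

```python
# A hedged sketch of setting up drift monitoring on a deployed endpoint;
# the endpoint id, e-mail address and thresholds below are placeholders.
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-project", location="europe-west1")

# Reference an endpoint that already serves the model (assumed id).
endpoint = aiplatform.Endpoint("1234567890")

monitoring_job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="churn-model-monitoring",
    endpoint=endpoint,
    # Sample 80% of prediction requests and check them every hour.
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),
    # Send an alert e-mail when drift exceeds the threshold for this feature.
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["me@example.com"]),
    objective_configs=model_monitoring.ObjectiveConfig(
        drift_detection_config=model_monitoring.DriftDetectionConfig(
            drift_thresholds={"monthly_charges": 0.05}
        )
    ),
)
```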
With these few services you can build and productise your ML models in a fairly elegant way, but I would advise that, for more complex deployment options, you might want to leverage the help of an MLOps colleague. Other services that I also find very useful for more mature product pipelines are Vertex Explainable AI, Vertex ML Metadata, and Vertex AI TensorBoard.
Conclusion
With this article, I shared my personal vision of, and experience with, Google Cloud products in the AI space. I wrote it from the perspective of a Data professional who felt quite overwhelmed when she was starting her journey with Cloud environments, and hopefully it will give you a roadmap to the possibilities within this incredible space.