Seven Steps to an ML Workflow
- Overview
In recent years, data has become an important currency. Large captured data sets yield valuable intelligence, which organizations use to make critical business decisions.
But ML goes far beyond simply storing data. It's about capturing, preserving, accessing and transforming data to interpret it and find its meaning - and ultimately its value.
The goal of ML is to teach computers how to behave using the data you input. Instead of writing code that tells the computer exactly what to do, you provide an algorithm that adapts based on examples of correct behavior.
ML algorithms are often developed using frameworks such as TensorFlow and PyTorch.
- 7 Steps to a Machine Learning (ML) Workflow
An ML workflow defines the phases of an ML project and describes the steps of its implementation.
Avoid forcing a model into a workflow that is too rigid. Flexible workflows let you start small and later upgrade to a "production-grade" solution, meaning one that can handle heavy use in a commercial or industrial environment.
While ML workflows vary from project to project, most share the same core stages. The following are 7 steps of an ML workflow.
- Step 1: Collect Data
Gathering data begins with defining the problem. Understanding the problem is critical to determining needs and the best solution.
For example, ML projects that use real-time data may rely on IoT systems with various data sensors. The initial data set can be collected from different sources such as databases, archives or sensors.
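As a rough illustration of combining sources, the sketch below merges rows from an archive export and a database into one data set. It uses only the Python standard library. The CSV text, the `readings` table and the field names are hypothetical and stand in for whatever sources a real project has.

```python
import csv
import io
import sqlite3

# Hypothetical archive export, shown inline as CSV text.
ARCHIVE_CSV = "sensor_id,temperature\ns1,21.5\ns2,19.0\n"


def rows_from_csv(text):
    """Parse CSV text into a list of dicts with typed values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"sensor_id": r["sensor_id"], "temperature": float(r["temperature"])}
        for r in reader
    ]


def rows_from_database(conn):
    """Read the same fields from a hypothetical 'readings' table."""
    cursor = conn.execute("SELECT sensor_id, temperature FROM readings")
    return [{"sensor_id": s, "temperature": t} for s, t in cursor]


# An in-memory database stands in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, temperature REAL)")
conn.execute("INSERT INTO readings VALUES ('s3', 22.25)")

# Both sources end up in one list with a shared schema.
dataset = rows_from_csv(ARCHIVE_CSV) + rows_from_database(conn)
```

Giving every source the same field names and types at collection time tends to make the later preparation step simpler.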
- Step 2: Prepare Data (or Data Preprocessing)
Data preprocessing is a data mining technique that involves transforming raw data into an understandable format.
This means cleaning and formatting the raw data, which usually cannot be used to train ML models as-is. Additionally, most ML models operate only on numbers, so ordinal and categorical data must be converted into numerical features.
Real-world data is often incomplete and inconsistent, and it may lack certain behaviors or trends. Data preprocessing applies basic transformations to the data so that a model can consume it.
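A minimal sketch of these transformations follows. It fills a missing value, maps an ordinal column to integers and one-hot encodes a categorical column. The records, column names and the choice of mean imputation are assumptions made for illustration. In practice a library such as pandas or scikit-learn would usually handle this.

```python
from statistics import mean

# Hypothetical raw records: None marks a missing value.
raw_rows = [
    {"age": 34, "size": "small", "color": "red"},
    {"age": None, "size": "large", "color": "blue"},
    {"age": 50, "size": "medium", "color": "red"},
]

# Ordinal data has a natural order, so map it to increasing integers.
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}


def preprocess(rows):
    """Return one numeric feature vector per row."""
    # Fill missing ages with the mean of the observed ages.
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    age_fill = mean(known_ages)

    # Unordered categories get one 0/1 column each (one-hot encoding).
    colors = sorted({r["color"] for r in rows})

    features = []
    for r in rows:
        age = r["age"] if r["age"] is not None else age_fill
        one_hot = [1 if r["color"] == c else 0 for c in colors]
        features.append([float(age), SIZE_ORDER[r["size"]]] + one_hot)
    return features


features = preprocess(raw_rows)
# Column layout: [age, size, color_is_blue, color_is_red]
```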
- Step 3: Choose an ML Model
Considerations when selecting a model include performance (the quality of the model's results) and interpretability (the ease of interpreting the model's results).
Other considerations include dataset size (which affects how the data is processed and synthesized) and training time and cost (the resources needed to train the model).
- Step 4: Train The ML Model
There are 3 main steps to training a machine learning model:
- Start with existing data
- Analyze data to discover patterns
- Make predictions
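The three steps above can be sketched with one of the simplest possible models, a straight line fitted by ordinary least squares. The data values and the function names `fit_line` and `predict` are hypothetical. Real projects would typically train through a framework rather than by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# 1. Start with existing data (hypothetical hours studied vs. score).
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]

# 2. Analyze the data to discover the pattern (here, a linear trend).
model = fit_line(hours, scores)

# 3. Make a prediction for an input the model has not seen.
prediction = predict(model, 5.0)
```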
- Step 5: Evaluate ML Models
Three common metrics for evaluating a model are:
- Accuracy (the percentage of test-data predictions that are correct)
- Precision (of the cases the model assigns to a category, the share that actually belong to it)
- Recall (of the cases that actually belong to a category, the share that the model correctly identifies)
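To make the three metrics concrete, the sketch below computes them for a binary classifier from counts of true and false positives and negatives. The label lists are made-up test data, and returning 0.0 when a denominator is zero is a convention chosen here rather than a fixed rule.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical test labels and model predictions (1 = in the category).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)    # 7 of 10 predictions are correct
prec = precision(y_true, y_pred)  # 2 of 3 predicted positives are real
rec = recall(y_true, y_pred)      # 2 of 4 real positives are found
```

The example shows why accuracy alone can mislead: the model is right 70% of the time yet finds only half of the cases that belong to the category.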
- Step 6: Perform Hyperparameter Tuning
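Hyperparameters are settings chosen before training, such as the learning rate or the number of layers, that shape the model's architecture and how it learns. The process of searching for the values that produce the best model is called hyperparameter tuning.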
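One simple tuning strategy is a grid search: try each candidate value, score the resulting model on held-out validation data, and keep the best. The sketch below tunes `k`, the number of neighbors in a small k-nearest-neighbors classifier. The data points, labels and candidate values are hypothetical, and this is only one of several possible tuning approaches.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def validation_accuracy(train, val, k):
    """Fraction of validation points classified correctly for a given k."""
    correct = sum(1 for x, y in val if knn_predict(train, x, k) == y)
    return correct / len(val)


def grid_search(train, val, candidate_ks):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: validation_accuracy(train, val, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical 1-D data; the point at 2.2 carries a noisy label.
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (2.2, "b"),
         (6.0, "b"), (6.5, "b"), (7.0, "b")]
val = [(1.8, "a"), (2.3, "a"), (6.2, "b"), (2.4, "a")]

best_k, scores = grid_search(train, val, candidate_ks=[1, 3, 7])
```

Here `k=1` is misled by the noisy point and `k=7` always votes for the larger class, so the middle value scores best on the validation set.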
- Step 7: Deploy ML Models for Predictions
To deploy a model, you first set up a model resource in AI Platform Prediction, which runs your model in the cloud. A model resource is a container for different versions of an ML model.