Parameters in AI Systems
- Overview
Parameters are internal variables that a machine learning (ML) model adjusts during training to improve its ability to make accurate predictions. They act as the "knobs" of the model, tuned automatically from the data provided.
In deep learning (DL), parameters consist primarily of the weights assigned to connections between simple processing units called neurons, along with the bias terms attached to those units. Imagine a large network of interconnected neurons where the strength of each connection is a parameter.
The total number of parameters in a model is affected by a variety of factors. The structure of the model and the number of "layers" of neurons play an important role. In general, more complex models with more layers tend to have more parameters.
Specialized components of a particular DL architecture, such as embedding tables or attention heads, can further increase the overall parameter count. Understanding how many parameters a model has is critical to designing it effectively; the sketch below shows how layer sizes translate into a parameter count.
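As a minimal sketch of how architecture determines parameter count, the snippet below builds a tiny feed-forward network and counts its weights and biases. It assumes PyTorch purely for illustration; the text does not prescribe any particular framework.

```python
import torch.nn as nn

# A tiny feed-forward network: 10 inputs -> 32 hidden units -> 1 output.
model = nn.Sequential(
    nn.Linear(10, 32),  # weights: 10 * 32 = 320, biases: 32
    nn.ReLU(),          # activation: adds no parameters
    nn.Linear(32, 1),   # weights: 32 * 1 = 32, biases: 1
)

# Total parameters: 320 + 32 + 32 + 1 = 385.
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total}")  # -> Total parameters: 385
```

Adding layers or widening existing ones grows this count quickly, which is why deeper, more complex models tend to have more parameters.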
More parameters can help a model capture complex data patterns, potentially improving accuracy. However, there is a delicate balance to strike. If a model has too many parameters, it may memorize specific examples from the training data instead of learning the underlying patterns, a failure mode known as overfitting. As a result, it may perform poorly when presented with new, unseen data. Achieving the right balance of parameters is a key consideration in model development.
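To make the balance concrete, here is a minimal sketch (assuming a toy noisy sine-curve dataset, invented for illustration) that fits polynomials with few and many parameters. The oversized model scores nearly perfectly on the data it memorized but degrades on unseen points:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy training data

x_new = np.linspace(0, 1, 200)
y_new = np.sin(2 * np.pi * x_new)                       # unseen, noise-free data

for degree in (3, 15):  # a degree-d polynomial has d + 1 parameters
    p = Polynomial.fit(x, y, degree)
    train_mse = np.mean((p(x) - y) ** 2)
    test_mse = np.mean((p(x_new) - y_new) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

The 16-parameter fit chases the noise (low training error, high test error), while the 4-parameter fit generalizes better.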
In recent years, the AI community has witnessed the emergence of what are often referred to as “mega models.” These models have an astonishing number of parameters, running into billions or even trillions. While these huge models achieve extraordinary performance, they are computationally expensive.
Effectively managing and training such large-scale models has become a prominent and active area of research and discussion in the AI community.
- Hyperparameters in ML/DL
Hyperparameters are parameters whose values control the learning process and determine the values of model parameters that a learning algorithm ends up learning. The prefix ‘hyper’ suggests that they are ‘top-level’ parameters that control the learning process and the model parameters that result from it.
As an ML engineer designing a model, you choose and set the hyperparameter values that your learning algorithm will use before the training of the model even begins.
In ML/DL, a model is defined, or represented, by its model parameters. Training a model, however, also involves choosing the optimal hyperparameters that the learning algorithm will use to learn the optimal parameters: the values that correctly map the input features (independent variables) to the labels or targets (dependent variable) so that the model achieves some form of intelligence.
Model training typically starts with the parameters initialized to some values (random values, or zeros). As training progresses, these initial values are updated by an optimization algorithm (e.g., gradient descent). The learning algorithm continuously updates the parameter values as learning progresses, but the hyperparameter values set by the model designer remain unchanged. At the end of the learning process, the model parameters are what constitute the model itself.
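The snippet below is a minimal sketch of this loop: gradient descent on a toy linear-regression problem (the data and values are invented for illustration). The learning rate and epoch count are hyperparameters fixed before training; the weight and bias are the parameters the algorithm updates.

```python
import numpy as np

# Toy data drawn from y = 2x + 1 plus noise (assumed for illustration).
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 100)
y = 2 * x + 1 + rng.normal(0, 0.1, 100)

# Hyperparameters: set by the designer before training, never updated.
learning_rate = 0.1
epochs = 200

# Parameters: initialized, then updated by the optimization algorithm.
w, b = 0.0, 0.0

for _ in range(epochs):
    y_pred = w * x + b
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean((y_pred - y) * x)
    grad_b = 2 * np.mean(y_pred - y)
    w -= learning_rate * grad_w  # gradient descent update
    b -= learning_rate * grad_b

print(f"Learned parameters: w={w:.3f}, b={b:.3f}")  # close to w=2, b=1
```

At the end, w and b, not the learning rate, are what the saved model consists of.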
- Parameters vs. Hyperparameters in ML/DL
In ML, the main difference between parameters and hyperparameters is that parameters are part of the resulting model, while hyperparameters are not.
- Parameters: Internal variables that are learned from the training data and adjusted during training to improve the model's performance. Parameters represent the underlying relationships in the data and are used to make predictions on new data.
- Hyperparameters: Settings used by the learning algorithm during training that are not part of the resulting model. They control the model's shape and behavior, determine how and what it can learn, and are typically set before training begins.
Hyperparameters are important because they directly impact the model's performance. For example, in Principal Component Analysis (PCA), the hyperparameter n_components, set before fitting, determines how many principal components are retained; the retained components (eigenvectors of the data's covariance matrix, together with their eigenvalues) are the fitted model's parameters.
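As a minimal sketch of that distinction (assuming scikit-learn, which the n_components naming suggests), n_components is fixed before fitting, while the principal axes learned from the data live in the fitted estimator:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # toy dataset: 200 samples, 5 features

pca = PCA(n_components=2)  # hyperparameter: chosen before fitting
pca.fit(X)

# Learned parameters: the retained principal axes and their eigenvalues.
print(pca.components_.shape)    # (2, 5): one row per retained component
print(pca.explained_variance_)  # eigenvalues of the covariance matrix
```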
[More to come ...]