
ML Hyperparameters

(Jungfrau, Switzerland - Alvin Wei-Cheng Wong)

- Overview

In machine learning (ML), hyperparameters are configuration variables set before training a model. They control the learning process itself and significantly affect the model's performance, unlike parameters, which are learned directly from the data during training.

Essentially, hyperparameters are settings that you can adjust to optimize how a model learns from data. 

Please refer to the following for more information:
 

- Hyperparameters

Hyperparameters are external configuration variables that data scientists use to manage the training of machine learning models. Sometimes called model hyperparameters, they are set manually before training a model. They are different from parameters, which are internal values derived automatically during the learning process and are not set by the data scientist.

Examples of hyperparameters include the number of nodes and layers in a neural network and the number of branches in a decision tree. Hyperparameters determine key features such as model architecture, learning rate, and model complexity. 

Key characteristics of hyperparameters:

  • Set before training: Unlike model parameters, hyperparameters are not learned from the data but are defined beforehand.
  • Impact on learning process: They influence how the model learns, affecting its accuracy, generalization ability, and other metrics.
  • Tuning process: Optimizing model performance often involves "hyperparameter tuning," where you experiment with different values of the hyperparameters to find the best combination.


Examples of hyperparameters (a code sketch follows this list):

  • Learning rate: How much the model updates its parameters at each training step
  • Number of hidden layers: In a neural network, how many layers are present between the input and output layers
  • Regularization strength: A technique to prevent overfitting by penalizing complex models
  • Maximum depth of a tree: In decision tree algorithms, how deep the tree can grow
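
To make this concrete, here is a minimal sketch, assuming scikit-learn is available, of how the hyperparameters above are passed to a model before training; the model's parameters (weights, split points) are then learned from the data by fit().

# A minimal sketch (assumes scikit-learn is installed): hyperparameters are
# supplied to the constructor before training, not learned from the data.
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

# Maximum depth of a tree: limits how deep the decision tree may grow
tree = DecisionTreeClassifier(max_depth=4)

# Learning rate, number of hidden layers, and regularization strength
net = MLPClassifier(
    hidden_layer_sizes=(64, 32),   # two hidden layers between input and output
    learning_rate_init=0.01,       # step size for each parameter update
    alpha=1e-4,                    # L2 regularization strength to curb overfitting
    max_iter=500,
)
# Calling tree.fit(X, y) or net.fit(X, y) would then learn the internal parameters.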
 
[Machine Learning Hyperparameters]
 

- Hyperparameter Tuning 

When training an ML model, each dataset and model requires a different set of hyperparameters. The only way to determine good values is through experimentation: selecting a set of hyperparameters, training the model with them, and evaluating the outcome. This process is called hyperparameter tuning.

Essentially, you are training the model sequentially with different sets of hyperparameters. This process can be manual, or you can choose one of several automated hyperparameter tuning methods. 

Regardless of which method you use, you will need to track the results of your experiments. You must apply some form of evaluation, such as comparing the loss function on a validation set, to determine which set of hyperparameters produces the best results. Hyperparameter tuning is a non-trivial and computationally intensive process.
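
As an illustration of this sequential process, the sketch below (assuming scikit-learn is installed, and using a small synthetic dataset) trains the same model with several candidate learning rates and tracks the validation loss of each run:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

# Synthetic data standing in for a real dataset
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

results = {}
for lr in [0.1, 0.01, 0.001]:  # candidate values for one hyperparameter
    model = MLPClassifier(learning_rate_init=lr, max_iter=300, random_state=0)
    model.fit(X_train, y_train)
    # Track the validation loss achieved by each hyperparameter setting
    results[lr] = log_loss(y_val, model.predict_proba(X_val))

best_lr = min(results, key=results.get)
print(f"Best learning rate: {best_lr} (validation loss {results[best_lr]:.3f})")

In practice the same loop would cover combinations of several hyperparameters at once, which is why the automated methods described in the next section are widely used.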

 

- Hyperparameter Tuning Techniques

Hyperparameter tuning techniques are methods for finding optimal hyperparameter values - the settings that control the learning process of a machine learning model but are not learned from the data itself - by systematically trying different combinations of values and measuring the model's performance on a validation set. Common techniques include grid search, random search, and Bayesian optimization, which differ in how they explore the space of possible hyperparameter values to find the best combination.

Key hyperparameter tuning techniques (a code sketch follows this list):

  • Grid Search: Exhaustively tests every possible combination of hyperparameter values within a specified range, ensuring thorough exploration, but it can be computationally expensive for large parameter spaces.
  • Random Search: Randomly samples combinations of hyperparameter values, often faster than grid search but may miss the optimal configuration.
  • Bayesian Optimization: Uses a probabilistic model to intelligently explore the hyperparameter space, leveraging information from previous evaluations to guide the search towards promising areas, balancing speed and accuracy.
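
The sketch below contrasts grid search and random search; it assumes scikit-learn's GridSearchCV and RandomizedSearchCV, but any framework offering cross-validated search would work similarly:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# The hyperparameter space to explore
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# Grid search: evaluates every combination with 3-fold cross-validation
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)
print("Grid search best:", grid.best_params_)

# Random search: samples only n_iter combinations from the same space
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=5, cv=3, random_state=0)
rand.fit(X, y)
print("Random search best:", rand.best_params_)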

 

Example hyperparameters that can be tuned (see the sketch after this list):

  • Learning rate in gradient descent algorithms
  • Number of trees in a Random Forest
  • Maximum depth of a decision tree
  • Number of hidden layers in a neural network
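
As an example of the Bayesian-style approach applied to several of the hyperparameters above, the sketch below assumes the third-party Optuna library (whose default sampler is a Tree-structured Parzen Estimator, a sequential model-based method) alongside scikit-learn; a gradient boosting model is used here because it exposes a learning rate, a number of trees, and a maximum depth in a single estimator:

import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

def objective(trial):
    # Each trial proposes one combination of hyperparameter values
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
    }
    model = GradientBoostingClassifier(random_state=0, **params)
    # Mean cross-validated accuracy is the score the optimizer tries to maximize
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print("Best hyperparameters:", study.best_params)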
 

[More to come ...]

