
Data Augmentation


- Overview

Data augmentation is a technique that creates new training samples from existing data, typically by applying transformations that leave the label unchanged. It increases the size and diversity of a dataset, which can improve the performance of machine learning (ML) models.

Data augmentation is useful for addressing challenges like:

  • Limited training data: It can be difficult to source large, diverse datasets from the real world.
  • Class imbalance: In classification problems, some classes may be underrepresented in the training data. Augmenting those classes gives the model more examples to learn from and can improve its ability to classify them.
  • Overfitting: Training on varied versions of each example makes it harder for a model to memorize the training set, which can reduce overfitting and improve robustness.
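
As an illustration of the class imbalance case, the following is a minimal sketch, assuming a NumPy feature matrix and a simple Gaussian jitter as the augmentation. The names balance_with_augmentation and jitter are hypothetical helpers introduced here for illustration; they are not taken from any particular library, and jitter is only one of many possible augmentations.

```python
import numpy as np

def jitter(x, rng, scale=0.05):
    """Hypothetical augmentation: add small Gaussian noise to one sample."""
    return x + rng.normal(0.0, scale, size=x.shape)

def balance_with_augmentation(X, y, augment, rng):
    """Top up every class to the size of the largest class with augmented copies."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    new_X, new_y = [X], [y]
    for cls, count in zip(classes, counts):
        deficit = int(target - count)
        if deficit == 0:
            continue  # this class is already as large as the largest one
        # Pick existing samples of this class (with replacement) to augment.
        idx = np.flatnonzero(y == cls)
        picks = rng.choice(idx, size=deficit, replace=True)
        new_X.append(np.stack([augment(X[i], rng) for i in picks]))
        new_y.append(np.full(deficit, cls, dtype=y.dtype))
    return np.concatenate(new_X), np.concatenate(new_y)

# Toy dataset: 10 samples of class 0 and only 2 samples of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))
y = np.array([0] * 10 + [1] * 2)

X_bal, y_bal = balance_with_augmentation(X, y, jitter, rng)
print(np.bincount(y_bal))
```

The original samples are kept unchanged and only the underrepresented class receives augmented copies, so both classes end up with the same number of training examples.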


Some examples of data augmentation techniques include: 

  • Random cropping: Randomly cropping images to create new examples with different scales and aspect ratios
  • Flipping and rotation: Flipping images horizontally or vertically, or rotating them by some angle, to present objects in new orientations
  • Text transformations: Replacing words with synonyms, or randomly inserting, deleting, or swapping words in a sentence
  • Time stretching: Altering the speed of audio without changing its pitch
  • Pitch shifting: Modifying the pitch of audio while maintaining the same speed
  • Adding noise: Introducing background noise to simulate real-world environments
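
Several of the techniques listed above can be sketched with NumPy alone. The code below is an illustrative sketch, not a reference implementation: the function names, the probabilities, the tiny synonym table, and the use of white Gaussian noise are all assumptions made for this example. Time stretching and pitch shifting are left out because they are usually done with a dedicated audio library.

```python
import numpy as np

def random_crop(img, out_h, out_w, rng):
    """Cut a random out_h x out_w window from an (H, W, C) image."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - out_h + 1))
    left = int(rng.integers(0, w - out_w + 1))
    return img[top:top + out_h, left:left + out_w]

def random_flip_rotate(img, rng):
    """Flip horizontally with probability 0.5, then rotate by a random multiple of 90 degrees."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    return np.rot90(img, k=int(rng.integers(0, 4)))

def augment_text(sentence, synonyms, rng, p_replace=0.3, p_delete=0.1):
    """Randomly delete words, replace words with synonyms, then swap two words."""
    words = sentence.split()
    out = []
    for word in words:
        if len(words) > 1 and rng.random() < p_delete:
            continue  # random deletion
        if word in synonyms and rng.random() < p_replace:
            word = str(rng.choice(synonyms[word]))  # synonym replacement
        out.append(word)
    if not out:
        out = words[:1]  # never return an empty sentence
    if len(out) >= 2:
        i, j = rng.choice(len(out), size=2, replace=False)
        out[i], out[j] = out[j], out[i]  # random swap
    return " ".join(out)

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at approximately the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

rng = np.random.default_rng(42)

# Image: random 32x32 RGB array standing in for a real photo.
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
patch = random_flip_rotate(random_crop(image, 24, 24, rng), rng)

# Text: hypothetical synonym table; a real system would use a thesaurus or embedding model.
synonyms = {"quick": ["fast", "speedy"], "happy": ["glad"]}
text = augment_text("the quick dog is happy", synonyms, rng)

# Audio: one second of a 440 Hz tone at 16 kHz, with noise added at about 20 dB SNR.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = add_noise(tone, snr_db=20, rng=rng)

print(patch.shape, text, noisy.shape)
```

In practice each transformation is applied on the fly during training with a fresh random draw, so the model rarely sees exactly the same sample twice.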


Data augmentation is different from synthetic data generation: augmented samples are derived from existing data, whereas synthetic data is generated automatically and is entirely artificial.

 


 

 

 