Image classifier: preparing the data


Data augmentation is a powerful technique for improving the performance of deep learning models when working with limited datasets. It lets you get the most out of a small dataset by generating variations of existing images to train more robust and accurate models.

Why do we need data augmentation on small datasets?

When working with small image datasets (such as the one mentioned above, with approximately 400 images per class), we face a fundamental problem: the lack of diversity in the data does not allow the model to capture all the relevant features. This can lead to overfitting, where the model memorizes the training examples instead of learning generalizable patterns.

Data augmentation solves this problem by applying transformations to existing images, effectively creating new training examples without the need to collect more data. These transformations preserve the semantic information of the image (what it represents) while altering aspects such as orientation, size, cropping, and other visual attributes.
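
To make this concrete, here is a minimal sketch (using torchvision, which is set up in the next section, and a hypothetical helmet.jpg image): applying the same random transformation several times to one image yields several distinct training examples from a single original.

from PIL import Image
from torchvision import transforms

# Hypothetical example image; replace with any image from your dataset
image = Image.open("helmet.jpg").convert("RGB")

# A random rotation: each call produces a slightly different variant
augment = transforms.RandomRotation(degrees=30)

variants = [augment(image) for _ in range(4)]  # four new training examples from one image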

Setting up the environment for PyTorch

Before implementing data augmentation techniques, we need to configure our environment:

  1. Installing dependencies: If you are working in Google Colab, PyTorch comes pre-installed. For local environments, install it with:

pip install torch torchvision torchaudio

  2. Importing the required libraries:

import torch
import torchvision
# Other libraries for image manipulation and visualization

  3. Enabling CUDA for GPU processing:

# Enable cuDNN for parallelized GPU processing
torch.backends.cudnn.enabled = True

# Define the device to use
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
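
As a quick, optional sanity check, you can confirm the installed version and the selected device before moving on:

print(torch.__version__)  # installed PyTorch version
print(device)             # "cuda:0" if a GPU is available, otherwise "cpu"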

What data augmentation techniques can we apply?

Data augmentation offers multiple transformations that we can apply to our images. Some of the most common ones include:

Padding (adding borders)

This consists of adding a border of a specific number of pixels around the original image. It can help the model become more robust to variations in the position of objects.

Resize

Modify the dimensions of the original image. This helps the model recognize objects regardless of their size in the image.

Random crop

Perform random cropping of the original image. For example, if the image shows a helmet, a crop could focus only on a part of the helmet. This teaches the model to recognize objects even when they are partially visible.

Perspective changes

Alter the perspective from which the object is viewed, which helps the model recognize objects from different angles.

Random rotations

Rotate the image at different angles, allowing the model to identify objects regardless of their orientation.
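
For reference, each of these operations has a counterpart in torchvision.transforms. The following sketch is only illustrative, and the parameter values are assumptions you would tune for your own dataset:

from torchvision import transforms

augmentations = transforms.Compose([
    transforms.Pad(padding=10),                          # padding: add a 10-pixel border
    transforms.Resize(256),                              # resize: change the image dimensions
    transforms.RandomCrop(224),                          # random crop: keep a random 224x224 region
    transforms.RandomPerspective(distortion_scale=0.3),  # perspective change
    transforms.RandomRotation(degrees=45),               # random rotation of up to 45 degrees
])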

How to implement data augmentation in PyTorch?

The implementation in PyTorch is straightforward thanks to the predefined transforms in torchvision.transforms:

from torchvision import transforms

# Transforms for the training set
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),    # random crop with resizing
    transforms.RandomHorizontalFlip(),    # random horizontal flip
    transforms.ToTensor(),                # conversion to tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])   # normalization
])

# Transformations for the validation set (generally less aggressive)
val_transforms = transforms.Compose([
    # Other transformations specific to validation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

It is important to note that we can apply different transformations to the training and validation sets. Generally, the transformations for validation are less aggressive, since we want to evaluate the model under conditions closer to the real ones.
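
To connect these transforms to actual data, a common pattern (not shown above, and assuming a hypothetical data/train and data/val folder layout) is to pass them to torchvision.datasets.ImageFolder and wrap the result in a DataLoader:

from torch.utils.data import DataLoader
from torchvision import datasets

# Hypothetical folder layout: data/train/<class_name>/*.jpg and data/val/<class_name>/*.jpg
train_dataset = datasets.ImageFolder("data/train", transform=train_transforms)
val_dataset = datasets.ImageFolder("data/val", transform=val_transforms)

# The transforms are applied on the fly each time a batch is drawn,
# so every epoch sees slightly different versions of the training images
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)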

Tensor conversion and normalization

Two fundamental steps in the process are:

  1. Conversion to tensor: Images must be converted to tensors to be processed by PyTorch. A tensor is essentially a multidimensional array (a generalization of vectors and matrices), the native data type that PyTorch works with.

  2. Normalization: We apply normalization to standardize pixel values to a predefined mean and standard deviation. This helps the model converge faster during training.
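
As a small check of what these two steps do (the mean and standard deviation values are the common ImageNet statistics used above, and the image path is hypothetical):

from PIL import Image
from torchvision import transforms

image = Image.open("helmet.jpg").convert("RGB")  # hypothetical example image

to_tensor = transforms.ToTensor()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

tensor = to_tensor(image)       # shape [channels, height, width], values in [0, 1]
normalized = normalize(tensor)  # each channel re-centered and re-scaled

print(tensor.shape, tensor.min().item(), tensor.max().item())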

Data augmentation is an essential technique for getting the most out of small datasets, generating meaningful variations that help train more robust and accurate models. With the tools provided by PyTorch, it is accessible even to those who are just starting out in deep learning.

Have you ever used data augmentation techniques in your projects? Share your experience in the comments and tell us which transformations have been most effective for your specific use cases.
