Configuring Pretrained Models for Segmentation with YOLO


Image segmentation with YOLO represents one of the most advanced techniques in computer vision, making it possible to identify and delimit specific objects in real time. This technology not only detects objects, but also creates precise masks that separate them from the background, offering revolutionary applications in fields such as robotics, surveillance and video analysis.

What is YOLO and why is it important for image segmentation?

YOLO (You Only Look Once) is an object detection algorithm that analyzes an image in a single pass, which makes it extremely fast. The newest versions of YOLO are developed by Ultralytics, which maintains an open-source library with a large support community.

The main features of YOLO include:

  • Real-time processing
  • Image segmentation
  • Object detection
  • Pose estimation (detection of body keypoints)

YOLO's popularity is due to its efficiency and versatility, allowing implementations on a variety of devices, from powerful GPUs to resource-constrained systems such as standard CPUs.

How to implement segmentation with YOLO in real time?

To implement segmentation with YOLO we need some fundamental tools:

  • OpenCV for image processing
  • NumPy for mathematical operations
  • Ultralytics for accessing YOLO models

If you don't have the Ultralytics library installed yet, you can install it with pip: pip install ultralytics.

Selecting the appropriate model

For segmentation, we will use YOLOv11 in its "nano" variant, the smallest model available. This choice is ideal when working on a CPU instead of a GPU, since it requires fewer computational resources.

# Import the necessary libraries
import time

import cv2
import numpy as np
from ultralytics import YOLO

# Load the YOLOv11 nano segmentation model
model = YOLO('yolo11n-seg.pt')

# Set video source (0 for the main camera, 1 for a second camera)
cap = cv2.VideoCapture(1)

# Set resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

Frame processing and latency measurement

An important aspect when working with real-time segmentation is to measure latency, that is, how long it takes the system to process each frame:

while True:
    # Capture frame
    ret, frame = cap.read()
    if not ret:
        break

    # Measure start time
    start_time = time.time()

    # Process frame with YOLO
    results = model(frame)

    # Calculate latency
    latency = time.time() - start_time

    # Display latency on the frame
    cv2.putText(frame, f"Latency: {latency:.3f}s", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
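Latency and frame rate are two views of the same number: FPS is the inverse of the per-frame latency. The helper below is an illustrative sketch (the names fps_from_latency and LatencySmoother are mine, not from the lesson) that also averages the reading over the last few frames, which makes the on-screen value much less jittery:

```python
from collections import deque


def fps_from_latency(latency_s):
    """Convert per-frame latency in seconds to frames per second."""
    return 1.0 / latency_s if latency_s > 0 else float("inf")


class LatencySmoother:
    """Rolling average over the last `window` latency samples."""

    def __init__(self, window=30):
        self.samples = deque(maxlen=window)

    def add(self, latency_s):
        self.samples.append(latency_s)

    def average(self):
        return sum(self.samples) / len(self.samples)


smoother = LatencySmoother(window=3)
for lat in (0.050, 0.040, 0.060):
    smoother.add(lat)

print(round(smoother.average(), 3))     # → 0.05
print(round(fps_from_latency(0.050)))   # → 20 (20 FPS at 50 ms per frame)
```

Feeding smoother.average() into the putText call instead of the raw latency gives a much more stable on-screen readout.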

Visualization of bounding boxes and segmentation

Segmentation with YOLO provides two main elements:

  1. Bounding boxes: Rectangles enclosing the detected objects.
  2. Segmentation masks: Colored areas that delimit exactly the shape of the object.
# Access detections
for r in results:
    boxes = r.boxes
    masks = r.masks

    # Process each detection
    for i, box in enumerate(boxes):
        # Get bounding box coordinates
        x1, y1, x2, y2 = map(int, box.xyxy[0])

        # Get confidence and class
        confidence = float(box.conf[0])
        class_id = int(box.cls[0])

        # Draw bounding box
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Add label (model.names maps class ids to class names)
        label = f"{model.names[class_id]}: {confidence:.2f}"
        cv2.putText(frame, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

Creating and applying segmentation masks

The most distinctive part of segmentation is the creation of masks that are superimposed on the original image:

# Process segmentation masks
if masks is not None:
    for i, mask in enumerate(masks):
        # Resize mask to the frame size
        mask_image = mask.data[0].cpu().numpy()
        mask_image = cv2.resize(mask_image, (frame.shape[1], frame.shape[0]))

        # Create a boolean mask
        mask_bool = mask_image > 0.5

        # Generate a random color for the mask
        color = np.random.randint(0, 255, 3, dtype=np.uint8)

        # Apply the color to the mask
        colored_mask = np.zeros_like(frame)
        colored_mask[mask_bool] = color

        # Blend the mask with the original frame
        frame = cv2.addWeighted(frame, 1, colored_mask, 0.5, 0)
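Under the hood, cv2.addWeighted computes a per-pixel weighted sum, output = frame * alpha + mask * beta + gamma, saturated to the 8-bit range. A NumPy-only sketch of that blend (illustrative; cv2.addWeighted rounds where NumPy's cast truncates, but the idea is identical):

```python
import numpy as np


def blend(frame, colored_mask, alpha=1.0, beta=0.5, gamma=0.0):
    """Weighted per-pixel sum clipped to the uint8 range,
    mirroring cv2.addWeighted(frame, alpha, colored_mask, beta, gamma)."""
    out = (frame.astype(np.float32) * alpha
           + colored_mask.astype(np.float32) * beta
           + gamma)
    return np.clip(out, 0, 255).astype(np.uint8)


# Toy 1x2 BGR "frame": one black pixel, one bright pixel
frame = np.array([[[0, 0, 0], [200, 200, 200]]], dtype=np.uint8)
mask = np.array([[[0, 255, 0], [0, 255, 0]]], dtype=np.uint8)  # green overlay

blended = blend(frame, mask)
print(blended.tolist())  # → [[[0, 127, 0], [200, 255, 200]]]
```

Note how the bright pixel's green channel saturates at 255 instead of overflowing; that clipping is what keeps the overlay from corrupting the image.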

How to filter specific objects in segmentation?

One of the advantages of YOLO is the ability to filter specific objects according to our needs. We can do this in two ways:

Filtering by confidence

We can set a confidence threshold to show only detections with high probability:

# Set the confidence threshold
confidence_threshold = 0.7

# Filter detections by confidence
results = model(frame, conf=confidence_threshold)
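The conf argument applies this cutoff inside the model call, but the same idea is easy to see on a bare array of scores. A NumPy sketch with made-up confidence values:

```python
import numpy as np

# Hypothetical confidence scores for five detections
confidences = np.array([0.95, 0.40, 0.72, 0.65, 0.88])
boxes = np.arange(5)  # stand-in indices for the matching bounding boxes

confidence_threshold = 0.7
keep = confidences > confidence_threshold  # boolean mask of detections to keep

print(boxes[keep].tolist())        # → [0, 2, 4]
print(confidences[keep].tolist())  # → [0.95, 0.72, 0.88]
```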

Filtering by object class

YOLO comes pre-trained to detect 80 different categories of objects. We can filter to show only the classes we are interested in:

# Classes of interest (0 = person, 56 = chair)
classes_of_interest = [0, 56]

# Filter by class and confidence
results = model(frame, conf=0.7, classes=classes_of_interest)

The YOLO pre-trained model assigns a number to each object category. For example:

  • 0: Person
  • 56: Chair
  • 41: Cup
  • 77: Teddy bear

This allows great flexibility for specific applications, such as security systems that only detect people or inventory applications that focus on particular products.
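For quick reference, a short excerpt of that id-to-name mapping written as a plain dictionary (hand-written for illustration from the standard COCO ordering; at runtime the full 80-entry mapping is available as model.names):

```python
# Excerpt of the COCO class-index mapping used by the pretrained YOLO models.
COCO_CLASSES = {
    0: "person",
    13: "bench",
    41: "cup",
    56: "chair",
    77: "teddy bear",
}


def ids_for(names, mapping=COCO_CLASSES):
    """Look up class ids for a list of class names."""
    reverse = {v: k for k, v in mapping.items()}
    return [reverse[n] for n in names]


print(ids_for(["person", "chair"]))  # → [0, 56]
```

A helper like this makes the classes argument readable: model(frame, classes=ids_for(["person", "chair"])) instead of bare numbers.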

Segmentation with YOLO represents a powerful tool for image analysis, combining speed and accuracy in a single system. Its ability to process video in real time and generate accurate masks opens up a world of possibilities for developers and researchers. Have you tried implementing YOLO in any of your projects? Share your experience in the comments.
