Gaze Tracking and Analysis with MediaPipe


Face detection using artificial intelligence has revolutionized numerous fields, from security to augmented reality applications. In this article, we will explore how to implement an eye-tracking system using Python, OpenCV, and MediaPipe, focusing on facial landmark detection and, specifically, gaze tracking. This technology is fundamental to understanding human-computer interaction and has applications in fields as diverse as psychology, marketing, and interface development.

How does facial landmark detection work?

When we talk about advanced facial detection, we are referring to the identification of specific points or "landmarks" on the human face. MediaPipe offers a robust solution for this task, allowing us to accurately detect elements such as the eyes, nose, mouth, and facial contours.

To implement a basic eye tracking system, we need:

  • OpenCV for image processing
  • MediaPipe for facial point detection
  • Python as programming language

The base code for detecting the eye corners is structured as follows:

    import cv2
    import mediapipe as mp

    # MediaPipe Face Mesh configuration
    mp_face_mesh = mp.solutions.face_mesh
    face_mesh = mp_face_mesh.FaceMesh(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5
    )

    # Video capture
    cap = cv2.VideoCapture(1)  # Use 0 if there is only one camera

    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            break

        # Convert BGR to RGB
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Process frame
        results = face_mesh.process(rgb_frame)

        if results.multi_face_landmarks:
            for face_landmarks in results.multi_face_landmarks:
                # Get left eye coordinates (landmark 33)
                left_eye = face_landmarks.landmark[33]
                x_left = int(left_eye.x * frame.shape[1])
                y_left = int(left_eye.y * frame.shape[0])

                # Get right eye coordinates (landmark 263)
                right_eye = face_landmarks.landmark[263]
                x_right = int(right_eye.x * frame.shape[1])
                y_right = int(right_eye.y * frame.shape[0])

                # Draw circles at the eye corners
                cv2.circle(frame, (x_left, y_left), 3, (0, 255, 0), -1)
                cv2.circle(frame, (x_right, y_right), 3, (0, 255, 0), -1)

        # Show the result
        cv2.imshow('Eye Tracking', frame)
        if cv2.waitKey(5) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

It is important to note that MediaPipe works with relative coordinates (values between 0 and 1), so we must multiply them by the width and height of the frame to obtain the absolute coordinates in pixels.
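The conversion can be illustrated in isolation (the landmark values and frame size below are hypothetical, chosen only to show the arithmetic):

```python
# Hypothetical normalized landmark and frame size, to illustrate the conversion
frame_width, frame_height = 640, 480
landmark_x, landmark_y = 0.25, 0.5  # relative coordinates as MediaPipe returns them

# Multiply by the frame dimensions to get absolute pixel coordinates
pixel_x = int(landmark_x * frame_width)   # 160
pixel_y = int(landmark_y * frame_height)  # 240
```

Note that `int()` truncates toward zero, which is sufficient here since pixel coordinates are non-negative.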

What are facial landmarks and how are they identified?

Facial landmarks are specific points on the face that MediaPipe can detect. For basic eye tracking, we focus on landmarks 33 and 263, which correspond to the outer corners of the left and right eyes, respectively.

MediaPipe Face Mesh provides a detailed map with hundreds of facial points. Although the detection is robust, it can make probabilistic predictions when part of the face is hidden, meaning that it could estimate the position of an eye even if it is partially covered.

How to improve eye tracking with iris detection?

For more precise applications, we can go beyond the eye corners and detect the position of the iris (the colored part of the eye). MediaPipe offers a functionality to refine the landmarks and detect specific points of the iris.

    # Configuration with landmark refinement
    face_mesh = mp_face_mesh.FaceMesh(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5,
        refine_landmarks=True  # Enable refinement to detect the iris
    )

With this configuration, we can access additional landmarks (468, 469, 470, 471) that represent points on the contour of the iris of the left eye. Similarly, there are landmarks for the right eye.

To calculate the center of the iris, we average the positions of these four points:

    # Detect the four points of the left iris
    left_iris_landmarks = []
    for i in range(468, 472):
        landmark = face_landmarks.landmark[i]
        x = int(landmark.x * frame.shape[1])
        y = int(landmark.y * frame.shape[0])
        left_iris_landmarks.append((x, y))
        cv2.circle(frame, (x, y), 2, (0, 0, 255), -1)  # Red dots

    # Calculate the center of the left iris
    left_iris_x = sum(p[0] for p in left_iris_landmarks) // 4
    left_iris_y = sum(p[1] for p in left_iris_landmarks) // 4
    cv2.circle(frame, (left_iris_x, left_iris_y), 3, (255, 0, 0), -1)  # Blue dot

This technique provides more accurate detection of where the person is looking, since the iris is a more direct indicator of gaze direction than the eye corners.
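The averaging step generalizes to a small helper, shown here as a standalone sketch (`points` is any list of (x, y) pixel tuples; the sample coordinates are hypothetical):

```python
def center_of(points):
    """Return the integer centroid of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(xs) // len(points), sum(ys) // len(points)

# Example with four hypothetical iris contour points
iris = [(100, 80), (104, 78), (102, 84), (98, 82)]
print(center_of(iris))  # (101, 81)
```

Using a helper like this avoids repeating the averaging code when you process both irises.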

What is the most effective strategy for gaze tracking?

Depending on the application, we can opt for different strategies:

  1. Eye corners: Useful for basic detection and less computationally intensive.
  2. Center of the iris: More accurate for determining the exact direction of gaze.
  3. Midpoint between the corners: An intermediate solution that works well for nearby objects.

To implement the third strategy, we calculate the midpoint between the corners of each eye:

    # Calculate midpoint of the left eye
    mid_left_x = (x_left_inner + x_left_outer) // 2
    mid_left_y = (y_left_inner + y_left_outer) // 2
    cv2.circle(frame, (mid_left_x, mid_left_y), 3, (255, 0, 0), -1)  # Blue point

This approximation is especially useful when the person is looking at close objects, since the midpoint between the corners of the eye usually aligns with the direction of gaze in these situations.
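As a self-contained sketch of this midpoint strategy (the pixel coordinates below are hypothetical; in practice the inner- and outer-corner values would come from the corresponding Face Mesh landmarks, converted to pixels as shown earlier):

```python
# Hypothetical pixel coordinates for the left eye's outer and inner corners
x_left_outer, y_left_outer = 210, 240
x_left_inner, y_left_inner = 260, 244

# The midpoint between the two corners approximates the gaze anchor point
mid_left_x = (x_left_inner + x_left_outer) // 2  # 235
mid_left_y = (y_left_inner + y_left_outer) // 2  # 242
```

Integer division keeps the result usable directly as a pixel coordinate for drawing.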

What other applications does face detection have with MediaPipe?

In addition to eye tracking, MediaPipe Face Mesh allows many other facial points to be detected and tracked, such as:

  • The tip of the nose (an interesting challenge to implement).
  • The contours of the lips
  • Eyebrows
  • Facial contours

These capabilities open the door to numerous applications:

  • Augmented reality filters
  • Facial expression analysis
  • Biometric authentication systems
  • Interfaces controlled by facial gestures

Face detection with MediaPipe and Python represents a powerful and accessible tool for developers interested in computer vision. Whether you are building an eye-tracking application or exploring other possibilities for human-computer interaction, these techniques provide a solid foundation for your project.

Have you experimented with face detection in your projects? We invite you to share your experiences and code in the comments, especially if you have implemented the nose tip tracking challenge.
