Introduction to BI and Data Warehouse
What is BI and Data Warehousing?
Levels of analytics and the knowledge hierarchy
BI concepts: Data Warehouse, Data Mart, Dimensions, and Facts
OLTP vs. OLAP databases
Data Warehouse methodologies
Quiz: Introduction to BI and Data Warehouse
Dimensional models
Data Warehouse, Data Lake, and Data Lakehouse: which one should you use?
Types of dimensional schemas
Slowly changing dimensions
Type 1 dimension
Type 2 dimension
Type 3 dimension
Fact table
Tool setup for Data Warehouse and ETL
Dimensional modeling: identifying dimensions and metrics
Dimensional modeling: model design
Quiz: Dimensional models
ETL for loading into the Data Warehouse
Mapping document
Creating the physical model
Extraction: SQL queries
Extraction in Pentaho
Transformation: customer dimension
Load: customer dimension
ETL solutions for the dimension and fact tables
Parameters in ETL
Orchestrating ETL in Pentaho: jobs
Review of the full ETL
Quiz: ETL for loading into the Data Warehouse
Closing
Reflections and closing
Orchestrating an extract, transform, and load (ETL) flow is essential to ensure the process runs efficiently. Let's explore how to create a job that organizes the data transformations and guarantees they execute in order. If you want to study ETL in more depth, I recommend the ETL with Python and Pentaho course available on Platzi.
To get started, it is crucial to arrange the existing transformations in the correct execution order within the job.
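Conceptually, a job chains the transformations so that each one runs only after the previous one succeeds, and the flow aborts on the first failure. A minimal Python sketch of that idea (the step names are hypothetical; in the course this is done visually in a Pentaho job):

```python
# Hypothetical sketch of what a Pentaho job does: run each transformation
# in order and stop the flow as soon as one of them fails.
def run_job(transformations):
    for name, step in transformations:
        ok = step()  # each step returns True on success, False on failure
        print(f"{name}: {'OK' if ok else 'FAILED'}")
        if not ok:
            return False  # abort downstream steps, like an error hop in a job
    return True

# Dimensions load before the fact table, mirroring a typical DW job.
steps = [
    ("load_dim_customer", lambda: True),
    ("load_dim_product", lambda: True),
    ("load_fact_sales", lambda: True),
]
run_job(steps)
```

The ordering matters: fact tables reference dimension keys, so the dimension transformations must finish before the fact load starts.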
To calculate the maximum ID, we will create a transformation that works dynamically for any table and field, built around a parameterized query:
SELECT MAX({consecutive}) AS consecutive FROM {table};
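The query above can be sketched in Python with `sqlite3`, substituting the table and column parameters and falling back to 0 when the target table is still empty (the `dim_customer` table and column names here are hypothetical; in the course the query runs inside a Pentaho transformation):

```python
import sqlite3

def get_max_id(conn, table, consecutive):
    """Return the current maximum of the consecutive (ID) column,
    or 0 when the target table is still empty (first load)."""
    # Table and column names cannot be bound as SQL parameters, so they
    # are interpolated into the text, mirroring the parameterized query.
    row = conn.execute(
        f"SELECT MAX({consecutive}) AS consecutive FROM {table}"
    ).fetchone()
    return row[0] if row[0] is not None else 0

# Demo against an in-memory table with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_key INTEGER, name TEXT)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Ana"), (2, "Luis"), (3, "Marta")])
print(get_max_id(conn, "dim_customer", "customer_key"))  # → 3
```

Because the table and field arrive as parameters, the same transformation can compute the last loaded ID for any dimension or fact table.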
The strategy for calculating the maximum date likewise relies on parameterized variables.
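The same pattern applies to dates: read the most recent load date from the target table and use a sentinel date when the table is empty, so the first run loads everything. A small sketch (the `fact_sales` table and the sentinel value are assumptions for illustration):

```python
import sqlite3

def get_max_date(conn, table, date_column):
    """Return the most recent date already loaded into the target table,
    or a sentinel date when it is empty (forces a full first load)."""
    row = conn.execute(f"SELECT MAX({date_column}) FROM {table}").fetchone()
    return row[0] if row[0] is not None else "1900-01-01"

# Demo: on the next incremental run, only rows newer than this
# maximum date would be extracted from the source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [("2024-05-01", 10.0), ("2024-05-03", 25.5)])
print(get_max_date(conn, "fact_sales", "sale_date"))  # → 2024-05-03
```

Storing dates in ISO format (YYYY-MM-DD) keeps `MAX()` correct even when the column is text, since lexical and chronological order coincide.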
Once the variables for the maximum record ID and maximum date have been defined, they must be set as environment variables so that they are accessible throughout the ETL process, by giving them a scope that covers the whole job when they are set.
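A rough sketch of that idea in Python, using `os.environ` to play the role of job-wide variables so that any later step can build its incremental query from them (the variable and column names are hypothetical):

```python
import os

def set_etl_variables(max_id, max_date):
    """Promote the computed maximums to environment variables,
    standing in for job-scoped variables in a Pentaho flow."""
    os.environ["MAX_ID"] = str(max_id)
    os.environ["MAX_DATE"] = max_date

def incremental_where_clause():
    # A downstream extraction step reads the shared variables
    # to pull only rows newer than the last load.
    return (f"WHERE id > {os.environ['MAX_ID']} "
            f"AND updated_at > '{os.environ['MAX_DATE']}'")

set_etl_variables(120, "2024-05-01")
print(incremental_where_clause())
# → WHERE id > 120 AND updated_at > '2024-05-01'
```

The benefit is decoupling: the transformation that computes the maximums and the transformations that consume them only share the variable names, not each other's internals.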
I invite you to experiment with setting these variables in your own transformations and see how they improve the efficiency of your ETL loads. Every step you take toward optimizing ETL flows is an investment in your Data Engineering skills. Keep learning!