Load: customer dimension


How to bring transformed data into a Data Warehouse?

Transforming data is a crucial step, but loading it properly into a Data Warehouse matters just as much: that is what makes the data available for analysis, reporting, and decision making. Next, we will load our transformed data into a Data Warehouse using Redshift and Pentaho, tools that simplify data management and optimization.

Which records are inserted or updated?

First, it is vital to determine which records within our transformation flow should be inserted as new and which should be updated. This is done by filtering based on an action indicator, where:

  • Insert: indicator equal to -1.
  • Update: any value other than -1.

For the records to be inserted, a CSV file is created and stored in an Amazon S3 bucket; records that require updating can be processed directly because of their small volume. The same split is sketched in SQL below.
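A minimal sketch of that split, assuming a hypothetical staging table staging_customers with an action_indicator column (the course implements this with a filter step in Pentaho rather than raw SQL):

    -- Rows flagged as new: routed to the CSV / S3 insert path
    SELECT * FROM staging_customers WHERE action_indicator = -1;

    -- All other rows: routed to the direct update path
    SELECT * FROM staging_customers WHERE action_indicator <> -1;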

How is an update performed?

Setting up an update process is essential for maintaining data integrity. A lookup by customer ID is performed to update specific fields such as first name, last name, and contact number, without touching the original customer ID or load date (a minimal SQL equivalent follows the list below). Here are the key steps:

  • Connection and table selection: Use Redshift and select the target table.
  • Field update: Change the relevant fields without unwanted alterations.
  • Field mapping: Ensure that the data flow matches the table structure in the database.
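A minimal SQL equivalent of that update; the column names (first_name, last_name, phone, customer_id) and the staging table staging_customers are assumptions, and in the course this is configured through Pentaho's update step rather than written by hand:

    -- Update name and contact fields matched by customer ID,
    -- leaving customer_id and the load date untouched
    UPDATE AdventureWorks.DIMCustomers
    SET first_name = s.first_name,
        last_name  = s.last_name,
        phone      = s.phone
    FROM staging_customers s
    WHERE AdventureWorks.DIMCustomers.customer_id = s.customer_id;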

How to export data to an S3 bucket?

For new records, the data is exported to a CSV file saved directly in S3. Set the correct delimiter, validate the path to your personal bucket, and make sure the columns conform to the expected format.

s3://personal_bucket/AdventureWorks/DimClients.csv

Pay special attention here to data formats, such as dates, to ensure a consistent schema for the subsequent load; a short sample of the expected file is shown below.
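For illustration, the exported file might look like this; a hypothetical sample assuming a semicolon delimiter, a header row, and ISO dates (match whatever you configured in the export step):

    customer_id;first_name;last_name;phone;load_date
    11000;Jon;Yang;1 (11) 500 555-0162;2024-01-15
    11001;Eugene;Huang;1 (11) 500 555-0110;2024-01-15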

How do we upload files to Redshift from S3?

Redshift's COPY command is the key mechanism for accessing and transferring data from S3 into Redshift. Configuration involves:

  • Definition of the target table: AdventureWorks.DIMCustomers
  • Source file path: must match the bucket configured earlier.
  • Formats and permissions: Be sure to assign the relevant access keys.

COPY AdventureWorks.DIMCustomers
FROM 's3://personal_bucket/AdventureWorks/DimClients.csv';
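
In practice the command also needs credentials and format options. A fuller sketch with placeholder keys; the semicolon delimiter, region, and header handling are assumptions (an IAM_ROLE can be used instead of access keys):

    COPY AdventureWorks.DIMCustomers
    FROM 's3://personal_bucket/AdventureWorks/DimClients.csv'
    CREDENTIALS 'aws_access_key_id=YOUR_ACCESS_KEY;aws_secret_access_key=YOUR_SECRET_KEY'
    DELIMITER ';'
    IGNOREHEADER 1
    REGION 'us-east-1';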

Additional challenges

After setting up the data load, challenges are proposed to reinforce what you have learned by applying the same process to other tables, such as products, vendors, territory, and dates, each with its own characteristics that require tailored solutions.

Keep your process aligned with loading and transformation best practices, optimizing both data integrity and query performance. This understanding will enable you to handle the particularities of complex data scenarios.

The next step is to consolidate your workflow by integrating transformations and orchestration, ensuring efficient data management in your Data Warehouse. Keep learning and enhancing your data management skills!

Contributions

COPY dwh_adventureworks.dim_clientes
FROM 's3://dwhlearning/path/to/yourfile.csv'
CREDENTIALS 'aws_access_key_id=YOUR_ACCESS_KEY;aws_secret_access_key=YOUR_SECRET_KEY'
DELIMITER ';'
REGION 'us-east-1';
On Windows I had to do the following to connect to S3: go to the folder where you have Pentaho, in my case **C:\Users\User\Desktop\pdi-ce-9.4.0.0-343\data-integration**, then look for the file **Spoon.bat**, right-click, **open with Code**, and after the first comments add the S3 keys: ![](https://static.platzi.com/media/user_upload/image-ca1cd19d-badd-4418-86b0-104b2222f962.jpg) Save, close Pentaho if it is open, and that's it :) (this is how it worked for me).