Data visualization is a powerful tool for understanding customer perception of our products. Word clouds, in particular, provide an immediate visual representation of the most frequent terms in reviews, allowing us to quickly identify trends and sentiment. In this article, we will explore how to build a word cloud from Spanish Amazon reviews, using Python data analysis tools.
To start our analysis of Amazon reviews, we will use Google Colab, a platform that lets us run Python code in the cloud. In this first part we will work on CPU, although later phases may need a GPU to speed up training and reduce latency.
The process starts with loading and exploring the dataset. We import the libraries, extract the archive with the !unrar console command, load the CSV file into a DataFrame, and display the first rows:

    # Import necessary libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Extract the archive (shell command run from a Colab cell)
    !unrar e file.rar

    # Load the dataset
    df = pd.read_csv('review_dataframe_complete.csv')

    # Display the first rows
    df.head(3)
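The dataset we are using contains Amazon product reviews in Spanish. Its columns include 'stars' (the rating) and 'category' (the product category), both of which are used in the analysis that follows.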
We can scan both the first and last rows to get an idea of the content:
    # First 3 rows
    df.head(3)

    # Last 3 rows
    df.tail(3)
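Beyond looking at the first and last rows, a quick structural summary can confirm the row count, the column names, and the number of missing values before any plotting. The following is a minimal sketch that uses a small hypothetical DataFrame in place of the real CSV. The helper name summarize and the column review_body are assumptions made for illustration; only stars and category appear in the code of this article.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; the values are illustrative only.
sample_df = pd.DataFrame({
    "review_body": ["no sirve, se fue la pantalla", "me encantaron los audifonos", None],
    "stars": [1, 5, 3],
    "category": ["electronics", "wireless", "toy"],
})

def summarize(df: pd.DataFrame) -> dict:
    """Return basic structural facts about a DataFrame."""
    return {
        "rows": len(df),                        # number of reviews
        "columns": list(df.columns),            # column names in order
        "missing": df.isna().sum().to_dict(),   # missing values per column
    }

summary = summarize(sample_df)
print(summary)
```

On the real data, the same helper could be called as summarize(df) right after loading the CSV.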
In the examples we see, there are TV reviews with comments like "no good, the screen is gone" or "horrible, we had to buy another one. Money down the drain", as well as products from other categories such as "toys" or "wireless devices" with more positive comments such as "I loved the headset".
To better understand our data, it is important to visualize the distribution of ratings and product categories. This will give us an overview of customer satisfaction and the types of products most reviewed.
We can create a bar chart to visualize the distribution of stars:
    plt.figure(figsize=(8, 4))
    sns.countplot(x='stars', data=df)
    plt.title('Distribution of scores')
    plt.xlabel('Stars')
    plt.ylabel('Quantity')
    plt.show()
The chart shows approximately 40,000 reviews with one star, another 40,000 with two stars, and so on for each rating. The dataset is therefore balanced across ratings, which is ideal for analysis because no score is over-represented.
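The balance visible in the bar chart can also be checked numerically. The sketch below computes the share of reviews per rating and compares it with a uniform split. The function names and the 5% tolerance are assumptions chosen for illustration, and the ratings are toy values rather than figures from the real dataset.

```python
import pandas as pd

def rating_shares(stars: pd.Series) -> pd.Series:
    """Share of reviews per star rating, sorted by rating."""
    return stars.value_counts(normalize=True).sort_index()

def is_balanced(stars: pd.Series, tolerance: float = 0.05) -> bool:
    """True if every rating's share is within `tolerance` of a uniform split."""
    shares = rating_shares(stars)
    uniform = 1.0 / len(shares)
    return bool((shares - uniform).abs().max() <= tolerance)

# Hypothetical ratings: four reviews for each of the five star values.
toy_stars = pd.Series([1, 2, 3, 4, 5] * 4)
shares = rating_shares(toy_stars)
print(shares)
print(is_balanced(toy_stars))
```

With the real data, the equivalent call would be is_balanced(df['stars']). The tolerance is a judgment call that depends on how strict a notion of balance the analysis needs.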
We can also analyze the distribution of product categories:
    # Count reviews per category
    category_counts = df['category'].value_counts()

    # Keep the 9 most frequent categories and group the rest as 'Others'
    top_categories = category_counts.iloc[:9].index
    df['category_grouped'] = df['category'].apply(lambda x: x if x in top_categories else 'Others')

    # Visualize; 'skyblue' is a single color, not a palette name, so it is passed as color
    plt.figure(figsize=(10, 6))
    sns.countplot(x='category_grouped', data=df, color='skyblue')
    plt.title('Product distribution: Top 9 plus Others')
    plt.xlabel('Categories')
    plt.ylabel('Quantity')
    plt.xticks(rotation=45)
    plt.show()
In this chart, the most frequent categories include 'home', 'wireless' and 'toys', while the 'Others' group contains approximately 80,000 reviews, which indicates that the dataset spans a wide range of product categories.
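The top-N grouping used for the category chart can be written as a small reusable function. This is a sketch under stated assumptions: group_top_n is a hypothetical helper name, and the category values are toy data. It relies on the pandas methods value_counts, nlargest, isin and where rather than a row-by-row apply.

```python
import pandas as pd

def group_top_n(values: pd.Series, n: int = 9, other_label: str = "Others") -> pd.Series:
    """Keep the n most frequent values and relabel everything else as other_label."""
    top = values.value_counts().nlargest(n).index
    # where() keeps entries for which the condition is True and replaces the rest
    return values.where(values.isin(top), other_label)

# Hypothetical categories, not taken from the real dataset.
toy_categories = pd.Series(
    ["home", "home", "home", "wireless", "wireless", "toy", "book"]
)
grouped = group_top_n(toy_categories, n=2)
print(grouped.value_counts())
```

On the real data this would be used as df['category_grouped'] = group_top_n(df['category'], n=9). Ties at the cutoff are broken by the order in which value_counts returns them, so the exact top N may vary when several categories have the same count.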
So far, we have managed to load and explore our dataset of Amazon reviews. We know the structure of the data, the distribution of scores and the main product categories. This is the first fundamental step in building our word cloud.
In the next phase, we will dig deeper into the content of the reviews, analyzing the text to identify patterns, sentiment and keywords that will help us better understand customers' perception of the products.
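As a preview of that text analysis, a word cloud is driven by term frequencies: the more often a word appears, the larger it is drawn. The sketch below counts words across a few made-up review strings using only the standard library. The stopword list is a tiny illustrative assumption, not a complete Spanish list, and term_frequencies is a hypothetical helper name.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; an assumption, not a complete Spanish list.
STOPWORDS = {"el", "la", "los", "las", "de", "que", "y", "no", "se", "me", "un", "una"}

def term_frequencies(texts, stopwords=STOPWORDS):
    """Count lowercase word tokens across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Match runs of letters, including Spanish accented characters
        tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

# Made-up review snippets for illustration only.
reviews = [
    "La pantalla no sirve",
    "Me encantaron los auriculares",
    "La pantalla se fue",
]
freqs = term_frequencies(reviews)
print(freqs.most_common(3))
```

A frequency dictionary like this is the kind of input that the wordcloud package accepts through WordCloud.generate_from_frequencies, should that library be used to render the final image.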
Data exploration is just the beginning of our analytical journey. With these basics in place, we will be ready to apply more advanced natural language processing techniques that will allow us to extract valuable insights from customer opinions.
Have you used word clouds to analyze customer feedback? Share your experiences and results in the comments section.