Traditional Classification for Sentiment Analysis and Categories
Sentiment analysis of product reviews has become a critical tool for companies seeking to understand their customers' perceptions. Using natural language processing and machine learning techniques, we can automatically classify whether a review is positive or negative, allowing organizations to respond in a timely manner to reviews and improve the customer experience. Let's see how to implement a sentiment classifier using traditional machine learning techniques.
After we have cleaned and vectorized our review dataset, the next step is to create a model that can automatically classify them. For our use case, we will use a Naive Bayes classifier, which is a probabilistic model based on Bayes' theorem.
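As a brief sketch of what "based on Bayes' theorem" means here, the multinomial variant of the classifier can be written as follows. The notation is introduced only for this illustration and is not from the lesson: $c$ is the class (0 or 1), $x_i$ is the weight of term $w_i$ in the review, $V$ is the vocabulary size, $N_{ci}$ is the total weight of term $w_i$ in training reviews of class $c$, $N_c = \sum_i N_{ci}$, and $\alpha$ is a smoothing constant.

```latex
% Bayes' theorem for a review d and a class c
P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)}

% "Naive" assumption: terms are conditionally independent given the class,
% so the predicted class maximizes a sum of per-term log-likelihoods
\hat{c} = \arg\max_{c \in \{0, 1\}} \left[ \log P(c) + \sum_{i=1}^{V} x_i \log P(w_i \mid c) \right]

% Smoothed estimate of the per-term likelihood
P(w_i \mid c) = \frac{N_{ci} + \alpha}{N_c + \alpha V}
```

The denominator $P(d)$ is the same for every class, which is why it drops out of the maximization.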
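This classifier has particular characteristics: it is called "naive" because it assumes that the features (here, the terms of a review) are independent of one another given the class, and it is fast to train.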
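For our model, we will define a positive review as one that has more than three stars (assigning a value of 1), while all other reviews are treated as negative (value 0). Note that with this threshold three-star reviews are also labeled 0, together with the one- and two-star reviews.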
# We create a binary variable for sentiment
df['sentiment_bin'] = (df['stars'] > 3).astype(int)
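To make the effect of the threshold concrete, here is a minimal sketch on a hypothetical five-row DataFrame (the ratings are invented for illustration and are not the course dataset). It shows that a three-star review ends up with label 0, and one possible alternative, dropping neutral reviews, which is an assumption and not something the lesson does.

```python
import pandas as pd

# Hypothetical star ratings, used only for illustration
df = pd.DataFrame({"stars": [1, 2, 3, 4, 5]})

# Same rule as in the lesson: more than three stars means positive (1)
df["sentiment_bin"] = (df["stars"] > 3).astype(int)
labels = df["sentiment_bin"].tolist()
print(labels)  # [0, 0, 0, 1, 1]

# Optional alternative (an assumption): discard neutral 3-star reviews
df_no_neutral = df[df["stars"] != 3].copy()
print(len(df_no_neutral))  # 4
```

Checking `df["sentiment_bin"].value_counts()` on the real data is a quick way to see how unbalanced the two classes are before training.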
To train our model, we need to split our data into training and test sets:
from sklearn.model_selection import train_test_split

# The TF-IDF matrix will be our X variable (features)
X = tfidf_matrix
# The sentiment_bin column will be our y variable (labels)
y = df['sentiment_bin']
# We split the data: 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
In industry, a ratio of 80% for training and 20% for testing is a common default. In more complex cases, where a separate validation set is needed, a split of 70% for training, 20% for testing and 10% for validation may be used.
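A three-way 70/20/10 split can be sketched with two calls to train_test_split. The data below is a toy stand-in (100 dummy rows), and the stratify argument is an optional addition, not part of the lesson, that keeps the class proportions similar in every subset.

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the feature matrix and labels (not the course data)
X = [[i] for i in range(100)]
y = [i % 2 for i in range(100)]

# Step 1: hold out 20% of all rows as the test set
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 2: 10% of the full data equals 0.125 of the remaining 80%
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.125, random_state=42, stratify=y_rest
)

print(len(X_train), len(X_val), len(X_test))  # 70 10 20
```

The validation set is used while tuning the model, so that the test set is only touched once, for the final evaluation.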
Training the model is quite straightforward with scikit-learn:
# We import the classifier
from sklearn.naive_bayes import MultinomialNB

# We create and train the model
model = MultinomialNB()
model.fit(X_train, y_train)
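For readers who want to run the training step without the course dataset, the following is a self-contained sketch on a four-sentence toy corpus. The texts, labels and variable names are invented for illustration. It fits a TfidfVectorizer and a MultinomialNB and then asks for both the predicted class and the class probabilities.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus (hypothetical): two positive and two negative reviews
texts = [
    "excellent product great quality",
    "great value excellent service",
    "terrible product poor quality",
    "poor service terrible value",
]
labels = [1, 1, 0, 0]

# Fit the vectorizer on the training texts only
vectorizer = TfidfVectorizer()
X_toy = vectorizer.fit_transform(texts)

# Train the classifier (default smoothing, alpha=1.0)
clf = MultinomialNB()
clf.fit(X_toy, labels)

# Reuse the fitted vectorizer for unseen text: transform, not fit_transform
new_vec = vectorizer.transform(["excellent great"])
pred = clf.predict(new_vec)
proba = clf.predict_proba(new_vec)

print("Predicted class:", pred[0])
print("Class probabilities:", proba[0])
```

The probabilities from predict_proba are ordered as in `clf.classes_`, so with these labels the second value is the probability of the positive class.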
One of the great advantages of Naive Bayes is its reduced training time, which allows for lower computational costs. scikit-learn's MultinomialNB runs on the CPU and does not need a GPU; even so, training on a dataset like this one typically takes only a few seconds.
To evaluate the performance of the model, we use standard metrics:
# We evaluate the model
from sklearn.metrics import classification_report, accuracy_score

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
In our case, we obtained 79% accuracy. A common rule of thumb treats roughly 70% as the floor for an acceptable model, although the right threshold depends on the problem, the class balance and the cost of each kind of error. It is also important to check the F1-score, especially when working with unbalanced datasets, where accuracy alone can be misleading. The two F1-score values in our report (0.84 and 0.70) also indicate good performance.
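The reason F1 matters on unbalanced data can be seen with a small hand-made example. The labels below are invented for illustration: a model that always predicts "positive" on a set with 8 positive and 2 negative reviews reaches 80% accuracy while never detecting a negative review. The helper `f1_for_class` is a hypothetical name written for this sketch, using only the standard definitions of precision, recall and F1.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1-score for one class, computed from true/false positives and negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented labels: 8 positive reviews, 2 negative reviews
y_true = [1] * 8 + [0] * 2
# A degenerate model that predicts "positive" for everything
y_pred = [1] * 10

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
f1_pos = f1_for_class(y_true, y_pred, 1)
f1_neg = f1_for_class(y_true, y_pred, 0)

print("Accuracy:", accuracy)              # 0.8
print("F1 (positive):", round(f1_pos, 3))  # 0.889
print("F1 (negative):", f1_neg)            # 0.0
```

The accuracy looks acceptable, but the F1-score of 0.0 for the negative class exposes that the model is useless for finding unhappy customers.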
Once the model has been trained, it is essential to save it for future use with new data:
# We save the model
import pickle

model_path = "nb_classifier.pkl"
with open(model_path, 'wb') as file:
    pickle.dump(model, file)

# To load the model later
with open(model_path, 'rb') as file:
    loaded_model = pickle.load(file)
The model is saved with the .pkl or .pickle extension, which allows us to retrieve it easily when we need to make predictions with new data.
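Because new reviews must be vectorized with the same fitted TF-IDF vectorizer, it can be convenient to save the vectorizer together with the model. The sketch below is one possible way to do that, not the lesson's method: the toy texts, the dictionary keys and the file name `sentiment_bundle.pkl` are all assumptions made for illustration.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data (hypothetical), standing in for the cleaned reviews
texts = ["great product", "terrible product"]
labels = [1, 0]

vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

# Save both objects in a single file inside a temporary directory
bundle_path = os.path.join(tempfile.mkdtemp(), "sentiment_bundle.pkl")
with open(bundle_path, "wb") as f:
    pickle.dump({"vectorizer": vectorizer, "model": model}, f)

# Load the bundle later and predict with the restored objects
with open(bundle_path, "rb") as f:
    bundle = pickle.load(f)

sample = ["great product"]
restored_pred = bundle["model"].predict(bundle["vectorizer"].transform(sample))
original_pred = model.predict(vectorizer.transform(sample))
print(restored_pred[0] == original_pred[0])
```

Pickle files can execute arbitrary code when loaded, so only load files that come from a trusted source.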
To use our model with new reviews, we must follow the same preprocessing process that we applied during training: clean the text with the clean function, remove stop words and lemmatize, vectorize with the already fitted TF-IDF vectorizer, and only then predict.

# Example of a new review
new_review = "This product is excellent and exceeded my expectations"

# Preprocessing
cleaned_review = clean(new_review)
processed_review = remove_stopwords_and_lemmatize(cleaned_review)

# Vectorization (transform only; the vectorizer was already fitted during training)
review_vector = tfidf_vectorizer.transform([processed_review])

# Prediction
prediction = loaded_model.predict(review_vector)
print("Sentiment:", "Positive" if prediction[0] == 1 else "Negative")
When running this code with the review "This product is excellent and exceeded my expectations", the model correctly classifies it as positive (1).
It is important to note that these traditional models have certain limitations: they do not capture sarcasm, emoji, or the way punctuation can change the meaning of a sentence.
For example, when testing with the review "I hate it 🙂", the model classifies it as negative (0), without understanding that the emoji could indicate sarcasm. Similarly, with "I loved it?", the model classifies it as positive (1), without detecting that the question mark could change the meaning of the sentence.
These limitations are due to the fact that our cleaning function removes punctuation marks and that the model is not designed to understand complex contexts. To overcome these limitations, there are more modern methods such as Transformers, which have a greater ability to understand natural language in all its complexity.
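The effect of the cleaning step can be illustrated with a small stand-in. The lesson's clean function is not shown in this section, so `clean_sketch` below is a hypothetical approximation that lowercases the text and keeps only letters and spaces. Under that assumption, the question mark and the emoji disappear before the model ever sees the review.

```python
import re


def clean_sketch(text):
    """Hypothetical stand-in for the lesson's clean function (an assumption)."""
    text = text.lower()
    # Replace everything that is not a lowercase letter or whitespace
    # (punctuation, digits, emoji) with a space
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse repeated whitespace and trim the ends
    return re.sub(r"\s+", " ", text).strip()


question = clean_sketch("I loved it?")
statement = clean_sketch("I loved it")
with_emoji = clean_sketch("I hate it 🙂")

print(question == statement)  # True
print(with_emoji)             # i hate it
```

Since both versions of the sentence become the same string, any model trained on the cleaned text has to give them the same prediction.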
Sentiment analysis using traditional machine learning techniques offers an effective solution to automatically classify product reviews, allowing companies to quickly identify negative opinions and act accordingly. Although these models have their limitations, they represent an excellent starting point for implementing sentiment analysis systems. Have you ever implemented a similar system? Share your experience in the comments.