Traditional Classification for Sentiment and Category Analysis

Sentiment analysis of product reviews has become a critical tool for companies seeking to understand their customers' perceptions. Using natural language processing and machine learning techniques, we can automatically classify whether a review is positive or negative, allowing organizations to respond in a timely manner to reviews and improve the customer experience. Let's see how to implement a sentiment classifier using traditional machine learning techniques.

How to create a sentiment classification model with Naive Bayes?

After we have cleaned and vectorized our review dataset, the next step is to create a model that can automatically classify them. For our use case, we will use a Naive Bayes classifier, which is a probabilistic model based on Bayes' theorem.

This classifier has particular characteristics:

  • It assumes independence between features (words).
  • It calculates the probability that a document belongs to a specific class.
  • It is ideal when we have limited hardware.
  • It offers low latency in predictions.
  • It is very practical to implement and manage.
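
To make the first two characteristics concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the scikit-learn implementation used in this lesson; the function names, the toy corpus, and the `alpha` parameter are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed per-class word log-likelihoods."""
    vocab = {word for doc in docs for word in doc.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Laplace smoothing: add alpha to every word count in the vocabulary
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood

def predict_proba(text, log_prior, log_likelihood):
    """Return P(class | text), treating words as independent given the class."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for word in text.split():
            if word in log_likelihood[c]:  # words outside the vocabulary are ignored
                score += log_likelihood[c][word]
        scores[c] = score
    # Turn log scores into probabilities (shift by the max for numerical stability)
    top = max(scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}

# Toy corpus invented for illustration; 1 = positive, 0 = negative
docs = ["great product love it", "excellent quality great price",
        "terrible product broke fast", "bad quality waste of money"]
labels = [1, 1, 0, 0]

log_prior, log_likelihood = train_naive_bayes(docs, labels)
probs = predict_proba("great quality", log_prior, log_likelihood)
print(probs)
```

Because each word contributes one additive log term, both training and prediction reduce to counting and summing, which is why the model is cheap to run on modest hardware.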

For our model, we define a positive review as one with more than three stars (value 1). Every other review gets value 0, so with the rule below the negative class contains the one- and two-star reviews and also the neutral three-star ones.

    # Create a binary sentiment variable: 1 if the review has more than three stars, else 0
    df['sentiment_bin'] = (df['stars'] > 3).astype(int)
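
The following sketch shows how this rule labels each star rating on a toy DataFrame. The ratings are invented, and the second part (dropping three-star reviews) is an alternative shown as an assumption, not the approach taken in this lesson.

```python
import pandas as pd

# Toy ratings invented for illustration; column names mirror the lesson's DataFrame
df = pd.DataFrame({"stars": [1, 2, 3, 4, 5]})

# Rule from the lesson: more than three stars -> 1, everything else -> 0
df["sentiment_bin"] = (df["stars"] > 3).astype(int)
print(df["sentiment_bin"].tolist())  # [0, 0, 0, 1, 1]

# Alternative (an assumption, not the lesson's approach): drop neutral three-star reviews
df_no_neutral = df[df["stars"] != 3].copy()
print(df_no_neutral["stars"].tolist())  # [1, 2, 4, 5]
```

Whether neutral reviews belong in the negative class or should be excluded is a modelling decision that depends on how the predictions will be used.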

How to split the data for training and evaluation?

To train our model, we need to split our data into training and test sets:

    from sklearn.model_selection import train_test_split

    # The TF-IDF matrix is our X variable (features)
    X = tfidf_matrix
    # The sentiment_bin column is our y variable (labels)
    y = df['sentiment_bin']
    # Split the data: 80% for training, 20% for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In industry, an 80% training and 20% test split is a common convention. When a separate set is needed for tuning the model, a three-way split of 70% for training, 20% for testing and 10% for validation may be used instead.
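
A three-way split can be sketched by calling `train_test_split` twice. The toy arrays below are invented, and integer sample counts are used instead of fractions only to make the resulting sizes explicit; `stratify` is an optional extra that keeps the class balance similar in every subset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data invented for illustration: 100 samples, 2 features, balanced labels
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# Step 1: hold out 30 samples (20 for testing + 10 for validation)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=30, random_state=42, stratify=y
)
# Step 2: split the held-out part into 20 test samples and 10 validation samples
X_test, X_val, y_test, y_val = train_test_split(
    X_hold, y_hold, train_size=20, random_state=42, stratify=y_hold
)
print(len(X_train), len(X_test), len(X_val))  # 70 20 10
```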

How to train and evaluate the Naive Bayes model?

Training the model is quite straightforward with scikit-learn:

    # Import the classifier
    from sklearn.naive_bayes import MultinomialNB

    # Create and train the model
    model = MultinomialNB()
    model.fit(X_train, y_train)

One of the great advantages of Naive Bayes is its short training time, which keeps computational costs low. scikit-learn's MultinomialNB runs on the CPU and does not use a GPU; for a dataset like this one, training typically finishes within seconds.

To evaluate the performance of the model, we use standard metrics:

    # Evaluate the model
    from sklearn.metrics import classification_report, accuracy_score

    y_pred = model.predict(X_test)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))

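In our case, we obtained 79% accuracy. As a rule of thumb, this course treats roughly 70% as the point where a model becomes acceptable, although the right threshold depends on the task and on how a trivial baseline, such as always predicting the majority class, would score. It is also important to check the F1-score, especially when working with unbalanced datasets, because accuracy alone can hide poor performance on the minority class. Our F1-score values for the two classes (0.84 and 0.70) also indicate good performance.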
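
To see why accuracy alone can mislead on unbalanced data, the sketch below computes the metrics by hand for a "model" that always predicts the majority class. The labels are invented for illustration, and `binary_metrics` is a hypothetical helper, not a scikit-learn function.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented imbalanced labels: 8 positive reviews, 2 negative reviews
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# A baseline that always predicts the majority class
y_majority = [1] * len(y_true)

metrics_pos = binary_metrics(y_true, y_majority, positive=1)
metrics_neg = binary_metrics(y_true, y_majority, positive=0)
print(metrics_pos)
print(metrics_neg)
```

The baseline reaches 80% accuracy without learning anything, yet its F1-score for the negative class is 0, which is exactly the failure that a per-class F1 check is meant to expose.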

How to save and load the model for later use?

Once the model has been trained, it is essential to save it for future use with new data:

    import pickle

    # Save the model
    model_path = "nb_classifier.pkl"
    with open(model_path, 'wb') as file:
        pickle.dump(model, file)

    # To load the model later
    with open(model_path, 'rb') as file:
        loaded_model = pickle.load(file)

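The model is saved with a .pkl or .pickle extension, which lets us retrieve it easily when we need to make predictions on new data. The fitted TF-IDF vectorizer should be saved the same way, since new reviews must be transformed with the same vocabulary the model was trained on. Only load pickle files from sources you trust, because unpickling can execute arbitrary code.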
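
One possible way to keep the model and its vectorizer together is to pickle them in a single dictionary. This is a sketch under assumptions: the toy reviews, the bundle keys, and the file name `nb_bundle.pkl` are invented for illustration and are not part of the lesson's code.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy reviews invented for illustration; 1 = positive, 0 = negative
texts = ["excellent product", "great quality", "terrible product", "bad quality"]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

# Store both fitted objects in one file so they cannot get out of sync
bundle_path = os.path.join(tempfile.mkdtemp(), "nb_bundle.pkl")  # hypothetical file name
with open(bundle_path, "wb") as file:
    pickle.dump({"vectorizer": vectorizer, "model": model}, file)

# Later, possibly in another process: restore both objects and predict
with open(bundle_path, "rb") as file:
    bundle = pickle.load(file)

new_vector = bundle["vectorizer"].transform(["excellent quality"])
restored_prediction = bundle["model"].predict(new_vector)
print(restored_prediction)
```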

How to use the model to classify new reviews?

To use our model with new reviews, we must follow the same preprocessing process that we applied during training:

  1. Clean the text with our clean function.
  2. Apply stopword removal and lemmatization.
  3. Transform the text to vector space using the TF-IDF vectorizer fitted during training.
  4. Perform the prediction with the model.

    # Example of a new review
    new_review = "This product is excellent and exceeded my expectations"

    # Preprocessing
    cleaned_review = clean(new_review)
    processed_review = remove_stopwords_and_lemmatize(cleaned_review)

    # Vectorization
    review_vector = tfidf_vectorizer.transform([processed_review])

    # Prediction
    prediction = loaded_model.predict(review_vector)
    print("Sentiment:", "Positive" if prediction[0] == 1 else "Negative")

When running this code with the review "This product is excellent and exceeded my expectations", the model correctly classifies it as positive (1).

Limitations of traditional models

It is important to note that these traditional models have certain limitations:

  • They do not understand sarcasm in text.
  • They do not correctly interpret emojis.
  • They do not capture the context of expressions well.

For example, when testing with the review "I hate it 🙂", the model classifies it as negative (0), without understanding that the emoji could indicate sarcasm. Similarly, with "I loved it?", the model classifies it as positive (1), without detecting that the question mark could change the meaning of the sentence.

These limitations arise because our cleaning function removes punctuation marks, because the TF-IDF representation ignores word order, and because the model is not designed to understand complex contexts. To overcome these limitations, there are more modern methods such as Transformers, which have a greater ability to understand natural language in all its complexity.
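
The effect of the cleaning step can be illustrated with a small sketch. The `clean` function below is a hypothetical stand-in written for this example (the lesson's actual cleaning function is not shown here), assumed to lowercase the text and keep only letters and spaces.

```python
import re

def clean(text):
    """Hypothetical stand-in for the lesson's clean function (an assumption)."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drops punctuation, digits, and emojis
    return re.sub(r"\s+", " ", text).strip()

print(clean("I loved it?"))   # i loved it
print(clean("I loved it!"))   # i loved it
print(clean("I hate it 🙂"))  # i hate it
```

Once the question mark and the emoji are gone, the model receives exactly the same input for a doubtful and an enthusiastic review, so no classifier trained on this representation could tell them apart.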

Sentiment analysis using traditional machine learning techniques offers an effective solution to automatically classify product reviews, allowing companies to quickly identify negative opinions and act accordingly. Although these models have their limitations, they represent an excellent starting point for implementing sentiment analysis systems. Have you ever implemented a similar system? Share your experience in the comments.

Contributions 1

Sarcasm is a quality that makes us very human.