Fundamentals of logistic regression
What is logistic regression?
Your first classification with logistic regression
When to use logistic regression?
Logistic regression formula
Binomial logistic regression
Preparing the data
Correlation analysis and data scaling
Exploratory data analysis
Training with binomial logistic regression
Evaluating the model (MLE)
Analysis of logistic regression results
Regularizers
Multinomial logistic regression
How multiclass logistic regression works
Loading and preprocessing multiclass data
Exploratory analysis and scaling of multiclass data
Training and evaluation of multiclass logistic regression
Conclusions
Final project and next steps
Share your logistic regression project and get certified
Logistic regression is a powerful and versatile technique in the field of Machine Learning, especially for data classification. The use of Python and Scikit Learn facilitates its implementation, allowing us to tackle complex tasks with relative simplicity. We will begin by discussing how to set up the environment and efficiently load the necessary data.
For this project, we need several libraries that will help us in different aspects of the process: NumPy, Matplotlib, Seaborn, and Scikit Learn.
These libraries, already preloaded in the environment, allow us to work without complications. The specific dataset we will use is a collection of images of handwritten digits, available through the load_digits function from sklearn.datasets.
We start by loading the data into an object called digits:
from sklearn.datasets import load_digits
digits = load_digits()
The digits object contains several relevant properties, including data, feature_names, and a target attribute, which indicates which digit is represented in each image.
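As a quick sanity check (a short sketch, assuming the digits object loaded above and a recent Scikit Learn version), we can inspect these attributes:
# Inspect the main attributes of the digits object
print(digits.data.shape)         # (1797, 64): one row of 64 pixel values per image
print(digits.feature_names[:3])  # names of the first pixel features
print(digits.target[:10])        # the digit each of the first 10 images represents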
To see these digit images more clearly, we use NumPy to restructure them into an 8x8 format, which is the structure documented in the original dataset.
import numpy as np
image = np.reshape(digits.data[0], (8, 8))
We can visualize the image using Matplotlib:
import matplotlib.pyplot as plt
plt.imshow(image, cmap='gray')
plt.show()
This visualization helps us better understand the data we are manipulating, providing a firm foundation before we train the model.
Properly dividing our data between training and test sets is crucial to validate and evaluate the performance of our model. This step not only supports the results obtained, but also ensures the reliability of the algorithm when it faces previously unseen data.
Splitting the data into training and test sets allows the model to learn from one portion of the data and be evaluated on examples it has never seen.
Scikit Learn's train_test_split
function is used to split the data:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2, random_state=0)
Here, test_size=0.2
indicates that 20% of the dataset will be used for testing. The random_state
ensures that the split is reproducible in future runs.
Once the data is ready and split, the next step is to train the logistic regression model. Here, we will highlight how to set up a model, train it, predict results, and finally, evaluate its performance.
Setting up and training the model is extremely simple:
from sklearn.linear_model import LogisticRegression
logistic_reg = LogisticRegression(max_iter=200)
logistic_reg.fit(x_train, y_train)
The fit
function trains the model using the training set.
With the model trained, we can obtain predictions on the test set:
predictions = logistic_reg.predict(x_test)
These predictions will allow us to evaluate the performance of the model by comparing them with the actual values of y_test
.
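As a quick way to compare them, we can compute the accuracy on the test set (a short sketch using Scikit Learn's accuracy_score):
from sklearn.metrics import accuracy_score
# Fraction of test images whose digit was predicted correctly
print(accuracy_score(y_test, predictions))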
To evaluate the effectiveness of the model, we use a confusion matrix:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, predictions)
And to visualize it:
import seaborn as sns
plt.figure(figsize=(9, 9))
sns.heatmap(cm, annot=True, linewidths=0.5, square=True, cmap='coolwarm')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.show()
This matrix lets us identify at a glance where the model hits and where it misses; the values on the diagonal indicate the number of correct predictions for each class.
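Since the diagonal holds the correct predictions, we can also derive the overall accuracy directly from the matrix (a short sketch using NumPy):
# Correct predictions (diagonal) divided by the total number of test samples
print(np.trace(cm) / cm.sum())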
Exploring logistic regression using Python and Scikit Learn is an excellent starting point to enter the world of machine learning. The simplicity of the code and the accuracy of the classification demonstrate the effectiveness of this technique. I invite you to keep digging and practicing with more complex models, following this course or exploring other datasets and algorithms. Learning never ends!
Contributions
I wish this professor taught the linear regression course. Everything is so clear…
To avoid those ConvergenceWarning messages when using the logistic regression model, it is advisable to ALWAYS set max_iter=10000; sometimes setting it to 1000 is enough.
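For example (a short sketch, reusing the training data from above):
from sklearn.linear_model import LogisticRegression
# A higher max_iter gives the solver more iterations to converge, avoiding the warning
logistic_reg = LogisticRegression(max_iter=10000)
logistic_reg.fit(x_train, y_train)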
To determine the accuracy or quickly evaluate a logistic regression model, we can use the score function (this is not recommended when the classes are heavily imbalanced, but it is very useful to save time).
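For instance (a short sketch on the test split used above):
# score returns the mean accuracy on the given data and labels
print(logistic_reg.score(x_test, y_test))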
With this code you can upload your own photos of handwritten digits (written on a black background) and get a prediction:
from PIL import Image

def classify(img_path):
    # Load the photo in grayscale and display it
    image = Image.open(img_path).convert('L')
    plt.imshow(image, cmap='gray')
    plt.title('Your photo')
    plt.show()
    # Downscale to the 8x8 resolution the model was trained on
    image = image.resize((8, 8))
    # Flatten to a single row of 64 pixel values, as in digits.data
    image = [np.array(image).flatten()]
    prediction = logistic_reg.predict(image)[0]
    print(f'\n The image has a {prediction}')

classify('/content/example.png')

plt.figure(figsize=(9, 9))
sns.heatmap(cm, annot=True, linewidths=.5, square=True, cmap='coolwarm')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
To avoid those "weird" characters that appear after using Matplotlib, you can add a ; as shown here:
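A short sketch of the idea (in a Jupyter notebook, a trailing semicolon suppresses the textual output, e.g. the Text(...) object returned by plt.xlabel):
sns.heatmap(cm, annot=True, square=True, cmap='coolwarm')
plt.ylabel('Actual label')
plt.xlabel('Predicted label');  # the trailing ; hides the Text(...) output in a notebook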
If you want to access the images without doing a reshape: digits.images
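For example (a short sketch; digits.images stores each sample as an 8x8 array):
# Same picture as before, without manually reshaping digits.data
plt.imshow(digits.images[0], cmap='gray')
plt.show()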