Learn the key concepts

1. Everything you'll learn about ML with Scikit-Learn
2. How do machines learn?
3. Problems we can solve with Scikit-learn
4. The math we'll need

Starting a project with sklearn

5. Setting up our Python environment
6. Installing Python libraries
7. Datasets we'll use in the course

Feature optimization

8. How do our features affect Machine Learning models?
9. Introduction to PCA
10. Preparing the data for PCA and IPCA
11. Implementing the PCA and IPCA algorithms
12. Kernels and KPCA
13. What is regularization and how is it applied?
14. Explaining the results of the implementation
15. Implementing Lasso and Ridge
16. ElasticNet: an intermediate technique

Robust regressions

17. The problem of outliers
18. Robust regressions in Scikit-learn
19. Preparing the data for robust regression
20. Implementing robust regression

Ensemble methods applied to classification

21. What are ensemble methods?
22. Preparing the data to implement ensemble methods
23. Implementing Bagging
24. Implementing Boosting

Clustering

25. Clustering strategies
26. Implementing Batch K-Means
27. Implementing Mean-Shift

Parametric optimization

28. Validating our model with Cross Validation
29. Implementing K-Folds Cross Validation
30. Parametric optimization
31. Implementing Randomized
32. Bonus: Auto Machine Learning

Going to production

33. Reviewing our code architecture
34. Importing and exporting models with Sklearn
35. Creating a Flask API for the model
36. Course wrap-up
37. Additional reference material


Implementing Bagging (lesson 23/37)

Here is my version of the code implementing the classifiers that scikit-learn provides:

import pandas as pd

from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier

if __name__ == "__main__":
    
    path = './Bagging/data/heart.csv'
    dataset = pd.read_csv(path)

    print(dataset.head(5))
    print('')
    print(dataset['target'].describe())

    x = dataset.drop(['target'], axis=1, inplace=False)
    y = dataset['target']

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.35, random_state=42)

    knn_class = KNeighborsClassifier().fit(x_train, y_train)
    knn_pred = knn_class.predict(x_test)

    print('')
    print('Accuracy KNeighbors:', accuracy_score(knn_pred, y_test))
    print('')

    #bag_class = BaggingClassifier(base_estimator=KNeighborsClassifier(), n_estimators=50).fit(x_train, y_train)
    #bag_pred = bag_class.predict(x_test)

    #print('')
    #print('Accuracy Bagging with KNeighbors:', accuracy_score(bag_pred, y_test))
    #print('')

    classifier = {
        'KNeighbors': KNeighborsClassifier(),
        'LinearSVC': LinearSVC(),
        'SVC': SVC(),
        'SGDC': SGDClassifier(),
        'DecisionTree': DecisionTreeClassifier()
    }

    for name, estimator in classifier.items():
        bag_class = BaggingClassifier(base_estimator=estimator, n_estimators=5).fit(x_train, y_train)
        bag_pred = bag_class.predict(x_test)

        print('Accuracy Bagging with {}:'.format(name), accuracy_score(bag_pred, y_test))
        print('')

And this is the code's output:

  • Accuracy KNeighbors: 0.6908077994428969
  • Accuracy Bagging with KNeighbors: 0.7437325905292479
  • Accuracy Bagging with SVC: 0.9164345403899722
  • Accuracy Bagging with SGDC: 0.5988857938718662
  • Accuracy Bagging with DecisionTree: 0.9610027855153204

Hi, I'm sharing the code with several of sklearn's classification algorithms:

import pandas as pd 

from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import warnings
warnings.filterwarnings("ignore")

if __name__ == '__main__':
    dt_heart = pd.read_csv('./datasets/heart.csv')
    #print(dt_heart['target'].describe())

    x = dt_heart.drop(['target'], axis=1)
    y = dt_heart['target']

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.35, random_state=1)

    knn_class = KNeighborsClassifier().fit(x_train, y_train)
    knn_prediction = knn_class.predict(x_test)
    print('='*64)
    print('SCORE con KNN: ', accuracy_score(knn_prediction, y_test))

    '''bag_class = BaggingClassifier(base_estimator=KNeighborsClassifier(), n_estimators=50).fit(x_train, y_train)  # base_estimator takes the model the ensemble is based on; n_estimators is how many copies of that model to use
    bag_pred = bag_class.predict(x_test)
    print('='*64)
    print(accuracy_score(bag_pred, y_test))'''

    estimators = {
        'LogisticRegression' : LogisticRegression(),
        'SVC' : SVC(),
        'LinearSVC' : LinearSVC(),
        'SGD' : SGDClassifier(loss="hinge", penalty="l2", max_iter=5),
        'KNN' : KNeighborsClassifier(),
        'DecisionTreeClf' : DecisionTreeClassifier(),
        'RandomTreeForest' : RandomForestClassifier(random_state=0)
    }

    for name, estimator in estimators.items():
        bag_class = BaggingClassifier(base_estimator=estimator, n_estimators=50).fit(x_train, y_train)
        bag_predict = bag_class.predict(x_test)
        print('='*64)
        print('SCORE Bagging with {} : {}'.format(name, accuracy_score(bag_predict, y_test)))


In some cases the results were better without Bagging:

classifiers = {
    'KNN': KNeighborsClassifier,
    'SGD': SGDClassifier,
    'SVC': SVC,
    'LinearSVC': LinearSVC,
    'LogisticRegression': LogisticRegression,
    'DecisionTree': DecisionTreeClassifier,
    'RandomForest': RandomForestClassifier
}

for name, classifier in classifiers.items():
    model = classifier().fit(X_train, y_train)
    prediction = model.predict(X_test)
    bag_class = BaggingClassifier(base_estimator=classifier(), n_estimators=50).fit(X_train, y_train)
    bag_predict = bag_class.predict(X_test)
    print("="*64)
    print(name)
    print("Accuracy:", accuracy_score(prediction, y_test))
    print("Bagging Accuracy:", accuracy_score(bag_predict, y_test))

KNN
Accuracy: 0.7075208913649025
Bagging Accuracy: 0.754874651810585

SGD
Accuracy: 0.6768802228412256
Bagging Accuracy: 0.6880222841225627

SVC
Accuracy: 0.6963788300835655
Bagging Accuracy: 0.6880222841225627

LinearSVC
Accuracy: 0.7994428969359332
Bagging Accuracy: 0.8022284122562674

LogisticRegression
Accuracy: 0.8328690807799443
Bagging Accuracy: 0.8217270194986073

DecisionTree
Accuracy: 0.9749303621169917
Bagging Accuracy: 0.9832869080779945

RandomForest
Accuracy: 0.9832869080779945
Bagging Accuracy: 0.9777158774373259

Using decision trees, I got an accuracy of 0.9902597402597403.

That said, in practice I suppose problems related to health care or the financial sector require "higher" model precision, since those subjects are so sensitive.
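For those sensitive domains, a single accuracy number can be misleading, especially with imbalanced classes; per-class precision and recall are worth inspecting. A minimal sketch using `classification_report` (the synthetic dataset here is a placeholder, not the course's heart.csv):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, recall_score

# Imbalanced synthetic dataset: ~80% negatives, ~20% positives
X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision/recall/F1 -- more informative than a single accuracy number
print(classification_report(y_test, y_pred))
# In health/finance settings, recall on the positive class is often the critical metric
print('Recall (positive class):', recall_score(y_test, y_pred))
```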

Great class! I only have one thing to say: long live tree models.

Using Bagging with 30 estimators, but with the Decision Tree algorithm, I reached an accuracy of 0.995 via cross_val_score.

This is my score, I'm 100% sure.
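Scores that high on a single split of a small dataset are worth double-checking: one lucky train/test partition can inflate the number. A hedged sketch averaging over several folds with `cross_val_score` (synthetic data as a stand-in for heart.csv):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in dataset
X, y = make_classification(n_samples=300, random_state=7)

# Base model passed positionally so this runs on both old and new sklearn
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30, random_state=7)

# The mean of 5 folds is a more stable estimate than one train/test split
scores = cross_val_score(bag, X, y, cv=5)
print('Mean accuracy: %.3f (+/- %.3f)' % (scores.mean(), scores.std()))
```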

I added Scikit-Learn's RandomForestClassifier() and my accuracy score was 0.9832869080779945.
It is imported like this:

from sklearn.ensemble import RandomForestClassifier

I added random forest and decision tree; here is the code as a guide:

import pandas as pd  

from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

if __name__ == "__main__":
    dt_heart = pd.read_csv("./data/heart.csv")
    print(dt_heart.head(5))

    X = dt_heart.drop(["target"],axis=1)
    y = dt_heart["target"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35, random_state=42)

    knn_class = KNeighborsClassifier().fit(X_train,y_train)
    knn_pred = knn_class.predict(X_test)

    print("="*64)
    print(accuracy_score(knn_pred, y_test))

    bag_class = BaggingClassifier(base_estimator=KNeighborsClassifier(),n_estimators=30).fit(X_train,y_train)
    bag_pred = bag_class.predict(X_test)

    print("="*64)
    print(accuracy_score(bag_pred, y_test))


    # Note: a random forest is itself a bagged ensemble of trees, so bagging it again is largely redundant
    rf_class = BaggingClassifier(base_estimator=RandomForestClassifier(), n_estimators=30).fit(X_train, y_train)
    rf_pred = rf_class.predict(X_test)

    print("="*64)
    print(accuracy_score(rf_pred, y_test))

    dt_class = BaggingClassifier(base_estimator=DecisionTreeClassifier(),n_estimators=30).fit(X_train,y_train)
    dt_pred = dt_class.predict(X_test)

    print("="*64)
    print(accuracy_score(dt_pred, y_test))

I used SVC with gamma=2 and C=1, with and without bagging, and got practically the same result, an accuracy of 0.9693:

#//////////////////////SVC
    print("="*32)
    print("="*32)
    print("Using SVC")

    #SVC
    svc_class = SVC(gamma=2, C=1).fit(X_train, y_train)
    svc_predict = svc_class.predict(X_test)

    print("="*32)
    print('Score SVC:', accuracy_score(y_test, svc_predict))

    #SVC With Bagging

    bag_svc_class = BaggingClassifier(base_estimator=SVC(gamma=2, C=1), n_estimators=50).fit(X_train, y_train)
    bag_svc_predict = bag_svc_class.predict(X_test)

    print("="*32)
    print('Score SVC:', accuracy_score(y_test, bag_svc_predict))
Using SVC
Score SVC: 0.9693593314763231
Score SVC: 0.9610027855153204

Great class!!!

I'm sharing my code; I hope it serves as a guide.

My machine took a while because I set n_estimators=500, and it did take some time.

Hi, I'm sharing the code with the other classifiers, plus a small modification that makes it easy to configure each model.

Great explanation.

A basic Bagging implementation with DecisionTree gives better results than one with KNeighbors.

Bagging based on DecisionTreeClassifier

Bagging based on KNeighborsClassifier

import pandas as pd
import sklearn

from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

dt_heart = pd.read_csv('./heart.csv')
X = dt_heart.drop(['target'],axis=1)
y = dt_heart['target']

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.35)

knn_class = KNeighborsClassifier().fit(X_train,y_train)
knn_pred = knn_class.predict(X_test)
print(accuracy_score(knn_pred,y_test))

bag_class = BaggingClassifier(base_estimator=KNeighborsClassifier(),n_estimators=50).fit(X_train,y_train)
bag_pred = bag_class.predict(X_test)
print(accuracy_score(bag_pred,y_test))

from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier

estimadores = {
    'SVC' : SVC(),
    'LinearSVC' : LinearSVC(),
    'SGDL' : SGDClassifier(),
    'Tree' : DecisionTreeClassifier()
              }

for name, estimador in estimadores.items():
    bag_class = BaggingClassifier(base_estimator=estimador, n_estimators=5).fit(X_train, y_train)
    estimador.fit(X_train, y_train)

    bag_pred = bag_class.predict(X_test)
    predictions = estimador.predict(X_test)

    print(name)
    print('Accuracy: ', accuracy_score(predictions, y_test))
    print('Accuracy_BAGG: ', accuracy_score(bag_pred, y_test))

If you standardize the data before the split, the models' accuracy goes up.

# Standardize the features
from sklearn.preprocessing import StandardScaler
df_features = StandardScaler().fit_transform(df_features)

For the KNeighbors case, if we first standardize the data as shown in lesson 10, we get the following:

As you can see, standardizing gives a higher score than even using an ensemble method; however, when we use BaggingClassifier the score drops. This shows how important it is to standardize the data first (according to lesson 23 of the ML applied with Python course, this only applies to classification).
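One caveat about standardizing before the split: the scaler then sees the test rows too, which leaks information into the evaluation and can make scores look better than they are. A common way to keep the scaling inside the training data only is a `Pipeline`, sketched here on synthetic data (not the course's heart.csv):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Synthetic stand-in dataset
X, y = make_classification(n_samples=300, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35, random_state=3)

# The scaler is fit only on X_train; X_test is transformed with the training statistics
model = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_train, y_train)
print('Accuracy (scaled KNN, no leakage):', accuracy_score(y_test, model.predict(X_test)))
```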

I implemented the random forest classifier and got a suspiciously good accuracy.
It makes me doubt whether I did it right.

import pandas as pd
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
 
from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

if __name__ == '__main__':

    dt_heart = pd.read_csv('./data/heart.csv')
    print(dt_heart['target'].describe())

    X = dt_heart.drop(['target'], axis=1)
    y = dt_heart['target']

    X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3)

    knn_class = KNeighborsClassifier().fit(X_train, y_train)
    knn_pred = knn_class.predict(X_test)
    print('='*70)
    print(f'KNeighborsClassifier {accuracy_score(knn_pred, y_test)}')

    bag_class = BaggingClassifier(base_estimator=KNeighborsClassifier(), n_estimators=50).fit(X_train, y_train)
    bag_pred = bag_class.predict(X_test)
    print('='*70)
    print(f'BaggingClassifier {accuracy_score(bag_pred, y_test)}')
 
    forest = RandomForestClassifier(n_estimators=200)
    forest_class = forest.fit(X_train, y_train)
    forest_pred = forest_class.predict(X_test)
    print('='*70)
    print(f'RandomForestClassifier: {accuracy_score(forest_pred, y_test)}')

======================================================================
KNeighborsClassifier 0.7045454545454546

BaggingClassifier 0.7597402597402597

RandomForestClassifier: 0.9902597402597403

I added the following code to try other models:

from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
estimators = {
    'KNN':KNeighborsClassifier(),
    'SVC': SVC(gamma="auto", random_state=42),
    'DecisionTree': DecisionTreeClassifier()
}

for name, estimator in estimators.items():
    bag_clf = BaggingClassifier(base_estimator=estimator, n_estimators=50).fit(X_train, y_train)
    bag_pred = bag_clf.predict(X_test)
    print('-'*64)
    print(name)
    print(accuracy_score(bag_pred,y_test))

Results:

----------------------------------------------------------------
KNN
0.807799442896936
----------------------------------------------------------------
SVC
0.9832869080779945
----------------------------------------------------------------
DecisionTree
1.0
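A perfect 1.0 on a hold-out set is another result worth sanity-checking. Bagging provides a built-in check for this: with `oob_score=True`, each base model is evaluated on the samples left out of its bootstrap draw, giving an accuracy estimate without touching the test set. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in dataset
X, y = make_classification(n_samples=300, random_state=5)

# Each tree is scored on the samples outside its own bootstrap sample,
# so oob_score_ acts as a free validation estimate
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        oob_score=True, random_state=5).fit(X, y)
print('OOB score:', bag.oob_score_)
```

If the out-of-bag score is well below the hold-out accuracy, the 1.0 probably reflects a lucky (or leaky) split rather than a genuinely perfect model.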