MLOps fundamentals and model tracking

1. What is MLOps and what is it for?
2. Model tracking on localhost with MLflow
3. Model tracking on localhost: custom directory
4. Stages of the MLOps cycle
5. MLOps components
6. Model tracking with MLflow and SQLite
7. Model tracking with MLflow in the cloud

Tracking the lifecycle of machine learning models

8. Experiment tracking with MLflow: data preprocessing
9. Experiment tracking with MLflow: defining functions
10. Experiment tracking with MLflow: tracking metrics and hyperparameters
11. Experiment tracking with MLflow: classification report
12. Training baseline models and analysis in the MLflow UI
13. MLflow Model Registry: registering and using models
14. Model registration with mlflow.client
15. Testing a model from MLflow with test data
16. What is model tracking for in MLOps?

Orchestrating machine learning pipelines

17. Tasks with Prefect
18. Flows with Prefect
19. Ticket classification model flow: data processing and features
20. Ticket classification model flow: integrating the tasks
21. Ticket classification model flow: running the tasks
22. How does orchestration fit into MLOps?

Deploying a machine learning model

23. Deployment with Docker and FastAPI: configuration and requirements
24. Deployment with Docker and FastAPI: class definitions and entry point
25. Deployment with Docker and FastAPI: processing predictions in the main app
26. Deployment with Docker and FastAPI: database configuration
27. Deploying and testing a machine learning model on localhost
28. Deploying and testing a machine learning model in the cloud
29. What to do with the deployed model?

Monitoring a machine learning model in production

30. How to monitor machine learning models in production?
31. Training a baseline model
32. Preparing data to build a report with Evidently
33. Data quality analysis with Evidently
34. Creating reports with Grafana
35. How to improve your MLOps processes?


Ticket classification model flow: integrating the tasks


How to split tasks to improve flow clarity and efficiency?

When designing a pipeline, avoid packing too much functionality into a single task. This is comparable to writing class methods that try to do everything at once. The key is to split functionality into focused tasks so the flow is easier to debug and refactor later.

In practice, the data transformation and splitting task should start by reading the output file produced by the previous data-processing task. You also need to decide where its own results will be stored, usually inside the orchestration module, in a folder designated for processed data.
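
As a rough illustration of this idea, the processing step can live in its own Prefect task that writes its product to the processed-data folder, and the next task simply reads that file. The function name, paths, and folder layout below are assumptions for the sketch, not the course's exact code.

    import pandas as pd
    from prefect import task

    PROCESSED_DIR = "orchestration/data/processed"  # assumed folder layout

    @task
    def process_raw_data(raw_path: str) -> str:
        """Clean the raw tickets file and write the product to the processed folder."""
        df = pd.read_csv(raw_path)
        # ... text cleaning / preprocessing would happen here ...
        out_path = f"{PROCESSED_DIR}/tickets_clean.csv"
        df.to_csv(out_path, index=False)
        # The transform-and-split task starts by reading this product file
        return out_path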

Where is the processed data stored?

Within our flow, the processed data is carefully organized to keep results traceable. For example, we find folders containing:

  • The trained model with its optimal hyperparameters.
  • Training and test data in Pickle format, which makes it easy to tell which data the model was trained on and which data is reserved for future tests.
  • A JSON file with the ID to string mapping.

When the data transformation is performed, these results are also stored in the processed data folder.
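
As a rough sketch of that layout, the snippet below persists the kinds of artifacts listed above into a processed-data folder. The helper function and file names are illustrative assumptions, not the course's exact code.

    import json
    import pickle
    from pathlib import Path

    def save_artifacts(x_train, x_test, y_train, y_test, id_to_label,
                       out_dir: str = "orchestration/data/processed") -> None:
        """Persist the data splits and the label mapping so later tasks can trace them."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)

        # Training and test data kept in Pickle format
        with open(out / "train_test_data.pkl", "wb") as f:
            pickle.dump({"x_train": x_train, "x_test": x_test,
                         "y_train": y_train, "y_test": y_test}, f)

        # ID-to-string label mapping kept as JSON
        with open(out / "id_to_label.json", "w") as f:
            json.dump(id_to_label, f)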

How to transform and split data properly?

Effective data transformation and splitting comes down to how the dataset is structured for use in predictive models. The key steps in this task are:

  1. Reading the DataFrame: Start by reading the CSV resulting from the previous task.

  2. Identification of features and labels: The processed text is recognized as X (features), and the labels, initially in string format, are transformed to integers needed for the model.

  3. Use of CountVectorizer: a scikit-learn CountVectorizer instance transforms the processed text into numerical features.

  4. Dataset splitting: The transformed X set is split into training and test data (a fuller sketch of this task follows the snippet below).

    Using CountVectorizer to transform the data

    vectorizer = CountVectorizer()
    X_transformed = vectorizer.fit_transform(X)
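
Putting the four steps together, the transform-and-split task might look like the sketch below. The column names, input file, and split ratio are assumptions; the course's actual code may differ.

    import pandas as pd
    from prefect import task
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder

    @task
    def transform_and_split(clean_csv: str):
        # 1. Read the DataFrame produced by the previous task
        df = pd.read_csv(clean_csv)

        # 2. Features are the processed text; string labels become integers
        X = df["processed_text"]
        encoder = LabelEncoder()
        y = encoder.fit_transform(df["label"])

        # 3. CountVectorizer turns the text into a sparse count matrix
        vectorizer = CountVectorizer()
        X_transformed = vectorizer.fit_transform(X)

        # 4. Split the transformed set into training and test data
        x_train, x_test, y_train, y_test = train_test_split(
            X_transformed, y, test_size=0.2, random_state=42
        )
        return x_train, x_test, y_train, y_test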

How to train the best model in your flow?

Adding a task to train the optimal model is key to improving performance and getting more accurate predictions. The TrainingBestModel function focuses on the following aspects:

  • Training and test data: it receives these sets as its main arguments.
  • Optimal hyperparameters: a dictionary of hyperparameter values kept in config.py.

To begin experimentation with MLflow, we follow this approach:

  1. Experiment initialization: Use start_run to track the experiment.

  2. Training and prediction: Train the model and obtain predictions along with performance metrics.

  3. Model and metrics logging: Log the trained model as an artifact and record its performance metrics.

  4. Report printing: Print a report for both the training and the test sets (a task-level sketch follows the snippet below).

    Model initialization and registration

    with mlflow.start_run(run_name="BestModelRun") as run:
        model = RandomForestClassifier(**params)
        model.fit(x_train, y_train)
        predictions = model.predict(x_test)
        accuracy = accuracy_score(y_test, predictions)
        mlflow.log_metric("accuracy", accuracy)
        mlflow.sklearn.log_model(model, "model")
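
Wrapped as a Prefect task, the same run could look roughly like this. The task name mirrors the TrainingBestModel function described above, and the hyperparameter values, which the course keeps in config.py, are shown inline here as placeholder assumptions.

    import mlflow
    import mlflow.sklearn
    from prefect import task
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, classification_report

    # Placeholder values; in the course these live in config.py
    BEST_PARAMS = {"n_estimators": 200, "max_depth": 20}

    @task
    def training_best_model(x_train, x_test, y_train, y_test, params=BEST_PARAMS):
        with mlflow.start_run(run_name="BestModelRun"):
            model = RandomForestClassifier(**params)
            model.fit(x_train, y_train)

            # Log hyperparameters, accuracy, and the fitted model as an artifact
            mlflow.log_params(params)
            accuracy = accuracy_score(y_test, model.predict(x_test))
            mlflow.log_metric("accuracy", accuracy)
            mlflow.sklearn.log_model(model, "model")

            # Print a report for both the training and the test predictions
            print(classification_report(y_train, model.predict(x_train)))
            print(classification_report(y_test, model.predict(x_test)))
        return model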

By integrating multiple tasks into the flow, you streamline the training process and keep the pipeline running end to end. Every detail, from the input configuration to the comments in the code, plays a part in the success of the pipeline.
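
To make the integration concrete, the tasks sketched above could be wired into a single Prefect flow along these lines, assuming process_raw_data, transform_and_split, and training_best_model are defined in the same module; names and paths remain the illustrative ones used in the earlier sketches.

    from prefect import flow

    @flow(name="ticket-classification-training")
    def ticket_classification_flow(raw_path: str = "orchestration/data/raw/tickets.csv"):
        # Each task consumes the product of the previous one
        clean_csv = process_raw_data(raw_path)
        x_train, x_test, y_train, y_test = transform_and_split(clean_csv)
        model = training_best_model(x_train, x_test, y_train, y_test)
        return model

    if __name__ == "__main__":
        ticket_classification_flow()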
