Fundamentals of MLOps and model tracking
What is MLOps and what is it for?
Model tracking on localhost with MLflow
Model tracking on localhost: custom directory
Stages of the MLOps cycle
Components of MLOps
Model tracking with MLflow and SQLite
Model tracking with MLflow in the cloud
Tracking the lifecycle of machine learning models
Experiment tracking with MLflow: data preprocessing
Experiment tracking with MLflow: function definitions
Experiment tracking with MLflow: tracking metrics and hyperparameters
Experiment tracking with MLflow: classification report
Training baseline models and analysis in the MLflow UI
MLflow Model Registry: registering and using models
Registering models with mlflow.client
Testing a model from MLflow with test data
What is model tracking for in MLOps?
Orchestration of machine learning pipelines
Tasks with Prefect
Flows with Prefect
Ticket classification model flow: data processing and features
Ticket classification model flow: integrating the tasks
Ticket classification model flow: running the tasks
How does orchestration fit into MLOps?
Deploying a machine learning model
Deployment with Docker and FastAPI: configuration and requirements
Deployment with Docker and FastAPI: class definitions and entry point
Deployment with Docker and FastAPI: processing predictions in the main app
Deployment with Docker and FastAPI: database configuration
Deploying and testing a machine learning model on localhost
Deploying and testing a machine learning model in the cloud
What to do with the deployed model?
Monitoring a machine learning model in production
How to monitor machine learning models in production?
Training a baseline model
Preparing data to create a report with Evidently
Data quality analysis with Evidently
Creating reports with Grafana
How to improve your MLOps processes?
The design of a programming workflow should avoid packing too many responsibilities into a single task. This is comparable to writing class methods that do everything. The key is to split functionality so that debugging and refactoring are easier in the future.
To implement this, when performing data transformation and splitting, it is essential to first read the output file produced by the previous data-processing task. You must also decide where the results will be stored, usually in an orchestration module with a folder designated for processed data.
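As a sketch of this single-responsibility idea, each step below lives in its own small Prefect task that persists its product for the next task to read. The folder path, column name, and task names are illustrative assumptions, not the course's exact code; the fallback decorator only lets the sketch run where Prefect is not installed.

```python
from pathlib import Path

import pandas as pd

try:
    from prefect import task
except ImportError:  # fallback so the sketch runs without Prefect installed
    def task(fn):
        return fn

# Assumed folder layout for the orchestration module's processed data.
PROCESSED_DIR = Path("orchestration/data/processed")


@task
def read_raw_data(path: str) -> pd.DataFrame:
    """Read the raw tickets file; nothing else, so failures are easy to locate."""
    return pd.read_csv(path)


@task
def clean_text(df: pd.DataFrame) -> pd.DataFrame:
    """Minimal text normalization, kept in its own task for easy debugging."""
    df = df.copy()
    df["text"] = df["text"].str.lower().str.strip()
    return df


@task
def save_processed(df: pd.DataFrame, name: str) -> Path:
    """Persist this task's product so the next task can read it from disk."""
    PROCESSED_DIR.mkdir(parents=True, exist_ok=True)
    out = PROCESSED_DIR / f"{name}.csv"
    df.to_csv(out, index=False)
    return out
```

Because each task reads from and writes to the processed-data folder, any task can be re-run or refactored in isolation.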
Within our flow, the processed data is organized into dedicated folders to ensure traceability. When the data transformation runs, its results are likewise stored in the processed-data folder.
Effective data transformation and splitting involves deciding how to structure the dataset for use in predictive models. The key steps in this task are:
Reading the DataFrame: start by reading the CSV produced by the previous task.
Identifying features and labels: the processed text becomes X (the features), and the labels, initially strings, are converted to the integers the model needs.
Using CountVectorizer: a CountVectorizer instance from scikit-learn transforms the text data.
Splitting the dataset: X is split into training and test data.
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
X_transformed = vectorizer.fit_transform(X)
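Putting the four steps together, a minimal sketch of the whole transform-and-split task might look as follows. The column names "text" and "label" and the split parameters are assumptions for illustration:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def transform_and_split(df: pd.DataFrame):
    # Features and labels: processed text is X, string labels become integers.
    X = df["text"]
    y = LabelEncoder().fit_transform(df["label"])

    # CountVectorizer turns the text into a sparse bag-of-words matrix.
    vectorizer = CountVectorizer()
    X_transformed = vectorizer.fit_transform(X)

    # Split into training and test data.
    return train_test_split(X_transformed, y, test_size=0.2, random_state=42)
```

The function returns `x_train, x_test, y_train, y_test`, ready to be handed to the training task.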
Adding a task that trains the best model is vital to improve performance and obtain more accurate predictions. The TrainingBestModel function relies on config.py for its configuration and structures the MLflow experimentation as follows:
Experiment initialization: use start_run to track the experiment.
Training and prediction: train the model and obtain predictions along with performance metrics.
Model and metrics logging: save the trained model and its metrics as artifacts.
Report printing: print a classification report for both training and testing.
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

with mlflow.start_run(run_name="BestModelRun") as run:
    model = RandomForestClassifier(**params)
    model.fit(x_train, y_train)
    predictions = model.predict(x_test)
    accuracy = accuracy_score(y_test, predictions)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
By integrating multiple tasks into the flow, you streamline the training process and keep the pipeline continuous and efficient. Every detail, from the input configuration to the comments in the code, contributes to the pipeline's success.
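The wiring of tasks into a Prefect flow can be sketched as below. The task bodies are stand-ins for the real data-loading and MLflow-tracked training steps, and the flow name is an assumption; the fallback decorators only let the sketch run where Prefect is not installed.

```python
try:
    from prefect import flow, task
except ImportError:  # fallbacks so the sketch runs without Prefect installed
    def task(fn):
        return fn

    def flow(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda fn: fn


@task
def load_data():
    # Stand-in for reading the processed tickets file from the data folder.
    return [("refund please", "billing"), ("cannot log in", "access")]


@task
def train_model(records):
    # Stand-in for the MLflow-tracked training task.
    labels = {label for _, label in records}
    return {"n_samples": len(records), "classes": sorted(labels)}


@flow(name="ticket-classification")
def ticket_pipeline():
    # The flow expresses the dependency: training consumes load_data's output.
    records = load_data()
    return train_model(records)
```

Calling `ticket_pipeline()` runs the tasks in dependency order, and Prefect records each task run for observability.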