Implementing DBSCAN



***<u>What is the noise parameter?</u>***

`X_m, y_m = make_moons(n_samples=250, noise=0.05, random_state=42)`

In machine learning and synthetic data generation, "noise" refers to random perturbations added to the data. It makes the dataset more realistic by simulating the variability found in real-world measurements.

The `make_moons` function from scikit-learn generates a synthetic dataset of two interleaving half circles, and its `noise` parameter controls how much random variation is applied to each point.

The parameters used in the snippet are:

* `n_samples`: The total number of data points to generate.
* `noise`: The standard deviation of the Gaussian noise added to the data. A higher value produces more scattered, less structured points.
* `random_state`: Seed for random number generation, which makes the result reproducible.

So `make_moons(n_samples=250, noise=0.05, random_state=42)` generates 250 samples forming two half circles, with Gaussian noise of standard deviation 0.05 added to each point's position. This gives a more realistic and challenging dataset, especially for tasks that involve non-linear decision boundaries.
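To make the meaning of `noise` concrete, here is a minimal sketch that generates the same moons with and without noise and measures how far the points moved. The use of `shuffle=False` and the variable names are my own assumptions, chosen so that both calls return points in the same order.

```python
import numpy as np
from sklearn.datasets import make_moons

# shuffle=False keeps the points in the same order in both calls,
# so the two arrays can be compared row by row.
X_clean, _ = make_moons(n_samples=250, noise=None, shuffle=False, random_state=42)
X_noisy, _ = make_moons(n_samples=250, noise=0.05, shuffle=False, random_state=42)

# The difference between the two arrays is the Gaussian noise that was added.
displacement = X_noisy - X_clean
noise_std = displacement.std()

print(f"Empirical standard deviation of the added noise: {noise_std:.3f}")
```

The printed value should land close to the requested 0.05, which is what "standard deviation of the Gaussian noise" means in practice.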
## <u>Which would be the best noise?</u>

There isn't a single "best" noise level for the moons data. The ideal level depends on what you're trying to achieve:

* **Easy clustering task and baseline performance:** If you want a baseline for your clustering algorithm, or a very easy task to test with, a low noise level (e.g., noise=0.01 or even no noise) might be suitable.
* **Simulating real-world data:** Real-world data often has some level of noise or uncertainty. A moderate noise level (e.g., noise=0.05 as you used) can make the clustering task more realistic and reflect the challenges of working with actual data.
* **Evaluating robustness to noise:** If you want to test how well your clustering algorithm handles noise, you might experiment with different noise levels (e.g., noise=0.05, 0.1, 0.2) and see how the performance changes.

Here are some additional factors to consider:

* **The clustering algorithm you're using:** Some algorithms are more sensitive to noise than others.
* **The desired level of difficulty:** Do you want a clear separation for easy evaluation, or a more challenging scenario with some ambiguity?

Ultimately, the best noise level depends on your specific goals and the context of your work. It's good practice to try different noise levels and see how they affect the clustering results.
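The "try different noise levels" advice can be sketched roughly as follows. The helper `summarize_dbscan` is hypothetical, and `eps=0.2` and `min_samples=5` are assumed values picked only for illustration, not settings from the class.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons


def summarize_dbscan(noise_levels, eps=0.2, min_samples=5):
    """Return {noise: (n_clusters, n_outliers)} for moons data at each noise level.

    eps and min_samples are illustrative assumptions, not tuned values.
    """
    results = {}
    for noise in noise_levels:
        X, _ = make_moons(n_samples=250, noise=noise, random_state=42)
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        # DBSCAN labels outliers as -1, so that label is not a cluster.
        n_clusters = len(set(labels.tolist()) - {-1})
        n_outliers = int((labels == -1).sum())
        results[noise] = (n_clusters, n_outliers)
    return results


results = summarize_dbscan([0.05, 0.1, 0.2])
for noise, (n_clusters, n_outliers) in results.items():
    print(f"noise={noise}: {n_clusters} clusters, {n_outliers} outliers")
```

Comparing the cluster and outlier counts across noise levels gives a quick, if crude, picture of how sensitive a fixed `eps` and `min_samples` are to noisier data.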

It's impressive how, by tuning the hyperparameters, both the "k" clusters and the outliers can be identified perfectly.

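```python
from sklearn.cluster import DBSCAN
import seaborn as sns

# X and df_blobs are assumed to be the blobs features and DataFrame from the class
dbscan_blobs = DBSCAN(eps=0.8, min_samples=4)
y_predict = dbscan_blobs.fit_predict(X)  # label -1 marks outliers
df_blobs['cluster'] = y_predict
sns.scatterplot(data=df_blobs, x='x1', y='x2', hue='cluster', palette='bright');
```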
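For anyone who wants to check the clusters and outliers numerically rather than only in the plot, here is a self-contained sketch. The `make_blobs` settings (500 samples, 4 centers, `cluster_std=0.8`) are my assumptions and are not necessarily the data used in the class; only `eps=0.8` and `min_samples=4` come from the snippet above.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Assumed stand-in for the class dataset: four Gaussian blobs in 2D.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

labels = DBSCAN(eps=0.8, min_samples=4).fit_predict(X)

# DBSCAN assigns -1 to points it considers noise (outliers).
cluster_ids = sorted(set(labels.tolist()) - {-1})
n_outliers = int(np.sum(labels == -1))
sizes = {cid: int(np.sum(labels == cid)) for cid in cluster_ids}

print(f"Clusters found: {len(cluster_ids)}")
print(f"Points per cluster: {sizes}")
print(f"Outliers (label -1): {n_outliers}")
```

Unlike K-means, the number of clusters is not passed in; it falls out of `eps` and `min_samples`, which is why tuning those two values changes both the cluster count and the number of outliers.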

Great class, simple and very clear!

One of the best explanations I've found on how to use DBSCAN.