Discussing the article: "MQL5 Wizard Techniques you should know (Part 09). Pairing K-Means Clustering with Fractal Waves"


Check out the new article: MQL5 Wizard Techniques you should know (Part 09). Pairing K-Means Clustering with Fractal Waves.

K-means clustering approaches the grouping of data points as a process that begins with a macro view of the data set, using randomly generated cluster centroids, before zooming in and adjusting those centroids to represent the data set accurately. We will look at this and exploit a few of its use cases.
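As a rough illustration of that two-phase process (random seeding, then iterative adjustment), here is a minimal sketch of naïve k-means on a one-dimensional series in MQL5-style code. The function and variable names are my own for illustration and are not taken from the article's source:

// Illustrative sketch only, not the article's code: naive k-means on a
// 1-D series. Random initial centroids, then repeated refinement.
void KMeansNaive(const double &data[], int clusters,
                 int &labels[], double &centroids[])
{
   int n = ArraySize(data);
   ArrayResize(labels, n);
   ArrayResize(centroids, clusters);
   ArrayInitialize(labels, -1);
   for(int j = 0; j < clusters; j++)          // naive seeding: random points
      centroids[j] = data[MathRand() % n];

   for(int pass = 0; pass < 100; pass++)
   {
      bool changed = false;
      for(int i = 0; i < n; i++)              // assignment step: nearest centroid
      {
         int best = 0;
         for(int j = 1; j < clusters; j++)
            if(MathAbs(data[i] - centroids[j]) < MathAbs(data[i] - centroids[best]))
               best = j;
         if(labels[i] != best) { labels[i] = best; changed = true; }
      }
      if(!changed) break;                     // assignments settled: converged
      for(int j = 0; j < clusters; j++)       // update step: move centroid to mean
      {
         double sum = 0.0; int count = 0;
         for(int i = 0; i < n; i++)
            if(labels[i] == j) { sum += data[i]; count++; }
         if(count > 0) centroids[j] = sum / count;
      }
   }
}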

By default, k-means is very slow and inefficient; in fact, that is why it is often referred to as naïve k-means, with 'naïve' implying that quicker implementations exist. Part of this drudgery stems from the random assignment of the initial centroids at the start of the optimization. In addition, after the random centroids have been selected, Lloyd's algorithm is often employed to arrive at the correct centroid and, therefore, category values. There are supplements and alternatives to Lloyd's algorithm, and these include: Jenks' Natural Breaks, which focuses on cluster means rather than distances to chosen centroids; k-medians, which, as the name suggests, uses the cluster median rather than the centroid (mean) as the proxy guiding towards the ideal classification; k-medoids, which uses actual data points within each cluster as candidate centroids and is thereby more robust against noise and outliers, as per Wikipedia; and finally fuzzy clustering, where the cluster boundaries are not clear cut and data points can, and often do, belong to more than one cluster. This last format is interesting because, rather than 'classifying' each data point outright, it assigns a regression-style weight that quantifies how much the point belongs to each of the applicable clusters, as sketched below.
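To make those fuzzy membership weights concrete, the following is a minimal sketch of the standard fuzzy c-means membership formula on one-dimensional data. The names here are illustrative assumptions, not code from the article:

// Illustrative sketch of the fuzzy membership idea (standard fuzzy c-means
// weights): each point gets one weight per cluster, and the weights sum to 1.
void FuzzyWeights(double point, const double &centroids[],
                  double &weights[], double fuzzifier = 2.0)
{
   int clusters = ArraySize(centroids);
   ArrayResize(weights, clusters);
   for(int k = 0; k < clusters; k++)          // a point sitting exactly on a
      if(point == centroids[k])               // centroid gets full membership
      {
         ArrayInitialize(weights, 0.0);
         weights[k] = 1.0;
         return;
      }
   for(int j = 0; j < clusters; j++)
   {
      double dj = MathAbs(point - centroids[j]);
      double sum = 0.0;
      for(int k = 0; k < clusters; k++)       // standard FCM distance ratios
         sum += MathPow(dj / MathAbs(point - centroids[k]), 2.0 / (fuzzifier - 1.0));
      weights[j] = 1.0 / sum;                 // membership of point in cluster j
   }
}

The fuzzifier parameter (commonly set to 2) controls how blurred the cluster boundaries are: values closer to 1 push the weights towards a hard, winner-take-all classification.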


Our objective for this article is to showcase one more type of k-means implementation that is touted to be more efficient: k-means++. This algorithm relies on Lloyd's algorithm just like the default naïve k-means, but it differs in how the initial random centroids are selected. That selection is not as 'random' as in naïve k-means, and because of this the algorithm tends to converge much faster and more efficiently.
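For reference, a minimal sketch of the k-means++ seeding step might look as follows; again, the names are illustrative and the article's actual MQL5 implementation may differ:

// Illustrative sketch of k-means++ seeding: the first centroid is a uniform
// random pick, and each later one is drawn with probability proportional to
// the squared distance from the nearest centroid already chosen.
void KMeansPPSeeds(const double &data[], int clusters, double &seeds[])
{
   int n = ArraySize(data);
   ArrayResize(seeds, clusters);
   seeds[0] = data[MathRand() % n];           // first seed: uniform pick
   for(int s = 1; s < clusters; s++)
   {
      double d2[];                            // squared distance of each point
      ArrayResize(d2, n);                     // to its nearest existing seed
      double total = 0.0;
      for(int i = 0; i < n; i++)
      {
         double best = MathAbs(data[i] - seeds[0]);
         for(int k = 1; k < s; k++)
            best = MathMin(best, MathAbs(data[i] - seeds[k]));
         d2[i] = best * best;
         total += d2[i];
      }
      double r = total * (MathRand() / 32767.0); // uniform draw in [0, total]
      int pick = 0;
      double acc = d2[0];
      while(acc < r && pick + 1 < n)          // D^2-weighted roulette wheel
         acc += d2[++pick];
      seeds[s] = data[pick];
   }
}

Because each new seed is biased away from the seeds already chosen, the starting centroids are well spread across the data, which is what cuts down the number of Lloyd iterations needed afterwards.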

Author: Stephen Njuki
