For all of the details, you can get a paid Pro subscription plan from Codecademy.
Here are the basics: clustering is the best-known unsupervised learning technique. It finds structure in unlabeled data by identifying groups of similar items, called clusters. “Unsupervised” means that your data isn’t labeled beforehand, so the algorithm has to work out the groups on its own.
You simply decide how many groups you’d like to split the data into (n_clusters = 3) and go from there. Setting up Python correctly along with all of the modules (a PYTHONPATH issue) could be another blog post in itself (see Stack Overflow for help), but the basic code looks something like this:
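from sklearn.cluster import KMeans
import pandas as pd  # structures your data as a spreadsheet
import numpy as np  # handles the arrays created by your code
qds = pd.read_csv('your_data.csv', header=None)  # placeholder file name; header=None because the columns are unlabeled
model = KMeans(n_clusters=3)
model.fit(qds)  # the model has to be fit before it can predict
labels = model.predict(qds)  # one group number (0, 1, or 2) per row
np.savetxt('file_2', labels, fmt='%d', delimiter=',')  # write the group numbers to a file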
In this example, I put all of my data in a CSV file with unlabeled columns (each cell was marked 0 or 1). Once the code runs, it spits out which of the three groups (0, 1, or 2) each row belongs to.
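If you’d like to try this without a CSV of your own, here is a minimal sketch that makes up some 0/1 data and clusters it. The three row patterns, the 5% noise level, and the variable names are all assumptions for illustration, not my actual dataset. Setting random_state just makes the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the CSV: 60 rows, 6 columns of 0/1 values.
# Three hypothetical row "types", each switching on a different pair of columns.
rng = np.random.default_rng(0)
patterns = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
])
true_group = np.repeat([0, 1, 2], 20)  # 20 rows of each type
data = patterns[true_group]

# Flip roughly 5% of the cells so the rows are not perfect copies.
noise = rng.random(data.shape) < 0.05
data = np.where(noise, 1 - data, data)

# Fit the model and get a group number for every row in one step.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(data)

print(labels)               # one group number per row
print(np.bincount(labels))  # how many rows landed in each group
```

Keep in mind that the group numbers are arbitrary: the rows that the sketch generated as type 0 may come back as group 2, and so on. What matters is which rows end up together.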
It’s an interesting way to classify large chunks of data, and I’m just getting started. More to come soon.