The idea of data science is pretty dope to me. I’ve been into automation since ~2007. So once I learned how to code in Python, that was it. I took @Codecademy courses to get an overview. Here’s the way I think about the 9 algorithms they taught:
1. Linear regression. Predict an outcome based on one factor
2. Multiple linear regression. Predict an outcome based on many factors
3. K-nearest neighbors. Predict from the closest labeled examples. Used for both classification and regression.
4. Naive Bayes. Predict the probability of each class based on various attributes
5. Logistic regression. Classify labeled data. Used for classification problems, despite the name.
6. Decision trees. Predict an outcome by following a series of splits on many factors. Used for classification and regression.
7. Random forests. Predict an outcome given many factors by combining many decision trees. Used for both classification and regression.
8. Perceptron neural net. Binary classifiers that decide class membership by comparing a linear combination of the features to a threshold
9. K-means clustering. Group unlabeled data into k clusters.
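To make the perceptron idea in #8 concrete, here's a minimal sketch in plain Python. It is not the course's code. The function names, the toy AND-gate data, and the learning rate are all my own assumptions for illustration.

```python
def perceptron_predict(weights, bias, features):
    """Return 1 if the weighted sum of the features clears the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, features))
    # Adding a bias and comparing to 0 is the same as comparing to a threshold of -bias.
    return 1 if weighted_sum + bias > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1):
    """Classic perceptron update rule: nudge the weights whenever a prediction is wrong."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - perceptron_predict(weights, bias, features)
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Toy data (an assumption for this sketch): the logical AND of two inputs.
AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]

weights, bias = train_perceptron(AND_INPUTS, AND_LABELS)
predictions = [perceptron_predict(weights, bias, x) for x in AND_INPUTS]
print(predictions)
```

AND is linearly separable, which is why a single perceptron can learn it. Something like XOR would not work with this setup.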
Classification predicts which category an observation belongs to. Regression predicts its numerical value. After going through the course, I think the hard part is knowing which tool is most appropriate in a given situation.
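Here's a rough sketch of that classification vs. regression split, using k-nearest neighbors since it can do both. The customer data is made up, and the helper names (`nearest_targets`, `knn_classify`, `knn_regress`) are hypothetical, not from any library.

```python
import math
from collections import Counter


def nearest_targets(training_data, query, k):
    """Return the targets of the k training points closest to the query."""
    by_distance = sorted(training_data, key=lambda item: math.dist(item[0], query))
    return [target for _, target in by_distance[:k]]


def knn_classify(training_data, query, k=3):
    """Classification: majority vote among the k nearest labels."""
    votes = Counter(nearest_targets(training_data, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(training_data, query, k=3):
    """Regression: average of the k nearest numeric values."""
    values = nearest_targets(training_data, query, k)
    return sum(values) / len(values)


# Made-up customers: (monthly visits, average order value).
features = [(1, 20), (2, 25), (1, 30), (8, 60), (9, 70), (10, 65)]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]  # a category
yearly_spend = [100.0, 150.0, 120.0, 900.0, 1100.0, 1000.0]   # a number

new_customer = (9, 62)
predicted_label = knn_classify(list(zip(features, labels)), new_customer)
predicted_spend = knn_regress(list(zip(features, yearly_spend)), new_customer)
print(predicted_label, predicted_spend)
```

Same neighbors, same features. The only thing that changes is whether the answer is a category (vote) or a number (average).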
Data science involves coding plus statistical concepts. You “train” a model based on a data set, hoping to uncover patterns so that it can give you reasonably accurate predictions. This could help you detect certain things at scale such as fraud, cancer, fake news, or business opportunities.
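As a small sketch of what "training" looks like, here's simple linear regression (#1 on the list) fit by ordinary least squares in plain Python, then checked against data the model never saw. The numbers are invented, and `fit_line` is just a name I picked.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one factor: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: ad spend (x) vs. sales (y), both in thousands.
train_x = [1, 2, 3, 4, 5]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]

# Held-out data, used only to check how well the pattern generalizes.
test_x = [6, 7]
test_y = [13.2, 14.8]

slope, intercept = fit_line(train_x, train_y)
test_predictions = [slope * x + intercept for x in test_x]
mean_abs_error = sum(abs(p - y) for p, y in zip(test_predictions, test_y)) / len(test_y)
print(slope, intercept, mean_abs_error)
```

Holding some data back is the point. A model that only looks good on the data it was trained on hasn't really uncovered a pattern.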
My interest would be finding marketing insights, making product recommendations, and predicting customer behavior. Technically, data science, AI, machine learning, and deep learning are all different things, but they're still conceptually related.
For more info, read: Data Science vs. Artificial Intelligence vs. Machine