Beginners’ Roadmap to Machine Learning

09 Jan 2023

“What would you recommend for an absolute beginner?”

That was the question asked as a follow-up to my last post on book recommendations for turbocharging technical growth in machine learning.

To start with, there are two clarifications to be made: (a) there are no absolute beginners and (b) though related, machine learning (ML) is different from data analysis.

Entrants into ML can be broadly classified into three groups: people from solid quantitative sciences (Statistics, Mathematics, Physics, etc.); people from software development and engineering; and people from other science and non-science backgrounds who want to use ML as a tool to drive value in their fields (Bioinformatics, Geospatial data science, etc.). They are all coming from something. Not nothing.

Machine learning aims to predict (in most situations) or prescribe for future events, based on patterns learnt from past events. Data analysis aims to provide actionable insights based on in-depth analysis of past events. The key skills are transferable, but the core differentiating factor, i.e. the “ability to predict”, arguably makes ML demand more challenging technical skills. But this is not an authoritative take.

Coming to ML, I would advise the following steps:

(a) Brief introduction to its terms and terminologies - This can run concurrently with (b) below. Because these terms can be endless, it is advisable to simply pause and look up each new term as you encounter it. For starters, you must know the difference between supervised and unsupervised ML, train-test split, cross-validation, and high-level information about a number of models used for classification (Logistic regression, Support Vector Machine), regression analysis (Linear regression), and clustering (K-Means).
(b) Python programming - Since your implementation would likely be in Python (apologies to R enthusiasts), being comfortable with the language is important. You should be striving towards the ability to write decent, structured programs with defined functions. I would recommend Mosh’s Intro to Python YouTube video.
(c) Data analysis using Python - This would introduce two popular libraries for data cleaning and wrangling: NumPy and pandas. Since real-world data is almost always noisy and messy, understanding how to turn “dirty” data into data suitable for ML modeling is an unavoidable skill. I would recommend Jose Portilla’s Udemy course or his ML Master class.
(d) Work on the “Hello World” ML projects - Projects built on the Titanic, MNIST and Iris datasets are what I have seen called the “Hello World” of ML projects. These are not projects to be listed on the resume. But they would definitely help in internalizing the lessons learnt in (a) - (c). Variants of tutorials for these projects are also on YouTube.
(e) Personal project - With all the above under your belt, you should be able to creatively find a problem of personal interest and frame it as an ML modeling task. If you’re from Ibadan, Nigeria, like me, you may want to build an ML model that identifies whether someone wears a tribal mark or not. Yeah. This is a project for which you will likely curate the dataset yourself, clean it, and train your model on it.
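
To make a few of the terms from the first step concrete (supervised learning, train-test split, cross-validation, logistic regression), here is a minimal sketch of an Iris “Hello World”. It assumes scikit-learn is installed. The choice of scikit-learn, the 80/20 split, the five folds and the random seed are my own assumptions for illustration, not the only way to do it.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Load the Iris dataset: 150 flowers, 4 numeric features, 3 species labels.
# Having labels (y) is what makes this a supervised learning problem.
X, y = load_iris(return_X_y=True)

# Train-test split: hold out 20% of the rows so the final check is done on
# data the model never saw. The 20% and the seed are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# A simple classification model.
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the training portion only: the training rows are
# split into 5 parts, and each part takes a turn as the validation set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("Cross-validation accuracy per fold:", cv_scores)

# Fit on all training rows, then score once on the held-out test rows.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print("Held-out test accuracy:", test_accuracy)
```

Swapping the model for another classifier, or the dataset for Titanic, is a good way to practise: the split, cross-validate, fit and score pattern stays the same.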

Remember: this is a marathon, not a sprint. The willingness and readiness to continuously upskill is important for the long haul.