
Which mathematical concepts are most important for learning machine learning?

0
Topic starter

I'm so tired of looking at these roadmap blogs because they all say something different, and I am honestly getting super overwhelmed. I read online that I need like a full degree in math to even touch scikit-learn, but then some other guy on Reddit said you only need basic high school algebra and can just call libraries and forget the rest. I am trying to build a basic churn prediction model for a small non-profit I am volunteering for in Chicago, and my boss wants it done in about 8 weeks, so I do not have time to go back to college for three years.

My constraints:

  • timeline: got about 2 months total
  • budget: literally zero dollars, looking for free stuff if possible
  • current level: took calc 1 like five years ago and forgot most of it
  • use case: mostly tabular data, maybe some basic neural nets later if I have time

Is it just linear algebra? I keep seeing stuff about eigenvalues and partial derivatives, but do I actually need to solve those by hand or just know what they do to the weights? I am just trying to figure out what the bare minimum is so I don't waste time on theoretical stuff that doesn't help me code. Like, how much stats is actually required vs. just being able to read a graph? I feel like I am drowning in jargon right now...


3 Answers
11

You can absolutely crush this churn model in two months! Honestly, the people saying you need a math degree are totally gatekeeping. I started exactly where you are and found that focusing on practical application is way more efficient. Since you are working with tabular data for a non-profit, you can skip the deep theory for now and focus on what actually moves the needle. Here is the bare minimum you need to master:

  • Linear Algebra Basics: You just need to understand what a matrix and a vector are. Your dataset is basically just a big matrix, with one row per record and one column per feature! You don't need to do the math yourself, but you should know how dimensions work so you don't get shape errors in your code.
  • Probability and Stats: This is actually the most important part for churn. You gotta understand stuff like mean, variance, and especially evaluation metrics like the F1-score. Since churn is usually imbalanced, accuracy is a total trap and will lead you astray!
  • Calculus: Just get the vibe of what a gradient is. It is basically just a compass telling the model which way to adjust its weights to find the lowest error. You never have to solve these by hand.

I used O'Reilly Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 3rd Edition to get my first project running and it was amazing! It focuses on coding first, which I love. For a free resource, look up the Coursera Machine Learning Specialization by Andrew Ng on YouTube. You don't need to pay for the certificate to learn the content, and his explanations are fantastic!
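To make the "accuracy is a trap" point concrete, here is a minimal sketch in plain Python. The labels are made up for illustration (a 5% churn rate is an assumption, not data from the non-profit), and `y_lazy` and `y_better` are hypothetical model outputs. It compares a model that always predicts "stayed" with one that actually flags churners:

```python
# Hypothetical toy labels: 1 = churned, 0 = stayed. Not real data.
y_true = [1] * 5 + [0] * 95                   # assumed 5% churn rate
y_lazy = [0] * 100                            # always predicts "stayed"
y_better = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89  # catches 4 of 5, 6 false alarms


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true, y_pred):
    """F1 for the churn class (1): harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no churners caught, so the score is 0 by convention here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(accuracy(y_true, y_lazy), f1(y_true, y_lazy))      # 0.95 0.0
print(accuracy(y_true, y_better), round(f1(y_true, y_better), 3))  # 0.93 0.533
```

The lazy model wins on accuracy while catching zero churners, which is exactly why F1 is the more honest number here. In a real project you would not hand-roll these: scikit-learn ships `accuracy_score` and `f1_score` in `sklearn.metrics`.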


10

Adding my two cents here because I've seen too many people dive into the deep end of theory and never actually ship a model. In my experience, for a 2-month window, you need to be extremely picky about what you study. If you waste time on manual derivatives, you will miss your deadline. I've tried many ways to train juniors over the years, and these are the paths I'd weigh:

  • Fast.ai Practical Deep Learning for Coders

Pros: You get results fast. It teaches top-down, so you see the code first and math only when it breaks.
Cons: Can feel like magic at first, might leave some gaps in why things work.

  • StatQuest with Josh Starmer YouTube Channel

Pros: Best for intuition. He explains stuff like Logistic Regression and Random Forests without the scary notation.
Cons: Sometimes a bit slow if you're in a huge rush to code right now.

  • Khan Academy Statistics and Probability Course

Pros: Essential for churn. You need to understand distributions and p-values to know if your results are real or just noise.
Cons: Traditional academic style, can be a grind.

Honestly, on the linear algebra side, just focus on matrix shapes. If the dimensions of your data match, the library usually handles the rest. Skip the eigenvalues for now... you don't need them to help a non-profit. Stick to the stats so you don't give your boss bad results.
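Here is a rough sketch of what "the dimensions match" means in practice. It uses plain Python lists so it runs without installing anything; the donor table, the feature names in the comments, and the helpers `shape` and `predict_scores` are all made up for illustration. NumPy arrays report the same kind of tuple through their `.shape` attribute.

```python
def shape(matrix):
    """Return (rows, columns) for a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))


def predict_scores(X, weights):
    """Multiply each row of X by the weight vector (a matrix-vector product)."""
    n_rows, n_cols = shape(X)
    if n_cols != len(weights):
        # This is the kind of mismatch that libraries report as a shape error.
        raise ValueError(
            f"shape mismatch: X has {n_cols} columns but got {len(weights)} weights"
        )
    return [sum(x * w for x, w in zip(row, weights)) for row in X]


# Hypothetical donor table: 3 donors (rows), 2 features (columns).
X = [
    [12, 1],   # months_active, gifts_last_year (made-up feature names)
    [3, 0],
    [24, 4],
]
weights = [0.5, 2.0]  # one weight per column, values chosen arbitrarily

print(shape(X))                    # (3, 2)
print(predict_scores(X, weights))  # [8.0, 1.5, 20.0]
```

Passing three weights for a two-column table raises the `ValueError`. That is the whole idea behind most shape errors: the number of columns has to line up with the number of weights.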


3

I think basic probability and some statistics are the real MVPs here. Not totally sure, but I've heard multivariate calculus is only really crucial if you're building stuff from scratch.
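For a sense of what "from scratch" would involve, here is a tiny sketch of gradient descent on a single made-up weight. The error curve, starting point, and learning rate are all assumptions chosen for illustration. Libraries like scikit-learn run this kind of loop (or something smarter) for you, so the takeaway is the intuition, not the hand math.

```python
def loss(w):
    """Made-up error curve with its minimum at w = 3."""
    return (w - 3) ** 2


def gradient(w):
    """Derivative of loss: a positive value means error rises as w increases."""
    return 2 * (w - 3)


w = 0.0              # arbitrary starting guess
learning_rate = 0.1  # step size, chosen for illustration

for step in range(50):
    # Step against the gradient, i.e. in the direction that lowers the error.
    w -= learning_rate * gradient(w)

print(round(w, 3))  # 3.0
```

The gradient only tells the loop which direction to nudge `w`; repeating small nudges lands near the lowest point of the curve.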

