I'm so tired of reading these roadmap blogs because they all say something different, and I'm honestly getting super overwhelmed. I read online that I need a full math degree to even touch scikit-learn, but then some guy on Reddit said you just need basic high school algebra and can call the libraries and forget the rest. I'm trying to build a basic churn prediction model for a small non-profit I volunteer for in Chicago, and my boss wants it done in about 8 weeks, so I don't have time to go back to college for three years.
My constraints:
Is it just linear algebra? I keep seeing stuff about eigenvalues and partial derivatives, but do I actually need to solve those by hand, or just know what they do to the weights? I'm trying to figure out the bare minimum so I don't waste time on theory that doesn't help me code. Like, how much stats is actually required vs. just being able to read a graph? I feel like I'm drowning in jargon right now...
You can absolutely crush this churn model in two months! Honestly, the people saying you need a math degree are totally gatekeeping. I started exactly where you are and found that focusing on practical application is way more efficient. Since you are working with tabular data for a non-profit, you can skip the deep theory for now and focus on what actually moves the needle. Here is the bare minimum you need to master:
Adding my two cents here because I've seen too many people dive into the deep end of theory and never actually ship a model. In my experience, with a 2-month window you need to be extremely picky about what you study. If you spend your time working derivatives by hand, you will miss your deadline. I've tried a lot of ways of training juniors over the years, and these are the paths I'd weigh:
Pros: You get results fast. It teaches top-down, so you see the code first and the math only when something breaks.
Cons: Can feel like magic at first and might leave some gaps in why things work.
Pros: Best for intuition. He explains stuff like Logistic Regression and Random Forests without the scary notation.
Cons: Sometimes a bit slow if you're in a huge rush to start coding.
Pros: Essential for churn. You need to understand distributions and p-values to know if your results are real or just noise.
Cons: Traditional academic style; can be a grind.
Honestly, just focus on matrix shapes. If the dimensions of your data line up, the library usually handles the rest. Skip the eigenvalues for now... you don't need them to help a non-profit. Stick to the stats so you don't give your boss bad data.
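Here's roughly what "focus on matrix shapes" and "stick to the stats" look like in practice. It's a minimal sketch, and it assumes a synthetic stand-in for the churn table (scikit-learn's make_classification, 500 rows, 6 features); your real member data would replace it. The shape printouts are the whole dimension check. The "real or just noise" part uses permutation_test_score. That's my pick, not something named above. It refits the model on shuffled labels many times and reports how often the shuffled versions score as well as the real one.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import permutation_test_score

# Synthetic stand-in for a churn table (assumption): 500 members, 6 numeric
# features, roughly 20% churners. Real data would come from the non-profit.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           weights=[0.8, 0.2], random_state=0)

# The shape check that matters: one row per member, one label per row.
print(X.shape)  # (500, 6) -> (n_samples, n_features)
print(y.shape)  # (500,)   -> (n_samples,)
assert X.shape[0] == y.shape[0], "every row needs exactly one churn label"

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.coef_.shape)  # (1, 6) -> one weight per feature for a yes/no target

# Real or noise? Compare the cross-validated score against scores from
# models trained on shuffled (meaningless) labels.
score, perm_scores, p_value = permutation_test_score(
    LogisticRegression(max_iter=1000), X, y,
    scoring="roc_auc", cv=5, n_permutations=100, random_state=0)
print(f"ROC AUC: {score:.3f}  p-value: {p_value:.3f}")
```

If the p-value is small, shuffled labels almost never match your real score, so the model is probably picking up signal rather than noise. You don't have to derive any of this by hand. You just need to know how to read the result.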
I think basic probability and some statistics are the real MVPs here. Not totally sure, but I've heard multivariate calculus is only really crucial if you're building stuff from scratch.
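To show what "building from scratch" means and what the derivatives actually do to the weights, here's a sketch of logistic regression fitted with plain gradient descent, next to scikit-learn's LogisticRegression on the same toy data. The data, learning rate, and step count are illustrative assumptions. Setting C=1e6 just makes the library's default L2 penalty negligible, so the two fits should land on nearly the same weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.5, n_steps=2000):
    """Batch gradient descent on the mean log-loss (no regularization)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)            # predicted churn probability per row
        error = p - y                     # shape (n_samples,)
        # Partial derivatives of the mean log-loss w.r.t. each weight and the bias
        grad_w = X.T @ error / n_samples  # shape (n_features,)
        grad_b = error.mean()
        # Move every weight a small step against its partial derivative
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (assumption): 200 rows, 2 features, labels drawn from a known logistic model
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y = (rng.random(200) < sigmoid(X @ true_w)).astype(float)

w, b = fit_logistic_gd(X, y)

# Same model via the library; a huge C makes its default L2 penalty negligible
sk = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)

print("from scratch:", w.round(3), round(b, 3))
print("scikit-learn:", sk.coef_.ravel().round(3), round(sk.intercept_[0], 3))
```

The two sets of weights should agree to a couple of decimal places. The library runs this kind of loop for you with a smarter optimizer. For a library-first project, the takeaway is that weights move against the gradient of the loss. You don't need to derive that gradient yourself.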