Types and Tailcalls

Ultralearning Data Science - Week 5

published on November 24th, 2019

How the Fifth Week Went

In the fifth week, it felt like I was finally getting some traction in my ultralearning project. I continued working on the Rossmann store sales challenge and finally improved my predictions significantly by stacking the random forest prediction on top of a linear prediction of the average sales for each store. While working on this, I noticed that some features were missing; analyzing the residuals that an intermediate model produced gave me the right intuition. It was also nice to see that the separation between my training and validation sets kept working: results on the validation set remained a solid predictor of performance on the leaderboard. It certainly feels like I am finally making some progress here!
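
To make the stacking idea concrete, here is a minimal sketch of the two-stage setup, with made-up data and illustrative column names (`Store`, `Promo`, and so on); the actual Rossmann feature set and the exact base model I used differ:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Made-up stand-in for the Rossmann training data.
rng = np.random.default_rng(0)
n = 1000
train = pd.DataFrame({
    "Store": rng.integers(1, 11, n),
    "DayOfWeek": rng.integers(1, 8, n),
    "Promo": rng.integers(0, 2, n),
})
train["Sales"] = (5000 + 800 * train["Store"] + 500 * train["Promo"]
                  + rng.normal(0, 300, n))

# Stage 1: a simple per-store baseline -- each store's mean sales.
train["StoreMean"] = train.groupby("Store")["Sales"].transform("mean")

# Stage 2: a random forest learns the residual the baseline leaves behind.
features = ["Store", "DayOfWeek", "Promo"]
rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
rf.fit(train[features], train["Sales"] - train["StoreMean"])

# Stacked prediction = baseline + forest correction.
train["Pred"] = train["StoreMean"] + rf.predict(train[features])
```

The appeal of this setup is that the forest only has to learn the deviation from a strong per-store baseline rather than the absolute sales level.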

Another thing that I'm happy with is that I've been more serious about doing open recall on the lectures I watch, and I have written and reviewed Anki cards. This makes the whole undertaking feel more serious, and I'm happy to be learning and memorizing concepts that were slightly fuzzy before.

This week, I also attended one of the local data science meetups, the Munich Datageeks. It was a great event, and I very much enjoyed the community and the discussions around data science. Doing data-sciency things felt a lot more attainable afterwards, and taking part was certainly very motivating.

The one thing I fell short on was publishing the exploratory data analysis for an ongoing Kaggle challenge. It's not that I haven't done anything: I've looked at multiple ongoing Kaggle challenges and started to explore the data, but it's not yet at a level where I am comfortable posting it publicly.

Reviewing Goals for Week 5

Looking back at my goals for week 5, here are the results:

  1. Try to improve the random forest models in the Rossmann challenge by
    • [done] predicting log sales instead of raw sales (see the sketch after this list)
    • [done] simplifying the random forest models by throwing out correlated features
    • [done] doing some amount of feature engineering as pointed out by the Rossmann challenge winner
    • [done] giving XGBoost another shot on the Rossmann data
  2. [in progress] Do exploratory data analysis on a new Kaggle challenge and post it publicly
  3. [done] Watch lectures 7 and 8 of fast.ai and do open recall on them
  4. [done] Write Anki Cards
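
For the log-sales item above, the trick is just a target transform: train on log(1 + sales) and invert the transform at prediction time, so the model isn't dominated by high-volume stores. A minimal sketch with synthetic data (variable names are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins for the real feature matrix and sales target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
sales = np.exp(X[:, 0] + rng.normal(0, 0.1, 500)) * 1000  # skewed, non-negative

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, np.log1p(sales))        # train on log(1 + sales); log1p handles zeros
preds = np.expm1(model.predict(X))   # invert back to the sales scale
```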

My Goals for Week 6

In the upcoming week, I have a few social commitments, which will limit the amount of time I'm able to dedicate to learning. I'm aiming for goals similar to last week's.

  1. Complete the EDA for the Data Science Bowl 2019 Kaggle challenge.
  2. Watch lectures 9 and 10 of fast.ai.
  3. Build my own neural network based on the content of the fast.ai lectures.
  4. Review fast.ai lectures so far and write Anki cards.
  5. Review lecture notes from the ML coursera course by Andrew Ng and write Anki cards.
