[–] spandan-madan link

People like you, who take the time to read and learn, are exactly the reason people like me write such articles! I'm absolutely thrilled that people liked it and that I get to contribute to people learning this beautiful field of science I do research in :)

[–] anantzoid link

Thanks for putting so much time and effort into this. This is definitely not "Yet-another-intro-to-ML".

[–] zintinio5 link

I'm gonna go through it as well, that was a significant amount of work you put in.

[–] allenleein link

Respect!

[–] tekkk link

Articles like this, written by people who want to share their fabulous domain knowledge free of charge, really are the reason I read Hacker News. Thank you. I hope I will have the time to read through it all thoughtfully and later use it in my own projects.

[–] fabatka link

Hi! This is a really great page, I love reading it. Just a few tips:

The for loops in your code can be made more concise: instead of

  for i in range(len(movies_with_overviews)):
      movie=movies_with_overviews[i]
you can write

  for movie in movies_with_overviews:
Also, at around In[82], you don't declare Y, but you still reference it in the train-test split. Another way to do the split is to use the train-test split helper in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.mod...
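
To make both tips concrete, here is a minimal sketch. The sample records, the `genre` field, and the `split_train_test` helper are all hypothetical and not taken from the tutorial. The helper is a hand-rolled, standard-library stand-in for a train-test split, not scikit-learn's implementation, and it makes the point that both `X` and `Y` must exist before splitting.

```python
import random

# Hypothetical stand-in data; the tutorial's real list holds movie records.
movies_with_overviews = [
    {"title": "Movie A", "overview": "A heist goes wrong.", "genre": "Crime"},
    {"title": "Movie B", "overview": "Two friends travel.", "genre": "Comedy"},
    {"title": "Movie C", "overview": "A ship is lost at sea.", "genre": "Drama"},
    {"title": "Movie D", "overview": "A robot learns to paint.", "genre": "Sci-Fi"},
    {"title": "Movie E", "overview": "A detective returns.", "genre": "Crime"},
]

# Index-based loop, as in the original snippet.
titles_by_index = []
for i in range(len(movies_with_overviews)):
    movie = movies_with_overviews[i]
    titles_by_index.append(movie["title"])

# Direct iteration: same result, no index bookkeeping.
titles_direct = []
for movie in movies_with_overviews:
    titles_direct.append(movie["title"])

# Declare both X and Y explicitly before splitting.
X = [movie["overview"] for movie in movies_with_overviews]
Y = [movie["genre"] for movie in movies_with_overviews]


def split_train_test(X, Y, test_fraction=0.2, seed=0):
    """Shuffle indices once, then slice X and Y with the same indices."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(len(X) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [Y[i] for i in train_idx],
        [Y[i] for i in test_idx],
    )


X_train, X_test, Y_train, Y_test = split_train_test(X, Y)
```

Shuffling a single list of indices keeps each overview paired with its label, which is the main thing a split helper has to get right.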

[–] spandan-madan link

To quote one of the greatest professors in ML, Pedro Domingos: "First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and pre-process it, and how much trial and error can go into feature design. ... Learning is often the quickest part of this, but that’s because we’ve already mastered it pretty well! Feature engineering is more difficult because it’s domain-specific, while learners can be largely general-purpose."

[–] deepGem link

These are the tutorials that depict the reality of a machine learning career. Everyone broadly understands that data preparation is the key, but few realize what that involves. Half of this tutorial is just about getting and prepping data for training. Kudos!

[–] spandan-madan link

Working on them already! The next one is going to be on Word Embeddings for Natural Language Processing: basically, how we convert words and sentences to numbers so that a computer can work with them. Applications like text classification and sentiment analysis all depend on this one fundamental backbone!
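
As a rough illustration of "converting words to numbers", here is a minimal bag-of-words sketch. It is not a word embedding (embeddings are learned dense vectors), only the simplest count-based precursor, and the function names and sample sentences are made up for this example.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Assign each distinct lowercase word an integer index, in first-seen order."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def bag_of_words(sentence, vocab):
    """Turn a sentence into a vector of word counts, one slot per vocabulary word."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocab)
    for word, count in counts.items():
        if word in vocab:  # words outside the vocabulary are ignored
            vector[vocab[word]] = count
    return vector


sentences = ["the movie was great", "the plot was thin"]
vocab = build_vocabulary(sentences)
vectors = [bag_of_words(s, vocab) for s in sentences]
# vocab maps: the=0, movie=1, was=2, great=3, plot=4, thin=5
# vectors == [[1, 1, 1, 1, 0, 0], [1, 0, 1, 0, 1, 1]]
```

Vectors like these can already be fed to a classifier, but they treat every word as unrelated to every other word, which is the gap embeddings are meant to close.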

[–] companycalls link

That sounds great! Hope to catch it on here when you post it, thanks again for this tutorial - it's a fantastic resource.

[–] Omnipresent link

This is so, so helpful. It would take me months to gather resources to learn this stuff, and I wouldn't even know what to look for. To the author: please share more content if your valuable time permits.

[–] spandan-madan link

Sure, I would love to get in touch about my work over email! What's your email ID, Andrew?

[–] AndrewKemendo link

andrew@pair3d.com

[–] AndrewKemendo link

Great write-up, especially the fact that half of it was about finding, cleaning, and structuring data! You can tell someone isn't applying ML if they aren't spending most of their time getting their data organized. It's the "sharpening the axe" part of the hour Lincoln describes.

> For example, they never introduce you to how you can run the same algorithm on your own dataset

I think the TensorFlow tutorial on CNNs runs through training and classification on your own dataset with Inception pretty well.

You mention you're a CV student. Any particular area of focus?

[–] stevew20 link

I have been searching for exactly this type of tutorial for months. Your explanation of the state of online "10 minute introductions" to machine learning is spot on. I understand the concepts and have a thorough background in programming, yet there was always a gap in my knowledge base. Thank you for sharing this!

[–] sekasi link

While much of this goes over my head, detailed write-ups like this, by people who have no direct way of gaining financially from all their hard work, are the cornerstone of why the internet is fantastic.

Amazing work!

[–] jonheller link

This is wonderful. I just became interested in this subject but had difficulty finding resources that weren't simple copy/paste examples, as you mentioned, or semester-long courses. Thank you!

[–] spandan-madan link

Hi!

I couldn't find much, which is why I stressed it in the tutorial. Scraping is a fun hobby, but it's also extremely useful. I strongly suggest spending time with Python's Selenium and Beautiful Soup libraries. The former is good for automating pages with JavaScript elements, and the latter for parsing HTML!
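
For a feel of the HTML-parsing half of scraping, here is a minimal sketch that collects link targets from a page. It uses the standard library's `html.parser` as a stand-in for Beautiful Soup so that it runs without extra installs. The `LinkCollector` class and the sample HTML are hypothetical, and real pages are usually far messier.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical page fragment; a real scraper would fetch this over HTTP.
html = (
    '<ul>'
    '<li><a href="/movie/1">Movie A</a></li>'
    '<li><a href="/movie/2">Movie B</a></li>'
    '<li><a>no link here</a></li>'
    '</ul>'
)

collector = LinkCollector()
collector.feed(html)
links = collector.links
```

Beautiful Soup wraps the same idea in a friendlier search interface, and Selenium comes in when the links only appear after JavaScript has run.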

[–] praveer13 link

Are there more great resources like this for learning how to find, clean, and structure data? I would greatly appreciate it if someone could point me in the right direction.

[–] ireadfaces link

I saw this tutorial of yours somewhere, Spandan, and then found it here on HN. I have yet to explore it, but I have already marked your Git repo. Thanks for the hard work.

[–] allpratik link

Spandan, this is a fantastic and detailed write-up. Kudos! And thanks for investing your time to do this!

[–] mcintyre1994 link

This looks amazing, thank you for sharing! :)

[–] code4tee link

Very nice work. Thanks for sharing.

[–] craptocurrency link

Amazing piece of work
