2017-08-28 22:11:00+02:00
Short Review of Jose Portilla's Data Science Course
When I started to get interested in the field of Data Science, I found Jose Portilla's course on Udemy titled Python for Data Science and Machine Learning Bootcamp. Since I already knew some Python, I decided to purchase the course during a sale that Udemy was running (I have now learned that courses on Udemy are perpetually on "sale"). Although it's been a while since I completed the course, I thought it might be good to give a quick review of the course that served as my introduction to the field.
Jose Portilla presents the course as an equivalent to the data science bootcamps that are often offered for thousands of dollars. In the course description he lists an expansive array of topics covered over the 21.5 hours of lecture videos. Here are some of the things that are covered:
- Programming with Python
- NumPy with Python
- Using pandas Data Frames to solve complex tasks
- Use pandas to handle Excel Files
- Web scraping with Python
- Connect Python to SQL
- Use matplotlib and seaborn for data visualizations
- Use plotly for interactive visualizations
- Machine Learning with SciKit Learn
- Linear Regression
- K Nearest Neighbors
- K Means Clustering
- Decision Trees
- Random Forests
- Natural Language Processing
- Neural Nets and Deep Learning
- Support Vector Machines
Having completed the entire course, I can say he definitely touches on all these topics in his videos. The course serves as an excellent introduction to the practical parts of Data Science. After completing it, you should be able to start working with datasets from Kaggle or elsewhere. Throughout the course Jose Portilla takes you from having no data science experience to being able to run some ML algorithms on your data. He even goes over using Spark to handle really big datasets.
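To make "running some ML algorithms on your data" a bit more concrete, here is a minimal sketch of one algorithm from the topic list, K Nearest Neighbors. It is not taken from the course, which uses scikit-learn for this kind of thing. The function name `knn_predict` and the toy data are made up for illustration, and the code only uses the Python standard library (Python 3.8+ for `math.dist`).

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Sort the training examples by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    # Count the labels of the k closest examples and return the most common one.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data (an assumption for illustration): two features per sample, two classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.8, 1.2)))  # query near the first cluster
print(knn_predict(points, labels, (6.8, 6.9)))  # query near the second cluster
```

In practice you would reach for a library implementation rather than writing this by hand, but the idea is the same: a new point gets the label that most of its closest neighbors have.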
However, this course really only serves as a practical introduction to Data Science. It takes you through the steps of cleaning, visualizing, and analyzing the datasets he uses, but what if your data isn't like his? What if you're not sure what type of ML algorithm would be best to use? What if your data comes from multiple sources? What if your features have 1,000 unique values? How should you deal with them? The course doesn't really address these types of broader questions, but it doesn't have to. What's important is that after completing the course you can start playing with and handling data. But I think a great deal of practice and learning afterwards is necessary before you start doing anything really meaningful. Nonetheless, it's an excellent course for those starting out, in my limited opinion.