Review: Applied Data Science with Python Specialization by Michigan
Every profession can benefit from being more data-driven. We are on a rampage to check out the most popular online data science courses to help our users find the best fits. Whether you're an aspiring data scientist, or maybe an analyst looking to upskill with stronger data-science knowledge, we have your back. This week we reviewed University of Michigan's Applied Data Science with Python Specialization. We liked it for several reasons, but will give you all the information relevant to help you make the best choice for your circumstances.
The course is designed for beginner-to-intermediate Python coders and will help students become proficient in statistical calculations, machine learning concepts, information visualization, text analysis, and social network analysis. By the end of the program, you'll be familiar with popular Python toolkits including pandas, matplotlib, scikit-learn, nltk, and networkx.
The five courses to complete the specialization are:
- Introduction to Data Science with Python
- Applied Plotting, Charting & Data Representation in Python
- Applied Machine Learning in Python
- Applied Text Mining in Python
- Applied Social Network Analysis in Python
The material is taught by Christopher Brooks, a professor at U of M and Director of their Learning Analytics and Research program.
After getting familiar with the program, we can confidently recommend this specialization for someone who is either an established analyst and somewhat familiar with the material, or someone who is interested in auditing the class and does not plan to complete the assignments. There are lots of options for data-science credentials, and we didn't love this material as much as we had hoped.
The primary reason is that this course dates from 2016 and (from what we can tell) has not been updated. That is a sizable problem: the language and toolkits have changed over the years, so much of the syntax taught is obsolete. Many of the more recent reviews echo our sentiments.
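To make the version drift concrete, here is a small sketch of two pandas idioms that were common around version 0.19 and have since been removed, next to their current replacements. The DataFrame and its column names are made up for illustration, and we are not claiming these exact calls appear in the course notebooks.

```python
import pandas as pd

# Hypothetical data, not taken from the course.
df = pd.DataFrame(
    {"city": ["Ann Arbor", "Detroit", "Lansing"], "population": [120, 640, 115]},
    index=["a", "b", "c"],
)

# Old idiom: df.ix["b", "population"]   (.ix was removed in pandas 1.0)
by_label = df.loc["b", "population"]    # label-based lookup
by_position = df.iloc[1, 1]             # position-based lookup

# Old idiom: df.append(extra)           (DataFrame.append was removed in pandas 2.0)
extra = pd.DataFrame({"city": ["Flint"], "population": [80]}, index=["d"])
combined = pd.concat([df, extra])       # current way to stack rows
```

If you follow the lectures on a current pandas install, expect to make substitutions like these as you go.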
The curriculum is meant to be consumed over a 20-week period, but it can also be completed at your own pace.
We’ve recapped the learning objectives from each week to set your expectations for course material. The great part about this program is that you can jump to any course, and any section, if it’s interesting to you. For example, if you’re an established data scientist only looking for text mining concepts, skip to course 4, weeks 2-3. However, you can only get the certificate if you complete all 20 weeks of content.
To audit an individual week, find the exact course (we've linked them individually here) and click "audit" to save it to your profile. Then open the desired week in the side panel that aligns with our recaps.
Course 1: Introduction to Data Science with Python
Learning Objectives from Week 1: Fundamentals of Data Manipulation with Python
- Use numpy to load, manipulate, and select data, and understand numpy data types.
- Use numpy to show the benefits of vectorization.
- Use regular expressions to work with string data.
- Explain how regular expression pattern matching works.
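As a taste of the regular-expression objectives, the sketch below pulls dates out of free text with Python's built-in re module. The sample sentence and the pattern are our own illustration, not course material.

```python
import re

# Made-up sample text for illustration.
text = "Enrolled 2016-09-12, finished course 1 on 2016-10-10."

# Three capture groups: four-digit year, two-digit month, two-digit day.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# findall returns one tuple of captured groups per match, scanning left to right.
dates = date_pattern.findall(text)

# search stops at the first match; group(0) is the full matched string.
first = date_pattern.search(text).group(0)
```

Here `dates` holds one (year, month, day) tuple per date found, and `first` is the earliest date as a single string.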
Learning Objectives from Week 2: Basic Data Processing with Pandas
- Explain how the series class builds on numpy datatypes, and remember that the series class is a numpy array.
- Demonstrate the basics of querying a series structure, and remember that the dataframe is a key data science structure.
- Define the features of a dataframe, demonstrate what an axis is in relation to dataframes, and demonstrate the difference between the series and dataframe classes.
- Use pandas DataFrames to represent raw data.
Learning Objectives from Week 3: More Data Processing with Pandas
- Continue using pandas DataFrames to represent raw data.
Learning Objectives from Week 4: Answering Questions with Messy Data
- Learn basic statistical test knowledge on DataFrames in pandas.
- Recognize other kinds of structured data such as networks, graphs, and natural language.
Course 2: Applied Plotting, Charting & Data Representation in Python
Learning Objectives from Week 1: Principles of Information Visualization
- Practice identifying graphics that contain misleading information.
- Identify mechanisms used to trick viewers in visualizations.
- Use principles from Alberto Cairo to explain how graphics can be misleading.
- Create a radar plot to assess complexity of misleading graphic.
Learning Objectives from Week 2: Basic Charting
- Work with real CSV data to create different charts in Matplotlib
- Demonstrate procedure of composite charts
Learning Objectives from Week 3: Charting Fundamentals
- Develop advanced features using basic features (i.e., artists).
- Create new visualizations by expanding matplotlib codebase and using subplots.
- Develop interactive and animated visualizations such as histograms and box plots.
Learning Objectives from Week 4: Applied Visualizations
- Identify two compatible datasets and brainstorm a research question that can be answered from their overlap.
- Create real visuals in matplotlib to address your chosen research question.
- Write a presentation
Course 3: Applied Machine Learning in Python
Learning Objectives from Week 1: Fundamentals of Machine Learning - Intro to SciKit Learn
- Learn basic machine learning concepts and workflows, including the different types of ML tasks and how they interact with real-world problems.
- Understand how a basic classification algorithm (k-nearest neighbors) can learn and make predictions.
- Build and evaluate a basic k-nearest neighbors classifier on an example dataset using Python and scikit-learn.
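For a sense of what the k-nearest neighbors objective looks like in practice, here is a minimal scikit-learn sketch. We use the bundled iris dataset as a stand-in (an assumption on our part; the course works with its own example data), and the choice of 5 neighbors and a 25% test split is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset (assumption): 150 iris flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# "Training" k-NN just stores the data; prediction takes a majority vote
# among the 5 closest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# score() reports plain accuracy on the held-out rows.
accuracy = knn.score(X_test, y_test)
```

Changing `n_neighbors` and watching `accuracy` move is a quick way to get a feel for the bias-variance trade-off the later weeks discuss.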
Learning Objectives from Week 2: Supervised Machine Learning
- Understand estimation and prediction in linear model-based supervised learning algorithms.
- Understand strengths and weaknesses of supervised learning methods to choose the right algorithm for a task.
- Apply supervised machine learning algorithms in Python using scikit-learn and understand general principles, techniques such as regularization, feature scaling, and cross-validation to avoid overfitting or underfitting.
Learning Objectives from Week 3: Evaluation
- Learn why accuracy is not always sufficient to evaluate the performance of a classifier.
- Understand various evaluation metrics in machine learning, and how to interpret results when using them.
- Optimize machine learning algorithm by choosing appropriate evaluation metric for the task.
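The point about accuracy not being enough is easy to see with a toy example. The sketch below uses plain Python and made-up labels (not course data): a "classifier" that always predicts the majority class scores 95% accuracy while never finding a single positive case.

```python
# Hypothetical imbalanced test set: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A lazy "classifier" that always predicts the majority class.
y_pred = [0] * 100

# Tally the four cells of the confusion matrix.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)
# Guard against dividing by zero when nothing is predicted or labeled positive.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
```

Accuracy comes out at 0.95, yet recall is 0.0, which is why the choice of metric has to match the task.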
Learning Objectives from Week 4: More Supervised Machine Learning
- Understand parameter estimation and prediction in decision tree and neural network-based supervised learning algorithms.
- Apply appropriate algorithm for a task by understanding strengths and weaknesses of additional supervised learning methods.
- Apply additional types of supervised machine learning algorithms in Python using scikit-learn, and recognize and avoid data leakage.
Course 4: Applied Text Mining in Python
Learning Objectives from Week 1: Working with Text in Python
- Practice interpreting text in terms of basic building blocks: sentences and words.
- Identify common problems with raw text and perform textual cleaning tasks in Python.
- Write regular expressions (RegEx) to find textual patterns.
Learning Objectives from Week 2: Basic Natural Language Processing
- Learn different natural language tasks and process free text through the NLTK toolkit to tag language constructs onto text.
- Derive meaningful features out of text.
Learning Objectives from Week 3: Classification of Text
- Compare text classification to other classification approaches, including Naive Bayes and Support Vector Machine algorithms.
- Practice classifying text in two classes by using one of these approaches in Python.
- Identify and extract features from text and work to transform them into feature vectors for the machine learning model.
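Turning text into feature vectors sounds abstract, so here is a bare-bones bag-of-words sketch using only the standard library. It is a simplified illustration with made-up sentences, not the vectorizer the course uses.

```python
from collections import Counter

# Two made-up "documents" for illustration.
docs = ["great course great lectures", "outdated tools tough assignments"]

# Vocabulary: every distinct word, sorted so the column order is stable.
vocab = sorted({word for doc in docs for word in doc.split()})

def to_vector(doc):
    """Count how often each vocabulary word appears in one document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

# One row per document, one column per vocabulary word.
vectors = [to_vector(doc) for doc in docs]
```

Each row in `vectors` can then be fed to a classifier such as Naive Bayes, which is the step the week's objectives build toward.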
Learning Objectives from Week 4: Topic Modeling
- List and describe techniques for named entity recognition and other various information extraction tasks.
- Learn to apply WordNet-based similarity measures on text and derive semantic topics from a large text collection.
Course 5: Applied Social Network Analysis in Python
Learning Objectives from Week 1: Why Study Networks & Basics on NetworkX
- Recognize and categorize real-world networks.
- Identify applications and important questions that network science can help answer.
- Determine appropriate type of network to model real networked data.
- Construct and manipulate networks of different types using NetworkX, including bipartite graph and related algorithms such as graph projections
Learning Objectives from Week 2: Network Connectivity
- Describe how distance measures can be used to identify central and peripheral nodes in networks and use NetworkX to implement them.
- Identify connected components in directed and undirected graphs, and use NetworkX to find them.
- Understand different types of network attacks, and use NetworkX to measure clustering and robustness of networks, including node and edge removal attacks.
Learning Objectives from Week 3: Influence Measures and Network Centralization
- Understand different network centrality measures, and apply them in NetworkX.
- Describe differences and similarities between centrality measures.
- Use centrality measures for real-world applications and analyze real-world networks.
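To show what comparing centrality measures means, the sketch below runs three NetworkX measures on a tiny star graph (our own toy example, chosen because the answers are easy to check by hand). On a star the hub wins on every measure; on real networks the rankings often disagree.

```python
import networkx as nx

# Toy graph (assumption): node 0 is the hub, connected to leaves 1 through 4.
G = nx.star_graph(4)

# Fraction of the other nodes each node is directly connected to.
degree = nx.degree_centrality(G)

# Fraction of shortest paths between other node pairs that pass through each node.
betweenness = nx.betweenness_centrality(G)

# Inverse of the average shortest-path distance to every other node.
closeness = nx.closeness_centrality(G)

# Rank nodes by degree centrality, most central first.
ranked = sorted(degree, key=degree.get, reverse=True)
```

Each function returns a dictionary keyed by node, so the measures can be compared side by side for any node of interest.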
Learning Objectives from Week 4: Network Evolution
- Understand degree distribution of a network and use NetworkX to visualize it.
- Recognize properties of real-world networks, such as power law degree distribution, high clustering and small average shortest paths.
- Understand the mechanics of network generation models, such as Preferential Attachment and Small World Models and their properties, and use NetworkX for link prediction and node feature creation in real-world networks.
This program focuses heavily on Python, which is primarily a data scientist's tool. If you're looking for web analytics, there are other programs (view our alternative course recommendations) that may be more suited to your needs.
There's a long-running Python-versus-R debate, but most companies will accept either, and many of the concepts are interchangeable beyond the syntax. Maybe try auditing first to figure out which program is best for you!
Cost and Auditing
The program is only $49/month, and comes with a LinkedIn certificate on behalf of the University of Michigan (remember, this is a prestigious place!!). If you complete the curriculum on the proposed timeline, it should take about 4-5 months, though you could blitz through it on a break in far less. While that seems steep, compared to a degree or bootcamp this micro-certification is a steal!
If you have a learning budget, or are dedicated to upskilling your career with a data focus, we recommend paying for and completing the program to get the shareable certificate (GET RECEIPTS!). This will help make your LinkedIn profile more searchable to recruiters who may be looking for specific keywords and programs.
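For the budget-minded, the arithmetic behind that price is simple enough to sketch. The snippet below assumes billing in whole months and an average of 52/12 weeks per month; both are our simplifications rather than Coursera's stated billing rules.

```python
import math

MONTHLY_FEE_USD = 49       # subscription price quoted in this review
WEEKS_OF_CONTENT = 20      # five courses of four weeks each
WEEKS_PER_MONTH = 52 / 12  # average calendar month (assumption)

def subscription_cost(months):
    """Total fee for a whole number of billed months (assumed billing model)."""
    return MONTHLY_FEE_USD * months

# Following the suggested pace, round up to the next full billed month.
months_on_schedule = math.ceil(WEEKS_OF_CONTENT / WEEKS_PER_MONTH)
cost_on_schedule = subscription_cost(months_on_schedule)

# Cost for the 4-to-5-month range mentioned in the text.
cost_range = (subscription_cost(4), subscription_cost(5))
```

So the certificate lands somewhere between $196 and $245 at the suggested pace, and less if you finish early.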
To audit the program and simply learn the material, this program is completely free! Thanks Coursera!
This program has been around for several years and each course has more than a thousand ratings. The most popular by far is Introduction to Data Science with Python (Course 1), with 26k reviews and a rating of 4.5. The second most popular is Applied Machine Learning (Course 3), which boasts more than 8k reviews and a 4.6 rating average.
We noticed, however, that much of the material is challenging even for established data scientists, and the assignments were incredibly complicated.
Some of our favorite positive review points:
- Awesome course. The lectures are very good. One thing that I liked a lot is that they taught regular expression which is very helpful in processing strings. Also, the assignments are very tough and if you can solve them correctly (not hard coding, and by yourself), it will boost your confidence a lot. And please remember this course is not for beginners. If you have basic knowledge of Python (list, dictionary, function, list comprehension, etc), it won't be that tough.
- Excellent tour through the basic terminology and key metrics of Graphs, with a lot of help from the networkX library that simplifies many, otherwise tough, tasks, calculations and processes.
Aggregations of negative review points:
- The version of pandas used in this course is 0.19.2, which is not convenient for using some functions.
- The materials are useful, but the tools are outdated and the lectures contain numerous mistakes.
- What is taught in the lessons and what is asked in the assignments have nothing to do with each other.
- Assignments are too difficult and the auto-grader is tough. (Recommended workaround: peek at the discussion forums and use Google)
... and our favorite overall review:
Great introduction to representing and manipulating data with python pandas series and dataframes. Lectures are interesting and clearly presented with interactive examples in jupyter notebooks. The last two assignments are quite tricky as the hard part is cleaning and preprocessing real-world data, obtained from wikipedia etc., in dataframe form, sometimes using techniques not explicitly covered in lectures so some searching and self-learning is required. However, this is the core learning experience of the course as it reflects the messiness of data analysis in real world situations.
While we love Coursera as a hosting platform, sometimes Udemy rocks with their selection of materials and deep dives.
The biggest complaint about this course from U of M is that it uses outdated python modules that will not translate to the real world. While it helps to be able to problem solve, it also helps to know the right syntax. This python programming class has more than 200 exercises and focuses almost exclusively on Python 3. The material updates frequently to remain cutting-edge.
This course can be included in Udemy's Personal Plan, which costs about $16/month after a 7-day trial. Otherwise it's a straight $120 to complete at your own pace. We recommend signing up for the plan to see if this course is right for you (and then cancelling if not!).
Stanford University does a baller specialization in machine learning through Coursera. This tried-and-true program has been around since 2012, and is taught by one of the most revered AI professionals in the industry. Read our full writeup here. It's free to audit, but if you want the certificate (recommended), it's $49 a month to complete at your own pace.
Best Alternative Analytics Specializations
Google also sponsors a data analytics certificate program through Coursera. This is one of the more coveted certificates in the industry for learning the Google Analytics tool specifically, hence our recommendation of it as an alternative to this specialization from U of M. Google's course is also free to audit, but the same rules apply if you want the certificate to show off: $49 a month.
If you're looking to learn and have some time, we LOVED the format of this Udemy program. We weren't alone: the course has more than 150,000 reviews and averaged a 4.7 rating.
The program focuses on continuous improvement, rather than cramming in curriculum. It breaks out each day, literally 100 of them, into bite-sized chunks of Python to keep the learning experience from being overwhelming. They had an impressive roster of clients too, including Volvo, Box and Eventbrite.
We recommend completing this program in tandem with another certification like U of M's (for street cred, err, recruiter search keywords) to really dig deep into the material. This course also only costs $16/month, or $85 to complete at your own pace.
The University of Michigan's Data Science program is a great credential for job seekers and mid-level python users, but the content might be a little outdated for beginners looking to really sink their teeth into programming with Python.
If you're looking for a beginner-level python class, this may not be the certification for you. We'd recommend checking out one of our recommended alternative classes and deciding what structure works best for you. BUT, if you're looking to one-up your thinking around data science and mathematics, this course would certainly be a valuable asset!
Here at Bridged we are huge fans of stacking micro-certifications to achieve desired career results. This program could be one notch in your arsenal to really kick your technical expertise into gear!