Quick and Easy Data Analysis in Python

Exploratory Data Analysis (EDA) is the first and one of the most important steps in any data project: it is where we discover patterns and trends in the dataset. In EDA we analyze the dataset and summarize its main characteristics, often using visual methods. It matters because if you are not familiar with the dataset you are working on, you won’t be able to infer much from it. EDA usually takes a lot of time, however, so today I am going to show you a quick and easy way to do it with just a few lines of Python.

In this article, we will automate EDA using two libraries:

  1. Sweetviz
  2. Pandas Profiling

These are Python libraries that generate beautiful, high-density visualizations to kick-start your EDA. Let us first explore Sweetviz in detail, and then move on to Pandas Profiling.

Installing Sweetviz

Like any other Python library, Sweetviz can be installed with pip:

    pip install sweetviz

Analyzing Dataset

In this article, we will use an advertising dataset that contains 4 attributes and 200 rows. First, we need to load the dataset using pandas.
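
The loading step can be sketched as follows. This is only a minimal example: the helper name load_dataset and the file name Advertising.csv are my assumptions for illustration, so point the path at wherever your copy of the data lives.

```python
import pandas as pd


def load_dataset(path):
    """Read a CSV file into a DataFrame and report its size."""
    df = pd.read_csv(path)
    rows, cols = df.shape
    print(f"Loaded {rows} rows and {cols} columns")
    return df


# Usage (the file name is an assumption, not taken from the article):
# df = load_dataset("Advertising.csv")
# df.head()
```

For the advertising data described above, the printed shape should come out as 200 rows and 4 columns if the file matches the description.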


Sweetviz has a function named analyze() which analyzes the whole dataset and provides a detailed report with visualizations.

Let’s analyze our dataset with this function.
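
Here is a hedged sketch of that step. The wrapper build_sweetviz_report and the output file name are hypothetical names I picked for illustration; the Sweetviz calls themselves are analyze() and show_html(). Note that show_html() also opens the report in your browser by default.

```python
def build_sweetviz_report(df, out_path="sweetviz_report.html"):
    """Analyze a DataFrame with Sweetviz and save the report as HTML."""
    # Imported inside the function so this helper can be defined
    # even before Sweetviz is installed.
    import sweetviz as sv

    report = sv.analyze(df)       # profile every column of the DataFrame
    report.show_html(out_path)    # write the HTML report to out_path
    return out_path


# Usage, assuming df is the DataFrame loaded earlier:
# build_sweetviz_report(df)
```

Without the wrapper, the core of it is the import, the analyze() call, and the show_html() call.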


And there we go: our EDA report is ready and contains a lot of information about every attribute. It’s easy to understand and takes just 3 lines of code to prepare.

Now, let us move on to Pandas Profiling.

Installing Pandas Profiling

As we did for Sweetviz, we need to install pandas-profiling with pip:

    pip install pandas-profiling

Now let’s use this library on a Kaggle dataset (cervical cancer risk classification) and walk through the output. I generated a detailed report of the data using the ProfileReport class from pandas-profiling.
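
A minimal sketch of how such a report can be generated is shown here. The wrapper build_profile_report, the report title, and the file names are assumptions for illustration. Recent releases of the package are distributed under the name ydata-profiling, so depending on your installed version the import may need to be ydata_profiling instead.

```python
def build_profile_report(df, out_path="profile_report.html", title="Profiling Report"):
    """Profile a DataFrame with pandas-profiling and save the report as HTML."""
    # Imported inside the function so this helper can be defined
    # even before pandas-profiling is installed.
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, title=title)  # build the report for the DataFrame
    profile.to_file(out_path)                 # write it out as an HTML file
    return out_path


# Usage (the CSV file name is hypothetical):
# import pandas as pd
# df = pd.read_csv("cervical_cancer.csv")
# build_profile_report(df, title="Cervical Cancer Risk")
```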

Here is a snapshot of the output:

[Image: Overview of Dataset]

As you can see from the snapshot, you get all the important facts about the data at a glance. This is just the Overview tab. You can dig deeper into each variable’s characteristics by clicking the Variables tab.

[Image: Variables of Dataset]

Here we get a description of each variable and its distribution; this output is given separately for every variable in the data. Next is the Correlations tab. Five types of correlation are provided for the variables. You can analyse each one to understand the relationship between the target and the independent variables.
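
To get a feel for what these two tabs summarize, here is a rough plain-pandas sketch. It is not how the report computes things internally; the tiny DataFrame is made up for illustration, and only two correlation types (Pearson and Spearman) are shown.

```python
import pandas as pd

# Tiny made-up frame, only to illustrate the calls.
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5],
        "y": [2, 4, 6, 8, 10],
        "z": [5, 3, 4, 1, 2],
    }
)

summary = df.describe()                 # count, mean, std, min, quartiles, max per column
pearson = df.corr(method="pearson")     # linear correlation
spearman = df.corr(method="spearman")   # rank-based correlation

print(summary)
print(pearson.round(2))
print(spearman.round(2))
```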

[Image: Correlations Tab]

Next is the Missing values tab. The missing-value analysis is shown in several output formats. The Count bar chart provides a quick look at the number of missing values for each variable. There are also Matrix, Heatmap, and Dendrogram views that provide a nice pictorial representation of all the missing values in the data.
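
The numbers behind the Count chart can be approximated with plain pandas, as in this small sketch. The column names and values are made up for illustration and are not taken from the Kaggle dataset.

```python
import pandas as pd

# Made-up data with a few gaps, only to illustrate the calls.
df = pd.DataFrame(
    {
        "age": [25, None, 31, 40],
        "smokes": [0, 1, None, None],
        "id": [1, 2, 3, 4],
    }
)

missing_count = df.isna().sum()                  # missing cells per column
missing_pct = (df.isna().mean() * 100).round(1)  # the same as a percentage

print(pd.DataFrame({"missing": missing_count, "percent": missing_pct}))
```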

[Image: Missing values Tab]
[Image: Heatmap]

The last tab in the profile report provides a Sample of the first and last few rows of the data set.
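
The same kind of peek is available directly in pandas. In this small sketch the DataFrame is made up, and the choice of 10 rows is my assumption rather than a setting taken from the report.

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})  # made-up data for illustration

first_rows = df.head(10)  # first 10 rows
last_rows = df.tail(10)   # last 10 rows

print(first_rows)
print(last_rows)
```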

[Image: Sample Tab]

Overall, both libraries are excellent and reduce the effort involved in data exploration, as all the key EDA outcomes are part of the generated report. I would suggest running both libraries on the same dataset and comparing the results. Further data exploration can then build on these reports.

Before You Go

Thanks for reading! If you want to get in touch with me, feel free to reach me at fahadpatel1403@gmail.com or through my LinkedIn profile. You can also view the code and data I have used here on my GitHub.

fahadpatel1403
