descriptive package
Submodules
descriptive.pyeda module
- descriptive.pyeda.display_column_types(data)
Separate numerical columns and categorical columns.
- Parameters:
data – variable
- Returns:
list of numerical and categorical feature names
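As a rough illustration of separating numerical from categorical columns, the sketch below uses pandas `select_dtypes`. It assumes `data` is a pandas DataFrame; the helper `split_column_types` and the sample frame are hypothetical and may not match how `display_column_types` is actually implemented.

```python
import pandas as pd


def split_column_types(df: pd.DataFrame) -> tuple:
    """Hypothetical helper: return (numeric, categorical) column names."""
    # Columns with a numeric dtype (ints, floats).
    numeric = df.select_dtypes(include="number").columns.tolist()
    # Everything else is treated as categorical in this sketch.
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical


df = pd.DataFrame(
    {
        "age": [31, 45, 28],
        "city": ["Oslo", "Lima", "Pune"],
        "income": [52.0, 61.5, 48.2],
    }
)
numeric_cols, categorical_cols = split_column_types(df)
print(numeric_cols)
print(categorical_cols)
```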
- descriptive.pyeda.display_dataset_detail(data)
Print dataset details.
- Parameters:
data – variable
- Returns:
dataset details
- descriptive.pyeda.display_dataset_info(data) -> None
Print dataset info.
- Parameters:
data – variable
- Returns:
total rows and columns
- descriptive.pyeda.display_describe_data(data) -> None
Calculate basic statistics for the whole data.
- Parameters:
data – variable
- Returns:
prints out describe()
- descriptive.pyeda.display_summary_data(data)
Print a summary table with one row per column, showing:
Number of unique values.
Null values.
Null percentage.
Data type.
- Parameters:
data – variable
- Returns:
Summarize columns
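A per-column summary like the one described above can be approximated with plain pandas. This is a minimal sketch assuming `data` is a pandas DataFrame; the helper `summarize_columns`, its column labels, and the sample frame are hypothetical, not the library's own output format.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column of the input frame."""
    return pd.DataFrame(
        {
            "unique": df.nunique(),                # distinct non-null values
            "nulls": df.isnull().sum(),            # count of missing values
            "null_pct": df.isnull().mean() * 100,  # share of missing values
            "dtype": df.dtypes.astype(str),        # column data type
        }
    )


df = pd.DataFrame({"a": [1, 2, None, 2], "b": ["x", "y", "y", "y"]})
summary = summarize_columns(df)
print(summary)
```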
- descriptive.pyeda.import_dataset(file_name: str)
Read a CSV data file.
- Parameters:
file_name – string containing the CSV file name
- Returns:
pandas dataframe
- descriptive.pyeda.read_dataset(data)
Reading data from a variable.
- Parameters:
data – variable
- Returns:
data
- descriptive.pyeda.save_data_to_csv_file(data, filename: str)
Save data to a csv file.
- Parameters:
data – variable
filename – string
- Returns:
updated csv data file
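Reading and saving CSV data can be sketched as a round trip with pandas. The example below assumes these functions behave roughly like `DataFrame.to_csv` and `pandas.read_csv`, which the documentation above does not confirm; the file name and sample frame are made up for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "label": ["a", "b"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")  # hypothetical file name
    df.to_csv(path, index=False)             # write without the index column
    restored = pd.read_csv(path)             # read the file back into a frame

print(restored)
```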
- descriptive.pyeda.select_categorical_variables(data) -> list
Selecting categorical variables.
- Parameters:
data – variable
- Returns:
all categorical features in a dataset
- descriptive.pyeda.select_numeric_variables(data) -> list
Selecting numerical variables.
- Parameters:
data – variable
- Returns:
all numeric features in a dataset
- descriptive.pyeda.vis_advanced_stack_bar(data, first_categorical: str, second_categorical: str, third_categorical: str, title: str = 'Add Chart Title', subtitle: str = 'explain ur data viz by subtitle')
Visualize percentage relationship using three categorical variables.
Add new title and subtitle content as a string.
- Parameters:
data – variable
first_categorical – str
second_categorical – str
third_categorical – str
title – str
subtitle – str
- Returns:
stacked bar plot
- descriptive.pyeda.vis_heatmap(data)
Visualize the correlation between multiple numeric columns.
- Parameters:
data – variable
- Returns:
heatmap
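The numbers behind a correlation heatmap can be computed with `DataFrame.corr`. This sketch only builds the correlation matrix and assumes `data` is a pandas DataFrame; whether `vis_heatmap` filters to numeric columns in the same way is an assumption, and the sample data is invented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [2, 4, 6, 8],   # perfectly correlated with x
        "z": [4, 3, 2, 1],   # perfectly anti-correlated with x
        "group": ["a", "a", "b", "b"],
    }
)

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr)
```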
- descriptive.pyeda.vis_highest_percentage_datapoints(data, first_categorical_col: str, second_categorical_col: str, title: str = 'add chart title', subtitle: str = 'explain ur data viz by subtitle ')
Visualize the highest-percentage data point values
of a categorical variable, grouped by a second categorical variable.
Add new title and subtitle content as a string.
- Parameters:
data – variable
first_categorical_col – str
second_categorical_col – str
title – str
subtitle – str
- Returns:
count plot
- descriptive.pyeda.vis_lowest_percentage_datapoints(data, first_categorical: str, second_categorical: str, title: str = 'add chart title', subtitle: str = 'explain ur data viz by subtitle ')
Visualize the lowest-percentage data point values
of a categorical variable, grouped by a second categorical variable.
Add new title and subtitle content as a string.
- Parameters:
data – variable
first_categorical – str
second_categorical – str
title – str
subtitle – str
- Returns:
count plot
- descriptive.pyeda.vis_the_highest_label_pie_chart(data, categorical_col: str, numerical_col: str, title: str = 'add chart title', subtitle: str = 'explain ur data viz by subtitle')
Visualize the highest label in pie chart.
Add new title and subtitle content as a string.
- Parameters:
data – variable
categorical_col – str
numerical_col – str
title – str
subtitle – str
- Returns:
pie chart
- descriptive.pyeda.vis_top_highest_average(data_frame, categorical_column: list, numerical_column: str, avg_numbers: list, title: str = 'add chart title', subtitle: str = 'Explain ur data viz by subtitle ')
Visualize the highest average values.
Pass avg_numbers to determine which bars are colored.
Add new title and subtitle content as a string.
- Parameters:
data_frame – variable
categorical_column – list
numerical_column – str
avg_numbers – list
title – str
subtitle – str
- Returns:
bar chart
- descriptive.pyeda.vis_top_ten_values(data, first_column: str, by_second_column: str, color_bar: list, title: str = 'explain ur data viz by subtitle', subtitle: str = 'explain ur data viz by subtitle')
Visualize the top ten values. Pass color_bar as a list of integers to determine which bars are colored.
Add new title and subtitle content as a string.
- Parameters:
data – Variable
first_column – str
by_second_column – str
color_bar – list
title – str
subtitle – str
- Returns:
Bar plot
- descriptive.pyeda.visualize_advanced_bar_plot(data, categorical_col: str, numerical_col: str, second_categorical_col: str, title: str = 'add chart title', subtitle: str = 'Explain ur data viz by subtitle ')
Summarize two categorical columns by a numerical column.
Add new title and subtitle content as a string.
- Parameters:
data – variable
categorical_col – str
numerical_col – str
second_categorical_col – str
title – str
subtitle – str
- Returns:
bar chart
- descriptive.pyeda.visualize_advanced_kde(data, first_numeric: str, second_numeric: str, categorical_col: str, title: str = 'Add Chart Title', subtitle: str = 'Explain ur data viz by subtitle')
Visualize the kernel density estimate of two numeric columns by a categorical column.
- Parameters:
data – variable
first_numeric – str
second_numeric – str
categorical_col – str
title – str
subtitle – str
- Returns:
KDE Plot
- descriptive.pyeda.visualize_advanced_scatter_plot(data, first_numeric: str, second_numeric: str, categorical_col: str, title: str = 'Add Chart Title', subtitle: str = 'Explain ur data viz by subtitle')
Visualize the relationship between two numeric variables, with a third categorical variable dictating the color of the data points.
Add new title and subtitle content as a string.
- Parameters:
data – Data frame variable
first_numeric – str
second_numeric – str
categorical_col – str
title – str
subtitle – str
- Returns:
Advanced scatter plot
- descriptive.pyeda.visualize_basic_bar_plot(data, numerical_col: str, categorical_col: str, title: str = 'Add chart title ', subtitle: str = 'Explain ur data viz by subtitle ')
Visualize the mean of a numeric column by the categories of a categorical column.
Add new Title and subtitle content to describe your chart.
- Parameters:
data – variable
numerical_col – str
categorical_col – str
title – str
subtitle – str
- Returns:
Bar chart
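The mean of a numeric column per category, which is the quantity this kind of bar chart displays, can be computed with a pandas `groupby`. This is an illustrative sketch on invented data; the column names are hypothetical and the library may aggregate or order the bars differently.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "dept": ["a", "a", "b", "b", "b"],
        "salary": [10, 20, 30, 40, 50],
    }
)

# Mean of the numeric column within each category: one bar per category.
means = df.groupby("dept")["salary"].mean()
print(means)
```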
- descriptive.pyeda.visualize_basic_kde(data, first_numeric: str, title: str = 'Add Chart Title', subtitle: str = 'Explain ur data viz by subtitle')
Visualize the kernel density estimate of a numeric column.
- Parameters:
data – variable
first_numeric – str
title – str
subtitle – str
- Returns:
KDE Plot
- descriptive.pyeda.visualize_basic_scatter_plot(data, first_column: str, second_column: str, title: str = 'Add Chart Title', subtitle: str = 'Explain ur data viz by subtitle')
Visualize the relationship between two numeric columns.
Add new title and subtitle content as a string.
- Parameters:
data – variable
first_column – str
second_column – str
title – str
subtitle – str
- Returns:
Basic scatter plot
- descriptive.pyeda.visualize_boxplot(data, numeric_column: str, categorical_column: str, title: str = 'Add Chart Title', subtitle: str = 'Explain ur data viz by subtitle')
Visualize the distribution of a numeric column by the categories of a categorical column.
Add new title and subtitle content as a string.
- Parameters:
data – variable
numeric_column – str
categorical_column – str
title – str
subtitle – str
- Returns:
Box Plot
- descriptive.pyeda.visualize_causation(data, first_numeric: str, second_numeric: str, category_col: str, title: str = 'add chart title', subtitle: str = 'explain ur data viz by subtitle ')
Visualize the relationship between two numeric columns by a categorical column.
Add new title and subtitle content as a string.
- Parameters:
data – variable
first_numeric – str
second_numeric – str
category_col – str
title – str
subtitle – str
- Returns:
regression plot
- descriptive.pyeda.visualize_countplot(data, first_categorical_col: str, second_categorical_col: str, title: str = 'add chart title', subtitle: str = 'explain ur data viz by subtitle ')
Visualize the count of a categorical column by the categories of another categorical column.
Add new title and subtitle content as a string.
- Parameters:
data – variable
first_categorical_col – str
second_categorical_col – str
title – str
subtitle – str
- Returns:
count plot
- descriptive.pyeda.visualize_distribution_of_categorical_col(data_frame, column_name: str)
Visualize the distribution of a categorical column.
- Parameters:
data_frame – variable
column_name – str
- Returns:
histogram, and pie charts
- descriptive.pyeda.visualize_distribution_of_numeric_col(data_frame, column_name: str, bins: int) -> None
Visualize the distribution of a numeric column.
- Parameters:
data_frame – variable
column_name – str
bins – int
- Returns:
Histogram, boxplot, q-q plot, skewness and kurtosis values
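Skewness and kurtosis values such as those reported above can be obtained from pandas directly. The sketch below uses `Series.skew` and `Series.kurt` (sample skewness and excess kurtosis); which estimator the library itself uses is not stated in this documentation, so treat the match as an assumption.

```python
import pandas as pd

symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 1, 1, 2, 10])

# Sample skewness: about zero for symmetric data, positive for a long right tail.
sym_skew = symmetric.skew()
right_skew = right_skewed.skew()

# Excess kurtosis: negative for flat distributions such as evenly spaced values.
sym_kurt = symmetric.kurt()

print(sym_skew, right_skew, sym_kurt)
```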
- descriptive.pyeda.visualize_kde(data, first_numeric: str, categorical_col: str, title: str = 'Add Chart Title', subtitle: str = 'Explain ur data viz by subtitle')
Visualize the kernel density estimate of a numeric column per categorical column.
- Parameters:
data – variable
first_numeric – str
categorical_col – str
title – str
subtitle – str
- Returns:
KDE Plot
- descriptive.pyeda.visualize_line_plot(data, first_numeric: str, second_numeric: str, categorical_col: str, title: str = 'Add Chart Title')
Visualize the relationship between two numeric columns and the categories of a categorical column.
Add title to chart as a string.
- Parameters:
data – variable
first_numeric – str
second_numeric – str
categorical_col – str
title – str
- Returns:
lm plot
- descriptive.pyeda.visualize_linear_regression(data, first_numeric: str, second_numeric: str, title: str = 'add chart title', subtitle: str = 'explain ur data viz by subtitle ')
Visualize the relationship between two numeric columns.
Add new title and subtitle content as a string.
- Parameters:
data – variable
first_numeric – str
second_numeric – str
title – str
subtitle – str
- Returns:
regression plot
- descriptive.pyeda.visualize_multi_numeric_columns_avg(data, numerical_col: list, categorical_col: str, title: str = 'Add chart title ', subtitle: str = 'Explain ur data viz by subtitle ')
Visualize the mean of multiple numeric columns by the categories of a categorical column.
Add new title and subtitle content to describe your chart.
- Parameters:
data – variable
numerical_col – list
categorical_col – str
title – str
subtitle – str
- Returns:
point plot chart
- descriptive.pyeda.visualize_pair_plot(data, numerical_columns: list, by_categorical_col: str)
Visualize the relationship between multiple numeric columns by categorical column.
- Parameters:
data – variable
numerical_columns – list
by_categorical_col – str
- Returns:
pair plot
- descriptive.pyeda.visualize_pie_chart(data, categorical_col: str, numerical_col: str, title: str = 'Add Chart Title', subtitle: str = 'explain ur data viz by subtitle')
Visualize pie chart.
Add new title and subtitle content as a string.
- Parameters:
data – variable
categorical_col – str
numerical_col – str
title – str
subtitle – str
- Returns:
pie chart
- descriptive.pyeda.visualize_point_plot(data, numerical_col: str, categorical_col: str, title: str = 'Add Chart Title', subtitle: str = 'Explain ur data viz by subtitle ')
Visualize the mean of a numerical column by the categories of a categorical column.
Add new title and subtitle content as a string.
- Parameters:
data – variable
numerical_col – str
categorical_col – str
title – str
subtitle – str
- Returns:
point plot
- descriptive.pyeda.visualize_stack_bar(data, first_categorical: str, second_categorical: str, title: str = 'Add Chart Title', subtitle: str = 'explain ur data viz by subtitle')
Visualize percentage relationship using two categorical variables.
Add new title and subtitle content as a string.
- Parameters:
data – Data frame variable
first_categorical – str
second_categorical – str
title – str
subtitle – str
- Returns:
stacked bar plot
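The percentage relationship between two categorical variables, which is the kind of table a stacked bar plot draws from, can be sketched with `pandas.crosstab`. Normalizing by row is an assumption here; the library may normalize differently. The column names and data are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["N", "N", "N", "S"],
        "plan": ["basic", "basic", "pro", "pro"],
    }
)

# Share of each plan within each region, as percentages (each row sums to 100).
pct = pd.crosstab(df["region"], df["plan"], normalize="index") * 100
print(pct.round(1))
```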
- descriptive.pyeda.visualize_time_relationship(data, date_colum: str, numerical_column: str, filter_by: str, title: str = 'add chart title', subtitle: str = 'explain ur data viz by subtitle ')
Visualize the sum of a continuous variable over a date variable, filtered by one of:
Year
Month
Day
Add new title and subtitle content as a string.
- Parameters:
data – variable
date_colum – str
numerical_column – str
filter_by – str
title – str
subtitle – str
- Returns:
Line Chart
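Summing a continuous column by year, month, or day can be sketched with the pandas `.dt` accessor. The helper `sum_by_period` is hypothetical, and the way it interprets the "Year", "Month", and "Day" options is an assumption about what `filter_by` means; it requires the date column to already have a datetime dtype.

```python
import pandas as pd


def sum_by_period(df: pd.DataFrame, date_col: str, value_col: str, filter_by: str) -> pd.Series:
    """Hypothetical helper: sum value_col per year, month, or day of date_col."""
    parts = {
        "Year": df[date_col].dt.year,
        "Month": df[date_col].dt.month,
        "Day": df[date_col].dt.day,
    }
    # Group rows by the chosen date component and add up the values.
    return df.groupby(parts[filter_by])[value_col].sum()


df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-01", "2024-01-10"]),
        "sales": [10, 5, 7, 3],
    }
)
by_year = sum_by_period(df, "date", "sales", "Year")
by_month = sum_by_period(df, "date", "sales", "Month")
print(by_year)
print(by_month)
```

Note that grouping by month alone merges the same month across different years, as the January rows from 2023 and 2024 show.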
- descriptive.pyeda.visualize_time_relationship_by_categorical_variable(data, date_colum: str, numerical_col: str, categorical_colum: str, filter_by: str, title: str = 'add chart title', subtitle: str = 'explain ur data viz by subtitle ')
Visualize the sum of a continuous column over a date column, split by a categorical column, filtered by one of:
Year
Month
Day
Add new title and subtitle content as a string.
- Parameters:
data – variable
date_colum – str
numerical_col – str
filter_by – str
title – str
categorical_colum – str
subtitle – str
- Returns:
Line Chart