This repository showcases my learning process of automating EDA using 'ydata-profiling'

amitbisht99/ydata-profiling

Automating EDA with ydata-profiling

This repository demonstrates how to automate Exploratory Data Analysis (EDA) with the ydata-profiling library (formerly known as pandas-profiling). The library generates a comprehensive EDA report from a dataset in a few lines of code, which saves time and makes it less likely that a routine check is overlooked.

🚀 Features of ydata-profiling

The tool provides the following capabilities:

  • Type Inference: Automatically detects data types (Categorical, Numerical, Date, etc.).
  • Warnings: Identifies data challenges like missing values, inaccuracies, skewness, and more.
  • Univariate Analysis: Generates descriptive statistics (mean, median, mode, etc.) and visualizations like histograms.
  • Multivariate Analysis: Includes correlation analysis, missing data summaries, duplicate rows detection, and pairwise variable interactions.
  • Time-Series Analysis: Provides insights such as auto-correlation, seasonality, and ACF/PACF plots.
  • Text Analysis: Reports the most common character categories (e.g., uppercase, lowercase), scripts (e.g., Latin), and Unicode blocks (e.g., ASCII).
  • File & Image Analysis: Reviews file sizes, creation dates, dimensions, and EXIF metadata.
  • Dataset Comparison: Quickly compares datasets in one line of code.
  • Flexible Output Formats: Reports can be exported as:
    • HTML: Easily shareable interactive reports
    • JSON: Suitable for automation systems
    • Jupyter Notebook Widgets
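
As a minimal sketch of the report generation, export, and comparison features listed above: the helper names (`make_sample_frame`, `write_reports`, `write_comparison`), the sample values, and the file names are hypothetical and chosen only for illustration. The calls `ProfileReport(...)`, `to_file(...)`, and `compare(...)` come from ydata-profiling, which is assumed to be installed.

```python
from pathlib import Path

import pandas as pd


def make_sample_frame() -> pd.DataFrame:
    """Build a tiny illustrative dataset (hypothetical values)."""
    return pd.DataFrame(
        {
            "age": [23, 35, 35, None, 52],
            "city": ["Delhi", "Mumbai", "Mumbai", "Pune", None],
            "signup": pd.to_datetime(
                ["2024-01-05", "2024-02-11", "2024-02-11", "2024-03-02", "2024-03-20"]
            ),
        }
    )


def write_reports(df: pd.DataFrame, out_dir: Path, stem: str = "sample_report") -> list[Path]:
    """Profile df and export the report as both HTML and JSON."""
    # Imported lazily so the helpers above work without ydata-profiling.
    from ydata_profiling import ProfileReport

    out_dir.mkdir(parents=True, exist_ok=True)
    # minimal=True switches off the more expensive computations.
    profile = ProfileReport(df, title="Sample EDA Report", minimal=True)
    targets = [out_dir / f"{stem}.html", out_dir / f"{stem}.json"]
    for target in targets:
        # to_file picks the output format from the file extension.
        profile.to_file(target)
    return targets


def write_comparison(df_a: pd.DataFrame, df_b: pd.DataFrame, out_file: Path) -> Path:
    """Profile two datasets and export a side-by-side comparison report."""
    from ydata_profiling import ProfileReport

    report_a = ProfileReport(df_a, title="Dataset A")
    report_b = ProfileReport(df_b, title="Dataset B")
    comparison = report_a.compare(report_b)
    comparison.to_file(out_file)
    return out_file
```

Calling `write_reports(make_sample_frame(), Path("output"))` would write both files into `output/`. Inside a Jupyter notebook, `profile.to_widgets()` or `profile.to_notebook_iframe()` can be used instead of `to_file` to display the report inline.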

📂 Project Structure

  • data/: Contains sample datasets used for demonstration.
  • notebooks/: Jupyter Notebooks showcasing how to use ydata-profiling.
  • output/: Stores generated EDA reports.
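
One way to tie this layout together is to derive each report path in output/ from a dataset in data/. The sketch below assumes the datasets are CSV files and uses a hypothetical `_report.html` naming scheme; neither detail is confirmed by this repository.

```python
from pathlib import Path


def report_path_for(csv_path: Path, output_dir: Path) -> Path:
    """Return the HTML report path for one dataset (hypothetical naming scheme)."""
    return output_dir / f"{csv_path.stem}_report.html"


def plan_reports(data_dir: Path, output_dir: Path) -> dict[Path, Path]:
    """Map every CSV found in data_dir to the report path it would get in output_dir."""
    return {
        csv_path: report_path_for(csv_path, output_dir)
        for csv_path in sorted(data_dir.glob("*.csv"))
    }
```

For example, `plan_reports(Path("data"), Path("output"))` would pair each CSV in data/ with a target file under output/, which a notebook could then loop over when generating reports.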

🛠️ Getting Started

For prerequisites and instructions on running the code, refer to the official ydata-profiling repository: https://github.com/ydataai/ydata-profiling
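
Before opening the notebooks, it can help to confirm that the library is importable. The following is a small sketch using only the standard library; `is_installed` is a hypothetical helper name. The package is published on PyPI as `ydata-profiling` and imported as `ydata_profiling`.

```python
import importlib.util


def is_installed(module_name: str = "ydata_profiling") -> bool:
    """Return True if the named top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed():
    # The PyPI package name uses a hyphen; the import name uses an underscore.
    print("ydata-profiling not found; try: pip install ydata-profiling")
```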

📊 Sample Output

The output/ folder contains example reports generated with ydata-profiling.

Reports include:

  • Data summary (missing values, duplicates, etc.)
  • Visualizations (correlations, distributions, etc.)
  • Detailed variable analysis
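
Some of the figures in a report's data summary can be spot-checked with plain pandas. The sketch below is not how ydata-profiling computes them; it is an independent cross-check, and the `quick_summary` helper and sample values are hypothetical.

```python
import pandas as pd


def quick_summary(df: pd.DataFrame) -> dict:
    """Compute a few headline numbers that an EDA report also summarizes."""
    numeric = df.select_dtypes(include="number")
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        # Pearson correlation between numeric columns, ignoring missing pairs.
        "correlations": numeric.corr(),
    }


# Hypothetical sample data with one missing height, one missing group,
# and one fully duplicated row.
sample = pd.DataFrame(
    {
        "height_cm": [150, 160, 160, 170, None],
        "weight_kg": [50, 60, 60, 70, 80],
        "group": ["a", "b", "b", "a", None],
    }
)
summary = quick_summary(sample)
```

If the numbers from such a check disagree with the report, that is usually a sign that the data was loaded or filtered differently in the two places.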

🎥 Credits: Big thanks to CodeWithHarry (https://www.youtube.com/@CodeWithHarry) for the excellent tutorial on pandas-profiling (https://www.youtube.com/watch?v=sGQfiyXOvF0&t=1136s), which inspired this project.

🤝 Contributing: Contributions are welcome! If you have suggestions, feel free to open an issue or submit a pull request.

📜 License: This project is licensed under the MIT License.

💬 Feedback: If you find this project helpful or have any questions, feel free to reach out!
