This project analyzes global energy production data to understand trends, correlations, and optimization opportunities in power plants. The dataset includes information about power plant capacities, fuel types, and energy generation over several years.
The dataset used in this project is global_power_plant_database.csv, which contains the following columns:
- country: Country where the power plant is located
- country_long: Long name of the country
- name: Name of the power plant
- gppd_idnr: Unique identifier for the power plant
- capacity_mw: Capacity of the power plant in megawatts
- latitude: Latitude of the power plant
- longitude: Longitude of the power plant
- primary_fuel: Primary fuel used by the power plant
- other_fuel1, other_fuel2, other_fuel3: Additional fuels used
- commissioning_year: Year when the power plant was commissioned
- generation_gwh_2013 to generation_gwh_2019: Energy generation in gigawatt-hours for each year
- generation_data_source: Source of the generation data
- estimated_generation_gwh_2013 to estimated_generation_gwh_2017: Estimated energy generation in gigawatt-hours for each year
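As a rough illustration of how these columns might be used, the sketch below loads the CSV with Pandas and aggregates capacity by primary fuel. The helper names (`load_plants`, `summarize_by_fuel`, `yearly_generation`) are hypothetical and not part of the project, and the small `sample` frame contains made-up values purely for demonstration.

```python
import pandas as pd

# Column names for the yearly generation fields described above (2013 to 2019).
GENERATION_COLS = [f"generation_gwh_{year}" for year in range(2013, 2020)]


def load_plants(path: str = "global_power_plant_database.csv") -> pd.DataFrame:
    """Read the dataset; assumes the CSV sits in the working directory."""
    return pd.read_csv(path)


def summarize_by_fuel(df: pd.DataFrame) -> pd.DataFrame:
    """Total capacity and plant count per primary fuel, largest capacity first."""
    return (
        df.groupby("primary_fuel")["capacity_mw"]
        .agg(total_capacity_mw="sum", plant_count="count")
        .sort_values("total_capacity_mw", ascending=False)
    )


def yearly_generation(df: pd.DataFrame) -> pd.Series:
    """Sum reported generation per year, skipping missing values."""
    present = [col for col in GENERATION_COLS if col in df.columns]
    return df[present].sum()


# Made-up rows for illustration only; real values come from load_plants().
sample = pd.DataFrame(
    {
        "primary_fuel": ["Hydro", "Solar", "Hydro", "Coal"],
        "capacity_mw": [100.0, 20.0, 50.0, 600.0],
        "generation_gwh_2013": [10.0, None, 5.0, 100.0],
    }
)
summary = summarize_by_fuel(sample)
totals = yearly_generation(sample)
```

With the real file, `summarize_by_fuel(load_plants())` would give the same kind of table for the full dataset.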
- Python: Programming language used for data analysis and modeling.
- Pandas: Library for data manipulation and analysis.
- NumPy: Library for numerical operations.
- Matplotlib: Library for creating static, animated, and interactive visualizations.
- Seaborn: Statistical data visualization library based on Matplotlib.
- Scikit-Learn: Machine learning library used for building and evaluating regression models.
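To show how these libraries could fit together, here is a minimal sketch of a regression workflow with Scikit-Learn. It assumes the model predicts `generation_gwh_2019` from `capacity_mw`; the project does not state which features or target it actually uses, so treat that pairing, the hypothetical `fit_generation_model` helper, and the synthetic `toy` data as illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_generation_model(
    df: pd.DataFrame,
    feature: str = "capacity_mw",
    target: str = "generation_gwh_2019",
):
    """Fit a linear model on one feature and return it with its held-out R^2."""
    # Drop rows where either the feature or the target is missing.
    data = df[[feature, target]].dropna()
    X_train, X_test, y_train, y_test = train_test_split(
        data[[feature]], data[target], test_size=0.25, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))


# Synthetic data (not from the dataset): generation is roughly 4 x capacity plus noise.
rng = np.random.default_rng(0)
capacity = rng.uniform(10, 1000, size=200)
toy = pd.DataFrame(
    {
        "capacity_mw": capacity,
        "generation_gwh_2019": 4.0 * capacity + rng.normal(0, 50, size=200),
    }
)
model, r2 = fit_generation_model(toy)
```

Holding out a test split keeps the R^2 score from being measured on the same rows the model was fitted to. On the real data, a single-feature linear model is only a baseline, since fuel type and plant age likely matter as well.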