---
title: "WDI 2022 Exploratory Data Analysis"
author: "Saahil Mardhekar"
date: "2026-02-25"
format:
  html:
    toc: true
    number-sections: true
    code-fold: true
  pdf:
    toc: true
    number-sections: true
execute:
  echo: false
  warning: false
  message: false
bibliography: references.bib
crossref: true
---
This report analyzes 2022 country-level data from the World Development Indicators dataset published by the World Bank [@worldbank_wdi].
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("data/wdi.csv")
df.shape
(217, 14)
df.head()

| | country | inflation_rate | exports_gdp_share | gdp_growth_rate | gdp_per_capita | adult_literacy_rate | primary_school_enrolment_rate | education_expenditure_gdp_share | measles_immunisation_rate | health_expenditure_gdp_share | income_inequality | unemployment_rate | life_expectancy | total_population |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | NaN | 18.380042 | -6.240172 | 352.603733 | NaN | NaN | NaN | 68.0 | NaN | NaN | 14.100 | 62.879 | 41128771.0 |
| 1 | Albania | 6.725203 | 37.395422 | 4.856402 | 6810.114041 | 98.5 | 95.606712 | 2.74931 | 86.0 | NaN | NaN | 11.588 | 76.833 | 2777689.0 |
| 2 | Algeria | 9.265516 | 31.446856 | 3.600000 | 5023.252932 | NaN | 108.343933 | NaN | 79.0 | NaN | NaN | 12.437 | 77.129 | 44903225.0 |
| 3 | American Samoa | NaN | 46.957520 | 1.735016 | 19673.390102 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 44273.0 |
| 4 | Andorra | NaN | NaN | 9.563798 | 42350.697069 | NaN | 90.147346 | 2.66623 | 98.0 | NaN | NaN | NaN | NaN | 79824.0 |
df.isna().sum().sort_values(ascending=False)
health_expenditure_gdp_share 197
income_inequality 189
adult_literacy_rate 168
education_expenditure_gdp_share 112
primary_school_enrolment_rate 103
inflation_rate 48
exports_gdp_share 48
unemployment_rate 31
measles_immunisation_rate 24
gdp_growth_rate 15
gdp_per_capita 14
life_expectancy 8
country 0
total_population 0
dtype: int64
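
The counts above are easier to compare as shares of the 217 rows. The following is a minimal sketch rather than part of the original analysis; the helper names `missing_share` and `well_covered_columns` and the 0.5 threshold are illustrative assumptions.

```python
import pandas as pd


def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    # isna() yields booleans; their column means are the missing fractions.
    return frame.isna().mean().sort_values(ascending=False)


def well_covered_columns(frame: pd.DataFrame, max_missing: float = 0.5) -> list:
    """List columns whose missing fraction does not exceed max_missing.

    The 0.5 default is an arbitrary, illustrative threshold.
    """
    share = frame.isna().mean()
    return share[share <= max_missing].index.tolist()
```

Applied to the frame loaded above, `missing_share(df)` would report, for example, 197/217 (about 0.91) for `health_expenditure_gdp_share`.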
df[["gdp_per_capita"]].describe()

| | gdp_per_capita |
|---|---|
| count | 203.000000 |
| mean | 20345.707649 |
| std | 31308.942225 |
| min | 259.025031 |
| 25% | 2570.563284 |
| 50% | 7587.588173 |
| 75% | 25982.630050 |
| max | 240862.182448 |
df["gdp_per_capita"].dropna().plot(kind="hist", bins=30)
plt.title("GDP per Capita (2022)")
plt.xlabel("GDP per Capita (USD)")
plt.ylabel("Count")
plt.tight_layout()
plt.show()
See Figure 1 and Table 1.
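
The summary statistics above show a mean well above the median, which is typical of a right-skewed variable. As a rough sketch (not part of the original analysis), the skew could be quantified as below; the helper name `skew_summary` is hypothetical, and the log transform assumes strictly positive values, which holds for GDP per capita here (minimum about 259).

```python
import numpy as np
import pandas as pd


def skew_summary(values: pd.Series) -> dict:
    """Summarise skew for a strictly positive numeric series."""
    clean = values.dropna()
    return {
        # A ratio well above 1 points to a long right tail.
        "mean_to_median": clean.mean() / clean.median(),
        # Sample skewness on the raw scale.
        "skew": clean.skew(),
        # Sample skewness after a log10 transform (requires values > 0).
        "skew_log10": np.log10(clean).skew(),
    }
```

From the table above, the mean-to-median ratio is 20345.71 / 7587.59, or about 2.68. If the skewness on the log scale turns out much closer to zero, a log-scaled x-axis may be a reasonable choice for the histogram.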
df[["life_expectancy"]].describe()

| | life_expectancy |
|---|---|
| count | 209.000000 |
| mean | 72.416519 |
| std | 7.713322 |
| min | 52.997000 |
| 25% | 66.782000 |
| 50% | 73.514634 |
| 75% | 78.475000 |
| max | 85.377000 |
sub = df[["gdp_per_capita", "life_expectancy"]].dropna()
plt.scatter(sub["gdp_per_capita"], sub["life_expectancy"])
plt.title("Life Expectancy vs GDP per Capita (2022)")
plt.xlabel("GDP per Capita (USD)")
plt.ylabel("Life Expectancy (years)")
plt.tight_layout()
plt.show()
See Figure 2 and Table 2.
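
Because GDP per capita spans roughly three orders of magnitude (about 259 to 240,862), a rank-based measure may describe the relationship in the scatter plot better than a linear one. The sketch below is an illustration under that assumption, not the report's own method; it computes Spearman's correlation as the Pearson correlation of ranks, and the function name `spearman_rank_corr` is hypothetical.

```python
import pandas as pd


def spearman_rank_corr(frame: pd.DataFrame, x: str, y: str) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    # Keep only rows where both variables are observed.
    pair = frame[[x, y]].dropna()
    # Tied values share their average rank.
    ranks = pair.rank(method="average")
    # Series.corr defaults to Pearson; applied to ranks it gives Spearman.
    return float(ranks[x].corr(ranks[y]))
```

A call such as `spearman_rank_corr(df, "gdp_per_capita", "life_expectancy")` returns a value between -1 and 1, where values near 1 indicate a consistently increasing relationship.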
df[["unemployment_rate"]].describe()

| | unemployment_rate |
|---|---|
| count | 186.000000 |
| mean | 7.268661 |
| std | 5.827726 |
| min | 0.130000 |
| 25% | 3.500750 |
| 50% | 5.537500 |
| 75% | 9.455250 |
| max | 37.852000 |
top_unemp = (
    df[["country", "unemployment_rate"]]
    .dropna()
    .sort_values("unemployment_rate", ascending=False)
    .head(15)
)
plt.bar(top_unemp["country"], top_unemp["unemployment_rate"])
plt.title("Top 15 Unemployment Rates (2022)")
plt.xlabel("Country")
plt.ylabel("Unemployment Rate (%)")
plt.xticks(rotation=60, ha="right")
plt.tight_layout()
plt.show()
See Figure 3 and Table 3.
key = df[["gdp_per_capita", "life_expectancy", "unemployment_rate"]]
summary = pd.DataFrame({
    "count": key.count(),
    "mean": key.mean(),
    "median": key.median(),
    "std": key.std(),
    "min": key.min(),
    "max": key.max(),
})
summary

| | count | mean | median | std | min | max |
|---|---|---|---|---|---|---|
| gdp_per_capita | 203 | 20345.707649 | 7587.588173 | 31308.942225 | 259.025031 | 240862.182448 |
| life_expectancy | 209 | 72.416519 | 73.514634 | 7.713322 | 52.997000 | 85.377000 |
| unemployment_rate | 186 | 7.268661 | 5.537500 | 5.827726 | 0.130000 | 37.852000 |
See Table 4.
GDP per capita is strongly right-skewed: the mean (USD 20,346) is more than double the median (USD 7,588), and the maximum reaches USD 240,862, so a small number of countries have very high income levels relative to the majority. Life expectancy generally increases with GDP per capita, suggesting a positive association between income and health outcomes. Unemployment rates vary widely across countries, from 0.13% to 37.85%, and do not display a simple linear pattern with GDP per capita. These findings are based on the available 2022 data from the World Development Indicators dataset [@worldbank_wdi].
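
One way a reader could probe the statement about unemployment and income is to compare median unemployment across GDP-per-capita quartiles. This is a hedged sketch, not part of the original analysis: the helper name `median_by_quantile_bin` and the choice of four bins are assumptions, and `pd.qcut` raises an error if the bin edges are not unique.

```python
import pandas as pd


def median_by_quantile_bin(frame: pd.DataFrame, by: str, target: str, bins: int = 4) -> pd.Series:
    """Median of `target` within equal-count quantile bins of `by`."""
    # Use only rows where both variables are observed.
    pair = frame[[by, target]].dropna()
    labels = [f"Q{i + 1}" for i in range(bins)]
    # qcut assigns each row to a quantile bin of `by` (Q1 = lowest values).
    groups = pd.qcut(pair[by], q=bins, labels=labels)
    return pair.groupby(groups, observed=True)[target].median()
```

For example, `median_by_quantile_bin(df, "gdp_per_capita", "unemployment_rate")` yields one median per income quartile. Medians that neither rise nor fall steadily from Q1 to Q4 would be consistent with the absence of a simple linear pattern.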