Authors
This analysis is a collaborative work produced by Tulane University students, Rhon Farber and Chente McAneny-Droz, for CMPS-3160: Introduction to Data Science, taught by Dr. Culotta in the Fall 2023 term. In addition to computer science, Rhon is double majoring in finance and Chente is double majoring in economics; both students belong to the class of 2025.
Background
Politics, globally, has become increasingly divisive in recent years, with parties taking up more radical stances and populations polarizing to either the left or the right, with little remaining in the center. This has transformed the political landscape, producing more frequent stalemates and complete policy overhauls when the opposing party gains power. This raises the question: how does this partisanship affect the healthy development of countries, and who does it best, the left or the right?
The United States has acted as a hegemon since the World Wars and has been the largest economy since before then. Despite more than a century at the top, the United States has failed for decades to claim the top spot on the Human Development Index (HDI), and as of 2021 has fallen out of the top 20 most developed countries, challenging the United States' superiority complex.
Our goal is to discover which side of the political spectrum performs best when it comes to development as measured by the HDI, provided by the United Nations Development Programme. To do this, we will use historical data on political institutions provided by The World Bank. This project will be a comparative analysis, and we hope to synthesize what would allow the U.S. to climb the HDI rankings based on political lean.
With that said, the scope of the analysis will be limited to the original G7 countries: Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States. When comparing international markets and policies, countries must have alike economies due to the relativity of convergence points; the G7 countries are still among the top 10 global economies, as of now.
Collaboration Plan
Our collaboration plan is to meet at Howard-Tilton Memorial Library weekly. This weekly recurring meeting will be on Wednesdays at 3:30 PM. During this time, we will designate tasks to complete for the next meeting and review what was completed since the last one. This time will be used to go over any questions or factors we need to consider in our findings. Outside our designated meeting time, both partners are expected to be reasonably responsive over text message for any questions that arise between meetings. A collaborative GitHub repository has already been set up to share resources. All Colab and Google Doc files will be easily accessible by the other partner. A master Google Sheet with personal deadlines and project deadlines will be made to make sure the project stays on schedule.
Data Linkage
Before we start, we must clone the repository we will be pulling pre-uploaded data from. This allows us to access and modify files from said external repository, stored on GitHub under a collaborative account shared between partners. Additionally, specific libraries must be imported to run certain functions within the program; this will be done here as well.
Here is the link to our GitHub repository: farber-mcaneny
%cd /content
!git clone https://github.com/nmcanenydroz/farber-mcaneny
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
/content Cloning into 'farber-mcaneny'... remote: Enumerating objects: 43, done. remote: Counting objects: 100% (43/43), done. remote: Compressing objects: 100% (38/38), done. remote: Total 43 (delta 9), reused 0 (delta 0), pack-reused 0 Receiving objects: 100% (43/43), 4.01 MiB | 7.31 MiB/s, done. Resolving deltas: 100% (9/9), done.
Background
The Human Development Index (HDI) is a metric developed by the United Nations and launched in 1990. The summary measure is a "geometric mean of normalized indices" across three main dimensions: health, education, and standard of living. Through the many indicators taken into account across the different dimensions, countries are assigned a score between 0 and 1 and then ranked accordingly. Though economic factors are taken into account, this metric was designed to assess global development and quality of life.
Figure 1: United Nations Development Programme Chart
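The geometric-mean construction above can be sketched in a few lines. The index values below are made up for illustration, not real UNDP data:

```python
# Toy illustration of the HDI formula (made-up index values, NOT real UNDP data):
# the HDI is the geometric mean of the three normalized dimension indices.
health_index = 0.90     # life expectancy (health) index
education_index = 0.85  # education index
income_index = 0.80     # standard-of-living (GNI) index

hdi = (health_index * education_index * income_index) ** (1 / 3)
print(round(hdi, 3))  # 0.849
```

Because the geometric mean penalizes imbalance, a country cannot offset a very low score in one dimension with a very high score in another.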
Data Sets
We took two data sets from the United Nations Development Programme (UNDP) website's data center.
Both files can be found on this page: UNDP Data Center
Extraction, Transform, and Load
Taking the raw link from the GitHub repository, we perform a simple Pandas read function and assign the result to a variable. Since the metadata will only be used for reference, we will load the two datasets in the same step.
#load data
HDI = pd.read_csv("https://raw.githubusercontent.com/nmcanenydroz/farber-mcaneny/data/HDI_Rankings.csv") #composite indices
HDI_def = pd.read_excel("https://github.com/nmcanenydroz/farber-mcaneny/raw/data/HDI_Definitions.xlsx") #metadata
HDI.head() #output
iso3 | country | hdicode | region | hdi_rank_2021 | hdi_1990 | hdi_1991 | hdi_1992 | hdi_1993 | hdi_1994 | ... | mf_2012 | mf_2013 | mf_2014 | mf_2015 | mf_2016 | mf_2017 | mf_2018 | mf_2019 | mf_2020 | mf_2021 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AFG | Afghanistan | Low | SA | 180.0 | 0.273 | 0.279 | 0.287 | 0.297 | 0.292 | ... | 1.86 | 1.88 | 1.66 | 1.62 | 1.66 | 1.41 | 1.32 | 1.38 | 1.38 | 1.38 |
1 | AGO | Angola | Medium | SSA | 148.0 | NaN | NaN | NaN | NaN | NaN | ... | 4.09 | 4.53 | 3.97 | 3.59 | 2.79 | 2.64 | 2.28 | 2.18 | 2.18 | 2.18 |
2 | ALB | Albania | High | ECA | 67.0 | 0.647 | 0.629 | 0.614 | 0.617 | 0.624 | ... | 12.44 | 11.49 | 13.14 | 12.61 | 14.39 | 14.46 | 12.85 | 12.96 | 12.96 | 12.96 |
3 | AND | Andorra | Very High | NaN | 40.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | ARE | United Arab Emirates | Very High | AS | 26.0 | 0.728 | 0.739 | 0.742 | 0.748 | 0.755 | ... | 49.56 | 49.68 | 55.49 | 59.76 | 64.95 | 75.61 | 65.97 | 68.95 | 68.95 | 68.95 |
5 rows × 1008 columns
Next, we are going to begin data cleaning. Since we are only focusing on the original G7 countries, we can drop all other data. Let's keep the world average as well, just as a benchmark.
HDI['country'].unique() #checking syntax/abbreviations of the countries
G7 = ["Canada", "France", "Germany", "Italy", "Japan", "United Kingdom", "United States", "World"] #store countries to keep in variable
HDI = HDI[HDI['country'].isin(G7)] #keep only G7 countries
# drop non-time-series variables in a single call
HDI = HDI.drop(columns=['iso3', 'region', 'hdicode', 'rankdiff_hdi_phdi_2021', 'gdi_group_2021'])
HDI.info() #check data types
HDI.head(10) #updated df
<class 'pandas.core.frame.DataFrame'> Int64Index: 8 entries, 29 to 205 Columns: 1003 entries, country to mf_2021 dtypes: float64(1002), object(1) memory usage: 62.8+ KB
country | hdi_rank_2021 | hdi_1990 | hdi_1991 | hdi_1992 | hdi_1993 | hdi_1994 | hdi_1995 | hdi_1996 | hdi_1997 | ... | mf_2012 | mf_2013 | mf_2014 | mf_2015 | mf_2016 | mf_2017 | mf_2018 | mf_2019 | mf_2020 | mf_2021 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
29 | Canada | 15.0 | 0.860 | 0.864 | 0.868 | 0.866 | 0.872 | 0.876 | 0.879 | 0.880 | ... | 34.830000 | 35.000000 | 33.820000 | 34.420000 | 35.460000 | 36.480000 | 35.840000 | 35.050000 | 35.050000 | 35.05000 |
44 | Germany | 9.0 | 0.829 | 0.835 | 0.842 | 0.852 | 0.858 | 0.863 | 0.868 | 0.874 | ... | 20.360000 | 20.360000 | 20.700000 | 20.060000 | 20.500000 | 21.290000 | 20.230000 | 19.400000 | 19.400000 | 19.40000 |
58 | France | 28.0 | 0.791 | 0.799 | 0.806 | 0.809 | 0.823 | 0.828 | 0.833 | 0.838 | ... | 20.900000 | 20.950000 | 20.750000 | 20.150000 | 19.790000 | 19.700000 | 17.610000 | 17.100000 | 17.100000 | 17.10000 |
61 | United Kingdom | 18.0 | 0.804 | 0.809 | 0.816 | 0.820 | 0.828 | 0.828 | 0.834 | 0.842 | ... | 17.940000 | 17.860000 | 19.090000 | 18.970000 | 18.270000 | 18.200000 | 18.010000 | 17.850000 | 17.850000 | 17.85000 |
84 | Italy | 30.0 | 0.778 | 0.783 | 0.789 | 0.795 | 0.804 | 0.810 | 0.816 | 0.824 | ... | 14.320000 | 13.740000 | 13.370000 | 13.160000 | 13.480000 | 13.550000 | 12.330000 | 11.850000 | 11.850000 | 11.85000 |
87 | Japan | 19.0 | 0.845 | 0.849 | 0.850 | 0.855 | 0.861 | 0.863 | 0.868 | 0.871 | ... | 22.570000 | 22.800000 | 22.420000 | 20.300000 | 19.280000 | 19.470000 | 18.890000 | 18.180000 | 18.180000 | 18.18000 |
184 | United States | 21.0 | 0.872 | 0.873 | 0.878 | 0.880 | 0.884 | 0.885 | 0.887 | 0.890 | ... | 28.520000 | 28.090000 | 28.230000 | 28.330000 | 28.060000 | 28.120000 | 29.140000 | 29.650000 | 29.650000 | 29.65000 |
205 | World | NaN | 0.601 | 0.604 | 0.607 | 0.610 | 0.614 | 0.619 | 0.624 | 0.629 | ... | 12.186601 | 12.474065 | 12.520563 | 12.372667 | 12.277707 | 12.277878 | 12.220286 | 12.375236 | 12.325166 | 12.27192 |
8 rows × 1003 columns
We need to transform this dataset from wide to long format, since each year is currently a separate column and rows are identified only by row number. To do this, we iterate through the variable names and extract the year from each to create a separate year column.
# Separate non-time-scaled variables if needed
non_time_scaled = HDI[['country', 'hdi_rank_2021', 'gii_rank_2021']]
HDI = HDI.drop(columns=['hdi_rank_2021', 'gii_rank_2021'])
# Melt the DataFrame
HDI_melted = HDI.melt(id_vars=['country'], var_name='year_measure', value_name='value')
# Split 'year_measure' into two columns 'year' and 'measure'
HDI_melted[['measure', 'year']] = HDI_melted['year_measure'].str.extract(r'([a-zA-Z_]+)(\d+)')
HDI_melted.drop(columns='year_measure', inplace=True)
# Convert year to numeric
HDI_melted['year'] = pd.to_numeric(HDI_melted['year'], errors='coerce')
# Drop rows with NaN in 'year' if any
HDI_melted = HDI_melted.dropna(subset=['year'])
# Pivot the table
HDI_pivoted = HDI_melted.pivot_table(index=['country', 'year'], columns='measure', values='value', aggfunc='first')
# Reset index to have country and year as columns
HDI_final = HDI_pivoted.reset_index()
# Fix column names if necessary
HDI_final.columns.name = None
# Remove any trailing underscores
HDI_final.columns = HDI_final.columns.str.rstrip('_')
# Select all rows except the first one
HDI_final = HDI_final.iloc[1:]
# drop inconsistent time-series variables (these start in 2010 and are not necessary)
HDI_final = HDI_final.drop(columns=['coef_ineq', 'loss', 'ineq_le', 'ineq_edu', 'ineq_inc', 'co'])
# Display the first few rows of the transformed DataFrame
HDI_final.head(10)
# Make a separate dataframe that excludes the world averages
HDI_G7 = HDI_final[HDI_final['country'] != 'World']
Exploratory Data Analysis
Now that everything is loaded and transformed, let's explore trends and create graphics to learn more about the UNDP dataset.
A) Let's start with simple summary statistics in order to understand the spread of the data. We should exclude the world average, since it is a benchmark and would weigh down the statistics.
stat = HDI_G7['hdi'].describe() #use filtered df without world average
print(stat) #output
count 224.000000 mean 0.884933 std 0.036440 min 0.778000 25% 0.866000 50% 0.889000 75% 0.913250 max 0.948000 Name: hdi, dtype: float64
As we can see, the standard deviation is quite low, meaning much of the data is centered around the mean of 0.884933. For the G7 countries the mean is well above the UNDP's "Very High" threshold of ≥ 0.8; even the minimum is high and only slightly falls out of the very high categorization. Globally, all of the G7 countries are, and have consistently been, leaders in HDI scores.
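The tiering described above can be sketched with `pd.cut`, using the published UNDP cutoffs (Low < 0.550, Medium 0.550–0.699, High 0.700–0.799, Very High ≥ 0.800). The sample scores below are illustrative, not drawn from the dataset:

```python
import pandas as pd

# Bucket illustrative HDI scores into the UNDP development tiers
scores = pd.Series([0.778, 0.885, 0.948, 0.601])
tiers = pd.cut(scores,
               bins=[0, 0.550, 0.700, 0.800, 1.0],
               labels=['Low', 'Medium', 'High', 'Very High'],
               right=False)  # left-inclusive bins, so 0.800 lands in 'Very High'
print(tiers.tolist())  # ['High', 'Very High', 'Very High', 'Medium']
```

The same call could be applied to the `hdi` column of the cleaned dataframe to label every country-year observation.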
B) Next, let's make a correlation matrix that compares all of the variables to the single HDI variable in an effort to understand the type of relationship each one has. All the variables are used to compute the final HDI score, so there is an established relationship, but let's see the magnitude.
HDI_corr = HDI_G7.corr(numeric_only=True)['hdi'] #correlate every numeric variable with the hdi variable
HDI_corr.abs().sort_values(ascending=False) #sort from strongest to weakest correlations
hdi 1.000000 hdi_m 0.987939 hdi_f 0.984156 se_m 0.930239 se_f 0.895607 mys_f 0.894586 mys 0.894299 ihdi 0.883862 mys_m 0.877203 year 0.802706 gni_pc_f 0.738864 gnipc 0.692018 eys 0.650410 eys_m 0.636044 le_m 0.626480 pr_m 0.625355 pr_f 0.625355 eys_f 0.618409 lfpr_f 0.573178 gni_pc_m 0.554645 le 0.529518 gii 0.480361 le_f 0.401160 gdi 0.331170 phdi 0.302999 diff_hdi_phdi 0.226837 mf 0.222624 lfpr_m 0.082819 abr 0.055515 mmr 0.043379 Name: hdi, dtype: float64
The strongest relationships involve secondary education, mean years in school, and gross national income per capita. Noticeably, two variables deal directly with government: pr_f and pr_m. These are the shares of seats in parliament or the legislative branch held by females (pr_f) and males (pr_m). Their coefficients are identical and have a strong relationship with HDI. Historically, there has been a relationship between gender and development due to inequality in quality of life, which explains the strength of the coefficients.
C) Finally, let's chart the HDI scores over time and compare both the G7 countries and the world average.
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(12, 6))
# plot x-axis as year and y-axis as hdi with each country
sns.lineplot(data=HDI_final, x='year', y='hdi', hue='country', marker='o')
# plot descriptions
plt.title('HDI Score by Country Over Time')
plt.ylabel('HDI Score')
plt.xlabel('Year')
plt.legend(title='Country', loc='lower right')
plt.show()
Unsurprisingly, the G7 countries have much higher HDI scores than the world average. Interestingly, all G7 countries started in the 0.75-0.90 range and ended in the tighter 0.85-0.95 range. Additionally, despite the US starting on top, it ended in 4th place among the G7 countries. Conversely, Germany started in the middle and ended up on top. France and Italy remained constant, both starting and ending with their respective ranks at the bottom, despite fluctuating with each other from the early 2000s to the early 2010s.
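The start-versus-end comparison described above can be sketched by ranking each country's HDI at the first and last year of the series. The toy frame below mimics the `country`/`year`/`hdi` layout of `HDI_G7` with made-up values:

```python
import pandas as pd

# Toy long-format frame with the same columns as HDI_G7 (values are made up)
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year':    [1990, 2021, 1990, 2021],
    'hdi':     [0.872, 0.921, 0.829, 0.942],
})

# rank countries at the first and last year of the series (1 = highest HDI)
first_year, last_year = toy['year'].min(), toy['year'].max()
start = toy[toy['year'] == first_year].set_index('country')['hdi'].rank(ascending=False)
end = toy[toy['year'] == last_year].set_index('country')['hdi'].rank(ascending=False)
print(start.to_dict())  # {'A': 1.0, 'B': 2.0} -> A starts on top
print(end.to_dict())    # {'A': 2.0, 'B': 1.0} -> B overtakes by the end
```

Running the same comparison on `HDI_G7` would make the US-versus-Germany reversal explicit rather than read off the chart.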
Background
Founded in 1944, The World Bank (WB) is an international financial institution that funds development projects in low- and middle-income countries with the goal of eradicating poverty. Through this mission, it maintains extensive political, social, and economic datasets. The following dataset tracks electoral and institutional data over the years and gives information about the political spectrum, number of seats, number of opposing parties, and specific metrics such as polarization, fragmentation, and political cohesion. The most pertinent data will be the entries on the political spectrum: left, center, and right.
Data Sets
We took two datasets off The World Bank Data Catalog.
Both can be found on this page: WB Data Catalog
Extraction, Transform, and Load
Taking the raw link from the GitHub repository, we perform a simple Pandas read function and assign the result to a variable. The metadata will only be used for reference.
politics = pd.read_excel("https://github.com/nmcanenydroz/farber-mcaneny/raw/data/dpi2012.xls") #load from raw
politics #output
countryname | ifs | year | system | yrsoffc | finittrm | yrcurnt | multpl | military | defmin | ... | checks | stabs_strict | stabs | stabns_strict | stabns | tenlong_strict | tenlong | tenshort_strict | tenshort | polariz | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | South Sudan | NaN | 2011 | Presidential | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | ... | 1.0 | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
1 | South Sudan | NaN | 2012 | Presidential | 2.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | ... | 1.0 | 0.00 | 0.00 | 0.000000 | 0.000000 | 2.0 | 2.0 | 2.0 | 2.0 | 0.0 |
2 | Turk Cyprus | 0 | 1975 | -999 | 1.0 | -999.0 | -999.0 | -999.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | Turk Cyprus | 0 | 1976 | Presidential | 1.0 | 1.0 | 0.0 | 1.0 | NaN | NaN | ... | 1.0 | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
4 | Turk Cyprus | 0 | 1977 | Presidential | 2.0 | 1.0 | 4.0 | 1.0 | NaN | NaN | ... | 3.0 | 0.00 | 0.00 | 0.000000 | 0.000000 | 2.0 | 2.0 | 1.0 | 1.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6759 | Zimbabwe | ZWE | 2008 | Presidential | 21.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 3.0 | 0.00 | 0.00 | 0.000000 | 0.000000 | 28.0 | 28.0 | 21.0 | 21.0 | 0.0 |
6760 | Zimbabwe | ZWE | 2009 | Presidential | 22.0 | 1.0 | 4.0 | 1.0 | 0.0 | 0.0 | ... | 4.0 | 0.25 | 0.25 | 0.333333 | 0.333333 | 22.0 | 22.0 | 1.0 | 1.0 | NaN |
6761 | Zimbabwe | ZWE | 2010 | Presidential | 23.0 | 1.0 | 3.0 | 1.0 | 0.0 | 0.0 | ... | 4.0 | 0.00 | 0.00 | 0.000000 | 0.000000 | 23.0 | 23.0 | 2.0 | 2.0 | NaN |
6762 | Zimbabwe | ZWE | 2011 | Presidential | 24.0 | 1.0 | 2.0 | 1.0 | 0.0 | 0.0 | ... | 4.0 | 0.00 | 0.00 | 0.000000 | 0.000000 | 24.0 | 24.0 | 3.0 | 3.0 | NaN |
6763 | Zimbabwe | ZWE | 2012 | Presidential | 25.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | ... | 4.0 | 0.00 | 0.00 | 0.000000 | 0.000000 | 25.0 | 25.0 | 4.0 | 4.0 | NaN |
6764 rows × 125 columns
First, this dataframe has 177 individual countries and 125 variables, so let's keep only the G7 countries and the following 11 variables: 'countryname', 'year', 'execme', 'execrlc', 'herfgov', 'totalseats', 'gov1me', 'gov1rlc', 'opp1me', 'opp1rlc', 'polariz'.
In order to keep only the G7 countries, however, we will need to rename a few of the countries, namely 'UK' to 'United Kingdom', 'USA' to 'United States', and 'FRG/Germany' to just 'Germany'.
We will print the unique countries and the head of the table to make sure all the changes were done properly.
# Define columns
columns = ['countryname', 'year', 'execme', 'execrlc', 'herfgov', 'totalseats', 'gov1me', 'gov1rlc',
'opp1me', 'opp1rlc', 'polariz']
politics = politics[columns]
politics = politics.rename(columns={'countryname': 'country'}) #keeping country variable consistent for merge
# Define the replacements
replacements = {'USA': 'United States', 'FRG/Germany': 'Germany', 'UK': 'United Kingdom'}
# Replace the values in the 'country' column
politics['country'] = politics['country'].replace(replacements)
politics = politics[politics['country'].isin(G7)] #keep only G7 countries
print(politics['country'].unique()) #prints each country
politics.head()
['Canada' 'Germany' 'France' 'United Kingdom' 'Italy' 'Japan' 'United States']
country | year | execme | execrlc | herfgov | totalseats | gov1me | gov1rlc | opp1me | opp1rlc | polariz | |
---|---|---|---|---|---|---|---|---|---|---|---|
1066 | Canada | 1975 | LPC | Left | 1.0 | 263 | LPC | Left | PCP | Right | 0.0 |
1067 | Canada | 1976 | LPC | Left | 1.0 | 263 | LPC | Left | PCP | Right | 0.0 |
1068 | Canada | 1977 | LPC | Left | 1.0 | 258 | LPC | Left | PCP | Right | 0.0 |
1069 | Canada | 1978 | LPC | Left | 1.0 | 258 | LPC | Left | PCP | Right | 0.0 |
1070 | Canada | 1979 | LPC | Left | 1.0 | 258 | LPC | Left | PCP | Right | 0.0 |
It looks like several columns have repeated categories for the data they represent. For example, since we will only be looking at 7 countries, making 'country' a 'category' dtype will be more efficient. Next we will check all of the data types and make changes as appropriate.
politics.dtypes #check types
country object year int64 execme object execrlc object herfgov float64 totalseats int64 gov1me object gov1rlc object opp1me object opp1rlc object polariz float64 dtype: object
It looks like most of the dtypes are appropriate; however, we should change some of them from 'object' to 'category' for efficiency. The ones that need to be changed are below:
#change the repeated-label columns to the categorical dtype
for col in ['country', 'execrlc', 'gov1rlc', 'opp1rlc']:
    politics[col] = politics[col].astype('category')
Exploratory Data Analysis
Now that everything is loaded and transformed, let's explore trends and create graphics to learn more about the WB dataset.
A) First, let's create a countplot that shows the frequency of political lean for each of the countries to understand, historically, which way each country leans. We will use the 'execrlc' variable, which is the political lean of the chief executive (president/prime minister).
plt.figure(figsize=(12, 8)) #adjust size
sns.countplot(x='execrlc', hue='country', data=politics) #use a countplot because x is categorical
plt.title('Political Leanings Frequency') #add title
plt.xlabel('Political Leanings') #provide x axis label
Text(0.5, 0, 'Political Leanings')
As we can see, there are a couple of stray 0 entries for Japan and a couple of -999 entries for France. This is because they are in between categories on the spectrum: right-center or left-center. Let's filter the data to Japan only and France only to research the political lean of the parties that are causing the 0 and -999 values. We will manually check through the data frames.
japan_only = politics[politics['country'] == 'Japan'] #filter to only Japan
display(japan_only) #output
country | year | execme | execrlc | herfgov | totalseats | gov1me | gov1rlc | opp1me | opp1rlc | polariz | |
---|---|---|---|---|---|---|---|---|---|---|---|
3156 | Japan | 1975 | LDP | Right | NaN | 0 | LDP | Right | Socialist | Left | NaN |
3157 | Japan | 1976 | LDP | Right | NaN | 0 | LDP | Right | Socialist | Left | NaN |
3158 | Japan | 1977 | LDP | Right | 1.000000 | 511 | LDP | Right | Socialist | Left | 2.0 |
3159 | Japan | 1978 | LDP | Right | 1.000000 | 520 | LDP | Right | Socialist | Left | 2.0 |
3160 | Japan | 1979 | LDP | Right | 1.000000 | 511 | LDP | Right | Socialist | Left | 2.0 |
3161 | Japan | 1980 | LDP | Right | 1.000000 | 502 | LDP | Right | Socialist | Left | 2.0 |
3162 | Japan | 1981 | LDP | Right | 1.000000 | 511 | LDP | Right | Socialist | Left | 0.0 |
3163 | Japan | 1982 | LDP | Right | 1.000000 | 456 | LDP | Right | Socialist | Left | 0.0 |
3164 | Japan | 1983 | LDP | Right | 1.000000 | 456 | LDP | Right | Socialist | Left | 0.0 |
3165 | Japan | 1984 | LDP | Right | 1.000000 | 511 | LDP | Right | Socialist | Left | 2.0 |
3166 | Japan | 1985 | LDP | Right | 1.000000 | 511 | LDP | Right | Socialist | Left | 2.0 |
3167 | Japan | 1986 | LDP | Right | 1.000000 | 511 | LDP | Right | Socialist | Left | 2.0 |
3168 | Japan | 1987 | LDP | Right | 0.962040 | 512 | LDP | Right | Socialist | Left | 0.0 |
3169 | Japan | 1988 | LDP | Right | 1.000000 | 511 | LDP | Right | Socialist | Left | 0.0 |
3170 | Japan | 1989 | LDP | Right | 1.000000 | 511 | LDP | Right | Socialist | Left | 0.0 |
3171 | Japan | 1990 | LDP | Right | 1.000000 | 511 | LDP | Right | Socialist | Left | 0.0 |
3172 | Japan | 1991 | LDP | Right | 1.000000 | 515 | LDP | Right | Socialist | Left | 0.0 |
3173 | Japan | 1992 | LDP | Right | 0.758301 | 492 | LDP | Right | SDPJ | Left | 0.0 |
3174 | Japan | 1993 | LDP | Right | 0.758301 | 492 | LDP | Right | SDPJ | Left | 0.0 |
3175 | Japan | 1994 | JNP | Right | 0.197264 | 511 | SDPJ | Left | LDP | Right | 2.0 |
3176 | Japan | 1995 | SDPJ | Left | 0.529525 | 509 | LDP | Right | NFP | Right | 2.0 |
3177 | Japan | 1996 | SDPJ | Left | 0.529525 | 509 | LDP | Right | NFP | Right | 2.0 |
3178 | Japan | 1997 | LDP | Right | 1.000000 | 500 | LDP | Right | NFP | Right | 0.0 |
3179 | Japan | 1998 | LDP | Right | 1.000000 | 500 | LDP | Right | NFP | Right | 0.0 |
3180 | Japan | 1999 | LDP | Right | 1.000000 | 500 | LDP | Right | NFP | Right | 0.0 |
3181 | Japan | 2000 | LDP | Right | 1.000000 | 500 | LDP | Right | NFP | Right | 0.0 |
3182 | Japan | 2001 | LDP | Right | 0.752972 | 480 | LDP | Right | DPJ | 0 | 0.0 |
3183 | Japan | 2002 | LDP | Right | 0.752972 | 480 | LDP | Right | DPJ | 0 | 0.0 |
3184 | Japan | 2003 | LDP | Right | 0.752972 | 480 | LDP | Right | DPJ | 0 | 0.0 |
3185 | Japan | 2004 | LDP | Right | 0.761206 | 480 | LDP | Right | DPJ | 0 | 0.0 |
3186 | Japan | 2005 | LDP | Right | 0.761206 | 480 | LDP | Right | DPJ | 0 | 0.0 |
3187 | Japan | 2006 | LDP | Right | 0.828372 | 480 | LDP | Right | DPJ | 0 | 0.0 |
3188 | Japan | 2007 | LDP | Right | 0.828372 | 480 | LDP | Right | DPJ | 0 | 0.0 |
3189 | Japan | 2008 | LDP | Right | 0.828372 | 480 | LDP | Right | DPJ | 0 | 0.0 |
3190 | Japan | 2009 | LDP | Right | 0.828372 | 480 | LDP | Right | DPJ | 0 | 0.0 |
3191 | Japan | 2010 | DPJ | 0 | 0.938669 | 480 | DPJ | 0 | LDP | Right | 0.0 |
3192 | Japan | 2011 | DPJ | 0 | 0.938669 | 480 | DPJ | 0 | LDP | Right | 0.0 |
3193 | Japan | 2012 | DPJ | 0 | 0.938669 | 480 | DPJ | 0 | LDP | Right | 0.0 |
france_only = politics[politics['country'] == 'France'] #filter to only France
display(france_only) #output
country | year | execme | execrlc | herfgov | totalseats | gov1me | gov1rlc | opp1me | opp1rlc | polariz | |
---|---|---|---|---|---|---|---|---|---|---|---|
2130 | France | 1975 | UDR | Right | 0.553848 | 469 | UDR | Right | PS | Left | 2.0 |
2131 | France | 1976 | UDR | Right | 0.553848 | 469 | UDR | Right | PS | Left | 2.0 |
2132 | France | 1977 | independent | -999 | 0.553848 | 469 | UDR | Right | PS | Left | NaN |
2133 | France | 1978 | independent | -999 | 0.553848 | 469 | UDR | Right | PS | Left | NaN |
2134 | France | 1979 | UDF | Right | 0.505823 | 481 | RPR | Right | PS | Left | 2.0 |
2135 | France | 1980 | UDF | Right | 0.505823 | 481 | RPR | Right | PS | Left | 2.0 |
2136 | France | 1981 | UDF | Right | 0.505823 | 481 | RPR | Right | PS | Left | 2.0 |
2137 | France | 1982 | PS | Left | 0.750092 | 491 | PS/MRG | Left | RPR | Right | 0.0 |
2138 | France | 1983 | PS | Left | 0.750092 | 491 | PS/MRG | Left | RPR | Right | 0.0 |
2139 | France | 1984 | PS | Left | 0.750092 | 491 | PS/MRG | Left | RPR | Right | 0.0 |
2140 | France | 1985 | PS | Left | 0.750092 | 491 | PS/MRG | Left | RPR | Right | 0.0 |
2141 | France | 1986 | PS | Left | 0.750092 | 491 | PS/MRG | Left | RPR | Right | 0.0 |
2142 | France | 1987 | RPR | Right | 0.371118 | 585 | RPR/UDF joint | Right | PS | Left | 2.0 |
2143 | France | 1988 | RPR | Right | 0.371118 | 585 | RPR/UDF joint | Right | PS | Left | 2.0 |
2144 | France | 1989 | PS | Left | 0.908813 | 550 | PS | Left | UDF | Right | 2.0 |
2145 | France | 1990 | PS | Left | 0.908813 | 550 | PS | Left | UDF | Right | 2.0 |
2146 | France | 1991 | PS | Left | 0.908813 | 550 | PS | Left | UDF | Right | 2.0 |
2147 | France | 1992 | PS | Left | 0.908813 | 550 | PS | Left | UDF | Right | 2.0 |
2148 | France | 1993 | PS | Left | 0.908813 | 553 | PS | Left | UDF | Right | 2.0 |
2149 | France | 1994 | RPR | Right | 0.502732 | 577 | RPR | Right | PS | Left | 2.0 |
2150 | France | 1995 | RPR | Right | 0.502732 | 577 | RPR | Right | PS | Left | 2.0 |
2151 | France | 1996 | RPR | Right | 0.502732 | 577 | RPR | Right | PS | Left | 2.0 |
2152 | France | 1997 | RPR | Right | 0.502732 | 577 | RPR | Right | PS | Left | 2.0 |
2153 | France | 1998 | PS | Left | 0.586364 | 577 | PS | Left | RPR | Right | 2.0 |
2154 | France | 1999 | PS | Left | 0.586364 | 577 | PS | Left | RPR | Right | 2.0 |
2155 | France | 2000 | PS | Left | 0.586364 | 577 | PS | Left | RPR | Right | 2.0 |
2156 | France | 2001 | PS | Left | 0.586364 | 577 | PS | Left | RPR | Right | 2.0 |
2157 | France | 2002 | PS | Left | 0.586364 | 577 | PS | Left | RPR | Right | 2.0 |
2158 | France | 2003 | UMP | Right | 0.860365 | 577 | UMP | Right | PS | Left | 0.0 |
2159 | France | 2004 | UMP | Right | 0.860365 | 577 | UMP | Right | PS | Left | 0.0 |
2160 | France | 2005 | UMP | Right | 0.860365 | 577 | UMP | Right | PS | Left | 0.0 |
2161 | France | 2006 | UMP | Right | 0.860365 | 577 | UMP | Right | PS | Left | 0.0 |
2162 | France | 2007 | UMP | Right | 0.860365 | 577 | UMP | Right | PS | Left | 0.0 |
2163 | France | 2008 | UMP | Right | 0.877282 | 580 | UMP | Right | PS | Left | 0.0 |
2164 | France | 2009 | UMP | Right | 0.877282 | 580 | UMP | Right | PS | Left | 0.0 |
2165 | France | 2010 | UMP | Right | 0.877282 | 580 | UMP | Right | PS | Left | 0.0 |
2166 | France | 2011 | UMP | Right | 0.877282 | 580 | UMP | Right | PS | Left | 0.0 |
2167 | France | 2012 | UMP | Right | 0.877282 | 580 | UMP | Right | PS | Left | 0.0 |
The DPJ, or the Democratic Party of Japan, has 0 values for both 'execrlc' and 'gov1rlc'. After some further research, despite its name, the DPJ appears in this dataset's terms as a conservative, right-leaning party. Let's change the 0 values to "Right" and re-run the countplot. For France, the -999 values come from the 1977-1978 rows, where the executive is listed as 'independent'; after some further research, that executive was also right leaning. Let's change the -999 values to "Right" as well.
#replace 0s with "Right" by row label (Japan)
politics.loc[[3191, 3192, 3193], 'execrlc'] = 'Right'
politics.loc[[3191, 3192, 3193], 'gov1rlc'] = 'Right'
#replace -999s with "Right" by row label (France); both rows need 'execrlc' fixed
politics.loc[[2132, 2133], 'execrlc'] = 'Right'
#re-run the above plot
plt.figure(figsize=(12, 8))
sns.countplot(x='gov1rlc', hue='country', data=politics, order=['Left', 'Center', 'Right']) #reorder the bars
plt.title('Political Leanings Frequency')
plt.xlabel('Political Leanings')
Text(0.5, 0, 'Political Leanings')
With the corrected graph, we can see some clear trends. Starting with the most obvious, Italy has been consistently in the center or on the right, which is interesting because Italy has the lowest G7 HDI score. Conversely, Japan is almost exclusively right-wing and is in the top 3 for HDI score. The U.S. and U.K. are identical, split about evenly between right and left, and both sit in the middle for HDI scores. France and Germany are identical as well, with a slight right lean, but France is near the bottom while Germany is the highest-scoring HDI country. Lastly, Canada is slightly left-leaning and is second in HDI. With the exception of Japan, it seems that a healthy split results in a higher HDI score. These placements may not seem major between 1st and 2nd, but if we were to pull all the countries back in, the rankings would vary more: Germany may be 1st in this filtered dataframe, but it is 9th globally; Italy may be 7th, but it is 30th globally.
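The "split about evenly" claims above can be quantified with a crosstab rather than read off the countplot. The sketch below uses placeholder countries and leans; on the real data this would be `pd.crosstab(politics['country'], politics['execrlc'])`:

```python
import pandas as pd

# Toy sketch of tallying lean frequency per country (placeholder labels)
toy = pd.DataFrame({
    'country': ['X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'],
    'execrlc': ['Left', 'Right', 'Left', 'Right', 'Right', 'Right', 'Right', 'Left'],
})
counts = pd.crosstab(toy['country'], toy['execrlc'])
print(counts)  # X is split 2-2; Y leans right 3-1
```

A table like this gives exact year counts per lean, which is useful when two countries' bars look similar in the plot.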
To begin understanding the relationship between HDI score and political lean, we need to merge the two datasets. Since the datasets were already prepared and modified for merging, all that is left is to perform an inner merge. We will merge on the two common columns: year and country.
#merge dataframes on 'country' and 'year'
merged_df = pd.merge(HDI_G7, politics, on=['country', 'year'], how='inner')
merged_df.head()
country | year | abr | diff_hdi_phdi | eys | eys_f | eys_m | gdi | gii | gni_pc_f | ... | se_m | execme | execrlc | herfgov | totalseats | gov1me | gov1rlc | opp1me | opp1rlc | polariz | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Canada | 1990 | 27.610 | 28.372093 | 16.767969 | 17.242649 | 16.312469 | 0.983 | 0.193 | 23904.28896 | ... | 82.414486 | PCP | Right | 1.0 | 295 | PCP | Right | LPC | Left | 0.0 |
1 | Canada | 1991 | 26.658 | 26.620370 | 17.126560 | 17.599480 | 16.673040 | 0.983 | 0.189 | 23279.82550 | ... | 83.912903 | PCP | Right | 1.0 | 295 | PCP | Right | LPC | Left | 0.0 |
2 | Canada | 1992 | 26.245 | 26.958525 | 17.263720 | 17.743641 | 16.803530 | 0.984 | 0.188 | 23148.48482 | ... | 85.438564 | PCP | Right | 1.0 | 295 | PCP | Right | LPC | Left | 0.0 |
3 | Canada | 1993 | 25.498 | 26.905312 | 16.850031 | 17.143600 | 16.568150 | 0.980 | 0.185 | 23510.21681 | ... | 86.964224 | PCP | Right | 1.0 | 295 | PCP | Right | LPC | Left | 0.0 |
4 | Canada | 1994 | 25.470 | 27.866972 | 16.959970 | 17.234150 | 16.695801 | 0.979 | 0.185 | 24223.88725 | ... | 88.489885 | LPC | Left | 1.0 | 295 | LPC | Left | BQ | 0 | 0.0 |
5 rows × 40 columns
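Before trusting an inner merge, it is worth checking how many rows each side loses. The sketch below uses tiny, hypothetical stand-ins for the two dataframes (the values are made up for illustration); an outer merge with `indicator=True` shows which country/year pairs fail to match and would be silently dropped by `how='inner'`:

```python
import pandas as pd

# Hypothetical stand-ins for HDI_G7 and politics (toy values, illustration only)
hdi = pd.DataFrame({'country': ['Canada', 'Canada', 'France'],
                    'year': [1990, 1991, 1990],
                    'hdi': [0.85, 0.86, 0.84]})
pol = pd.DataFrame({'country': ['Canada', 'France', 'France'],
                    'year': [1990, 1990, 1991],
                    'execrlc': ['Right', 'Right', 'Left']})

# indicator=True adds a '_merge' column: 'both' rows are what an inner merge
# keeps; 'left_only' / 'right_only' rows would be dropped
check = pd.merge(hdi, pol, on=['country', 'year'], how='outer', indicator=True)
print(check['_merge'].value_counts())
```

Running the same check on the real `HDI_G7` and `politics` frames would confirm how much data the inner merge discards.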
A) Now, with the merged dataset, let's create a line graph showing the HDI (Human Development Index) score over time for each G7 country, with the color of the line indicating the political party of the chief executive (left, right, or center). This could show whether there is any apparent correlation between political orientation and HDI trends.
plt.figure(figsize=(20, 10))

# colors for each political party (left, center, right)
party_colors = {"Left": "blue", "Center": "green", "Right": "red"}

# line styles and markers for each G7 country
country_styles = {
    'Canada': {'linestyle': '-', 'marker': 'o'},            # circle marker
    'France': {'linestyle': '--', 'marker': 's'},           # square marker
    'Germany': {'linestyle': '-.', 'marker': '^'},          # triangle up
    'Italy': {'linestyle': ':', 'marker': 'D'},             # diamond marker
    'Japan': {'linestyle': '-', 'marker': 'x'},             # x marker
    'United Kingdom': {'linestyle': '--', 'marker': '+'},   # plus marker
    'United States': {'linestyle': '-.', 'marker': '*'}     # star marker
}

# plot each country's HDI series segment by segment, coloring each segment
# by the party of the chief executive in that year
for country in merged_df['country'].unique():
    if country in country_styles:
        country_data = merged_df[merged_df['country'] == country]
        years = country_data['year'].unique()
        for i in range(len(years) - 1):
            year_data = country_data[country_data['year'] == years[i]]
            next_year_data = country_data[country_data['year'] == years[i + 1]]
            if not year_data.empty and not next_year_data.empty:
                plt.plot([years[i], years[i + 1]],
                         [year_data['hdi'].values[0], next_year_data['hdi'].values[0]],
                         color=party_colors.get(year_data['execrlc'].values[0], 'black'),
                         label=f"{country} ({year_data['execrlc'].values[0]})",
                         linewidth=2,
                         linestyle=country_styles[country]['linestyle'],
                         marker=country_styles[country]['marker'])

# descriptive graph info
plt.title('HDI Over Time by Political Party in G7 Countries', fontsize=16)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Human Development Index (HDI)', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True)  # add gridlines
sns.set(style="whitegrid", rc={"lines.linewidth": 3})  # increase line thickness

# deduplicate legend entries (each plotted segment adds its own label)
handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys(), loc='lower right', title='Country (Party)')
All G7 countries have shown an increase in HDI over the roughly two-decade observation period, regardless of the political party in power. There doesn't seem to be a consistent pattern where one particular political orientation (left, center, right) leads to a significantly better or worse HDI. In some countries, like Canada and Italy, the left-leaning parties show the highest HDI growth, while in others, like the United Kingdom, the right-leaning parties demonstrate the highest growth. The HDI growth trajectories vary between countries. For instance, the HDI growth in the United States is relatively flat compared to other countries, regardless of the party in power. Conversely, countries like Italy and Germany show a steeper increase in HDI over time.
Looking at this graph, we hypothesize that left or right does not matter because, regardless, the HDI scores have been consistently increasing, with none of the countries taking a significant decrease. What would be interesting is to create a third variable that measures how long a party is consistently in power, based on the executive branch, and test whether this is a determinant of change in HDI score. Furthermore, instead of solely looking at the raw score, we are going to look at the percent change in HDI score to calculate annual growth. Let's do both now:
Consistency Score
To achieve this, we first transformed the execrlc variable, which represents the executive power rating, into a measure of year-on-year change. We did this by creating a party_change variable, which flags a change in the executive branch whenever there was a shift in the execrlc rating from one year to the next within the same country. This approach allowed us to capture the frequency of political shifts, under the assumption that frequent changes indicate lower political stability.
To normalize this measure across countries with different lengths of observation periods, we calculated the total number of years each country was observed in our dataset. This was crucial to ensure that our stability score is not biased by the duration for which a country's data is available. The next step was to aggregate these yearly changes for each country and divide by the total years observed, giving us a preliminary score where a higher value indicated more frequent changes, and thus, lower stability.
Finally, we refined this measure into a more intuitive 'consistency score' by inverting it (subtracting the preliminary score from 1). A higher score indicates higher political stability, aligning with the general understanding that less frequent changes in executive power signify a more consistent government overall. We also handled any NaN or infinite values by replacing them with 1, ensuring our scores remained within a logical range. This final consistency score provides a standardized measure of political stability across countries.
# Check unique values in 'country' column to ensure correct grouping
print("Unique countries:", merged_df['country'].unique())
# Ensure correct data types
merged_df['year'] = merged_df['year'].astype(int)
merged_df['execrlc'] = merged_df['execrlc'].astype(str)
# Calculate changes in party: a row is flagged when 'execrlc' differs from the
# prior year within the same country (note: the first year of each country
# compares against NaN and is therefore also flagged as a change)
merged_df['party_change'] = merged_df.groupby('country')['execrlc'].shift(1) != merged_df['execrlc']
# Fill NaN values in 'party_change' with 0 (a safeguard; the comparison above
# already yields booleans with no NaN)
merged_df['party_change'] = merged_df['party_change'].fillna(0)
# Calculate total number of years observed for each country
merged_df['years_observed'] = merged_df.groupby('country')['year'].transform('count')
# Debug: Print a sample of the DataFrame to check calculations
print(merged_df[['country', 'year', 'execrlc', 'party_change', 'years_observed']].head(20))
# Check for zero years_observed
print("Countries with zero years_observed:", merged_df[merged_df['years_observed'] == 0]['country'].unique())
# Calculate the sum of party changes for each country
party_change_sum = merged_df.groupby('country')['party_change'].sum()
# Debug: Print the party_change_sum
print("Party change sum:\n", party_change_sum)
# Normalize by years observed
consistency_scores = party_change_sum / merged_df.groupby('country')['years_observed'].mean()
# Debug: Print the raw consistency scores
print("Raw consistency scores:\n", consistency_scores)
# Invert the score for ranking (less change = more stable)
consistency_scores = 1 - consistency_scores
# Handle NaN or infinite values
consistency_scores.fillna(1, inplace=True)
consistency_scores.replace([float('inf'), -float('inf')], 1, inplace=True)
# Rank countries by consistency score
ranked_countries = consistency_scores.sort_values(ascending=False)
# Debug: Check the final ranked scores
print("Ranked consistency scores:\n", ranked_countries)
Unique countries: ['Canada' 'France' 'Germany' 'Italy' 'Japan' 'United Kingdom' 'United States']
   country  year execrlc  party_change  years_observed
0   Canada  1990   Right          True              23
1   Canada  1991   Right         False              23
2   Canada  1992   Right         False              23
3   Canada  1993   Right         False              23
4   Canada  1994    Left          True              23
5   Canada  1995    Left         False              23
6   Canada  1996    Left         False              23
7   Canada  1997    Left         False              23
8   Canada  1998    Left         False              23
9   Canada  1999    Left         False              23
10  Canada  2000    Left         False              23
11  Canada  2001    Left         False              23
12  Canada  2002    Left         False              23
13  Canada  2003    Left         False              23
14  Canada  2004    Left         False              23
15  Canada  2005    Left         False              23
16  Canada  2006    Left         False              23
17  Canada  2007   Right          True              23
18  Canada  2008   Right         False              23
19  Canada  2009   Right         False              23
Countries with zero years_observed: []
Party change sum:
 country
Canada            3
France            4
Germany           3
Italy             7
Japan             3
United Kingdom    3
United States     4
Name: party_change, dtype: int64
Raw consistency scores:
 country
Canada            0.130435
France            0.173913
Germany           0.130435
Italy             0.304348
Japan             0.130435
United Kingdom    0.130435
United States     0.173913
dtype: float64
Ranked consistency scores:
 country
Canada            0.869565
Germany           0.869565
Japan             0.869565
United Kingdom    0.869565
France            0.826087
United States     0.826087
Italy             0.695652
dtype: float64
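To sanity-check the consistency-score logic (and to make the first-year flag explicit), the same steps can be replayed on a toy party sequence. The country and values here are hypothetical, for illustration only:

```python
import pandas as pd

# Hypothetical 5-year party sequence for one country (toy data)
toy = pd.DataFrame({'country': ['X'] * 5,
                    'execrlc': ['Right', 'Right', 'Left', 'Left', 'Right']})

# Same logic as above: a row is flagged when execrlc differs from the prior
# year; the first year compares against NaN and is also flagged
change = toy.groupby('country')['execrlc'].shift(1) != toy['execrlc']

# 3 flags over 5 observed years -> raw score 0.6 -> consistency score 0.4
score = 1 - change.sum() / len(toy)
print(score)
```

Replaying the logic like this makes it easy to see why Italy, with seven flagged changes over 23 years, ends up with the lowest consistency score.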
HDI Growth
Now that the consistency score is calculated, let's make a percent-change variable that we will call 'hdi_growth'. This is fairly simple: using pandas' built-in functions, we can easily calculate it.
# Calculate the year-over-year HDI growth
merged_df['hdi_growth'] = merged_df.groupby('country')['hdi'].pct_change()
# Drop rows with NaN in 'hdi_growth'
merged_df.dropna(subset=['hdi_growth'], inplace=True)
print(merged_df["hdi_growth"])
1      0.004651
2      0.004630
3     -0.002304
4      0.006928
5      0.004587
         ...
156    0.000000
157    0.002208
158    0.003304
159    0.002195
160    0.003286
Name: hdi_growth, Length: 154, dtype: float64
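For reference, `pct_change` computes (x_t - x_{t-1}) / x_{t-1} within each group. A quick sketch on hypothetical HDI values (not our actual data) shows the arithmetic:

```python
import pandas as pd

# Toy HDI values (hypothetical) to show what pct_change computes:
# each entry is (x_t - x_{t-1}) / x_{t-1}, with NaN for the first year
s = pd.Series([0.860, 0.864, 0.864, 0.855])
print(s.pct_change().round(6).tolist())
# first value is NaN; then (0.864-0.860)/0.860 ~ 0.004651, then 0.0, then ~ -0.010417
```

The NaN produced for each country's first year is exactly why we drop rows with missing 'hdi_growth' above.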
Now that all the setup is finished, let's create a predictive model that will reveal the relationship between HDI score and political party, as well as test our reformed hypothesis relating HDI growth and political consistency. To do this, we are going to regress the Human Development Index (HDI) on left and right political lean as well as the consistency score. Then, we will calculate our error rates.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
# Convert political orientation into dummy variables
merged_df['left'] = (merged_df['execrlc'] == 'Left').astype(int)
merged_df['right'] = (merged_df['execrlc'] == 'Right').astype(int)
# Merge consistency scores into the main DataFrame
merged_df = merged_df.merge(consistency_scores.rename('consistency_scores'), on='country')
# Selecting the independent variables and the dependent variable
X = merged_df[['left', 'right', 'consistency_scores']]
y = merged_df['hdi']
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Get coefficients
coefficients = model.coef_
# Print the coefficients
print("Coefficients:")
print("Left:", coefficients[0])
print("Right:", coefficients[1])
print("Consistency Score:", coefficients[2])
# Predicting HDI
y_pred = model.predict(X_test)
# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("\nMean Squared Error:", mse)
print("R^2 Score:", r2)
Coefficients:
Left: 0.014484738623220484
Right: 0.011421588037068223
Consistency Score: 0.18920868733311447

Mean Squared Error: 0.0008372372789526706
R^2 Score: 0.2731238503348823
Coefficient Interpretations
Our multiple regression model has provided insightful results about the impact of political orientation and political stability on a country's Human Development Index (HDI). The coefficients below indicate the degree of influence each factor has on the HDI.
Left-Wing (Coefficient: 0.0145): This positive coefficient suggests that a shift towards left-wing orientation is associated with a slight increase in the HDI. Specifically, for each unit increase in the measure of left-wing orientation, we expect the HDI to increase by approximately 0.0145, assuming other factors remain constant. This could imply that policies typically associated with left-wing politics, such as social welfare or equality measures, might have a beneficial impact on human development.
Right-Wing (Coefficient: 0.0114): The positive coefficient here, though slightly smaller than that for left-wing, indicates that right-wing orientation also positively influences the HDI. A unit increase in right-wing orientation is associated with an increase of about 0.0114 in the HDI. This might reflect the potential benefits of policies used by right-wing parties, such as economic liberalization or conservative social reforms.
Consistency Score (Coefficient: 0.1892): The relatively large coefficient for the consistency score, which measures political stability, underscores its significant impact on HDI. An increase in political stability leads to a substantial increase in HDI, with a coefficient more than ten times larger than those for political orientation, demonstrating the crucial role of stable governance in promoting human development.
Regarding the model's performance, the mean squared error (MSE) is 0.000837, indicating the average squared difference between the observed actual outcomes and the outcomes predicted by the model. A lower MSE suggests better model performance. The R^2 score of 0.2731, while not exceptionally high, indicates that about 27% of the variability in HDI can be explained by our model. This suggests that while political orientation and stability are important factors, other variables not included in the model also play a significant role in determining HDI.
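Both evaluation metrics reduce to short formulas. This sketch recomputes them by hand on hypothetical actual/predicted values (not our model's numbers) to make the definitions concrete:

```python
import numpy as np

# Hypothetical actual vs. predicted HDI values (illustration only)
y_true = np.array([0.85, 0.88, 0.90, 0.92])
y_pred = np.array([0.86, 0.87, 0.91, 0.90])

# MSE: mean of the squared residuals
mse = np.mean((y_true - y_pred) ** 2)

# R^2: 1 - SS_res / SS_tot, i.e. how much better the model does than
# always predicting the mean of y_true
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(mse, r2)
```

These hand computations agree with `sklearn.metrics.mean_squared_error` and `r2_score` on the same inputs.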
# Create a scatter plot with a regression line
plt.figure(figsize=(12, 8))
# Jittering x-axis values for clarity
consistency_jittered = merged_df['consistency_scores'] + np.random.normal(0, 0.02, size=len(merged_df))
# Create a scatter plot
sns.scatterplot(x=consistency_jittered, y='hdi', data=merged_df, alpha=0.6, hue='execrlc', palette=['red', 'blue', 'green'])
# Add a regression line (without jitter)
sns.regplot(x='consistency_scores', y='hdi', data=merged_df, scatter=False, color='black', truncate=False, line_kws={'linewidth': 4})
plt.title('Effect of Political Consistency on HDI Score')
plt.xlabel('Political Consistency Score')
plt.ylabel('HDI Score')
plt.legend(title='Political Orientation')
plt.show()
Looking at the graph, there is a clear direct relationship between consistency score and the raw HDI score, with a very diverse mix of political orientation or lean. Noticeably, the political party itself is not an indicator of HDI score. Additionally, there are very few center-leaning points. This is because center-oriented political parties are not elected as often.
Let's run the same regression but use HDI growth as our y variable:
# Selecting the independent variables and the dependent variable
X = merged_df[['left', 'right', 'consistency_scores']]
y = merged_df['hdi_growth']
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Get coefficients
coefficients = model.coef_
# Print the coefficients
print("Coefficients:")
print("Left:", coefficients[0])
print("Right:", coefficients[1])
print("Consistency Score:", coefficients[2])
# Predicting HDI
y_pred = model.predict(X_test)
# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("\nMean Squared Error:", mse)
print("R^2 Score:", r2)
Coefficients:
Left: -0.00296619483394267
Right: -0.0035134149670588465
Consistency Score: -0.002339420609040174

Mean Squared Error: 1.0226723596710013e-05
R^2 Score: -0.007974914669787836
Coefficient Interpretations
What jumps out immediately is that almost all values are negative, except for the MSE. Though negative, the consistency score coefficient still manages to be the least negative, compared to left and right.
Addressing its predictive capability, this model is useless. Though it has a very low MSE, its R-squared value is negative, meaning the model fits the test data worse than simply predicting the mean; its predictive capabilities are moot.
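A negative R^2 is possible out-of-sample precisely because R^2 compares the model against a constant prediction of the mean; a model that fits the test data worse than that baseline scores below zero. A toy sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical test targets and a deliberately bad constant "model"
y_true = np.array([0.002, 0.004, 0.003, 0.005])
y_pred = np.full_like(y_true, 0.010)  # far from the mean of y_true (0.0035)

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(r2)  # negative: worse than the mean baseline
```

Our growth model's R^2 of about -0.008 is only slightly below zero, meaning it is roughly as uninformative as guessing the mean growth rate every year.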
# Create a scatter plot with a regression line
plt.figure(figsize=(12, 8))
# Jittering x-axis values for clarity
consistency_jittered = merged_df['consistency_scores'] + np.random.normal(0, 0.02, size=len(merged_df))
# Create a scatter plot
sns.scatterplot(x=consistency_jittered, y='hdi_growth', data=merged_df, alpha=0.6, hue='execrlc', palette=['red', 'blue', 'green'])
# Add a regression line (without jitter)
sns.regplot(x='consistency_scores', y='hdi_growth', data=merged_df, scatter=False, color='black', truncate=False, line_kws={'linewidth': 4})
plt.title('Effect of Political Consistency on HDI Growth')
plt.xlabel('Political Consistency Score')
plt.ylabel('HDI Growth')
plt.legend(title='Political Orientation')
plt.show()
Using HDI growth, there is a slight inverse relationship between consistency score and growth. Compared to the raw-HDI-score model, the same remains true about the political orientation trends, but the slope is much flatter. These trends are moot, however, because the model's predictive capabilities are not functional.
Based on our findings, our initial hypothesis had little to do with HDI score. Instead, a separately created variable that measures political consistency has more to do with HDI score than political lean does. We found that the longer the same politically oriented party was in power, the higher the raw HDI score was. When we ran a second model that used HDI growth, in percent change, the opposite was observed, but the predictive capabilities of the model were insignificant. An attempt to compare the variables 'gov1rlc' (largest political party in power) and 'execrlc' was made, but they had the same political lean for each entry.
Who is better for development? It doesn't matter who; it matters the duration. This finding is only applicable to the G7 countries, with their well-established democracies, from the late 20th century to the early 21st century.
Conversion to HTML:
!jupyter nbconvert --to html 'Politics on Development.ipynb' #convert to HTML
[NbConvertApp] Converting notebook Final (3).ipynb to html [NbConvertApp] Writing 1616434 bytes to Final (3).html