Commit 2b7dd96

longye-tian and mmcky authored
[inequality] Update exercise 3 (#498)
* [inequality] Update exercise 3

  Hi Matt @mmcky , I have updated the exercise 3 of the inequality lecture using your code in #410 and add the simulation part below your solution. What do you think about this version of the solution? Best, Longye

* Update inequality.md

  Hi Matt, I have updated the solution and in the main text by adding ` %%time`. What do you think about this comparison?

* Update inequality.md

  add labels to the main text gini coefficient code.

* Update inequality.md

* add data.ipynb and delete to csv

  Hi Matt, I have added the data.ipynb to the folder and I think it contains sufficient code to save the data. I have also modified the contain to deal with the saving and call issues related to the csv. What do you think about these changes? Best, Longye

* remove skip-execution code as it is not compatible with google collab

* test the problem

  This commit is to test whether the problem is due to this code.

* Revert "test the problem"

  This reverts commit 395657e.

* test google colab RAM

  this commit is to test whether the crash is led by the

* change link to notebook on github

* update_inequality_exercise

  Hi Matt, This commit select 3000 random sample from the original dataset. Best, Longye

* update year in the text

  update year in the text

---------

Co-authored-by: mmcky <[email protected]>
1 parent fa99488 commit 2b7dd96

3 files changed

+247
-68
lines changed

Lines changed: 133 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,133 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "code",
5+
"execution_count": 1,
6+
"id": "258b4bc9-2964-470a-8010-05c2162f5e05",
7+
"metadata": {},
8+
"outputs": [
9+
{
10+
"name": "stdout",
11+
"output_type": "stream",
12+
"text": [
13+
"Requirement already satisfied: wbgapi in /Users/longye/anaconda3/lib/python3.10/site-packages (1.0.12)\n",
14+
"Requirement already satisfied: plotly in /Users/longye/anaconda3/lib/python3.10/site-packages (5.22.0)\n",
15+
"Requirement already satisfied: requests in /Users/longye/anaconda3/lib/python3.10/site-packages (from wbgapi) (2.31.0)\n",
16+
"Requirement already satisfied: tabulate in /Users/longye/anaconda3/lib/python3.10/site-packages (from wbgapi) (0.9.0)\n",
17+
"Requirement already satisfied: PyYAML in /Users/longye/anaconda3/lib/python3.10/site-packages (from wbgapi) (6.0)\n",
18+
"Requirement already satisfied: tenacity>=6.2.0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from plotly) (8.4.1)\n",
19+
"Requirement already satisfied: packaging in /Users/longye/anaconda3/lib/python3.10/site-packages (from plotly) (23.1)\n",
20+
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->wbgapi) (1.26.16)\n",
21+
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->wbgapi) (2.0.4)\n",
22+
"Requirement already satisfied: idna<4,>=2.5 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->wbgapi) (3.4)\n",
23+
"Requirement already satisfied: certifi>=2017.4.17 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->wbgapi) (2024.6.2)\n"
24+
]
25+
}
26+
],
27+
"source": [
28+
"!pip install wbgapi plotly\n",
29+
"\n",
30+
"import pandas as pd\n",
31+
"import numpy as np\n",
32+
"import matplotlib.pyplot as plt\n",
33+
"import random as rd\n",
34+
"import wbgapi as wb\n",
35+
"import plotly.express as px\n",
36+
"\n",
37+
"url = 'https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/SCF_plus/SCF_plus_mini.csv'\n",
38+
"df = pd.read_csv(url)\n",
39+
"df_income_wealth = df.dropna()"
40+
]
41+
},
42+
{
43+
"cell_type": "code",
44+
"execution_count": 4,
45+
"id": "9630a07a-fce5-474e-92af-104e67e82be5",
46+
"metadata": {},
47+
"outputs": [
48+
{
49+
"name": "stdout",
50+
"output_type": "stream",
51+
"text": [
52+
"Requirement already satisfied: quantecon in /Users/longye/anaconda3/lib/python3.10/site-packages (0.7.1)\n",
53+
"Requirement already satisfied: requests in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (2.31.0)\n",
54+
"Requirement already satisfied: numpy>=1.17.0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (1.26.3)\n",
55+
"Requirement already satisfied: numba>=0.49.0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (0.59.1)\n",
56+
"Requirement already satisfied: sympy in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (1.12)\n",
57+
"Requirement already satisfied: scipy>=1.5.0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (1.12.0)\n",
58+
"Requirement already satisfied: llvmlite<0.43,>=0.42.0dev0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from numba>=0.49.0->quantecon) (0.42.0)\n",
59+
"Requirement already satisfied: certifi>=2017.4.17 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->quantecon) (2024.6.2)\n",
60+
"Requirement already satisfied: idna<4,>=2.5 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->quantecon) (3.4)\n",
61+
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->quantecon) (2.0.4)\n",
62+
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->quantecon) (1.26.16)\n",
63+
"Requirement already satisfied: mpmath>=0.19 in /Users/longye/anaconda3/lib/python3.10/site-packages (from sympy->quantecon) (1.3.0)\n"
64+
]
65+
}
66+
],
67+
"source": [
68+
"!pip install quantecon\n",
69+
"import quantecon as qe\n",
70+
"\n",
71+
"varlist = ['n_wealth', # net wealth \n",
72+
" 't_income', # total income\n",
73+
" 'l_income'] # labor income\n",
74+
"\n",
75+
"df = df_income_wealth\n",
76+
"years = df.year.unique()\n",
77+
"\n",
78+
"# create lists to store Gini for each inequality measure\n",
79+
"results = {}\n",
80+
"\n",
81+
"for var in varlist:\n",
82+
"    # create lists to store Gini\n",
83+
"    gini_yr = []\n",
84+
"    for year in years:\n",
85+
"        # repeat the observations according to their weights\n",
86+
"        counts = list(round(df[df['year'] == year]['weights'] ))\n",
87+
"        y = df[df['year'] == year][var].repeat(counts)\n",
88+
"        y = np.asarray(y)\n",
89+
"        \n",
90+
"        rd.shuffle(y) # shuffle the sequence\n",
91+
"        \n",
92+
"        # calculate and store Gini\n",
93+
"        gini = qe.gini_coefficient(y)\n",
94+
"        gini_yr.append(gini)\n",
95+
"        \n",
96+
"    results[var] = gini_yr\n",
97+
"\n",
98+
"# Convert to DataFrame\n",
99+
"results = pd.DataFrame(results, index=years)\n",
100+
"results.to_csv(\"usa-gini-nwealth-tincome-lincome.csv\", index_label='year')"
101+
]
102+
},
103+
{
104+
"cell_type": "code",
105+
"execution_count": null,
106+
"id": "d59e876b-2f77-4fa7-b79a-8e455ad82d43",
107+
"metadata": {},
108+
"outputs": [],
109+
"source": []
110+
}
111+
],
112+
"metadata": {
113+
"kernelspec": {
114+
"display_name": "Python 3 (ipykernel)",
115+
"language": "python",
116+
"name": "python3"
117+
},
118+
"language_info": {
119+
"codemirror_mode": {
120+
"name": "ipython",
121+
"version": 3
122+
},
123+
"file_extension": ".py",
124+
"mimetype": "text/x-python",
125+
"name": "python",
126+
"nbconvert_exporter": "python",
127+
"pygments_lexer": "ipython3",
128+
"version": "3.10.12"
129+
}
130+
},
131+
"nbformat": 4,
132+
"nbformat_minor": 5
133+
}
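The notebook above expands each observation by its rounded survey weight before computing the Gini coefficient. The sketch below illustrates that step on toy data. The values in `toy` are hypothetical (not taken from `SCF_plus`), and the `gini` helper is a compact pairwise implementation used here as a stand-in for `qe.gini_coefficient`.

```python
import numpy as np
import pandas as pd


def gini(y):
    """Gini coefficient from the mean absolute difference (stand-in helper)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    diff_sum = np.abs(y[:, None] - y[None, :]).sum()
    return diff_sum / (2 * n * y.sum())


# Toy data: hypothetical values, chosen only for illustration
toy = pd.DataFrame({
    "n_wealth": [1.0, 2.0, 10.0],
    "weights": [2.4, 1.0, 0.6],
})

# Round the weights to integer counts, then repeat each observation
counts = toy["weights"].round().astype(int)
expanded = toy["n_wealth"].repeat(counts.to_numpy()).to_numpy()

print(expanded)        # observations after weighting by repetition
print(gini(expanded))  # Gini coefficient of the expanded sample
```

Rounding the weights discards their fractional part, so this is an approximation to a properly weighted Gini coefficient.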
Lines changed: 20 additions & 20 deletions
@@ -1,21 +1,21 @@
11
year,n_wealth,t_income,l_income
2-
1950,0.8257332034366338,0.44248654139458626,0.5342948198773412
3-
1953,0.8059487586599329,0.4264544060935945,0.5158978980963702
4-
1956,0.8121790488050616,0.44426942873399283,0.5349293526208142
5-
1959,0.795206874163792,0.43749348077061573,0.5213985948309416
6-
1962,0.8086945076579359,0.4435843103853645,0.5345127915054341
7-
1965,0.7904149225687935,0.43763715466663444,0.7487860020887753
8-
1968,0.7982885066993497,0.4208620794438902,0.5242396427381545
9-
1971,0.7911574835420259,0.4233344246090255,0.5576454812313466
10-
1977,0.7571418922185215,0.46187678800902543,0.5704448110072049
11-
1983,0.7494335400643013,0.439345618464469,0.5662220844385915
12-
1989,0.7715705301674302,0.5115249581654197,0.601399568747142
13-
1992,0.7508126614055308,0.4740650672076798,0.5983592657979563
14-
1995,0.7569492388110265,0.48965523558400603,0.5969779516716903
15-
1998,0.7603291991801185,0.49117441585168614,0.5774462841723305
16-
2001,0.7816118750507056,0.5239092994681135,0.6042739644967272
17-
2004,0.7700355469522361,0.4884350383903255,0.5981432201792727
18-
2007,0.7821413776486978,0.5197156312086187,0.626345219575322
19-
2010,0.8250825295193438,0.5195972120145615,0.6453653328291903
20-
2013,0.8227698931835303,0.531400174984336,0.6498682917772644
21-
2016,0.8342975903562234,0.5541400068900825,0.6706846793375284
2+
1950,0.8257332034366366,0.44248654139458743,0.534294819877344
3+
1953,0.805948758659935,0.4264544060935942,0.5158978980963682
4+
1956,0.8121790488050612,0.44426942873399367,0.5349293526208106
5+
1959,0.7952068741637912,0.43749348077061534,0.5213985948309414
6+
1962,0.8086945076579386,0.44358431038536356,0.5345127915054446
7+
1965,0.7904149225687949,0.4376371546666344,0.7487860020887701
8+
1968,0.7982885066993503,0.4208620794438885,0.5242396427381534
9+
1971,0.7911574835420282,0.4233344246090255,0.5576454812313462
10+
1977,0.7571418922185215,0.46187678800902554,0.57044481100722
11+
1983,0.749433540064301,0.4393456184644682,0.5662220844385925
12+
1989,0.7715705301674285,0.5115249581654115,0.6013995687471289
13+
1992,0.7508126614055305,0.4740650672076754,0.5983592657979544
14+
1995,0.7569492388110274,0.4896552355840001,0.5969779516717039
15+
1998,0.7603291991801172,0.49117441585168525,0.5774462841723346
16+
2001,0.781611875050703,0.523909299468113,0.6042739644967232
17+
2004,0.7700355469522372,0.48843503839032354,0.5981432201792916
18+
2007,0.782141377648698,0.5197156312086207,0.6263452195753227
19+
2010,0.825082529519342,0.5195972120145641,0.6453653328291843
20+
2013,0.8227698931835299,0.5314001749843426,0.6498682917772886
21+
2016,0.8342975903562537,0.55414000689009,0.6706846793375292
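The old and new rows of this CSV differ only in their trailing digits. The sketch below checks this for the 1950 row, using values copied from the diff above. One plausible cause, stated here as an assumption rather than something the commit confirms, is floating-point rounding: the notebook shuffles the observations, so the order of summation changes between runs.

```python
import numpy as np

# 1950 row before and after the commit (n_wealth, t_income, l_income)
old_1950 = np.array([0.8257332034366338, 0.44248654139458626, 0.5342948198773412])
new_1950 = np.array([0.8257332034366366, 0.44248654139458743, 0.534294819877344])

# Largest absolute change across the three measures
max_abs_diff = np.max(np.abs(old_1950 - new_1950))
print(max_abs_diff)

# The two rows agree to well within a tolerance of 1e-12
same_up_to_rounding = np.allclose(old_1950, new_1950, rtol=0, atol=1e-12)
print(same_up_to_rounding)
```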

lectures/inequality.md

Lines changed: 94 additions & 48 deletions
@@ -247,7 +247,7 @@ The following code block imports a subset of the dataset `SCF_plus` for 2016,
247247
which is derived from the [Survey of Consumer Finances](https://en.wikipedia.org/wiki/Survey_of_Consumer_Finances) (SCF).
248248

249249
```{code-cell} ipython3
250-
url = 'https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/SCF_plus/SCF_plus_mini.csv'
250+
url = 'https://github.com/QuantEcon/high_dim_data/raw/main/SCF_plus/SCF_plus_mini.csv'
251251
df = pd.read_csv(url)
252252
df_income_wealth = df.dropna()
253253
```
@@ -435,6 +435,8 @@ Let's examine the Gini coefficient in some simulations.
435435

436436
The code below computes the Gini coefficient from a sample.
437437

438+
(code:gini-coefficient)=
439+
438440
```{code-cell} ipython3
439441
440442
def gini_coefficient(y):
@@ -481,6 +483,7 @@ You can check this by looking up the expression for the mean of a lognormal
481483
distribution.
482484

483485
```{code-cell} ipython3
486+
%%time
484487
k = 5
485488
σ_vals = np.linspace(0.2, 4, k)
486489
n = 2_000
@@ -616,51 +619,11 @@ We will use US data from the {ref}`Survey of Consumer Finances<data:survey-consu
616619
df_income_wealth.year.describe()
617620
```
618621

619-
This code can be used to compute this information over the full dataset.
622+
[This notebook](https://github.com/QuantEcon/lecture-python-intro/tree/main/lectures/_static/lecture_specific/inequality/data.ipynb) can be used to compute this information over the full dataset.
620623

621624
```{code-cell} ipython3
622-
:tags: [skip-execution, hide-input, hide-output]
623-
624-
!pip install quantecon
625-
import quantecon as qe
626-
627-
varlist = ['n_wealth', # net wealth
628-
't_income', # total income
629-
'l_income'] # labor income
630-
631-
df = df_income_wealth
632-
633-
# create lists to store Gini for each inequality measure
634-
results = {}
635-
636-
for var in varlist:
637-
    # create lists to store Gini
638-
    gini_yr = []
639-
    for year in years:
640-
        # repeat the observations according to their weights
641-
        counts = list(round(df[df['year'] == year]['weights'] ))
642-
        y = df[df['year'] == year][var].repeat(counts)
643-
        y = np.asarray(y)
644-
645-
        rd.shuffle(y) # shuffle the sequence
646-
647-
        # calculate and store Gini
648-
        gini = qe.gini_coefficient(y)
649-
        gini_yr.append(gini)
650-
651-
    results[var] = gini_yr
652-
653-
# Convert to DataFrame
654-
results = pd.DataFrame(results, index=years)
655-
results.to_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_label='year')
656-
```
657-
658-
However, to speed up execution we will import a pre-computed dataset from the lecture repository.
659-
660-
<!-- TODO: update from csv to github location -->
661-
662-
```{code-cell} ipython3
663-
ginis = pd.read_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_col='year')
625+
data_url = 'https://github.com/QuantEcon/lecture-python-intro/raw/main/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv'
626+
ginis = pd.read_csv(data_url, index_col='year')
664627
ginis.head(n=5)
665628
```
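As a small offline illustration of the `index_col='year'` argument used above, the sketch below reads two rows of the pre-computed CSV from an in-memory string instead of the GitHub URL. The rows are copied from the CSV diff in this commit; the use of `io.StringIO` is only a device for running the example without network access.

```python
import io

import pandas as pd

# Header and first two rows of the pre-computed Gini CSV
csv_text = """year,n_wealth,t_income,l_income
1950,0.8257332034366366,0.44248654139458743,0.534294819877344
1953,0.805948758659935,0.4264544060935942,0.5158978980963682
"""

# index_col='year' turns the year column into the row index
ginis = pd.read_csv(io.StringIO(csv_text), index_col='year')

print(ginis.index.name)
print(ginis.loc[1953, 't_income'])
```

With the year as the index, a single year's Gini coefficients can be selected with `ginis.loc[year]`.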
666629

@@ -687,10 +650,6 @@ One possibility is that this change is mainly driven by technology.
687650

688651
However, we will see below that not all advanced economies experienced similar growth of inequality.
689652

690-
691-
692-
693-
694653
### Cross-country comparisons of income inequality
695654

696655
Earlier in this lecture we used `wbgapi` to get Gini data across many countries
@@ -1093,3 +1052,90 @@ plt.show()
10931052

10941053
```{solution-end}
10951054
```
1055+
1056+
```{exercise}
1057+
:label: inequality_ex3
1058+
1059+
The {ref}`code to compute the Gini coefficient is listed in the lecture above <code:gini-coefficient>`.
1060+
1061+
This code uses loops to calculate the coefficient based on income or wealth data.
1062+
1063+
This function can be rewritten using vectorization, which greatly improves computational efficiency in Python.
1064+
1065+
Re-write the function `gini_coefficient` using `numpy` and vectorized code.
1066+
1067+
You can compare the output of this new function with the one above, and note the speed differences.
1068+
```
1069+
1070+
```{solution-start} inequality_ex3
1071+
:class: dropdown
1072+
```
1073+
1074+
Let's take a look at some raw data for the US that is stored in `df_income_wealth`.
1075+
1076+
```{code-cell} ipython3
1077+
df_income_wealth.describe()
1078+
```
1079+
1080+
```{code-cell} ipython3
1081+
df_income_wealth.head(n=4)
1082+
```
1083+
1084+
We will focus on the wealth variable `n_wealth` to compute a Gini coefficient for the year 2016.
1085+
1086+
```{code-cell} ipython3
1087+
data = df_income_wealth[df_income_wealth.year == 2016].sample(3000, random_state=1)
1088+
```
1089+
1090+
```{code-cell} ipython3
1091+
data.head(n=2)
1092+
```
1093+
1094+
We can first compute the Gini coefficient using the function defined in the lecture above.
1095+
1096+
```{code-cell} ipython3
1097+
gini_coefficient(data.n_wealth.values)
1098+
```
1099+
1100+
Now we can write a vectorized version using `numpy`.
1101+
1102+
```{code-cell} ipython3
1103+
def gini(y):
1104+
    n = len(y)
1105+
    y_1 = np.reshape(y, (n, 1))
1106+
    y_2 = np.reshape(y, (1, n))
1107+
    g_sum = np.sum(np.abs(y_1 - y_2))
1108+
    return g_sum / (2 * n * np.sum(y))
1109+
```
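The vectorized function above builds an `n × n` array of pairwise differences, which for `n = 3000` and 64-bit floats takes about 72 MB. As a sketch of one common alternative (not part of this commit), the Gini coefficient can also be computed from the sorted sample, which needs only `O(n)` memory. The names `gini_sorted` and `gini_pairwise` are hypothetical and used only for this illustration.

```python
import numpy as np


def gini_pairwise(y):
    """Pairwise (n x n) version, mirroring the vectorized solution."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    g_sum = np.sum(np.abs(y[:, None] - y[None, :]))
    return g_sum / (2 * n * np.sum(y))


def gini_sorted(y):
    """Sort-based version: sum_i (2i - n - 1) y_(i) / (n * sum(y))."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    i = np.arange(1, n + 1)  # ranks 1, ..., n of the sorted values
    return np.sum((2 * i - n - 1) * y) / (n * np.sum(y))


rng = np.random.default_rng(0)
sample = rng.lognormal(size=500)

# The two formulas agree up to floating-point rounding
print(gini_pairwise(sample), gini_sorted(sample))
```

The identity behind `gini_sorted` is that the sum of all pairwise absolute differences equals twice the rank-weighted sum of the sorted values.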
1110+
```{code-cell} ipython3
1111+
gini(data.n_wealth.values)
1112+
```
1113+
Let's simulate five populations by drawing from a lognormal distribution as before.
1114+
1115+
```{code-cell} ipython3
1116+
k = 5
1117+
σ_vals = np.linspace(0.2, 4, k)
1118+
n = 2_000
1119+
σ_vals = σ_vals.reshape((k,1))
1120+
μ_vals = -σ_vals**2/2
1121+
y_vals = np.exp(μ_vals + σ_vals*np.random.randn(n))
1122+
```
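The choice `μ = -σ²/2` keeps the mean of each lognormal population at one, since the mean of a lognormal distribution is `exp(μ + σ²/2)`. A minimal sketch checking this by simulation follows; the seed, the sample size, and the particular `σ` values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1234)
n = 200_000  # large sample so the sample mean is close to the true mean

means = {}
for σ in (0.2, 0.5, 1.0):
    μ = -σ**2 / 2                      # sets exp(μ + σ²/2) = 1
    y = np.exp(μ + σ * rng.standard_normal(n))
    means[σ] = y.mean()

for σ, m in means.items():
    print(f"σ = {σ}: sample mean = {m:.4f}")
```

For larger `σ` the distribution has a much heavier right tail, so the sample mean converges to one more slowly.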
1123+
We can compute the Gini coefficients for these five populations using the vectorized function. The computation time is shown below:
1124+
1125+
```{code-cell} ipython3
1126+
%%time
1127+
gini_coefficients = []
1128+
for i in range(k):
1129+
    gini_coefficients.append(gini(y_vals[i]))
1130+
```
1131+
Comparing this with the timing of the loop-based computation in the main text shows that the vectorized function is much faster.
1132+
This gives us the Gini coefficients for these five populations.
1133+
1134+
```{code-cell} ipython3
1135+
gini_coefficients
1136+
```
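To compare the two approaches outside a notebook, where `%%time` is not available, one option is `time.perf_counter`. The sketch below is illustrative only: the body of the lecture's loop-based `gini_coefficient` is not shown in this diff, so `gini_loop` is a hypothetical stand-in that uses the same formula as the vectorized version.

```python
import time

import numpy as np


def gini_loop(y):
    """Double-loop version (hypothetical stand-in for the lecture's function)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(y[i] - y[j])
    return total / (2 * n * np.sum(y))


def gini_vectorized(y):
    """Vectorized version, following the solution above."""
    n = len(y)
    y_1 = np.reshape(y, (n, 1))
    y_2 = np.reshape(y, (1, n))
    return np.sum(np.abs(y_1 - y_2)) / (2 * n * np.sum(y))


rng = np.random.default_rng(0)
y = rng.lognormal(size=400)

t0 = time.perf_counter()
g_loop = gini_loop(y)
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
g_vec = gini_vectorized(y)
t_vec = time.perf_counter() - t0

print(f"loop:       {g_loop:.6f} in {t_loop:.4f}s")
print(f"vectorized: {g_vec:.6f} in {t_vec:.4f}s")
```

Timings vary by machine, so the printed values should be read as a rough comparison rather than a benchmark.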
1137+
```{solution-end}
1138+
```
1139+
1140+
1141+
