Commit 6daf793

[inequality] Update exercise 3
Hi Matt @mmcky, I have updated exercise 3 of the inequality lecture using your code in #410 and added the simulation part below your solution. What do you think of this version of the solution? Best, Longye
1 parent a291267 commit 6daf793

1 file changed: 86 additions and 0 deletions

lectures/inequality.md
@@ -1093,3 +1093,89 @@ plt.show()

```{solution-end}
```

```{exercise}
:label: inequality_ex3

The {ref}`code to compute the Gini coefficient is listed in the lecture above <code:gini-coefficient>`.

This code uses loops to calculate the coefficient based on income or wealth data.

This function can be rewritten using vectorization, which greatly improves computational efficiency in Python.

Rewrite the function `gini_coefficient` using `numpy` and vectorized code.

You can compare the output of this new function with the one above and note the speed difference.
```

```{solution-start} inequality_ex3
:class: dropdown
```

Let's take a look at some raw data for the US that is stored in `df_income_wealth`.

```{code-cell} ipython3
df_income_wealth.describe()
```

```{code-cell} ipython3
df_income_wealth.head(n=4)
```
1125+
We will focus on wealth variable `n_wealth` to compute a Gini coefficient for the year 1990.
1126+
1127+
```{code-cell} ipython3
1128+
data = df_income_wealth[df_income_wealth.year == 2016]
1129+
```
1130+
1131+
```{code-cell} ipython3
1132+
data.head(n=2)
1133+
```

We can first compute the Gini coefficient using the function defined in the lecture above.

```{code-cell} ipython3
gini_coefficient(data.n_wealth.values)
```

Now we can write a vectorized version using `numpy`:

```{code-cell} ipython3
def gini(y):
    n = len(y)
    y_1 = np.reshape(y, (n, 1))          # column vector, shape (n, 1)
    y_2 = np.reshape(y, (1, n))          # row vector, shape (1, n)
    g_sum = np.sum(np.abs(y_1 - y_2))    # sum of |y_i - y_j| over all pairs
    return g_sum / (2 * n * np.sum(y))
```

```{code-cell} ipython3
gini(data.n_wealth.values)
```
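
To see why the reshaping in `gini` works, it may help to trace the broadcasting step on a tiny array. The sketch below is only an illustration: the three-element array `y` and the names `col`, `row`, and `abs_diff` are made up for this example and are not part of the lecture code.

```python
import numpy as np

# Hypothetical toy data, chosen so the result can be checked by hand
y = np.array([1.0, 2.0, 3.0])
n = len(y)

col = y.reshape(n, 1)    # shape (3, 1)
row = y.reshape(1, n)    # shape (1, 3)

# Broadcasting (3, 1) against (1, 3) yields every pairwise difference
abs_diff = np.abs(col - row)
print(abs_diff)
# [[0. 1. 2.]
#  [1. 0. 1.]
#  [2. 1. 0.]]

# Same formula as in gini: sum of |y_i - y_j| divided by 2 * n * sum(y)
g = abs_diff.sum() / (2 * n * y.sum())
print(g)    # 8 / 36, approximately 0.2222
```

The matrix is symmetric with zeros on the diagonal, so each unordered pair is counted twice, which is what the factor of 2 in the denominator accounts for.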

Let's simulate five populations by drawing from a lognormal distribution as before:

```{code-cell} ipython3
k = 5
σ_vals = np.linspace(0.2, 4, k)
n = 2_000
σ_vals = σ_vals.reshape((k, 1))    # column vector: one row per population
μ_vals = -σ_vals**2 / 2            # keeps the mean of each distribution at 1
y_vals = np.exp(μ_vals + σ_vals * np.random.randn(n))    # shape (k, n)
```
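
The simulation relies on broadcasting a `(k, 1)` column of parameters against a length-`n` vector of normal draws. The sketch below checks the resulting shape. It uses ASCII names (`sigma`, `mu`) in place of the Greek identifiers, and the second half, which draws independent shocks per population with a seeded generator, is an assumed variant rather than what the solution does.

```python
import numpy as np

k, n = 5, 2_000
sigma = np.linspace(0.2, 4, k).reshape((k, 1))   # shape (k, 1)
mu = -sigma**2 / 2                               # shape (k, 1)

# As in the solution: one vector of n draws, shared by all k rows
z = np.random.randn(n)                           # shape (n,)
y_vals = np.exp(mu + sigma * z)                  # broadcasts to (k, n)
print(y_vals.shape)                              # (5, 2000)

# Assumed variant: independent draws for each population, with a fixed seed
rng = np.random.default_rng(1234)
y_indep = np.exp(mu + sigma * rng.standard_normal((k, n)))
print(y_indep.shape)                             # (5, 2000)
```

Because `np.random.randn(n)` is one-dimensional, every row of `y_vals` is built from the same underlying draws and differs only through its `μ` and `σ`. Whether that matters depends on how the simulated populations are used.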

We can compute the Gini coefficient for these five populations using the vectorized function as follows:

```{code-cell} ipython3
gini_coefficients = []
for i in range(k):
    gini_coefficients.append(gini(y_vals[i]))
```

This gives us the Gini coefficients for these five populations.

```{code-cell} ipython3
gini_coefficients
```
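
The exercise also asks about speed. One rough way to compare is to time both approaches on the same sample, as sketched below. Here `gini_loop` is a hypothetical stand-in for the loop-based function in the lecture (its exact code is not shown in this diff), and the sample size of 500 and the seed are arbitrary choices.

```python
import time
import numpy as np

def gini_loop(y):
    # Assumed loop-based version: sums |y_i - y_j| over all ordered pairs
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(y[i] - y[j])
    return total / (2 * n * np.sum(y))

def gini_vec(y):
    # Same vectorized computation as gini in the solution
    n = len(y)
    diffs = np.abs(np.reshape(y, (n, 1)) - np.reshape(y, (1, n)))
    return np.sum(diffs) / (2 * n * np.sum(y))

rng = np.random.default_rng(0)
sample = np.exp(rng.standard_normal(500))   # lognormal sample

start = time.perf_counter()
g_loop = gini_loop(sample)
t_loop = time.perf_counter() - start

start = time.perf_counter()
g_vec = gini_vec(sample)
t_vec = time.perf_counter() - start

print(f"loop:       {g_loop:.6f} in {t_loop:.4f} s")
print(f"vectorized: {g_vec:.6f} in {t_vec:.4f} s")
```

Timings vary by machine, so none are quoted here. Note also that the vectorized version builds an `n × n` array, so its memory use grows quadratically with the sample size.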

```{solution-end}
```
