2017-08-20 13:45:00+02:00
Why do people prefer steak cooked differently?
I stumbled across this dataset while browsing FiveThirtyEight's GitHub page. The data comes from a survey of 550 people that asks what types of risky behavior they engage in as well as how they prefer their steak cooked. There is an article on the results of the survey, along with a link to the data, here: https://fivethirtyeight.com/datalab/how-americans-like-their-steak/
# Import Useful libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Put the data into a pandas DataFrame
sns.set_style('whitegrid')
steak_df = pd.read_csv('../data/fivethirtyeight-data/steak-survey/steak-risk-survey.csv')
steak_df.head()
# The column names will be replaced with shorter ones
steak_df.columns = ['ID', 'Lottery A/B', 'Smoke','Drink','Gamble','Skydived','DriveOverSpeedLimit','CheatedSO',
'EatSteak','SteakPrefer','Gender','Age','Income','Education','Location']
# The first row does not represent a participant in the survey so we will drop it
steak_df.drop(0,inplace=True)
First, I wanted to see how the participants prefer their steak cooked, as a share of everyone who answered the question.
# Made a new dataframe that will be used to create a bar chart
prefer_pct = (steak_df['SteakPrefer'].value_counts() / steak_df['SteakPrefer'].dropna().count() * 100).astype(int)
prefer_pct = prefer_pct.to_frame('Prepared').reset_index()
prefer_pct.columns = ['Prepared', 'Percent Preferred']
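As an aside, the same kind of percentage table can be built with `value_counts(normalize=True)`, which divides by the number of non-missing answers automatically. The sketch below runs on a small made-up Series (the name `toy` and its values are hypothetical, not the survey data) and rounds instead of truncating, since `astype(int)` simply drops the decimal part.

```python
import pandas as pd

# Hypothetical answers for illustration only; not the survey data
toy = pd.Series(['Medium', 'Medium', 'Rare', 'Well', None,
                 'Medium', 'Rare', 'Medium rare'])

# normalize=True returns each answer's share of the non-missing answers
toy_pct = (toy.value_counts(normalize=True) * 100).round(1)

# Name the index and the value column in one step
toy_pct = toy_pct.rename_axis('Prepared').reset_index(name='Percent Preferred')
print(toy_pct)
```

Rounding keeps the percentages closer to a total of 100 than truncation does, which matters a little when the numbers are printed next to the bars.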
#Let's plot our data frame
fig = plt.figure(figsize=(6,3))
ax = sns.barplot(y='Prepared',x='Percent Preferred', data=prefer_pct,palette='OrRd')
ax.set(xlabel = 'Percent of Total',)
ax.set_title('How do you prefer your steak? (431 Responses)')
# This adds the number assigned to each bar in the plot
for i, v in enumerate(prefer_pct['Percent Preferred'].values):
    ax.text(v + 0.5, i + .10, str(v))
As we can see, most people prefer their steak to be between medium and medium rare. Steak that is cooked rare seems to be the least preferred option.
Now let's look at the different types of risky behavior.
# Convert the Yes/No answers into 1/0 so we can average them
# Note: anything other than 'Yes' (including a missing answer) becomes 0
def convert_to_num(val):
    if val == 'Yes':
        return 1
    else:
        return 0
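# Only the Yes/No columns are converted; Gender stays as text so the crosstab further down still works
yes_no_cols = ['Smoke','Drink','Gamble','Skydived','DriveOverSpeedLimit','CheatedSO','EatSteak']
for col in yes_no_cols:
    steak_df[col] = steak_df[col].apply(convert_to_num)
print("% Engage in Risky Behavior by Steak Preference")
# Select the columns explicitly; newer pandas versions raise an error when mean() meets text columns
(steak_df.groupby('SteakPrefer')[yes_no_cols].mean().drop('EatSteak',axis=1) * 100).astype(int)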
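One caveat with the conversion above: a missing answer is counted as a "No", which pulls the percentages down. A minimal sketch of an alternative, assuming we would rather skip missing answers, is to use `map` with a dictionary. The frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical answers for illustration only; not the survey data
toy = pd.DataFrame({
    'SteakPrefer': ['Rare', 'Rare', 'Medium', 'Medium'],
    'Smoke': ['Yes', None, 'No', 'Yes'],
})

# map() leaves anything not in the dictionary (including missing values) as NaN
toy['Smoke'] = toy['Smoke'].map({'Yes': 1, 'No': 0})

# mean() skips NaN, so the Rare group is averaged over its one real answer
smoke_pct = toy.groupby('SteakPrefer')['Smoke'].mean() * 100
print(smoke_pct)
```

Whether to skip or zero-fill missing answers is a judgment call; the point is only that the two choices can give different percentages.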
This table can be read as follows: 'Of those who prefer their steak cooked Medium, 15% smoke, 82% drink, etc.' It seems as though those who prefer their steak cooked Rare are more likely to engage in risky behavior. However, when we run the line below we find that only 23 people preferred rare. With so few respondents, that group's percentages are noisy and should be read with caution. Interestingly enough, we find that a smaller percentage of those who eat steak Medium Rare have cheated on a significant other.
steak_df.groupby('SteakPrefer')['ID'].count()
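To get a feel for how much a group of 23 can wobble, here is a rough sketch of a 95% Wilson score interval for a proportion. The counts passed in (7 of 23 and 70 of 230) are hypothetical and chosen only so both groups have the same observed share; `wilson_interval` is a helper written for this illustration, not part of the original analysis.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical counts: the same observed share in a small and a large group
small = wilson_interval(7, 23)
large = wilson_interval(70, 230)
print("n=23: ", small)
print("n=230:", large)
```

The interval for the group of 23 is much wider than the one for 230, which is why a high percentage in the Rare group is weak evidence on its own.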
Now I want to look at demographic information with the use of crosstabs to see if anything sticks out.
# AGE
(pd.crosstab(steak_df['SteakPrefer'],steak_df['Age'],margins=True,normalize=True) * 100).astype(int)
# GENDER
(pd.crosstab(steak_df['SteakPrefer'],steak_df['Gender'],margins=True,normalize=True) * 100).astype(int)
# INCOME
(pd.crosstab(steak_df['SteakPrefer'],steak_df['Income'],margins=True,normalize=True) * 100).astype(int)
# LOCATION
(pd.crosstab(steak_df['SteakPrefer'],steak_df['Location'],margins=True,normalize=True) * 100).astype(int)
No clear relationship between any single demographic factor and steak preference stands out from these crosstabs alone.
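Part of the difficulty may be that `normalize=True` divides every cell by the grand total, so each number mixes how large a group is with what that group prefers. A sketch of an alternative, using made-up respondents (the frame `toy` is hypothetical), is `normalize='columns'`, which shows the distribution of preferences within each group.

```python
import pandas as pd

# Hypothetical respondents for illustration only; not the survey data
toy = pd.DataFrame({
    'SteakPrefer': ['Rare', 'Medium', 'Medium', 'Well',
                    'Medium', 'Rare', 'Well', 'Medium'],
    'Gender': ['Male', 'Male', 'Female', 'Female',
               'Male', 'Male', 'Female', 'Female'],
})

# Each column sums to 100: the spread of preferences within each gender
by_gender = pd.crosstab(toy['SteakPrefer'], toy['Gender'],
                        normalize='columns') * 100
print(by_gender.round(1))
```

With column-wise percentages, two groups of very different sizes can be compared side by side; `normalize='index'` does the same thing across rows.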
Lastly, let's look at the results from the hypothetical lottery question. The question is this:
"Consider the following hypothetical situations: In Lottery A, you have a 50% chance of success, with a payout of $100. In Lottery B, you have a 90% chance of success, with a payout of $20. Assuming you have $10 to bet, would you play Lottery A or Lottery B?"
Picking Lottery A is considered the riskier choice.
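A quick calculation shows in what sense Lottery A is the riskier pick. This sketch takes the payouts of 100 and 20 from the question and treats each lottery as win-or-nothing; it ignores the stake, which the question does not fully spell out. `lottery_stats` is a helper written for this illustration.

```python
import math

def lottery_stats(p_win, payout):
    """Expected value and standard deviation of a win-or-nothing lottery."""
    expected = p_win * payout
    variance = p_win * (1 - p_win) * payout**2
    return expected, math.sqrt(variance)

ev_a, sd_a = lottery_stats(0.5, 100)   # Lottery A
ev_b, sd_b = lottery_stats(0.9, 20)    # Lottery B
print(f"A: expected {ev_a:.0f}, std dev {sd_a:.0f}")
print(f"B: expected {ev_b:.0f}, std dev {sd_b:.0f}")
```

Lottery A has the higher expected payout (50 versus 18) but a far larger spread (a standard deviation of 50 versus 6), so "riskier" here means more variable, not worse on average.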
(pd.crosstab(steak_df['SteakPrefer'],steak_df['Lottery A/B'],margins=True,normalize=True) * 100).astype(int)
Nada. Again, nothing sticks out as meaningful here. For now, it seems we don't know why people prefer their steaks cooked differently.