Machine Learning Model Training
| | Timestamp | Age | Gender | Country | state | self_employed | family_history | treatment | work_interfere | no_employees | ... | leave | mental_health_consequence | phys_health_consequence | coworkers | supervisor | mental_health_interview | phys_health_interview | mental_vs_physical | obs_consequence | comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-08-27 11:29:31 | 37 | Female | United States | IL | NaN | No | Yes | Often | 6-25 | ... | Somewhat easy | No | No | Some of them | Yes | No | Maybe | Yes | No | NaN |
| 1 | 2014-08-27 11:29:37 | 44 | M | United States | IN | NaN | No | No | Rarely | More than 1000 | ... | Don't know | Maybe | No | No | No | No | No | Don't know | No | NaN |
| 2 | 2014-08-27 11:29:44 | 32 | Male | Canada | NaN | NaN | No | No | Rarely | 6-25 | ... | Somewhat difficult | No | No | Yes | Yes | Yes | Yes | No | No | NaN |
| 3 | 2014-08-27 11:29:46 | 31 | Male | United Kingdom | NaN | NaN | Yes | Yes | Often | 26-100 | ... | Somewhat difficult | Yes | Yes | Some of them | No | Maybe | Maybe | No | Yes | NaN |
| 4 | 2014-08-27 11:30:22 | 31 | Male | United States | TX | NaN | No | No | Never | 100-500 | ... | Don't know | No | No | Some of them | Yes | Yes | Yes | Don't know | No | NaN |
5 rows × 27 columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1259 entries, 0 to 1258
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Timestamp 1259 non-null object
1 Age 1259 non-null int64
2 Gender 1259 non-null object
3 Country 1259 non-null object
4 state 744 non-null object
5 self_employed 1241 non-null object
6 family_history 1259 non-null object
7 treatment 1259 non-null object
8 work_interfere 995 non-null object
9 no_employees 1259 non-null object
10 remote_work 1259 non-null object
11 tech_company 1259 non-null object
12 benefits 1259 non-null object
13 care_options 1259 non-null object
14 wellness_program 1259 non-null object
15 seek_help 1259 non-null object
16 anonymity 1259 non-null object
17 leave 1259 non-null object
18 mental_health_consequence 1259 non-null object
19 phys_health_consequence 1259 non-null object
20 coworkers 1259 non-null object
21 supervisor 1259 non-null object
22 mental_health_interview 1259 non-null object
23 phys_health_interview 1259 non-null object
24 mental_vs_physical 1259 non-null object
25 obs_consequence 1259 non-null object
26 comments 164 non-null object
dtypes: int64(1), object(26)
memory usage: 265.7+ KB
comments 1095
state 515
work_interfere 264
self_employed 18
seek_help 0
obs_consequence 0
mental_vs_physical 0
phys_health_interview 0
mental_health_interview 0
supervisor 0
coworkers 0
phys_health_consequence 0
mental_health_consequence 0
leave 0
anonymity 0
Timestamp 0
wellness_program 0
Age 0
benefits 0
tech_company 0
remote_work 0
no_employees 0
treatment 0
family_history 0
Country 0
Gender 0
care_options 0
dtype: int64
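The per-column counts above look like the result of counting nulls and sorting in descending order. A minimal sketch of that step, assuming pandas and using a tiny made-up frame (the survey file itself is not reproduced here, and the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative data only; not the survey itself.
df = pd.DataFrame({
    "Age": [37, 44, 32, 31],
    "state": ["IL", "IN", np.nan, np.nan],
    "comments": [np.nan, np.nan, np.nan, "ok"],
})

# Count missing values per column, largest count first.
missing = df.isnull().sum().sort_values(ascending=False)
print(missing)
```

Columns with many missing values, such as `comments` and `state` in the listing above, are common candidates for dropping before modelling.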
['31-45', '21-30', '46-55', '0-20', '55-100']
Categories (5, object): ['0-20' < '21-30' < '31-45' < '46-55' < '55-100']
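The ordered categories above are what `pd.cut` produces when ages are binned with explicit labels. A possible sketch follows; the bin edges are an assumption inferred from the label names, and the ages are made up:

```python
import pandas as pd

# Made-up ages for illustration.
ages = pd.Series([37, 44, 32, 31, 18, 60], name="Age")

# Assumed bin edges, inferred from the labels shown in the output above.
bins = [0, 20, 30, 45, 55, 100]
labels = ["0-20", "21-30", "31-45", "46-55", "55-100"]

# Intervals are right-closed: (0, 20], (20, 30], ...; include_lowest keeps 0 in the first bin.
age_group = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)
print(age_group.unique())
```

With labels supplied, `pd.cut` returns an ordered categorical by default, which matches the `<` ordering printed above.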
['No' 'Yes']
['Often' 'Rarely' 'Never' 'Sometimes' "Don't know"]
['female' 'male' 'trans']
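The raw `Gender` column is free text (`Female`, `M`, `Male` in the first rows), while the cleaned column has three values. One way this kind of normalization could be done is sketched below. The lookup sets and the `normalize_gender` helper are hypothetical, and dropping unmatched entries is an assumption suggested only by the row count falling from 1259 to 1257:

```python
import pandas as pd

# Hypothetical lookup sets; the source does not list the actual spellings handled.
MALE = {"m", "male", "man"}
FEMALE = {"f", "female", "woman"}
TRANS = {"trans", "trans-female", "trans woman"}

def normalize_gender(value):
    """Map a free-text entry to 'male', 'female' or 'trans'; None if unrecognized."""
    key = str(value).strip().lower()
    if key in MALE:
        return "male"
    if key in FEMALE:
        return "female"
    if key in TRANS:
        return "trans"
    return None

# Made-up entries for illustration.
raw = pd.Series(["Female", "M", "Male", "Trans-female", "p"])

# Unrecognized entries become missing and are dropped (an assumption).
gender = raw.map(normalize_gender).dropna()
print(gender.unique())
```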
Gender.............................................................................................................
male -- 991
female -- 247
trans -- 19
self_employed.............................................................................................................
No -- 1113
Yes -- 144
family_history.............................................................................................................
No -- 767
Yes -- 490
treatment.............................................................................................................
Yes -- 635
No -- 622
work_interfere.............................................................................................................
Sometimes -- 465
Don't know -- 264
Never -- 213
Rarely -- 173
Often -- 142
no_employees.............................................................................................................
6-25 -- 290
26-100 -- 289
More than 1000 -- 282
100-500 -- 176
1-5 -- 160
500-1000 -- 60
remote_work.............................................................................................................
No -- 883
Yes -- 374
tech_company.............................................................................................................
Yes -- 1029
No -- 228
benefits.............................................................................................................
Yes -- 475
Don't know -- 408
No -- 374
care_options.............................................................................................................
No -- 501
Yes -- 442
Not sure -- 314
wellness_program.............................................................................................................
No -- 842
Yes -- 227
Don't know -- 188
seek_help.............................................................................................................
No -- 646
Don't know -- 363
Yes -- 248
anonymity.............................................................................................................
Don't know -- 819
Yes -- 373
No -- 65
leave.............................................................................................................
Don't know -- 563
Somewhat easy -- 266
Very easy -- 204
Somewhat difficult -- 126
Very difficult -- 98
mental_health_consequence.............................................................................................................
No -- 490
Maybe -- 477
Yes -- 290
phys_health_consequence.............................................................................................................
No -- 925
Maybe -- 273
Yes -- 59
coworkers.............................................................................................................
Some of them -- 774
No -- 260
Yes -- 223
supervisor.............................................................................................................
Yes -- 514
No -- 393
Some of them -- 350
mental_health_interview.............................................................................................................
No -- 1008
Maybe -- 207
Yes -- 42
phys_health_interview.............................................................................................................
Maybe -- 557
No -- 500
Yes -- 200
mental_vs_physical.............................................................................................................
Don't know -- 576
Yes -- 341
No -- 340
obs_consequence.............................................................................................................
No -- 1075
Yes -- 182
Age_Group.............................................................................................................
31-45 -- 622
21-30 -- 557
46-55 -- 42
0-20 -- 22
55-100 -- 14
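The listing above has the shape of a loop that prints each column name followed by its value counts. A minimal sketch, assuming pandas and a small made-up frame; the length of the dotted separator is arbitrary:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "treatment": ["Yes", "No", "Yes"],
    "remote_work": ["No", "No", "Yes"],
})

for col in df.columns:
    # Column name followed by a dotted separator, as in the listing above.
    print(col + "." * 20)
    # value_counts() sorts by frequency, most common value first.
    for value, count in df[col].value_counts().items():
        print(f"{value} -- {count}")
```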
| | Gender | self_employed | family_history | treatment | work_interfere | no_employees | remote_work | tech_company | benefits | care_options | ... | mental_health_consequence | phys_health_consequence | coworkers | supervisor | mental_health_interview | phys_health_interview | mental_vs_physical | obs_consequence | comments | Age_Group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | female | No | No | Yes | Often | 6-25 | No | Yes | Yes | Not sure | ... | No | No | Some of them | Yes | No | Maybe | Yes | No | NaN | 31-45 |
| 1 | male | No | No | No | Rarely | More than 1000 | No | No | Don't know | No | ... | Maybe | No | No | No | No | No | Don't know | No | NaN | 31-45 |
| 2 | male | No | No | No | Rarely | 6-25 | No | Yes | No | No | ... | No | No | Yes | Yes | Yes | Yes | No | No | NaN | 31-45 |
| 3 | male | No | Yes | Yes | Often | 26-100 | No | Yes | No | Yes | ... | Yes | Yes | Some of them | No | Maybe | Maybe | No | Yes | NaN | 31-45 |
| 4 | male | No | No | No | Never | 100-500 | Yes | Yes | Yes | No | ... | No | No | Some of them | Yes | Yes | Yes | Don't know | No | NaN | 31-45 |
5 rows × 24 columns
((1257, 23), (1257,))
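The shapes above (23 feature columns out of 24, one target value per row) are consistent with separating `treatment` as the target from the remaining columns. The source does not show how the categorical columns were encoded; the sketch below uses integer category codes purely as an assumption, on a small made-up frame:

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "Gender": ["female", "male", "male", "male"],
    "family_history": ["No", "No", "No", "Yes"],
    "treatment": ["Yes", "No", "No", "Yes"],
    "Age_Group": ["31-45"] * 4,
})

# Features: every column except the target, encoded as integer category codes (assumed encoding).
X = df.drop(columns="treatment").apply(lambda col: col.astype("category").cat.codes)

# Target: 1 if the respondent sought treatment, else 0.
y = (df["treatment"] == "Yes").astype(int)

print((X.shape, y.shape))
```

Integer codes impose an arbitrary order on unordered categories, so one-hot encoding may be preferable for linear models.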
GridSearchCV(cv=5, estimator=LogisticRegression(),
param_grid={'C': [0.1, 2, 5, 10, 15, 20],
'max_iter': [100, 200, 300], 'penalty': ['l1', 'l2']},
scoring='f1')
{'C': 2, 'max_iter': 300, 'penalty': 'l2'}
Accuracy: 79.04761904761905
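A search like the one printed above can be sketched as follows. Two points are assumptions rather than facts from the source. First, the default `lbfgs` solver of `LogisticRegression` does not support the `l1` penalty, so this sketch sets `solver="liblinear"` to make both penalties valid. Second, a 25% hold-out is used because the reported accuracy equals 249/315, and 315 is the test size a 25% split of 1257 rows would give. The data here is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the encoded survey features.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Same grid as in the output above.
param_grid = {
    "C": [0.1, 2, 5, 10, 15, 20],
    "max_iter": [100, 200, 300],
    "penalty": ["l1", "l2"],
}

# liblinear is an assumption: it supports both l1 and l2, unlike the default lbfgs.
search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid=param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)

best_params = search.best_params_
# predict() uses the best estimator, refit on the full training split.
accuracy = accuracy_score(y_test, search.predict(X_test)) * 100
print(best_params)
print(f"Accuracy: {accuracy}")
```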
GridSearchCV(cv=5, estimator=DecisionTreeClassifier(),
param_grid={'criterion': ['gini', 'entropy'],
'max_depth': [None, 5, 10, 15],
'min_samples_leaf': [1, 2, 3],
'min_samples_split': [2, 5, 10]})
{'criterion': 'entropy', 'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 2}
Accuracy: 80.63492063492063
GridSearchCV(cv=5, estimator=RandomForestClassifier(),
param_grid={'max_depth': [None, 5, 10, 15, 20],
'min_samples_leaf': [1, 2, 3],
'min_samples_split': [2, 5, 10],
'n_estimators': [100, 200]})
{'max_depth': 5, 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}
Accuracy: 80.63492063492063
GridSearchCV(cv=5, estimator=SVC(),
param_grid={'C': [0.1, 1, 10], 'gamma': ['scale', 'auto'],
'kernel': ['linear', 'rbf', 'sigmoid']})
{'C': 1, 'gamma': 'scale', 'kernel': 'linear'}
Accuracy : 80.0
GridSearchCV(cv=5, estimator=GradientBoostingClassifier(),
param_grid={'learning_rate': [0.1, 0.01, 0.001],
'max_depth': [3, 5, 7], 'min_samples_split': [1, 2, 4],
'n_estimators': [100, 200, 300]})
{'learning_rate': 0.01, 'max_depth': 3, 'min_samples_split': 2, 'n_estimators': 300}
Accuracy: 81.26984126984127
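One detail of the grid above is that `min_samples_split` includes the integer 1, which scikit-learn rejects (integer values must be at least 2). With the default `error_score` of `GridSearchCV`, such fits are scored as NaN with a warning rather than stopping the search, which would explain why the search still completes. A small check on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data for illustration.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

raised = False
try:
    # An integer min_samples_split below 2 is not a valid setting.
    GradientBoostingClassifier(min_samples_split=1, n_estimators=5).fit(X, y)
except ValueError as err:
    raised = True
    print(type(err).__name__)
```

Dropping 1 from the grid avoids the wasted fits and the accompanying warnings.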
Logistic Regression: 81.320922
Decision Tree Classifier: 75.370124
Random Forest Classifier: 82.382535
Support Vector Classifier: 52.347074
Gradient Boosting: 82.599734
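The five scores above differ from the hold-out accuracies reported earlier, so they presumably come from a separate evaluation. The source does not say which. One common choice is mean cross-validated accuracy, sketched here with default hyperparameters on synthetic data (both are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded survey features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=300),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
    "Random Forest Classifier": RandomForestClassifier(random_state=0),
    "Support Vector Classifier": SVC(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 folds, expressed as a percentage.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean() * 100
    print(f"{name}: {scores[name]:f}")
```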