Churners in the Telecom Industry.
| | network_age | Aggregate_Total_Rev | Aggregate_SMS_Rev | Aggregate_Data_Rev | Aggregate_Data_Vol | Aggregate_Calls | Aggregate_ONNET_REV | Aggregate_OFFNET_REV | Aggregate_complaint_count | aug_user_type | sep_user_type | aug_fav_a | sep_fav_a | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1914 | 1592.7200 | 23.26 | 2.5 | 1.161130e+01 | 375 | 25523 | 99000 | 1 | 2G | 2G | telenor | mobilink | Churned |
1 | 2073 | 1404.1496 | 174.45 | 27.5 | 2.531725e+03 | 389 | 14584 | 77299 | 1 | 2G | 2G | mobilink | ufone | Churned |
2 | 3139 | 85.5504 | 14.34 | 5.0 | 2.913306e+04 | 15 | 477 | 4194 | 1 | Other | Other | ptcl | telenor | Churned |
3 | 139 | 2315.2292 | 19.25 | 52.5 | 2.674413e+05 | 636 | 50316 | 52400 | 2 | 2G | 2G | telenor | ufone | Active |
4 | 139 | 227.8620 | 2.95 | 42.5 | 1.461621e+06 | 17 | 2568 | 1701 | 1 | NaN | NaN | mobilink | ufone | Active |
(2000, 14)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 14 columns):
 #   Column                     Non-Null Count  Dtype
---  ------                     --------------  -----
 0   network_age                2000 non-null   int64
 1   Aggregate_Total_Rev        2000 non-null   float64
 2   Aggregate_SMS_Rev          2000 non-null   float64
 3   Aggregate_Data_Rev         2000 non-null   float64
 4   Aggregate_Data_Vol         2000 non-null   float64
 5   Aggregate_Calls            2000 non-null   int64
 6   Aggregate_ONNET_REV        2000 non-null   int64
 7   Aggregate_OFFNET_REV       2000 non-null   int64
 8   Aggregate_complaint_count  2000 non-null   int64
 9   aug_user_type              1755 non-null   object
 10  sep_user_type              1794 non-null   object
 11  aug_fav_a                  1999 non-null   object
 12  sep_fav_a                  1999 non-null   object
 13  Class                      2000 non-null   object
dtypes: float64(4), int64(5), object(5)
memory usage: 218.9+ KB
| | network_age | Aggregate_Total_Rev | Aggregate_SMS_Rev | Aggregate_Data_Rev | Aggregate_Data_Vol | Aggregate_Calls | Aggregate_ONNET_REV | Aggregate_OFFNET_REV | Aggregate_complaint_count |
---|---|---|---|---|---|---|---|---|---|
count | 2000.000000 | 2000.000000 | 2000.000000 | 2000.000000 | 2.000000e+03 | 2000.000000 | 2000.000000 | 2000.000000 | 2000.000000 |
mean | 1469.554500 | 905.020106 | 31.108605 | 58.806080 | 2.773961e+06 | 240.910500 | 7411.284500 | 16457.577500 | 1.924500 |
std | 1286.753291 | 1151.308507 | 57.908418 | 247.459279 | 8.845272e+06 | 369.922258 | 16494.392836 | 34311.972061 | 2.265693 |
min | -8.000000 | 4.910000 | 0.000000 | 0.000000 | 5.860000e-02 | 1.000000 | 0.000000 | 0.000000 | 1.000000 |
25% | 323.500000 | 247.149600 | 3.500000 | 1.250000 | 2.675567e+03 | 25.000000 | 114.000000 | 1432.000000 | 1.000000 |
50% | 1194.500000 | 606.575000 | 14.810000 | 13.750000 | 1.822864e+05 | 99.000000 | 1940.500000 | 5039.000000 | 1.000000 |
75% | 2247.250000 | 1220.045000 | 34.140000 | 53.750000 | 1.544505e+06 | 331.250000 | 7941.000000 | 15790.000000 | 2.000000 |
max | 5451.000000 | 24438.830000 | 873.980000 | 8295.000000 | 1.550312e+08 | 5727.000000 | 381174.000000 | 431440.000000 | 49.000000 |
Index(['network_age', 'Aggregate_Total_Rev', 'Aggregate_SMS_Rev', 'Aggregate_Data_Rev', 'Aggregate_Data_Vol', 'Aggregate_Calls', 'Aggregate_ONNET_REV', 'Aggregate_OFFNET_REV', 'Aggregate_complaint_count', 'aug_user_type', 'sep_user_type', 'aug_fav_a', 'sep_fav_a', 'Class'], dtype='object')
network_age                    0
Aggregate_Total_Rev            0
Aggregate_SMS_Rev              0
Aggregate_Data_Rev             0
Aggregate_Data_Vol             0
Aggregate_Calls                0
Aggregate_ONNET_REV            0
Aggregate_OFFNET_REV           0
Aggregate_complaint_count      0
aug_user_type                245
sep_user_type                206
aug_fav_a                      1
sep_fav_a                      1
Class                          0
dtype: int64
(1722, 14)
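The drop from (2000, 14) to (1722, 14) is consistent with rows containing missing values having been removed, although the notebook code itself is not shown. The following is a minimal sketch of that step under this assumption; the `toy` frame and its values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with gaps in the user-type columns,
# similar in kind to the missing entries reported above.
toy = pd.DataFrame({
    "network_age": [1914, 2073, 139, 143],
    "aug_user_type": ["2G", "2G", np.nan, "3G"],
    "sep_user_type": ["2G", "2G", np.nan, "3G"],
    "Class": ["Churned", "Churned", "Active", "Active"],
})

# Count missing values per column, as in the isnull().sum() output.
missing_per_column = toy.isnull().sum()

# Drop every row that has at least one missing value.
cleaned = toy.dropna()

print(missing_per_column)
print(cleaned.shape)
```

Note that `dropna()` keeps the original index labels, which would explain why the encoded table further down jumps from row 3 to row 5: row 4 had `NaN` in both user-type columns.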
(3G       941
2G       407
Other    374
Name: aug_user_type, dtype: int64,
3G       966
2G       382
Other    374
Name: sep_user_type, dtype: int64)
(ptcl        399
ufone       388
mobilink    248
telenor     242
zong        211
warid       183
0            51
Name: aug_fav_a, dtype: int64,
ufone       1059
ptcl         351
mobilink     123
telenor       67
warid         64
zong          58
Name: sep_fav_a, dtype: int64)
| | network_age | Aggregate_Total_Rev | Aggregate_SMS_Rev | Aggregate_Data_Rev | Aggregate_Data_Vol | Aggregate_Calls | Aggregate_ONNET_REV | Aggregate_OFFNET_REV | Aggregate_complaint_count | aug_user_type | sep_user_type | aug_fav_a | sep_fav_a | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1914 | 1592.7200 | 23.26 | 2.5 | 11.6113 | 375 | 25523 | 99000 | 1 | 0 | 0 | 3 | 0 | 1 |
1 | 2073 | 1404.1496 | 174.45 | 27.5 | 2531.7246 | 389 | 14584 | 77299 | 1 | 0 | 0 | 1 | 3 | 1 |
2 | 3139 | 85.5504 | 14.34 | 5.0 | 29133.0557 | 15 | 477 | 4194 | 1 | 2 | 2 | 2 | 2 | 1 |
3 | 139 | 2315.2292 | 19.25 | 52.5 | 267441.2813 | 636 | 50316 | 52400 | 2 | 0 | 0 | 3 | 3 | 0 |
5 | 143 | 973.9664 | 21.86 | 22.5 | 920871.0674 | 421 | 4032 | 15476 | 1 | 1 | 1 | 1 | 3 | 0 |
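The integer codes in the table above are consistent with a per-column alphabetical encoding (for example 2G → 0, Other → 2, Active → 0, Churned → 1), which is what scikit-learn's `LabelEncoder` produces. The sketch below assumes that encoder was used; the `toy` frame is hypothetical and only illustrates the mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame (hypothetical values) with two of the categorical columns.
toy = pd.DataFrame({
    "aug_user_type": ["2G", "3G", "Other", "2G"],
    "Class": ["Churned", "Active", "Churned", "Active"],
})

encoders = {}          # keep one fitted encoder per column for later decoding
encoded = toy.copy()
for col in ["aug_user_type", "Class"]:
    enc = LabelEncoder()
    # Classes are sorted, so codes follow alphabetical order within a column.
    encoded[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

print(encoded)
print(list(encoders["Class"].classes_))
```

Keeping the fitted encoders around makes it possible to map predictions back to the original labels with `inverse_transform`.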
LD_Analysis: 0.719713
LR: 0.728446
DT: 0.652867
KNN: 0.674733
NB: 0.673945
SVM: 0.719708
RF: 0.746562
Ada_Boost: 0.726991
GD_Boost: 0.750952
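A per-model score list like the one above is typically produced by cross-validating each candidate in a loop. The sketch below shows one way to do that. The synthetic data from `make_classification`, the fold count, the estimator settings, and the use of the default accuracy scorer are all assumptions, since the notebook's own code and metric are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 13 encoded features and the binary Class target.
X, y = make_classification(n_samples=200, n_features=13, random_state=0)

# Labels mirror the abbreviations in the output above; settings are illustrative.
models = {
    "LD_Analysis": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=20, random_state=0),
    "Ada_Boost": AdaBoostClassifier(n_estimators=20, random_state=0),
    "GD_Boost": GradientBoostingClassifier(n_estimators=20, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean score across folds (classifiers default to accuracy).
    scores[name] = cross_val_score(model, X, y, cv=3).mean()
    print(f"{name}: {scores[name]:.6f}")
```

The numbers this prints come from synthetic data and are not comparable to the scores reported for the churn dataset.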
0.7575161324447266
0.7509143610013175
Fitting 10 folds for each of 4 candidates, totalling 40 fits
CPU times: total: 2.94 s
Wall time: 7.94 s
RandomizedSearchCV(cv=10, estimator=RandomForestClassifier(n_jobs=-1), n_iter=4,
    param_distributions={'bootstrap': [True, False],
        'criterion': ['gini', 'entropy'],
        'max_depth': [3, 5, 10, 12, 15, 18],
        'max_features': [0.5, 1, 'log2', 'sqrt', 'auto'],
        'max_samples': [500, 750, 1000, 1200],
        'min_samples_leaf': array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
        'min_samples_split': array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
        'n_estimators': array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95])},
    verbose=True)
RandomForestClassifier(criterion='entropy', max_depth=5, max_features=1, max_samples=500, min_samples_leaf=2, min_samples_split=18, n_estimators=95, n_jobs=-1)
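The search above can be approximated with the sketch below, which samples four candidates from a similar parameter grid. Several things here are assumptions rather than the notebook's code: the synthetic data, the smaller `cv` and `max_samples` values (scaled down to fit the toy data), and the fixed `random_state`. The sketch also leaves out `'auto'` for `max_features`, which recent scikit-learn releases no longer accept, and keeps the default `bootstrap=True`, since `max_samples` only applies to bootstrapped forests.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the encoded churn features and target.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)

param_distributions = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 10, 12, 15, 18],
    "max_features": [0.5, 1, "log2", "sqrt"],
    "max_samples": [50, 100, 150],             # must not exceed the training fold size
    "min_samples_leaf": np.arange(1, 20),      # 1..19, as in the repr above
    "min_samples_split": np.arange(2, 20),     # 2..19
    "n_estimators": np.arange(10, 100, 5),     # 10, 15, ..., 95
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=4,          # number of sampled candidates, as in the original search
    cv=3,              # fewer folds than the original 10, to keep the toy run fast
    random_state=0,
)
search.fit(X, y)

best_model = search.best_estimator_   # refit on all data with the best parameters
print(search.best_params_)
print(search.best_score_)
```

With only four sampled candidates the search covers a tiny fraction of the grid, so the best estimator it finds can easily score below an untuned forest, as the final score of 0.7320 in the original output suggests.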
0.7320691843859093