Date post: | 15-Feb-2017 |
Category: |
Documents |
Upload: | saumya-jain |
TEAM MEMBERS:
Anand Srinivasan, Saumya Jain, Sumit Kumar
FACULTY ADVISOR:
J. Michael (Mike) Boyle
ABOUT ESS
●Founded in 1977; second-largest operator of self storage in the US
●Headquarters: Salt Lake City
●1,147 properties: 54% wholly owned, 22% joint venture, 24% managed
●1,370 stores*
●37,700 unit types
●1,000,000+ customers
●570,000 calls YTD
●5,000,000 unique visitors
= MASSIVE DATA
*Totals assume the completion of the SmartStop acquisition
DIVERSIFIED PORTFOLIO
Map legend: Core Market, Secondary Market, No Presence
Properties by region (share of total)*:
●Northwest: 12 (1%)
●California: 262 (23%)
●Mtn West: 98 (9%)
●Texas: 100 (9%)
●Hawaii: 11 (1%)
●Midwest: 120 (10%)
●Northeast: 208 (18%)
●Mid-Atlantic: 119 (10%)
●Southeast: 98 (9%)
●Florida & P.R.: 119 (10%)
*As of June 30, 2015
ABOUT THE PROJECT
Dataset: Customer data provided in 3 chunks (Clickstream)
●20 GB (pipe-delimited flat file source)
●72 GB (total unfolded size)
Goal: To provide insights and recommendations from the data
●Customer Segmentation & Market Analysis
●Exploration and Predictive Analysis
●Conversion Strategy
Technology: SQL Server, Adobe Analytics, R and Tableau
CUSTOMER SEGMENTATION
MARKET ANALYSIS
●28% revenue from PPC, Emails, Social & Lead Gen (Year 2015)
●More promotions - more conversion
MARKET ANALYSIS
●June - July - August → Maximum Business
●Aggressive promotions and offers would acquire more customers
PROMOTION AND RESULTS
Promotions → Repeat Customers
DIRTY DATA..
Customer Dataset: nrow(customer_data) = 876381; 24 attributes (2 derived)
Rental Dataset: nrow(rental_data) = 1048575; 35 attributes
Joined on SALES_CUSTOMER_ID to generate 861114 records (82% of rental records)
● NA values for Gender = 74.89%
● NA values for Billing State = 59.43%
● AGE outliers = 8.7%
● Removing meaningless columns
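The join and the missing-value percentages above can be reproduced along the following lines. This is a minimal base R sketch on toy data, not the team's actual script: only the column name SALES_CUSTOMER_ID comes from the slides, while the other columns and all values are hypothetical.

```r
# Toy stand-ins for the two datasets (values are made up for illustration).
customer_data <- data.frame(
  SALES_CUSTOMER_ID = c(1, 2, 3, 4),
  GENDER = c("Female", NA, NA, "Male"),   # hypothetical column
  stringsAsFactors = FALSE
)
rental_data <- data.frame(
  SALES_CUSTOMER_ID = c(1, 2, 2, 5),
  UNIT_SIZE = c("10X10", "05X10", "10X20", "10X10"),  # hypothetical column
  stringsAsFactors = FALSE
)

# Inner join: keeps only rental rows whose customer id exists in customer_data.
combined <- merge(rental_data, customer_data, by = "SALES_CUSTOMER_ID")

# Share of rental records that found a matching customer, in percent.
match_rate <- 100 * nrow(combined) / nrow(rental_data)

# Percentage of missing values per customer column.
na_pct <- 100 * colMeans(is.na(customer_data))

print(match_rate)
print(na_pct)
```

With the figures on the slide, the same ratio is 861114 / 1048575, which is roughly 82%.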
POPULAR CUSTOMER
● Female
● 33 years old
● From Miami
● Or California?
● 10x10 Unit size
● No email preferences
● Opted out of email
● Not from Military
● No SMS preferences
● Not a Spanish speaker
OTHER FEATURES
● No move-in cost
● No vehicle stored
● No appointment
● Has reservation flag
● No promotion to move in
● Last Payment Amount=$ 145.90 (out of 97%)
● NON-Non-Climate Outside Normal
● No reservation deposit amount
● No active insurance
DEMOGRAPHIC TRENDS
SOME MORE TRENDS
AND SOME MORE..
CLASSIFICATION-WHY?
●To predict Gender, based on 23 predictors
●Process
●Models used: C5.0 and J48 decision trees, and a Naive Bayes classifier
●No Black box methods used for now!
●Best performance: Naive Bayesian model
RESULTS
> print(e)
Confusion Matrix and Statistics

          True
Prediction Female  Male
    Female  24402 18784
    Male    12371 15817

               Accuracy : 0.5635
                 95% CI : (0.5598, 0.5671)
    No Information Rate : 0.5152
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.1214
 Mcnemar's Test P-Value : < 2.2e-16

            Sensitivity : 0.6636
            Specificity : 0.4571
         Pos Pred Value : 0.5650
         Neg Pred Value : 0.5611
             Prevalence : 0.5152
         Detection Rate : 0.3419
   Detection Prevalence : 0.6051
      Balanced Accuracy : 0.5604

       'Positive' Class : Female

> mmetric(datTest1$GENDER, e1071predictions, c("ACC","PRECISION","TPR","F1"))
       ACC PRECISION1 PRECISION2       TPR1       TPR2        F11        F12
  14.15335   56.50442   56.11253   66.35847   45.71255   61.03628   50.38144
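The headline statistics in the confusion-matrix output can be recomputed directly from the four cell counts. The sketch below uses only base R and the counts shown on the slide, with Female as the positive class; the variable names are illustrative.

```r
# Confusion matrix from the slide: rows = prediction, columns = truth.
# matrix() fills column by column, so the first two values are the
# "True Female" column.
cm <- matrix(c(24402, 12371, 18784, 15817), nrow = 2,
             dimnames = list(Prediction = c("Female", "Male"),
                             True = c("Female", "Male")))

n <- sum(cm)
accuracy    <- sum(diag(cm)) / n
sensitivity <- cm["Female", "Female"] / sum(cm[, "Female"])  # true positive rate
specificity <- cm["Male", "Male"] / sum(cm[, "Male"])        # true negative rate
ppv         <- cm["Female", "Female"] / sum(cm["Female", ])  # Pos Pred Value
npv         <- cm["Male", "Male"] / sum(cm["Male", ])        # Neg Pred Value
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, ppv = ppv, npv = npv,
              balanced = balanced, kappa = kappa), 4))
```

A kappa near 0.12 indicates that the model agrees with the true labels only slightly more often than chance would, which is consistent with an accuracy just above the no-information rate.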
ATTRIBUTE USAGE
100.00% VEHICLE_STORED_IN_UNIT_0_1_FLAG
 95.56% MILITARY_BRANCH
 93.76% ATTRIBUTES
 58.50% SPANISH_SPEAKER_0_1_FLAG
 34.26% RESERVATION_0_1_FLAG
 26.68% UNIT_SIZE
 13.91% MOVE_IN_PROMOTION
  8.15% NSC_RATE_GIVEN_0_1_FLAG
  5.07% MOVE_IN_COST
  1.92% AUTO_PAY_ACTIVE_0_1_FLAG
  1.59% AGE_2
  0.99% INSURANCE_RATE
  0.75% INSURANCE_STATUS
  0.30% LAST_PAYMENT_AMOUNT
  0.12% SMS_PREFERENCES
REGRESSION-WHY?
●To predict Age based on other numerical attributes
●Correlation Matrix:
> cor(rdata5[c("AGE_2", "FUTURE_RATE", "LAST_PAYMENT_AMOUNT", "MOVE_IN_COST", "INSURANCE_RATE")])
                         AGE_2 FUTURE_RATE LAST_PAYMENT_AMOUNT MOVE_IN_COST INSURANCE_RATE
AGE_2               1.00000000   0.1680646          0.06427195   0.08050684     0.04321071
FUTURE_RATE         0.16806459   1.0000000          0.23240379   0.43648508     0.30766990
LAST_PAYMENT_AMOUNT 0.06427195   0.2324038          1.00000000   0.12739035     0.07853165
MOVE_IN_COST        0.08050684   0.4364851          0.12739035   1.00000000     0.17410110
INSURANCE_RATE      0.04321071   0.3076699          0.07853165   0.17410110     1.00000000
SCATTERPLOT MATRIX
●Visualizing relationships among features
BEST RESULTS
Linear model with a combination of attributes
> summary(rdata_lm_model_50_2)

Call:
lm(formula = AGE_2 ~ ., data = train_50)

Residuals:
     Min       1Q   Median       3Q      Max
-20.9061  -1.2271   0.7103   1.7611   2.4229

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  2.159e+01  1.316e-01 164.053  < 2e-16 ***
FUTURE_RATE                  2.556e-03  5.194e-04   4.921 8.83e-07 ***
LAST_PAYMENT_AMOUNT         -9.940e-06  3.133e-05  -0.317    0.751
MOVE_IN_COST                -8.222e-04  1.021e-03  -0.805    0.421
INSURANCE_RATE               6.593e-04  5.149e-03   0.128    0.898
GENDER                       4.229e-02  5.938e-02   0.712    0.476
AGE_Square                   1.037e-02  2.254e-05 460.117  < 2e-16 ***
FUTURE_RATE_and_MOV_IN_COST  7.420e-07  3.745e-06   0.198    0.843
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.259 on 5904 degrees of freedom
Multiple R-squared: 0.9736, Adjusted R-squared: 0.9736
F-statistic: 3.111e+04 on 7 and 5904 DF, p-value: < 2.2e-16
PERFORMANCE
> mmetric(rdata_test1$AGE_2, rdata_pred1, c("MAE","RMSE","MAPE","RMSPE","RRSE","RAE","COR","R2"))
       MAE       RMSE       MAPE      RMSPE       RRSE        RAE        COR         R2
 1.7880071  2.2707159  4.6840372  0.6632362 16.3257144 15.3021630  0.9865839  0.9733477
> mmetric(rdata_test2$AGE_2, rdata_pred2, c("MAE","RMSE","MAPE","RMSPE","RRSE","RAE","COR","R2"))
       MAE       RMSE       MAPE      RMSPE       RRSE        RAE        COR         R2
 1.7737509  2.2892701  4.6727918  0.6757982 16.5092513 15.1725844  0.9862799  0.9727480
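For readers unfamiliar with the error measures reported above, the sketch below computes a few of them by hand in base R. The two vectors are hypothetical ages invented for illustration, not values from the project data, and the formulas are the standard textbook definitions rather than a copy of the package internals.

```r
# Hypothetical actual and predicted ages (illustration only).
actual    <- c(30, 45, 52, 38)
predicted <- c(32, 44, 50, 41)

err <- actual - predicted

mae  <- mean(abs(err))                  # mean absolute error
rmse <- sqrt(mean(err^2))               # root mean squared error
mape <- 100 * mean(abs(err / actual))   # mean absolute percentage error
r    <- cor(actual, predicted)          # Pearson correlation (COR)
r2   <- r^2                             # squared correlation

print(c(MAE = mae, RMSE = rmse, MAPE = mape, COR = r, R2 = r2))
```

The R2 values on the slide are consistent with the squared correlation: 0.9865839 squared is approximately 0.9733.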
IMPROVEMENTS
●Better feature selection
●Data transformation of numerical attributes
●Different approaches for testing and training
●Black box methods
CLICKSTREAM ANALYSIS (2014-2015)
●26 million rows; 9,794 visitors made a query, of whom 9,764 purchased rentals
●Sales team conversion ratio = 9,764 / 9,794 = 99.69% (very good)
●Average time a purchaser stayed on the website: 297 seconds
●Average page views by a purchaser: 4 => Customer Experience
●Average page views: 1 for non-purchasers (NO) vs. 4 for purchasers (YES)
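The conversion ratio and the per-group page-view averages are simple aggregates. A minimal base R sketch follows: the two counts come from the slide, while the toy clickstream rows and the PURCHASED column name are hypothetical.

```r
# Counts taken from the slide.
queries   <- 9794
purchases <- 9764
conversion_pct <- 100 * purchases / queries
print(round(conversion_pct, 2))

# Toy clickstream rows (made up); PURCHASED is a hypothetical flag column.
clicks <- data.frame(
  PURCHASED = c("NO", "NO", "YES", "YES"),
  PAGEVIEWS = c(1, 1, 3, 5),
  stringsAsFactors = FALSE
)

# Mean page views per group (purchasers vs. non-purchasers).
avg_views <- tapply(clicks$PAGEVIEWS, clicks$PURCHASED, mean)
print(avg_views)
```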
POTENTIAL CUSTOMERS FROM NON PURCHASERS
POTENTIAL CUSTOMERS
2=>>>> 971042 <<<<=6
TOP LANDING PAGES AMONG PURCHASERS
TOP LANDING PAGE AMONG NON PURCHASERS
WHAT SHOULD BE OUR WINNING STRATEGY IN TERMS OF LANDING PAGE AND STORAGE RENTALS?
Factors influencing a User’s behaviour on website
AGE
LOCATION
GENDER
DURATION
ATTRIBUTION CHANNEL
TOP ATTRIBUTION CHANNEL LEADING TO PURCHASES
TOP ATTRIBUTION CHANNEL NON PURCHASERS ARE COMING THROUGH
RECOMMENDED LANDING PAGE
## : ATTRIBUTION_CHANNEL = DirectLoad:
## : :...visitmonth > 3:
## : :...visitmonth > 8:
## : : :...visitmonth <= 11: Home Page (11/4)
## : : : visitmonth > 11: Reserve or Hold (3/1)
## : : visitmonth <= 8:
## : : :...ESS_VISIT_NUMBER > 22:
## : : :...GENDER in {FALSE,Female,M}: Nil (0)
## : : : GENDER = Male: City Page (2/1)
## : : : GENDER = Nil:
## : : : :...ESS_VISIT_NUMBER <= 30: Nil (5/1)
## : : : ESS_VISIT_NUMBER > 30:
## : : : :...visitday <= 24: Home Page (4)
## : : : visitday > 24: Nil (2/1)
## : : ESS_VISIT_NUMBER <= 22:
## : : :...visitmonth > 7: Nil (2)
## : : visitmonth <= 7:
## : : :...GENDER = Female: Facility (1)
## : ATTRIBUTION_CHANNEL = Mobile DirectLoad:
## : :...visitmonth > 3:
## : :...visityear > 2014: Mobile - City Page (2/1)
## : : visityear <= 2014:
## : : :...visitmonth <= 8:
## : : :...ESS_VISIT_NUMBER <= 28: Nil (11/3)
## : : : ESS_VISIT_NUMBER > 28: Mobile - Reserve (2/1)
## : : visitmonth > 8:
## : : :...visithour <= 4: Login (3/2)
## : : visithour > 4: Mobile - Home Page (5/1)
## : visitmonth <= 3:
## : :...visithour <= 14:
## : :...ESS_VISIT_NUMBER <= 2:
## : : :...visithour > 10: Mobile - Home Page (18/6)
## : : : visithour <= 10:
## : : : :...visitday <= 9: Mobile - Reserve (2/1)
## : : : visitday > 9: Mobile - Facility Page (3)
TO PUT IT IN PLAIN WORDS…
If visitors come directly to the website from California, in January or February, and their visit number is above 22:
MALE : HOME PAGE
FEMALE : FACILITY PAGE
REVOLVE OTHER PAGES AROUND EXPECTED RENTALS
## Attribute usage:
##
## 100.00% GENDER
##  96.06% Age
##  84.34% HITS
##  77.29% ESS_VISIT_NUMBER
##  69.32% CLICKS
##  67.03% TOTAL_VISIT_TIME_SECONDS
##  64.84% visithour
##  62.91% visitmonth
##  36.81% visitday
##  34.34% PAGEVIEWS
##  12.45% visityear
REVOLVE OTHER PAGES AROUND EXPECTED RENTALS
## GENDER = Male:
## :...visitmonth <= 1:
## : :...Age > 63:
## : : :...visithour <= 12: 12X45 (3/2)
## : : : visithour > 12: 10X12 (3/1)
## : : Age <= 63:
## : : :...visitday <= 7: 10X30 (6/1)
## : : visitday > 7:
## : : :...visitday <= 15: 05X08 (3/2)
## : : visitday > 15: 05X10 (3)
## : visitmonth > 1:
## : :...CLICKS > 4:
## : :...visithour <= 13: 07X08 (3/2)
## : : visithour > 13:
## : : :...ESS_VISIT_NUMBER <= 2: 05X10 (3/1)
## : : ESS_VISIT_NUMBER > 2: 10X20 (3/1)
## GENDER = Female:
## :...HITS <= 8:
## : :...visitmonth > 1: 07X10 (2/1)
## : : visitmonth <= 1:
## : : :...visitday <= 15: 10X10 (3)
## : : visitday > 15: 07X14 (3/2)
## : HITS > 8:
## : :...HITS > 28:
## : :...CLICKS <= 5: 07X10 (2/1)
## : : CLICKS > 5:
## : : :...HITS > 47: 05X05 (2)
## : : HITS <= 47:
## : : :...Age <= 43: 10X09 (2/1)
## : : Age > 43: 10X10 (4/1)
## : HITS <= 28:
## : :...TOTAL_VISIT_TIME_SECONDS <= 187:
## : :...ESS_VISIT_NUMBER <= 1: 10X10 (2/1)
AGAIN..
●Females (8 or fewer hits, January visits) are more likely to rent a 10 X 10 unit in the first half of the month and a 7 X 14 unit in the second half
●Males over 63 (January visits) are more likely to rent a 12 X 45 unit when visiting at or before hour 12 and a 10 X 12 unit when visiting later in the day
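One way to read a tree branch is as nested conditions. The sketch below hand-transcribes only the `HITS <= 8` part of the Female branch shown earlier into a base R function; the function name is hypothetical, and deeper branches are deliberately left out, so it is an illustration of how to read the rules rather than the fitted model.

```r
# Hand transcription of the Female / HITS <= 8 branch of the unit-size tree.
# Returns NA for inputs that fall into branches not transcribed here.
recommend_unit_female <- function(hits, visitmonth, visitday) {
  if (hits > 8) {
    return(NA_character_)   # deeper branches omitted in this sketch
  }
  if (visitmonth > 1) {
    return("07X10")         # visits after January
  }
  if (visitday <= 15) {
    "10X10"                 # January, first half of the month
  } else {
    "07X14"                 # January, second half of the month
  }
}

print(recommend_unit_female(hits = 5, visitmonth = 1, visitday = 10))
print(recommend_unit_female(hits = 5, visitmonth = 1, visitday = 20))
```

The leaf counts in the tree are very small (for example `(3)` and `(3/2)`), so rules like these are best treated as hypotheses to test rather than firm conclusions.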
Other Recommendations
●Domain forwarding: acquire accidental traffic from mistyped domains (e.g. extaspace.com)
●Customer follow-up after an online reservation: send a text message when the customer misses a call
●Addition of ip2country.net to the Data Warehouse (Adobe Analytics)
●Apart from VOC, text and opinion mining of Twitter and Facebook data
●Invest more in customer sentiment analysis: consumeraffairs.com lists many unsatisfied customers
QUESTIONS?