Final Report:
SALES OF ORTHOPEDIC EQUIPMENT
The objective of this study is to find ways to increase sales of orthopedic products from our
company to all hospitals in the United States. Find the hospitals that have high consumption of
such equipment but where our sales are low, and come up with a selected group where you think
our efforts will be rewarded.
The following description of the dataset gives the variable names and some summaries of the
variables.
A file containing a shell SAS program that follows the analysis steps is provided at a separate link.
DATASET ORTHOPEDIC
VARIABLES:
ZIP : US POSTAL CODE
HID : HOSPITAL ID
CITY : CITY NAME
STATE : STATE NAME
BEDS : NUMBER OF HOSPITAL BEDS
RBEDS : NUMBER OF REHAB BEDS
OUT-V : NUMBER OF OUTPATIENT VISITS
ADM : ADMINISTRATIVE COST (in $1000s per year)
SIR : REVENUE FROM INPATIENTS
SALES : SALES OF REHAB. EQUIPMENT (in $1000s per year)
HIP : NUMBER OF HIP OPERATIONS
KNEE : NUMBER OF KNEE OPERATIONS
TH : TEACHING HOSPITAL 0, 1
TRAUMA : DO THEY HAVE A TRAUMA UNIT? 0, 1
REHAB : DO THEY HAVE A REHAB UNIT? 0, 1
HIP2 : NUMBER OF HIP OPERATIONS, YEAR 2
KNEE2 : NUMBER OF KNEE OPERATIONS, YEAR 2
FEMUR2 : NUMBER OF FEMUR OPERATIONS, YEAR 2
Overview of the Analysis
Part 1. Select your market segment(s).
1. Select cases:
Select a group of states for the study (about 2000-2500 hospitals, chosen at random or by
region, is enough). Set zero values of SALES to missing values.
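A minimal Python sketch of this step, for illustration only; the assignment itself does it in SAS. It assumes each hospital is a dict keyed by the variable names above, with None standing in for NA. The function name select_cases, the seed and the demo records are hypothetical.

```python
import random

def select_cases(hospitals, states=None, sample_size=None, seed=1):
    """Keep hospitals in the chosen states, optionally draw a random sample,
    and recode SALES == 0 as missing (None), i.e. SALES=0 => SALES=NA."""
    chosen = [h for h in hospitals if states is None or h["STATE"] in states]
    if sample_size is not None and sample_size < len(chosen):
        # Fixed seed so the random selection can be reproduced.
        chosen = random.Random(seed).sample(chosen, sample_size)
    result = []
    for h in chosen:
        h = dict(h)  # copy, so the caller's records are left unchanged
        if h["SALES"] == 0:
            h["SALES"] = None
        result.append(h)
    return result

# Usage with hypothetical records (one dict per hospital).
demo = [
    {"HID": 1, "STATE": "Ohio", "SALES": 0},
    {"HID": 2, "STATE": "Ohio", "SALES": 120.0},
    {"HID": 3, "STATE": "Texas", "SALES": 45.0},
]
ohio = select_cases(demo, states={"Ohio"})
```

Copying each record keeps the raw data intact, so the zero-to-missing recoding can be redone or checked later.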
2. Transformations:
Look at each variable individually and decide whether a transformation is needed and, if so,
which one. Candidates include log(1+c*x), where the constant c changes from variable to
variable (0.1, 0.01, 0.001, …), the square-root transformation, or any other suitable one.
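One way to make the "if and which" decision concrete is to compare candidates by how symmetric the transformed values are. The sketch below uses absolute moment skewness as the yardstick. That criterion is a hypothetical stand-in for inspecting the histogram, which you should still do. The names best_transform and skewness and the demo counts are illustrative, not part of the assignment.

```python
import math
import statistics

def skewness(xs):
    """Moment skewness: about 0 for a symmetric histogram, > 0 for a long right tail."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return 0.0 if sd == 0 else statistics.fmean(((x - m) / sd) ** 3 for x in xs)

def best_transform(xs, cs=(0.1, 0.01, 0.001)):
    """Compare no transformation, sqrt(x) and log(1 + c*x) for each c.
    Returns the name of the least skewed option and all scores.
    Missing values (None) are skipped."""
    vals = [x for x in xs if x is not None]
    options = {"none": lambda x: x, "sqrt": math.sqrt}
    for c in cs:
        # Default argument binds the current c (avoids late binding in the loop).
        options[f"log(1+{c}*x)"] = lambda x, c=c: math.log1p(c * x)
    scores = {name: abs(skewness([f(x) for x in vals])) for name, f in options.items()}
    return min(scores, key=scores.get), scores

# Usage with a hypothetical, strongly right-skewed count variable.
counts = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
choice, scores = best_transform(counts)
```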
3. Dimension reduction.
i) Separate the variables into the following groups:
Response: SALES, with SALES=0 recoded as SALES=NA.
An alternative approach, less important here: SALESCAT = 1 for 0 to the median,
2 for the median to the 80th percentile, 3 for the 80th to the 100th percentile.
Demographics: BEDS, RBEDS, OUTV, ADM, SIR, TH, TRAUMA, REHAB
Operation numbers: HIP, KNEE, HIP2, KNEE2, FEMUR2
Typical transformations are of the type below, though not necessarily exactly these, so try
several possibilities for each variable until its histogram looks acceptable, for example:
HIP = sqrt(HIP) or SALES = log(1+0.1*SALES)
ii) Use factor analysis to summarize the demographic variables and the operation variables,
and arrive at a final reduced list of factor variables (perhaps 3 or 4). Use the rotated
factors to find a good interpretation of each factor, and try to build a coherent story
around them.
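The sketch below illustrates what "rotated factors" means. It applies a raw varimax rotation to a hypothetical two-factor loading matrix, using the closed-form angle that exists when there are only two factors. It is a simplified illustration: it omits Kaiser normalization, and in the assignment PROC FACTOR (for example with ROTATE=VARIMAX) does the rotation for any number of factors. The function names and loadings are hypothetical.

```python
import math

def varimax_2(loadings):
    """Raw varimax rotation of a two-factor loading matrix.
    loadings: list of (l1, l2) rows, one per variable.
    Returns (rotated rows, rotation angle in radians)."""
    p = len(loadings)
    u = [a * a - b * b for a, b in loadings]
    v = [2 * a * b for a, b in loadings]
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = 2 * sum(ui * vi for ui, vi in zip(u, v))
    # Closed-form angle that maximizes the varimax criterion for two factors.
    phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
    c, s = math.cos(phi), math.sin(phi)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings], phi

def varimax_criterion(rows):
    """Sum over the two columns of the variance of the squared loadings."""
    total = 0.0
    for k in (0, 1):
        sq = [r[k] ** 2 for r in rows]
        mean = sum(sq) / len(sq)
        total += sum((q - mean) ** 2 for q in sq) / len(sq)
    return total

# Usage with hypothetical loadings: a clean two-factor structure (two "size"
# variables, two "operations" variables) hidden by a 30-degree rotation.
simple = [(0.9, 0.0), (0.8, 0.0), (0.0, 0.85), (0.0, 0.7)]
t = math.radians(30)
unrotated = [(a * math.cos(t) - b * math.sin(t), a * math.sin(t) + b * math.cos(t))
             for a, b in simple]
rotated, angle = varimax_2(unrotated)
```

Rotation leaves each variable's communality (its row sum of squared loadings) unchanged. It only redistributes that variance so that each variable loads mainly on one factor, which is what makes the factors easier to name.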
4. Market segmentation.
i) The independent variables are used to divide the list of hospitals (all possible
clients, i.e., the market) into subsets, which we call market segments or
clusters.
ii) Use cluster analysis to find the market segments or clusters. Since the variables are
now summarized by factors, cluster on the factor scores. One way to choose the number of
clusters is to move the data into R, cluster it with pam for several candidate numbers of
clusters, and use the silhouette function to compute the average silhouette width of each
clustering; choose the number of clusters with the largest average silhouette width. Then
move the cluster variable back to SAS if you prefer.
iii) Once the clusters are chosen, study the summary statistics for each cluster and
describe its content. Interpretation is very important at this stage. Make a boxplot of
SALES (or transformed SALES) vs. CLUSTER_NUMBER, identify the clusters with the highest
SALES, and focus on the top cluster or clusters.
iv) Finally, select the cluster or clusters that agree with our objectives.
These are clusters with high sales and good characteristics, such as a
high number of operations.
In this study you are looking for segments with high overall sales that also
contain hospitals where the company's sales are NA, i.e., hospitals that are not
yet our customers. Some segments will have mostly low sales. This means that
those hospitals have few patients who would need our products, so we are not
interested in them.
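The segmentation steps can be sketched end to end in Python. The sketch makes several simplifications:

- k_medoids is a plain alternating k-medoids with a deterministic start. It is not the BUILD/SWAP search that R's pam uses.
- mean_silhouette follows the usual silhouette definition, with s(i) = 0 for points in singleton clusters.
- target_clusters ranks clusters by median observed SALES (the center line of each box in the boxplot). Among the top clusters, it keeps those that still contain NA hospitals.

The code ignores the other characteristics, such as operation counts, that the text asks you to weigh. All names and the top=2 cut-off are hypothetical.

```python
import math
import statistics

def k_medoids(points, k, max_iter=100):
    """Simplified alternating k-medoids (a stand-in for PAM, not its BUILD/SWAP
    search). points are tuples of factor scores; returns one label per point."""
    n = len(points)

    def d(i, j):
        return math.dist(points[i], points[j])

    # Deterministic start: the most central point, then farthest-point picks.
    medoids = [min(range(n), key=lambda i: sum(d(i, j) for j in range(n)))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d(i, m) for m in medoids)))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: d(i, medoids[c])) for i in range(n)]
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            # New medoid: the member with the smallest total distance to its cluster.
            new.append(min(members, key=lambda i: sum(d(i, j) for j in members)))
        if new == medoids:
            break
        medoids = new
    return labels

def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b); needs at least two clusters."""
    n = len(points)
    groups = {c: [j for j in range(n) if labels[j] == c] for c in set(labels)}
    total = 0.0
    for i in range(n):
        own = [j for j in groups[labels[i]] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = statistics.fmean(math.dist(points[i], points[j]) for j in own)
        b = min(statistics.fmean(math.dist(points[i], points[j]) for j in g)
                for c, g in groups.items() if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n

def choose_k(points, ks=(2, 3, 4, 5, 6)):
    """Number of clusters with the largest average silhouette width."""
    scores = {k: mean_silhouette(points, k_medoids(points, k)) for k in ks}
    return max(scores, key=scores.get), scores

def target_clusters(labels, sales, top=2):
    """Rank clusters by median observed SALES; among the top ones keep those
    that still contain NA hospitals. Returns [(cluster, median, n_na), ...]."""
    rows = []
    for c in sorted(set(labels)):
        s = [x for lab, x in zip(labels, sales) if lab == c]
        observed = [x for x in s if x is not None]
        if observed:
            rows.append((c, statistics.median(observed), len(s) - len(observed)))
    rows.sort(key=lambda r: r[1], reverse=True)
    return [r for r in rows[:top] if r[2] > 0]
```

In use, choose_k(factor_scores) returns the candidate k with the best average silhouette width. Clustering with that k and passing the labels and SALES to target_clusters then lists the segments worth pursuing.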
Part 2. Estimating potential gain in sales.
The potential gain in sales is the difference between the average sales to similar hospitals
and current sales. If you are analyzing a very small cluster (N < 20), we can assume that
sales are homogeneous, so the “average sales to similar hospitals” is simply the average sale
in that cluster. If the cluster is larger, we need a regression estimate.
This is the procedure:
i) Do a regression for each of the selected segments. Because the segments are very
homogeneous, you may expect the R-square to be fairly low, SO DO NOT BE CONCERNED
WITH LOW R-SQUARES.
ii) Hospitals with large negative residuals are those whose sales are low even though their
characteristics suggest higher sales, i.e., they are below their potential sales (use the
predicted values as potential sales). Make a list of the hospitals in your segment where
sales can be improved.
iii) Give your estimate of the potential gains.
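The following is a minimal Python sketch of steps i) to iii) for one segment. It assumes a single predictor x (for example, one factor score) and the transformation SALES = log(1+0.1*SALES). With several factors you would fit a multiple regression instead (PROC REG in SAS, lm in R). The cut-off for a "large" negative residual (below minus one residual standard deviation) and all names are hypothetical choices. NA hospitals are credited with their full predicted sales, because their current sales are zero.

```python
import math
import statistics

def potential_gains(x, sales, c=0.1, z=-1.0):
    """Fit log(1 + c*SALES) = b0 + b1*x by least squares on hospitals with
    observed sales; return {index: gain in $1000s per year} for NA hospitals
    and for customers whose residual is below z residual standard deviations.
    Gain = back-transformed predicted sales minus current sales."""
    fit = [(xi, math.log1p(c * s)) for xi, s in zip(x, sales) if s is not None]
    xbar = statistics.fmean(xi for xi, _ in fit)
    ybar = statistics.fmean(yi for _, yi in fit)
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in fit)
          / sum((xi - xbar) ** 2 for xi, _ in fit))
    b0 = ybar - b1 * xbar
    sd = statistics.stdev([yi - (b0 + b1 * xi) for xi, yi in fit])
    gains = {}
    for i, (xi, s) in enumerate(zip(x, sales)):
        yhat = b0 + b1 * xi
        potential = math.expm1(yhat) / c  # invert SALES -> log(1 + c*SALES)
        if s is None:
            gains[i] = potential  # not yet a customer
        elif math.log1p(c * s) - yhat < z * sd:
            gains[i] = potential - s  # customer well below potential
    return gains

# Usage with hypothetical data: log(1 + 0.1*SALES) = 1 + 0.3*x exactly,
# except hospital 4 (far below the line) and hospital 8 (NA).
x = list(range(1, 10))
sales = [math.expm1(1 + 0.3 * xi) / 0.1 for xi in x]
sales[4], sales[8] = 1.0, None
gains = potential_gains(x, sales)
total_gain = sum(gains.values())  # estimated potential gain for the segment
```

Back-transforming a prediction made on the log scale estimates typical (median-like) sales rather than mean sales on the original scale. Treat the totals as rough estimates.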
EXTRA CREDIT: All the parts above must be performed using SAS. In addition, you could
compare the SAS results with a similar robust analysis in R, applying the methods for robust
clustering (pam) and for classification and regression trees (rpart).
PAM: Compare the clusters given by PAM with those from SAS. Are they similar?
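One way to answer "are they similar?" is to cross-tabulate the two cluster labelings and summarize the table with the adjusted Rand index. The index is 1 for identical partitions, whatever the cluster numbers, and near 0 for chance agreement. This is a swapped-in technique that the assignment does not prescribe, and the function name is hypothetical. In R, table() on the two label vectors gives the same cross-tabulation.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same hospitals:
    1.0 = identical partitions (labels may differ), about 0 = chance agreement."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))  # cross-tabulation of the labels
    sum_cells = sum(comb(v, 2) for v in cells.values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put every hospital in one cluster
    return (sum_cells - expected) / (max_index - expected)
```

Cluster numbers are arbitrary, so a SAS cluster 1 that matches PAM cluster 3 still counts as full agreement. Inspecting the cells counter shows which clusters correspond.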
RPART: Use the SALES variable defined earlier as the response. Run the tree method, select
one good node that has very high sales, find the hospitals in that node that have SALES=NA,
and estimate their potential sales gain.
Use the rpart package in R. As with lm, a fitted rpart model can be passed to predict() to
obtain predictions for new data.
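The core of an rpart regression tree is the split search: for a numeric response, choose the threshold that minimizes the total within-node sum of squares. The Python sketch below makes a single split on a single predictor. It then assigns each NA hospital in the higher-sales node that node's mean as its potential gain. rpart itself searches all predictors, splits recursively and prunes with the complexity parameter cp, so treat this only as an illustration. The names and data are hypothetical.

```python
def best_split(x, y):
    """Best single split 'x <= t' of a numeric predictor for a numeric response,
    minimizing the within-node sum of squared errors.
    Returns (threshold, left_mean, right_mean)."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(x, y))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal x values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            t = (pairs[k - 1][0] + pairs[k][0]) / 2  # midpoint threshold
            best = (cost, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("no valid split: all x values are equal")
    return best[1:]

def node_gains(x, sales):
    """Fit the split on hospitals with observed sales; give each NA hospital
    that falls in the higher-sales node that node's mean as its potential gain."""
    obs = [(xi, s) for xi, s in zip(x, sales) if s is not None]
    t, left_mean, right_mean = best_split([a for a, _ in obs], [b for _, b in obs])
    high_is_right = right_mean > left_mean
    high_mean = max(left_mean, right_mean)
    return {i: high_mean for i, (xi, s) in enumerate(zip(x, sales))
            if s is None and (xi > t) == high_is_right}
```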