Introduction

When analyzing an NBA team’s season performance, most people look first at its win-loss record and at where the team ranks in its conference and division. To explain those results, however, we must consider what is known in basketball as “The Four Factors”. These four metrics are a team’s effective field goal percentage (EFG), free throw rate (FTR), turnovers per possession (TPP) and offensive rebounding percentage (ORP). Research has concluded that, measured for both a team and its opponents, they summarize a team’s overall offensive and defensive performance, which can help us predict whether the team being analyzed wins or loses a game.

Data Description

With the above in mind, in this assignment I analyze data from a .csv file containing the performance metrics of the NBA’s Miami Heat and of their opponents for each game of the 2010-2011 regular season (82 games). The file also records whether the Heat won or lost each game. Using these data, I intend to verify whether the metrics listed in the introduction do indeed serve as crucial indicators of whether the Miami Heat won or lost a game.

Exploratory Data Analysis (EDA)

We begin by reading the data into R and then calling attach(), which lets us refer to any column of the dataset by its name alone.

MHdata<-read.csv("MiamiHeat.csv")
attach(MHdata)
names(MHdata)

##  [1] "Game"         "Date"         "Location"     "Opp"         
##  [5] "Win"          "FG"           "FGA"          "EFG"         
##  [9] "FG3"          "FG3A"         "FT"           "FTA"         
## [13] "FTR"          "Rebounds"     "DR"           "OffReb"      
## [17] "ORP"          "Assists"      "Steals"       "Blocks"      
## [21] "Turnovers"    "TPP"          "Fouls"        "Points"      
## [25] "OppFG"        "OppFGA"       "DEFG"         "OppFG3"      
## [29] "OppFG3A"      "OppFT"        "OppFTA"       "DFTR"        
## [33] "OppOffReb"    "OppDR"        "OppRebounds"  "DORP"        
## [37] "OppAssists"   "OppSteals"    "OppBlocks"    "OppTurnovers"
## [41] "DTPP"         "OppFouls"     "Diff"         "OppPoints"

As mentioned in the introduction, the keys to success in a basketball game can be summarized by “The Four Factors”, where the subscript t denotes the team and o its opponent:

\[ \begin{aligned} EFG &= \frac{FGM + 0.5 \times 3PM}{FGA} \\ FTR &= \frac{FTM}{FGA} \\ TPP &= \frac{TO_t}{POSS_t} \\ ORP &= \frac{OREB_t}{OREB_t + DREB_o} \end{aligned} \]
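As a quick illustration, the formulas can be sketched as small R functions. The function names (efg, ftr, tpp, orp) and the box-score numbers are hypothetical and are not taken from MiamiHeat.csv. The possession count is passed in directly because the report does not say how POSS_t is estimated. The functions return proportions between 0 and 1; if the CSV happens to store these factors on a 0-100 scale, multiply by 100 before comparing.

```r
# Hypothetical helpers implementing the Four Factors formulas.
# Inputs are raw box-score counts for one game; outputs are proportions (0-1).

# Effective field goal percentage: a made three counts as 1.5 field goals
efg <- function(fgm, fg3m, fga) (fgm + 0.5 * fg3m) / fga

# Free throw rate: made free throws per field goal attempt
ftr <- function(ftm, fga) ftm / fga

# Turnovers per possession; the possession count is supplied by the caller
tpp <- function(to, poss) to / poss

# Offensive rebounding percentage: team offensive rebounds divided by
# team offensive rebounds plus opponent defensive rebounds
orp <- function(oreb_team, dreb_opp) oreb_team / (oreb_team + dreb_opp)

# Hypothetical single-game box score (illustration only, not from the data)
box <- list(fgm = 40, fg3m = 8, fga = 80, ftm = 20,
            to = 12, poss = 92, oreb = 10, opp_dreb = 30)

factors <- c(
  EFG = efg(box$fgm, box$fg3m, box$fga),   # (40 + 4) / 80
  FTR = ftr(box$ftm, box$fga),             # 20 / 80
  TPP = tpp(box$to, box$poss),             # 12 / 92
  ORP = orp(box$oreb, box$opp_dreb)        # 10 / 40
)
print(round(factors, 3))
```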

With these formulas in mind, let’s fit a linear model of “Win” on the offensive Four Factors (EFG, FTR, TPP, ORP) and their defensive counterparts (DEFG, DFTR, DTPP, DORP), which measure the same factors for the opponent:

MHdata.lr<-lm(Win~EFG+FTR+TPP+ORP+DEFG+DFTR+DTPP+DORP, MHdata)
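Written out, the model fitted by lm() is the following, where i indexes the 82 games and ε_i is an error term. Because Win is coded 0 or 1, this is a linear probability model: a fitted value can be read roughly as an estimated chance of winning, although nothing constrains it to lie between 0 and 1.

\[ \begin{aligned} Win_i = \beta_0 &+ \beta_1 EFG_i + \beta_2 FTR_i + \beta_3 TPP_i + \beta_4 ORP_i \\ &+ \beta_5 DEFG_i + \beta_6 DFTR_i + \beta_7 DTPP_i + \beta_8 DORP_i + \varepsilon_i \end{aligned} \]

The overall F-test reported by summary() checks the null hypothesis \( H_0: \beta_1 = \beta_2 = \dots = \beta_8 = 0 \).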

Inference (hypothesis test, regression)

Now, let’s use summary() to check whether these factors are indeed indicators of an NBA team’s game outcome.

summary(MHdata.lr)

## 
## Call:
## lm(formula = Win ~ EFG + FTR + TPP + ORP + DEFG + DFTR + DTPP + 
##     DORP, data = MHdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.62507 -0.20060 -0.00156  0.22559  0.53717 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.562436   0.681544   0.825  0.41193    
## EFG          0.034768   0.005851   5.943 8.83e-08 ***
## FTR          0.011826   0.003507   3.372  0.00119 ** 
## TPP         -0.025934   0.009562  -2.712  0.00833 ** 
## ORP          0.015724   0.004924   3.193  0.00208 ** 
## DEFG        -0.024623   0.005771  -4.267 5.86e-05 ***
## DFTR        -0.007630   0.004535  -1.683  0.09673 .  
## DTPP         0.029401   0.011464   2.565  0.01238 *  
## DORP        -0.019203   0.007982  -2.406  0.01867 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2824 on 73 degrees of freedom
## Multiple R-squared:  0.6571, Adjusted R-squared:  0.6195 
## F-statistic: 17.49 on 8 and 73 DF,  p-value: 3.042e-14

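The overall p-value reported by summary() is 3.042e-14, far below 0.05. This is strong evidence against the null hypothesis that all eight coefficients are zero, i.e., that “The Four Factors” have no relationship with whether the team won or lost. Individually, seven of the eight coefficients are significant at the 0.05 level; only DFTR (p = 0.0967) is not. The signs are also as expected: a higher EFG, FTR and ORP and a lower TPP for the Heat raise the predicted value of Win, while each of the opponent’s factors works in the opposite direction.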
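In addition, the residual standard error is 0.2824. It estimates the typical deviation of the observed response (Win, coded 0 or 1) from the fitted regression line, and is computed as the square root of the residual sum of squares divided by the 73 residual degrees of freedom.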
Another aspect we can base our analysis on is the F-statistic, which tests whether there is any relationship between the response and the predictors (H0 : There is no relationship between Wins and The Four Factors). If H0 is true, the F-statistic is expected to be close to 1, so values well above 1 are evidence against it. How much larger than 1 it needs to be depends on both the number of observations and the number of predictors. In our case the F-statistic is 17.49 on 8 and 73 degrees of freedom, far larger than 1, which is why its p-value is so small.
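The headline numbers in the summary can be cross-checked by hand. The sketch below uses only figures printed in the summary output: it recomputes the F-statistic and adjusted R-squared from the multiple R-squared, finds the 5% critical value of F on 8 and 73 degrees of freedom, and recomputes the p-values of the overall F-test and of the DFTR coefficient. The sample size n = 82 is inferred from the 73 residual degrees of freedom plus the 9 estimated coefficients; the variable names are illustrative.

```r
# Figures copied from the summary(MHdata.lr) output
r2 <- 0.6571              # multiple R-squared
n  <- 82                  # games (inferred: 73 residual df + 9 coefficients)
p  <- 8                   # predictors: four offensive + four defensive factors
df_resid <- n - p - 1     # 73 residual degrees of freedom

# Overall F-statistic expressed through R-squared
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

# Adjusted R-squared penalizes the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Critical F at the 5% level, and the p-value of the reported F = 17.49
f_crit  <- qf(0.95, df1 = p, df2 = df_resid)
p_value <- pf(17.49, df1 = p, df2 = df_resid, lower.tail = FALSE)

# Two-sided t-test for a single coefficient (DFTR): estimate / std. error
t_dftr <- -0.007630 / 0.004535
p_dftr <- 2 * pt(-abs(t_dftr), df = df_resid)

print(c(F = f_stat, adjR2 = adj_r2, F_crit = f_crit,
        p_F = p_value, t_DFTR = t_dftr, p_DFTR = p_dftr))
```

Small differences from the printed summary come from rounding in that output. The key point is that the critical value is around 2, so an F of 17.49 clears it by a wide margin.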

Discussion

As a final check, we can generate a Q-Q (quantile-quantile) plot, a graphical method that compares the ordered standardized residuals with the quantiles we would expect if the errors were normally distributed. If the residuals are normally distributed, the points should fall roughly along a diagonal line. As we can confirm from the plot, the residuals are very well behaved, with only games 8, 9 and 38 having larger residuals than expected; these are also the three points that plot() labels by default, since it marks the three largest absolute residuals.

plot(MHdata.lr,which=2)
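To make explicit what the Q-Q plot displays, the sketch below computes its coordinates for any fitted lm model: the sorted standardized residuals paired with standard normal quantiles, plus a flag for the id.n points with the largest absolute standardized residuals. qq_coords is a hypothetical helper, and the labeling rule is our understanding of plot.lm’s default behavior rather than a reproduction of its source.

```r
# Hypothetical helper: coordinates behind plot(fit, which = 2)
qq_coords <- function(fit, id.n = 3) {
  r   <- rstandard(fit)      # standardized residuals, named by row
  ord <- order(r)            # ascending order of residuals
  pts <- data.frame(
    obs         = names(r)[ord],                # row label (game number here)
    theoretical = qnorm(ppoints(length(r))),    # expected normal quantiles
    observed    = unname(r[ord]),               # sorted standardized residuals
    stringsAsFactors = FALSE
  )
  # Flag the id.n most extreme residuals, the points plot() labels by default
  extreme <- names(r)[order(abs(r), decreasing = TRUE)[seq_len(id.n)]]
  pts$labelled <- pts$obs %in% extreme
  pts
}
```

With the report’s model, qq_coords(MHdata.lr) would return one row per game, and subset(qq_coords(MHdata.lr), labelled) should list the same three games that appear labeled in the plot.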

Conclusion

As these analyses demonstrate, we can say with reasonable confidence that the Four Factors are indeed related to winning or losing. However, let’s also try to predict whether the Heat won or lost each game based on the Four Factors. Similar to previous research, I established a threshold that differentiates between a win and a loss. In this example, based on the data, the threshold is as follows, where Y is the model’s fitted value for a game:

\[ \text{outcome} = \begin{cases} 1\ \text{(win)}, & \text{if } Y \geq 0.5 \\ 0\ \text{(loss)}, & \text{if } Y < 0.5 \end{cases} \]

W<-predict(MHdata.lr)
W

##           1           2           3           4           5           6 
##  0.13269009  0.78132622  0.95123613  1.30552487  1.42991846  0.25140707 
##           7           8           9          10          11          12 
##  0.82616845  0.62507351  0.52359223  0.67270045  1.08612284  0.50900481 
##          13          14          15          16          17          18 
##  0.37440558  0.11820039  0.07903593  0.63322954  0.32236433  0.85766581 
##          19          20          21          22          23          24 
##  1.17671705  1.20109335  0.79246505  0.74260814  1.10038613  1.12185162 
##          25          26          27          28          29          30 
##  1.22360394  0.74669935  0.85471420  0.89322030  0.56740869  0.25398047 
##          31          32          33          34          35          36 
##  0.69653393  0.73396158  1.00886933  0.71526181  0.98459868  1.08788289 
##          37          38          39          40          41          42 
##  0.80914410  0.46282776  0.69663953  0.42968176 -0.17096526  0.19913253 
##          43          44          45          46          47          48 
##  0.33728369  1.02536644  0.45818570  0.61723071  0.66298055  1.09670835 
##          49          50          51          52          53          54 
##  0.82244105  0.99424404  0.91460493  0.72885842  0.96939169  0.27773834 
##          55          56          57          58          59          60 
##  0.56606081  0.77047909  0.95177207  0.16775382  0.73226896  0.32528730 
##          61          62          63          64          65          66 
##  0.30743325 -0.16142810  0.42300456  0.35383120  0.64394860  1.41455770 
##          67          68          69          70          71          72 
##  1.20404909  0.17077875  1.12150279  0.67615083  0.74946765  1.03229808 
##          73          74          75          76          77          78 
##  0.90179323 -0.22789145  1.10748411  1.06604379  1.10264805  0.25052020 
##          79          80          81          82 
##  0.86055585  1.02492776  0.91420409  0.83748038

sum(W >= 0.5)

## [1] 59

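We can check this prediction against the actual outcomes: the Heat finished the 2010-2011 regular season with a 58-24 record, so the model’s 59 predicted wins is close to the 58 actual wins. Matching the season total does not by itself show that individual games were classified correctly, however; that requires comparing each prediction with the Win column.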
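A per-game check compares each thresholded prediction with the recorded outcome. The sketch below applies the 0.5 rule to fitted values and cross-tabulates predicted against actual results. classify_win and confusion are hypothetical helper names, not functions from the original analysis.

```r
# Hypothetical helper: apply the 0.5 threshold rule to fitted values
classify_win <- function(fitted, threshold = 0.5) {
  as.integer(fitted >= threshold)   # 1 = predicted win, 0 = predicted loss
}

# Hypothetical helper: confusion table of predicted vs. actual outcomes.
# Fixed factor levels keep both rows and columns even if one class is absent.
confusion <- function(fitted, actual, threshold = 0.5) {
  table(Predicted = factor(classify_win(fitted, threshold), levels = 0:1),
        Actual    = factor(actual, levels = 0:1))
}
```

With the report’s objects this would be confusion(W, MHdata$Win), and mean(classify_win(W) == MHdata$Win) gives the share of games classified correctly. We have not run these here, so no accuracy figure is reported.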



Javier Rojas

October 2, 2016