1 Background and Motivation

“Hey Riley, what champ do you think I’d be good at?” I’ve received this question a lot from my friends who play League of Legends, and in the spirit of kindness (not exhaustion), I decided to build a champion recommender. Using Riot’s champion data, I built a K-means clustering model to help my fellow gamers choose a new character to pick up. Below is a step-by-step guide to building a basic recommendation system for champion selection. In the future I will use this schema to design a more rigorous model. At the end is a radar chart for comparing champions to each other and a link to the Shiny app, which allows live manipulation of the data!

2 The Data ETL

2.1 Grabbing the Data from Riot

Riot has a small JSON file with pretty up-to-date information on every Champion (or Character) in League of Legends.

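#Load the packages used throughout this post
library(jsonlite)  # fromJSON
library(dplyr)     # %>%
library(ggplot2)   # histograms and scatter plots
library(plotly)    # interactive plots
library(purrr)     # map_dbl

#Getting JSON from RIOT
getLoLChamps<- fromJSON("http://ddragon.leagueoflegends.com/cdn/13.1.1/data/en_US/champion.json")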

#Transforming JSON into usable data
champData <- getLoLChamps$data

#Refine ChampData so it grabs all their stats easily
champPri <- data.frame()

for(i in seq_along(champData)){
  #Extract List
  current_list<- champData[[i]]
  current_df <- data.frame(current_list$name,current_list$info$attack,current_list$info$defense,current_list$info$magic,current_list$info$difficulty)
  
  #Append Data 
  champPri <- rbind(champPri, current_df)
}

#Change Column Names 

colnames(champPri) <- c("ChampName","Attack","Defense","Magic","Difficulty")
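
The row-by-row `rbind` loop above is fine for a list of this size. As a minimal sketch of an alternative, the same table can be built in one pass with `lapply` and `do.call`. The `mockChamps` list is a hypothetical stand-in that mimics only the fields used above (`name` and `info`) so the snippet runs without calling Riot's endpoint; its ratings are the corrected values used for these two champions in the data cleaning step, not what the JSON file returns.

```r
# Hypothetical stand-in for getLoLChamps$data: only the fields used above.
mockChamps <- list(
  Akshan = list(name = "Akshan",
                info = list(attack = 9, defense = 3, magic = 3, difficulty = 6)),
  Qiyana = list(name = "Qiyana",
                info = list(attack = 6, defense = 4, magic = 2, difficulty = 8))
)

# Build one single-row data frame per champion, then bind them all at once.
champRows <- lapply(mockChamps, function(champ) {
  data.frame(
    ChampName  = champ$name,
    Attack     = champ$info$attack,
    Defense    = champ$info$defense,
    Magic      = champ$info$magic,
    Difficulty = champ$info$difficulty,
    stringsAsFactors = FALSE
  )
})
champTable <- do.call(rbind, champRows)
rownames(champTable) <- NULL  # drop the list names that rbind keeps as row names

print(champTable)
```

Naming the columns inside `data.frame()` also removes the need for a separate `colnames()` call afterwards.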

2.2 Data Cleaning

Within the JSON file there are a few champions who have 0 values which do not make sense (e.g. Qiyana had 0 for attack). Since the data set is relatively small, I was able to make a few spot corrections to the champions who I thought were off.

#fill in missing champ data 
replaceRows <- function(champion,attack,magic,defense,diff){
  champPri[champPri$ChampName == champion,"Attack"] <- attack
  champPri[champPri$ChampName == champion,"Magic"] <- magic
  champPri[champPri$ChampName == champion,"Defense"] <- defense
  champPri[champPri$ChampName == champion,"Difficulty"] <- diff
  return(champPri)
}

champPri <- replaceRows("Akshan",9,3,3,6)
champPri <- replaceRows("Qiyana",6,2,4,8)
champPri <- replaceRows("Vex",1,10,4,4)
champPri <- replaceRows("Lillia",2,10,2,8)
champPri <- replaceRows("Seraphine",3,7,4,2)
champPri <- replaceRows("Rell",3,3,8,6)

3 Summary Of Champion Types

3.1 Overall Distribution of Champion Attributes

To get a sense of the pool of champions we are picking from, it is necessary to visualize the distribution of attributes. For example, if you’re looking to play a different super defense-heavy champ or another incredibly easy character, you might be hard pressed, as there are fewer high-defense and easy-skill characters to choose from.

#Histograms for each stat (no color aesthetic is mapped, so no color scale is needed)
histAttack <- champPri %>% ggplot(aes(x=Attack)) +  geom_histogram(binwidth=1,fill="#69b3a2",color="#e9ecef", alpha=0.9) + ggtitle("Attack Rating for Champ Distribution")+ylim(0, 35)

histMagic<- champPri %>% ggplot(aes(x=Magic)) +  geom_histogram(binwidth=1,fill="#69b3a2",color="#e9ecef", alpha=0.9) + ggtitle("Magic Rating for Champ Distribution")+ylim(0, 35)

histDefense <- champPri %>% ggplot(aes(x=Defense)) +  geom_histogram(binwidth=1,fill="#69b3a2",color="#e9ecef", alpha=0.9) + ggtitle("Defense Rating for Champ Distribution")+ylim(0, 35)

histDifficulty <- champPri %>% ggplot(aes(x=Difficulty)) +  geom_histogram(binwidth=1,fill="#69b3a2",color="#e9ecef", alpha=0.9) + ggtitle("Difficulty Rating for Champ Distribution")+ylim(0, 35)
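
The histograms above fix the y-axis with `ylim(0, 35)`. As far as I understand ggplot2's behaviour, a bar taller than the limit is dropped with a warning rather than clipped, so it can be worth checking the tallest bin before plotting. The sketch below does that with base R's `table()`; the `ratings` vector is toy data, not the real Attack column.

```r
# Toy ratings standing in for one stat column such as champPri$Attack.
ratings <- c(2, 2, 3, 8, 8, 8, 9, 5)

# Count champions per rating; fixing the levels keeps empty ratings as 0.
counts <- table(factor(ratings, levels = 0:10))
print(counts)

# Height of the tallest bar a binwidth-1 histogram would draw.
maxCount <- max(counts)
print(maxCount)
```

If `maxCount` stays below 35 for all four stats, the shared limit is safe and keeps the four plots directly comparable.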

3.1.1 Attack Histogram

The Attack Histogram is a bimodal distribution with two peaks at around 2 and 8. Typically in League, champions are split between two sources of damage dealing: Attack or Magic. Although most champs will have a mix of both, it is apparent from this graph (and to those who play the game) that characters will skew in one direction or the other, while few have a 50/50 split.

3.1.2 Magic Histogram

Similarly, the Magic Histogram is bimodal with peaks at 3 and 7. As mentioned previously, this is to be expected since characters will always have a mix (with preference) of both magic and attack.

3.1.3 Defense Histogram

The Defense Histogram tells us that Riot has very few tanks (high defense rating characters). We can see a somewhat normal distribution with a center at 5 and a slight right skew. With respect to the game, most characters need at least some defense, so it makes sense that a majority fall in the 3-7 range. Oftentimes, tank characters are seen as boring and slow, so it also makes sense that there are few values of 8, 9 and 10 for Defense.

3.1.4 Difficulty Histogram

Observing the Difficulty Histogram, it is evident that Riot has preferred to keep a roughly normal distribution for its champions’ difficulty. Players are most engaged when they are playing at their skill level, and a majority of players will fall in that middle range of expertise, so this distribution makes sense. There is a slight left skew, which also makes sense, as most players will probably surpass the beginner phase and thus would be less inclined to play champs who are “easy”.

3.2 Champ Difficulty Vs Primary Factors

There is an ongoing debate about which champions are the most “difficult”. Below are a few scatter plots examining the other three metrics against difficulty to see if there are any correlations. Since the values below are all discrete, there are a lot of overlapping dots. I have made each dot transparent to illustrate where there may be denser concentrations of observations.
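
Transparency shows density only roughly, so as a complementary sketch the number of champions sitting on each coordinate can be counted directly. The `toyStats` values are made up for illustration; on the real data the same call would run against `champPri`. (ggplot2's `geom_count()` is another option for showing this in the plot itself.)

```r
# Toy ratings with deliberate duplicates; not real champion data.
toyStats <- data.frame(
  Attack     = c(8, 8, 8, 2, 2, 5),
  Difficulty = c(7, 7, 3, 5, 5, 5)
)

# One row per occupied (Attack, Difficulty) coordinate with its point count.
overlap <- aggregate(n ~ Attack + Difficulty,
                     data = cbind(toyStats, n = 1), FUN = sum)

# Densest coordinates first.
overlap <- overlap[order(-overlap$n), ]
print(overlap)
```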

3.2.1 Attack Vs Difficulty

There is no obvious correlation between Attack and Difficulty; in fact, max difficulty appears to be evenly distributed across different attack values. Note that the largest spread of difficulty happens around the 8 mark for Attack.

##insert scatter plots against difficulty
attackDiff <- ggplot(champPri, aes(x=Attack, y=Difficulty)) + geom_point(color='darkred',alpha=.2)+ggtitle("Scatter plot Champ Attack Vs Difficulty")
attackDiff

3.2.2 Magic Vs Difficulty

Unlike the Attack vs Difficulty scatter, the Magic vs Difficulty scatter shows a slight positive correlation between Magic and Difficulty. As a character gets closer to the 9-10 range in magic, a difficulty of <4 is harder to find. However, similar to the Attack graph, a value of 8 seems to have a large spread for difficulty.

magicDiff <- ggplot(champPri, aes(x=Magic, y=Difficulty)) + geom_point(color='darkblue',alpha=.2)+ggtitle("Scatter plot Champ Magic Vs Difficulty")
magicDiff

3.2.3 Defense Vs Difficulty

Although the correlation is weak, there is a slight negative correlation between defense and difficulty. The largest spread happens at around 6 defense, meaning if you want a decent range of difficulty to choose from, a slightly tankier champ may be for you.

defenseDiff <- ggplot(champPri, aes(x=Defense, y=Difficulty)) + geom_point(color='darkgreen',alpha=.2)+ggtitle("Scatter plot Champ Defense Vs Difficulty")
defenseDiff

3.2.4 Attack Vs Magic

As players and Riot tacitly know, the final scatter plot is evidence of a dichotomy between Attack champions and Magic champions. There appears to be a negative correlation, implying you mostly get one or the other. As an aside, there seem to be no observations at the (5,5) coordinate, implying League has no champ which is equally balanced between the two. (New idea for a character?)

attackMagic <- ggplot(champPri, aes(x=Attack, y=Magic)) + geom_point(color='purple',alpha=.2)+ggtitle("Scatter plot Champ Attack Vs Magic")
attackMagic

3.3 Linear Regression Check

To validate the assertions made above, I ran simple linear regression models and extracted the R² values. The linear models give us a baseline to confirm the lack of (or very weak) correlation between a champion’s Attack, Magic or Defense stats and their difficulty. The R² values against Difficulty are all low (<0.1), which means most of the variability in difficulty cannot be described by the other attributes. The exception is Attack and Magic, where an R² of about 0.66 points to a fairly strong negative relationship between how much magic and how much attack a character has.

lmAttackDiff<- lm(Difficulty~Attack, data = champPri)
lmMagicDiff<- lm(Difficulty~Magic, data = champPri)
lmDefenseDiff <-lm(Difficulty~Defense, data = champPri)
lmAttackMag <-lm(Magic~Attack, data = champPri)

print(paste0("The R2 value for Attack on Diff is:",round(summary(lmAttackDiff)$r.squared,digits=5)))

## [1] "The R2 value for Attack on Diff is:0.01643"

print(paste0("The R2 value for Magic on Diff is:",round(summary(lmMagicDiff)$r.squared,digits=5)))

## [1] "The R2 value for Magic on Diff is:0.05232"

print(paste0("The R2 value for Defense on Diff is:",round(summary(lmDefenseDiff)$r.squared,digits=5)))

## [1] "The R2 value for Defense on Diff is:0.07291"

print(paste0("The R2 value for Attack on Magic is:",round(summary(lmAttackMag)$r.squared,digits=5)))

## [1] "The R2 value for Attack on Magic is:0.65971"
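
R² on its own carries no sign, so the direction of each relationship has to come from the slope or from the correlation coefficient. For a regression with a single predictor, R² equals the squared Pearson correlation, which means the 0.65971 above corresponds to |r| of roughly 0.81. The sketch below checks that identity on toy numbers (not real champion ratings).

```r
# Toy ratings chosen so Magic falls as Attack rises; not real champion data.
toy <- data.frame(
  Attack = c(1, 2, 3, 5, 7, 8, 9, 10),
  Magic  = c(10, 9, 7, 6, 4, 3, 3, 1)
)

fit      <- lm(Magic ~ Attack, data = toy)
r        <- cor(toy$Attack, toy$Magic)      # Pearson correlation, carries the sign
rSquared <- summary(fit)$r.squared          # same quantity the post prints
slope    <- unname(coef(fit)["Attack"])     # sign matches the sign of r

cat("r =", round(r, 3), " R2 =", round(rSquared, 3),
    " slope =", round(slope, 3), "\n")
```

Applied to `champPri`, `cor(champPri$Attack, champPri$Magic)` would confirm whether the Attack and Magic relationship is negative, as the scatter plot suggests.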

3.4 Key Takeaways

Riot does a decent job of balancing the number of champions across the 4 main factors, with the exception of heavy tanks. It appears that for each champion stat you can choose, there are a handful of difficulty levels to play at. Attack and Magic both have bimodal distributions, meaning that a larger number of characters skew one way or the other without reaching the extremes. Lastly, apart from the Attack and Magic trade-off, one attribute does not predict another very well, meaning that if you want to play a somewhat difficult, 50/50 magic/attack, mild-defense champion, you can (I’m looking at you, Bard)!

4 Visualization on Each Category By Champion

If you want a simple way of choosing a champion, it may be best to start by looking at the graphs below, finding a champion you already like to play, and then choosing another with similar stats. For example, someone who plays “Vayne” may want to select a champion like “Xayah”, since both rank 10 in Attack. Maybe you want a champ with a difficulty of 3 who is like “Braum”, so you might choose “Amumu”. This method is agnostic to the role in which these characters are played, so be careful. Combining these stats and using more in-depth parameters would lead to a better recommendation. Use this sparingly!

The interactive graphs below have some champions removed from the Y axis to preserve space, however when hovered will display the champion name.
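
The single-stat matching described above can also be done as a table lookup rather than by reading the graph. This is a minimal sketch: `sameStat` is a hypothetical helper, and `toyChamps` holds only Attack ratings mentioned in this post (Vayne and Xayah at 10, plus three of the corrected champions).

```r
# Attack ratings quoted elsewhere in this post; a toy subset, not the full table.
toyChamps <- data.frame(
  ChampName = c("Vayne", "Xayah", "Akshan", "Qiyana", "Vex"),
  Attack    = c(10, 10, 9, 6, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: champions sharing `champion`'s value in column `stat`.
sameStat <- function(data, champion, stat) {
  target <- data[data$ChampName == champion, stat]
  data[data[[stat]] == target & data$ChampName != champion, "ChampName"]
}

print(sameStat(toyChamps, "Vayne", "Attack"))
```

The same caveat applies as in the text: an exact match on one stat says nothing about role or about the other three ratings.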

4.1 Attack Scatter Plot by Champion

#Dot Plots for Champs

figAttack <- plot_ly(champPri, x = ~Attack, y = ~ChampName, name = "Attack", type = 'scatter',
             mode = "markers", marker = list(color = champPri$Attack,colorscale='Viridis'))
figAttack<- layout(figAttack, yaxis = list(categoryorder = "array", categoryarray = unique(champPri$ChampName)))
figAttack

4.2 Magic Scatter Plot by Champion

figMagic <- plot_ly(champPri,x = ~Magic, y = ~ChampName, name = "Magic",type = 'scatter',
            mode = "markers", marker = list(color = champPri$Magic,colorscale='RdBu'))
figMagic<- layout(figMagic, yaxis = list(categoryorder = "array", categoryarray = unique(champPri$ChampName)))

figMagic

4.3 Defense Scatter Plot by Champion

figDefense <- plot_ly(champPri,x = ~Defense, y = ~ChampName, name = "Defense",type = 'scatter',
            mode = "markers", marker = list(color = champPri$Defense,colorscale='Blackbody'))
figDefense <- layout(figDefense , yaxis = list(categoryorder = "array", categoryarray = unique(champPri$ChampName)))

figDefense

4.4 Difficulty Scatter Plot by Champion

figDifficulty <- plot_ly(champPri,x = ~Difficulty, y = ~ChampName, name = "Difficulty",type = 'scatter',
            mode = "markers", marker = list(color = champPri$Difficulty,colorscale='Jet'))
figDifficulty<- layout(figDifficulty, yaxis = list(categoryorder = "array", categoryarray = unique(champPri$ChampName)))
figDifficulty

5 Clustering Champions by KMeans

To get a better picture of which champions are closely related to each other, a better approach is to combine the 4 stats and compare the similarities between champions. Using a method called K-means, we can cluster champions into groups based on the Euclidean distance between their values. As explored earlier, Riot has done a fairly decent job of evenly distributing champions, which makes K-means a good starting point.
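
To make the distance concrete, here is a small worked sketch using three of the champions corrected in section 2.2 and their corrected ratings. Base R's `dist()` computes the pairwise Euclidean distances across all four stats.

```r
# Corrected ratings from section 2.2 (Attack, Defense, Magic, Difficulty).
stats <- rbind(
  Vex    = c(Attack = 1, Defense = 4, Magic = 10, Difficulty = 4),
  Lillia = c(Attack = 2, Defense = 2, Magic = 10, Difficulty = 8),
  Qiyana = c(Attack = 6, Defense = 4, Magic = 2,  Difficulty = 8)
)

# Pairwise Euclidean distance: sqrt of the summed squared rating differences.
distances <- as.matrix(dist(stats, method = "euclidean"))
print(round(distances, 2))
```

Vex and Lillia differ by 1, 2, 0 and 4 points, giving sqrt(1 + 4 + 0 + 16) = sqrt(21), about 4.58, while Vex and Qiyana are about 10.25 apart. Because all four stats share the same 0 to 10 scale, no rescaling is needed before clustering.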

5.1 Clustering Justifications

To figure out how many clusters work best, I used the elbow method to estimate the improvement in the within-cluster sum of squares. There is no obvious elbow, so I tried 5, 8, 12 and 16 clusters to visualize the results and see whether the clusters make sense. See the next section for the 3D charts.

#Get champStats
champStats<-champPri[2:5]

# function to compute total within-cluster sum of square 
wss <- function(k) {
  kmeans(champStats, k, nstart = 25 )$tot.withinss
}

# Compute and plot wss for k = 1 to k = 30
k.values <- 1:30

# extract wss for each value of k
wss_values <- map_dbl(k.values, wss)

plot(k.values, wss_values,
       type="b", pch = 19, frame = FALSE, 
       xlab="Number of clusters K",
       ylab="Total within-clusters sum of squares")
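
When the elbow is hard to see by eye, one hedged alternative is to look at the relative drop in WSS from each k to the next and see where it flattens out. The sketch below does this in base R on synthetic data with three well-separated blobs; `toyStats` is generated for illustration and is not the champion table.

```r
set.seed(42)  # kmeans uses random starts, so fix the seed for repeatability

# Synthetic data: three loose blobs in two dimensions; illustrative only.
toyStats <- rbind(
  matrix(rnorm(40, mean = 2, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 8, sd = 0.5), ncol = 2)
)

# Total within-cluster sum of squares for each candidate k.
kValues   <- 1:6
wssValues <- sapply(kValues, function(k) {
  kmeans(toyStats, centers = k, nstart = 25)$tot.withinss
})

# Fraction of the remaining WSS removed by adding one more cluster.
relDrop <- -diff(wssValues) / head(wssValues, -1)
print(round(relDrop, 3))
```

On data with a clear structure the drop collapses right after the true number of groups; on the champion stats, where no elbow is obvious, the drops would be expected to shrink gradually instead.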

5.2 Visualization of Clusters in 3D

Below are interactive 3D graphs that cluster based on the 4 primary stats (Attack, Defense, Magic, and Difficulty). Using these graphs, hover over a champion with a similar color to a champ you may enjoy playing to choose a new champion which may play similarly to yours. Note that each clustering model is slightly different and the visualization excludes difficulty. Moreover, for each model there may be a mis-grouping that more experienced League players will understand intuitively.

5.2.1 5 Cluster KMeans Model

set.seed(42)
grp <- kmeans(x=champStats, centers=5, nstart=25)
grpCluster <- grp$cluster
champPri$Cluster <- grpCluster

cluster <- plot_ly(champPri,x=~Attack,y=~Magic,z=~Defense,color=~Cluster, text =paste(champPri$ChampName,"Cluster:",champPri$Cluster))
cluster <- cluster %>% add_markers()
cluster

This model groups decently well, but there are a few obvious mis-groupings (Miss Fortune and Poppy, Sion and Seraphine…). It may be useful to explore a model with more clusters for a closer grouping.

5.2.2 8 Cluster KMeans Model

champStats<-champPri[2:5]

set.seed(42)
grp2 <- kmeans(x=champStats, centers=8, nstart=25)
grpCluster2 <- grp2$cluster
champPri$Cluster2 <- grpCluster2

cluster2 <- plot_ly(champPri,x=~Attack,y=~Magic,z=~Defense,color=~Cluster2,text =paste(champPri$ChampName,"Cluster:",champPri$Cluster2))
cluster2 <- cluster2 %>% add_markers()
cluster2

This model groups a bit better than before (no Sion and Seraphine) and adds a little more nuance to the grouping. We can see groups appear such as Mages (Vel’Koz, Orianna, Malzahar), Fighters (Sett, Garen, Darius) and Tanks (Rammus, Galio, Rell). While there are still some mis-groupings, they are far less frequent. It is worth examining a higher number of clusters to see if we can find any other patterns or find our upper limit.

5.2.3 12 Cluster KMeans Model

champStats<-champPri[2:5]

set.seed(42)
grp3 <- kmeans(x=champStats, centers=12, nstart=25)
grpCluster3 <- grp3$cluster
champPri$Cluster3 <- grpCluster3

cluster3 <- plot_ly(champPri,x=~Attack,y=~Magic,z=~Defense,color=~Cluster3, text =paste(champPri$ChampName,"Cluster:",champPri$Cluster3))
cluster3 <- cluster3 %>% add_markers()
cluster3

Examining the 3D plot, the 12 cluster model gives us slightly more insight (nuanced groups) than the previous model, but the marginal gain is smaller than the gain from moving from 5 to 8 cluster centers. Emerging in this model is the separation between Tanks (Rammus, Galio, Tahm Kench) and Wardens (Rell, Leona, Braum). Moreover, we start to see the Hybrid Champion pool emerge (Shaco, Bard, GP).

5.2.4 16 Cluster Model

set.seed(42)
grp4 <- kmeans(x=champStats, centers=16, nstart=25)
grpCluster4 <- grp4$cluster
champPri$Cluster4 <- grpCluster4

cluster4 <- plot_ly(champPri,x=~Attack,y=~Magic,z=~Defense,color=~Cluster4, text =paste(champPri$ChampName,"Cluster:",champPri$Cluster4))
cluster4 <- cluster4 %>% add_markers()
cluster4

Not surprisingly, the 16 cluster model has a little more nuance than the previous 3 models. We can observe the enchanter supports beginning to form their own group (Nami, Sona, Yuumi), but they are also tied in with unrelated characters such as Teemo and Kennen. At this point I decided the model had reached its peak in the previous iteration (balancing nuance with sensibility).

5.3 Numerical Analysis of Clusters

To confirm our visual insight, it is important to look at numbers such as the ratio of the Between Sum of Squares (a measure of how far the group means sit from the overall mean) to the Total Sum of Squares (Between SS plus Within SS). Using the between_SS / total_SS ratio reported in the kmeans output, we can examine the incremental improvement in our model’s ability to cluster well.
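
The ratio can also be pulled out of the fitted object directly instead of being read off the printed summary. This sketch uses the `betweenss`, `totss` and `tot.withinss` components that `kmeans()` returns; the `toyStats` matrix is synthetic and only stands in for `champStats`.

```r
set.seed(42)

# Synthetic data: two well-separated blobs; illustrative only.
toyStats <- rbind(
  matrix(rnorm(40, mean = 2, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 8, sd = 0.5), ncol = 2)
)

fit <- kmeans(toyStats, centers = 2, nstart = 25)

# Total SS splits into the part between clusters and the part within them.
ratio <- fit$betweenss / fit$totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))
```

Since the total is fixed by the data, the ratio can only rise as clusters are added, and it reaches 100% once every distinct stat line has its own cluster. That is why the size of the increase between models matters more than the value itself.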

5.3.1 5 Cluster Analysis

set.seed(42)
print(kmeans(x=champStats, centers=5, nstart=25))

## K-means clustering with 5 clusters of sizes 30, 30, 38, 32, 32
## 
## Cluster means:
##     Attack  Defense    Magic Difficulty
## 1 6.300000 3.900000 6.400000   6.466667
## 2 8.200000 3.833333 2.600000   7.266667
## 3 2.289474 3.605263 8.789474   7.184211
## 4 8.218750 4.750000 2.468750   3.656250
## 5 3.437500 7.468750 5.843750   4.406250
## 
## Clustering vector:
##   [1] 4 3 1 2 5 5 3 3 2 4 3 1 1 3 5 3 5 2 4 3 5 1 4 1 2 5 1 1 3 1 3 4 1 5 2 4 1
##  [38] 5 4 1 2 3 4 1 3 3 5 1 2 1 2 2 2 5 3 3 3 1 2 1 2 4 2 1 1 3 2 5 3 3 2 5 3 5
##  [75] 3 5 4 4 4 5 5 1 1 5 3 1 2 4 5 4 3 5 4 4 2 2 4 3 5 4 5 3 4 2 2 3 3 2 5 1 5
## [112] 4 1 5 4 5 5 4 1 1 5 3 3 3 5 5 2 5 1 1 4 4 4 1 2 2 2 4 2 3 3 3 4 4 3 3 4 4
## [149] 4 3 4 2 2 1 1 5 2 2 3 3 5 3
## 
## Within cluster sum of squares by cluster:
## [1] 263.6667 180.0333 252.9211 164.6562 285.7812
##  (between_SS / total_SS =  70.0 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

The groups are all similar sizes and we have a BSS/TSS ratio of 70%, indicating a decent fit. Using the means table, we can see that the largest cluster in this model (38 champions) is Mages (high magic, high difficulty, low attack, low defense).

5.3.2 8 Cluster Analysis

set.seed(42)
print(kmeans(x=champStats, centers=8, nstart=25))

## K-means clustering with 8 clusters of sizes 17, 16, 17, 37, 19, 24, 14, 18
## 
## Cluster means:
##     Attack  Defense    Magic Difficulty
## 1 2.882353 4.117647 7.941176   3.823529
## 2 2.312500 6.000000 7.750000   6.812500
## 3 6.823529 5.352941 5.705882   6.176471
## 4 8.189189 4.783784 2.540541   3.972973
## 5 4.052632 8.421053 4.842105   4.473684
## 6 8.375000 3.125000 2.583333   7.416667
## 7 5.428571 3.000000 6.785714   8.214286
## 8 2.000000 2.722222 9.500000   7.722222
## 
## Clustering vector:
##   [1] 4 1 7 6 5 1 8 8 6 4 8 7 7 7 5 1 5 6 4 8 2 3 4 3 6 5 7 7 7 7 8 4 3 5 3 4 3
##  [38] 5 4 3 3 2 4 3 2 2 5 3 6 6 6 4 6 2 8 2 8 3 6 3 6 4 6 6 3 8 4 5 8 2 4 1 1 5
##  [75] 8 5 4 4 4 1 1 1 3 2 8 7 6 4 1 4 8 5 4 4 6 6 4 1 5 4 5 2 4 6 6 2 8 4 5 7 1
## [112] 4 7 5 4 5 5 4 3 1 1 2 1 8 5 2 6 5 7 3 4 4 4 7 6 3 6 4 6 8 8 1 4 4 8 2 4 4
## [149] 4 8 4 6 6 3 1 2 6 4 1 2 2 7
## 
## Within cluster sum of squares by cluster:
## [1]  98.94118  62.87500 102.35294 190.10811 104.84211 149.91667  64.14286
## [8]  61.72222
##  (between_SS / total_SS =  78.2 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

Upon clustering with 8 centers, we can see an improvement in the BSS/TSS ratio to 78%. However, there is now a noticeably larger group with 37 observations, characterized by high attack, mid defense, low difficulty and low magic. Most of the champs found here are Auto Attack heavy champions such as Quinn, Miss Fortune, or Varus.

5.3.3 12 Cluster Analysis

set.seed(42)
print(kmeans(x=champStats, centers=12, nstart=25))

## K-means clustering with 12 clusters of sizes 16, 11, 11, 17, 9, 13, 7, 11, 24, 14, 15, 14
## 
## Cluster means:
##      Attack  Defense    Magic Difficulty
## 1  1.750000 2.687500 9.562500   7.750000
## 2  7.818182 4.090909 2.363636   8.454545
## 3  2.454545 5.636364 7.909091   7.545455
## 4  7.411765 5.823529 3.058824   5.294118
## 5  4.333333 8.777778 3.555556   4.888889
## 6  8.846154 2.307692 2.769231   6.538462
## 7  6.571429 5.714286 6.000000   8.285714
## 8  4.636364 2.909091 7.363636   8.363636
## 9  8.583333 4.291667 2.458333   3.375000
## 10 3.785714 7.571429 6.214286   4.285714
## 11 2.000000 4.866667 8.133333   4.066667
## 12 6.285714 3.428571 6.642857   5.142857
## 
## Clustering vector:
##   [1]  9 11  8  6  5 11  1  1  2  9  1  8  8  8 10 11  5  6  4  1 10 12  9 12  2
##  [26] 10  8  7  8 12  1  9 12 10  7  4  7 10  9 12  4  3  4 12  3  3  5 12  2  6
##  [51]  6  4  6 11  1  3  8  7  2 12  6  9  6  6  7  1  4  5  1  3  4 11 11 10  1
##  [76] 10  9  9  9 10 11 12 12 10  1  8  2  9 10  9  8  5  9  4  6  2  9 11  5  9
## [101]  5  3  9  2  2  3  1  4 10 12 11  9  7  5  4 10  5  9  4 12 11  3 11  1 10
## [126] 11  6 10 12  7  9  4  9  8  6  4  2  9  6  1  1 11  9  4  1  3  4  9  9  1
## [151]  9  2  2  4 12  3  6  4 11  3 11  8
## 
## Within cluster sum of squares by cluster:
##  [1]  51.37500  51.81818  24.90909  55.05882  32.66667  50.00000  30.57143
##  [8]  38.54545 100.37500  55.00000  74.40000  79.21429
##  (between_SS / total_SS =  83.2 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

Comparing this model to the previous one, the sizes of the groups are a little less variable and there is a modest increase in the BSS/TSS ratio (78.2% to 83.2%). Again, the largest group seems to be Auto Attack heavy champions. As the clustering gets finer, the means seem to be less descriptive of a particular group.

5.3.4 16 Cluster Analysis

set.seed(42)
print(kmeans(x=champStats, centers=16, nstart=25))

## K-means clustering with 16 clusters of sizes 19, 10, 10, 10, 8, 7, 9, 8, 17, 4, 6, 11, 9, 15, 7, 12
## 
## Cluster means:
##      Attack  Defense    Magic Difficulty
## 1  8.315789 4.263158 2.473684   3.052632
## 2  2.000000 3.200000 8.900000   4.900000
## 3  7.400000 4.200000 2.800000   8.600000
## 4  8.000000 2.600000 5.200000   5.900000
## 5  3.375000 8.875000 5.500000   3.750000
## 6  4.571429 8.571429 3.285714   5.142857
## 7  8.888889 2.333333 1.666667   6.888889
## 8  5.625000 3.500000 7.375000   4.500000
## 9  7.411765 5.823529 3.058824   5.294118
## 10 2.000000 5.250000 7.500000   2.250000
## 11 9.666667 4.833333 1.833333   5.500000
## 12 2.454545 5.636364 7.909091   7.545455
## 13 4.777778 2.888889 7.000000   8.555556
## 14 2.066667 2.866667 9.533333   8.133333
## 15 6.571429 5.571429 6.285714   7.857143
## 16 3.333333 6.666667 6.916667   4.833333
## 
## Clustering vector:
##   [1]  1  2 13  7  6 10 14  2  3  1 14 13 13 13  5  2  5  7  9 14 16  4  1  8  7
##  [26] 16 13 15 13  4 14  1  8  5  3  9 15 16  1  4  9 12  9  4 12 12  6  8  3  4
##  [51]  4  9  4 16 14 12 14 15 11  8 11  1  7  4 15 14  9  6 14 12  9 16  2  5  2
##  [76]  5 11  1  1 16 10  8 15 16  2 13  3  1 16  1 14  6  1  9  7  3 11  2  5  1
## [101]  6 12  1  3  3 12 14  9 16  4 10  1 15  6  9 16  6  1  9  8 10 12  2 14  5
## [126] 16  7  5  8 15  4  9 11 13  7  9  3  1  7 14 14  2  1  9 14 12  9  1 11 14
## [151]  1  3  3  9  8 12  7  9  2 12 16 13
## 
## Within cluster sum of squares by cluster:
##  [1] 67.47368 25.40000 40.00000 30.90000 30.25000 21.71429 17.77778 39.75000
##  [9] 55.05882  8.50000 18.50000 24.90909 28.66667 40.13333 29.71429 41.91667
##  (between_SS / total_SS =  86.4 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

Finally, comparing the 16 center model with the others, we have the best BSS/TSS ratio (86.4%); however, the means table gives an even less intelligible grouping for champs.

5.4 Takeaways from Clustering

The story of League champs can be explained decently well through the primary statistics (Attack, Magic, Defense, Difficulty), but it is not the complete picture. As we increased the number of cluster centers, we observed more nuanced groupings for champions. It appears that 12 centers works the best without being too redundant or mis-grouping. When examining the numerical data from the clusters (as well as the elbow plot), we can see diminishing returns from adding clusters, especially after 12 centers. Since it does a decently good job at grouping, I have integrated the 12 center K-means model into a recommendation system, as seen below.

6 Recommender

Below is a sample of what the recommendation system will return for the champion Heimerdinger. A radar chart is presented to compare the recommended champions’ attributes to those of the chosen champion. Each champ can be toggled on or off for clarity.

#Take in a champion
champRec <- function(champion){
  curClusterNo <- champPri[champPri$ChampName == champion,]$Cluster3
  getRec <- champPri[champPri$Cluster3 == curClusterNo,]
  getRec<- getRec[getRec$ChampName != champion,]
  champRecList<- getRec$ChampName
  champRecList <- sample(champRecList,3)
  champRecList <- append(champRecList,champion)
  return(champRecList)
}

newRec <-champRec("Heimerdinger")

colors_border=c( rgb(0.2,0.5,0.5,0.9), rgb(0.8,0.2,0.5,0.9) , rgb(0.7,0.5,0.1,0.9),rgb(0.6,0.6,0.2,0.8))
colors_in=c( rgb(0.2,0.5,0.5,0.4), rgb(0.8,0.2,0.5,0.4) , rgb(0.7,0.5,0.1,0.4),rgb(0.6,0.6,0.2,0.8))


#Change the Row name to Champ name
radarData <- champPri[champPri$ChampName %in% newRec,]
rownames(radarData) <- radarData[,1]
radarData <- radarData[,2:5]

#Create For Loop for a Radar Chart for each
radar <- plot_ly(
    type = 'scatterpolar',
    fill = 'toself',
    mode='markers' 
  ) 
for( i in 1:nrow(radarData)){
  radar <- radar %>% add_trace(
    r = c(radarData[i,]$Attack,radarData[i,]$Magic,radarData[i,]$Defense,radarData[i,]$Difficulty),
    theta = c('Attack','Magic','Defense', 'Difficulty'),
    name = rownames(radarData[i,])
  ) 
}

radar
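
`champRec` draws its suggestions at random from the champion's cluster, so repeated calls can return different champions. As a variation (not what the function above does), the cluster-mates could instead be ranked by Euclidean distance so the closest stat lines come first. In this sketch `nearestInCluster` is a hypothetical helper, the ratings are the corrected values from section 2.2, and the `Cluster` labels are invented for the example rather than taken from the fitted model.

```r
# Ratings from section 2.2; the Cluster labels are hypothetical.
toyChamps <- data.frame(
  ChampName  = c("Vex", "Lillia", "Seraphine", "Rell", "Akshan"),
  Attack     = c(1, 2, 3, 3, 9),
  Defense    = c(4, 2, 4, 8, 3),
  Magic      = c(10, 10, 7, 3, 3),
  Difficulty = c(4, 8, 2, 6, 6),
  Cluster    = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank cluster-mates by Euclidean distance to `champion`.
nearestInCluster <- function(data, champion, n = 3) {
  statCols <- c("Attack", "Defense", "Magic", "Difficulty")
  target <- data[data$ChampName == champion, ]
  mates  <- data[data$Cluster == target$Cluster & data$ChampName != champion, ]
  if (nrow(mates) == 0) return(character(0))

  # Subtract the target's ratings from every cluster-mate, column by column.
  diffs    <- sweep(as.matrix(mates[statCols]), 2, unlist(target[statCols]))
  distance <- sqrt(rowSums(diffs^2))

  # Closest champions first, capped at n results.
  head(mates$ChampName[order(distance)], n)
}

print(nearestInCluster(toyChamps, "Vex"))
```

This trades the variety of random sampling for repeatable output, which may or may not be what a player browsing for something new wants.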

7 Conclusion

League of Legends has more than 160 characters, and it’s fun to try new champs, but oftentimes it’s hard to get a good feel for who might match your playstyle. As demonstrated with the histograms and scatter plots, there are plenty of options to choose from depending on your affinity for the 4 primary stats (with the exception of super tanks and very easy champs). Although one may choose based on a single stat, a combination of stats gives a better idea of which character you would like to play. Although the K-means method is quick and simple to understand, it does produce the occasional odd pairing when combined with difficulty (especially when it’s not visualized).

To create a more in-depth recommendation system, there are more granular parameters (20 more) which can be explored, such as hpperlevel or attackspeed. Moreover, Riot has labeled each champion with tags such as “Assassin” or “Mage”. These descriptive characteristics may give a better idea of who to play and will be explored in future iterations of this project.

For now I hope you have fun exploring the data with the interactive plots and can wrap your head around the nuanced stats/choices you have in character selection. See you on the Rift!

League Champ Recommender