
A note on Elo ratings (and why your super high rating will probably go down)


tukeykramer

From the most recent patch notes:

 

Applied fix to ELO system. Scores should now properly normalize to reasonable values.

 

This implies that the previous system was not a proper Elo rating system and that previous rating values weren't very meaningful. However, this could already be inferred by some of the ridiculously high ratings. The implementation of a true Elo system will mean that people with extremely high (low) ratings under the old system will see their ratings drop (rise) over time until they are closer to the base rating. This is not a bug, but instead a reflection of the fact that the ratings in the previous system didn't mean much. To help people understand this, I thought I would explain some of the basics of an Elo system.

 

Essentially, an Elo rating system works by using the ratings of players to predict who will win a game. For example, if two teams have the same average rating, an Elo system will give each team a 50% chance of winning. Once a winner is decided, ratings are updated to reflect the outcome (the ratings of the winners go up and the ratings of the losers go down). The change in rating is proportional to how surprising the outcome is. So, if the team with an 85% chance of winning (according to the ratings in the system) wins, their ratings won't go up much and the ratings of the losing team won't go down much, since that outcome was expected. However, if the team with a 15% chance of winning wins, their ratings will go up significantly, while the ratings of the losing team will go down significantly. Basically, the system is trying to move each player to a rating that accurately reflects their odds of winning any particular game. For additional information on the mechanics of Elo, see the Wikipedia article on the Elo rating system.
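The update described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the map's actual code: it assumes a team's strength is the mean of its members' ratings, that every member of a team moves by the same amount, a maximum change K of 60, and a logistic scale of 2000 (chess Elo uses 400; 2000 is the scale that matches the numbers quoted in this post). The function names are hypothetical.

```python
from statistics import mean

K = 60.0        # maximum rating change per game (the value assumed in the post)
SCALE = 2000.0  # assumed logistic scale; standard chess Elo uses 400


def expected_score(rating_a: float, rating_b: float, scale: float = SCALE) -> float:
    """Probability that side A beats side B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_teams(winners: list[float], losers: list[float], k: float = K):
    """Return updated ratings after `winners` beat `losers`.

    Team strength is the mean member rating (an assumption); every player on a
    team gains or loses the same amount, k * (1 - expected win chance).
    """
    p_win = expected_score(mean(winners), mean(losers))
    delta = k * (1.0 - p_win)  # small for expected wins, large for upsets
    return [r + delta for r in winners], [r - delta for r in losers]


# Evenly matched teams: each side had a 50% chance, so the swing is k / 2 = 30.
new_winners, new_losers = update_teams([1500.0] * 5, [1500.0] * 5)
print(new_winners[0], new_losers[0])  # 1530.0 1470.0
```

Because the winner's gain is K times the probability that they would lose, an upset and an expected win between the same two teams always add up to the full K.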

 

Using the basic Elo formula, we can see just how outrageous some players' ratings were. First, let's assume a base (and average) rating of 1500 and a maximum rating change of 60 per game. The following table shows how often the system expects a player to win against average (1500-rated) opposition, based on their rating, along with the average impact of a win or a loss for that player. For example, a player with a rating of 1853 would be expected to win about 60 percent of his or her games and, on average, would gain about 24 rating points for a win and lose about 36 rating points for a loss.

 

Rating | Expected win rate | Average rating increase | Average rating decrease
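Rows for a table like this can be computed directly from the formula. A minimal sketch, assuming the player faces average (1500-rated) opposition, K = 60, and a logistic scale of 2000, which is the scale that reproduces the post's example (1853 gives about 60%, +24/-36) and its 999-in-1,000 figure for 7500. The sample ratings in the loop are illustrative choices, not the original table's rows.

```python
BASE = 1500.0   # base and average rating (from the post)
K = 60.0        # maximum rating change per game (from the post)
SCALE = 2000.0  # assumed: reproduces the post's figures (chess uses 400)


def expected_win_rate(rating: float, opponent: float = BASE, scale: float = SCALE) -> float:
    """Elo expected score against an opponent of the given rating."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))


def table_row(rating: float, k: float = K):
    """(rating, expected win rate, average gain per win, average loss per loss)."""
    p = expected_win_rate(rating)
    return rating, p, k * (1.0 - p), k * p


for r in (1500, 1853, 2500, 4000, 7500):
    rating, p, gain, loss = table_row(r)
    print(f"{rating:>6} | {p:7.1%} | {gain:5.1f} | {loss:5.1f}")
```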

 

Hopefully this makes it clear why it's incredibly unlikely that anyone would have a genuine Elo rating in the 7500+ range. Such a player would be expected to win 999 out of every 1,000 games, would gain almost nothing (on average) for a win, and would lose nearly the maximum for every loss. As a more extreme example, I know of at least one player with a rating around 22000. If that were an actual Elo rating, that player would be expected to lose only about once every 10 billion games played.
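The "once every N games" figures follow from the loss probability: 1 / P(loss) simplifies to 1 + 10^(difference / scale). A minimal sketch, again assuming a scale of 2000 and average (1500-rated) opposition; the function name is hypothetical. At exactly 7500 it gives 1,001 games per loss, and at 22000 roughly 1.8 × 10^10, the same order of magnitude as the post's figure.

```python
BASE = 1500.0
SCALE = 2000.0  # assumed scale, consistent with the post's 999-in-1,000 figure


def games_per_loss(rating: float, opponent: float = BASE, scale: float = SCALE) -> float:
    """Expected games per loss, 1 / P(loss), which simplifies to 1 + 10 ** (diff / scale)."""
    return 1.0 + 10 ** ((rating - opponent) / scale)


for r in (7500, 22000):
    print(f"rating {r}: roughly one loss every {games_per_loss(r):,.0f} games")
```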

 

For players with a very high rating, this should also help you understand why your rating will probably go down over time. You will get a very small rating increase when you win, while taking a significant rating decrease each time you lose. Unfortunately, the system can take a while to adjust. For example, if you were overrated by 6000 points when the system was changed, it will take at least 100 games before your rating adjusts, since each game can move it by at most 60 points (realistically, it would probably take somewhere between 300 and 500 games).
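A rough way to check these estimates is a deterministic sketch that applies the expected rating change each game to a player displayed at 7500 whose true skill is average (1500), playing average opposition. Assumptions: K = 60, scale 2000, and "adjusted" meaning within 100 points of true skill; real results are random, so actual game counts will vary. Because an average player still wins half the time, the expected drop per game is at most 30 points, so the count lands in the few-hundred-game range rather than at the 100-game floor.

```python
BASE = 1500.0
K = 60.0
SCALE = 2000.0  # assumed logistic scale


def expected_score(rating: float, opponent: float, scale: float = SCALE) -> float:
    """Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))


def games_to_settle(displayed: float, true_skill: float, opponent: float = BASE,
                    k: float = K, tolerance: float = 100.0, max_games: int = 10_000) -> int:
    """Games until the displayed rating is within `tolerance` of the player's true skill.

    Uses the expected rating change per game instead of random results: the player
    actually wins with probability expected_score(true_skill, opponent), but the
    system credits them as if their skill were `displayed`.
    """
    win_prob = expected_score(true_skill, opponent)
    rating = displayed
    games = 0
    while abs(rating - true_skill) > tolerance and games < max_games:
        rating += k * (win_prob - expected_score(rating, opponent))
        games += 1
    return games


print(games_to_settle(7500.0, 1500.0))
```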

 

For players who actually care about rating, this adjustment process could be pretty painful (e.g. months of playing where your rating pretty much only decreases regardless of whether you win or lose), meaning it might make more sense to ask the developers to reset all player ratings.


Damn, that dropon'd dude won 1 game/lost 1 and he's already a top 10 player. I've never seen anyone gain that much elo between 2 games in my life. That guy's a f'ing boss! Can he carry me to platinum for LoL? I mean, it should only take one or two games using this system's logic. Soede is ranked 98 loooooool. I always knew he was a scrub. This system totally represents his skill level. Juanjocho is a waaaay better player than Soede and now we finally have the proof to back up the claim! This excel file should be the new wallpaper for AoS. I would totally like seeing this over Kerrigan's infested ass any day. Kramer, this is your best work since Seinfeld. 10/10.



 

Are you talking about the NA IH rankings? Those are ages old now (and quite out of date). I was talking about the in-game rating system. Still, I'm happy to provide you with ammo that you can skew and/or misinterpret in order to prove your assumptions. :)

 

So is this up with the new patch?

 

If yes, I will delete my current stats (7000-8000) so I can play with the better ELO system.

 

Yes, this is as of the latest patch. You might want to hold off on deleting your stats, however, just in case the devs implement some adjustment, rather than resetting ratings (e.g. they could adjust all ratings above 2500 to 2500 and all ratings below 800 to 800, or something along those lines).
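The adjustment floated here amounts to a clamp. A minimal sketch; the function name is hypothetical, and the 800/2500 bounds are the example values from this reply, not anything the developers have announced.

```python
def adjust_legacy_rating(rating: float, floor: float = 800.0, ceiling: float = 2500.0) -> float:
    """Clamp an old-system rating into [floor, ceiling], leaving mid-range ratings untouched."""
    return max(floor, min(ceiling, rating))


for old in (24000.0, 7500.0, 1600.0, 300.0):
    print(old, "->", adjust_legacy_rating(old))
```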


What assumption? You just said it was wrong. Wouldn't that make it a fact? I did read the wall of text, btw, but the little tab at the bottom caught my eye, so I decided to check it out. It's kind of like those fireflies that run into the lantern, but I'm not the firefly =). If you were to do a poll on how effective your elo system is, or rather how efficient it is, I'm not sure the results would be positive. I believe these are the standard questions that have come up in my time spent with people behind your back over the last 6 months.

1) There's no matchmaking system, so how does this system help balance the game and what does it prove?

2) Aren't these numbers skewed, since most players disconnect or rage quit games?

3) When people quit a game after that little starting period and it turns into an imbalanced 2v5, does the system account for the unbalanced teams?

4) How do you account for pubstomping teams that inflate their ratings, when one member of the group then goes solo a few games later, gets wrecked, and gives a large amount of points to players who would normally be regarded as being at the lower end of the ELO system?

5) What do you say about the rumors that suggest you rigged the system to favor your friends while promoting a European bias towards "scrubs" like Soede and Chob?


Yes, this is as of the latest patch. You might want to hold off on deleting your stats, however, just in case the devs implement some adjustment, rather than resetting ratings (e.g. they could adjust all ratings above 2500 to 2500 and all ratings below 800 to 800, or something along those lines).

 

So if, for example, I have a 24000 rating with a W/L ratio of 4, my rating should change to 2455? And another question: even if ratings change, will the rest of the bank files remain the same? K/D/A, W/L, talents and item builds?


Rating system posted at start of topic...

 

The system above is inherently flawed because it assumes that people's ratings based on game wins and losses can be used to determine the chance that a team will win, but this simply isn't true, because whether a team wins or loses a game is determined by teamwork and individual player skill and performance (neither of which is captured in your rating system). Just because someone has won or lost a lot of games does not mean they are good or bad or have good or bad teamwork; it could easily mean they had good or bad allies in every game. No correlation can be drawn from past games won or lost to the chance of winning the present game, because no conclusions about teamwork or player skill can be drawn from past wins and losses. Basing the rating system entirely on games won and lost is totally meaningless for this reason. You cannot draw conclusions about which team of five will win without looking at the skill levels of the individual players making up the teams.

 

MORE IMPORTANTLY THIS RATING SYSTEM IS NOT AN EFFECTIVE ELO SYSTEM!!!!

 

ELO by definition is a system meant to calculate the relative skill levels of players (which is mentioned in the article you linked as well). At no point will the system listed above actually capture a player's skill level, since everything is based on TEAM wins and losses (the performance of 5 players as a whole). No attempt is made in the system to find the skill level of the individual player, because it completely ignores individual performance.

 

An ideal ELO rating for AOS would be based largely on average KDA per game, as well as things like overall denies and tower assists per game (assuming tower assists are actually fixed in the bank files), since it would need to count the individual player's performance in games to have any meaning whatsoever as a skill metric. (Creep last hitting is technically also a skill factor, but it's pretty much impossible to draw any conclusions about a person's last-hitting skill from bank data, since games do not have a set time limit and different heroes have different CC rates, making it impossible to compare on a per-game basis.)

 

You can factor in wins and losses on top of that to try to make assumptions on teamwork, but it really won't add much meaning because you would have to assume:

lots of wins with a good KDA etc means good teamwork (even though it could just mean good allies)

lots of losses with a good KDA etc means they got dicked over by bad teammates

lots of wins with a bad KDA etc means they got carried

lots of losses with a bad KDA etc means they fed or played bad or have bad teamwork (even though it could just mean bad allies)

 

Many of you will be able to point out cases where those statements are not true. The truth is it's very hard to make assumptions about teamwork from bank data, even with all the other stats factored in, because we can't know things like whether they save allies or run off, whether they gank, initiate or lane well, and what role they fill and how well they fill it. So while teamwork is an important aspect of gameplay, we cannot reliably measure it; we can only make vague guesses at it.

 

The only thing we can do is make judgement on their past performance:

I can easily conclude that someone who has 5X as many kills/assists as deaths per game, lots of tower assists per game, and lots of denies overall is probably a good player. Likewise, I can conclude that someone who has 5X as many deaths as kills/assists per game, very few tower assists per game, and no denies overall is probably a bad player.

 

YES, KDA spreads are very much subject to hero role etc., but they still give a better sense of a player's overall skill than team wins and losses. On average a good player will not feed in games and will have way more kills/assists than deaths, because they are smart enough not to get out of position and to watch for ganks. They will also deny more and help take out towers more. (They will also work better with their team and last hit more, but as mentioned those parts are hard to track.)

 

On average a bad player will feed far more than they kill or assist, will not help with towers much, is unlikely to deny, will not be as good at last hitting, and will not work with their team. (Again, last hitting and teamwork are very hard to track.)

 

If ratings are based on these factors, it is easy to conclude that the team with better players will probably win, and thus ELO can easily be used to do things like balance teams. With the current rating system that would be impossible, because the value has no meaning and does not in any way reflect a person's skill relative to other players.

 

 

CHANGES NEEDED TO BANK FILES:

 

You need to fix it so that bank files actually record towers-taken data (it would have to be based on proximity when a tower dies), so that this can be used as part of determining a person's contribution to games (average towers taken per game).

You could consider adding games raged (left early) to the bank files, although it won't actually help with ELO.

 

This will allow you to draw more conclusions about people


The system above is inherently flawed because it assumes that people's ratings based on game wins and losses can be used to determine the chance that a team will win, but this simply isn't true, because whether a team wins or loses a game is determined by teamwork and individual player skill. Just because someone has won or lost a lot of games does not mean they are good or bad or have good or bad teamwork; it could easily mean they had good or bad allies in every game. No correlation can be drawn from past games won or lost to the chance of winning the present game, because no conclusions about teamwork or player skill can be drawn from past wins and losses. It is interesting that you try to skew the numbers so that consecutive wins or losses do not have the same effect, but even so it won't change the fact that wins and losses mean nothing in terms of individual skill.

 

MORE IMPORTANTLY THIS IS NOT AN ELO SYSTEM!!!!

 

ELO by definition is a system meant to calculate the relative skill levels of players. At no point will the system listed actually capture a player's skill, since everything is based on TEAM wins and losses (the performance of 5 players as a whole) and no attempt is made to find the skill level of the individual player.

 

A good ELO would be based largely on KDA and games played, as well as things like denies and towers (maybe creep too), since it would need to count contribution to games to have any meaning whatsoever as an individual skill metric.

 

You can factor in wins and losses on top of that to try to make assumptions on teamwork, but it really won't add much meaning because you would have to assume:

lots of wins with a good KDA etc means good teamwork (even though it could just mean good allies)

lots of losses with a good KDA etc means they got dicked over by bad teammates

lots of wins with a bad KDA etc means they got carried

lots of losses with a bad KDA etc means they fed or played bad or have bad teamwork (even though it could just mean bad allies)

 

Many of you will be able to point out cases where those statements are not true. It's very hard to make assumptions about teamwork from bank data, even with all the other stats factored in, because we have no way to know whether they run off on their own or are just good at getting the killing shot, and we can't know other teamwork things like whether they save allies or initiate well (assists don't necessarily denote teamwork either, because they could just be nearby but not helping). So while teamwork is an important aspect of gameplay, we cannot reliably measure it for use in determining player skill; all we can use is the individual player's performance.

 

There is nothing wrong with using performance as a skill metric though:

I can easily conclude that someone who has 5X as many kills/assists as deaths per game and lots of denies is probably a good player.

Likewise, I can also conclude that someone who has 5X as many deaths as kills/assists per game and no denies is probably a bad player.

(I would factor in creep, since last hitting is a skill factor as well, but it's hard to draw conclusions from creep counts in bank files, since you don't know how long a game went on and therefore cannot put it on a scale. I would also use towers taken as part of it, but at present those are not recorded.)

 

YES, KDA spreads are very much subject to hero role etc., but they still give a better sense of a player's overall skill than just wins and losses. On average a good player will not feed in games and will have way more kills/assists than deaths. They will also deny more.

 

So yeah, the rating system should be totally reworked if it is meant to be used for ELO, because you can't learn anything from wins and losses alone.

 

Also you need to add a metric for games left early (rage) to the bank files

And fix it so that bank files actually record towers taken data (It would have to be based on proximity when tower dies)

 

This will allow you to draw more conclusions about people

 

k go start coding this elo and let us know when u have it finished


What assumption? You just said it was wrong. Wouldn't that make it a fact? I did read the wall of text, btw, but the little tab at the bottom caught my eye, so I decided to check it out. It's kind of like those fireflies that run into the lantern, but I'm not the firefly =). If you were to do a poll on how effective your elo system is, or rather how efficient it is, I'm not sure the results would be positive. I believe these are the standard questions that have come up in my time spent with people behind your back over the last 6 months.

1) There's no matchmaking system, so how does this system help balance the game and what does it prove?

2) Aren't these numbers skewed, since most players disconnect or rage quit games?

3) When people quit a game after that little starting period and it turns into an imbalanced 2v5, does the system account for the unbalanced teams?

4) How do you account for pubstomping teams that inflate their ratings, when one member of the group then goes solo a few games later, gets wrecked, and gives a large amount of points to players who would normally be regarded as being at the lower end of the ELO system?

5) What do you say about the rumors that suggest you rigged the system to favor your friends while promoting a European bias towards "scrubs" like Soede and Chob?

 

The in-game system, which was the subject of my post, was not created or fixed by me. I'm not sure why people would be talking behind my back about a system in which I had no involvement. I also never suggested that the new rating will be perfect or even a true reflection of player skill. However, it will be an improvement on the previous system and it will be closer to a true reflection of player skill, and my post simply explains what that means.

 

 

 

So if, for example, I have a 24000 rating with a W/L ratio of 4, my rating should change to 2455? And another question: even if ratings change, will the rest of the bank files remain the same? K/D/A, W/L, talents and item builds?

 

If we went by my suggestion, your rating would be changed to 2500. I'm sure the ratings could be changed or reset without resetting the rest of the information in the bank files.

 

 

 

The system above is inherently flawed because it assumes that people's ratings based on game wins and losses can be used to determine the chance that a team will win, but this simply isn't true, because whether a team wins or loses a game is determined by teamwork and individual player skill. Just because someone has won or lost a lot of games does not mean they are good or bad or have good or bad teamwork; it could easily mean they had good or bad allies in every game. No correlation can be drawn from past games won or lost to the chance of winning the present game, because no conclusions about teamwork or player skill can be drawn from past wins and losses. It is interesting that you try to skew the numbers so that consecutive wins or losses do not have the same effect, but even so it won't change the fact that wins and losses mean nothing in terms of individual skill.

 

Actually, you are wrong. Once a system like the one I have described has incorporated a reasonable number of observations (i.e. games), it will reflect individual player contribution to the likelihood that a team will win. The best way to estimate that contribution is based on the only objective outcome that really matters--wins and losses. Further, the system accounts for the expected contribution of allies and opponents. As such, it really is a reflection of an individual's ability to influence the chance that their team will win or lose. As for me skewing numbers, I haven't skewed anything. Perhaps you need to brush up a bit on your math (or on Bayesian inference)...

 

MORE IMPORTANTLY THIS IS NOT AN ELO SYSTEM!!!!

 

ELO by definition is a system meant to calculate the relative skill levels of players. At no point will the system listed actually capture a player's skill, since everything is based on TEAM wins and losses (the performance of 5 players as a whole) and no attempt is made to find the skill level of the individual player.

 

Sorry, wrong again. Maybe you should reread my post (or look at the Wikipedia article I linked). This is an Elo system and it's quite similar to those used in other competitive team settings (e.g. LoL). I didn't go into detail about how individual player ratings are combined to form the team average because that's pretty straightforward (and my post was already long enough).

 

A good ELO would be based largely on KDA and games played, as well as things like denies and towers (maybe creep too), since it would need to count contribution to games to have any meaning whatsoever as an individual skill metric.

 

Sorry, but there is no way an Elo system would be based on KDA unless it was trying to predict KDA. Since the point is to predict wins, the variable of interest has to be wins.

 

You can factor in wins and losses on top of that to try to make assumptions on teamwork, but it really won't add much meaning because you would have to assume:

lots of wins with a good KDA etc means good teamwork (even though it could just mean good allies)

lots of losses with a good KDA etc means they got dicked over by bad teammates

lots of wins with a bad KDA etc means they got carried

lots of losses with a bad KDA etc means they fed or played bad or have bad teamwork (even though it could just mean bad allies)

 

Many of you will be able to point out cases where those statements are not true. It's very hard to make assumptions about teamwork from bank data, even with all the other stats factored in, because we have no way to know whether they run off on their own or are just good at getting the killing shot, and we can't know other teamwork things like whether they save allies or initiate well (assists don't necessarily denote teamwork either, because they could just be nearby but not helping). So while teamwork is an important aspect of gameplay, we cannot reliably measure it for use in determining player skill; all we can use is the individual player's performance.

 

Wait, somehow that is supposed to make the system more objective? At the end of the day, what really counts is a player's ability to contribute to a victory by their team. Of course there will be games where a player will get an undeserved win. However, the beauty of an Elo system is that it's self correcting. If a player's rating is inflated because they won a game they didn't deserve to win, over time their rating will revert to a more accurate reflection of their ability to contribute to their team winning. Trying to incorporate subjective, noisy garbage into a rating designed to predict player ability to contribute to team victory would only make the system worse.

 

There is nothing wrong with using performance as a skill metric though:

I can easily conclude that someone who has 5X as many kills/assists as deaths per game and lots of denies is probably a good player.

Likewise, I can also conclude that someone who has 5X as many deaths as kills/assists per game and no denies is probably a bad player.

(I would factor in creep, since last hitting is a skill factor as well, but it's hard to draw conclusions from creep counts in bank files, since you don't know how long a game went on and therefore cannot put it on a scale. I would also use towers taken as part of it, but at present those are not recorded.)

 

YES, KDA spreads are very much subject to hero role etc., but they still give a better sense of a player's overall skill than just wins and losses. On average a good player will not feed in games and will have way more kills/assists than deaths. They will also deny more.

 

So yeah, the rating system should be totally reworked if it is meant to be used for ELO, because you can't learn anything from wins and losses alone.

 

Also you need to add a metric for games left early (rage) to the bank files

And fix it so that bank files actually record towers taken data (It would have to be based on proximity when tower dies)

 

This will allow you to draw more conclusions about people

 

Look, you don't appear to understand how an Elo system works. I'd be happy to help you understand, but you need to stop assuming you understand and actually do a bit of reading (or just ask some clarifying questions rather than spouting off nonsense).


There's no such thing as an Elo rating in a multi-player game where each player has their own individual rating. Elo ratings are meant to gauge skill either in a 1v1 setup or between two static teams (team members are always the same). There are various rating systems for measuring individual players in variable team setups, but they aren't Elo systems. What we probably need for AoS is a TrueSkill system.


K/D/A is not a good metric of skill.

 

A player can potentially acquire a good K/D/A by playing like a total loner and not assisting their team at all where it is necessary. An example would be somebody who is never there for teamfights and allows their entire team to die before charging in for a couple of kills.

 

A player can also have a bad K/D/A while actually being a good player. K/D/A doesn't track things like how many bosses or towers you killed, or how many allies you saved through various means. Maybe an enemy gets fed and your job as Brine is to rush in and pull that player, and you might die every time, but as long as you are fulfilling the role that your team needs you to, you are a good player.


There's no such thing as an Elo rating in a multi-player game where each player has their own individual rating. Elo ratings are meant to gauge skill either in a 1v1 setup or between two static teams (team members are always the same). There are various rating systems for measuring individual players in variable team setups, but they aren't Elo systems. What we probably need for AoS is a TrueSkill system.

 

It's not the elo of the player; it's a factor of their own elo, which goes into the system and is averaged into a total elo for the team, which is then used to calculate the game differences and such.


There's no such thing as an Elo rating in a multi-player game where each player has their own individual rating. Elo ratings are meant to gauge skill either in a 1v1 setup or between two static teams (team members are always the same). There are various rating systems for measuring individual players in variable team setups, but they aren't Elo systems. What we probably need for AoS is a TrueSkill system.

 

Not quite. That is how Elo originally created the rating system; however, it can be and has been extended beyond 1v1 and static teams (I already cited LoL as an example). The reason it works is that the Elo system is based on Bayesian inference, which can be extended beyond a 1v1 or static team setting. At a theoretical level, TrueSkill is very similar to Elo. The main difference between the two is how they characterize uncertainty about player ability. Elo's system uses a k-value which decreases as you play more games (basically, the more you play, the less your rating is adjusted with each game, reflecting greater certainty about your skill level), whereas TrueSkill uses sigma, which is based on variance and also decreases as you play more games. In other words, an Elo system is completely appropriate here, and TrueSkill, mechanically, would give us a similar level of accuracy. Further, besides the fact that it wouldn't result in much more accurate ratings (if at all), TrueSkill would likely require an impractical amount of data to be stored in player bank files (not to mention the fact that TrueSkill is proprietary and hasn't been publicly revealed, so we would have to make some educated guesses in the calculation).
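The k-value idea can be sketched as a schedule that shrinks with games played. This is an illustration only: the starting value of 60, the halving after 50 games and the floor of 15 are made-up parameters, not values used by the map or by any chess federation. (TrueSkill instead keeps an explicit per-player uncertainty, sigma, alongside the skill estimate.)

```python
def k_factor(games_played: int, k_start: float = 60.0, k_min: float = 15.0,
             half_life: float = 50.0) -> float:
    """Hypothetical schedule: full K for a new player, halved after `half_life`
    games, and never below `k_min`."""
    return max(k_min, k_start / (1.0 + games_played / half_life))


def rating_change(games_played: int, won: bool, expected: float) -> float:
    """Elo update step: K times (actual result minus expected result)."""
    actual = 1.0 if won else 0.0
    return k_factor(games_played) * (actual - expected)


# The same 50/50 win moves a newcomer twice as far as a 50-game veteran.
print(rating_change(0, True, 0.5), rating_change(50, True, 0.5))  # 30.0 15.0
```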


Actually, you are wrong. Once a system like the one I have described has incorporated a reasonable number of observations (i.e. games), it will reflect individual player contribution to the likelihood that a team will win. [See below: not true.] The best way to estimate that contribution is based on the only objective outcome that really matters--wins and losses. [See below: not true.] Further, the system accounts for the expected contribution of allies and opponents. [Not accurately.] As such, it really is a reflection of an individual's ability to influence the chance that their team will win or lose. [In its present form, it's not.]

 

...

Wait, somehow that is supposed to make the system more objective? [YES.] At the end of the day, what really counts is a player's ability to contribute to a victory by their team. Of course there will be games where a player will get an undeserved win. However, the beauty of an Elo system is that it's self correcting. If a player's rating is inflated because they won a game they didn't deserve to win, over time their rating will revert to a more accurate reflection of their ability to contribute to their team winning. [There are readily available ways to improve the accuracy.] Trying to incorporate subjective, noisy garbage into a rating designed to predict player ability to contribute to team victory would only make the system worse. [Not garbage: performance data.]

 

 

Game wins or losses are decided by the combined individual performance of each team member and the teamwork between members, relative to another team. One person does not win a game, and a game can be lost even if that person does everything right and has flawless teamwork and performance. You cannot accurately predict a player's ability to contribute to future team victories based purely on past wins and losses, because by definition past wins and losses were decided at the team vs. team level and are meaningless and subjective at the individual level. It does not matter how many observations are used or how often you adjust your rating guesses: the system will never be accurate at predicting individual contribution to victory if it only accounts for past wins and losses, because the individual is not necessarily responsible for any wins or losses that occur, and their actual contribution to a win or loss can only be seen through performance data. Because the ratings themselves are not an accurate representation of each player's relative skill, adding them up to predict who should win and then adjusting ratings accordingly will not work properly, since again no correlation with actual performance is made. Simply put, the rating system has to take past performance data into account if it is going to predict contribution to future games accurately enough to work as an ELO metric.

 

These are my thoughts on how that might be done:

 

At a high level, you can roughly gauge a player's individual skill and teamwork from their average KDA per game, the average number of towers per game they help kill, their average denies per game, and how well they last hit compared to everyone else in a given game (which can't be used, for reasons explained below).

 

You are correct that these values can depend on hero and role, but not when you look at them at a high level, since all that really matters is net gain to the team (contribution to victory). Towers are the most important metric, since the game is won by killing towers. KDA-wise, at the highest level what matters is whether you got enough kills and assists to balance out the number of times you died in a game (did you feed or not?). If you get far more kills/assists than deaths on average, you are beneficial to your team in most games. If you get far more deaths than kills/assists on average, you are detrimental to your team in most games. Taking towers and denying creep can mitigate feeding, or add extra value to the team if you weren't feeding, and should be factored into the assessment of your average net gain or loss for the team. Denying creep costs the enemy team minerals and makes it easier for your team to win, making it a valuable skill to factor in when finding a player's relative skill level. Unfortunately, since games can last over 2 hours or be over in minutes, and every hero CCs differently, it's impossible to figure out from bank data how well someone last hits compared to other players, so we have to throw that metric out. This also limits the weight that can be applied to denies, but many players do not deny at all, so denies are still useful when comparing relative skill.
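A minimal sketch of turning these metrics into one per-game "net gain" number. It takes one simple reading of "did you feed or not" as (kills + assists) − deaths, weights towers highest and denies lightly as argued above, and leaves last hits out. The weights, `GameStats`, and `net_contribution` are hypothetical placeholders, not tuned or official values.

```python
from dataclasses import dataclass


@dataclass
class GameStats:
    kills: int
    deaths: int
    assists: int
    tower_assists: int
    denies: int


# Hypothetical weights: towers count most and denies get a small weight,
# following the reasoning above. The numbers are placeholders, not tuned values.
KDA_WEIGHT = 1.0
TOWER_WEIGHT = 3.0
DENY_WEIGHT = 0.1


def net_contribution(game: GameStats) -> float:
    """Per-game 'net gain to the team'. Last hits are deliberately left out."""
    net_kda = (game.kills + game.assists) - game.deaths  # positive = not feeding
    return (KDA_WEIGHT * net_kda
            + TOWER_WEIGHT * game.tower_assists
            + DENY_WEIGHT * game.denies)


def average_contribution(games):
    return sum(net_contribution(g) for g in games) / len(games)


carry = GameStats(kills=5, deaths=2, assists=7, tower_assists=3, denies=20)
feeder = GameStats(kills=1, deaths=9, assists=2, tower_assists=0, denies=0)
print(net_contribution(carry))   # 21.0
print(net_contribution(feeder))  # -6.0
```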

 

To track towers, the game needs a change that makes use of the tower slot already in the bank file: when a tower is killed, add +1 to every hero in proximity (so really it would track tower assists).
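The in-game version would live in the map's own trigger/script code, but the logic is simple enough to sketch in Python. The radius, the `Hero` type, and the rule that only the team that killed the tower gets credit are all assumptions for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass

ASSIST_RADIUS = 12.0  # hypothetical "proximity" range in map units


@dataclass
class Hero:
    player: str
    team: int
    x: float
    y: float


def credit_tower_kill(tower_x, tower_y, killing_team, heroes, tower_assists):
    """Add +1 tower assist to every hero on the killing team within ASSIST_RADIUS."""
    for hero in heroes:
        if hero.team != killing_team:
            continue  # assumption: defenders standing nearby get no credit
        if math.hypot(hero.x - tower_x, hero.y - tower_y) <= ASSIST_RADIUS:
            tower_assists[hero.player] += 1


tower_assists = Counter()
heroes = [
    Hero("A", 1, 0.0, 0.0),
    Hero("B", 1, 5.0, 5.0),
    Hero("C", 1, 30.0, 0.0),   # too far away
    Hero("D", 2, 1.0, 1.0),    # enemy hero
]
credit_tower_kill(0.0, 0.0, 1, heroes, tower_assists)
print(dict(tower_assists))  # {'A': 1, 'B': 1}
```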

 

These metrics would be far more accurate than team wins and losses alone at figuring out a player's potential contribution to team victory. If you get far more kills/assists than deaths, help take a lot of towers each game, and deny heavily, you are far more likely to contribute to your team's victory than someone who dies far more than they kill/assist, never helps with towers, and never denies.

 

Technically, to be truly accurate you would want to look at trends instead of just historical data. It's far more useful to look at your last 50-100 games for these metrics than at your overall record, since you have probably gotten better at the game over time. At present I do not have a good solution that addresses this easily, but I will post when I do. Either way, this will still be more accurate than basing things purely on wins and losses, because it allows you to predict individual contribution despite random teams.
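A rolling window is one simple way to act on the "last 50-100 games" idea. This sketch assumes each game has already been reduced to a single score (for example, a per-game net contribution number); the class name and default window are illustrative. It does not detect trends by itself; it just forgets old games so that improvement shows up.

```python
from collections import deque


class RecentForm:
    """Average of a player's per-game scores over only their most recent games."""

    def __init__(self, window=50):
        # deque with maxlen silently drops the oldest game once the window is full
        self.scores = deque(maxlen=window)

    def add_game(self, score):
        self.scores.append(score)

    def average(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


form = RecentForm(window=3)
for score in [10, 0, 0, 0]:  # one great early game, then three poor ones
    form.add_game(score)
print(form.average())  # 0.0, the early game has aged out
```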

 

Look, you don't appear to understand how an Elo system works. I'd be happy to help you understand, but you need to stop assuming you understand and actually do a bit of reading (or just ask some clarifying questions rather than spouting off nonsense).

 

I am particularly curious about your research into TrueSkill, as it sounds like a fascinating system. However, I am not really spouting nonsense, just statistical and analytical theory. The system can guess all it wants about player contribution and try to get more accurate over time, but it will never be right if it only factors in team wins and losses, because AOS has purely random teams in every match and no team balancing or matchmaking.

 

You referenced LOL as having a similar ELO, but LOL uses individual player performance data as part of its ELO; it is NOT based purely on team wins and losses. It's also important to note that, unlike AOS, LOL does matchmaking and tries to pair up even teams (teams are balanced so that neither has better than 55% odds to win). This makes winning and losing a much more accurate representation of skill than it would be in AOS, where you get stuck with whoever enters the lobby.
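Under the team-average Elo model the opening post uses, a 55% cap works out to the two teams' average ratings being within about 35 points of each other (400 × log10(0.55/0.45) ≈ 34.9). A sketch of such a check with the threshold as a parameter; this is not LOL's actual matchmaking, and the function names are made up.

```python
def team_win_probability(team_a, team_b):
    """Elo expectation for team_a, comparing average ratings on the 400-point scale."""
    avg_a = sum(team_a) / len(team_a)
    avg_b = sum(team_b) / len(team_b)
    return 1.0 / (1.0 + 10 ** ((avg_b - avg_a) / 400))


def is_balanced(team_a, team_b, max_odds=0.55):
    """True if neither team is favoured beyond max_odds."""
    p = team_win_probability(team_a, team_b)
    return max(p, 1.0 - p) <= max_odds


print(is_balanced([1530] * 5, [1500] * 5))  # True  (about 54%)
print(is_balanced([1540] * 5, [1500] * 5))  # False (about 56%)
```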

 

I would very much like our rating system to have meaning and be somewhat accurate, so that at some point we might be given the option to auto-balance teams based on it in pubs (to try to mitigate totally one-sided games). That is why I am arguing this point.
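If ratings do become meaningful, a pub auto-balance option could be as simple as trying every split of the lobby. A brute-force sketch, assuming "balanced" means the smallest difference in total rating (equivalent to average rating when both teams have the same size); `balance_teams` and the sample lobby are hypothetical.

```python
from itertools import combinations


def balance_teams(ratings):
    """Split an even-sized lobby into two equal teams with the smallest rating gap.

    `ratings` maps player name -> rating. Brute force is fine for a 10-player
    lobby: only C(10, 5) = 252 splits, half of them mirror images.
    """
    names = list(ratings)
    if len(names) % 2:
        raise ValueError("need an even number of players")
    half = len(names) // 2
    best = None
    for team_a in combinations(names, half):
        if names[0] not in team_a:
            continue  # skip mirror-image splits
        team_b = [n for n in names if n not in team_a]
        gap = abs(sum(ratings[n] for n in team_a) - sum(ratings[n] for n in team_b))
        if best is None or gap < best[0]:
            best = (gap, list(team_a), team_b)
    return best[1], best[2]


lobby = {"a": 1800, "b": 1700, "c": 1600, "d": 1500, "e": 1400, "f": 1300}
team_a, team_b = balance_teams(lobby)
print(team_a, team_b)  # ['a', 'c', 'f'] ['b', 'd', 'e']
```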


So you're a math student, or do you work in any profession that involves statistical analysis?

Second one

 

And I like ELOs; I am just pointing out that ours will not work right. (I made some edits on the other page to make this clearer after you replied.)

 

I have mentioned multiple times that:

 

Game wins and losses are decided by the combined individual performance of each team member, and by the teamwork between members, relative to the other team. One person does not win a game, and a game can be lost even if that person does everything right and has flawless teamwork and performance. You cannot accurately predict a player's ability to contribute to future team victories based purely on past wins and losses, because past wins and losses are by definition decided at the team vs. team level and are meaningless and subjective at the individual level. It does not matter how many observations are used or how often you adjust your rating guesses: the system will never accurately predict an individual's contribution to victory if it only accounts for past wins and losses, because the individual is not necessarily responsible for any win or loss that occurs, and their actual contribution can only be seen through performance data. Because the ratings themselves do not accurately represent each player's relative skill, adding them up to predict who should win, and then adjusting ratings after the game, will not work properly, since nothing ties the ratings to actual performance. Simply put, the rating system has to take past performance data into account if it is going to predict contribution to future games and produce ratings that are actually accurate as an ELO metric.

 

 

I also provided metrics to use in gauging relative skill level based on past performance to use to make ratings more effective.

And I pointed out how LOL uses performance data as part of its ELO, and also uses both ratings and a level system in matchmaking to set up balanced games where each team has an equal chance to win, whereas we rely entirely on random chance.

 

I would very much like our rating system to actually have meaning, which is why I took the time to write out a way to make it better on the last page. If ratings actually denote relative skill level, we may finally be able to implement an auto-balance teams option in pubs someday.


Your point is moot because, as I explained thoroughly in my post, K/D/A is not a useful metric. Additionally, if somebody frequently plays with the same team of better (or worse) players, then the ELO will reflect that accurately. ELO is used to measure expected win/loss rates, and if on any given day you would expect that same team to be playing, then ELO has done its job.

 

Edit: To reword what I am saying above,

 

If you are suggesting that ELO will be influenced by the players somebody regularly teams up with, then you're absolutely correct. But this just means that the ELO is accounting for the fact that they regularly play with good players, so it will still more accurately estimate whether they will win or lose. This doesn't just apply to one team you play with; it applies to any statistical variance in the skill level of the players you join up with. If you are more likely to end up on a team with good players for any reason, the ELO will reflect it.


Your point is moot because, as I explained thoroughly in my post, K/D/A is not a useful metric. Additionally, if somebody frequently plays with the same team of better (or worse) players, then the ELO will reflect that accurately. ELO is used to measure expected win/loss rates, and if on any given day you would expect that same team to be playing, then ELO has done its job.

 

First, that pretty much means that AOS ELO will only be accurate-ish in IH and is totally worthless in pubs, whereas the changes I proposed would be accurate in both. That is what I kept pointing out in the first place: ELO based purely on wins and losses is only accurate if teams are mostly static or players are of similar skill levels. With random teams, individual performance has to be taken into account.

 

To be fair, even if you play with the same team against the same team over and over, the system will not be able to tell who is actually contributing the most to victory without consulting individual performance in each game.
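This point can be checked against the simplest team Elo variant, where every member receives the team's rating change (an assumption; AOS may split it differently). Whatever the results, everyone on a fixed team moves by the same amount, so the rating gaps between teammates never change and the system cannot tell who carried. `play_series` is a hypothetical helper.

```python
def play_series(team, opponents, results, k=60):
    """The same five play the same opponents repeatedly; ratings use team-average Elo,
    and every member of a team receives the same change after each game."""
    team, opponents = list(team), list(opponents)
    for won in results:
        avg_t = sum(team) / len(team)
        avg_o = sum(opponents) / len(opponents)
        expected = 1.0 / (1.0 + 10 ** ((avg_o - avg_t) / 400))
        delta = k * ((1.0 if won else 0.0) - expected)
        team = [r + delta for r in team]
        opponents = [r - delta for r in opponents]
    return team


start = [1600, 1500, 1500, 1500, 1400]
final = play_series(start, [1500] * 5, [True, True, False, True, True])
# Everyone moved by the same total amount, so the 100-point gaps are unchanged
print([round(r - s) for r, s in zip(final, start)])
```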

 

Even if you swapped people out one at a time, you still wouldn't be sure.

 

Blizzard uses wins and losses on the ladder because your teammates don't change.

 

LOL uses a completely different system to try to give wins and losses meaning, by pairing up teams of similar skill.

But on top of that, it also takes personal performance into account.

 

 

Second:

 

You did not post about KDA in this thread, but KDA is useful as a metric if looked at on a per-game basis.

 

QUOTE FROM LAST PAGE:

At a high level, you can roughly gauge a player's individual skill and teamwork from their average KDA per game, the average number of towers per game they help kill, their average denies per game, and how well they last hit compared to everyone else in a given game (which can't be used, for reasons explained below).

 

You are correct that these values can depend on hero and role, but not when you look at them at a high level, since all that really matters is net gain to the team (contribution to victory). Towers are the most important metric, since the game is won by killing towers. KDA-wise, at the highest level what matters is whether you got enough kills and assists to balance out the number of times you died in a game (did you feed or not?). If you get far more kills/assists than deaths on average, you are beneficial to your team in most games. If you get far more deaths than kills/assists on average, you are detrimental to your team in most games. Taking towers and denying creep can mitigate feeding, or add extra value to the team if you weren't feeding, and should be factored into the assessment of your average net gain or loss for the team. Denying creep costs the enemy team minerals and makes it easier for your team to win, making it a valuable skill to factor in when finding a player's relative skill level. Unfortunately, since games can last over 2 hours or be over in minutes, and every hero CCs differently, it's impossible to figure out from bank data how well someone last hits compared to other players, so we have to throw that metric out. This also limits the weight that can be applied to denies, but many players do not deny at all, so denies are still useful when comparing relative skill.

 

To track towers, the game needs a change that makes use of the tower slot already in the bank file: when a tower is killed, add +1 to every hero in proximity (so really it would track tower assists).

 

These metrics would be far more accurate than team wins and losses alone at figuring out a player's potential contribution to team victory. If you get far more kills/assists than deaths, help take a lot of towers each game, and deny heavily, you are far more likely to contribute to your team's victory than someone who dies far more than they kill/assist, never helps with towers, and never denies.

 

Technically, to be truly accurate you would want to look at trends instead of just historical data. It's far more useful to look at your last 50-100 games for these metrics than at your overall record, since you have probably gotten better at the game over time. At present I do not have a good solution that addresses this easily, but I will post when I do. Either way, this will still be more accurate than basing things purely on wins and losses, because it allows you to predict individual contribution despite random teams.
