Clustering Football Players by Using FIFA 19 Data

A comprehensive machine learning project to explore football skills and cluster football players based on their attributes. The source code of this project can be accessed via Kaggle.

Introduction

FIFA 19 is a football video game developed by Electronic Arts and released on September 28th, 2018 on platforms such as PC, PlayStation and Xbox [1]. To provide a realistic and immersive game experience, Electronic Arts creates digital representations of football players that match their real-world skills as closely as possible. Each player is rated, mostly between 0 and 100, on different features that characterize his playing skills, such as shot power, short passing and tackling. According to an interview with Michael Mueller-Moehring, one of the Electronic Arts producers in charge of rating players in FIFA games, player abilities are determined by a network of more than 9000 data reviewers (coaches, professional scouts and football fans who actively visit stadiums to watch many footballers play) together with approximately 300 editors working for Electronic Arts. The editors help Mueller-Moehring assign rating values to the players in the game by using the data reviewers’ feedback and stats from real football matches [2].

The data generated for FIFA 19, with nearly 18,000 players and millions of data points, forms a rich dataset from which data scientists can extract insights. The aim of this project is to cluster football players who play in forward positions in FIFA 19, such as strikers, forwards and wingers, based on their skills and attributes. Fulfilling this aim, even to some extent, can contribute to the following areas:

  • Football specialists in the industry, such as managers and scouts, can use this analysis as a first step to find and target new players whose skills and attributes match their expectations. Managers can also use the similarities within clusters and the differences between them to form different line-ups and tactics. Since the FIFA 19 ratings are generated by experts in the field based on stats and reviews, analyses of this dataset are relevant not only to the game but also to the real world.
  • Gamers who play FIFA 19 modes such as Career and FUT, which simulate creating and managing a team, can use this analysis to enhance their gaming experience, similar to the real-world example above.

Data

The FIFA 19 dataset used in this project is publicly available at Kaggle [3].

The original dataset has been filtered in terms of features and samples in line with the scope and aim of the project. The relevant features are selected at the beginning, and players who do not play in forward positions according to FIFA’s criteria are eliminated. Some of the features are used only in the Exploratory Data Analysis (preprocessing) and Results stages to get a better understanding of the data, determine the most distinctive features for players and discuss the outcomes. Other features are used in Modelling as well as in exploration and discussion. The features are listed below by category. In the scope of this project, the features from Name up to Position are called personal information, the features from ST up to LW are called positional skills, and the rest are called playing skills. The explanations of the features are obtained from [4]. The attributes used in the project are:

  • Name: Name of the player. Used only in Results step.
  • Age: Age of the player.
  • Height: Height of the player in inches (transformed to centimeters in preprocessing).
  • Overall: General performance quality and value of the player, representing his key positional skills and international reputation, rated between 1-99. The Overall attribute is used only in the preprocessing and discussion stages because using it in modelling could let this feature dominate the clustering. The aim of the project is not simply to sort and categorize the players by their overall talent and international reputation, but to cluster them based on their whole skillset.
  • Potential: Maximum Overall rating a player is expected to reach at the peak of his career, rated between 1-99.
  • PreferredFoot: Right or Left. Label encoder is applied as 0 for left and 1 for right.
  • WeakFoot: Represents how well a player uses his weak foot (e.g. left for righties), rated from 1 to 5.
  • WorkRate: Degree of effort the player puts into attack and defense, rated as low, medium or high. This feature is split into two new features, AttackWorkRate and DefenseWorkRate, and each is encoded as 0 for low, 0.5 for medium and 1 for high.
  • Position: Position of the player on the pitch, which determines his role and responsibilities in the team. Forward positions in football and in FIFA 19 can be grouped as striker (ST: center striker, RS: right striker, LS: left striker), forward (CF: center forward, RF: right forward, LF: left forward) and winger (RW: right winger, LW: left winger). The word "forward" is used both as a general term and for a specific position. Strikers are positioned in front of forwards and wingers, very close to the opposing goal. Their main responsibilities are attacking and scoring goals, which is why their ball control, shooting and finishing skills are expected to be good. Center forwards are positioned right behind the strikers. They are expected to receive the ball from teammates and either provide assists or score goals. In addition to the skills expected from strikers, they have to be good at passing. Right forwards and left forwards are positioned to the right and left of the center forwards with the same expectations. Wingers are positioned near the touchlines to create chances for strikers and forwards from the right and left sides of the field through breakthroughs and crosses, and to score goals. They are expected to be good at dribbling, acceleration, passing and crossing. Positions are used only in the preprocessing and discussion stages, and more information about them is given in the following parts. The formation of positions on the pitch can be seen in Fig. 1.

Fig. 1. Positions in football [5]

  • ST: Positional skill. Player’s general ability when playing in ST position rated between 1-99.
  • RS: Positional skill. Player’s general ability when playing in RS position rated between 1-99.
  • LS: Positional skill. Player’s general ability when playing in LS position rated between 1-99.
  • CF: Positional skill. Player’s general ability when playing in CF position rated between 1-99.
  • RF: Positional skill. Player’s general ability when playing in RF position rated between 1-99.
  • LF: Positional skill. Player’s general ability when playing in LF position rated between 1-99.
  • RW: Positional skill. Player’s general ability when playing in RW position rated between 1-99.
  • LW: Positional skill. Player’s general ability when playing in LW position rated between 1-99.
  • Crossing: Crossing skill of the player rated between 1-99. Cross is a long-range pass from wings to center.
  • Finishing: Finishing skill of the player rated between 1-99. Finishing in football refers to completing an attack by scoring a goal.
  • HeadingAccuracy: Player’s accuracy to pass or shoot by using his head rated between 1-99.
  • ShortPassing: Player’s accuracy for short passes rated between 1-99.
  • LongPassing: Player’s accuracy for long passes rated between 1-99.
  • Dribbling: Dribbling skill of the player rated between 1-99. Dribbling is carrying the ball without losing it while moving in one particular direction.
  • SprintSpeed: Speed rate of the player rated between 1-99.
  • Acceleration: Shows how fast a player can reach his maximum sprint speed rated between 1-99.
  • FKAccuracy: Player’s accuracy to score free kick goals rated between 1-99.
  • BallControl: Player’s ability to control the ball rated between 1-99.
  • Balance: Player’s ability to remain steady while running, carrying and controlling the ball rated between 1-99.
  • ShotPower: Player’s strength level of shooting the ball rated between 1-99.
  • Jumping: Player’s jumping skill rated between 1-99.
  • Penalties: Player’s accuracy to score goals from penalty rated between 1-99.
  • Strength: Physical strength of the player rated between 1-99.
  • Agility: Gracefulness and quickness of the player while controlling the ball rated between 1-99.
  • Reactions: Acting speed of the player to what happens in his environment rated between 1-99.
  • Aggression: Aggression level of the player while pushing, pulling and tackling rated between 1-99.
  • Positioning: Player’s ability to place himself in the right position to receive the ball or score goals rated between 1-99.
  • Vision: Player’s mental awareness about the other players in the team for passing rated between 1-99.
  • Volleys: Player’s ability to perform volleys rated between 1-99.
  • LongShots: Player’s accuracy of shots from long distances rated between 1-99.
  • Stamina: Player’s ability to sustain his stamina level during the match rated between 1-99. Players with lower stamina get tired fast.
  • Composure: Player’s ability to control his calmness and frustration during the match rated between 1-99.
  • Curve: Player’s ability to curve the ball while passing or shooting rated between 1-99.
  • Interceptions: Player’s ability to intercept the ball while the opposing team’s players are passing, rated between 1-99. It is a defensive skill.
  • StandingTackle: Player’s ability to perform a tackle (take the ball from an opposing player) while standing, rated between 1-99. It is a defensive skill.
  • SlidingTackle: Player’s ability to perform a tackle by sliding, rated between 1-99. It is a defensive skill.
  • Marking: Player’s ability to apply strategies to prevent the opposing team from taking the ball, rated between 1-99. It is a defensive skill.
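
The encodings described in the list above (height in centimeters, PreferredFoot as 0/1, WorkRate split into two numeric features) can be sketched as follows. This is a minimal illustration on a tiny hand-made table, not the project's actual preprocessing code. The raw column names ("Preferred Foot", "Work Rate") and the value formats (heights written as feet'inches, work rates written as "High/ Medium") are assumptions about how the raw file stores these fields.

```python
import pandas as pd

# Tiny hand-made sample. Column names and value formats are assumptions
# about the raw file, not taken from the original notebook.
raw = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Height": ["5'7", "6'2", "5'11"],
    "Preferred Foot": ["Left", "Right", "Right"],
    "Work Rate": ["High/ Medium", "Medium/ Low", "Low/ High"],
})


def height_to_cm(text: str) -> float:
    """Convert a feet'inches string such as 5'7 to centimeters."""
    feet, inches = text.split("'")
    return round((int(feet) * 12 + int(inches)) * 2.54, 1)


# Ordinal scale used for both work-rate features: low < medium < high.
work_rate_scale = {"Low": 0.0, "Medium": 0.5, "High": 1.0}

players = pd.DataFrame({"Name": raw["Name"]})
players["Height"] = raw["Height"].map(height_to_cm)
players["PreferredFoot"] = raw["Preferred Foot"].map({"Left": 0, "Right": 1})

# "attack/ defense" is split into two separate numeric features.
rates = raw["Work Rate"].str.split("/", expand=True)
players["AttackWorkRate"] = rates[0].str.strip().map(work_rate_scale)
players["DefenseWorkRate"] = rates[1].str.strip().map(work_rate_scale)

print(players)
```

Mapping low, medium and high to 0, 0.5 and 1 keeps the natural order of the categories, which matters for the distance-based methods used later.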

Exploratory Data Analysis

Exploratory data analysis is conducted to extract insights from the data and to see the relationship between variables, and these findings are used in the Modelling stage.

First of all, there is no missing data in the dataset.

The count plot in Fig. 2 shows the number of players in each position: most of the forward players are strikers, followed by wingers and a smaller number of center, left and right forwards.

Fig. 2. Count plot for Position

The pair plot of positional skills such as ST, CF, etc. demonstrates that a player’s general ability in other positions is highly positively correlated with his general ability in his own position. For example, a player who plays in the ST position with a high ST skill is also good in other forward positions such as CF and RW. The correlation within the groups (striker group: ST, LS, RS; forward group: CF, RF, LF; winger group: RW, LW) is approximately 1. The skill similarity between wingers and forwards is also high, with low variance. The largest difference, with higher variance, is between striker skills and winger skills. In addition, all positional skill ratings are close to a normal distribution.

Fig. 3. Pair plot of positional skills

The box plot in Fig. 4 carries the analysis one step further while supporting the findings in Fig. 3. Fig. 4a shows that the median RS skill of players who play in the LS position is higher than that of players in the RS position. The same holds for players in the LF position. A similar relationship is seen for the CF skill in Fig. 4b. The conclusion is that a player’s listed position matters less than his skill ratings. Position alone is therefore not sufficient for grouping or clustering players. For example, if a manager looks for a new striker and checks only players in the ST, RS or LS positions in this dataset, he/she will miss players in other positions such as CF and LF who also have high striker skills and could be successful as strikers. Clustering based on all skills avoids this situation. So, the Position variable is used only in preprocessing, whereas the positional and playing skills are used in the modelling in this project.

Fig. 4. (a) RS skill vs Position; (b) CF skill vs Position

The correlation between positional skills and playing skills is in line with the responsibilities expected from the positions as explained above. For example, strikers are expected to have high finishing, shot power and ball control skills, and the heatmap in Fig. 5 demonstrates that ST, LS and RS are highly positively correlated with Finishing, ShotPower, LongShots, Positioning, Reactions and Composure. Similarly, CF, RF and LF are highly correlated with BallControl, Dribbling, ShortPassing, Positioning, Reactions and Composure. Lastly, RW and LW are highly correlated with Crossing, Dribbling, ShortPassing, BallControl and Vision. This analysis allows us to determine the important skills in football in terms of positions. As already seen in Fig. 3, positional skills within the groups are highly correlated. Winger and forward positional skills are also very similar, with a correlation of 97%, followed by forward and striker skills with a correlation of 93%. So, forward players resemble both strikers and wingers, like a bridge between them in terms of both location on the pitch and skills. In addition, the defensive skills Interceptions, Marking, SlidingTackle and StandingTackle are highly correlated with each other. The most negative correlations are between Height and Balance and between Balance and Stamina. This suggests that taller players tend to have lower Balance, and that players with high Balance put a lot of effort into controlling the ball while running and carrying it and get tired quickly.
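
The correlation analysis behind a heatmap like Fig. 5 can be sketched as below. The data here is synthetic (generated from a shared signal plus noise) and only stands in for the real ratings, so the numbers are illustrative; the column names are borrowed from the feature list, and Pearson correlation is assumed because it is the pandas default.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real ratings: ST, LS and Finishing share one
# underlying "attacking quality" signal, Marking is unrelated to it.
rng = np.random.default_rng(0)
n = 200
quality = rng.normal(65, 8, n)
df = pd.DataFrame({
    "ST": quality + rng.normal(0, 1, n),
    "LS": quality + rng.normal(0, 1, n),
    "Finishing": quality + rng.normal(0, 4, n),
    "Marking": rng.normal(30, 10, n),
})

# Pearson correlation matrix, the quantity a heatmap would display.
corr = df.corr()

# Keep each pair once (upper triangle) and find the strongest relationship.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()
strongest_pair = pairs.abs().idxmax()

print(corr.round(2))
print("Most correlated pair:", strongest_pair)
```

With a plotting library, the same `corr` matrix could be passed to a heatmap function to reproduce the visual form.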

In addition to the relationship between positional skills and playing skills, the relationship between playing skills and positions is examined. For that purpose, only players with an Overall rating higher than 65 are kept, to exclude players with relatively low skills. These high-skill players are then grouped by Position, and the median value of each playing skill is calculated for each position. The top 5 attributes for each position are shown in Fig. 6, Fig. 7 and Fig. 8. For example, the highest median values for the ST position are obtained in Strength, SprintSpeed, ShotPower, Jumping and Finishing. This analysis again allows us to identify the crucial skills in terms of positions. As a result, the top features for the striker group are SprintSpeed, Strength, ShotPower, Positioning, Finishing and Jumping. For the forward group, they are Agility, Balance, Acceleration, Dribbling and BallControl. For wingers, they are Acceleration, Agility, SprintSpeed, Balance and Dribbling.
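
The filter, group and rank steps described above can be sketched as follows. The six-row table and its values are made up for illustration, and only two top skills are selected to keep the output short; the threshold of 65 comes from the text.

```python
import pandas as pd

# Made-up ratings for six players in two positions.
players = pd.DataFrame({
    "Position": ["ST", "ST", "ST", "LW", "LW", "LW"],
    "Overall": [80, 72, 60, 78, 70, 64],
    "Finishing": [85, 75, 50, 70, 62, 55],
    "Strength": [84, 78, 60, 55, 58, 50],
    "Dribbling": [70, 68, 52, 86, 80, 60],
    "Acceleration": [74, 70, 58, 90, 84, 66],
})
skills = ["Finishing", "Strength", "Dribbling", "Acceleration"]

# 1) Keep only players with Overall above 65.
good = players[players["Overall"] > 65]

# 2) Median of every playing skill per position.
medians = good.groupby("Position")[skills].median()

# 3) Highest-ranked skills per position (top 2 here, top 5 in the article).
top_k = 2
top_skills = {
    position: row.nlargest(top_k).index.tolist()
    for position, row in medians.iterrows()
}

print(top_skills)
```

Replacing `nlargest` with `nsmallest` gives the lowest-rated skills per position, which is the variant used for the defensive skills discussed below.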

Based on the two analyses above, the distinctive playing skills are ShotPower, Positioning, Finishing and Strength for the striker group; ShortPassing, BallControl, Positioning, Acceleration, SprintSpeed, Agility, Dribbling and Balance for the forward group; and Crossing, LongPassing, ShortPassing, Acceleration, SprintSpeed, Agility, Dribbling and Balance for the winger group. Again, the winger and forward group features are similar.

The same analysis is repeated to find the 5 features with the lowest median values for each position. As a result, the defensive skills Interceptions, Marking, SlidingTackle and StandingTackle are at the bottom. An example for the ST position can be seen in Fig. 9.

The relationship between the Overall attribute and discrete variables such as WeakFoot, AttackWorkRate and DefenseWorkRate shows that players who can use their weak foot very well have high Overall ratings. In addition, players with high Overall ratings tend to put in high effort during the game, both in attack and defense. Interestingly, some players with high skill ratings put in low effort, but this does not change the fact that they are talented and add value to their teams. The median Overall values for each level of these discrete variables can be seen in Fig. 10.

Fig. 5. Heatmap

Fig. 6. Top five skills for striker group (a) ST; (b) RS; (c) LS

Fig. 7. Top five skills for forward group (a) CF; (b) RF; (c) LF

Fig. 8. Top five skills for winger group (a) RW; (b) LW

Fig. 9. The lowest five skills for ST position

Fig. 10. Overall vs. (a) WeakFoot; (b) AttackWorkRate; (c) DefenseWorkRate

Table 1 shows that AttackWorkRate is positively related to the median values of forward and winger playing skills such as Acceleration, SprintSpeed, Balance, Dribbling and Agility, so these skills require more effort during the match in terms of energy and work. On the other hand, striker skills such as ShotPower, Finishing, Positioning, BallControl and Strength require less effort, but more intelligence and natural talent. The defensive counterpart of this analysis can be seen in Table 2. As expected, forward players with higher defensive skills put in more defensive effort on the pitch. In addition, winger and forward positional skills have a higher correlation with defensive skills, as seen in the heatmap in Fig. 5. So, we can say that wingers and forwards take on more defensive responsibilities than strikers. As both analyses show, since work rate depends on a player’s physical endurance, players with high Stamina are able to sustain a higher work rate.

Table 1. AttackWorkRate and Playing Skills

Table 2. DefenseWorkRate and Playing Skills

PreferredFoot is also a distinctive attribute to be used in clustering. In some positions, the right or left side of the position is not directly related to the preferred foot of the player, as seen in Fig. 11. For example, lefty players in the RF position have a higher RF skill than righty players in the same position. Similarly, righty players in the LW position have a higher LW skill than lefty players in the same position.

Fig. 12 shows the median Overall and Potential ratings versus age, and it seems that players reach their potential at around 26-27 years old.

Fig. 11. Box plot showing positional skills based on PreferredFoot (a) RF; (b) LW

Fig. 12. Overall and Potential vs. Age

Modelling

The findings obtained from Exploratory Data Analysis are used to determine which features will be inputs to the model and how. The features selected as useful for clustering are all the features explained above except Name, Overall and Position.

K-Means clustering and hierarchical clustering with average linkage and with Ward’s method are applied to the dataset. The K-Means algorithm clusters data by trying to separate samples into K groups of equal variance, minimizing a distance-based criterion known as the inertia (within-cluster sum of squared errors), and it requires the number of clusters to be specified [6]. Since K-Means has problems when clusters have different sizes or densities and when the dataset has outliers, hierarchical clustering methods, which are less susceptible to outliers and noise, are chosen for comparison with K-Means. Hierarchical clustering creates clusters with either a bottom-up technique (agglomerative) or a top-down technique (divisive). In the bottom-up technique, which is the type used by both average linkage and Ward’s method, every data point starts as a separate cluster, and the two most similar clusters are merged repeatedly until all the data points form a single cluster [7]. The similarity used in these techniques is distance-based. In average linkage, the average distance between the points of two clusters is used, whereas in Ward’s method the similarity of two clusters is based on the increase in squared error when the two clusters are merged. There is no need to define the number of clusters at the beginning of hierarchical clustering.

Since all methods are distance-based, the features should be brought to a comparable scale so that high-magnitude features do not dominate the distance calculations. Standardization is a specific type of normalization that removes the mean of each feature and scales it to unit variance [8], so the result is not bounded to a fixed range such as [0, 1]. It works best when each feature is approximately normally distributed. However, standardization is easily influenced by outliers, because both the mean and the variance are sensitive to extreme values, and the results can be misleading when the dataset contains outliers. In such cases it is better to use an approach that is robust against outliers, such as robust standardization. This method removes the median and scales the data by the interquartile range, i.e., the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile) [9]. Since the FIFA 19 dataset contains outliers in some of its features, both standardization and robust standardization are applied to the dataset and compared to see the outlier effect.
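
The difference between the two scaling approaches can be seen on a small example. The sketch below uses scikit-learn's StandardScaler and RobustScaler on six made-up ratings with one outlier; whether the project used exactly these classes with default settings is an assumption.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# One made-up feature with a single outlier (95).
x = np.array([[50.0], [52.0], [54.0], [56.0], [58.0], [95.0]])

# Standardization: (x - mean) / standard deviation.
standard = StandardScaler().fit_transform(x)

# Robust standardization: (x - median) / (75th percentile - 25th percentile).
robust = RobustScaler().fit_transform(x)

print("standardized:", standard.ravel().round(2))
print("robust:      ", robust.ravel().round(2))
```

Here the median is 55 and the interquartile range is 5, so the five ordinary values land between -1 and 0.6 after robust scaling while the outlier stays clearly separated at 8. With standardization, the outlier inflates both the mean and the standard deviation, which compresses the ordinary values into a narrow band.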

Fig. 13. Features with outliers (a) Finishing; (b) ShortPassing

The number of dimensions can be reduced by applying Principal Component Analysis (PCA), especially to highly correlated features. PCA produces components that explain the highest variance, or most of the information in the data, with a smaller number of attributes. As explained in Exploratory Data Analysis, positional skills are highly correlated within each group. So, PCA is used to obtain a Striker component from ST, RS and LS, a Forward component from CF, RF and LF, and a Wing component from RW and LW. The explained variance ratio is 100% for each of these PCA applications with both standardization methods. The same approach is used for the defensive skills Marking, Interceptions, SlidingTackle and StandingTackle, because these features are highly correlated and a single component obtained from them, called Defensive, is enough to represent a forward player’s defensive skills together with DefenseWorkRate. The explained variance ratio for this component is 70.3% with standardization and 71.6% with robust standardization.
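
Collapsing a group of highly correlated features into one component can be sketched as below. The three columns are synthetic stand-ins for ST, RS and LS (one shared ability signal plus small noise), so the explained variance ratio printed here is illustrative and is not the value reported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for three nearly identical positional skills.
rng = np.random.default_rng(42)
n = 300
ability = rng.normal(65, 8, n)
striker_block = np.column_stack(
    [ability + rng.normal(0, 0.5, n) for _ in range(3)]
)

# Scale first, then keep a single principal component for the group.
scaled = RobustScaler().fit_transform(striker_block)
pca = PCA(n_components=1)
striker_component = pca.fit_transform(scaled)
ratio = pca.explained_variance_ratio_[0]

print("shape:", striker_component.shape)
print("explained variance ratio:", round(ratio, 4))
```

The closer the original columns are to being copies of each other, the closer the ratio gets to 1, which matches the near-perfect within-group correlations found in the exploratory analysis.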

In order to determine the number of clusters, the inertia is calculated for K values from 1 to 9 with the K-Means algorithm. In addition, the Silhouette Coefficient is calculated for K values from 2 to 6. The Silhouette Coefficient for a set of samples is the average of the Silhouette Coefficients of the individual samples, each computed with the formula in Fig. 14. The results for each standardization method can be seen in Fig. 15 and Table 3. According to the Elbow method and the Silhouette Coefficient values, the number of clusters is set to 4 for the K-Means algorithm, and robust standardization is chosen as the normalization method for all algorithms.
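
The search over K can be sketched with scikit-learn as follows. For one sample, the Silhouette Coefficient is s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster [10]. The data below is synthetic (four well-separated blobs), and the hyperparameters n_init and random_state are assumptions chosen for reproducibility, not values taken from the project.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four clear groups, standing in for the scaled features.
X, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
    cluster_std=0.6,
    random_state=7,
)

inertias = {}
silhouettes = {}
for k in range(1, 10):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances
    if 2 <= k <= 6:  # silhouette needs at least two clusters
        silhouettes[k] = silhouette_score(X, model.labels_)

best_k = max(silhouettes, key=silhouettes.get)

for k in inertias:
    print(k, round(inertias[k], 1), round(silhouettes.get(k, float("nan")), 3))
print("K with the highest silhouette:", best_k)
```

Plotting `inertias` against K gives the elbow curve: the inertia always falls as K grows, so the point where the decrease flattens is read together with the silhouette values rather than on its own.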

Fig. 14. Formula for (a) Inertia; (b) Silhouette Coefficient for one sample [10]

Fig. 15. Inertia vs. K for K-Means

Table 3. Silhouette Scores and K values in K-Means with different normalization methods

After selecting the number of clusters for the K-Means algorithm and the normalization method for all algorithms, the clustering methods (K-Means, hierarchical clustering with average linkage and with Ward’s method) are compared using the Silhouette Coefficient as the performance metric. The results can be seen in Table 4. According to the results, the K-Means algorithm with 4 clusters is selected as the final model.
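
A comparison of the three algorithms on the same data and with the same metric can be sketched as below. The data is synthetic, so the scores are illustrative and do not reproduce Table 4; fixing four clusters for the hierarchical methods as well is an assumption made here to keep the comparison like for like.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled player features.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
    cluster_std=0.7,
    random_state=1,
)

models = {
    "K-Means": KMeans(n_clusters=4, n_init=10, random_state=0),
    "Average linkage": AgglomerativeClustering(n_clusters=4, linkage="average"),
    "Ward": AgglomerativeClustering(n_clusters=4, linkage="ward"),
}

# Fit each model and score its labels with the mean Silhouette Coefficient.
scores = {
    name: silhouette_score(X, model.fit_predict(X))
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

On clean, well-separated data like this the three methods score almost the same; differences such as those reported in Table 4 show up on real data with overlapping groups and outliers.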

Table 4. Silhouette Scores and K values with different algorithms (with robust standardization)

Results

The characteristics of each cluster are examined in this part. First of all, the clusters contain similar numbers of players. As expected, each cluster has players from every position, since players are grouped according to their abilities and attributes, not their positions. However, Cluster 0 has more strikers than any other cluster, and Cluster 1 has more wingers and forwards. Fig. 16 shows the number of players in each cluster and their distribution by position group (Striker=ST, RS, LS; Forward=CF, RF, LF; Wing=RW, LW).

Fig. 16. Count plot of (a) clusters; (b) clusters in terms of positions

Table 5 shows the average Overall, Age and Potential for each cluster. Fig. 17 gives more detail about these features of the clusters with a pair plot. From this information, it is understood that Cluster 3 includes highly skilled and more mature players, and Cluster 2 includes low-skilled mature players as well as very young players with high potential. Cluster 0 and Cluster 1 show similar characteristics on these features, except that Cluster 1 has younger players than Cluster 0.

In order to differentiate Cluster 0 and Cluster 1 further and get a better understanding of the abilities of each cluster, the average distinctive playing skills are examined for each cluster in the radar plots in Fig. 18 and Fig. 19. The first radar plot shows the average distinctive playing skills for every cluster, whereas the second shows them only for Cluster 0 and Cluster 1. For almost every skill, the highest averages, that is the most skilled players, are in Cluster 3 and the lowest are in Cluster 2. Interestingly, the average distinctive playing skills for strikers such as ShotPower, Finishing, Positioning and Strength are higher for Cluster 0 than for Cluster 1, whereas the average distinctive playing skills for wingers and forwards such as Acceleration, Agility, Dribbling, SprintSpeed, LongPassing, ShortPassing and Crossing are higher for Cluster 1 than for Cluster 0. The same result is obtained with the Striker, Forward and Wing features produced by PCA, as shown in Table 6.

So, it can be concluded that Cluster 0 consists of players with mostly striker-related skills and Cluster 1 consists of players with mostly forward-related and winger-related skills. This differentiation is independent of the players’ positions, but it is directly related to their positional and playing skills. In addition, the differences between the two clusters in the Striker and Wing features are larger than the difference in the Forward feature, because forward position players are like a bridge between striker and winger position players, and forward positional skills are highly correlated with both striker and winger positional skills, as can be seen in the heatmap in Fig. 5.
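
The per-cluster averages used in this comparison can be computed with a simple group-by once each player has a cluster label. The sketch below uses a made-up eight-row table whose values are chosen only to mimic the pattern described above; the `Cluster` column name is an assumption.

```python
import pandas as pd

# Made-up players with cluster labels already assigned by the model.
players = pd.DataFrame({
    "Cluster": [0, 0, 1, 1, 2, 2, 3, 3],
    "Overall": [70, 72, 69, 71, 58, 60, 84, 88],
    "Age": [27, 29, 24, 26, 19, 33, 28, 30],
    "Finishing": [74, 76, 66, 68, 52, 56, 85, 89],
    "Dribbling": [66, 68, 75, 77, 55, 57, 88, 90],
})

# Mean of every numeric attribute per cluster (the kind of summary in Table 5).
profile = players.groupby("Cluster").mean(numeric_only=True).round(1)

# Positive values: higher in Cluster 0. Negative values: higher in Cluster 1.
gap_0_vs_1 = profile.loc[0] - profile.loc[1]

print(profile)
print(gap_0_vs_1)
```

The rows of `profile` are also what a radar plot would draw, one polygon per cluster.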

Table 5. Mean Overall, Age and Potential in each cluster

Fig. 17. Pair plot based on clusters

Fig. 18. Radar plot for each cluster

Fig. 19. Radar plot for Cluster 0 and Cluster 1

Table 6. Average Striker, Forward and Wing skills with respect to each cluster

The averages of the other playing skills, the work rates and the Defensive feature produced by PCA can be seen for each cluster in the following tables. The differences between the clusters are in line with the previous findings. For example, the averages of skills that are more related to striker positions and highly correlated with Striker positional skills, such as HeadingAccuracy, Reactions, Jumping and Aggression, are higher for Cluster 0. In contrast, Curve and Vision, which are more related to forwards and wingers, are higher for Cluster 1. DefenseWorkRate and the Defensive skill are also higher for Cluster 1, which is in line with the previous results about defensive skills and positions.

Table 7. Average playing skills

Table 8. Average work rates and defensive skills

Fig. 20 gives a closer look at Cluster 3, which has highly skilled players, and Cluster 2, which has low-skilled players as well as very young players who have low ratings right now but high potential for the future. Cluster 3 includes top players such as Lionel Messi, Cristiano Ronaldo, Neymar Jr, Eden Hazard, etc. In addition to them, there are some players whose Overall ratings are lower and close to those of the other clusters. When these players are explored in detail, it can be seen that they are highly specialized in particular playing skills, which is why they are clustered with the top players in this model although they do not have high Overall ratings. For example, D. Braaten, who has an Overall rating of 67, has a Strength rating of 88. Similarly, J. Berget, who has an Overall rating of 69, has a Stamina rating of 93. Lastly, Nino, who has an Overall rating of 70 and is 38 years old, which can be counted as old for football, has a Composure rating of 83 and a Balance rating of 80. The model allows us to find these special players, who could be missed by traditional filtering methods or by clustering methods that take the Overall rating as input.
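
Finding such specialists in the elite cluster can be sketched as a filter followed by a row-wise maximum. The table below is made up (hypothetical player names and ratings), and the Overall threshold of 70 is an assumption chosen for the illustration, not a value from the project.

```python
import pandas as pd

# Hypothetical players; cluster 3 plays the role of the elite cluster.
players = pd.DataFrame({
    "Name": ["Elite A", "Elite B", "Specialist C", "Regular D"],
    "Cluster": [3, 3, 3, 0],
    "Overall": [91, 88, 67, 70],
    "Strength": [70, 80, 88, 72],
    "Stamina": [75, 85, 70, 71],
})
skills = ["Strength", "Stamina"]

# Players in the elite cluster whose Overall rating is unusually low for it.
specialists = players[(players["Cluster"] == 3) & (players["Overall"] < 70)].copy()

# For each of them, the skill with the highest rating and its value.
specialists["StandoutSkill"] = specialists[skills].idxmax(axis=1)
specialists["StandoutRating"] = specialists[skills].max(axis=1)

print(specialists[["Name", "Overall", "StandoutSkill", "StandoutRating"]])
```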

Fig. 20. Pair plot based on Cluster 2 and Cluster 3

Conclusion

This project presents a clustering method for football players who play in forward positions, based on their skills and attributes, using the FIFA 19 dataset, which comes from a video game but provides a realistic representation of football players. By exploring the data, the most distinctive features for players are determined for use in the modelling. Standardization and feature extraction methods are applied to the dataset to prepare it for modelling. K-Means and hierarchical clustering algorithms are compared, and the K-Means algorithm with 4 clusters is selected as the most appropriate model. Forward players are grouped as 1) highly skilled elite players together with players who are highly specialized in some skills, 2) players with mostly striker-related skills, 3) players with mostly forward-related and winger-related skills, 4) low-skilled players and high-potential young players.

References

Cover photo: Photo by JESHOOTS.COM on Unsplash

[1] C. Vaz, “FIFA 19 Review – Just Football Perfection,” wccftech.com, Oct. 2, 2018. [Online]. Available: https://wccftech.com/review/fifa-19-review-its-football-perfection/. [Accessed: Jan. 10, 2021].

[2] S. Saed, “EA explains how FIFA player ratings are calculated,” vg247.com, Sep. 27, 2016. [Online]. Available: https://www.vg247.com/2016/09/27/how-ea-calculates-fifa-17-player-ratings/. [Accessed: Jan. 8, 2021].

[3] FIFA 19 Complete Player Dataset, 2018. [Online]. Available: https://www.kaggle.com/karangadiya/fifa19

[4] FIFA Encyclopedia. [Online]. Available: https://www.fifplay.com/encyclopedia/

[5] Position. [Online]. Available: https://www.fifplay.com/encyclopedia/position/

[6] K-means. [Online]. Available: https://scikit-learn.org/stable/modules/clustering.html#k-means

[7] S. Kaushik, “An Introduction to Clustering and different methods of clustering,” analyticsvidhya.com, Nov. 3, 2016. [Online]. Available: https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/. [Accessed: Jan. 11, 2021].

[8] Compare the Effect of Different Scalers on Data with Outliers. [Online]. Available: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html

[9] “StandardScaler, MinMaxScaler and RobustScaler Techniques – ML,” geeksforgeeks.com, Jul. 16, 2020. [Online]. Available: https://www.geeksforgeeks.org/standardscaler-minmaxscaler-and-robustscaler-techniques-ml/. [Accessed: Jan. 15, 2021].

[10] Clustering Performance Evaluation. [Online]. Available: https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation








