Document Type : Research article
Authors
Center for Remote Sensing and GIS Research, Faculty of Earth Sciences, Shahid Beheshti University, Tehran, Iran
10.22059/jurbangeo.2024.368861.1885
Abstract
Clustering is a vital technique for revealing structure and identifying groups within large datasets, particularly in spatial data analysis, where the primary objective is to partition data into clusters with shared characteristics. Artificial neural networks are established tools for clustering large, multidimensional datasets. This research clusters census block data described by 21 variables covering socio-economic characteristics and access to services relevant to sustainable urban development. The Neural Gas (NG) algorithm, a prevalent choice for clustering high-dimensional data, was first applied to the census blocks of Isfahan city without spatial parameters. Its spatially enhanced version, the Contextual Neural Gas (CNG) algorithm, was then applied with the geographic coordinates of the census blocks as spatial parameters, and the outcomes of the two approaches were compared. The results indicated a notable difference between the clusters produced by the two algorithms: NG yielded heterogeneous clusters, whereas CNG, benefiting from the spatial parameters, produced homogeneous clusters. Clustering quality, evaluated with the average Silhouette coefficient over the census blocks, was higher for CNG, which attained 0.29 compared with -0.02 for NG. This research affirmed the positive impact of spatial parameters on creating homogeneous clusters within the urban environment. Using the CNG algorithm to extract homogeneous areas based on sustainable development variables can streamline urban planning and management. One of the innovations of this research is the location-based clustering of census blocks with the CNG algorithm using variables related to sustainable urban development.
Extended Abstract
Introduction
In recent years, there has been a dramatic increase in the volume of available spatial data. Consequently, it is necessary to comprehensively assess spatial data, considering each location's distinctive characteristics, to extract meaningful insights. With the abundance and diversity of urban spatial data, the primary challenge lies in effectively representing the knowledge derived from these data and illuminating the relationships between the data and their respective locations, incorporating various studied variables. Spatial data mining employs artificial neural networks (ANN) to unveil patterns and unknown relationships within data, transforming this information into new and potentially valuable knowledge. Clustering, a pivotal aspect of unsupervised machine learning, is an effective method for extracting knowledge from spatial data, aiming to segregate data into clusters with similar characteristics. It is crucial to note that the clustering algorithm for spatial data diverges fundamentally from that used for non-spatial data. This study focuses on clustering the census blocks of Isfahan city based on sustainable development data, encompassing socioeconomic information and access to services. The process employs the Contextual Neural Gas (CNG) algorithm, and the results are compared with those obtained from implementing the Neural Gas (NG) algorithm. This comparative analysis sheds light on the efficacy of these algorithms in clustering spatial data and extracting meaningful insights related to sustainable development in the urban texture.
Methodology
This study used data from the Isfahan census blocks (2015), compiled by the Iran Statistics Center, together with information on medical-emergency, cultural-educational, and transportation service points provided by Isfahan Municipality. The dataset comprises 13,361 statistical blocks, and 21 socioeconomic variables and indicators of access to urban services associated with sustainable urban development were used for clustering. Both the Neural Gas (NG) and Contextual Neural Gas (CNG) algorithms were applied to the census blocks, and their outcomes were compared. The Neural Gas network is a competitive neural network with an unsupervised learning model, suited to clustering and topology learning. Its neurons have no neighborhood connections and distribute themselves dynamically in the input space during training, mirroring the behavior of a physical gas. At each training step an input vector is selected and presented to the network, and the neurons move towards it; the displacement of each neuron depends on its rank, its distance to the input vector, the learning rate, and the neighborhood range. NG has no predefined topology representing relationships between neurons; topology learning is carried out by competitive Hebbian learning in a post-processing step. The Contextual Neural Gas (CNG) network, an extension of NG, integrates the spatial characteristics of the input vectors into the clustering process. Neuron adaptation is the same in NG and CNG; the two differ in how the rank order is defined. CNG accounts for spatial autocorrelation between observations and neurons by using spatial ordering. Because CNG has no topologically ordered network, it determines the rank ordering in a two-step procedure that incorporates spatial autocorrelation.
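The rank-based adaptation and the two-step ordering described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study. It assumes Euclidean distances, an exponential neighborhood function exp(-rank / lam), and one common formulation of the two-step CNG ranking in which neurons are first ordered by spatial closeness and the `l` spatially closest neurons are then reordered by feature-space distance. All names (`ng_rank`, `cng_rank`, `adapt`, `l`, `eps`, `lam`) are hypothetical.

```python
import numpy as np


def ng_rank(x, weights):
    """NG ranking: order neurons by feature-space distance to x (rank 0 = closest)."""
    order = np.argsort(np.linalg.norm(weights - x, axis=1))
    ranks = np.empty(len(weights), dtype=int)
    ranks[order] = np.arange(len(weights))
    return ranks


def cng_rank(x, xy, weights, coords, l):
    """Assumed two-step CNG ranking.

    Step 1: order all neurons by spatial distance between their
            coordinates and the location xy of the observation.
    Step 2: reorder only the l spatially closest neurons by
            feature-space distance to x; the rest keep their spatial order.
    """
    spatial_order = np.argsort(np.linalg.norm(coords - xy, axis=1))
    head = spatial_order[:l]
    head = head[np.argsort(np.linalg.norm(weights[head] - x, axis=1))]
    order = np.concatenate([head, spatial_order[l:]])
    ranks = np.empty(len(weights), dtype=int)
    ranks[order] = np.arange(len(weights))
    return ranks


def adapt(x, weights, ranks, eps, lam):
    """Move every neuron towards x; the step shrinks exponentially with rank."""
    h = np.exp(-ranks / lam)  # neighborhood function (assumed form)
    return weights + eps * h[:, None] * (x - weights)
```

Under these assumptions, setting `l` equal to the number of neurons makes `cng_rank` return the same ordering as `ng_rank`, while a small `l` restricts the winning neuron to those spatially close to the observation. In a full CNG training loop the same `adapt` step would presumably be applied to the neurons' coordinates as well as to their feature weights.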
The Silhouette coefficient was employed in this research to evaluate clustering results. This coefficient, calculated for each sample, class, and the entire dataset, measures the similarity within clusters and dissimilarity between clusters. The overall quality of clustering was assessed using the average Silhouette coefficient for the entire dataset, providing a comprehensive evaluation of the effectiveness of both NG and CNG algorithms in clustering the Isfahan census blocks.
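For reference, the standard per-sample Silhouette coefficient is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from sample i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster. The sketch below computes it with Euclidean distances, which is an assumption because the text does not state the distance measure used. The function name `silhouette_samples_sketch` is hypothetical, and the code expects at least two clusters.

```python
import numpy as np


def silhouette_samples_sketch(X, labels):
    """Per-sample Silhouette coefficients using Euclidean distance (assumed)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix: fine for small examples,
    # memory-hungry for many thousands of samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # usual convention: a singleton cluster scores 0
        # a: mean distance to the other members of the same cluster
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s
```

The average over all samples, `silhouette_samples_sketch(X, labels).mean()`, corresponds to the overall score used to compare clusterings. Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values indicate samples that lie closer on average to another cluster than to their own.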
Results and discussion
The outcomes underscore a fundamental distinction between the two algorithms, rooted primarily in how each maps input vectors onto network neurons, which leads to different assignments of census blocks to clusters. The NG algorithm maps input vectors using a distance criterion, yielding intertwined and heterogeneous clusters. Comparing the graph networks of the clustered census blocks produced by the two algorithms reveals clear differences in the results. Notably, the CNG algorithm, with an average Silhouette coefficient of 0.29, demonstrates superior clustering performance compared with the NG algorithm, which yields a markedly lower average Silhouette coefficient of -0.02. This emphasizes the enhanced ability of the CNG algorithm to form cohesive and meaningful clusters based on socioeconomic and service access data related to sustainable development in Isfahan city.
Conclusion
This research applied neural networks to cluster census blocks in Isfahan, focusing on variables related to sustainable urban development. The study aimed to explore the impact of spatial parameters on neural network clustering results by incorporating the geographic coordinates of census block centroids alongside non-spatial inputs. A comparative analysis of the algorithm outcomes with and without spatial parameters showed that the spatial parameters positively influenced clustering, creating more homogeneous clusters. The per-sample Silhouette coefficient and its overall average, used to evaluate the results, confirmed the beneficial role of spatial parameters in the clustering process of the Contextual Neural Gas (CNG) algorithm. Consequently, compared with the Neural Gas (NG) algorithm, the CNG algorithm generated more appropriate and cohesive clusters of census blocks, reflecting both their similarity and their spatial characteristics. This research showed the potential of the CNG algorithm for defining homogeneous regions and identifying similar blocks within a real-world dataset. The algorithm can facilitate urban planning by pinpointing homogeneous areas based on selected variables aligned with a sustainable urban development approach. The findings underscore the practical significance of the CNG algorithm as a valuable tool for informed decision-making in urban development and planning initiatives.
Funding
There is no funding support.
Authors’ Contribution
The authors contributed equally to the conceptualization and writing of the article. All of the authors approved the content of the manuscript and agreed on all aspects of the work. Declaration of competing interest: none.
Conflict of Interest
Authors declared no conflict of interest.
Acknowledgments
We are grateful to all the scientific consultants of this paper.
Keywords