Automotive Engineering, 2022, Vol. 44, Issue (5): 691-700. doi: 10.19562/j.chinasae.qcgc.2022.05.006

Special topic: Intelligent Connected Vehicle Technology: Planning & Control, 2022


K-means Complementary Iterative Vehicle Information Data Clustering Based on DHSSA Optimization

He Huang1,2, Wenlong Li1,2, Lan Yang1, Huifeng Wang1, Biao Wang1, Feng Ru1,2

  1. Chang'an University, Xi'an 710064
  2. Xi'an Key Laboratory of Intelligent Expressway Information Fusion and Control, Xi'an 710064
  • Received: 2021-11-22  Revised: 2021-12-19  Online: 2022-05-25  Published: 2022-05-27
  • Contact: He Huang  E-mail: huanghe@chd.edu.cn
  • Supported by:
    National Key R&D Program of China (2018YFB1600600); General Program of the National Natural Science Foundation of China (52172324); Key R&D Program of Shaanxi Province (2021SF-483); Natural Science Basic Research Program of Shaanxi Province (2021UM-184); Shaanxi Province Postdoctoral Research Project (2018BSHYDZZ64); Open Fund of the Xi'an Key Laboratory of Intelligent Expressway Information Fusion and Control (Chang'an University) (300102321502); Fundamental Research Funds for the Central Universities (300102240203)

Abstract:

Traditional methods for clustering vehicle model information data are strongly affected by the choice of initial center points, which leads to low clustering accuracy and poor robustness, and their mean-based selection of cluster centers during iteration is sensitive to outliers. To address these problems, a DHSSA-optimized K-means complementary iterative clustering method for vehicle model information data is proposed. First, to address the insufficient updating of discoverer positions and the lack of population diversity in the sparrow search algorithm (SSA), a disturbance factor and head sparrow optimization strategy is designed: the adaptive head sparrow strategy strengthens the influence of the best individual, and the disturbance factor expands the search space, which improves the accuracy of the cluster center search. Second, a screening maximum and minimum distance product method (SMMP) is designed to optimize the initialization of cluster centers; it adds a screening mechanism to the maximum and minimum distance product method (MMP) so that the initial centers are distributed as evenly as possible across the clusters. Finally, DHSSA and SMMP are combined to optimize the K-means complementary iteration, which reduces the number of iterations while improving search efficiency and yields better clustering results. Tests on multiple data sets, evaluated through convergence curves and performance indicators, show that the proposed DHSSA-KMC method achieves higher search accuracy, faster convergence and lower clustering cost than SSA-KMC, IMFO-KMC, KMC and KMC++, and takes less time than SSA-KMC and IMFO-KMC, which demonstrates the effectiveness and superiority of the algorithm. When processing vehicle model information data, DHSSA-KMC can efficiently cluster the data to generate competing vehicle models for consumers to choose from, which gives it clear application value.

Key words: K-means clustering (KMC), screening maximum and minimum distance product method (SMMP), sparrow search algorithm (SSA), data sets, vehicle model information data
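
The abstract names two ideas that a small example may help make concrete: a clustering cost that the optimizer tries to lower, and an initialization that spreads the starting centers across the clusters. The following is only a minimal sketch under stated assumptions, not the authors' method. The abstract gives no formulas, so the cost is assumed to be the usual K-means sum of squared distances, the seeding rule is one plausible reading of "maximum and minimum distance product", the choice of the first seed is arbitrary, and the screening mechanism of SMMP and the DHSSA search itself are not reproduced. The names `clustering_cost`, `mmp_style_seeds` and `sample_points` are hypothetical.

```python
import math

def clustering_cost(points, centers):
    """Assumed cost: sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

def mmp_style_seeds(points, k):
    """Pick k initial centers with a max-min distance product rule (assumed form)."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Assumption: start from the point farthest from the data centroid.
    seeds = [max(points, key=lambda p: math.dist(p, centroid))]

    def score(p):
        # Product of the largest and smallest distance to the seeds chosen so far.
        dists = [math.dist(p, s) for s in seeds]
        return max(dists) * min(dists)

    while len(seeds) < k:
        candidates = [p for p in points if p not in seeds]
        seeds.append(max(candidates, key=score))
    return seeds

# Toy data: three well-separated groups of 2-D points (illustrative only).
sample_points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]
seeds = mmp_style_seeds(sample_points, 3)
seeded_cost = clustering_cost(sample_points, seeds)
# For comparison: three starting centers taken from a single group.
clumped_cost = clustering_cost(sample_points, sample_points[:3])
```

On this toy data the seeding places one starting center in each group, so `seeded_cost` is far lower than `clumped_cost`. This is the kind of sensitivity to initialization that the abstract attributes to traditional K-means.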