Thesis Information

Title (Chinese):

 基于逆强化学习的个性化驾驶人模型构建方法研究    

Name:

 李瑶 (Li Yao)

Student ID:

 20205224077    

Confidentiality Level:

 Public

Language:

 Chinese

Discipline Code:

 085201    

Discipline:

 Engineering - Mechanical Engineering

Student Type:

 Master's

Degree Level:

 Master of Engineering

Degree Year:

 2023    

Institution:

 Xi'an University of Science and Technology

School:

 School of Mechanical Engineering

Major:

 Mechanical Engineering

Research Direction:

 Traffic Safety

First Supervisor:

 赵栓峰 (Zhao Shuanfeng)

First Supervisor's Institution:

 Xi'an University of Science and Technology

Submission Date:

 2023-06-16    

Defense Date:

 2023-06-01    

Title (English):

 Research on a Construction Method for a Personalised Driver Model Based on Inverse Reinforcement Learning

Keywords:

 Driving habits ; Inverse reinforcement learning ; Markov decision process ; Driver model ; Driving behaviour prediction

Abstract:

A driver's individual driving characteristics have a significant impact on vehicle control and safety performance. In the closed-loop traffic system formed by the driver, the vehicle and the road, the driver, as the operator of the vehicle, is one of the main factors affecting traffic safety, and traffic accidents are often related to the driver's operating behaviour. The need for personalised driver models that can reflect individual driving characteristics is therefore becoming increasingly urgent, yet research on such models is still scarce. To address this problem, this thesis proposes a construction method for a personalised driver model based on inverse reinforcement learning; the resulting model is able to express the driver's personalised and human-like characteristics. The main research content is as follows:

(1) Hardware and software development of a CAN bus field acquisition system. A real-vehicle data acquisition platform was built from hardware such as cameras and a drone, with a USBCAN-II Pro analyser serving as the USB-to-CAN adapter. Host-computer software for processing the CAN bus data was designed and developed in C++ on the .NET platform. Because driving habits are shaped by experience, habit and other factors, drivers were recruited online to drive the instrumented vehicle, and multi-dimensional driving data, covering the vehicle's motion state and the driver's operating inputs, were collected in real time. The data were uploaded to the host computer over the USB-to-CAN link and parsed and analysed by the DBC parsing module and the CAN message parsing module, providing data support for constructing the personalised driver model.
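
The thesis implements this host-side software in C++ on the .NET platform; purely as an illustration of the decode step just described, here is a minimal Python sketch that reads raw frames from a CAN interface and decodes them against a DBC file with the python-can and cantools libraries. The channel name, interface type and DBC file name are placeholders, not details taken from the thesis.

    import can        # python-can: generic access to CAN interfaces
    import cantools   # cantools: DBC parsing and signal decoding

    # Load the signal definitions; "vehicle.dbc" is a placeholder name.
    db = cantools.database.load_file("vehicle.dbc")

    # Open a CAN channel. A SocketCAN device is assumed here; the thesis
    # instead reaches the bus through a USBCAN-II Pro adapter and its
    # vendor driver, but the decode logic is identical.
    bus = can.interface.Bus(channel="can0", interface="socketcan")

    try:
        while True:
            frame = bus.recv(timeout=1.0)   # one raw CAN frame, or None
            if frame is None:
                continue
            try:
                # Map the frame ID to a DBC message and convert its payload
                # into engineering units (speed, steering angle, ...).
                signals = db.decode_message(frame.arbitration_id, frame.data)
            except KeyError:
                continue                    # frame ID not defined in the DBC
            print(hex(frame.arbitration_id), signals)
    finally:
        bus.shutdown()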

(2) Feature parameter extraction and a driving behaviour recognition model. The raw data were pre-processed and characteristic parameters extracted. Because redundant information among the feature parameters degrades the recognition of personalised driving characteristics, PCA was applied for dimensionality reduction, yielding latent personalisation factors that characterise driving behaviour. The K-Means++ clustering algorithm then divided driving styles into three classes, and a personalised driving behaviour recognition model was built with a random forest. Comparative experiments show that the method achieves high recognition accuracy; a model built on the selected feature dimensions was also compared with one built on the unscreened dimensions, confirming the effectiveness of the feature selection.
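
As a minimal, self-contained sketch of this pipeline, the snippet below chains standardisation, PCA, K-Means++ clustering into three styles and random forest recognition with scikit-learn; the feature count, the 95% retained-variance threshold and the split ratio are illustrative assumptions, and the random matrix stands in for the real feature parameters.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # X: one row per driving segment, one column per extracted feature
    # parameter (speed/acceleration statistics etc.); random placeholder.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 12))

    # PCA removes redundancy among the feature parameters; keeping the
    # components that explain 95% of the variance is an assumed threshold.
    Z = PCA(n_components=0.95).fit_transform(StandardScaler().fit_transform(X))

    # K-Means++ (the default initialisation of scikit-learn's KMeans)
    # groups the segments into three driving styles, used here as labels.
    styles = KMeans(n_clusters=3, init="k-means++", n_init=10,
                    random_state=0).fit_predict(Z)

    # A random forest then recognises the style of unseen segments.
    Z_tr, Z_te, y_tr, y_te = train_test_split(
        Z, styles, test_size=0.3, random_state=0, stratify=styles)
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(Z_tr, y_tr)
    print("recognition accuracy:", accuracy_score(y_te, model.predict(Z_te)))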

(3) Construction of a personalised driving decision model based on the DDPG algorithm. The problem is modelled as a Markov decision process, covering the design of the state space, the action space, the reward function and the network structure. The reward function is designed with inverse reinforcement learning: real driving behaviour data are resampled into several groups of expert policies that are used to establish the reward function. Finally, the DDPG-based personalised driving decision prediction model is trained, and its performance and effectiveness are analysed in terms of convergence and personalisation effect.
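
The abstract does not specify which IRL variant is used, so the following is only a generic sketch of the feature-expectation idea behind such reward designs: a linear reward w·phi(s, a) whose weights are nudged until the resampled expert trajectories score higher than the current policy's rollouts. The feature function, discount factor and learning rate are assumptions.

    import numpy as np

    GAMMA = 0.99  # assumed discount factor of the driving MDP

    def phi(state, action):
        # Hypothetical reward features for a driving MDP, e.g. tracking
        # error, speed deviation and action smoothness.
        return np.array([state[0], state[1], action[0] ** 2])

    def feature_expectations(trajectories):
        # mu = E[sum_t gamma^t * phi(s_t, a_t)] averaged over trajectories,
        # each trajectory being a list of (state, action) pairs.
        mu = np.zeros(3)
        for traj in trajectories:
            for t, (s, a) in enumerate(traj):
                mu += (GAMMA ** t) * phi(s, a)
        return mu / len(trajectories)

    def update_reward_weights(w, expert_trajs, policy_trajs, lr=0.05):
        # Push w toward the expert's feature expectations and away from
        # the policy's, so expert-like behaviour earns a higher reward.
        grad = (feature_expectations(expert_trajs)
                - feature_expectations(policy_trajs))
        w = w + lr * grad
        return w / (np.linalg.norm(w) + 1e-8)  # keep the weights bounded

    def reward(w, state, action):
        # Linear reward fed back into the DDPG training loop.
        return float(w @ phi(state, action))

A reward learned this way would simply stand in for a hand-crafted reward inside an otherwise standard DDPG actor-critic training loop.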

(4) Experimental validation of the personalised driver model based on inverse reinforcement learning. Real traffic scenes are first reproduced in the LGSVL driving simulator, and Apollo 6.0 simulates the driving scenarios in real time to generate a virtual map. Vehicle driving data are then predicted on this virtual map, and comparison experiments against the original DDPG, DQN and Actor-Critic algorithms verify the model's convergence and personalised decision-making performance. In addition, a real-vehicle data acquisition platform was built to collect driving data from three drivers under two typical driving conditions, a lane-change scenario and a curve scenario; the reasonableness of the predicted driver actions was analysed against the real conditions and the actual driving data, verifying the reliability and generalisability of the model.
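
The abstract does not name the comparison metric, so as one plausible way of quantifying how closely the predicted actions track the recorded ones, the sketch below computes a per-driver, per-scenario RMSE over time-aligned action sequences; the data layout and numbers are hypothetical.

    import numpy as np

    def rmse(predicted, recorded):
        # Root-mean-square error between the predicted and the recorded
        # action sequence (e.g. steering angle across one manoeuvre).
        predicted = np.asarray(predicted, dtype=float)
        recorded = np.asarray(recorded, dtype=float)
        return float(np.sqrt(np.mean((predicted - recorded) ** 2)))

    # Hypothetical logs: {driver: {scenario: (predicted, recorded)}}.
    t = np.linspace(0.0, 1.0, 50)
    logs = {
        "driver_1": {
            "lane_change": (np.sin(t), np.sin(t) + 0.02),
            "curve":       (0.5 * t,   0.5 * t - 0.01),
        },
    }

    for driver, scenarios in logs.items():
        for scenario, (pred, real) in scenarios.items():
            print(driver, scenario, "RMSE:", round(rmse(pred, real), 4))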


CLC Number:

 U471.3    

Open Access Date:

 2023-06-19    
