Chinese title: | Research on Construction Method of Personalized Driver Model Based on Inverse Reinforcement Learning |
Name: | |
Student ID: | 20205224077 |
Confidentiality level: | Public |
Thesis language: | Chinese (chi) |
Discipline code: | 085201 |
Discipline name: | Engineering - Engineering - Mechanical Engineering |
Student type: | Master's student |
Degree: | Master of Engineering |
Degree year: | 2023 |
Institution: | Xi'an University of Science and Technology |
Department: | |
Major: | |
Research direction: | Traffic safety |
First supervisor: | |
First supervisor's institution: | |
Submission date: | 2023-06-16 |
Defense date: | 2023-06-01 |
English title: | Research on Construction Method of Personalized Driver Model Based on Inverse Reinforcement Learning |
Chinese keywords: | |
English keywords: | Driving habits; Inverse reinforcement learning; Markov decision process; Driver model; Driving behaviour prediction |
Chinese abstract: |
A driver's individual driving characteristics have a significant impact on vehicle control and safety performance. In the closed-loop traffic system formed by the human, the vehicle and the road, the driver, as the operator of the vehicle, is one of the main factors affecting traffic safety. The need for a personalised driver model that reflects individual driving characteristics is therefore becoming increasingly urgent; however, research on personalised driver models is still scarce, even though traffic accidents are often related to drivers' operating behaviour. To address this problem, this thesis proposes a personalised driver model based on inverse reinforcement learning that can express the personalised and human-like characteristics of the driver. The main research content is as follows:

(1) Hardware and software development of a CAN bus field data acquisition system. A real-vehicle data acquisition platform was built with cameras, a drone and other hardware, and a USBCAN-II Pro analyser was used as the USB-to-CAN adapter. PC-side software for CAN bus data processing was designed and developed in C++ on the .NET platform. Because driving habits are shaped by experience, habit and other factors, drivers were recruited online for real-vehicle driving, and multi-dimensional driving data, including vehicle running-state data and driver operation data, were collected in real time. The data were uploaded to the host computer over the USB-to-CAN link and parsed and analysed by the DBC parsing module and the CAN message parsing module, providing data support for constructing the personalised driver model.

(2) Feature parameter extraction and driving behaviour recognition model. The collected raw data were pre-processed and feature parameters were extracted. Because redundancy among feature parameters degrades the recognition of personalised driving characteristics, PCA was used for dimensionality reduction to obtain latent personalisation factors that characterise driving behaviour. The K-Means++ clustering algorithm divided driving styles into three classes, and a personalised driving behaviour recognition model was built with a random forest. Comparative experiments show that the method achieves high recognition accuracy; a model built on the selected feature dimensions was compared with one built on the unselected dimensions, confirming the effectiveness of the selected features.

(3) Construction of a personalised driving decision model based on the DDPG algorithm. The problem is modelled as a Markov decision process, covering the design of the state space, the action space, the reward function and the network structure. For the reward function, an inverse reinforcement learning method is used: real driving behaviour data are resampled into several groups of expert policies and applied to the construction of the reward function. Finally, the DDPG-based personalised driving decision prediction model is trained, and its performance is analysed in terms of convergence and personalisation effect.

(4) Experimental validation of the personalised driver model based on inverse reinforcement learning. Real traffic scenarios were first reproduced in the LGSVL driving simulator, and Apollo 6.0 was used to simulate the driving scenarios in real time and generate a virtual map. Vehicle driving data were predicted on the virtual map, and comparative experiments against the original DDPG, DQN and Actor-Critic algorithms verified the convergence performance and the personalised decision-making effect of the model. In addition, a real-vehicle data acquisition platform was built to collect driving data from three drivers under two typical driving conditions, a lane-change scenario and a curve scenario; the reasonableness of the predicted driver actions was analysed against the real conditions and actual driving data, verifying the reliability and generalisation ability of the model. |
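Item (2) of the abstract names a concrete identification pipeline: PCA dimensionality reduction, K-Means++ clustering into three driving styles, and a random forest recognition model. The Python sketch below shows one way these pieces could fit together, assuming a scikit-learn implementation; the placeholder feature matrix, the 90 % variance threshold and all hyper-parameters are illustrative assumptions rather than the thesis's actual settings.

```python
# Minimal sketch of the driving-style identification pipeline in item (2):
# PCA dimensionality reduction -> K-Means++ clustering into three styles ->
# random forest recognition model. All data and hyper-parameters are
# illustrative assumptions, not the thesis settings.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Placeholder feature matrix: one row per driving segment, one column per
# candidate feature parameter (e.g. mean speed, speed std, longitudinal
# acceleration, throttle/brake and steering-angle statistics, ...).
X_raw = rng.normal(size=(600, 12))

# 1) Standardise, then reduce redundancy with PCA, keeping enough principal
#    components to explain 90 % of the variance (threshold is an assumption).
X_std = StandardScaler().fit_transform(X_raw)
pca = PCA(n_components=0.90)
X_pca = pca.fit_transform(X_std)

# 2) Cluster the reduced features into three driving styles with K-Means++.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
style_labels = kmeans.fit_predict(X_pca)

# 3) Train a random forest to recognise the driving style of new segments,
#    using the cluster assignments as supervision labels.
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, style_labels, test_size=0.3, random_state=0, stratify=style_labels
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
print("recognition accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```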
English abstract: |
The driver's individual driving characteristics have a significant impact on vehicle control and safety performance. In the closed-loop human-vehicle-road traffic system, the behaviour of the driver, as the operator of the vehicle, is one of the main factors affecting traffic safety. The need for personalised driver models that reflect driving characteristics is therefore becoming increasingly urgent; however, research on personalised driver models is still scarce, and traffic accidents are often related to the driver's driving behaviour. In this paper, a personalised driver model based on inverse reinforcement learning is proposed that is able to express the personalised and human-like characteristics of the driver. The main research content is as follows:

(1) Hardware and software development of a CAN bus field data acquisition system. A real-vehicle data acquisition platform was built with cameras, a drone and other hardware, and a USBCAN-II Pro analyser was used as the USB-to-CAN adapter. PC-side software for CAN bus data processing was designed and implemented in C++ on the .NET platform. Because driving habits are influenced by experience, habit and other factors, drivers were recruited online for real-vehicle driving, and multi-dimensional driving data, including vehicle driving-state data and driver operation data, were collected in real time. The data were uploaded to the host computer over the USB-to-CAN link and parsed by the DBC parsing module and the CAN message parsing module, providing data support for the construction of personalised driver models.

(2) Feature parameter acquisition and driving behaviour recognition model. The collected raw data were pre-processed and feature parameters were extracted. Because information redundancy among feature parameters degrades the recognition of personalised driving characteristics, PCA-based dimensionality reduction was carried out to obtain latent personalisation factors characterising driving behaviour. The K-Means++ clustering algorithm was used to classify driving styles into three classes, and a personalised driving behaviour recognition model was built with a random forest. Comparative experiments show that the method achieves high recognition accuracy; a model built on the selected feature dimensions was compared with one built on the unselected dimensions, demonstrating the effectiveness of the feature selection in this experiment.

(3) A personalised driving decision model based on the DDPG algorithm is constructed. A Markov decision process is used for modelling and analysis, covering the design of the state space, the action space, the reward function and the network structure. For the reward function, an inverse reinforcement learning method is used: real driving behaviour data are resampled into several groups of expert policies and applied to the construction of the reward function. Finally, the DDPG-based personalised driving decision prediction model is trained, and its performance is analysed in terms of convergence and personalisation effect.

(4) Experimental validation of the personalised driver model based on inverse reinforcement learning. Real traffic scenarios are first reproduced in the LGSVL driving simulator, and Apollo 6.0 is used to simulate the driving scenarios in real time and generate a virtual map. Vehicle driving data are predicted on the virtual map, and comparative experiments against the original DDPG, DQN and Actor-Critic algorithms verify the convergence performance and the personalised decision-making effect of the model. In addition, a real-vehicle data acquisition platform was built to collect driving data from three drivers under two typical driving conditions, a lane-change scenario and a curve scenario; the reasonableness of the predicted driver manoeuvres was analysed against the real driving conditions and actual driving data, verifying the reliability and generalisability of the model. |
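Item (3) describes the reward function of the DDPG decision model as being built with inverse reinforcement learning from resampled expert driving data. The sketch below illustrates one common way such a reward can be structured: a linear reward over driving features whose weights are fitted by matching discounted feature expectations between the expert data and the current policy. The feature definitions, discount factor and update rule are illustrative assumptions, not the thesis's actual reward design; during DDPG training, rollouts of the current policy would be scored with this reward while the weights are periodically refitted against the expert demonstrations.

```python
# Minimal sketch of a linear IRL reward r(s, a) = w . phi(s, a), with weights w
# fitted by feature-expectation matching against expert demonstrations
# (the resampled real driving data). All quantities below are assumptions.
import numpy as np

GAMMA = 0.95      # discount factor (assumption)
N_FEATURES = 4    # e.g. speed tracking, lane offset, jerk, headway (assumption)

def feature_expectations(trajectories, gamma=GAMMA):
    """Discounted feature expectations mu = E[sum_t gamma^t * phi(s_t, a_t)],
    averaged over trajectories; each trajectory is an array (T, N_FEATURES)."""
    mus = []
    for traj in trajectories:
        discounts = gamma ** np.arange(len(traj))
        mus.append((discounts[:, None] * traj).sum(axis=0))
    return np.mean(mus, axis=0)

def update_reward_weights(w, expert_trajs, policy_trajs, lr=0.1):
    """One feature-matching step: push the reward weights toward the expert's
    feature expectations and away from the current policy's."""
    grad = feature_expectations(expert_trajs) - feature_expectations(policy_trajs)
    w = w + lr * grad
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

def reward(w, phi):
    """Linear reward that the DDPG critic would consume during training."""
    return float(np.dot(w, phi))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder per-step feature trajectories; in the thesis these would come
    # from the resampled expert driving data and from rollouts of the current
    # DDPG policy in the simulator.
    expert = [rng.normal(0.5, 0.1, size=(100, N_FEATURES)) for _ in range(5)]
    policy = [rng.normal(0.0, 0.3, size=(100, N_FEATURES)) for _ in range(5)]
    w = np.zeros(N_FEATURES)
    for _ in range(20):
        w = update_reward_weights(w, expert, policy)
    print("fitted reward weights:", w.round(3))
    print("sample reward:", reward(w, expert[0][0]))
```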
References: | |
CLC number: | U471.3 |
Open access date: | 2023-06-19 |