Thesis Information

Thesis title (Chinese):

 基于深度学习的驾驶人视觉感知特性研究

Name:

 唐增辉 (Tang Zenghui)

Student ID:

 19205016001

Confidentiality level:

 Public

Thesis language:

 Chinese (chi)

Discipline code:

 0802

Discipline name:

 Engineering - Mechanical Engineering

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2023

Degree-granting institution:

 西安科技大学 (Xi'an University of Science and Technology)

School:

 机械工程学院 (School of Mechanical Engineering)

Major:

 Mechanical Engineering

Research direction:

 Traffic Safety

First supervisor:

 赵栓峰 (Zhao Shuanfeng)

First supervisor's institution:

 西安科技大学 (Xi'an University of Science and Technology)

Thesis submission date:

 2023-06-16

Thesis defense date:

 2023-06-01

Thesis title (English):

 Research on Driver Visual Perception Characteristics Based on Deep Learning

Keywords (Chinese):

 交通驾驶场景 ; 深度学习 ; 视觉方向估计 ; 视觉感知模型

Keywords (English):

 Traffic driving scene ; Deep learning ; Visual direction estimation ; Visual perception model

Abstract (Chinese):

驾驶人的视觉感知特性是影响驾驶行为和安全的重要因素,了解驾驶人在不同道路环境下的视觉信息感知规律,对于提高驾驶人的主动安全和主观舒适度,以及设计更合理的智能辅助系统,具有重要的理论和实际意义。因此,本论文围绕驾驶人视觉感知特性展开研究,主要研究内容如下:

(1)构建驾驶人视觉感知数据集。利用眼动仪、相机等设备,在不同道路环境(如城市道路、高速公路、乡村道路等)和任务场景(如直行、转弯、变道等)下,采集大量真实场景图像数据,并通过视觉方向估计技术,获取驾驶人在每一帧图像中的注视点坐标,从而得到包含多模态信息(如交通场景视频、交通要素语义标签、驾驶人注视点等)的数据集。

(2)构建基于深度学习的驾驶人视觉感知模型。基于驾驶人视觉感知数据集,设计并训练一个基于深度学习的端到端模型,该模型以交通场景视频为输入,输出一个概率分布图,表示每个像素点被注视的概率。该模型综合考虑了交通场景中各种交通要素,如车辆、行人、车道线、交通标志等以及这些要素的动态信息和时空特性对驾驶人视觉注意力的影响。

(3)驾驶人视觉感知特性实验验证及分析。利用眼动仪和相机等设备,在不同道路环境和任务场景下,对30名被试人员进行实验测试,并记录他们在每一帧图像中的真实注视点坐标。然后将真实注视点坐标与模型输出的概率分布图进行对比,计算各种评价指标(如准确率、召回率、F1值等),评估模型在不同条件下的泛化能力和预测效果。

Abstract (English):

The visual perception characteristics of drivers are important factors that affect driving behavior and safety. Understanding the visual information perception patterns of drivers in different road environments is of great theoretical and practical significance for improving their active safety and subjective comfort, as well as designing more reasonable intelligent assistance systems. Therefore, this paper focuses on the visual perception characteristics of drivers, and the main research content is as follows:

(1) Build a driver visual perception dataset. Using eye trackers, cameras, and other equipment, a large amount of real-scene image data is collected in different road environments (such as urban roads, highways, and rural roads) and task scenarios (such as going straight, turning, and lane changing). The driver's gaze point coordinates in each image frame are obtained through visual direction estimation, yielding a dataset that contains multimodal information such as traffic scene videos, semantic labels of traffic elements, and driver gaze points.
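
As a rough illustration of what one record in such a dataset might look like, the sketch below pairs each scene frame with a semantic-label map and the gaze point estimated for that frame. It is a minimal Python sketch; the field names (frame_path, gaze_xy, road_type, task) are illustrative assumptions, not the actual schema used in the thesis.

```python
# Illustrative record layout (assumed, not the thesis's schema) for pairing
# traffic-scene frames with semantic labels and estimated driver gaze points.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GazeSample:
    frame_path: str                # path to one traffic-scene video frame
    semantic_map_path: str         # per-pixel labels (vehicle, pedestrian, lane line, sign, ...)
    gaze_xy: Tuple[float, float]   # driver gaze point in normalized image coordinates [0, 1]
    road_type: str                 # e.g. "urban", "highway", "rural"
    task: str                      # e.g. "straight", "turn", "lane_change"

def build_index(frame_paths: List[str],
                semantic_paths: List[str],
                gaze_points: List[Tuple[float, float]],
                road_type: str,
                task: str) -> List[GazeSample]:
    """Pair each frame with its semantic map and the gaze point estimated for it."""
    assert len(frame_paths) == len(semantic_paths) == len(gaze_points)
    return [GazeSample(f, s, g, road_type, task)
            for f, s, g in zip(frame_paths, semantic_paths, gaze_points)]

if __name__ == "__main__":
    samples = build_index(["frame_0001.png"], ["frame_0001_sem.png"], [(0.52, 0.48)],
                          road_type="urban", task="straight")
    print(samples[0])
```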

(2) Construct a driver visual perception model based on deep learning. Based on the driver visual perception dataset, an end-to-end deep learning model is designed and trained. The model takes traffic scene videos as input and outputs a probability distribution map representing the probability that each pixel is gazed at. The model comprehensively considers the various traffic elements in the scene, such as vehicles, pedestrians, lane lines, and traffic signs, as well as the influence of their dynamic information and spatiotemporal characteristics on the driver's visual attention.
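
A minimal sketch of this kind of model is given below, assuming PyTorch: a small encoder-decoder that takes a short clip of frames and outputs a per-pixel gaze-probability map normalized by a softmax over all pixels. The layer sizes, the use of 3D convolutions for the spatiotemporal information, and the temporal pooling are assumptions made for illustration; they are not the architecture developed in the thesis.

```python
# Sketch (assumed architecture): clip of frames in, per-pixel gaze-probability map out.
import torch
import torch.nn as nn

class GazeMapNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 3D convolutions capture motion and other spatio-temporal cues in the clip.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=(2, 2, 2), padding=1), nn.ReLU(),
        )
        # Decoder upsamples back to the input resolution with a single output channel.
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, kernel_size=1),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, T, H, W)
        feat = self.encoder(clip)        # (batch, 64, T', H/4, W/4)
        feat = feat.mean(dim=2)          # average over time -> (batch, 64, H/4, W/4)
        logits = self.decoder(feat)      # (batch, 1, H, W)
        b, _, h, w = logits.shape
        # Softmax over all pixels so the output is a probability distribution of gaze.
        return torch.softmax(logits.view(b, -1), dim=1).view(b, 1, h, w)

if __name__ == "__main__":
    model = GazeMapNet()
    clip = torch.randn(2, 3, 8, 128, 256)   # two clips of 8 frames, 128x256 pixels
    print(model(clip).shape)                 # torch.Size([2, 1, 128, 256])
```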

(3) Experimental verification and analysis of driver visual perception characteristics. Using devices such as eye trackers and cameras, experiments were conducted with 30 participants in different road environments and task scenarios, and their true gaze point coordinates were recorded for each image frame. The real fixation coordinates are then compared with the probability distribution maps output by the model, various evaluation metrics (such as accuracy, recall, and F1 score) are computed, and the model's generalization ability and prediction performance under different conditions are evaluated.
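
The sketch below shows one way such a comparison could be scored at the pixel level: threshold the predicted probability map, compare it with a binary ground-truth fixation map, and compute precision, recall, and F1 (precision is used here in place of the abstract's "accuracy" because F1 is defined from precision and recall). The threshold and the pixel-level definition of the metrics are assumptions, not the thesis's exact evaluation protocol.

```python
# Pixel-level precision/recall/F1 between a predicted gaze map and a binary fixation map.
# The 0.5 threshold (relative to the map's peak value) is an arbitrary illustrative choice.
import numpy as np

def precision_recall_f1(pred_map: np.ndarray, fixation_map: np.ndarray,
                        threshold: float = 0.5):
    """pred_map: predicted gaze probabilities; fixation_map: binary ground-truth fixations."""
    pred = (pred_map / pred_map.max()) >= threshold   # scale to [0, 1], then threshold
    gt = fixation_map.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred = rng.random((4, 4))                          # fake predicted map
    gt = (rng.random((4, 4)) > 0.7).astype(np.uint8)   # fake fixation map
    print(precision_recall_f1(pred, gt))
```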

References:

[1]http://www.caam.org.cn/chn/7/cate_76/con_5235729.html

[2]https://data.stats.gov.cn/search.htm?s=交通事故

[3]缪小冬. 车辆行驶中的视觉显著目标检测及语义分析研究[D]. 南京:南京航空航天大学, 2014.

[4]Laurent Itti, Christof Koch, Ernst Niebur. A model of saliency-based visual attention for rapid scene analysis[J]. IEEE Transactions on pattern analysis and machine intelligence, 1998, 20(11): 1254-1259.

[5]Johannes P. H. Reulen, Jacobus T. Marcus, Dirk Koops, et al. Precise recording of eye movement: the IRIS technique Part 1[J]. Medical & Biological Engineering & Computing, 1988, 26: 20-26.

[6]David A. Robinson. A method of measuring eye movements using a scleral search coil in a magnetic field[J]. IEEE Transactions on Bio-medical Electronics, 1963, 10(4): 137-145.

[7]Shackel B. Note on Mobile Eye Viewpoint Recording[J]. Journal of the Optical Society of America, 1960, 50(8):763-768.

[8]Director, et al. Reading your mind: EEG during reading task[J]. Artificial Intelligence & Image Processing, 1974, 10(2): 246-272.

[9]Thang Vo, Tamas Gedeon. Reading your mind: EEG during reading task[C]//Neural Information Processing, Springer Berlin Heidelberg, 2011: 396-403.

[10]Kenneth Holmqvist, Marcus Nyström, Richard Andersson, et al. Eye tracking: A comprehensive guide to methods and measures[M]. OUP Oxford, 2011: 1834-1840.

[11]Yusuke Sugano, Yasuyuki Matsushita, Yoichi Sato. Learning-by-synthesis for appearance-based 3d gaze estimation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 1821-1828.

[12]Kenneth Alberto Funes Mora, Jean-Marc Odobez. Gaze estimation from multimodal kinect data[C]//2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 2012: 25-30.

[13]Xucong Zhang, Yusuke Sugano, Mario Fritz, et al. It's written all over your face: Full-face appearance-based gaze estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017: 51-60.

[14]Wangjiang Zhu, Haoping Deng. Monocular free-head 3d gaze tracking with deep learning and geometry constraints[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 3143-3152.

[15]Xucong Zhang, Yusuke Sugano, Mario Fritz, et al. Appearance-based gaze estimation in the wild[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 4511-4520.

[16]Xucong Zhang, Yusuke Sugano, Mario Fritz, et al. MPIIGaze: Real-world dataset and deep appearance-based gaze estimation[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 41(1): 162-175.

[17]Yihua Cheng, Feng Lu, Xucong Zhang. Appearance-based gaze estimation via evaluation-guided asymmetric regression[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 100-115.

[18]Kyle Krafka, Aditya Khosla, Petr Kellnhofer, et al. Eye tracking for everyone[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2176-2184.

[19]Junfeng He, Khoi Pham, Nachiappan Valliappan, et al. On-device few-shot personalization for real-time gaze estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. 2019: 1149-1158.

[20]Jonathan Harel, Christof Koch, Pietro Perona. Graph-based visual saliency[J]. Advances in neural information processing systems, 2006, 19: 545-552.

[21]Jianming Zhang, Stan Sclaroff. Saliency detection: A Boolean map approach[C]//Proceedings of the IEEE international conference on computer vision. 2013: 153-160.

[22]Stas Goferman, Lihi Zelnik-Manor, Ayellet Tal. Context-aware saliency detection[J]. IEEE transactions on pattern analysis and machine intelligence, 2011, 34(10): 1915-1926.

[23]Ali Borji, Laurent Itti. State-of-the-art in visual attention modeling[J]. IEEE transactions on pattern analysis and machine intelligence, 2012, 35(1): 185-207.

[24]Olivier Le Meur, Patrick Le Callet, Denis Thoreau, et al. A coherent computational approach to model bottom-up visual attention[J]. IEEE transactions on pattern analysis and machine intelligence, 2006, 28(5): 802-817.

[25]Lingyun Zhang, Tong Matthew H, Marks Tim K, et al. SUN: A Bayesian framework for saliency using natural statistics[J]. Journal of vision, 2008, 8(7): 32-32.

[26]Dashan Gao, Nuno Vasconcelos. Discriminant saliency for visual recognition from cluttered scenes[J]. Advances in neural information processing systems, 2004, 17: 481-488.

[27]Bruce N D B, Tsotsos J K. Saliency, attention, and visual search: An information theoretic approach[J]. Journal of vision, 2009, 9(3): 1-30.

[28]Xiaodi Hou, Liqing Zhang. Saliency detection: A spectral residual approach[C]//2007 IEEE Conference on computer vision and pattern recognition. IEEE, 2007: 1-8.

[29]Tilke Judd, Krista Ehinger, Fredo Durand, et al. Learning to predict where humans look[C]//2009 IEEE 12th international conference on computer vision. IEEE, 2009: 2106-2113.

[30]Eleonora Vig, Michael Dorr, David Cox. Large-scale optimization of hierarchical features for saliency prediction in natural images[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 2798-2805.

[31]Tilke Judd, Krista Ehinger, Fredo Durand, et al. Learning to predict where humans look[C]//2009 IEEE 12th international conference on computer vision. IEEE, 2009: 2106-2113.

[32]Lior Elazary, Laurent Itti. A Bayesian model for efficient visual search and recognition[J]. Vision research, 2010, 50(14): 1338-1352.

[33]Jun Zhu, Yuanyuan Qiu, Rui Zhang, et al. Top-down saliency detection via contextual pooling[J]. Journal of Signal Processing Systems, 2014, 74(1): 33-46.

[34]Ji Hyoun Lim, Yili Liu, Omer Tsimhoni. Investigation of driver performance with night-vision and pedestrian-detection systems-Part 2: Queuing network human performance modeling[J]. IEEE Transactions on Intelligent Transportation Systems, 2010, 11(4):765–772.

[35]Lev Fridman, Philipp Langhans, Jiwon Lee, et al. Driver Gaze Region Estimation Without Using Eye Movement[J]. 2015, 07: 47-60.

[36]In Ho Choi, Seung Kee Hong, Yong Gil Kim. Real-time categorization of driver's gaze zone using the deep learning techniques[C]//2016 International conference on big data and smart computing (BigComp). IEEE, 2016: 143-148.

[37]Mikael Lundgren, Lars Hammarstrand, Tomas McKelvey. Driver-gaze zone estimation using Bayesian filtering and Gaussian processes[J]. IEEE transactions on intelligent transportation systems, 2016, 17(10): 2739-2750.

[38]Ashish Tawari, Andreas Møgelmose, Sujitha Martin, et al. Attention estimation by simultaneous analysis of viewer and view[C]//17th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, 2014: 1381-1387.

[39]Shengfeng He, Rynson W H Lau, Qingxiong Yang. Exemplar-driven top-down saliency detection via deep association[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 5723-5732.

[40]Jianming Zhang, Sarath Adel Bargal, Zhe Lin, et al. Top-down neural attention by excitation backprop[J]. International Journal of Computer Vision, 2018, 126(10): 1084-1102.

[41]Palazzi A, Abati D, Solera F, et al. Predicting the Driver's Focus of Attention: the DR (eye) VE Project[J]. IEEE transactions on pattern analysis and machine intelligence, 2018, 41(7): 1720-1733.

[42]Andrea Palazzi, Francesco Solera, Simone Calderara, et al. Learning where to attend like a human driver[C]//2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017: 920-925.

[43]邓涛. 基于视觉注意的驾驶场景显著性检测模型研究[D]. 成都:电子科技大学, 2018.

[44]Guodong Han, Shuanfeng Zhao, Pengfei Wang, et al. Driver Attention Area Extraction Method Based on Deep Network Feature Visualization[J]. Applied Sciences, 2020, 10(16): 5474-5482.

[45]Bolei Zhou, Aditya Khosla, Agata Lapedriza, et al. Learning deep features for discriminative localization[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2921-2929.

[46]Jan Čech, Tereza Soukupová. Real-time eye blink detection using facial landmarks[J]. Cent. Mach. Perception, Dep. Cybern. Fac. Electr. Eng. Czech Tech. Univ. Prague, 2016: 1-8.

[47]Michael Kass, Andrew Witkin, Demetri Terzopoulos. Snakes: Active contour models[J]. International Journal of Computer Vision, 1988, 1(4): 321-331.

[48]Hua Tao, Lei Chen, Lei Wei, Ronghua Wang. An enhanced LeNet-5 deep learning model for classification of breast cancer histology images[J]. IEEE Access, 2019, 7: 927-934.

[49]Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2016, 770–778.

[50]Sepp Hochreiter, Jürgen Schmidhuber. Long short-term memory[J]. Neural computation, 1997, 9(8): 1735-1780.

[51]Jeffrey L. Elman. Finding structure in time[J]. Cognitive science, 1990, 14(2), 179-211.

[52]Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches[C]//Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 2014: 103-111.

[53]Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri. A closer look at spatiotemporal convolutions for action recognition[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2018: 6450-6459.

[54]Olaf Ronneberger, Philipp Fischer, & Thomas Brox. U-net: Convolutional networks for biomedical image segmentation [C]//In International Conference on Medical image computing and computer-assisted intervention, 2015: 234-241.

[55]Carole H. Sudre, Wenqi Li, Tom Vercauteren, Sébastien Ourselin, M. Jorge Cardoso. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations[C]//Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 2017: 240-248.

CLC number:

 U471.3    

Open access date:

 2023-06-19    
