Analyzing road-test data is important for developing automated vehicles. L3Pilot is a European pilot project on level 3 automation, including 34 partners among manufacturers, suppliers and research institutions. Targeting around 100 cars and 1000 test subjects, the project will generate large amounts of data. We present a data format, allowing efficient data collection, handling and analysis by multiple organizations.
A project of the scope of L3Pilot involves various challenges. Data come from a multitude of heterogeneous sources and are processed by a variety of tools. Recorded data span all data types generated in various vehicular sensors/systems and are enriched with external data sources. Videos supplement time-series data as external files. Derived measures and performance indicators – required to answer research questions about effectiveness of automated driving – are processed by analysis partners and included for each test session.
As a file format, we chose HDF5, which offers a data model and software libraries for storing and managing data. HDF5 is designed for flexible and efficient I/O and for high volume and complex data. The usage of different computing environments for specific tasks is facilitated by the portability that comes with the format. Portability is also important for exploiting the rising potential within artificial intelligence (e.g. automatic scene detection and video annotation).
Based on lessons learned from past field tests, we defined a general frame for the common data format that is aligned with the data processing steps of FESTA “V” evaluation methodology. The definitions include representation of the source signals and a hierarchical structure for including multiple datasets that are gradually supplemented (post-processed or annotated) during the various analysis steps. By using the HDF5 format, analysis partners have the freedom to exploit their familiar tools: MATLAB, Java, Python, R, etc. First comparisons between time-series data in previous projects (e.g. AdaptIVe) and the proposed data format show a reduction in storage size of around 80 %, without losses in performance. Much of that is due to efficient internal compression and structuring of data. Considering the amount of objective data involved in automated driving, this leads to a great benefit, in terms of usability.
This paper presents a compact, portable, and extensible format aimed at handling extremely large amounts of field test data collected in automated driving pilots. As a harmonized format between tens of organizations performing tests in the L3Pilot project, the proposed format has the potential to promote data sharing as well as development of common tools and gain popularity for use in other projects. The format is designed to allow efficient storing of data and its iterative processing with analysis and evaluation tools. The format also considers the requirements of AI tools supporting neural network training and use.