To improve the interface design of in-vehicle infotainment systems, robust evaluation methods are required. The Eye Glance measurement using Driving Simulator test (EGDS), defined in the Visual-Manual NHTSA Driver Distraction Guidelines for In-Vehicle Electronic Devices, is a promising candidate. However, the present study indicates that EGDS needs further refinement to become sufficiently robust. When two randomly selected groups of 24 drivers tested the same ten in-vehicle tasks following the EGDS protocol, the test outcomes differed between the two groups. The analysis showed this to be a consequence of how the EGDS pass/fail criteria are calculated: as currently formulated, they make test outcomes highly dependent on between-driver variability. To assess the magnitude of the problem under repeated EGDS testing, eight additional virtual test groups were created, each by randomly selecting the test scores of 24 of the 48 participants. The analysis showed that EGDS outcomes were only 60% consistent across these ten groups: six tasks consistently passed or failed, whereas the outcome for the other four depended on which group had tested them. This EGDS reliability problem could possibly be overcome by matching the criteria calculation principles to the underlying population variability.
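To make the resampling procedure concrete, the following is a minimal sketch of how such virtual groups and their pass/fail consistency could be computed. The per-driver scores, the 2.0 s glance-duration threshold, and the simplified pass criterion are all illustrative assumptions, not the study's actual data or the full EGDS criteria, which combine several glance metrics.

```python
import random

random.seed(0)

# Illustrative stand-in data: one score per participant for a single task
# (e.g. a mean glance duration in seconds). The real study used measured
# eye-glance data from 48 drivers, not simulated values.
all_scores = [random.gauss(1.8, 0.4) for _ in range(48)]

def task_passes(group_scores, threshold=2.0, max_failing=3):
    """Simplified stand-in for an EGDS-style pass/fail criterion: the
    task passes if at most `max_failing` of the group's 24 drivers
    exceed the glance-duration threshold. The actual guideline criteria
    are more elaborate; this only mimics their group-based structure."""
    exceeding = sum(1 for score in group_scores if score > threshold)
    return exceeding <= max_failing

# Ten groups of 24: mirroring the study, each virtual group is a random
# draw (assumed here to be without replacement) of 24 of the 48
# participants' scores.
groups = [random.sample(all_scores, 24) for _ in range(10)]
outcomes = [task_passes(group) for group in groups]

# A task's outcome is consistent only if all ten groups agree on it.
consistent = all(outcomes) or not any(outcomes)
print(f"pass/fail per group: {outcomes}")
print(f"consistent across groups: {consistent}")
```

Repeating this check for each of the ten tasks and counting those whose outcome is group-independent yields the consistency figure reported above (six of ten tasks, i.e. 60%).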