## Correction

Front. Future Transp., 11 December 2023
Sec. Connected Mobility and Automation
Volume 4 - 2023 | https://doi.org/10.3389/ffutr.2023.1320940

# Corrigendum: Optimizing trajectories for highway driving with offline reinforcement learning

Branka Mirchevska1* Moritz Werling2 Joschka Boedecker1,3
• 1Department of Computer Science, University of Freiburg, Freiburg, Germany
• 2BMW Group, Munich, Germany
• 3IMBIT // BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany

by Mirchevska B, Werling M and Boedecker J (2023). Front. Future Transp. 4:1076439. doi: 10.3389/ffutr.2023.1076439

In the published article, there was an error in Algorithm 2: $a_{lo}$ should be $a_{latp}$.

A correction has been made to 3 Approach, 3.2 Decision making. This sentence previously stated:

“$\pi_\theta(s) = (a_{tv}, a_{latd}, a_{lond}, a_{lo})$.”

The corrected sentence appears below:

“$\pi_\theta(s) = (a_{tv}, a_{latd}, a_{lond}, a_{latp})$.”

In the published article, there was an error in Algorithm 2: $a_{lo}$ should be $a_{latp}$.

A correction has been made to 3 Approach, 3.2 Decision making. This sentence previously stated:

“$t = \mathrm{generate\_traj}(s, a_{tv}, a_{latd}, a_{lond}, a_{lo})$.”

The corrected sentence appears below:

“$t = \mathrm{generate\_traj}(s, a_{tv}, a_{latd}, a_{lond}, a_{latp})$.”

A correction has been made to 4 MDP Formalization, 4.3 Reward. This sentence previously stated:

“For the first objective, not causing collisions and remaining within the road boundaries, we define an indicator $ind_f$ signaling when the agent has failed in the following way:”

The corrected sentence appears below:

“For the first objective, not causing collisions and remaining within the road boundaries, we define an indicator $f$ signaling when the agent has failed in the following way:”

A correction has been made to 4 MDP Formalization, 4.3 Reward. This equation previously stated:

$ind_f = \begin{cases} 1, & \text{if the agent has failed} \\ 0, & \text{otherwise} \end{cases} \quad (1)$

The corrected equation appears below:

$f = \begin{cases} 1, & \text{if the agent has failed} \\ 0, & \text{otherwise} \end{cases} \quad (1)$

A correction has been made to 4 MDP Formalization, 4.3 Reward. This equation previously stated:

$ind_v = \begin{cases} 1, & v_{lon} < v_{des} \\ 0, & \text{otherwise} \end{cases}$

The corrected equation appears below:

$v_s = \begin{cases} 1, & v_{lon} < v_{des} \\ 0, & \text{otherwise} \end{cases}$

A correction has been made to 4 MDP Formalization, 4.3 Reward. This equation previously stated:

$r(s,a) = ind_f \cdot (-0.5) + (1 - ind_f)\big[\, ind_v \, (1 - \delta_v / v_{des}) + (1 - ind_v) + ind_{j_{lon}} \, p_{j_{lon}} \, sq(j_{lon_a} / j_{lon_{max}}) + (1 - ind_{j_{lon}}) \, p_{j_{lon}} + ind_{j_{lat}} \, p_{j_{lat}} \, sq(j_{lat_a} / j_{lat_{max}}) + (1 - ind_{j_{lat}}) \, p_{j_{lat}} \,\big] \quad (7)$

The corrected equation appears below:

$r(s,a) = f \cdot (-0.5) + (1 - f)\big[\, v_s \, (1 - \delta_v / v_{des}) + (1 - v_s) + ind_{j_{lon}} \, p_{j_{lon}} \, sq(j_{lon_a} / j_{lon_{max}}) + (1 - ind_{j_{lon}}) \, p_{j_{lon}} + ind_{j_{lat}} \, p_{j_{lat}} \, sq(j_{lat_a} / j_{lat_{max}}) + (1 - ind_{j_{lat}}) \, p_{j_{lat}} \,\big] \quad (7)$

A correction has been made to 6 Experiments and results, 6.3 Smoothness analysis. This equation previously stated:

$r(s,a) = f \cdot (-0.5) + (1 - f)\big[\, v_s \, (1 - \delta_{vel} / v_{des}) + (1 - v_s) + j_s \, (j_{rw} - j_{cost_a} / j_{cost_{ub}}) + (1 - j_s)(-j_{rw}) \,\big] \quad (8)$

The corrected equation appears below:

$r(s,a) = f \cdot (-0.5) + (1 - f)\big[\, v_s \, (1 - \delta_v / v_{des}) + (1 - v_s) + j_s \, (j_{rw} - j_{cost_a} / j_{cost_{ub}}) + (1 - j_s)(-j_{rw}) \,\big] \quad (8)$
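For concreteness, the structure of the reward in Eq. 8 can be sketched in code. The grouping of terms and all names (`reward`, `failed`, `v_ok`, `j_small`, and the keyword parameters) are illustration-only assumptions, not the authors' implementation:

```python
def reward(failed, v_ok, delta_v, v_des, j_small, j_rw, j_cost, j_cost_ub):
    """Sketch of the reward in Eq. 8 (assumed term grouping).

    failed  -> indicator f   (1 if the agent collided or left the road)
    v_ok    -> indicator v_s (1 if the longitudinal velocity is below v_des)
    j_small -> indicator j_s (1 if the jerk cost is within its upper bound)
    """
    if failed:
        # A failure yields a flat penalty of -0.5.
        return -0.5
    # Velocity term: scaled by the deviation from the desired velocity,
    # or the full reward of 1 when the desired velocity is reached.
    r_vel = v_ok * (1.0 - delta_v / v_des) + (1.0 - v_ok)
    # Jerk term: reward j_rw reduced by the normalized jerk cost, or a
    # flat penalty of -j_rw when the cost exceeds its upper bound.
    r_jerk = j_small * (j_rw - j_cost / j_cost_ub) + (1.0 - j_small) * (-j_rw)
    return r_vel + r_jerk
```

With this grouping, a failure dominates everything else, while the jerk weight $j_{rw}$ trades off smoothness against velocity tracking, matching the sensitivity discussion in Section 6.3.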

A correction has been made to 6 Experiments and results, 6.3 Smoothness analysis. This sentence previously stated:

“The results indicate that the best performance in terms of jerk is yielded when the reward function from Eq. 8 is used and when $j_w$ is assigned a value around 2. However, it is important to note that the performance is not very sensitive to the value chosen for $j_w$ and performs similarly well across a range of values. It is interesting to note that when the value for $j_w$ is too low, e.g., 0.5, the agent deems the jerk-related reward component less significant, which results in higher jerk values.”

The corrected sentence appears below:

“The results indicate that the best performance in terms of jerk is yielded when the reward function from Eq. 8 is used and when $j_{rw}$ is assigned a value around 2. However, it is important to note that the performance is not very sensitive to the value chosen for $j_{rw}$ and performs similarly well across a range of values. It is interesting to note that when the value for $j_{rw}$ is too low, e.g., 0.5, the agent deems the jerk-related reward component less significant, which results in higher jerk values.”

A correction has been made to Appendix, Trajectory generation details. This equation previously stated:

$traj_{lon_p} = b_0 + b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 \quad \text{where} \quad t = 0.0,\, dt,\, 2dt,\, \dots,\, a_{lonp}\,dt \quad (A1)$

The corrected equation appears below:

$traj_{lon_p} = b_0 + b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 \quad \text{where} \quad t = 0.0,\, dt,\, 2dt,\, \dots,\, a_{lond} \quad (A1)$

A correction has been made to Appendix, Trajectory generation details. This equation previously stated:

$traj_{lat_p} = c_0 + c_1 t + c_2 t^2 + c_3 t^3 + c_4 t^4 + c_5 t^5 \quad \text{where} \quad t = 0.0,\, dt,\, 2dt,\, \dots,\, a_{latp}\,dt \quad (A2)$

The corrected equation appears below:

$traj_{lat_p} = c_0 + c_1 t + c_2 t^2 + c_3 t^3 + c_4 t^4 + c_5 t^5 \quad \text{where} \quad t = 0.0,\, dt,\, 2dt,\, \dots,\, a_{latd} \quad (A2)$
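The polynomial evaluation over the corrected time grids in Eqs. A1 and A2 can be sketched as follows. The function name `eval_poly_traj`, the `dt` default, and the endpoint handling are hypothetical illustration choices, not taken from the original article:

```python
import numpy as np

def eval_poly_traj(coeffs, duration, dt=0.1):
    """Evaluate a polynomial trajectory at t = 0.0, dt, 2*dt, ..., duration.

    coeffs:   [b0, b1, ..., b4] for the quartic longitudinal profile (Eq. A1)
              or [c0, c1, ..., c5] for the quintic lateral profile (Eq. A2).
    duration: the end of the time grid (e.g., derived from a_lond / a_latd).
    """
    # Time grid including the endpoint (half-step guard against float error).
    t = np.arange(0.0, duration + dt / 2.0, dt)
    # np.polyval expects the highest-degree coefficient first, so reverse
    # the low-to-high ordering used in Eqs. A1 and A2.
    return np.polyval(coeffs[::-1], t)
```

A quartic (5 coefficients) fixes position, velocity, and acceleration boundary conditions longitudinally, while the quintic (6 coefficients) additionally pins the lateral end position, which is consistent with the degree difference between Eqs. A1 and A2.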

The authors apologize for these errors and state that this does not change the scientific conclusions of the article in any way. The original article has been updated.

## Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Keywords: reinforcement learning, trajectory optimization, autonomous driving, offline reinforcement learning, continuous control

Citation: Mirchevska B, Werling M and Boedecker J (2023) Corrigendum: Optimizing trajectories for highway driving with offline reinforcement learning. Front. Future Transp. 4:1320940. doi: 10.3389/ffutr.2023.1320940

Received: 13 October 2023; Accepted: 09 November 2023;
Published: 11 December 2023.

Approved by:

Frontiers Editorial Office, Frontiers Media SA, Switzerland

Copyright © 2023 Mirchevska, Werling and Boedecker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Branka Mirchevska, mirchevb@informatik.uni-freiburg.de