End to End Networks for Vehicle Control

Conference

2025 ASEE PSW Conference

Location

California Polytechnic University, California

Publication Date

April 10, 2025

Start Date

April 10, 2025

End Date

April 12, 2025

DOI

10.18260/1-2--55169

Permanent URL

https://peer.asee.org/55169

Paper Authors

Johann Fernandez California State Polytechnic University, Pomona

Abhishek Brijraj Vishwakarma California State Polytechnic University, Pomona

Shreyas Chaudhary California State Polytechnic University, Pomona

Yichi Cheng California State Polytechnic University, Pomona

Abstract

Convolutional neural network (CNN) architecture for end-to-end autonomous vehicle control.

This paper introduces a convolutional neural network (CNN) designed for end-to-end autonomous vehicle control; we intend to follow up with a poster or full paper. The CNN processes visual data from the vehicle’s camera and translates it directly into commands for steering and accelerator pressure. This approach eliminates the need for separate components responsible for tasks like lane detection, obstacle avoidance, and path planning, consolidating these functions into a single model. The input to the model is a normalized 66-by-200-pixel image of the vehicle’s surroundings, corresponding to the car’s field of view. Normalization helps the network disregard irrelevant variation such as shadows and changes in lighting, which would otherwise hinder learning. The CNN comprises multiple convolutional layers that analyze the input image, identifying patterns and structures. These layers serve as filters, detecting specific elements like edges, textures, and shapes. The early layers respond to simpler elements like lines and shapes, and deeper layers progressively combine them into more complex features like road markings, curves, other vehicles, and pedestrians. As the network progresses, the spatial resolution of the feature maps decreases, enabling it to concentrate on the most pertinent aspects of the image without discarding essential details. This refinement helps the network make sound decisions amid background clutter. The initial layer uses a 5x5 kernel to generate 24 feature maps; subsequent, deeper layers produce 36, 48, and 64 feature maps with 3x3 kernels. This compression strategy efficiently encodes spatial information while reducing spatial resolution. The network then flattens the features into a vector that fully connected layers of neurons use to make decisions. The first fully connected layer has 1164 neurons, narrowing to 10 in the final hidden layer.
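The shrinking feature-map sizes described above follow from ordinary "valid" convolution arithmetic, which can be traced in a short Python sketch. The strides here are assumptions (the abstract does not state them), so the intermediate sizes are illustrative only; the kernel sizes and feature-map counts are the ones given in the abstract.

```python
def conv_out(size, kernel, stride):
    """Output size of a 'valid' convolution: floor((dim - kernel) / stride) + 1."""
    h, w = size
    return ((h - kernel) // stride + 1, (w - kernel) // stride + 1)

# Layers as described: one 5x5 conv (24 maps), then 3x3 convs (36, 48, 64 maps).
# Stride 2 throughout is an assumption made for this sketch.
layers = [(5, 2, 24), (3, 2, 36), (3, 2, 48), (3, 2, 64)]

size = (66, 200)  # the normalized input image
for kernel, stride, maps in layers:
    size = conv_out(size, kernel, stride)
    print(f"{maps} feature maps of {size[0]}x{size[1]}")
# First line printed: "24 feature maps of 31x98" under these assumed strides.
```

The flattened output of the last convolutional stage is what feeds the 1164-neuron fully connected layer mentioned above; changing the assumed strides changes that flattened size.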
The final layer generates control signals for steering and acceleration. The network learns directly from raw sensor data to manage a variety of driving scenarios, and it normalizes that data to mitigate the effects of lighting, shadows, and weather. Convolutional layers extract features at both local and global scales, handling diverse road geometries and driving situations. The design is highly efficient and can run in real time. Unlike traditional systems that separate perception, planning, and control, this end-to-end approach integrates those functions into a single model, simplifying the system’s design and reducing the need for handcrafted features or rule-based components. The model can handle diverse driving scenarios without additional training or fine-tuning. One notable advantage is scalability: the modular design allows more sensors to be added or the input data format to be modified. While this implementation uses RGB images, the underlying principles generalize to other data formats, such as LiDAR or radar. The architecture is also efficient enough for embedded hardware platforms with limited processing power. It offers several advantages over conventional approaches. First, the end-to-end learning paradigm allows the model to acquire knowledge directly from raw data, eliminating the need for manual feature engineering. Second, the hierarchical structure of the CNN ensures that the model captures both fine detail and broader context, providing a comprehensive understanding of the driving environment. Third, the fully connected layers serve as the model’s decision-making mechanism, transforming complex visual information into actionable control signals. Extensive testing across driving scenarios, including varying lighting and partially concealed objects, demonstrated the model’s adaptability and generalization capabilities.
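The normalization step mentioned above can be sketched in pure Python. Per-image zero-mean, unit-variance standardization is an assumption for this sketch, since the abstract says only that inputs are normalized; the point illustrated is that a uniformly darkened copy of the same pixels normalizes to (nearly) identical values, which is the sense in which lighting variation is discounted.

```python
def normalize(pixels):
    """Zero-mean, unit-variance standardization of a flat list of pixel values.
    The exact scheme is an assumption; the paper states only that inputs are
    normalized to suppress lighting and shadow effects."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = (sum((p - mean) ** 2 for p in pixels) / n) ** 0.5 or 1.0
    return [(p - mean) / std for p in pixels]

row = [12, 80, 200, 45, 130, 90]   # one row of raw pixel intensities
dark = [p * 0.5 for p in row]      # the same scene, uniformly darker
a, b = normalize(row), normalize(dark)
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # True: brightness cancels out
```

Because a uniform brightness change rescales the mean and standard deviation by the same factor, it cancels in the standardized values, so the network sees the same input either way.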
The model consistently performs well, maintaining its trajectory in complex situations, and its prompt decision-making enables quick reactions to evolving road conditions. Its efficiency, adaptability, and robustness to road challenges make this architecture well suited to autonomous vehicles, functioning in effect as a capable robotic driver that helps ensure occupant safety during highway travel.

Fernandez, J., Vishwakarma, A. B., Chaudhary, S., & Cheng, Y. (2025, April). End to End Networks for Vehicle Control. Paper presented at the 2025 ASEE PSW Conference, California Polytechnic University, California. 10.18260/1-2--55169

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2025 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference.