An Optimization-Based Approach for Efficient Neural Network Training #16

Abstract

Training deep neural networks often suffers from slow convergence and high computational cost due to the inefficiency of stochastic gradient descent (SGD). This paper explores an alternative adaptive learning rate optimization strategy, inspired by second-order methods, to improve convergence speed while maintaining stability. We propose a Hybrid Gradient Descent (HGD) algorithm that combines momentum-based updates with curvature-aware scaling. Experiments on the MNIST and CIFAR-10 datasets demonstrate a 15–20% reduction in training epochs compared to standard SGD, without loss in accuracy.

1. Introduction

Deep learning has achieved state-of-the-art performance across computer vision, natural language processing, and reinforcement learning. However, training large models remains computationally expensive. Stochastic Gradient Descent (SGD) is widely used, but it struggles with saddle points and vanishing gradients. Existing optimizers like Adam and RMSProp improve upon SGD but often introduce instability.

This research investigates whether a hybrid optimizer—combining the stability of momentum with curvature-based scaling—can lead to faster convergence in practical deep learning tasks.

2. Related Work

SGD: Simple but slow in high-dimensional loss landscapes.
Adam (Kingma & Ba, 2015): Fast convergence, but sometimes poor generalization.
Natural Gradient Descent: Theoretically strong, but expensive to compute.

Our method builds upon these by attempting to balance efficiency with stability.

4. Experimental Setup

Datasets: MNIST (handwritten digits), CIFAR-10 (image classification).
Models:
- 2-layer CNN for MNIST.
- ResNet-18 for CIFAR-10.
Baselines: SGD, Adam, RMSProp.
Metrics: Training loss, accuracy, convergence speed.

Optimizer	MNIST Convergence Epochs	CIFAR-10 Accuracy
SGD	25	82%
Adam	15	83%
HGD (ours)	12	83.5%

HGD reduced convergence epochs by ~20%.

Accuracy remained competitive.

Training was more stable compared to Adam.

An Optimization-Based Approach for Efficient Neural Network Training #16

Abstract

1. Introduction

2. Related Work

4. Experimental Setup

Leave a Reply Cancel reply