Nabil Belgasmi

Banque de Tunisie, Tunisia

Title: Multiobjective deep reinforcement learning approach for ATM cash replenishment planning


Abstract

The current reinforcement learning framework optimizes a single performance objective: it maximizes the expected return computed from scalar rewards, which come either from a univariate environment response to the agent's actions or from a weighted aggregation of a multivariate response. In many real-world situations, however, trade-offs must be made among multiple conflicting objectives that differ in order of magnitude, measurement unit and business-specific context (e.g. costs, lead time, quality of service, profits). Aggregating such sub-rewards into a scalar reward assumes perfect knowledge of the decision maker's preferences and of the way she perceives the importance of each objective. In this study, we consider the problem of learning the best ATM cash replenishment policies in an uncertain multi-objective context, given an arbitrary history of cash withdrawals that may be non-stationary and may contain outliers. We propose a model-free multi-objective deep reinforcement learning approach that competes against the human decision maker and seeks, for each ATM, a policy that outperforms the current human policy. The idea is to disaggregate the performance of a replenishment policy into a vector of objective functions. The performance of the human policy then becomes a multi-dimensional reference point (Rh). The task of the deep reinforcement learning algorithm is to find a policy that generates a set of performance points that Pareto-dominate the current human reference point (Rh).
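
The Pareto-dominance test against the reference point Rh can be illustrated with a minimal Python sketch. It is not the implementation from the talk: the three objectives (replenishment cost, idle cash held, cash-out days), the assumption that all of them are minimized, and every number and policy name below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence, Tuple


def pareto_dominates(candidate: Sequence[float], reference: Sequence[float]) -> bool:
    """Return True if `candidate` Pareto-dominates `reference`.

    Assumption: every objective is to be minimized. The candidate must be
    no worse on all objectives and strictly better on at least one.
    """
    if len(candidate) != len(reference):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(c <= r for c, r in zip(candidate, reference))
    strictly_better = any(c < r for c, r in zip(candidate, reference))
    return no_worse and strictly_better


# Hypothetical objective vector: (replenishment cost, idle cash held, cash-out days).
# Rh is the performance of the current human policy, used as the reference point.
human_reference: Tuple[float, float, float] = (120.0, 50_000.0, 3.0)

# Hypothetical performance points of three learned policies for one ATM.
candidates: Dict[str, Tuple[float, float, float]] = {
    "policy_a": (100.0, 45_000.0, 2.0),  # better on every objective
    "policy_b": (90.0, 60_000.0, 1.0),   # cheaper, but holds more idle cash
    "policy_c": (120.0, 50_000.0, 3.0),  # identical to the human policy
}

# Keep only the policies whose performance point dominates Rh.
dominating = [
    name for name, perf in candidates.items()
    if pareto_dominates(perf, human_reference)
]
print(dominating)
```

Under these assumptions only policy_a dominates Rh. policy_b trades lower cost for more idle cash, so it is incomparable with the human policy rather than better, and no weighting of the objectives is needed to reach that conclusion.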