In many service, production, and traffic systems there are multiple types of customers requiring different types of servers, i.e., different services, products, or routes. Often, however, a proportion of the customers are flexible, i.e., they are willing to change their type in order to achieve faster service, and even if this proportion is small, it
has the potential of achieving large performance gains. We study partial flexibility in multi-server queueing systems. Some (dedicated) arrivals are obliged to use a particular station, while others (flexible) have the ability to use any of the stations.
We first consider the optimal routing policy for the flexible customers under various settings. Flexible customers should be routed by the decision maker to one of queues upon arrival. When the only information available upon arrival is the queue lengths, then "Join the Shortest Queue" (JSQ) has been shown to be optimal in a variety of contexts. We generalize earlier results on the optimality of "Join the Shortest Queue" for flexible arrivals to the following: arbitrary arrivals where only a subset are flexible, multiple-server stations, and abandonments. Our optimality results are very strong; we minimize the queue length process in the weak majorization sense. When the actual workload at each queue is known upon arrival but the required work of the arriving customer is unknown, we show that routing flexible customers to the queue with the shortest workload, known in the literature as the "Join the Shortest Work" (JSW) policy, is optimal for general arrival and service processes.
Secondly, we study the scheduling problem in an alternate design where the flexible customers have a separate queue to wait. We show that the optimal scheduling policy is "Dedicated Customers First" (DCF) policy under various settings. This design is better than the routing design in terms of system performance; however it is not a good design in terms of fairness, because flexible customers face long waiting times.
Finally we consider the marginal impact of customer flexibility. We present our new results showing that the stationary expected waiting time is decreasing and convex in the proportion of flexible customers. Although convexity in the proportion of flexible customers is intuitive, it does not hold in the strong sense that monotonicity holds, and it is surprisingly difficult to prove. We develop a new approach that combines marginal analysis with coupling to show convexity in the stationary mean waiting time. Our results reinforce the idea that a little flexibility goes a long way.