It’s more of a tacit art than a science.

It also depends on the motivations of the architect and the team.

Ceteris paribus (i.e. setting application design aside), I’d say there’s the ideal, and there’s the pragmatic.

Ideally, the queue-based load leveling pattern calls for a high-throughput queue to be placed in front of services (such as the data persistence layer) that may be slower (memory vs disk, etc.) than what the queue can absorb. This may not hold for horizontally scalable data layers such as Azure Cosmos DB (depending on the chosen consistency model and requirements too), but it is very valid for scenarios where, say, a blockchain is the persistence layer.
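The pattern above can be sketched with an in-process queue, a minimal stand-in for a real broker. This is an illustrative sketch only: `slow_persist` is a hypothetical placeholder for a slow persistence layer, not any real Azure API. Producers enqueue at burst speed, while a worker drains at whatever pace persistence can sustain.

```python
import queue
import threading
import time

task_queue = queue.Queue()  # the load-leveling buffer between producers and the slow service
persisted = []              # stand-in for durable storage

def slow_persist(item):
    """Hypothetical slow persistence layer (disk, blockchain, etc.)."""
    time.sleep(0.01)        # simulated per-write latency
    persisted.append(item)

def worker():
    """Drain the queue at the pace the persistence layer can handle."""
    while True:
        item = task_queue.get()
        if item is None:    # sentinel value signals shutdown
            task_queue.task_done()
            break
        slow_persist(item)
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# A burst of 20 requests: enqueueing is fast regardless of persistence speed.
for i in range(20):
    task_queue.put(i)

task_queue.put(None)        # request shutdown after the backlog drains
task_queue.join()
t.join()
print(len(persisted))       # → 20
```

The caller returns as soon as the item is enqueued; the backlog is absorbed by the queue rather than by the persistence layer's connection pool.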

To be on the safe side, cloud architects will typically ask for a queue to buffer throughput as a de facto default.

From a pragmatic perspective though, sometimes both queueing mechanisms (such as Azure Cache for Redis or Azure Event Hubs) and database services (such as Azure Cosmos DB) have very high throughput. So the next factor to consider is cost.

Cost requires a deeper level of analysis, with throughput data over time. There’s also engineering cost: a queue typically adds a layer to reason about, the implementation of a worker, and so on. And what is the expected longevity of the application?

We also encounter scenarios where the development team is small or on a tight timeline. So it may be ‘least code to feature’ for the time being, then introduce a queue once the software proves its business value.

In a high-stakes application, safety and availability then take priority. The key is to find a balance between technical perfection and business tactics/needs.

Finally, there’s also the Retry pattern to consider, i.e. which option gives the highest resilience to transient unavailability of a downstream service.
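A minimal sketch of the Retry pattern, assuming a generic downstream call: retry on failure with exponential backoff. `flaky_call` is a hypothetical stand-in that fails transiently before succeeding; in practice you would use the retry support built into the relevant Azure SDK rather than rolling your own.

```python
import time

def retry(fn, attempts=4, base_delay=0.01):
    """Call fn, retrying on exception with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))

calls = {"count": 0}

def flaky_call():
    """Hypothetical service call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient outage")
    return "ok"

print(retry(flaky_call))                   # → ok
```

Note that a queue changes the retry story: with a worker draining a durable queue, a failed write can simply be re-queued, whereas a direct call pushes the retry burden onto every caller.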

These are some of the less technically formal but still commercially valid points when considering queues in architecture.