Trading System on Cloud: The Struggle

Mainak Saha
Dec 21, 2020

Since fear of the pandemic hit the financial world, one of the biggest technology trends has been the move towards the big cloud providers (AWS / Azure / GCP). Financial institutions have been embracing cloud for the last decade, but have always kept it away from their core business. Cloud adoption is widespread in CRM solutions, call center management, and other peripheral services. Company-wide SaaS adoption, such as Office365, is also very common. But if you turn around and look at core functionality, such as Account Management, Transactions and Trading, it is still managed in traditional data centers, commonly termed On-Prem.

Trading System (On Prem)

The figure above describes a well-architected on-prem system of the kind used in most trading platforms. I know I am oversimplifying, but these are the standard components found in most places.

  • Messaging System — This mainly enables an event-driven architecture. An event can be propagated to multiple downstream systems so that each one reaches its correct state. The messaging system is also used for CQRS implementations, so that reads and writes of the data can be separated.
  • API Gateway — A gateway is a must for authorizing requests. Depending on the maturity of the organization and the implementation, the gateway can be used for versioning and throttling as well.
  • Transactional Data Store — A data store with ACID properties and read-after-write consistency is a must for transaction management. Most of the time, this leads to a single giant SQL-based data store.
  • Caching — In-memory caching is used to improve system performance. Depending on the use case, either the Cache-Aside or the Cache-as-SOR (cache as system of record) pattern is implemented.
  • Services — Mostly two kinds of services are used: request / response based services and background services. On mature platforms, request / response services are exposed as API endpoints and segregated into Experience API, Process API and Data API layers. Most of the time these are falsely claimed to be microservices. Many platforms use a Service Bus as well.
  • 3rd Party Systems — This is the most vulnerable part of most systems. You cannot avoid it, but no integration can be “fool” proof. Often these integrations are done through vendor-provided hardware and TCP endpoints.
  • Logging and Monitoring — Last but not least, any trading platform needs extensive logging. This is achieved by logging asynchronously to files and then pushing those files to a central store. The support team must have predefined dashboards and alerts to monitor the system.
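
To make the Cache-Aside pattern from the caching bullet concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular trading platform does it: the `CacheAside` class, the `load` helper and the dict that stands in for the transactional data store are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Optional


class CacheAside:
    """Cache-aside: the application checks the cache first and falls back to the store."""

    def __init__(
        self,
        load_from_store: Callable[[str], Optional[dict]],
        save_to_store: Callable[[str, dict], None],
    ) -> None:
        self._cache: Dict[str, dict] = {}  # stand-in for an in-memory cache
        self._load = load_from_store
        self._save = save_to_store

    def get(self, key: str) -> Optional[dict]:
        if key in self._cache:  # cache hit
            return self._cache[key]
        value = self._load(key)  # cache miss: read the system of record
        if value is not None:
            self._cache[key] = value  # populate the cache for later reads
        return value

    def put(self, key: str, value: dict) -> None:
        self._save(key, value)  # write to the system of record first
        self._cache.pop(key, None)  # then invalidate the cached copy


# Hypothetical usage: a plain dict plays the role of the transactional store.
store: Dict[str, dict] = {"ACC-1": {"balance": 100}}
store_reads: List[str] = []  # records every read that reaches the store


def load(key: str) -> Optional[dict]:
    store_reads.append(key)
    return store.get(key)


accounts = CacheAside(load, store.__setitem__)
first = accounts.get("ACC-1")   # miss, goes to the store
second = accounts.get("ACC-1")  # hit, served from the cache
accounts.put("ACC-1", {"balance": 80})
third = accounts.get("ACC-1")   # miss again, because put() invalidated the entry
```

In this sketch the application owns the caching logic, which is the defining trait of Cache-Aside. With Cache-as-SOR the application would talk only to the cache, and the cache itself would read from and write to the store.
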
Trading System — Under Stress

When something like a pandemic hits the market, the actual performance of the system under stress starts to show. If you approach any Cloud Solution Architect with the above diagram, most of them will come up with a solution like the one below. I tried to put together a serverless version for both AWS and GCP.

AWS Architecture — High Level

And here is the GCP version of the same. There are various ways of doing it, but more or less they lead to this.

Google Cloud Platform Architecture — High Level

This kind of architecture, with a multi-region implementation and automatic failover, reaches the maximum level of availability and scalability. The problem is that achieving it correctly requires a multi-year program, with thorough planning and with funding that lasts until the last day of development and testing. In the financial world these are not new systems; you are just proposing a modernization. To show ROI to the business, the technology team usually adopts a phased approach through a hybrid architecture. Moreover, most of these systems have external dependencies to reach the market, for market data, placing trades and so on. Most of these integrations are hardware dependent in order to achieve low latency, and licensing cost depends on the number of these hard connections, e.g. FIX connectivity to any major market access provider. Web service based offerings exist, but they are not widely used where low latency is required.

Hybrid Architecture

The diagram above shows a hybrid / in-phase state of the architecture. Let me quickly explain the design in a few words.

  • Resources that need scalability and show performance issues under stress (refer to the under-stress diagram), e.g. computing resources, data store and cache, are moved to the cloud.
  • 3rd party legacy integrations that cannot be moved to the cloud quickly stay connected to On-Prem.
  • The data store and services that support 3rd party integrations stay On-Prem.
  • A streaming layer keeps data in sync between the cloud and on-prem data stores.
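
One way to picture the streaming layer in the last bullet is as an ordered stream of change events that the other side applies idempotently. The sketch below assumes that every change carries a sequence number that increases by one per event. `ChangeEvent`, `ReplicaStore` and `replication_lag` are hypothetical names for this illustration, not part of any specific streaming product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    seq: int    # position in the stream, assumed to increase by one per event
    key: str
    value: dict


class ReplicaStore:
    """Applies change events in order and ignores ones it has already seen."""

    def __init__(self) -> None:
        self.data: Dict[str, dict] = {}
        self.last_applied_seq = 0

    def apply(self, event: ChangeEvent) -> bool:
        if event.seq <= self.last_applied_seq:
            return False  # duplicate delivery, safe to skip
        self.data[event.key] = event.value
        self.last_applied_seq = event.seq
        return True


def replication_lag(stream: List[ChangeEvent], replica: ReplicaStore) -> int:
    """Number of events published on one side but not yet applied on the other."""
    latest = stream[-1].seq if stream else 0
    return latest - replica.last_applied_seq


# Hypothetical stream of on-prem changes being replayed into the cloud store.
stream = [
    ChangeEvent(1, "ORD-1", {"status": "NEW"}),
    ChangeEvent(2, "ORD-1", {"status": "FILLED"}),
    ChangeEvent(3, "ORD-2", {"status": "NEW"}),
]
cloud = ReplicaStore()
for event in stream[:2]:
    cloud.apply(event)

lag_before = replication_lag(stream, cloud)  # one event still in flight
redelivered = cloud.apply(stream[1])         # a redelivery is ignored
cloud.apply(stream[2])
lag_after = replication_lag(stream, cloud)   # replica has caught up
```

While the lag is above zero, the two stores disagree, and any service reading from the lagging side sees stale data.
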

This architecture has its own limitations, and many applications cannot adopt it.

  • Data Proximity — If data and compute do not sit in the same place, low latency requirements cannot be fulfilled.
  • Data Flow — If transactional data needs to flow from On-Prem to Cloud or vice versa, and that information is used by trade verification systems, then any latency introduced can lead to bad trades.
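
A rough back-of-the-envelope calculation shows why data proximity matters. The numbers below are purely illustrative assumptions, not measurements from any real system, and `added_latency_ms` and `fits_budget` are hypothetical helpers. The only point is that every sequential round trip between cloud and on-prem adds a full network round-trip time to the request.

```python
def added_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Latency added by sequential cross-site calls, ignoring bandwidth and queuing."""
    return round_trips * rtt_ms


def fits_budget(local_ms: float, round_trips: int, rtt_ms: float, budget_ms: float) -> bool:
    """True if local processing plus network round trips stays within the budget."""
    return local_ms + added_latency_ms(round_trips, rtt_ms) <= budget_ms


# Illustrative assumptions only: 2 ms of local processing, a 20 ms budget,
# and 3 sequential lookups against the data store for each request.
LOCAL_MS = 2.0
BUDGET_MS = 20.0
ROUND_TRIPS = 3

# Data store in the same site as the service: assume a 0.5 ms round trip.
co_located_ok = fits_budget(LOCAL_MS, ROUND_TRIPS, rtt_ms=0.5, budget_ms=BUDGET_MS)

# Service in the cloud, data store on-prem: assume a 10 ms round trip.
cross_site_ok = fits_budget(LOCAL_MS, ROUND_TRIPS, rtt_ms=10.0, budget_ms=BUDGET_MS)
cross_site_total = LOCAL_MS + added_latency_ms(ROUND_TRIPS, 10.0)
```

Under these assumed numbers the co-located request stays within budget at 3.5 ms, while the cross-site request takes 32 ms and misses it, even though the code on both sides is identical.
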

This is nothing new for any architect who has dealt with distributed architecture for even a few months. It could be easily solved if only this were true …

“Bandwidth is Infinite”

As that is too good to be true, our problem persists.

Because of the limitations mentioned above, application owners get to choose between on-prem and cloud according to their use case, and the organization ends up spending more by maintaining both infrastructures.

I am going to propose a few approaches to address these issues in the next couple of parts of this series.
