Distributed Ledger Platform: Powering Modern Payment Applications

Ata E Husain Bohra
11 min read · Jul 6, 2021


In recent years, companies like Airbnb, DoorDash, Lyft, and Uber have disrupted business verticals that had existed for decades, largely because the incumbents failed to keep pace with innovation. Dramatic performance improvements in mobile communication standards, combined with the easy availability of mobile phones, made the smartphone the fastest-adopted technology in the world, and the simple 'one-click' software offerings built on it have created billions of human connections.

Building a reliable, scalable, and flexible payment platform is table stakes for software offerings that serve millions of customers around the globe and move billions of dollars daily. At the core of any payment platform is a ledger, which records an organization's history of financial activities. The rest of this article discusses the design of a highly scalable, reliable, and available distributed ledger platform for storing an organization's transactions.

Glossary

Ledger entity is a managed entity for which a given ledger is maintained; for instance, bank account holders are entities in a banking ledger system, homeowners are entities in a home-insurance ledger system, and riders, drivers, and restaurant owners are entities in Uber's ledger system.

Ledger transaction is an operation that updates the state of a given ledger entity. A ledger is an immutable record of all transactions performed on a ledger entity over a specified period of time.

Ledger account is a logical construct used to maintain balances for a given ledger entity. An entity can have one or more accounts, and a ledger transaction can operate on one or more accounts at the same time; however, within the scope of a single transaction, all account mutations are applied atomically (all-or-none semantics).

Motivation

Ledgers are widely used across organizations for immutable bookkeeping: banking organizations record the history of credits and debits per transaction, insurance organizations verify the data lineage of claims, and the supply-chain vertical tracks inventory item movements. In the past, ledger applications were implemented using custom audit tables or audit trails in relational databases. However, this design is a poor fit for modern gig-economy use cases: the underlying relational databases have scalability limits, and building audit functionality from scratch is both time consuming and error prone.

Design Considerations

The list below summarizes the salient features expected from a Ledger Services platform:

  • Horizontal scalability, strong data consistency, durability and availability.
  • Immutable record keeping for business events received and processed by the platform.
  • TimeTravel functionality to assist audits, invoice generation, debugging issues etc.
  • Real-time balance tracking.
  • High request processing throughput.

The following sections elaborate on design choices that accomplish the above requirements.

Horizontal Scalability

A key design requirement for any modern application is the ability to handle hundreds of thousands of customer write requests per second; hence, the application design needs to scale horizontally to match demand. A few applicable design patterns are:

Decomposable Services

A service is software designed to provide a specific piece of functionality; it interacts with external applications through clearly defined Application Programming Interfaces (hereafter, APIs), accepting input requests and producing output in predefined formats. It is possible to build ledger software by putting all functionality into one or a few services; however, such a design limits horizontal scalability. An alternative is the Microservices Architecture[2] pattern: split the functionality into multiple independent services, each running in its own compute instance (CPU and memory isolation) and communicating with other services through lightweight mechanisms such as HTTP resource APIs. This decoupling allows each service to scale independently of the others based on its workload.

Cluster Storage Resources

One of the core functions of a ledger application is to efficiently store and operate on massive amounts of organizational data. A few viable cluster storage design choices are:

Compute local storage

Each compute instance can also serve as a data storage node. Applications can shard data across multiple compute nodes to deliver higher read and/or write bandwidth, and maintain multiple copies of the data to protect against transient or permanent compute instance failures. Potential downsides of this approach: the tight coupling prevents scaling compute and storage independently, and instance resources (CPU, memory, network) must be carefully partitioned to provide adequate performance-isolation guarantees.

Decoupled compute-storage tier

In this design each node in the cluster has predefined responsibilities: a node either serves as a compute node running service application logic, or as a data node allowing clients to store and access persisted data. This model allows custom node configurations best suited to each role, i.e. a compute node is configured with a stronger CPU and memory profile and limited local storage (to support application logging, for instance), whereas a data node can be configured with massive persistent storage and a relatively weaker CPU and memory profile. The design also enables spinning up new workloads that consume already-persisted data (an analytical workload, for instance) without impacting running workloads, and it supports independent scaling of compute and storage cluster resources.

Ledger Datastore

Ledgers need a strongly consistent and durable datastore with the following guarantees: immutable record keeping, i.e. a record can be written only once and never updated or deleted, plus the ability to read a record at a given point in time or scan records over a specified time range.

Timestamped Append-only datastore

To meet these requirements, the underlying datastore can be modeled as a data log: applications may append to the log, but already-stored data can never be updated or removed. Business events such as client requests and transactions are represented as records in the log. Records are implicitly sorted by timestamp, allowing an application to perform point-in-time queries as well as time-range scans over persisted records.
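A minimal in-memory sketch of such a timestamped append-only log, supporting appends, point-in-time reads, and time-range scans (class and method names are hypothetical; a production datastore would persist to replicated storage):

```python
import bisect

class AppendOnlyLog:
    def __init__(self):
        self._timestamps = []   # sorted append times
        self._records = []      # immutable payloads, parallel to _timestamps

    def append(self, timestamp, payload):
        # Enforce non-decreasing timestamps so records stay sorted.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self._timestamps.append(timestamp)
        self._records.append(payload)

    def read_at(self, timestamp):
        # Point-in-time query: latest record with timestamp <= requested time.
        idx = bisect.bisect_right(self._timestamps, timestamp) - 1
        return self._records[idx] if idx >= 0 else None

    def scan(self, start, end):
        # Time-range scan: all records with start <= timestamp <= end.
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_right(self._timestamps, end)
        return self._records[lo:hi]

log = AppendOnlyLog()
log.append(100, {"txn": "credit", "amount": 50})
log.append(200, {"txn": "debit", "amount": 20})
print(log.read_at(150))    # latest record as of t=150
print(log.scan(100, 200))  # all records in [100, 200]
```

Because appends never mutate earlier entries, point-in-time reads need no locking or versioning beyond the timestamp ordering itself.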

Secondary indices

The design needs to be flexible enough to allow efficient searching of records not only by timestamp but also by application-specific data fields. This can be achieved by letting users declare indexable field(s) and implementing a fast, efficient lookup mechanism over them. For instance, Amazon's recent-order-history view could be built by scanning a user's datastore records in chronological order of items purchased to date.
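One way to sketch this: maintain a posting list per indexable field, mapping each field value to the offsets of matching records; the class and field names below are illustrative assumptions, not a concrete system's API:

```python
from collections import defaultdict

class IndexedLog:
    def __init__(self, indexed_fields):
        self._records = []  # append-only record list
        self._indices = {f: defaultdict(list) for f in indexed_fields}

    def append(self, record):
        offset = len(self._records)
        self._records.append(record)
        # Maintain a posting list (value -> record offsets) per indexed field.
        for field, index in self._indices.items():
            if field in record:
                index[record[field]].append(offset)

    def lookup(self, field, value):
        # Matching records come back in append (chronological) order.
        return [self._records[i] for i in self._indices[field][value]]

orders = IndexedLog(indexed_fields=["user_id"])
orders.append({"user_id": "u1", "item": "book"})
orders.append({"user_id": "u2", "item": "lamp"})
orders.append({"user_id": "u1", "item": "pen"})
print(orders.lookup("user_id", "u1"))  # u1's order history, oldest first
```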

Ancillary use cases

The record data format should allow applications to store business-contextual information as part of a ledger record entry. This is a powerful feature and can serve as the backbone for important business use cases; for instance, a ledger request can contain ancillary information such as discounts, tax breakdowns, and special charges. Such information not only makes ledger records the single source of truth but may be essential for generating monthly invoices and for auditing.

Real-time Balance Tracking

Commonly, ledgers are used for offline activities such as invoice generation and billing. However, certain use cases require displaying real-time available balances to users; for instance, ride-sharing and food-delivery platforms support payment via top-up cards, and a user can view the current balance before submitting a new request. Following the decomposable-services pattern, the platform can implement a service that provides this functionality by reading the latest-timestamp record for a given ledger entity. This service can further be leveraged to implement advanced functionality such as an Authorization hold[3] to guard the platform against fraud losses; that use case is out of scope here and will be covered in a follow-up post.

Handling peak workload traffic

Applications like Uber, Lyft, DoorDash, and Airbnb are often subjected to sudden bursts of requests; for instance, ride-sharing spikes on New Year's Eve and food-delivery requests peak during the Super Bowl. A ledger could expose APIs with only synchronous/inline processing semantics, where the client submitting a request waits until processing completes, or receives a timeout error if processing does not finish within a specified interval. Even with horizontal scaling, this approach may fall short: the system degrades until cluster resources ramp up to match current demand, which can take 10–15 minutes or longer, and best-case average request-processing latencies become higher than acceptable. An alternative is to carefully design the APIs so that only the absolutely essential operations are performed inline in the client's execution context, delegating the remaining operations to other services to be performed asynchronously.

Ledger Services Platform — High Level Design

The section below describes a viable high-level design for a distributed ledger services platform, detailing the component services and their interactions with each other and with external clients.

Ledger Datastore

The Ledger Datastore service implements a strongly consistent, durable, and highly available timestamped append-only datastore for persisting client requests and transactions. It enables clients to define two schemas: a request schema and an entity-state schema. The request schema defines the data format in which a client submits a transaction request to be applied to a given ledger entity, and the entity-state schema captures the state of a ledger entity at any given point in time.

Figure-1 below defines a sample request schema records data format.

Figure-1: Record representation of request schema records
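The Figure-1 image itself is not reproduced here; the record below is a hypothetical illustration of what a request-schema record might contain. Every field name is an assumption for illustration, not the article's actual format:

```python
import json

# Hypothetical request-schema record: a client-submitted transaction request
# to be applied to a ledger entity, with ancillary business context attached.
request_record = {
    "request_id": "req-7f3a91",   # client-generated, deterministic across retries
    "entity_id": "rider-42",      # ledger entity the transaction applies to
    "timestamp": "2021-07-06T10:15:00Z",
    "transaction": {
        "type": "debit",
        "accounts": [
            {"account_id": "wallet", "amount": "12.50", "currency": "USD"},
        ],
    },
    # Ancillary business context (discounts, tax, etc.) kept in the record
    # so the ledger stays the single source of truth for invoicing/audits.
    "metadata": {"discount": "2.00", "tax": "0.90"},
}
print(json.dumps(request_record, indent=2))
```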

Figure-2 below is a tabular representation of entity-state entries stored in the Ledger Datastore; the salient points are:

  • Each entity is represented as a row in the datastore.
  • For a given entity, mutations to the entity's state from applying client transaction requests are appended as new records.
  • Each entity-state record captures details such as timestamp, ledger-account details, etc. A transaction can atomically apply the following mutations to an entity: (a) create/delete one or more ledger accounts; (b) update one or more ledger-account properties, such as balances and attributes.

Figure-2: Table representation of entity-state entries stored in Ledger Datastore

Data access patterns are mostly governed by the needs of the business application; in most cases, however, applications tend to access recent data more often: the older the data, the lower the probability it is accessed. Two effective techniques to optimize storage access latencies are:

  1. Caching: cache the most recent data on fast (but expensive) storage media such as Solid State Drives (SSDs); because record data is immutable, the system doesn't have to deal with cache-coherency issues.
  2. Multi-tier storage architecture: recent data is stored on a fast but expensive storage tier and later moved to a slower but cheaper tier; for instance, leverage Amazon Dynamo[4] to hold records for the first few weeks or months, then move aging data to the S3[5] and/or Glacier[6] storage tier.
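A sketch of the tiered read path, assuming a simple age-based cutoff; the window value, tier names, and function name are illustrative assumptions:

```python
HOT_TIER_WINDOW = 30 * 24 * 3600  # keep ~30 days in the hot tier (assumed cutoff)

def choose_tier(record_ts, now):
    """Route a read to the storage tier likely to hold the record.

    Recent records are served from the fast 'hot' tier (e.g. an SSD cache or
    DynamoDB); older records fall through to the cheaper 'cold' tier (e.g.
    S3/Glacier). Immutability means a record never moves back once tiered out.
    """
    return "hot" if now - record_ts <= HOT_TIER_WINDOW else "cold"

now = 10_000_000
print(choose_tier(now - 3600, now))             # a recent record -> "hot"
print(choose_tier(now - 90 * 24 * 3600, now))   # a 90-day-old record -> "cold"
```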

Ledger Passbook

The Ledger Passbook service exposes a synchronous API that allows clients to access the entity-state record of a given ledger entity at a specified point in time. The service authenticates and authorizes client accesses and leverages Ledger Datastore APIs to fetch the record best matching the client-requested timestamp, as shown in Figure-3 below.

Given that the ledger-entity data layout supports multiple ledger accounts, the total available balance for an entity at any given instant is computed by summing balances across all of its accounts. Depending on the business use case, the service endpoint can be designed to let clients fetch the full entity-state for a given instant; such an API helps solve business-specific use cases without letting business logic leak into the Ledger Platform services. For instance, since the ledger-account data format allows storing account attributes (business-specific extra information), a client can exclude accounts with specific properties while computing the total available balance.
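A sketch of this balance computation, with an optional predicate that lets the client exclude accounts by attribute; field names and the entity-state shape are assumptions:

```python
def total_balance(entity_state, exclude=lambda account: False):
    """Sum balances across an entity's accounts, skipping excluded ones."""
    return sum(acct["balance"] for acct in entity_state["accounts"]
               if not exclude(acct))

entity_state = {
    "entity_id": "rider-42",
    "accounts": [
        {"account_id": "wallet", "balance": 40.0, "attributes": {}},
        {"account_id": "promo", "balance": 10.0,
         "attributes": {"promotional": True}},
    ],
}
# Full balance vs. balance excluding promotional credits (client-side policy).
print(total_balance(entity_state))                                            # 50.0
print(total_balance(entity_state,
                    exclude=lambda a: a["attributes"].get("promotional")))    # 40.0
```

Keeping the exclusion predicate on the client side is what lets business rules evolve without changes to the ledger platform itself.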

Ledger Recorder

The Ledger Recorder service implements endpoints that allow clients to submit a transaction request to be applied to a given ledger entity. The endpoints are synchronous to provide strong transaction-processing guarantees; however, to sustain high throughput the service carefully splits the work, performing only the essential operations inline with the client request and delegating the rest to other services to be performed asynchronously. The two main operations performed by the service, in order, are:

  1. Synchronously persist the client request as a Ledger Datastore request-schema record, providing immutable bookkeeping for all requests submitted to the system. Clients are required to generate unique but deterministic request identifiers so that request retries caused by intermittent network/compute failures can be handled safely.
  2. Once the request is persisted in step #1, delegate the transaction-processing task to the Ledger Teller service (discussed below).
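A sketch of this two-step flow, assuming the request id is derived deterministically from the client id and payload (so retries hash to the same id); the dictionary and in-process queue stand in for the Ledger Datastore and message bus:

```python
import hashlib
import json
import queue

request_store = {}            # stand-in for the request-schema datastore
teller_queue = queue.Queue()  # stand-in for the Ledger Message Bus

def submit_request(client_id, payload):
    # Deterministic id: retries of the identical payload map to the same key.
    body = json.dumps(payload, sort_keys=True)
    request_id = hashlib.sha256(f"{client_id}:{body}".encode()).hexdigest()
    if request_id not in request_store:
        request_store[request_id] = payload  # step 1: synchronous, immutable persist
        teller_queue.put(request_id)         # step 2: delegate async processing
    return request_id

rid1 = submit_request("rider-42", {"type": "debit", "amount": 12.5})
rid2 = submit_request("rider-42", {"type": "debit", "amount": 12.5})  # retry
print(rid1 == rid2, teller_queue.qsize())  # same id, enqueued only once
```

Note that a real client would also fold a per-request nonce into the id so that two genuinely distinct but identical-looking requests don't collide; that detail is elided here.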

Ledger Teller

The Ledger Teller is responsible for processing client-submitted requests by applying the requested transactions to the corresponding ledger entities. The service needs to provide the following guarantees: idempotent request processing, i.e. processing the same request twice (a message bus can deliver a message more than once, for instance) must have no additional effect; and atomic updates to the ledger entity-state when a transaction requires updates to one or more of the entity's ledger accounts.
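One way to sketch these two guarantees: a processed-ids set drops duplicate deliveries, and account mutations are staged on a copy and swapped in only if every mutation succeeds (all-or-none). The data shapes here are assumptions for illustration:

```python
import copy

processed_ids = set()

def apply_transaction(entity_state, request_id, mutations):
    if request_id in processed_ids:
        return entity_state               # duplicate delivery: idempotent no-op
    staged = copy.deepcopy(entity_state)  # stage all changes off to the side
    for account_id, delta in mutations:
        acct = staged["accounts"][account_id]
        if acct["balance"] + delta < 0:
            raise ValueError("insufficient funds")  # abort: nothing was applied
        acct["balance"] += delta
    processed_ids.add(request_id)
    return staged                         # commit: swap the staged state in

state = {"accounts": {"wallet": {"balance": 40.0}, "promo": {"balance": 10.0}}}
state = apply_transaction(state, "req-1", [("wallet", -5.0), ("promo", -5.0)])
state = apply_transaction(state, "req-1", [("wallet", -5.0), ("promo", -5.0)])  # dup
print(state["accounts"]["wallet"]["balance"],
      state["accounts"]["promo"]["balance"])  # 35.0 5.0
```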

The Ledger Teller can be designed to model complex request processing involving multiple steps, including calls to external services for functionality such as money collection and disbursement; the Ledger Message Bus (discussed below) can serve as the message-streaming backbone for constructing such request-execution workflows. One downside of complex execution workflows is that they may require sophisticated error handling and retry mechanisms to maintain the platform's strong eventual-processing invariant.

Ledger Message Bus

Given the asynchronous request-processing design, the ledger component services need to collaborate while processing a given client request. Two possible design choices are:

Remote Procedure Calls

Remote Procedure Calls[7], hereafter referred to as RPCs, allow two services to interact through a predefined API. After persisting a client request to the datastore, the Ledger Recorder can invoke the Ledger Teller method responsible for processing the requested transaction. RPCs can be synchronous, where the caller waits until the requested operation is processed, or asynchronous, where the caller invokes the remote procedure and continues without waiting for the result. For our use case, the remote calls need to be asynchronous. When running applications at massive scale, failures are inevitable, and providing strong eventual-processing guarantees over an asynchronous message-passing backbone is challenging in the face of intermittent network and compute instance failures.
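The sync-versus-async distinction can be illustrated with a thread pool standing in for an RPC framework: the asynchronous caller submits the call and continues, blocking only when (and if) it needs the result. The function here is a hypothetical stand-in for the Teller's remote method:

```python
from concurrent.futures import ThreadPoolExecutor

def process_transaction(request_id):
    # Stand-in for the Teller's remote procedure.
    return f"processed {request_id}"

with ThreadPoolExecutor() as pool:
    # Asynchronous invocation: submit() returns a future immediately,
    # so the caller is free to do other work while the call runs.
    future = pool.submit(process_transaction, "req-1")
    # ... caller continues with other work here ...
    print(future.result())  # synchronous-style: block only at the point of need
```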

Message Streaming

An event-streaming framework can provide a highly scalable, reliable, and fault-tolerant distributed message backbone for the Ledger services platform. At its core, such a framework is a publish-subscribe based durable messaging platform.

Apache Kafka[8] is one such highly scalable open-source messaging system, created at LinkedIn. Applications define topics (think of a topic as a category) and can add, process, and reprocess records on them: one application connects and publishes records to a specified topic, while another connects and processes or reprocesses records from that topic. Records are byte arrays that can store any object in any format. A record has four attributes: key, value, timestamp, and headers; the value is the payload and can be anything that needs to be sent, for example JSON or plain text. Alternative solutions include Google Cloud Pub/Sub[9], Amazon MQ[10], ZeroMQ[11], RabbitMQ[12], etc.
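The publish-subscribe pattern itself can be illustrated with a minimal in-process bus, where producers append records to named topics and each consumer tracks its own offset, enabling replay/reprocessing from any point. This is a toy stand-in for the pattern, not a Kafka client:

```python
from collections import defaultdict

class MiniBus:
    def __init__(self):
        self._topics = defaultdict(list)  # topic -> append-only record list

    def publish(self, topic, key, value):
        self._topics[topic].append((key, value))

    def consume(self, topic, offset=0):
        # Consumers track their own offsets, so any consumer can replay
        # the topic from the beginning or resume from where it left off.
        return self._topics[topic][offset:]

bus = MiniBus()
bus.publish("ledger-requests", key="rider-42", value={"amount": 12.5})
bus.publish("ledger-requests", key="rider-7", value={"amount": 3.0})
print(len(bus.consume("ledger-requests")))      # full replay from offset 0
print(bus.consume("ledger-requests", offset=1)) # resume past the first record
```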

Figure-3 shows a high-level architecture block diagram of the Ledger Services Platform, highlighting the component services and their interactions with each other and with external clients.

Figure-3: Ledger Services Platform high level architecture

Conclusion

A ledger is a core component of modern payment platform systems, and designing and implementing a highly scalable, reliable, and performant ledger system is a hard but interesting engineering challenge; done right, it can significantly improve the customer experience. This article discussed various architectural choices and considerations for designing a distributed ledger platform that meets the needs of modern payment applications.

References:

  1. Ledger
  2. Microservices Architecture overview
  3. Authorization hold
  4. Amazon Dynamo
  5. Amazon S3
  6. Amazon Glacier
  7. Remote Procedure Call
  8. Apache Kafka
  9. Google Pub/Sub
  10. Amazon MQ
  11. Zero MQ
  12. Rabbit MQ


Ata E Husain Bohra

Experienced Software professional, a startup enthusiast, ex-Uber.