Performance Engineering

Thursday 23rd July 2020
13:30 UTC

Please post questions, suggestions and thoughts about this session in this thread.

Post by clicking the “Reply” button below.

Should we propose that performance be assessed on a per-participant basis, with a background check to ensure that per-participant linearity is conserved?

I believe slide 36 (screenshot attached) does assess the performance per participant. Does that answer your question?

1 Like

Thanks, Miguel. My point is: the limit is per-DFSP, I think. It’s good to have it confirmed that that really is true, so that if you add DFSPs it scales exactly (which is kind of a worry in itself, wouldn’t you say?). But the limit, as far as I can see, is 240 FTPS.

I think we deploy and manage the scheme with a scheme-wide capacity and construct per-participant SLAs around the per-DFSP capacity of the system as deployed and that customer’s needs. A single DFSP could burst quite high over the per-DFSP limit and still be well within the acceptable SLAs for the transfers. At the same time, a scheme might throttle a single DFSP that was running outside of its range, the same as a DFSP that is underperforming and creating a sink hole by not responding quickly enough.

But I think what we’re saying is that a single DFSP can’t go over the per-DFSP limit. Or have I misunderstood?

It could not in a sustained manner; it could for a short period of time. This is where the patterns we mentioned could be applied if the per-DFSP number were a real concern.
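
Just to make the “burst, but not sustained” idea concrete, a per-DFSP token bucket is one way to express it (purely illustrative sketch, hypothetical names and numbers, not the actual scheme implementation):

    // Illustrative only: a token bucket per DFSP that admits short bursts above
    // the sustained per-DFSP rate but throttles anything sustained.
    import java.util.concurrent.ConcurrentHashMap;

    public class PerDfspThrottle {
        private static class Bucket {
            double tokens;
            long lastRefillNanos;
        }

        private final double sustainedRatePerSec;  // e.g. the per-DFSP FTP/s figure
        private final double burstCapacity;        // bucket size: how large a short burst may be
        private final ConcurrentHashMap<String, Bucket> buckets = new ConcurrentHashMap<>();

        public PerDfspThrottle(double sustainedRatePerSec, double burstCapacity) {
            this.sustainedRatePerSec = sustainedRatePerSec;
            this.burstCapacity = burstCapacity;
        }

        /** Returns true if the next transfer from this DFSP should be admitted. */
        public synchronized boolean tryAcquire(String dfspId) {
            Bucket b = buckets.computeIfAbsent(dfspId, id -> {
                Bucket fresh = new Bucket();
                fresh.tokens = burstCapacity;
                fresh.lastRefillNanos = System.nanoTime();
                return fresh;
            });
            long now = System.nanoTime();
            double elapsedSec = (now - b.lastRefillNanos) / 1_000_000_000.0;
            b.tokens = Math.min(burstCapacity, b.tokens + elapsedSec * sustainedRatePerSec);
            b.lastRefillNanos = now;
            if (b.tokens >= 1.0) {
                b.tokens -= 1.0;   // admit: bursting is fine until the bucket drains
                return true;
            }
            return false;          // sustained overload: throttle this DFSP
        }
    }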

1 Like

That understanding is correct, except the limit is 120 FTP/s (with 240 op/s). It is possible that 2 FSPs may achieve more FTP/s if they are transacting alone on hardware sized for 8 FSPs, but we have been focused on achieving the single unit of scale @ 1000 op/s. I also do not think there would be a substantial difference, though it is worth investigating.

But we can remove that limit by implementing the Virtual Position pattern (if it is required).

I think the FTP/s is 120 but the op/s is 240 here…

Also, I think expressing the performance as a per-DFSP number does have a good amount of value; we also saw in the presentation that it can be worked around by using a virtual DFSP position concept (not to be confused with the FXP terminology), where more than one handler/partition can be dedicated to a participant with requirements for greater throughput…
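
Roughly, the virtual DFSP position idea looks like this (an illustrative sketch only, hypothetical names, not Mojaloop code): a high-throughput participant gets N partitions, each owned by its own handler, and the real position is the sum.

    import java.util.concurrent.atomic.AtomicLongArray;

    public class VirtualPositions {
        private final AtomicLongArray partitions;   // position per virtual DFSP, in minor units

        public VirtualPositions(int partitionCount) {
            this.partitions = new AtomicLongArray(partitionCount);
        }

        /** Route a transfer to one partition so each handler can work independently. */
        public int partitionFor(String transferId) {
            return Math.floorMod(transferId.hashCode(), partitions.length());
        }

        public void applyPrepare(String transferId, long amount) {
            partitions.addAndGet(partitionFor(transferId), amount);
        }

        /** The participant's actual position is only needed off the critical path. */
        public long totalPosition() {
            long total = 0;
            for (int i = 0; i < partitions.length(); i++) {
                total += partitions.get(i);
            }
            return total;
        }
    }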

1 Like

Yes, that’s an interesting pattern, I think. Since we don’t want or expect DFSPs to be trading very near their limit, we could automate a transfer between virtual positions as a kind of ledger transfer. Which would slow it down, but that wouldn’t matter in the normal run of things. Question might be, how would you distribute position decreases over the virtual DFSPs… This would be a nice independent service
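
One naive way to handle the distribution question, just to make it concrete (illustrative only, hypothetical names): apply each decrease to the most loaded virtual position first, as an internal ledger move off the critical path.

    public class VirtualPositionRebalancer {

        /** positions[i] is virtual position i of one participant, in minor units. */
        public static void applyDecrease(long[] positions, long decrease) {
            long remaining = decrease;
            while (remaining > 0) {
                int busiest = 0;
                for (int i = 1; i < positions.length; i++) {
                    if (positions[i] > positions[busiest]) busiest = i;
                }
                long take = Math.min(remaining, positions[busiest]);
                if (take == 0) break;           // nothing left to decrease
                positions[busiest] -= take;     // the slow, ledger-transfer-style move
                remaining -= take;
            }
        }
    }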

Uploaded presentation: https://github.com/mojaloop/documentation-artifacts/blob/master/presentations/July%202020%20Community%20Event/Presentations/Mojaloop_OSS_PI11-July2020-Performance.pdf

2 Likes

Is it possible to have a link to the TigerBeetle code? I know it’s early days yet, but we’ve used LMAX on a number of projects, though nothing recently. It would be interesting to see the latest thinking and see how it integrates into this accounting-specific DB.

Hey Alex, thanks for the interest in TigerBeetle. It is early days as you say and our repo is a “sprint mess”. We will share more as soon as we can. Have you seen Sam Adams’ talk on LMAX (https://skillsmatter.com/skillscasts/5247-the-lmax-exchange-architecture-high-throughput-low-latency-and-plain-old-java)?

We take the same three classic LMAX steps from his talk (journal incoming events to disk and replicate, then apply to in-memory state, then output events), but then we introduce something new by deleting the local journalling step entirely and replacing it with 4-out-of-6 quorum replication to 6 distributed log nodes. That’s how TigerBeetle eliminates gray failure modes on the master’s local disk (the master no longer needs one), and also how TigerBeetle eliminates gray failure in the network links to the replication nodes (these are now 6 log nodes instead of same-DC or remote-DC secondaries like in LMAX). Things become much more homogeneous this way.

The quorum replication enables TigerBeetle to prevent split-brain on master failover, and is nicer to work with than a distributed consensus protocol like Raft or Paxos, which we also don’t need because we only write to the log nodes; we don’t read from them in the critical path. I don’t know if LMAX has recent changes to handle gray failure like this? It would be interesting to chat sometime and get your input.
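
In rough pseudocode, the flow is something like this (a hand-written Java sketch with hypothetical interfaces, not our actual code):

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.atomic.AtomicInteger;

    public class QuorumPipeline {
        interface LogNode { CompletableFuture<Void> append(byte[] batch); }  // hypothetical
        interface State   { List<byte[]> apply(byte[] batch); }              // hypothetical
        interface Output  { void publish(List<byte[]> events); }             // hypothetical

        private static final int QUORUM = 4;   // 4-out-of-6, as described above

        private final List<LogNode> logNodes;  // the 6 distributed log nodes
        private final State state;
        private final Output output;

        public QuorumPipeline(List<LogNode> logNodes, State state, Output output) {
            this.logNodes = logNodes;
            this.state = state;
            this.output = output;
        }

        public CompletableFuture<Void> process(byte[] batch) {
            CompletableFuture<Void> quorumReached = new CompletableFuture<>();
            AtomicInteger acks = new AtomicInteger();
            for (LogNode node : logNodes) {
                // Step 1: replicate instead of journalling locally. Only 4 of the
                // 6 acks are needed, so one slow disk or flaky link (gray failure)
                // cannot stall the master.
                node.append(batch).thenRun(() -> {
                    if (acks.incrementAndGet() == QUORUM) {
                        quorumReached.complete(null);
                    }
                });
            }
            return quorumReached.thenRun(() -> {
                // Step 2: apply to in-memory state; step 3: emit output events.
                output.publish(state.apply(batch));
            });
        }
    }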

Joran -

I’ll have a look at that article - I’d not seen it. My last foray into LMAX was about three years ago. We backed the journal with Chronicle Queue (https://github.com/OpenHFT/Chronicle-Queue), which worked quite well. But these days, with everything in K8s, I’d not use that: I’ve read/heard too many stories about filesystem-backed issues during re-deployment and/or failure recovery in K8s. In one of our current projects we are deploying Hazelcast IMDG/Jet backed by AWS DynamoDB (though Azure CosmosDB should work as well). Hazelcast has recently integrated their own Ringbuffer (https://hazelcast.com/blog/ringbuffer-data-structure/), so it has similar characteristics to LMAX (though I have not done a back-to-back comparison).
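
For reference, basic Ringbuffer usage looks roughly like this (assuming the standard Hazelcast client API; the names used here are made up for illustration):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.ringbuffer.Ringbuffer;

    public class RingbufferExample {
        public static void main(String[] args) throws InterruptedException {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            Ringbuffer<String> rb = hz.getRingbuffer("transfer-events");

            long seq = rb.add("transfer-prepared");   // producer side
            String event = rb.readOne(seq);           // consumer side, blocks until available
            System.out.println(event + " @ " + seq);

            hz.shutdown();
        }
    }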