-
Assume we are creating a tenant whose topics are geo-replicated from each of ~10K clusters (c1..cN) into a single aggregation cluster (cAgg). Does Pulsar support such a model? What are the scalability concerns to be worried about? Any impact on the geo-config-store? Any other considerations for implementing this model?
-
> Does Pulsar support such a model?

Yes. However, "support" is perhaps not the right word here, since 10K clusters within geo-replication would be a very extreme use case.

> What are the scalability concerns to be worried about?

There would be a lot of traffic amplification at the target cluster "cAgg". The traffic throughput matters a lot, and there is a real concern about how to scale things. This model wouldn't be scalable from a design perspective when thousands of partitions all aggregate into a single partition.

> Any impact on the geo-config-store?

That's probably not a major concern on its own. However, it could become unmanageable with thousands of replications. I probably wouldn't use a global configuration store at all in such configurations.

> Any other considerations for implementing this model?

I don't have the context of what the use case is or what the volumes are. Based on the provided information, I'd put more focus on why the aggregation is needed and how to find a scalable design for it. Perhaps the aggregation is really a stream-processing problem and could be handled with multiple levels of aggregation, implemented with Flink and its Pulsar connector? If aggregation using geo-replication is really necessary, I'd recommend a sharded design with multiple aggregation clusters, where the final results are then combined, possibly with a stream-processing solution. There are also other types of aggregation solutions that are compatible with Pulsar. For example, StreamNative has announced a "Streaming Lakehouse" product, "Lakehouse Tiered Storage for Pulsar"; more details are in the announcement video and blog post. This opens up completely new possibilities for aggregating the results and saving on costs.
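To make the sharded design concrete, here is a minimal sketch using the Pulsar Java admin client. Everything named here is an assumption for illustration: the `edge` tenant, the per-edge-cluster namespaces, the shard count, and the cluster names `c1..cN`, `agg-1..agg-100`, and `cAgg`. It also assumes the tenant already exists with all of these clusters allowed and that the namespaces have been created.

```java
import java.util.Set;

import org.apache.pulsar.client.admin.PulsarAdmin;

public class ShardedAggregationSetup {
    public static void main(String[] args) throws Exception {
        int edgeClusters = 10_000; // hypothetical scale from the question
        int shards = 100;          // hypothetical: ~100 edge clusters per regional aggregator

        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // hypothetical admin endpoint used for configuration
                .build()) {

            // Tier 1: each edge cluster's namespace replicates only to the
            // regional aggregator of its shard, never directly to cAgg.
            for (int i = 1; i <= edgeClusters; i++) {
                int shard = ((i - 1) % shards) + 1;
                admin.namespaces().setNamespaceReplicationClusters(
                        "edge/c" + i + "-telemetry",      // hypothetical per-edge-cluster namespace
                        Set.of("c" + i, "agg-" + shard));
            }

            // Tier 2: each regional aggregator replicates its rolled-up
            // results namespace onward to the final cluster cAgg.
            for (int shard = 1; shard <= shards; shard++) {
                admin.namespaces().setNamespaceReplicationClusters(
                        "edge/agg" + shard + "-rollup",
                        Set.of("agg-" + shard, "cAgg"));
            }
        }
    }
}
```

With ~100 edge clusters per shard, cAgg only terminates about 100 replication links instead of 10K; the per-shard rollup step is where a stream-processing job (e.g. Flink) would do the actual aggregation.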
-
These c1..cN are pulsar-standalone instances running on small-footprint devices (1 core / 1 GB memory) with very few topics and a very small byte rate (KB/sec) towards the aggregation cluster (cAgg).
When we create a tenant/namespace/topic with replication-clusters, it would internally use the geo-config-store, right? If a solution without the geo-config-store is possible, does it look something like this?
Is there a sweet spot for the size of a c1..cN grouping? Online presentations talk about geo-replication with up to 100 clusters.
-
Geo-replication is for "replication" across data centers, for HA or failover; the goal is serving producers and consumers closer to where they are. Your use case is more of "edge" computing/streaming; it is more of a data pipeline. Using geo-replication to reach this goal is technically doable, but you need to consider the operational effort: e.g., how to maintain those standalone instances, and how to maintain a huge number of replication subscriptions. In what way? Automation? Humans?
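To make the automation question concrete, here is a minimal audit sketch (not an established tool) that walks a hypothetical inventory of edge admin endpoints and reports the replication backlog per remote cluster. The endpoint list, topic name, and alert threshold are all assumptions; the calls are the Pulsar Java admin client's topic stats, assuming a recent client where `TopicStats` exposes per-cluster `ReplicatorStats`.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

public class ReplicationAudit {
    public static void main(String[] args) {
        // Hypothetical inventory; in practice this would come from whatever
        // tracks the standalone edge devices (CMDB, service discovery, ...).
        String[] edgeAdminUrls = { "http://edge-1:8080", "http://edge-2:8080" };
        String topic = "persistent://edge/telemetry/readings"; // hypothetical topic

        for (String url : edgeAdminUrls) {
            try (PulsarAdmin admin = PulsarAdmin.builder().serviceHttpUrl(url).build()) {
                TopicStats stats = admin.topics().getStats(topic);
                // Each replication entry is one remote cluster this topic replicates to.
                stats.getReplication().forEach((cluster, repl) -> {
                    long backlog = repl.getReplicationBacklog();
                    if (backlog > 10_000) { // hypothetical alert threshold
                        System.out.printf("%s -> %s backlog: %d msgs%n", url, cluster, backlog);
                    }
                });
            } catch (Exception e) {
                // A standalone edge device may simply be offline; record it and move on.
                System.err.println("unreachable: " + url + " (" + e.getMessage() + ")");
            }
        }
    }
}
```

At 10K devices you would run something like this from a scheduler and feed the output into alerting rather than stdout; unreachable devices are expected and should be recorded, not treated as fatal.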
Ok, now I see the use case. I think geo-replication makes sense for this type of use case; I just don't have first-hand experience with it.
I guess it's a matter of definition of what you call the "geo-config-store". In Pulsar there are the concepts of a "local configuration store" and a "global configuration store"; I guess the "global configuration store" is what many would call the "geo configuration store…
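For what it's worth, a setup that avoids a shared global configuration store entirely can look roughly like the sketch below: the remote cluster's endpoints are registered in the edge cluster's own (local) metadata through the Pulsar Java admin client. The cluster names, URLs, tenant, and namespace are hypothetical.

```java
import java.util.Set;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.ClusterData;
import org.apache.pulsar.common.policies.data.TenantInfo;

public class LocalStoreReplicationSetup {
    public static void main(String[] args) throws Exception {
        // Run against one edge cluster's own admin endpoint. The remote
        // cluster "cAgg" is registered in this cluster's local metadata,
        // so no shared/global configuration store is involved.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://edge-1:8080") // hypothetical edge admin URL
                .build()) {

            // Tell the local cluster (named "c1" here) how to reach cAgg.
            admin.clusters().createCluster("cAgg", ClusterData.builder()
                    .serviceUrl("http://cagg.example.com:8080")         // hypothetical
                    .brokerServiceUrl("pulsar://cagg.example.com:6650") // hypothetical
                    .build());

            // The tenant must list both the local and the remote cluster.
            admin.tenants().createTenant("edge", TenantInfo.builder()
                    .allowedClusters(Set.of("c1", "cAgg"))
                    .build());

            // Create the namespace and replicate it between c1 and cAgg.
            admin.namespaces().createNamespace("edge/telemetry");
            admin.namespaces().setNamespaceReplicationClusters(
                    "edge/telemetry", Set.of("c1", "cAgg"));
        }
    }
}
```

The same steps exist as pulsar-admin commands (`clusters create`, `tenants create --allowed-clusters`, `namespaces set-clusters`) if scripting the CLI is preferable to the Java client.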