-
Assume we are creating a tenant whose topics are geo-replicated from each of ~10K clusters (c1..cN) into a single aggregation cluster (cAgg). Does Pulsar support such a model? What are the scalability concerns to be worried about? Any impact on the geo-config-store? Any other considerations for implementing this model?
-
> Does Pulsar support such a model?

Yes. However, "support" is perhaps not the right word here, since 10K clusters within geo-replication would be a very extreme use case.

> What are the scalability concerns to be worried about?

There would be a lot of traffic amplification at the target cluster "cAgg". The traffic throughput matters a lot, and there is a real concern about how to scale things. This model wouldn't be scalable from a design perspective when thousands of partitions all aggregate into a single partition.

> Any impact on the geo-config-store?

That's probably not a major concern on its own. However, it could become unmanageable with thousands of replications. I probably wouldn't use a global configuration store at all in such configurations.

> Any other considerations for implementing this model?

I don't have the context of what the use case is or what the volumes are. Based on the provided information, I'd put more focus on why the aggregation is needed and how to find a scalable design for it. Perhaps the aggregation is really a stream-processing problem and could be handled with multiple levels of aggregation, implemented with Flink and its Pulsar connector? If aggregation using geo-replication is really necessary, I'd recommend a sharded design with multiple aggregation clusters, where the final results are then combined, possibly with a stream-processing solution. There are also other types of aggregation solutions that are compatible with Pulsar. For example, StreamNative has announced a "Streaming Lakehouse" product, "Lakehouse Tiered Storage for Pulsar"; more details are in the announcement video and blog post. This opens up completely new possibilities for aggregating the results and saving on costs.
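To make the sharded design concrete, here is a minimal sketch using the Pulsar Java admin client. Everything named here is an assumption for illustration: the `edge` tenant, the per-edge-cluster namespaces, the shard count, and the cluster names `c1..cN`, `agg-1..agg-100`, and `cAgg`. It also assumes the tenant already exists with all of these clusters allowed and that the namespaces have been created.

```java
import java.util.Set;

import org.apache.pulsar.client.admin.PulsarAdmin;

public class ShardedAggregationSetup {
    public static void main(String[] args) throws Exception {
        int edgeClusters = 10_000; // hypothetical scale from the question
        int shards = 100;          // hypothetical: ~100 edge clusters per regional aggregator

        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // hypothetical admin endpoint used for configuration
                .build()) {

            // Tier 1: each edge cluster's namespace replicates only to the
            // regional aggregator of its shard, never directly to cAgg.
            for (int i = 1; i <= edgeClusters; i++) {
                int shard = ((i - 1) % shards) + 1;
                admin.namespaces().setNamespaceReplicationClusters(
                        "edge/c" + i + "-telemetry",      // hypothetical per-edge-cluster namespace
                        Set.of("c" + i, "agg-" + shard));
            }

            // Tier 2: each regional aggregator replicates its rolled-up
            // results namespace onward to the final cluster cAgg.
            for (int shard = 1; shard <= shards; shard++) {
                admin.namespaces().setNamespaceReplicationClusters(
                        "edge/agg" + shard + "-rollup",
                        Set.of("agg-" + shard, "cAgg"));
            }
        }
    }
}
```

With ~100 edge clusters per shard, cAgg only terminates about 100 replication links instead of 10K; the per-shard rollup step is where a stream-processing job (e.g. Flink) would do the actual aggregation.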
-
These c1..cN are pulsar-standalone instances running on small-footprint devices (1 core / 1 GB memory) with very few topics and a very small byte rate (KB/sec) towards the aggregation cluster (cAgg).
When we create a tenant/namespace/topic with replication-clusters, it would internally use the geo-config-store, right? If a solution without the geo-config-store is possible, does it look something like this?
Is there a sweet spot for the size of a c1..cN grouping? Online presentations talk about geo-replication with up to 100 clusters.
-
Geo-replication is for "replication" across data centers, for HA or failover; the goal is serving producers and consumers closer to where they are. Your use case is more of "edge" computing/streaming; it is more of a data pipeline. Using geo-replication to reach this goal is technically doable, but you need to consider the operational effort: e.g., how to maintain those standalone instances, and how to maintain a huge number of replication subscriptions. In what way? Automation? Humans?
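To make the automation question concrete, here is a minimal audit sketch (not an established tool) that walks a hypothetical inventory of edge admin endpoints and reports the replication backlog per remote cluster. The endpoint list, topic name, and alert threshold are all assumptions; the calls are the Pulsar Java admin client's topic stats, assuming a recent client where `TopicStats` exposes per-cluster `ReplicatorStats`.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

public class ReplicationAudit {
    public static void main(String[] args) {
        // Hypothetical inventory; in practice this would come from whatever
        // tracks the standalone edge devices (CMDB, service discovery, ...).
        String[] edgeAdminUrls = { "http://edge-1:8080", "http://edge-2:8080" };
        String topic = "persistent://edge/telemetry/readings"; // hypothetical topic

        for (String url : edgeAdminUrls) {
            try (PulsarAdmin admin = PulsarAdmin.builder().serviceHttpUrl(url).build()) {
                TopicStats stats = admin.topics().getStats(topic);
                // Each replication entry is one remote cluster this topic replicates to.
                stats.getReplication().forEach((cluster, repl) -> {
                    long backlog = repl.getReplicationBacklog();
                    if (backlog > 10_000) { // hypothetical alert threshold
                        System.out.printf("%s -> %s backlog: %d msgs%n", url, cluster, backlog);
                    }
                });
            } catch (Exception e) {
                // A standalone edge device may simply be offline; record it and move on.
                System.err.println("unreachable: " + url + " (" + e.getMessage() + ")");
            }
        }
    }
}
```

At 10K devices you would run something like this from a scheduler and feed the output into alerting rather than stdout; unreachable devices are expected and should be recorded, not treated as fatal.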
Ok, now I see the use case. I think geo-replication makes sense for this type of use case; I just don't have first-hand experience with it.
I guess it's a matter of definition of what you call the "geo-config-store". In Pulsar there are the concepts of a "local configuration store" and a "global configuration store"; I guess the "global configuration store" is what many would call the "geo configuration store…
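For what it's worth, a setup that avoids a shared global configuration store entirely can look roughly like the sketch below: the remote cluster's endpoints are registered in the edge cluster's own (local) metadata through the Pulsar Java admin client. The cluster names, URLs, tenant, and namespace are hypothetical.

```java
import java.util.Set;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.ClusterData;
import org.apache.pulsar.common.policies.data.TenantInfo;

public class LocalStoreReplicationSetup {
    public static void main(String[] args) throws Exception {
        // Run against one edge cluster's own admin endpoint. The remote
        // cluster "cAgg" is registered in this cluster's local metadata,
        // so no shared/global configuration store is involved.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://edge-1:8080") // hypothetical edge admin URL
                .build()) {

            // Tell the local cluster (named "c1" here) how to reach cAgg.
            admin.clusters().createCluster("cAgg", ClusterData.builder()
                    .serviceUrl("http://cagg.example.com:8080")         // hypothetical
                    .brokerServiceUrl("pulsar://cagg.example.com:6650") // hypothetical
                    .build());

            // The tenant must list both the local and the remote cluster.
            admin.tenants().createTenant("edge", TenantInfo.builder()
                    .allowedClusters(Set.of("c1", "cAgg"))
                    .build());

            // Create the namespace and replicate it between c1 and cAgg.
            admin.namespaces().createNamespace("edge/telemetry");
            admin.namespaces().setNamespaceReplicationClusters(
                    "edge/telemetry", Set.of("c1", "cAgg"));
        }
    }
}
```

The same steps exist as pulsar-admin commands (`clusters create`, `tenants create --allowed-clusters`, `namespaces set-clusters`) if scripting the CLI is preferable to the Java client.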