Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate replacing feature flag service #1343

Closed
austinlparker opened this issue Jan 22, 2024 · 30 comments
Closed

Investigate replacing feature flag service #1343

austinlparker opened this issue Jan 22, 2024 · 30 comments

Comments

@austinlparker
Copy link
Member

Capturing a SIG meeting discussion today...

The feature flag service is kind of a frustrating dependency for us currently. It's very important, but it's also hard to maintain as we don't have a ton of expertise in Erlang. In addition, due to limitations of our build pipeline it's very out of date.

Our general idea is to investigate replacing the feature flag service with some other off-the-shelf solution, then refactoring everything to use OpenFeature. Erlang support would come back into the demo as part of a smaller, more self-contained service (possibly some sort of consumer off Kafka) that should be more amenable to the build pipeline.

@austinlparker
Copy link
Member Author

From some research, I think our best bet might be flagd. It's maintained by OpenFeature, and it seems to be fairly lightweight and straightforward for our purposes. Even better, it's OpenTelemetry compatible, so we can get independent instrumentation out of it. It supports more evaluation styles than we currently do as well (such as targeting, multi-variable experiments, etc.) and can be configured with JSON files rather than requiring a separate database.

@puckpuck
Copy link
Contributor

I vote we replace the product catalog service with an Erlang/Elixir equivalent. This service is part of the minimal install and is one of 3 Go-based services in the demo.

@austinlparker
Copy link
Member Author

I vote we replace the product catalog service with an Erlang/Elixir equivalent. This service is part of the minimal install and is one of 3 Go-based services in the demo.

That should be vaguely easier to maintain since there's no UI component at least?

@julianocosta89
Copy link
Member

ProductCatalog sounds like a good suggestion to be replaced.

I'm just afraid of whenever we start adding Swift and Android to the demo.
Those 2 will have to talk with ProductCatalog, and changing anything in Erlang/Elixir is a bit overwhelming when you are not used to work with it, which is the case for 98% of global population 😅

@austinlparker
Copy link
Member Author

austinlparker commented Feb 18, 2024

#1388

Went ahead and made a PR swapping things out so we can get an idea of what this all would look like. There's a few gaps (no Python provider for flagd, which I'll have to write or find) but overall it seems ok.

ed: looks like there is a flagd provider for python, it's just not released yet. hopefully they'll get that sorted soon.

@austinlparker
Copy link
Member Author

@julianocosta89 @puckpuck After spending a bit of time trying to rewrite the product catalog in Erlang, I'm not so sure about it. Frankly, 3/4ths of this is that I simply don't grok Erlang well enough to be productive in it (and ChatGPT only gets you so far). I think our best bet would be to create a new Erlang service that reads off Kafka and try to make it as minimal as possible to avoid the need to touch it in the future outside of OpenTelemetry updates. I feel like making Product Catalog an Erlang service just creates another load-bearing dependency that'll be hard for us to work with going forward.

@austinlparker
Copy link
Member Author

Also worth noting there's no OpenFeature libraries for Erlang at this point in time, so we'd have to write that as well.

@julianocosta89
Copy link
Member

Yeah, I've figured that would be tricky!
There is also another thing, if we make a erlang/elixir service to talk with kafka, there is no instrumentation for it 😢
Not sure how great that would be

@julianocosta89
Copy link
Member

Let me bring to the discussion @joshleecreates that has some experience in Erlang/Elixir, and also @tsloughter that was the one that initially implemented the FeatureFlagService.

@joshleecreates
Copy link
Contributor

Also worth noting there's no OpenFeature libraries for Erlang at this point in time, so we'd have to write that as well.

The service doesn't need to consume feature flags in its initial version.

I think our best bet would be to create a new Erlang service that reads off Kafka and try to make it as minimal as possible to avoid the need to touch it in the future outside of OpenTelemetry updates.

I am inclined to agree with this. I like the idea of a simple stateless service, but to @julianocosta89's point, there would be no auto-instrumentation unless we also included a downstream database or some kind of HTTP server. But maybe that isn't a bad thing? This is a pretty realistic use-case and would be a good demo of combining auto- and manual-instrumentation to complete a trace.

@tsloughter
Copy link
Member

Just saw this issue after i saw the other. I can do the product catalog. Also can do a kafka service, but seems easier to just translate an existing service that seems to just read in json and serve it over grpc?

@joshleecreates
Copy link
Contributor

Personally I lean away from "batteries-inlcuded" frameworks like Phoenix in the demo since the frameworks' conventions are often at odds with the development and build needs of the demo, so removing Phoenix gets a +1 from me.

@austinlparker
Copy link
Member Author

@tsloughter @joshleecreates I think the ideal replacement Erlang service would be something that meets the criteria of #912 . Reads orders off the Kafka queue, then persists them to a DB.

If we wanted to swap out a JSON service, then we could... but I'd be more in favor of doing something net-new, just to make sure it wasn't a load-bearing part of the architecture.

@austinlparker
Copy link
Member Author

Replying to #1388 (comment)

The biggest advantage I can see to simplifying the build pipeline is that if we move Erlang off into its own little corner, then we don't have to deal with the weird QEMU build issues any more. FF Service was load-bearing to the entire demo and wouldn't run on AS/ARM without cross-compilation. By making a service that's kind of just off to the side, if we can't build cross-platform then that's fine, we can just publish the x86 version and have it not run without impacting the overall demo.

@tsloughter
Copy link
Member

I knew about the Windows 11 issue and I sort of may vaguely recall an ARM build discussion at one point but is there a github issue for that.

@austinlparker
Copy link
Member Author

@tsloughter I'm not sure there's an exhaustive writeup but it's this issue: #956

The short version is that some weird combination of issues with QEMU causes segfaults when trying to update past OTP 23. The actual notes are spread out over a combination of github issues, PRs, and Slack threads. I think the WSL issue is actually independent of it.

@julianocosta89
Copy link
Member

julianocosta89 commented Feb 20, 2024

I had another idea, let me know what you all think.

The easiest service to replace we have in the demo today is the quoteservice, it is just talking with shippingservice and it is not doing much. It receives the number of items, and returns a quote, simple as that. Currently the quoteservice is not talking with FeatureFlagService, so we wouldn't need to worry about missing OpenFeature libraries for Erlang.

We could replace the quoteservice with Erlang/Elixir and then rewrite the productcatalogservice in PHP.
PHP has an OpenFeature SDK on Laravel, so we could continue to use the productcatalogservice with the Feature Flags we have today.

It will be a bit more work, but I think it is a good idea. WDYT?

@mviitane
Copy link
Member

We currently have 3 services with Go that feel quite a lot. We could replace accountingservice with Erlang and create an additional DB to store the sales data. In this way, we wouldn’t need to convert productcatalog, which is quite central. Also, the demo already feels quite large, so we wouldn’t introduce new services other than the DB.

@julianocosta89
Copy link
Member

We currently have 3 services with Go that feel quite a lot. We could replace accountingservice with Erlang and create an additional DB to store the sales data. In this way, we wouldn’t need to convert productcatalog, which is quite central. Also, the demo already feels quite large, so we wouldn’t introduce new services other than the DB.

yeah, that would be okay to me as well.

The only problem is that there is no auto-instrumentation for Kafka in Erlang/Elixir, so we need to manually do it.

@joshleecreates
Copy link
Contributor

The only problem is that there is no auto-instrumentation for Kafka in Erlang/Elixir, so we need to manually do it.

This doesn't seem like a big problem if we're just consuming a single topic, since we also want to show one example of manual instrumentation in each service anyways

@austinlparker
Copy link
Member Author

while i'm in favor of reducing go services, i'd prefer we replace them with something like Java or .NET (probably .NET) -- while I love each and every otel language equally, the world does not, and we should probably err on the side of language popularity when it comes to these decisions rather than trying to have a 1:1 mapping of services to languages.

@austinlparker
Copy link
Member Author

in terms of demo size, I feel like having the erlang service be net-new as a kafka consumer is actually better than replacing an existing one. kafka consumers can all be dropped from the minimal compose file, and since there's no UI, overall requirements to run it should be less.

@austinlparker
Copy link
Member Author

Per SIG meeting -

  • We would like to go ahead and merge the flagd PR
  • We would like the replacement erlang service to hang off kafka; either a new service or a replacement for frauddetection. It should also talk to a DB (postgres)

@julianocosta89
Copy link
Member

@austinlparker why frauddetection and not accountingservice?

FraudDetection is Kotlin and the only Kotlin service in the demo.

AccountingService is Go and we have 3 services written in Go.

@austinlparker
Copy link
Member Author

sure, accountingservice is fine

@cthain
Copy link

cthain commented Mar 13, 2024

👋 Hey there. I've been messing around with the otel demo a bit recently and it is pretty great, thanks!

I'm not sure if this is the best place for this, but since this issue is specifically about using flagd for feature flags, I thought I'd comment here first, instead of raising a new issue. In docker compose I was using the featureflagservice to trigger the memory leak in the recommendationservice via the demo.flagd.json file. It works fine in the Docker compose setup because the file is mounted from the src/ dir into the flagd container.

But this does not work in K8s. I installed the demo via the Helm chart and everything starts up fine. However, the question is: how do I trigger the memory leak? I looked at the featureflagservice pod in k8s and there are no volume mounts. I tracked down that in k8s the FFS gets its flags from ffspostgres. So the docs that mention the demo.flagd.json don't apply here.

I tried port-forwarding 8081 for the featureflagservice and I can see a very basic UI, but I get Not Found for every link I click.. It would be slick to be able to edit the flags directly from the flagd UI, but I'm not sure if that's possible because I haven't done any more digging.

But after poking around in the ffspostgres pod I landed on:

kubectl exec my-otel-demo-ffspostgres-58df9cd6bc-5s77z -- psql -U ffs -d ffs -c \
  "UPDATE featureflags SET enabled=1 WHERE name='recommendationCache'"
UPDATE 1

After changing the flag directly in ffspostgres manually I see the recommendationservice starting to display latency, gaps, and spikes in memory usage (not shown in the figure below).

image

It's obiviously working to trigger the memory leak because the pod is getting OOMKilled by k8s, which is awesome!

kubectl get pods | grep recommendationservice
my-otel-demo-recommendationservice-756854bf-r22p8     0/1     OOMKilled   2 (89s ago)   52m

All this to say, I think the docs could use an update to clarify that updating the feature flags in the demo.flagd.json file only works for docker compose. Also wondering if there's an easier path to manipulating the feature flags in k8s? I opened open-telemetry/opentelemetry.io#4159 to correct the name of the flag in the json file. I'm happy to augment that PR with the necessary info for toggling the feature flags in k8s, if anyone has input/guidance on that front.

@austinlparker
Copy link
Member Author

@cthain part of the reason it doesn't work is because the helm charts haven't been updated yet. generally we don't keep them up to date with main; we usually update them before a release goes out (which should happen soon!)

@joshleecreates
Copy link
Contributor

Update post SIG meeting:

I started to build out a new elixir service downstream from Kafka. In my initial experiments I found that the Kafka client library I had chosen (https://github.com/kafkaex/kafka_ex) does not seem to support the version of Kafka used with the demo, or I am misunderstanding how to form the connection. In either case, it seems apparent that Kafka + Elixir is an unusual combination and not necessarily the best way to demo Erlang/Elixir observability.

In the SIG we discussed some potential alternatives:

  • Building a customer support / chat app using Erlang or Elixir
  • Building an admin portal using Elixir/Pheonix

That chat app seems to be a good path forward because:

  • It's a very Erlang-y use case
  • It's not called by any other services, but could call other services, making it an optional component
  • It provides an interesting area for demoing full stack telemetry with a frontend integration

cc @julianocosta89 @austinlparker

@tsloughter
Copy link
Member

If you'd like to discuss with the SIG we'll be meeting tomorrow (9 PST) in the ErlEF Observability WG zoom instead of the SIG zoom. See https://docs.google.com/document/d/18JVh6ICLyRCJBpRVIXwcR1fb-K4qRX3WHYgtM7mFx2U/edit for the agenda and zoom link.

@puckpuck
Copy link
Contributor

puckpuck commented Apr 7, 2024

I see we are still discussing what to do with an Elixir service. We should take that discussion to a new issue since I'm closing this one. We now have flagd rolled out with the 1.9 release for feature flagging, which covers this issue's original request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants