Investigate replacing feature flag service #1343

austinlparker · 2024-01-22T17:04:40Z

Capturing a SIG meeting discussion today...

The feature flag service is kind of a frustrating dependency for us currently. It's very important, but it's also hard to maintain as we don't have a ton of expertise in Erlang. In addition, due to limitations of our build pipeline it's very out of date.

Our general idea is to investigate replacing the feature flag service with some other off-the-shelf solution, then refactoring everything to use OpenFeature. Erlang support would come back into the demo as part of a smaller, more self-contained service (possibly some sort of consumer off Kafka) that should be more amenable to the build pipeline.

austinlparker · 2024-02-17T21:56:33Z

From some research, I think our best bet might be flagd. It's maintained by OpenFeature, and it seems to be fairly lightweight and straightforward for our purposes. Even better, it's OpenTelemetry compatible, so we can get independent instrumentation out of it. It supports more evaluation styles than we currently do as well (such as targeting, multi-variable experiments, etc.) and can be configured with JSON files rather than requiring a separate database.

puckpuck · 2024-02-18T04:27:26Z

I vote we replace the product catalog service with an Erlang/Elixir equivalent. This service is part of the minimal install and is one of 3 Go-based services in the demo.

austinlparker · 2024-02-18T04:29:11Z

I vote we replace the product catalog service with an Erlang/Elixir equivalent. This service is part of the minimal install and is one of 3 Go-based services in the demo.

That should be vaguely easier to maintain since there's no UI component at least?

julianocosta89 · 2024-02-18T10:48:05Z

ProductCatalog sounds like a good suggestion to be replaced.

I'm just afraid of whenever we start adding Swift and Android to the demo.
Those 2 will have to talk with ProductCatalog, and changing anything in Erlang/Elixir is a bit overwhelming when you are not used to work with it, which is the case for 98% of global population 😅

austinlparker · 2024-02-18T21:32:54Z

#1388

Went ahead and made a PR swapping things out so we can get an idea of what this all would look like. There's a few gaps (no Python provider for flagd, which I'll have to write or find) but overall it seems ok.

ed: looks like there is a flagd provider for python, it's just not released yet. hopefully they'll get that sorted soon.

austinlparker · 2024-02-19T15:55:22Z

@julianocosta89 @puckpuck After spending a bit of time trying to rewrite the product catalog in Erlang, I'm not so sure about it. Frankly, 3/4ths of this is that I simply don't grok Erlang well enough to be productive in it (and ChatGPT only gets you so far). I think our best bet would be to create a new Erlang service that reads off Kafka and try to make it as minimal as possible to avoid the need to touch it in the future outside of OpenTelemetry updates. I feel like making Product Catalog an Erlang service just creates another load-bearing dependency that'll be hard for us to work with going forward.

austinlparker · 2024-02-19T15:56:42Z

Also worth noting there's no OpenFeature libraries for Erlang at this point in time, so we'd have to write that as well.

julianocosta89 · 2024-02-19T16:00:54Z

Yeah, I've figured that would be tricky!
There is also another thing, if we make a erlang/elixir service to talk with kafka, there is no instrumentation for it 😢
Not sure how great that would be

julianocosta89 · 2024-02-19T16:02:18Z

Let me bring to the discussion @joshleecreates that has some experience in Erlang/Elixir, and also @tsloughter that was the one that initially implemented the FeatureFlagService.

joshleecreates · 2024-02-19T17:39:25Z

Also worth noting there's no OpenFeature libraries for Erlang at this point in time, so we'd have to write that as well.

The service doesn't need to consume feature flags in its initial version.

I think our best bet would be to create a new Erlang service that reads off Kafka and try to make it as minimal as possible to avoid the need to touch it in the future outside of OpenTelemetry updates.

I am inclined to agree with this. I like the idea of a simple stateless service, but to @julianocosta89's point, there would be no auto-instrumentation unless we also included a downstream database or some kind of HTTP server. But maybe that isn't a bad thing? This is a pretty realistic use-case and would be a good demo of combining auto- and manual-instrumentation to complete a trace.

tsloughter · 2024-02-19T17:40:03Z

Just saw this issue after i saw the other. I can do the product catalog. Also can do a kafka service, but seems easier to just translate an existing service that seems to just read in json and serve it over grpc?

joshleecreates · 2024-02-19T17:41:05Z

Personally I lean away from "batteries-inlcuded" frameworks like Phoenix in the demo since the frameworks' conventions are often at odds with the development and build needs of the demo, so removing Phoenix gets a +1 from me.

austinlparker · 2024-02-19T17:41:57Z

@tsloughter @joshleecreates I think the ideal replacement Erlang service would be something that meets the criteria of #912 . Reads orders off the Kafka queue, then persists them to a DB.

If we wanted to swap out a JSON service, then we could... but I'd be more in favor of doing something net-new, just to make sure it wasn't a load-bearing part of the architecture.

austinlparker · 2024-02-19T17:48:02Z

Replying to #1388 (comment)

The biggest advantage I can see to simplifying the build pipeline is that if we move Erlang off into its own little corner, then we don't have to deal with the weird QEMU build issues any more. FF Service was load-bearing to the entire demo and wouldn't run on AS/ARM without cross-compilation. By making a service that's kind of just off to the side, if we can't build cross-platform then that's fine, we can just publish the x86 version and have it not run without impacting the overall demo.

tsloughter · 2024-02-19T17:54:54Z

I knew about the Windows 11 issue and I sort of may vaguely recall an ARM build discussion at one point but is there a github issue for that.

austinlparker · 2024-02-19T18:17:41Z

@tsloughter I'm not sure there's an exhaustive writeup but it's this issue: #956

The short version is that some weird combination of issues with QEMU causes segfaults when trying to update past OTP 23. The actual notes are spread out over a combination of github issues, PRs, and Slack threads. I think the WSL issue is actually independent of it.

julianocosta89 · 2024-02-20T08:35:35Z

I had another idea, let me know what you all think.

The easiest service to replace we have in the demo today is the quoteservice, it is just talking with shippingservice and it is not doing much. It receives the number of items, and returns a quote, simple as that. Currently the quoteservice is not talking with FeatureFlagService, so we wouldn't need to worry about missing OpenFeature libraries for Erlang.

We could replace the quoteservice with Erlang/Elixir and then rewrite the productcatalogservice in PHP.
PHP has an OpenFeature SDK on Laravel, so we could continue to use the productcatalogservice with the Feature Flags we have today.

It will be a bit more work, but I think it is a good idea. WDYT?

mviitane · 2024-02-20T10:28:43Z

We currently have 3 services with Go that feel quite a lot. We could replace accountingservice with Erlang and create an additional DB to store the sales data. In this way, we wouldn’t need to convert productcatalog, which is quite central. Also, the demo already feels quite large, so we wouldn’t introduce new services other than the DB.

julianocosta89 · 2024-02-20T13:59:30Z

We currently have 3 services with Go that feel quite a lot. We could replace accountingservice with Erlang and create an additional DB to store the sales data. In this way, we wouldn’t need to convert productcatalog, which is quite central. Also, the demo already feels quite large, so we wouldn’t introduce new services other than the DB.

yeah, that would be okay to me as well.

The only problem is that there is no auto-instrumentation for Kafka in Erlang/Elixir, so we need to manually do it.

joshleecreates · 2024-02-20T14:06:33Z

The only problem is that there is no auto-instrumentation for Kafka in Erlang/Elixir, so we need to manually do it.

This doesn't seem like a big problem if we're just consuming a single topic, since we also want to show one example of manual instrumentation in each service anyways

austinlparker · 2024-02-20T14:49:47Z

while i'm in favor of reducing go services, i'd prefer we replace them with something like Java or .NET (probably .NET) -- while I love each and every otel language equally, the world does not, and we should probably err on the side of language popularity when it comes to these decisions rather than trying to have a 1:1 mapping of services to languages.

austinlparker · 2024-02-20T14:51:26Z

in terms of demo size, I feel like having the erlang service be net-new as a kafka consumer is actually better than replacing an existing one. kafka consumers can all be dropped from the minimal compose file, and since there's no UI, overall requirements to run it should be less.

austinlparker · 2024-02-21T16:22:13Z

Per SIG meeting -

We would like to go ahead and merge the flagd PR
We would like the replacement erlang service to hang off kafka; either a new service or a replacement for frauddetection. It should also talk to a DB (postgres)

julianocosta89 · 2024-02-26T19:33:15Z

@austinlparker why frauddetection and not accountingservice?

FraudDetection is Kotlin and the only Kotlin service in the demo.

AccountingService is Go and we have 3 services written in Go.

austinlparker · 2024-02-27T14:35:48Z

sure, accountingservice is fine

cthain · 2024-03-13T22:36:09Z

👋 Hey there. I've been messing around with the otel demo a bit recently and it is pretty great, thanks!

I'm not sure if this is the best place for this, but since this issue is specifically about using flagd for feature flags, I thought I'd comment here first, instead of raising a new issue. In docker compose I was using the featureflagservice to trigger the memory leak in the recommendationservice via the demo.flagd.json file. It works fine in the Docker compose setup because the file is mounted from the src/ dir into the flagd container.

But this does not work in K8s. I installed the demo via the Helm chart and everything starts up fine. However, the question is: how do I trigger the memory leak? I looked at the featureflagservice pod in k8s and there are no volume mounts. I tracked down that in k8s the FFS gets its flags from ffspostgres. So the docs that mention the demo.flagd.json don't apply here.

I tried port-forwarding 8081 for the featureflagservice and I can see a very basic UI, but I get Not Found for every link I click.. It would be slick to be able to edit the flags directly from the flagd UI, but I'm not sure if that's possible because I haven't done any more digging.

But after poking around in the ffspostgres pod I landed on:

kubectl exec my-otel-demo-ffspostgres-58df9cd6bc-5s77z -- psql -U ffs -d ffs -c \
  "UPDATE featureflags SET enabled=1 WHERE name='recommendationCache'"
UPDATE 1

After changing the flag directly in ffspostgres manually I see the recommendationservice starting to display latency, gaps, and spikes in memory usage (not shown in the figure below).

It's obiviously working to trigger the memory leak because the pod is getting OOMKilled by k8s, which is awesome!

kubectl get pods | grep recommendationservice
my-otel-demo-recommendationservice-756854bf-r22p8     0/1     OOMKilled   2 (89s ago)   52m

All this to say, I think the docs could use an update to clarify that updating the feature flags in the demo.flagd.json file only works for docker compose. Also wondering if there's an easier path to manipulating the feature flags in k8s? I opened open-telemetry/opentelemetry.io#4159 to correct the name of the flag in the json file. I'm happy to augment that PR with the necessary info for toggling the feature flags in k8s, if anyone has input/guidance on that front.

austinlparker · 2024-03-13T23:02:01Z

@cthain part of the reason it doesn't work is because the helm charts haven't been updated yet. generally we don't keep them up to date with main; we usually update them before a release goes out (which should happen soon!)

joshleecreates · 2024-04-03T15:38:02Z

Update post SIG meeting:

I started to build out a new elixir service downstream from Kafka. In my initial experiments I found that the Kafka client library I had chosen (https://github.com/kafkaex/kafka_ex) does not seem to support the version of Kafka used with the demo, or I am misunderstanding how to form the connection. In either case, it seems apparent that Kafka + Elixir is an unusual combination and not necessarily the best way to demo Erlang/Elixir observability.

In the SIG we discussed some potential alternatives:

Building a customer support / chat app using Erlang or Elixir
Building an admin portal using Elixir/Pheonix

That chat app seems to be a good path forward because:

It's a very Erlang-y use case
It's not called by any other services, but could call other services, making it an optional component
It provides an interesting area for demoing full stack telemetry with a frontend integration

cc @julianocosta89 @austinlparker

tsloughter · 2024-04-03T17:26:44Z

If you'd like to discuss with the SIG we'll be meeting tomorrow (9 PST) in the ErlEF Observability WG zoom instead of the SIG zoom. See https://docs.google.com/document/d/18JVh6ICLyRCJBpRVIXwcR1fb-K4qRX3WHYgtM7mFx2U/edit for the agenda and zoom link.

puckpuck · 2024-04-07T03:21:16Z

I see we are still discussing what to do with an Elixir service. We should take that discussion to a new issue since I'm closing this one. We now have flagd rolled out with the 1.9 release for feature flagging, which covers this issue's original request.

austinlparker mentioned this issue Feb 19, 2024

Remove Feature Flag Service in favor of OpenFeature + flagd #1388

Merged

3 tasks

cthain mentioned this issue Mar 13, 2024

Demo Docs: Update the name of recommendation cache feature flag open-telemetry/opentelemetry.io#4159

Merged

Kimbohlovette mentioned this issue Mar 20, 2024

Adding new service to save orders to database #912

Open

puckpuck closed this as completed Apr 7, 2024

joshleecreates mentioned this issue Apr 8, 2024

Add a new Elixir example service #1511

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate replacing feature flag service #1343

Investigate replacing feature flag service #1343

austinlparker commented Jan 22, 2024

austinlparker commented Feb 17, 2024

puckpuck commented Feb 18, 2024

austinlparker commented Feb 18, 2024

julianocosta89 commented Feb 18, 2024

austinlparker commented Feb 18, 2024 •

edited

Loading

austinlparker commented Feb 19, 2024

austinlparker commented Feb 19, 2024

julianocosta89 commented Feb 19, 2024

julianocosta89 commented Feb 19, 2024

joshleecreates commented Feb 19, 2024

tsloughter commented Feb 19, 2024

joshleecreates commented Feb 19, 2024

austinlparker commented Feb 19, 2024

austinlparker commented Feb 19, 2024

tsloughter commented Feb 19, 2024

austinlparker commented Feb 19, 2024

julianocosta89 commented Feb 20, 2024 •

edited

Loading

mviitane commented Feb 20, 2024

julianocosta89 commented Feb 20, 2024

joshleecreates commented Feb 20, 2024

austinlparker commented Feb 20, 2024

austinlparker commented Feb 20, 2024

austinlparker commented Feb 21, 2024

julianocosta89 commented Feb 26, 2024

austinlparker commented Feb 27, 2024

cthain commented Mar 13, 2024

austinlparker commented Mar 13, 2024

joshleecreates commented Apr 3, 2024

tsloughter commented Apr 3, 2024

puckpuck commented Apr 7, 2024

Investigate replacing feature flag service #1343

Investigate replacing feature flag service #1343

Comments

austinlparker commented Jan 22, 2024

austinlparker commented Feb 17, 2024

puckpuck commented Feb 18, 2024

austinlparker commented Feb 18, 2024

julianocosta89 commented Feb 18, 2024

austinlparker commented Feb 18, 2024 • edited Loading

austinlparker commented Feb 19, 2024

austinlparker commented Feb 19, 2024

julianocosta89 commented Feb 19, 2024

julianocosta89 commented Feb 19, 2024

joshleecreates commented Feb 19, 2024

tsloughter commented Feb 19, 2024

joshleecreates commented Feb 19, 2024

austinlparker commented Feb 19, 2024

austinlparker commented Feb 19, 2024

tsloughter commented Feb 19, 2024

austinlparker commented Feb 19, 2024

julianocosta89 commented Feb 20, 2024 • edited Loading

mviitane commented Feb 20, 2024

julianocosta89 commented Feb 20, 2024

joshleecreates commented Feb 20, 2024

austinlparker commented Feb 20, 2024

austinlparker commented Feb 20, 2024

austinlparker commented Feb 21, 2024

julianocosta89 commented Feb 26, 2024

austinlparker commented Feb 27, 2024

cthain commented Mar 13, 2024

austinlparker commented Mar 13, 2024

joshleecreates commented Apr 3, 2024

tsloughter commented Apr 3, 2024

puckpuck commented Apr 7, 2024

austinlparker commented Feb 18, 2024 •

edited

Loading

julianocosta89 commented Feb 20, 2024 •

edited

Loading