-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support running function worker along with broker #1329
Conversation
…cated license headers This addresses some comments in pulsar functions PR apache#1314
…keeper version This removes the temp dependency in `pulsar-functions-instance`
/cc @srkukarni @jerrypeng |
I think for standalone service, this should be enabled by default. |
agree with @merlimat |
Yup |
bin/pulsar
Outdated
@@ -74,6 +110,7 @@ where command is one of: | |||
discovery Run a discovery server | |||
proxy Run a pulsar proxy | |||
websocket Run a web socket proxy server | |||
worker Run a function worker server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about naming the command functions-worker
to make it more explicit? Also, use plural to highlight this will be running multiple instances of multiple functions.
@@ -40,6 +40,21 @@ | |||
<artifactId>testng</artifactId> | |||
<version>6.13.1</version> | |||
</dependency> | |||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If every module depends on buildtool
, everyone will depend on log4j
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the log4j are already default to all modules in test scope, buildtools is also default to all modules in test scope, since it is shipping a default log4j2.yaml for testing. so it is making sense to move them to buildtools.
conf/function_worker.yml
Outdated
# under the License. | ||
# | ||
|
||
workerId: standalone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have comments for each key explaining the purpose of each setting.
Ideally, each setting should also have the default value, such that commenting that same line would have the same effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is currently copied from functions-module, which I don't change any behavior at this moment. I would keep this change as simple as for now since it is already touching quite a few places. we can clean up these configuration files after we merge the functions dist with pulsar/all.
conf/function_worker.yml
Outdated
pulsarServiceUrl: pulsar://localhost:6650 | ||
pulsarWebServiceUrl: http://localhost:8080 | ||
numFunctionPackageReplicas: 1 | ||
downloadDirectory: /tmp/pulsar_functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should default to data/pulsar_functions
. /tmp
is mounted on ramdisk in many systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the purpose of using /tmp because worker doesn't really require any persistence. the directory can be gone after reboot, that's okay. but we can change it to data/ if that's preferred. although it would be good to do this after this change when we are merging function dist with pulsar/all. the file is copied from functions/conf to conf.
conf/function_worker.yml
Outdated
metricsConfig: | ||
metricsSinkClassName: org.apache.pulsar.functions.metrics.sink.PrometheusSink | ||
metricsCollectionInterval: 30 | ||
metricsSinkConfig: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we collect the metrics with the Prometheus java client lib, the metrics are already in the registry and will be reported by standalone broker at localhost:8080/metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is for process based runtime. @srkukarni can chime in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -76,6 +81,12 @@ | |||
@Parameter(names = { "--only-broker" }, description = "Only start Pulsar broker service (no ZK, BK)") | |||
private boolean onlyBroker = false; | |||
|
|||
@Parameter(names = {"-rfw", "--run-function-worker"}, description = "Run function worker with Broker") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--run-functions-worker
Also, we should have a key in broker.conf
to do the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add one.
@sijie In the worker we are issue calls the Pulsar broker's admin REST interface. In the future, for better integration, can the pulsar broker pass in objects to the worker so it can use that determine the info gotten from the admin REST interface instead of actually issuing a HTTP command locally? |
@jerrypeng yes. that would be the ideal solution. however that is going to take more efforts to achieve that, which should be done after we make function worker integrated on broker. |
@jerrypeng @srkukarni @merlimat : the console output of standalone is already very complicated, making standalone by default starts function worker will make it worse. are we sure we want to make it by default. If we think start function worker should be the default behavior, I can make the change. |
@sijie I still think that we should make it default for standalone. Users will be typically more interested in actual function logs(either in log files or log topic) than the console output of broker. |
Would it be feasible to route all logs to a separate file? |
@merlimat we can route based on classes @srkukarni okay will make it default. |
@merlimat @srkukarni @jerrypeng : addressed your comments. make functions worker started by default in standalone, it can be turned off by "--no-functions-worker". changed "function worker" to "functions worker" and also add a setting in server configuration for broker. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
LGTM! |
Motivation
It would make things easier to support run function worker along with broker. For example, user can turn on
pulsar-functions
by just adding a new flag in command line: e.g.bin/pulsar standalone --run-function-worker
.Modifications
The major challenge of running function worker along with broker is to avoid conflicting on protobuf version. So the main idea of this change:
pulsar-function-worker-shaded
module shade protobufpulsar-broker
includespulsar-function-worker-shaded
: start a worker service when--run-function-worker
is specified, registered functions rest endpoints underadmin/functions
andadmin/v2/functions
.The details of the change:
within pulsar-functions
pulsar-function-worker-shaded
: it shades protobuf3 and includes grpc and the dependencies that uses protobuf3.pulsar-function-worker-runner
, which excludepulsar-client-tools-shaded
and includes the original client/admin module. It is acting as a testing module to make surepulsar-function-worker-shaded
can work with pulsar-broker.pulsar-function-worker-runner
. The existing functions scripts remain here util we merge function cli into pulsar-toolspulsar-broker
pulsar-function-worker-shaded
in pulsar-broker.PulsarBrokerStarter
andPulsarStandaloneStarter
to start function worker along with it.script
function_worker.yml
in conf directoryconf/log4j2.yml
to includeRoutingAppender
, so broker can use it for routing functions' logging.bin/pulsar
to find java/python instancesResult
we are able to start function-worker with
bin/pulsar standalone --run-function-worker
.NOTES: