Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running function worker along with broker #1329

Merged
merged 15 commits into from
Mar 5, 2018

Conversation

sijie
Copy link
Member

@sijie sijie commented Mar 5, 2018

Motivation

It would make things easier to support run function worker along with broker. For example, user can turn on pulsar-functions by just adding a new flag in command line: e.g. bin/pulsar standalone --run-function-worker.

Modifications

The major challenge of running function worker along with broker is to avoid conflicting on protobuf version. So the main idea of this change:

  • have a pulsar-function-worker-shaded module shade protobuf
  • pulsar-broker includes pulsar-function-worker-shaded: start a worker service when --run-function-worker is specified, registered functions rest endpoints under admin/functions and admin/v2/functions.
  • update conf/log4j2.yml and bin/pulsar for adopting function worker in pulsar startup scripts.

The details of the change:

within pulsar-functions

  • add pulsar-function-worker-shaded: it shades protobuf3 and includes grpc and the dependencies that uses protobuf3.
  • add pulsar-function-worker-runner, which exclude pulsar-client-tools-shaded and includes the original client/admin module. It is acting as a testing module to make sure pulsar-function-worker-shaded can work with pulsar-broker.
  • points the existing scripts to use the pulsar-function-worker-runner. The existing functions scripts remain here util we merge function cli into pulsar-tools

pulsar-broker

  • move tests from pulsar-client-tools into pulsar-client-tools-test. because pulsar-client-tools depends on pulsar-broker on testing. this would introduce recursive dependency when we attempt to include pulsar-function-worker-shaded in pulsar-broker.
  • update PulsarBrokerStarter and PulsarStandaloneStarter to start function worker along with it.

script

  • include function_worker.yml in conf directory
  • update conf/log4j2.yml to include RoutingAppender, so broker can use it for routing functions' logging.
  • update bin/pulsar to find java/python instances

Result

we are able to start function-worker with bin/pulsar standalone --run-function-worker.

NOTES:

  1. this change doesn't touch the CLI part.
  2. this change doesn't touch the distribution part. i

@sijie sijie self-assigned this Mar 5, 2018
@sijie sijie requested review from merlimat and rdhabalia March 5, 2018 05:54
@sijie
Copy link
Member Author

sijie commented Mar 5, 2018

/cc @srkukarni @jerrypeng

@merlimat
Copy link
Contributor

merlimat commented Mar 5, 2018

I think for standalone service, this should be enabled by default.

@srkukarni
Copy link
Contributor

agree with @merlimat

@jerrypeng
Copy link
Contributor

Yup

bin/pulsar Outdated
@@ -74,6 +110,7 @@ where command is one of:
discovery Run a discovery server
proxy Run a pulsar proxy
websocket Run a web socket proxy server
worker Run a function worker server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about naming the command functions-worker to make it more explicit? Also, use plural to highlight this will be running multiple instances of multiple functions.

@@ -40,6 +40,21 @@
<artifactId>testng</artifactId>
<version>6.13.1</version>
</dependency>
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If every module depends on buildtool, everyone will depend on log4j

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the log4j are already default to all modules in test scope, buildtools is also default to all modules in test scope, since it is shipping a default log4j2.yaml for testing. so it is making sense to move them to buildtools.

# under the License.
#

workerId: standalone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have comments for each key explaining the purpose of each setting.

Ideally, each setting should also have the default value, such that commenting that same line would have the same effect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is currently copied from functions-module, which I don't change any behavior at this moment. I would keep this change as simple as for now since it is already touching quite a few places. we can clean up these configuration files after we merge the functions dist with pulsar/all.

pulsarServiceUrl: pulsar://localhost:6650
pulsarWebServiceUrl: http://localhost:8080
numFunctionPackageReplicas: 1
downloadDirectory: /tmp/pulsar_functions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should default to data/pulsar_functions. /tmp is mounted on ramdisk in many systems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the purpose of using /tmp because worker doesn't really require any persistence. the directory can be gone after reboot, that's okay. but we can change it to data/ if that's preferred. although it would be good to do this after this change when we are merging function dist with pulsar/all. the file is copied from functions/conf to conf.

metricsConfig:
metricsSinkClassName: org.apache.pulsar.functions.metrics.sink.PrometheusSink
metricsCollectionInterval: 30
metricsSinkConfig:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we collect the metrics with the Prometheus java client lib, the metrics are already in the registry and will be reported by standalone broker at localhost:8080/metrics

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is for process based runtime. @srkukarni can chime in here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sijie This is for both process and thread runtimes. @merlimat I was hoping to integrate with Pulsar's inbuilt metrics after the merge.

@@ -76,6 +81,12 @@
@Parameter(names = { "--only-broker" }, description = "Only start Pulsar broker service (no ZK, BK)")
private boolean onlyBroker = false;

@Parameter(names = {"-rfw", "--run-function-worker"}, description = "Run function worker with Broker")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--run-functions-worker

Also, we should have a key in broker.conf to do the same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add one.

@jerrypeng
Copy link
Contributor

@sijie In the worker we are issue calls the Pulsar broker's admin REST interface. In the future, for better integration, can the pulsar broker pass in objects to the worker so it can use that determine the info gotten from the admin REST interface instead of actually issuing a HTTP command locally?

@sijie
Copy link
Member Author

sijie commented Mar 5, 2018

In the worker we are issue calls the Pulsar broker's admin REST interface. In the future, for better integration, can the pulsar broker pass in objects to the worker so it can use that determine the info gotten from the admin REST interface instead of actually issuing a HTTP command locally?

@jerrypeng yes. that would be the ideal solution. however that is going to take more efforts to achieve that, which should be done after we make function worker integrated on broker.

@sijie
Copy link
Member Author

sijie commented Mar 5, 2018

for standalone service, this should be enabled by default.

@jerrypeng @srkukarni @merlimat : the console output of standalone is already very complicated, making standalone by default starts function worker will make it worse. are we sure we want to make it by default.

If we think start function worker should be the default behavior, I can make the change.

@srkukarni
Copy link
Contributor

@sijie I still think that we should make it default for standalone. Users will be typically more interested in actual function logs(either in log files or log topic) than the console output of broker.

@merlimat
Copy link
Contributor

merlimat commented Mar 5, 2018

@jerrypeng @srkukarni @merlimat : the console output of standalone is already very complicated, making standalone by default starts function worker will make it worse. are we sure we want to make it by default.

Would it be feasible to route all logs to a separate file?

@sijie
Copy link
Member Author

sijie commented Mar 5, 2018

@merlimat we can route based on classes org.apache.pulsar.functions.

@srkukarni okay will make it default.

@sijie
Copy link
Member Author

sijie commented Mar 5, 2018

@merlimat @srkukarni @jerrypeng :

addressed your comments. make functions worker started by default in standalone, it can be turned off by "--no-functions-worker".

@merlimat :

changed "function worker" to "functions worker" and also add a setting in server configuration for broker.

Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@merlimat merlimat added this to the 2.0.0-incubating milestone Mar 5, 2018
@merlimat merlimat merged commit b701925 into apache:master Mar 5, 2018
@srkukarni
Copy link
Contributor

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants