Switch to v0.14 plugin api #928

Merged: @tagomoris merged 29 commits from switch-to-v14-plugin-api into master on May 17, 2016
Conversation

@tagomoris (Member) commented Apr 28, 2016:

This change is based on #912 and #943.

@tagomoris added the v0.14 label on May 9, 2016
@tagomoris force-pushed the switch-to-v14-plugin-api branch 3 times, most recently from 5641fd3 to 3117c55, on May 13, 2016 11:06
@tagomoris (Member Author) commented:

Remaining tasks are:

  • writing tests for Fluent::Plugin::MultiOutput
  • making sure that Fluentd processes run correctly

@tagomoris (Member Author) commented May 16, 2016:

Tasks not yet resolved (will do these later, after the v0.14.0 release):

  • plugin lifecycle test (agent / root_agent): it should call lifecycle methods for all plugins in ascending order (or descending order for #start); see the sketch after this list
  • plugin lifecycle test (root_agent): it should call all methods of the startup/shutdown sequence
  • ...
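For reference, here is a minimal, self-contained sketch of the kind of ordering assertion such a lifecycle test might make. This is not the actual test code from this PR: RecordingPlugin is a hypothetical stand-in for the real plugin instances managed by an agent / root_agent, and it only illustrates the call-order check described above.

require 'test/unit'

# Hypothetical stand-in for a plugin whose lifecycle calls we record.
class RecordingPlugin
  def initialize(name, log)
    @name = name
    @log  = log
  end

  def start;    @log << [:start, @name];    end
  def shutdown; @log << [:shutdown, @name]; end
end

class LifecycleOrderTest < Test::Unit::TestCase
  def test_start_and_shutdown_order
    log = []
    plugins = [RecordingPlugin.new(:p1, log), RecordingPlugin.new(:p2, log)]

    # per the task above: #start in descending order,
    # other lifecycle methods (e.g. #shutdown) in ascending order
    plugins.reverse_each(&:start)
    plugins.each(&:shutdown)

    assert_equal [[:start, :p2], [:start, :p1], [:shutdown, :p1], [:shutdown, :p2]], log
  end
end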

@tagomoris changed the title from "[WIP] Switch to v0.14 plugin api" to "Switch to v0.14 plugin api" on May 16, 2016
@tagomoris (Member Author) commented:

I'm checking some example configurations:

  • example/in_forward
  • example/out_forward
  • example/out_file (buf_memory)
  • example/out_copy (stdout, file+memory)
  • example/out_forward_buf_file (newly added)

@tagomoris (Member Author) commented:

I'll merge #912, #943 and this after CI green confirmation.

@tagomoris (Member Author) commented:

I did a very simple benchmark to show basic throughput on my laptop (MacBook Air 11-inch), using the configurations below:

# generator process (1 or 2) w/ v0.12
<source>
  @type dummy
  dummy {"message":"yay fluentd v0.14!"}
  rate  20000
  tag   test.flow
</source>

<match test.flow>
  @type forward
  flush_interval 1s
  num_threads 5
  <server>
    host 127.0.0.1
    port 24224
  </server>
</match>

# forward process v0.14
<source>
  @type forward
  port 24224
</source>

<match test.**>
  @type forward
  buffer_type file
  buffer_path /tmp/fluentd.forward.buffer
  num_threads 10
  flush_interval 1s
  <server>
    host 127.0.0.1
    port 24225
  </server>
</match>

# counter process v0.12
<source>
  @type forward
  port 24225
</source>

<match test.**>
  @type flowcounter
  unit second
  tag  fluentd.traffic
</match>

<match fluentd.traffic>
  @type stdout
</match>

@tagomoris (Member Author) commented:

v0.14 with file buffer shows somewhat curious and unexpected event transfer behavior. Overall this process can forward 20k events per second on average, but the flush interval control in v0.14 differs from v0.12's: instead of flushing a steady ~20k events every second, it emits bursts of roughly two seconds' worth of events (~38k to 40k) followed by an idle second:

2016-05-17 10:21:47 +0900 fluentd.traffic: {"test.flow_count":40000,"test.flow_count_rate":40000.0}
2016-05-17 10:21:48 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:21:49 +0900 fluentd.traffic: {"test.flow_count":38000,"test.flow_count_rate":38000.0}
2016-05-17 10:21:50 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:21:51 +0900 fluentd.traffic: {"test.flow_count":40000,"test.flow_count_rate":40000.0}
2016-05-17 10:21:52 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:21:53 +0900 fluentd.traffic: {"test.flow_count":38000,"test.flow_count_rate":38000.0}
2016-05-17 10:21:54 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:21:55 +0900 fluentd.traffic: {"test.flow_count":40000,"test.flow_count_rate":40000.0}
2016-05-17 10:21:56 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:21:57 +0900 fluentd.traffic: {"test.flow_count":38000,"test.flow_count_rate":38000.0}
2016-05-17 10:21:58 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:21:59 +0900 fluentd.traffic: {"test.flow_count":39392,"test.flow_count_rate":39392.0}
2016-05-17 10:22:00 +0900 fluentd.traffic: {"test.flow_count":38608,"test.flow_count_rate":38608.0}
2016-05-17 10:22:01 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:22:02 +0900 fluentd.traffic: {"test.flow_count":38000,"test.flow_count_rate":38000.0}
2016-05-17 10:22:03 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:22:04 +0900 fluentd.traffic: {"test.flow_count":40000,"test.flow_count_rate":40000.0}
2016-05-17 10:22:05 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:22:06 +0900 fluentd.traffic: {"test.flow_count":38000,"test.flow_count_rate":38000.0}
2016-05-17 10:22:07 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:22:08 +0900 fluentd.traffic: {"test.flow_count":40000,"test.flow_count_rate":40000.0}
2016-05-17 10:22:09 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:22:10 +0900 fluentd.traffic: {"test.flow_count":38000,"test.flow_count_rate":38000.0}
2016-05-17 10:22:11 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:22:12 +0900 fluentd.traffic: {"test.flow_count":39762,"test.flow_count_rate":39762.0}
2016-05-17 10:22:13 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:22:14 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:22:15 +0900 fluentd.traffic: {"test.flow_count":58238,"test.flow_count_rate":58238.0}
2016-05-17 10:22:16 +0900 fluentd.traffic: {"test.flow_count":0,"test.flow_count_rate":0.0}
2016-05-17 10:22:17 +0900 fluentd.traffic: {"test.flow_count":78627,"test.flow_count_rate":78627.0}

@tagomoris (Member Author) commented:

This is the (expected) result with the forwarder running Fluentd v0.12:

2016-05-17 10:24:59 +0900 fluentd.traffic: {"test.flow_count":20000,"test.flow_count_rate":20000.0}
2016-05-17 10:25:00 +0900 fluentd.traffic: {"test.flow_count":20000,"test.flow_count_rate":20000.0}
2016-05-17 10:25:01 +0900 fluentd.traffic: {"test.flow_count":20000,"test.flow_count_rate":20000.0}
2016-05-17 10:25:02 +0900 fluentd.traffic: {"test.flow_count":18000,"test.flow_count_rate":18000.0}
2016-05-17 10:25:03 +0900 fluentd.traffic: {"test.flow_count":20000,"test.flow_count_rate":20000.0}

@tagomoris (Member Author) commented:

I will merge this change (and the preceding ones) into master, and then investigate this flush timing.
Total throughput looks good enough for the first version of such a major change, and this big change is hard to manage while unmerged.
@repeatedly @sonots Do you have any thoughts or concerns?

@tagomoris (Member Author) commented:

No responses... It's time to merge!

@tagomoris added 16 commits on May 17, 2016 14:22
* without this fix, some events would be written partially if any chunk raised an error in #append/#concat
* that would produce duplicated events

Buffer#write will do the following (the method was renamed: #emit is the method name used by event routers):
* this method receives pairs of metadata and data to be written at once
* append/concat these data into chunks (not committed yet)
* commit the first chunk
  * if that succeeds, commit all remaining chunks (even if any following chunk raises an error)
  * if it fails, roll back all chunks

In memory/file buffers, #commit is a very lightweight operation and will NOT fail in most cases.

This change requires some additional internal APIs for buffers/chunks:
* chunk status in general: required to make #write thread safe
* keyword arguments for #write: bulk and enqueue
  * #write becomes much more complex, so the bulk operation should be merged into one implementation
    (we can't maintain two different but similar methods)
  * #write enqueues chunks if needed, so :immediate mode should be implemented at the same level
* a chunk_full_threshold configuration parameter to control "chunk_size_full?"
  * bulk and non-bulk #write were merged
  * in non-bulk mode, it's very rare that the written chunk content bytesize is exactly the same as the limit
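
The commit message above describes Buffer#write's commit-first, then commit-all-or-rollback strategy. Here is a simplified, self-contained sketch of that control flow; it is not the actual Fluent::Plugin::Buffer implementation, and the chunk objects are hypothetical stand-ins that respond to #append, #commit and #rollback.

# A simplified sketch of the write strategy described above: stage data into
# all target chunks without committing, commit the first chunk, then commit
# the rest (on success) or roll everything back (on failure).
def write_staged(chunks_and_data)
  appended = []
  begin
    chunks_and_data.each do |chunk, data|
      appended << chunk
      chunk.append(data)        # staged, not yet committed
    end
    return if appended.empty?

    first, *rest = appended
    first.commit                # once this succeeds, the write is decided
    rest.each do |chunk|
      begin
        chunk.commit            # keep committing even if one of these fails
      rescue
        # the first commit already succeeded, so don't roll back here
      end
    end
  rescue
    appended.each(&:rollback)   # nothing committed yet: roll back everything
    raise
  end
end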
@tagomoris force-pushed the switch-to-v14-plugin-api branch from 1a43a1a to eefaa15 on May 17, 2016 05:36
@tagomoris (Member Author) commented:

Green!

@tagomoris merged commit 966a579 into master on May 17, 2016
@tagomoris deleted the switch-to-v14-plugin-api branch on June 2, 2016 09:26