Sorted key/value store (badger) backed storage plugin #760
Conversation
Thanks for the contribution! Some preliminary thoughts: have you seen ticket #422? We've been considering supporting third-party storage plugins via a new plugin framework, so that the core Jaeger library isn't overly inundated with storage-specific implementations (new plugins will be supported by the contributor in a separate repo). We haven't come up with the exact framework yet, so we can review this PR, but in the future we'd expect to move it over to the plugin framework. I'll take a look at the actual PR later today.
Yeah, my intention was to solve issue #551. Theoretically, moving the code to an external plugin shouldn't be an issue - just add some API listeners in front (if that's the solution #422 ends up with). A separate repo could technically allow defining build flags to support multiple different K/V stores, I assume.
There's a bug in the index seek process: it returns results in DESC order, but does not actually fetch the TOP N using DESC ordering; instead it uses ASC. Will fix in the next commit with some other updates (as well as add a test that catches the mistake).
@burmanm thanks for the PR, I missed it. I am wondering whether we could completely replace in-memory storage with this local storage. I will have a closer look next week. Also, Travis seems to be broken for the last build.
There was a discussion about this PR on the bi-weekly call. Rather than introducing a new storage impl, can this substitute the current in-memory storage? What are the implications? Can it work without creating any files? If so, we could use this instead of in-memory.
By creating files in tmpfs it is in-memory storage, as tmpfs is not persisted to disk. We could add an option to clean up the tmpfs files on shutdown if we wanted (a pod reboot / machine reboot / etc. will of course also do that).
glide.yaml
Outdated
@@ -55,3 +55,5 @@ import:
- package: github.com/go-openapi/validate
- package: github.com/go-openapi/loads
- package: github.com/elazarl/go-bindata-assetfs
- package: github.com/dgraph-io/badger
It needs `glide update` to update the lock file with this change.
Sadly, I'm not sure what the proper way to update this project's dependencies is. `glide up` will break Jaeger (for example, viper is updated and will no longer compile) since it has no version locked in the glide.yaml. It also updates several other dependencies.
What I did in a previous PR was to apply this change in a separate branch and then manually update the lock file only for the dependency I was updating. Sucks, but works.
This now needs to be switched to Gopkg.toml / dep.
There are a lot of magic numbers in different places. Could you please define them as constants? Also, when you are creating prefixes, could you please extract that logic into functions?
plugin/storage/badger/options.go
Outdated
// Options store storage plugin related configs
type Options struct {
	primary *NamespaceConfig
	others  map[string]*NamespaceConfig
Why is this needed? Even the primary namespace
To be consistent with the other storage engines. No other reason, really.
plugin/storage/badger/options.go
Outdated
SpanStoreTTL:   time.Hour * 72, // Default is 3 days
SyncWrites:     false,          // Performance over consistency
Ephemeral:      true,           // Default is ephemeral storage
ValueDirectory: "",
It would be good to use a reasonable default value, so users can just set ephemeral to false and it works.
I can make the default the application path + `data/`, for example. I don't think there's a sane default path for the files.
How about something inside `/tmp/`? When ephemeral is true, `/tmp` is appropriate.

I see that `ioutil.TempDir` is used when this is empty, so ignore my last comment.
)

// NewOptions creates a new Options struct.
func NewOptions(primaryNamespace string, otherNamespaces ...string) *Options {
Can you elaborate on why we need namespaces? If not needed, then remove it, or rename it to flag prefix :)
See previous comment to keep consistency with other storage engines.
opts := badger.DefaultOptions

if f.Options.primary.Ephemeral {
	opts.SyncWrites = false
If set to false, is data available for query immediately?
Does it make sense for tmpfs?
Data is available directly in any case. With tmpfs it makes no difference whether sync is true or false for durability (other than the extra overhead of an fsync call to tmpfs), and there's never a difference when it comes to visibility. Writes always go to the page cache in any case; this setting only controls whether fsync to the disk is called after every write.
// Seek all the services first
for it.Seek(serviceKey); it.ValidForPrefix(serviceKey); it.Next() {
	timestampStartIndex := len(it.Item().Key()) - 24
Why 24?
It's in WriteSpan: 8 bytes for the timestamp + 16 bytes for the traceID = 24.
if p == nil {
	return ErrMalformedRequestObject
}
if p.ServiceName == "" && len(p.Tags) > 0 {
Does the number of tags matter?
There's no restriction on the amount.
}

ids := make([][][]byte, 0, len(indexSeeks)+1)
if len(indexSeeks) > 0 {
Seems redundant.
ids := make([][][]byte, 0, len(indexSeeks)+1)
if len(indexSeeks) > 0 {
	for i, s := range indexSeeks {
		indexResults, _ := r.scanIndexKeys(s, query.StartTimeMin, query.StartTimeMax)
ignoring error?
}
ids = append(ids, make([][]byte, 0, len(indexResults)))
for _, k := range indexResults {
	ids[i] = append(ids[i], k[len(k)-16:])
16? Could you please define these magic numbers as constants, with some comments?
Size of TraceID is 16.
}

// Close Implements io.Closer
func (r *TraceReader) Close() error {
If not needed, please remove it.
plugin/storage/badger/options.go
Outdated
flagSet.Bool(
	nsConfig.namespace+suffixSyncWrite,
	nsConfig.SyncWrites,
	"If all writes should be synced immediately. This will greatly reduce write performance and will require fast SSD drives. Default is false.",
> will require fast SSD drives

If I'm not on SSD, are you stopping the process? If not, what does "require" mean here?
Mostly that this backend really wants SSDs, and if you're going to sync all the writes, you probably want something with fast write times. I've removed it for now, since technically you could also use a DRAM-backed disk array for writes (but reads would most likely be slow).
plugin/storage/badger/options.go
Outdated
flagSet.Bool(
	nsConfig.namespace+suffixEphemeral,
	nsConfig.Ephemeral,
	"Mark this storage ephemeral, data is stored in tmpfs (in-memory). Default is true.",
The "default" part is added automatically by Viper. This is how it looks when you run `SPAN_STORAGE_TYPE=badger go run ./cmd/collector/main.go --help`:

--badger.ephemeral    Mark this storage ephemeral, data is stored in tmpfs (in-memory). Default is true. (default true)
}

func initFromViper(cfg *NamespaceConfig, v *viper.Viper) {
	cfg.Ephemeral = v.GetBool(cfg.namespace + suffixEphemeral)
Do we need a `max-traces` option? Or just a strategy for dealing with a full disk?
Rebased. I also fixed a generic issue with the testing framework inside Jaeger, which happened to test against pointer locations instead of real data.
@burmanm I am trying to test your Badger storage backend for my use case #894, which requires persisting traces to some backend and being able to send the storage files over a network when needed, so that a Jaeger instance on a remote machine can use them to display the traces recorded so far. I cloned your changes but am unable to figure out how to restore a Jaeger instance from previously persisted traces. It seems f.Options.primary.Ephemeral should be set to false so that the storage is not lost, but I am unable to figure out how to restore traces and which files would be required for a remote Jaeger instance to read from. Could you please list the steps? Long term, I think it would be better if you could write a README on how to use the badger storage and all its features.
// GetServices fetches the sorted service list that have not expired
func (r *TraceReader) GetServices() ([]string, error) {
	return r.cache.GetServices()
I noticed an issue with this implementation: when initializing the badger factory by reloading from a pre-existing directory, the services list does not get populated with services from pre-existing traces, even though the traces are accessible via a direct REST API call. Probably the cause is that you are relying on the cache, which is not pre-populated.
// GetOperations fetches operations in the service and empty slice if service does not exists
func (r *TraceReader) GetOperations(service string) ([]string, error) {
	return r.cache.GetOperations(service)
I noticed an issue with this implementation: when initializing the badger factory by reloading from a pre-existing directory, the operations list does not get populated with operations from pre-existing traces, even though the traces are accessible via a direct REST API call. Probably the cause is that you are relying on the cache, which is not pre-populated.
NewCacheStore() calls prefillCaches()
The unit test that does persisting (mentioned in the other comment) also seems to indicate the caches are prefilled correctly.
I should add this type of unit test to the suite.
I took a backup of the Badger db and restored it to a location that was passed as the key and value dir to the Jaeger standalone run. I noticed the loadServices() method does find services present in the traces of the backup, but finds their TTL as 0. This causes the services to be deleted during GetServices(), as your logic considers them expired. I did take a look at your test, and somehow the TTL in that case is 72 hours ahead of the current time, so the test passes.
I am not sure why services loaded from a pre-existing Badger db have a TTL of 0. Do you know what might be causing this?
For now, commenting out the delete line in GetServices() helps populate the services in the UI (similarly for GetOperations()). Is there a danger in this approach? Why do you perform the check (v > t) in GetServices()?
PS: My use case demands that backed-up data should always be accessible.
I can't replicate that behavior; in my testing I see:
Key: key="\x81service-p\x00\x05pSgE\x89\xbd\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01", version=1, meta=40, 1531137600, time now: 1530878401
Key: key="\x82service-poperation-p\x00\x05pSgE\x89\xbd\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01", version=1, meta=40, 1531137600, time now: 1530878401
That is, the services do expire later than the current time (as they should). The reason for the `v > t` check is that we don't want to return ghost services or operations; that would be bad in terms of usability if one has a lot of services/operations that keep changing.
If you need backups that never expire, then I guess TTL is the wrong approach for you. What you'd want is something closer to a snapshot + purge of snapshotted data (with no TTL, so that you would manually take care of retention policies).
Certainly doable, but maybe it is a bit of feature creep for this PR?
@SwarnimRaj Here's a very rudimentary unit test that shows how to load an existing file: https://gist.github.com/burmanm/04021a32d8fb728792eb49ab0044dae8 Now, as for backup & restore, there's a separate tool for that (which should take care of LSM syncing). I have not considered such a use case, and for that reason there are no instructions for it either. But it can be done by following the instructions here: https://github.com/dgraph-io/badger#database-backup
Signed-off-by: Michael Burman <yak@iki.fi>
Changed the dependency reader to use spanstore instead of directly reading from the DB. The downside is increased memory usage, as the []model.Trace must be kept in memory while loading. Previously each model.Trace could have been thrown away by escape analysis. In any case, there wasn't any
Once the merge conflict is solved, can this be merged, or is there still something pending?
Signed-off-by: Yuri Shkuro <ys@uber.com>
I think we mostly need to address #1389.
Signed-off-by: Yuri Shkuro <ys@uber.com>
Codecov Report
@@ Coverage Diff @@
## master #760 +/- ##
==========================================
- Coverage 100% 99.82% -0.18%
==========================================
Files 165 172 +7
Lines 7510 8146 +636
==========================================
+ Hits 7510 8132 +622
- Misses 0 7 +7
- Partials 0 7 +7
Continue to review full report at Codecov.
So the build is green, and the code coverage decreases are only on error checks. I looked around for tools that allow ignoring certain lines from code coverage, but didn't find any.
Signed-off-by: Yuri Shkuro <ys@uber.com>
Amazing! Can't wait to use it.
I am looking forward to using this as well! Forgive me if this has already been covered, but will the deployment instructions be updated to include badger as part of this change? https://github.com/jaegertracing/documentation/blob/master/content/docs/1.11/deployment.md
It can already be used with
I read that this alternative storage option is available; however, the documentation still only mentions Elasticsearch, Cassandra, and Kafka. I'm trying to understand how I would configure this latest option. I want to have all traces written to non-ephemeral storage. Does this option provide this capability without having to install Elasticsearch, Cassandra, or Kafka? Are there any examples of deployment using the approach discussed in this thread?
Storage options are documented in the Deployment section of the docs.
Thank you for the response regarding the storage options. I want to confirm that within my Kubernetes deployment I can create a PVC and specify that storage to be used by Badger. This will require that I deploy a Badger container, and I assume that I will deploy it as a DaemonSet; is this correct? I would then specify the storage endpoint when configuring Jaeger with badger. In this manner, I should get all of the traces stored in non-ephemeral storage.
@sherwoodzern as explained in the documentation:
You cannot use it with normal production (i.e. scalable) installation of Jaeger, because the storage is embedded in a single process that includes both collection and UI components. |
This is a storage plugin for Jaeger implemented against a sorted K/V store. Although this version was written against Badger, it should work with relatively small changes against any lexicographically sorted K/V store (RocksDB is also a possibility, but it would require cgo for compiling).
This is WIP, pushed for early feedback. It is missing the implementation for the Duration index (range index scanning), as well as the GetOperations & GetServices interfaces, and benchmarking/more tests of course. Some smaller TODO parts obviously remain as well, some for easier development purposes and some just lacking optimization (not to mention splitting some parts into separate functions).
cc @pavolloffay