Persistent config #937
Conversation
Hey @Qizot, thanks for kicking this off! Some high-level thoughts below.

I think we should always use the application environment for accessing the configuration at runtime. The values we keep there are generally parsed into Elixir values. For example, in the case of file systems or the default runtime, we shouldn't dump the structs as-is, because that would break as soon as we change some of the internals.

One direction we could take is to make the backend a "primary store" (file, maybe database, or none); then, when setting the initial application config, we would read what we have in the store and use a precedence like

@josevalim please let me know if this makes sense, or if you have any other take on this :)
.gitignore
Outdated

@@ -33,3 +33,6 @@ npm-debug.log

# The built Escript
/livebook

# The output of elixir's lsp
/.elixir_ls
This should go to your global gitignore. :)
Thank you @Qizot for the PR, and sorry for the delay in replying. I was thinking about this problem and, because I could not answer all the questions, it took me too long. So I decided to write my thoughts down here.

About persistence: it is likely that we will want to store more information in the future besides the configuration. For example, maybe we will allow users to be stored somewhere, or perhaps configuration for third-party nodes, etc.

The other issue is the conflict between flags/env vars and configuration. I think it would be best if we don't mix those together: either something is configuration/env vars, or it is something that we fully put in storage; there should be no overlap. In other words, we should decouple the solution from configuration and treat it as a completely separate thing. For example, once we add this feature, there would be no environment var for configuring file systems.

I still need to digest a bit more where we want to go, but I thought I would give you some high-level updates.
Ok, I have thought more about this. One of the issues with whatever format we choose is how we are going to migrate the data. For example, we could perhaps use Ecto: we would define the schemas, use Etso to persist them in memory/on disk, and use PostgreSQL if you want to share the data across servers. The issue, however, is that if we add new data to the schema, then we need to migrate PostgreSQL, and we need to migrate the structure in memory/on disk too.

My conclusion is that, in our case, the data is more important than any particular shape. Therefore, I propose that we store data using a variant of the Entity-Attribute-Value model. We will have two additional columns: a namespace and the insertion timestamp. For example, to store the user name, we would do:
Our initial implementation (V1) could store this in ETS like this:
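As a minimal illustrative sketch (the table name, attribute keys, and timestamp source here are my assumptions, not Livebook's actual schema), the EAV rows could look like this in ETS:

```elixir
# Hypothetical sketch: each attribute of an entity becomes one ETS row shaped
# {entity_key, attribute, value, timestamp}, where entity_key pairs the
# namespace with the entity id. A :duplicate_bag table allows several rows
# under the same key.
table = :ets.new(:livebook_storage_sketch, [:duplicate_bag, :public])

timestamp = System.os_time(:millisecond)

# Storing a user's name in the :users namespace under id "user-1":
:ets.insert(table, {{:users, "user-1"}, :id, "user-1", timestamp})
:ets.insert(table, {{:users, "user-1"}, :name, "José", timestamp})

# Rehydrating the entity folds the attribute rows back into a map:
user =
  table
  |> :ets.lookup({:users, "user-1"})
  |> Map.new(fn {_key, attr, value, _ts} -> {attr, value} end)
```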
The timestamp is computed with
Where

For V1, we can focus simply on replacing the filesystem part. We can remove the environment variables and introduce something called Livebook.Storage. Here is the behaviour:

@doc """
Returns a map identified by `entity_id` in `namespace`.

    fetch(:filesystem, "rand-id")
    #=> {:ok, %{id: "rand-id", type: "s3", bucket_url: "/...", secret: "abc", access_key: "xyz"}}
"""
@callback fetch(namespace, entity_id)

@doc """
Returns all values in namespace.

    all(:filesystem)
    #=> [%{id: "rand-id", type: "s3", bucket_url: "/...", secret: "abc", access_key: "xyz"}]
"""
@callback all(namespace)

@doc ...
@callback insert(namespace, entity_id, [{attribute, value}])

@doc ...
@callback delete(namespace, entity_id)

Both insert and delete go through a GenServer. WDYT? If you agree, do you want to implement this?
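A minimal sketch of how this design could fit together (the module name, table options, and timestamp source are my assumptions; the point is that reads hit ETS directly while writes are serialized through the GenServer):

```elixir
defmodule Storage.Ets do
  use GenServer

  @table __MODULE__

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Reads bypass the GenServer entirely and go straight to the ETS table.
  def fetch(namespace, entity_id) do
    case :ets.lookup(@table, {namespace, entity_id}) do
      [] -> :error
      rows -> {:ok, Map.new(rows, fn {_key, attr, val, _ts} -> {attr, val} end)}
    end
  end

  # Writes are serialized through the GenServer.
  def insert(namespace, entity_id, attributes),
    do: GenServer.call(__MODULE__, {:insert, namespace, entity_id, attributes})

  def delete(namespace, entity_id),
    do: GenServer.call(__MODULE__, {:delete, namespace, entity_id})

  @impl true
  def init(_opts) do
    :ets.new(@table, [:named_table, :duplicate_bag, :protected, read_concurrency: true])
    {:ok, %{}}
  end

  @impl true
  def handle_call({:insert, namespace, entity_id, attributes}, _from, state) do
    ts = System.os_time(:millisecond)
    # Replace any previous attributes for this entity, then insert the new ones.
    :ets.delete(@table, {namespace, entity_id})

    for {attr, val} <- attributes do
      :ets.insert(@table, {{namespace, entity_id}, attr, val, ts})
    end

    {:reply, :ok, state}
  end

  def handle_call({:delete, namespace, entity_id}, _from, state) do
    :ets.delete(@table, {namespace, entity_id})
    {:reply, :ok, state}
  end
end
```

Usage would then mirror the behaviour's docs, e.g. `Storage.Ets.insert(:filesystem, "rand-id", type: "s3")` followed by `Storage.Ets.fetch(:filesystem, "rand-id")`.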
@josevalim I really like the idea! I can sit down to it this weekend or during the following week :)
@josevalim I've implemented the ETS version of the spec you suggested. Any ideas on how to integrate it with Livebook itself? (Regarding the filesystem configs, I can simply remove just the initial configuration, and all the other add/remove operations should work out of the box.)
Yes, exactly! Just assume it will be stored in storage. :)
lib/livebook/application.ex
Outdated

if Mix.env() == :test do
  []
else
  [Application.fetch_env!(:livebook, :storage)]
Soon we will need the storage in tests as well, so you should always start it. :) And remember to start it early on, as other processes may rely on it in the future. I would start it right after the repository!
I am a little bit lost on how to set it up properly so that it does not conflict with the `Ets` test suites, as it starts under the specified name. Is it okay to just `start_link` it inside `setup_all` and leave it (meaning that if it is already started we simply don't care, and otherwise, when the test ends, it will be shut down on its own)?
In the test, just assume that it is already running and write directly to it. As long as the tests write to different namespaces, it should be fine. You can also do this trick:

test "insert/2", config do
  Ets.insert(config.test, ...)
  ...

In other words, make the test name itself the namespace, and that should guarantee uniqueness!
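A self-contained sketch of the trick (the shared ETS table here is a stand-in for the globally started storage process; ExUnit passes the test's unique name as an atom in the context under `:test`, so it keys each test's writes to its own namespace):

```elixir
ExUnit.start(auto_run: false)

defmodule NamespaceTrickTest do
  use ExUnit.Case, async: true

  setup_all do
    # One shared table for the whole suite, like the storage started by the app.
    [table: :ets.new(:shared_storage, [:duplicate_bag, :public])]
  end

  test "insert writes under a per-test namespace", %{test: test, table: table} do
    # `test` is a unique atom per test, so collisions are impossible.
    :ets.insert(table, {{test, "entity-1"}, :name, "notebook"})

    assert [{{^test, "entity-1"}, :name, "notebook"}] =
             :ets.lookup(table, {test, "entity-1"})
  end
end

result = ExUnit.run()
```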
lib/livebook/storage/ets.ex
Outdated

entries ->
  entries
  |> Enum.map(fn {_key, attr, val, _timestamp} ->
    {attr, val}
I just realized perhaps we could store `{attr, val}` as a tuple. This way this could be a `:ets.lookup_element/3`. WDYT?
I think that we can't really use `lookup_element` due to this quirk from the documentation: "If no object with key Key exists, the function exits with reason badarg.", which is pretty bad.
Yeah, we typically do this:

try do
  :ets.lookup_element(@table, key, 2)
catch
  :error, :badarg -> []
end
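For context, a runnable sketch of that pattern (the table and key names here are illustrative): `:ets.lookup_element/3` raises `badarg` for a missing key, and catching the error class turns that into a default value instead of a crash.

```elixir
table = :ets.new(:lookup_demo, [:set, :public])
:ets.insert(table, {:known, [1, 2, 3]})

# Wraps :ets.lookup_element/3, turning the badarg raised for a missing key
# into an empty default instead of crashing the caller.
safe_lookup = fn key ->
  try do
    :ets.lookup_element(table, key, 2)
  catch
    :error, :badarg -> []
  end
end

safe_lookup.(:known)   #=> [1, 2, 3]
safe_lookup.(:missing) #=> []
```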
To be honest, I think that keeping the attribute key and value separately looks clearer and allows operating on them more freely (an example would be the `select_delete` that I've added).
This is looking great, I have dropped my review with some possible improvements!
@josevalim any ideas on how to cleanly serialize/deserialize the different types of file systems back and forth from storage? My idea was to serialize and deserialize the module and make the
Do we need to deserialize them? The file systems are created from the configuration, which today is a string, and we will change said configuration to be a map. I don't think we need to go from a file system instance to a setting, do we?
I probably have problems understanding how the current flow of managing file systems should look, so correct me if I'm wrong. Each added file system should have a generated id, and it should probably be carried around so that we can easily

My problem was that, while this approach is fine for S3, for the Local file system it is not so great, as it carries some pid (meaning that the information would be persisted even if we don't want it to be).
def filesystems do
  [Livebook.Filesystem.Local...] ++
    Enum.map(Livebook.Storage.all(:filesystems), &storage_to_fs/1)
end

And then

defp storage_to_fs(%{type: "s3"} = config),
  do: Livebook.Filesystem.S3.from_config(config)

I am not sure if those are the exact modules, but the important point I want to bring your attention to is that you should not expect the data to come ready from the storage. You will use the storage as a mechanism to "hydrate"/create the actual data. This is true in many cases, but especially true in our case, since we created a very dumb storage layer (which has pros in our case!). :)

Does this make sense?
lib/livebook/file_system/s3.ex
Outdated

"#{file_system.bucket_url} #{file_system.access_key_id} #{file_system.secret_access_key}"

@spec to_config(t()) :: map()
def to_config(%__MODULE__{} = s3) do
  Map.from_struct(s3)
I would prefer to be explicit. Match on the relevant keys and return the relevant keys. This way, if we add new fields to the struct, we don't accidentally "leak" them. :)
Do you mean something like this? It looks a little bit strange to me, and there is still a lot of room for making mistakes. I took the `Map.take` approach, but then the keys may be nil at some point...

@spec to_config(t()) :: map()
def to_config(%__MODULE__{} = s3) do
  %{bucket_url: bucket_url, access_key_id: access_key_id, secret_access_key: secret_access_key} = s3
  %{bucket_url: bucket_url, access_key_id: access_key_id, secret_access_key: secret_access_key}
end
Yes, exactly. Although I have a hunch this function will no longer exist after the rewrite, so I guess we can leave it as is right now. Your call.
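A toy comparison of the two approaches (the `connection_pid` field is a made-up stand-in for internal state that should never reach storage):

```elixir
defmodule S3Sketch do
  defstruct [:bucket_url, :access_key_id, :secret_access_key, :connection_pid]

  # Explicit match and return: a field added to the struct later can never
  # silently leak into the persisted config.
  def to_config(%__MODULE__{} = s3) do
    %{bucket_url: url, access_key_id: id, secret_access_key: secret} = s3
    %{bucket_url: url, access_key_id: id, secret_access_key: secret}
  end
end

s3 = %S3Sketch{
  bucket_url: "/bucket",
  access_key_id: "key-id",
  secret_access_key: "secret",
  connection_pid: self()
}

explicit = S3Sketch.to_config(s3)
# Map.from_struct/1 dumps every field, internal state included:
leaky = Map.from_struct(s3)

Map.has_key?(explicit, :connection_pid) #=> false
Map.has_key?(leaky, :connection_pid)    #=> true
```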
test/livebook/storage/ets_test.exs
Outdated

setup_all do
  Ets.start_link([])

  :ok
end
I don't think we need this anymore?
In fact we don't, but I just wanted to keep it here for when some other storage backend gets implemented. Will delete it for now if necessary.
file_systems = Enum.uniq(file_systems() ++ [file_system])
Application.put_env(:livebook, :file_systems, file_systems, persistent: true)
file_systems

def append_file_system(%FileSystem.S3{} = file_system) do
Calling it `append` is a bit misleading now, because we don't have any order. Ideally, we would move these functions out of this module, because they are no longer related to config, and I would move all of them to `Livebook.FileSystem`, such as `Livebook.FileSystem.insert`, `Livebook.FileSystem.all`, and so on. But I think we can do this in a future pull request. What are your thoughts?
I had exactly the same thoughts, and changing that in the next PR seems reasonable, so that this one does not grow too significantly.
I had another idea. We should probably have `Livebook.FileSystem.static()` (returns from config), `.dynamic()` (returns from storage), and `.all()` (both). Then on the settings page, we know that static ones cannot be added/deleted. But again, for another PR. :)
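A hypothetical sketch of that split; the module names, the storage API, and the data shapes below are assumptions for illustration only, with `StorageStub` standing in for the real storage:

```elixir
defmodule StorageStub do
  # Stand-in for the storage backend; returns one user-created file system.
  def all(:filesystem) do
    [%{id: "rand-id", type: "s3", bucket_url: "/...", secret: "abc", access_key: "xyz"}]
  end
end

defmodule FileSystemsSketch do
  # Comes from boot-time config; the settings page cannot add or delete these.
  def static, do: [{:local, %{root: "/"}}]

  # Created by users at runtime and hydrated from storage entries.
  def dynamic, do: Enum.map(StorageStub.all(:filesystem), &storage_to_fs/1)

  # Both, for callers that do not care where a file system came from.
  def all, do: static() ++ dynamic()

  defp storage_to_fs(%{type: "s3"} = config), do: {:s3, config}
end
```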
storage().insert(:filesystem, generate_filesystem_id(), [{:type, "s3"} | attributes])

file_systems()
end

@doc """
Removes the given file system from the configured ones.
"""
@spec remove_file_system(FileSystem.t()) :: list(FileSystem.t())
def remove_file_system(file_system) do
I would make it so we remove it based on an ID. But we will need to change the UI for this. It can also be done in another PR. :)
Yup, that was intentional as I didn't want to introduce persisting the ids straight away.
I have added a few extra comments. We should fully move away from config, but that can be done in another PR. It is your call. :)
I would go for moving all the filesystem configuration away from
@Qizot we just need to make the formatter happy and we are good to go!

Whoops, some error crept in, will fix it in a moment.

💚 💙 💜 💛 ❤️

@Qizot fantastic job!
A proposed solution for #633.
It provides a config backend that can be used for persisting and reading Livebook configuration. It does not break any API (all existing tests pass) and allows freely swapping between config backends: currently one reading from Elixir's config variables and another storing encrypted information in a file.

This is more of a POC than a proper solution, so keep that in mind when you see the hard-coded secret 👀

Any feedback is appreciated.