
Persistent config #937

Merged
merged 9 commits into from
Jan 30, 2022

Conversation

Qizot
Contributor

@Qizot Qizot commented Jan 25, 2022

A proposed solution for #633.

It provides a config backend that can be used for persisting and reading Livebook configuration. It does not break any API (all existing tests pass) and allows freely swapping between config backends; currently there is one reading from Elixir's config variables and another storing encrypted information in a file.

This is a POC rather than a proper solution, so keep that in mind when you see the hard-coded secret 👀

Any feedback appreciated.

@jonatanklosko
Member

Hey @Qizot, thanks for kicking this off! Some high-level thoughts below.

I think we should always use the application environment for accessing the configuration at runtime. The values we keep there are generally parsed into Elixir values, for example in the case of file systems or the default runtime. We shouldn't dump the structs as-is, because that would break as soon as we change some of the internals.

One direction we could take is to make the backend a "primary store" (file, maybe database, or none). When setting the initial application config, we would read what we have in the store and use a precedence like stored value > CLI flags > env vars. Then, whenever there's a config change, we just change the application config, and if it's something we want to persist, we serialize the value and defer storage to the backend. This way we explicitly control what we load/store. The only tricky bit may be ensuring that the store takes precedence over CLI flags (which currently override the config after initialization), but that's just a technical detail.

@josevalim please let me know if this makes sense, or if you have any other take on this :)

.gitignore Outdated
@@ -33,3 +33,6 @@ npm-debug.log

# The built Escript
/livebook

# The output of elixir's lsp
/.elixir_ls
Contributor

This should go to your global gitignore. :)

@josevalim
Contributor

Thank you @Qizot for the PR, and sorry for the delay in replying. I was thinking about this problem and, because I could not answer all the questions, it took me too long, so I decided to write my thoughts down here.

About the persistence, it is likely that we will want to store more information in the future besides the configuration. For example, maybe we will allow users to be stored somewhere, or perhaps configuration for third-party nodes, etc.

The other issue is the conflict between flags/env vars and configuration. I think it would be best if we don't mix those at all: either something is configuration/env vars, or it is something we fully put in storage; there should be no overlap. In other words, we should decouple the solution from configuration and treat it as a completely separate thing. For example, once we add this feature, there will be no environment variable for configuring file systems.

I still need to digest a bit more where we want to go, but I thought I would give you some high-level updates.

@josevalim
Contributor

Ok, I have thought more about this. One of the issues with whatever format we choose is how we are going to migrate the data. For example, we could perhaps use Ecto: we would define the schemas, use Etso to persist them in memory/disk, and use PostgreSQL if you want to share the data across servers. The issue, however, is that if we add new data to the schema, then we need to migrate PostgreSQL and the in-memory/on-disk structure too.

My conclusion is that, in our case, the data is more important than a particular shape. Therefore, I propose that we store data using a variant of the Entity-Attribute-Value model, with two additional columns: a namespace and the insertion timestamp. For example, to store the user name, we would do:

| Namespace | Entity | Attribute | Value  | Timestamp     |
|-----------|--------|-----------|--------|---------------|
| :users    | 1      | :name     | "josé" | 1643371831822 |
| :users    | 1      | :job      | "dev"  | 1643371831822 |

Our initial implementation (V1) could store this in ETS like this:

{{namespace, entity}, attribute, value, timestamp}

The timestamp is computed with System.os_time(:millisecond). For now, we can store all data in a single ETS table. For example, to store both the autosave path and filesystem information, we would have:

| Namespace   | Entity  | Attribute      | Value      | Timestamp     |
|-------------|---------|----------------|------------|---------------|
| :settings   | :global | :autosave_path | "/foo/bar" | 1643371831822 |
| :filesystem | rand()  | :type          | s3         | 1643371832489 |
| :filesystem | rand()  | :bucket_url    | "/..."     | 1643371832489 |
| :filesystem | rand()  | :access_key_id | "xyz"      | 1643371832489 |
| :filesystem | rand()  | :secret        | "abc"      | 1643371832489 |

Where rand() in this case could be :crypto.strong_rand_bytes(6) |> Base.url_encode64(), generated when a filesystem is added.
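As a sketch, the proposed EAV rows could be exercised directly against ETS like this (the table name and the use of a :duplicate_bag table are illustrative assumptions, not the final design):

```elixir
# Illustrative sketch of the proposed EAV layout in a single ETS table.
# :duplicate_bag allows several rows (one per attribute) under the same key.
table = :ets.new(:livebook_storage_sketch, [:duplicate_bag, :public])

timestamp = System.os_time(:millisecond)
id = :crypto.strong_rand_bytes(6) |> Base.url_encode64()

# Each row has the shape {{namespace, entity}, attribute, value, timestamp}.
:ets.insert(table, {{:settings, :global}, :autosave_path, "/foo/bar", timestamp})
:ets.insert(table, {{:filesystem, id}, :type, "s3", timestamp})
:ets.insert(table, {{:filesystem, id}, :bucket_url, "/...", timestamp})

# Reading an entity back means collecting its attribute rows into a map:
fs =
  :ets.lookup(table, {:filesystem, id})
  |> Map.new(fn {_key, attr, value, _ts} -> {attr, value} end)
#=> %{type: "s3", bucket_url: "/..."}
```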

For V1, we can focus simply on replacing the filesystem part. We can remove the environment variables and introduce something called Livebook.Storage. Here is the behaviour:

@doc """
Returns a map identified by `entity_id` in `namespace`.

    fetch(:filesystem, "rand-id")
    #=> {:ok, %{id: "rand-id", type: "s3", bucket_url: "/...", secret: "abc", access_key: "xyz"}}

"""
@callback fetch(namespace, entity_id)

@doc """
Returns all values in namespace.

    all(:filesystem)
    [%{id: "rand-id", type: "s3", bucket_url: "/...", secret: "abc", access_key: "xyz"}]

"""
@callback all(namespace)

@doc ...
@callback insert(namespace, entity_id, [{attribute, value}])


@doc ...
@callback delete(namespace, entity_id)

Both insert and delete go through a GenServer. insert performs both inserts and updates; if any particular attribute already exists, it must be replaced. We would only implement the storage for now, without backing it up to the filesystem, which would come next.
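A minimal sketch of those replace-on-insert semantics over the EAV layout (module and function names here are illustrative, and the table is passed explicitly for the sake of the example; the real callbacks would not take it as an argument):

```elixir
# Hypothetical sketch of replace-on-insert for the EAV table described above.
defmodule StorageSketch do
  # insert/4: drop any existing rows for the given attributes, then write new ones.
  def insert(table, namespace, entity_id, attributes) do
    timestamp = System.os_time(:millisecond)
    key = {namespace, entity_id}

    for {attr, _value} <- attributes do
      # Delete any existing row for this attribute so the new value replaces it.
      :ets.match_delete(table, {key, attr, :_, :_})
    end

    rows = for {attr, value} <- attributes, do: {key, attr, value, timestamp}
    :ets.insert(table, rows)
  end

  # fetch/3: collect an entity's attribute rows back into a map.
  def fetch(table, namespace, entity_id) do
    case :ets.lookup(table, {namespace, entity_id}) do
      [] -> :error
      rows -> {:ok, Map.new(rows, fn {_key, attr, value, _ts} -> {attr, value} end)}
    end
  end
end

table = :ets.new(:storage_sketch, [:duplicate_bag, :public])
StorageSketch.insert(table, :filesystem, "abc", type: "s3", bucket_url: "/old")
StorageSketch.insert(table, :filesystem, "abc", bucket_url: "/new")
StorageSketch.fetch(table, :filesystem, "abc")
#=> {:ok, %{type: "s3", bucket_url: "/new"}}
```

In the proposed design, the writes would additionally be funneled through a GenServer so concurrent inserts and deletes are serialized.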

WDYT? If you agree, do you want to implement this?

@Qizot
Contributor Author

Qizot commented Jan 28, 2022

@josevalim I really like the idea! I can sit down to it during this weekend or during the following week :)

@Qizot Qizot requested a review from josevalim January 30, 2022 13:16
@Qizot
Contributor Author

Qizot commented Jan 30, 2022

@josevalim I've implemented the ETS version of the spec you suggested. Any ideas on how to integrate it with Livebook itself (regarding the filesystem configs, I can simply remove just the initial configuration, and all the other add/remove should work out of the box)?

@josevalim
Contributor

Any ideas on how to integrate it with Livebook itself (regarding the filesystem configs, I can simply remove just the initial configuration, and all the other add/remove should work out of the box)?

Yes, exactly! Just assume it will be stored in storage. :)

if Mix.env() == :test do
  []
else
  [Application.fetch_env!(:livebook, :storage)]
Contributor

Soon we will need the storage in tests as well, so you should always start it. :) And remember to start it early on, as other processes may rely on it in the future. I would start it right after the repository!

Contributor Author

I am a little bit lost on how to set it up properly so that it does not conflict with the Ets test suites, as it starts under the specified name. Is it okay to just start_link it inside setup_all and leave it (meaning that if it is already started we simply don't care, and otherwise when the test ends it will be shut down on its own)?

Contributor

In the test just assume that it is already running and write directly to it. As long as they write to different namespaces, it should be fine. You can also do this trick:

test "insert/2", config do
  Ets.insert(config.test, ...)
  ...

In other words, make the test name itself the namespace, and that should guarantee uniqueness!

entries ->
  entries
  |> Enum.map(fn {_key, attr, val, _timestamp} ->
    {attr, val}
Contributor

I just realized perhaps we could store {attr, val} as a single tuple element. This way this could be an :ets.lookup_element/3 call. WDYT?

Contributor Author

I think that we can't really use lookup_element due to this quirk from the documentation: "If no object with key Key exists, the function exits with reason badarg.", which is pretty bad.

Contributor

Yeah, we typically do this:

try do
  :ets.lookup_element(@table, key, 2)
catch
  :error, _ -> []
end

Contributor Author

To be honest, I think that keeping the attribute key and value separately looks clearer and lets us operate on them more freely (an example would be the select_delete that I've added).

@josevalim josevalim left a comment

This is looking great, I have dropped my review with some possible improvements!

@Qizot
Contributor Author

Qizot commented Jan 30, 2022

@josevalim any ideas on how to cleanly serialize/deserialize the different types of filesystem back and forth from storage? My idea was to serialize the module and make the FileSystem protocol define some functions for serialization/deserialization, but it can get tricky once there are things such as pids, e.g. for the local filesystem (they would cause trouble if we want to further persist that information).

@josevalim
Contributor

Do we need to deserialize them? The filesystems are created from the configuration, which today is a string, and we will change said configuration to be a map. I don't think we need to go from a filesystem instance to a setting, do we?

@Qizot
Contributor Author

Qizot commented Jan 30, 2022

I probably have trouble understanding how the current flow of managing file systems should look, so correct me if I'm wrong.

Each added file system should have a generated id, and that id should probably be carried around so that we can easily remove the file system from storage based on it. Moreover, if we want to make the storage a single source of truth, we have to keep all the information about the file systems there, meaning that for each Livebook.Config.file_systems() call we need to fetch the information from storage and convert it to the necessary struct, such as Livebook.FileSystem.S3, so that the FileSystem protocol works on them.

My problem was that while this approach is fine for S3, for the Local file system it is not so great, as it carries a pid (meaning that information would be persisted even if we don't want it to be).

@josevalim
Contributor

josevalim commented Jan 30, 2022

Livebook.Config should be completely out of the equation. Also I don't think you should treat Livebook.Storage as the source of filesystems. You can assume that the filesystems will be something like this:

def filesystems do
  [Livebook.Filesystem.Local...] ++
    Enum.map(Livebook.Storage.all(:filesystems), &storage_to_fs/1)
end

And then storage_to_fs is something like:

defp storage_to_fs(%{type: "s3"} = config), do: Livebook.Filesystem.S3.from_config(config)

I am not sure if those are the exact modules, but the important point I want to bring to your attention is that you should not expect the data to come ready from the storage. You will use the storage as a mechanism to "hydrate"/create the actual data. This is true in many cases, but especially true in ours, since we created a very dumb storage layer (which has its pros in our case!). :)

Does this make sense?
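For the S3 case, that hydration step could look roughly like the sketch below (S3Sketch and its from_config/1 are assumed stand-ins for whatever the real Livebook.FileSystem.S3 API ends up being):

```elixir
# Hypothetical sketch: rebuilding a filesystem struct from a stored attribute map.
defmodule S3Sketch do
  defstruct [:bucket_url, :access_key_id, :secret_access_key]

  # Match the stored keys explicitly so unexpected attributes are not leaked in.
  def from_config(%{
        bucket_url: bucket_url,
        access_key_id: access_key_id,
        secret_access_key: secret_access_key
      }) do
    %__MODULE__{
      bucket_url: bucket_url,
      access_key_id: access_key_id,
      secret_access_key: secret_access_key
    }
  end
end

# A map as it might come back from Livebook.Storage.all(:filesystem):
stored = %{
  type: "s3",
  bucket_url: "https://example.com/bucket",
  access_key_id: "xyz",
  secret_access_key: "abc"
}

fs = S3Sketch.from_config(stored)
```

The storage rows stay a dumb attribute map; only the hydration function knows how to turn them into a living struct.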

@Qizot Qizot requested a review from josevalim January 30, 2022 17:59
@Qizot Qizot marked this pull request as ready for review January 30, 2022 17:59
lib/livebook/config.ex Outdated
"#{file_system.bucket_url} #{file_system.access_key_id} #{file_system.secret_access_key}"
@spec to_config(t()) :: map()
def to_config(%__MODULE__{} = s3) do
Map.from_struct(s3)
Contributor

I would prefer to be explicit. Match on the relevant keys and return the relevant keys. This way, if we add new fields to the struct, we don't accidentally "leak" them. :)

Contributor Author

Do you mean something like this? It looks a little strange to me, and there is still a lot of room for making mistakes. I took the Map.take approach, but then the keys may be nil at some point...

  @spec to_config(t()) :: map()
  def to_config(%__MODULE__{} = s3) do
    %{bucket_url: bucket_url, access_key_id: access_key_id, secret_access_key: secret_access_key} = s3

    %{bucket_url: bucket_url, access_key_id: access_key_id, secret_access_key: secret_access_key}
  end

Contributor

Yes, exactly. Although I have a hunch this function will no longer exist after the rewrite, so I guess we can leave it as is right now. Your call.

lib/livebook/storage.ex Outdated
Comment on lines 6 to 11
setup_all do
  Ets.start_link([])

  :ok
end

Contributor

I don't think we need this anymore?

Suggested change
setup_all do
Ets.start_link([])
:ok
end

Contributor Author

In fact we don't, but I just wanted to keep it here for when some other storage backend gets implemented. I will delete it for now if necessary.

file_systems = Enum.uniq(file_systems() ++ [file_system])
Application.put_env(:livebook, :file_systems, file_systems, persistent: true)
file_systems
def append_file_system(%FileSystem.S3{} = file_system) do
Contributor

Calling it append is a bit misleading now, because we don't have any order. Ideally, we would move these functions out of this module, since they are no longer related to config; I would move all of them to Livebook.FileSystem, such as Livebook.FileSystem.insert, Livebook.FileSystem.all, and so on. But I think we can do this in a future pull request. What are your thoughts?

Contributor Author

I had exactly the same thoughts; changing that in the next PR seems reasonable, so that this one does not grow too significantly.

Contributor

I had another idea. We should probably have Livebook.FileSystem.static() (returns from config), .dynamic() (returns from storage), and .all() (both). Then on the settings page, we know that static cannot be added/deleted. But again, for another PR. :)


storage().insert(:filesystem, generate_filesystem_id(), [{:type, "s3"} | attributes])

file_systems()
end

@doc """
Removes the given file system from the configured ones.
"""
@spec remove_file_system(FileSystem.t()) :: list(FileSystem.t())
def remove_file_system(file_system) do
Contributor

I would make it so we remove it based on an ID. But we will need to change the UI for this. It can also be done in another PR. :)

Contributor Author

Yup, that was intentional as I didn't want to introduce persisting the ids straight away.

@josevalim
Contributor

I have added a few extra comments. We should fully move away from config but that can be done in another PR. It is your call. :)

@Qizot
Contributor Author

Qizot commented Jan 30, 2022

I would go for moving all the filesystem configuration away from Config and adjusting it to using ids to be in a separate PR so there are not that many changes.

@josevalim
Contributor

@Qizot we just need to make the formatter happy and we are good to go!

@Qizot
Contributor Author

Qizot commented Jan 30, 2022

Whoops, some error came in, will fix it in a moment.

@josevalim josevalim merged commit 86e4034 into livebook-dev:main Jan 30, 2022
@josevalim
Contributor

💚 💙 💜 💛 ❤️

@jonatanklosko
Member

@Qizot fantastic job! :shipit:
