Added databricks.labs.blueprint.paths.WorkspacePath as pathlib.Path equivalent #115

Merged: 2 commits, Jul 5, 2024
221 changes: 221 additions & 0 deletions README.md
@@ -12,6 +12,13 @@ Baseline for Databricks Labs projects written in Python. Sources are validated w
* [Databricks Labs Blueprint](#databricks-labs-blueprint)
* [Installation](#installation)
* [Batteries Included](#batteries-included)
* [Python-native `pathlib.Path`-like interfaces](#python-native-pathlibpath-like-interfaces)
* [Working With User Home Folders](#working-with-user-home-folders)
* [Relative File Paths](#relative-file-paths)
* [Browser URLs for Workspace Paths](#browser-urls-for-workspace-paths)
* [`read/write_text()`, `read/write_bytes()`, and `glob()` Methods](#readwrite_text-readwrite_bytes-and-glob-methods)
* [Moving Files](#moving-files)
* [Working With Notebook Sources](#working-with-notebook-sources)
* [Basic Terminal User Interface (TUI) Primitives](#basic-terminal-user-interface-tui-primitives)
* [Simple Text Questions](#simple-text-questions)
* [Confirming Actions](#confirming-actions)
@@ -70,6 +77,220 @@ pip install databricks-labs-blueprint

This library contains a proven set of building blocks, tested in production through [UCX](https://github.com/databrickslabs/ucx) and other projects.

## Python-native `pathlib.Path`-like interfaces

This library exposes subclasses of [`pathlib.Path`](https://docs.python.org/3/library/pathlib.html) from Python's standard
library that work with Databricks Workspace paths. These classes offer a more intuitive and Pythonic way to work
with Workspace paths than plain `str` values. They are designed as drop-in replacements for `pathlib.Path` and add
functionality specific to the Databricks Workspace.
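
To illustrate the difference between string handling and path objects, here is a minimal sketch that uses only the standard library's `PurePosixPath`, so it needs no workspace connection. It is not the library's implementation, and the user name and file name are made up for the example.

```python
from pathlib import PurePosixPath

# Hypothetical workspace-style location; nothing here talks to Databricks.
raw = "/Users/username@example.com/some-folder/report.txt"

# String handling: manual splitting to find the folder and the file name.
folder_str, _, file_str = raw.rpartition("/")

# Path handling: the same information through named attributes.
path = PurePosixPath(raw)
folder, file_name, suffix = path.parent, path.name, path.suffix

print(folder, file_name, suffix)
```

A `pathlib.Path`-like class gives workspace paths the same attribute names (`parent`, `name`, `suffix`), which is what makes the examples in the following sections read like ordinary file-system code.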

[[back to top](#databricks-labs-blueprint)]

### Working With User Home Folders

This code initializes a client to interact with a Databricks workspace, creates
a relative workspace path (`~/some-folder/foo/bar/baz`), and verifies that the path is not absolute. Calling `absolute()`
on this relative path is not implemented and raises `NotImplementedError`. The code then expands `~` to the user's home
folder with `expanduser()` and creates the specified directory. Finally, it checks the directory through its full
`/Users/<user_name>/...` path and cleans up: `rmdir()` on a non-empty parent raises `BadRequest`, so the parent is removed
with `recursive=True`.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import BadRequest
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
assert not wsp.is_absolute()

try:
    wsp.absolute()
except NotImplementedError:
    pass  # relative workspace paths cannot be made absolute

# expand `~` to the current user's home folder and create the directory
with_user = wsp.expanduser()
with_user.mkdir()

user_name = ws.current_user.me().user_name
wsp_check = WorkspacePath(ws, f"/Users/{user_name}/{name}/foo/bar/baz")
assert wsp_check.is_dir()

try:
    wsp_check.parent.rmdir()
except BadRequest:
    pass  # a non-empty directory cannot be removed without recursive=True
wsp_check.parent.rmdir(recursive=True)

assert not wsp_check.exists()
```
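
The expansion step can be pictured with a small stand-alone sketch. The helper `expand_home` below is hypothetical and is not part of the library; it only mirrors the mapping from `~` to `/Users/<user_name>` that the example above relies on, using `PurePosixPath` and a made-up user name.

```python
from pathlib import PurePosixPath


def expand_home(path: str, user_name: str) -> PurePosixPath:
    """Replace a leading `~` with `/Users/<user_name>` (illustrative helper)."""
    parts = PurePosixPath(path).parts
    if parts and parts[0] == "~":
        return PurePosixPath("/Users", user_name, *parts[1:])
    # Paths that do not start with `~` are returned unchanged.
    return PurePosixPath(path)


expanded = expand_home("~/some-folder/foo/bar/baz", "username@example.com")
print(expanded)
```

In the real library the user name comes from the workspace client, as `ws.current_user.me().user_name` does in the example above.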

[[back to top](#databricks-labs-blueprint)]

### Relative File Paths

This code expands the `~` symbol to the full path of the user's home directory, computes the relative path from this
home directory to the previously created directory (`~/some-folder/foo/bar/baz`), and verifies it matches the expected
relative path (`some-folder/foo/bar/baz`). It then confirms that the expanded path is absolute, checks that
calling `absolute()` on this path returns the path itself, and converts the path to a FUSE-compatible path
format (`/Workspace/Users/username@example.com/some-folder/foo/bar/baz`).

```python
from pathlib import Path
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
with_user = wsp.expanduser()

home = WorkspacePath(ws, "~").expanduser()
relative_name = with_user.relative_to(home)
assert relative_name.as_posix() == f"{name}/foo/bar/baz"

assert with_user.is_absolute()
assert with_user.absolute() == with_user
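# strip the leading "/" first: joining with an absolute path would discard "/Workspace"
assert with_user.as_fuse() == Path("/Workspace") / with_user.as_posix().lstrip("/")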
```
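
One `pathlib` detail matters when building a prefixed path such as a FUSE location by hand: joining with an absolute path discards everything to the left of it. The sketch below shows this with `PurePosixPath` only; the `/Workspace` prefix is taken from the example above, and the user name is made up.

```python
from pathlib import PurePosixPath

workspace_path = PurePosixPath("/Users/username@example.com/some-folder/foo/bar/baz")
mount = PurePosixPath("/Workspace")

# Joining with an absolute path replaces the left-hand side entirely.
discarded = mount / workspace_path.as_posix()

# Stripping the leading slash keeps the mount prefix in place.
prefixed = mount / workspace_path.as_posix().lstrip("/")

print(discarded)
print(prefixed)
```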

[[back to top](#databricks-labs-blueprint)]

### Browser URLs for Workspace Paths

The `as_uri()` method returns a browser-accessible URI for the workspace path. This example retrieves the current user's
username from the Databricks workspace client, constructs the expected browser URI for the previously created directory
(`~/some-folder/foo/bar/baz`) by combining the host URL with the percent-encoded username, and then verifies that the URI
generated by the `with_user` path object matches it:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/foo/bar/baz")
with_user = wsp.expanduser()

user_name = ws.current_user.me().user_name
browser_uri = f'{ws.config.host}#workspace/Users/{user_name.replace("@", "%40")}/{name}/foo/bar/baz'

assert with_user.as_uri() == browser_uri
```
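
The example above encodes only the `@` character by hand. As a more general sketch of the same idea, `urllib.parse.quote` from the standard library percent-encodes reserved characters while leaving `/` intact. Whether `as_uri()` uses `quote` internally is not stated here, and the host and user name below are made up.

```python
from urllib.parse import quote

host = "https://example.cloud.databricks.com"  # hypothetical workspace host
user_name = "username@example.com"             # hypothetical user

# quote() percent-encodes characters such as "@" but keeps "/" by default.
encoded_path = quote(f"/Users/{user_name}/some-folder/foo/bar/baz")
browser_uri = f"{host}#workspace{encoded_path}"

print(browser_uri)
```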

[[back to top](#databricks-labs-blueprint)]

### `read/write_text()`, `read/write_bytes()`, and `glob()` Methods

This code creates a `WorkspacePath` object for the path `~/some-folder/a/b/c`, expands it to the full user path,
and creates the directory along with any necessary parent directories. It then creates a file named `hello.txt` within
this directory, writes "Hello, World!" to it, and verifies the content. The code lists all `.txt` files in the directory
and ensures there is exactly one file, which is `hello.txt`. Finally, it deletes `hello.txt` and confirms that the file
no longer exists.

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()
wsp = WorkspacePath(ws, f"~/{name}/a/b/c")
with_user = wsp.expanduser()
with_user.mkdir(parents=True)

hello_txt = with_user / "hello.txt"
hello_txt.write_text("Hello, World!")
assert hello_txt.read_text() == "Hello, World!"

files = list(with_user.glob("**/*.txt"))
assert len(files) == 1
assert hello_txt == files[0]
assert files[0].name == "hello.txt"

with_user.joinpath("hello.txt").unlink()

assert not hello_txt.exists()
```

The `write_bytes()` and `read_bytes()` methods work the same way for binary content:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()

wsp = WorkspacePath(ws, f"~/{name}")
with_user = wsp.expanduser()
with_user.mkdir(parents=True)

hello_bin = with_user.joinpath("hello.bin")
hello_bin.write_bytes(b"Hello, World!")

assert hello_bin.read_bytes() == b"Hello, World!"

with_user.joinpath("hello.bin").unlink()

assert not hello_bin.exists()
```
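
To try the same call sequence without a workspace, the sketch below runs it against a temporary local directory with the standard library's `pathlib.Path`. It only shows the `pathlib` semantics that the workspace classes are modeled on; it does not exercise `WorkspacePath` itself.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "a" / "b" / "c"
    folder.mkdir(parents=True)

    hello_txt = folder / "hello.txt"
    hello_txt.write_text("Hello, World!")
    (folder / "hello.bin").write_bytes(b"Hello, World!")

    # "**" matches the starting folder and every folder below it.
    txt_files = sorted(p.name for p in Path(tmp).glob("**/*.txt"))
    bin_content = (folder / "hello.bin").read_bytes()

    hello_txt.unlink()
    still_exists = hello_txt.exists()

print(txt_files, bin_content, still_exists)
```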

[[back to top](#databricks-labs-blueprint)]

### Moving Files

This code creates a `WorkspacePath` object for the path `~/some-folder`, expands it to the full user path, and creates
the directory along with any necessary parent directories. It then creates a file named `hello.txt` within this directory
and writes "Hello, World!" to it. The code then moves the file to `hello2.txt` with `replace()`, verifies that `hello.txt`
no longer exists, and checks that the content of `hello2.txt` is "Hello, World!".

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

name = 'some-folder'
ws = WorkspaceClient()

wsp = WorkspacePath(ws, f"~/{name}")
with_user = wsp.expanduser()
with_user.mkdir(parents=True)

hello_txt = with_user / "hello.txt"
hello_txt.write_text("Hello, World!")

hello_txt.replace(with_user / "hello2.txt")

assert not hello_txt.exists()
assert (with_user / "hello2.txt").read_text() == "Hello, World!"
```
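
For comparison, this is how `replace()` behaves in the standard library on a local temporary directory: it returns the new path and silently overwrites an existing target. Whether the workspace implementation also overwrites an existing target is an assumption that is worth verifying before relying on it.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "hello.txt"
    target = Path(tmp) / "hello2.txt"
    source.write_text("Hello, World!")
    target.write_text("stale content")

    # Standard-library replace() overwrites the target and returns its path.
    moved = source.replace(target)

    source_exists = source.exists()
    moved_text = moved.read_text()

print(moved.name, source_exists, moved_text)
```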

[[back to top](#databricks-labs-blueprint)]

### Working With Notebook Sources

This code initializes a Databricks `WorkspaceClient`, creates a `WorkspacePath` object for the path `~/some-folder`, and
defines two items within this folder: a text file (`a.txt`) and a Python notebook (`b`). It creates the notebook with
the specified content and writes "Hello, World!" to the text file. The code then retrieves all files in the folder, asserts
that there are exactly two, and verifies the suffix of each file and the content of the notebook. Specifically, it checks
that `a.txt` has a `.txt` suffix and that `b` has a `.py` suffix, and that reading the notebook returns the expected code
preceded by the `# Databricks notebook source` header.

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.paths import WorkspacePath

ws = WorkspaceClient()

folder = WorkspacePath(ws, "~/some-folder")

txt_file = folder / "a.txt"
py_notebook = folder / "b" # notebooks have no file extension

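# `make_notebook` is not defined in this snippet; it is assumed to be a helper
# (for example, a test fixture) that creates a notebook at the given path
make_notebook(path=py_notebook, content="display(spark.range(10))")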
txt_file.write_text("Hello, World!")

files = {_.name: _ for _ in folder.glob("**/*")}
assert len(files) == 2

assert files["a.txt"].suffix == ".txt"
assert files["b"].suffix == ".py" # suffix is determined from ObjectInfo
assert files["b"].read_text() == "# Databricks notebook source\ndisplay(spark.range(10))"
```
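
The last assertion shows that notebook source is returned with a `# Databricks notebook source` header line. A minimal sketch of adding and removing that header follows; the helper names `as_notebook_source` and `strip_notebook_header` are hypothetical and do not come from the library.

```python
NOTEBOOK_HEADER = "# Databricks notebook source"


def as_notebook_source(code: str) -> str:
    """Prefix code with the notebook header line (illustrative helper)."""
    return f"{NOTEBOOK_HEADER}\n{code}"


def strip_notebook_header(source: str) -> str:
    """Return the code without the header line, if the header is present."""
    prefix = NOTEBOOK_HEADER + "\n"
    return source[len(prefix):] if source.startswith(prefix) else source


source = as_notebook_source("display(spark.range(10))")
print(strip_notebook_header(source))
```

Stripping the header this way can be handy when comparing notebook content against the code that was originally written to it.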

[[back to top](#databricks-labs-blueprint)]

## Basic Terminal User Interface (TUI) Primitives

Command-line apps need testable interactivity, which is provided by `from databricks.labs.blueprint.tui import Prompts`. Here are some examples: