Commit e587597
chore: fix typos
e-kwsm committed Oct 26, 2023
1 parent 9d23055 commit e587597
Showing 11 changed files with 18 additions and 18 deletions.
4 changes: 2 additions & 2 deletions docs/getting_started/group_builder.md
@@ -1,6 +1,6 @@
# Group Builder

-Another advanced template in `maggma` is the `GroupBuilder`, which groups documents together before applying your function on the group of items. Just like `MapBuilder`, `GroupBuilder` also handles incremental building, keeping track of errors, getting only the data you need, and managing timeouts. GroupBuilder won't delete orphaned documents since that reverse relationshop isn't valid.
+Another advanced template in `maggma` is the `GroupBuilder`, which groups documents together before applying your function on the group of items. Just like `MapBuilder`, `GroupBuilder` also handles incremental building, keeping track of errors, getting only the data you need, and managing timeouts. GroupBuilder won't delete orphaned documents since that reverse relationship isn't valid.

Let's create a simple `ResupplyBuilder`, which will look at the inventory of items and determine what items need resupply. The source document will look something like this:

@@ -65,7 +65,7 @@ Note that unlike the previous `MapBuilder` example, we didn't call the source an
- store_process_timeout: adds the process time into the target document for profiling
- retry_failed: retries running the process function on previously failed documents

-One parameter that doesn't work in `GroupBuilder` is `delete_orphans`, since the Many-to-One relationshop makes determining orphaned documents very difficult.
+One parameter that doesn't work in `GroupBuilder` is `delete_orphans`, since the Many-to-One relationship makes determining orphaned documents very difficult.

Finally let's get to the hard part which is running our function. We do this by defining `unary_function`

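As background for the tutorial text this hunk touches: the builder's actual work lives in `unary_function`, which receives one group of documents at a time. A minimal sketch of the pattern, assuming hypothetical inventory fields ("name", "quantity", "minimum") that are illustrative only, not the tutorial's actual code:

```python
# Hypothetical sketch of the GroupBuilder pattern described above; the
# inventory field names are assumptions for illustration.
from typing import Dict, List

from maggma.builders import GroupBuilder


class ResupplyBuilder(GroupBuilder):
    def unary_function(self, items: List[Dict]) -> Dict:
        # `items` is one group of inventory docs sharing the grouping key(s)
        needed = [
            doc["name"]
            for doc in items
            if doc.get("quantity", 0) < doc.get("minimum", 0)
        ]
        return {"resupply": needed}
```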
4 changes: 2 additions & 2 deletions docs/getting_started/running_builders.md
@@ -64,7 +64,7 @@ There are progress bars for each of the three steps, which lets you understand w
`maggma` can distribute work across multiple computers. There are two steps to this:
-1. Run a `mrun` manager by providing it with a `--url` to listen for workers on and `--num-chunks`(`-N`) which tells `mrun` how many sub-pieces to break up the work into. You can can run fewer workers then chunks. This will cause `mrun` to call the builder's `prechunk` to get the distribution of work and run distributd work on all workers
+1. Run a `mrun` manager by providing it with a `--url` to listen for workers on and `--num-chunks`(`-N`) which tells `mrun` how many sub-pieces to break up the work into. You can can run fewer workers then chunks. This will cause `mrun` to call the builder's `prechunk` to get the distribution of work and run distributed work on all workers
2. Run `mrun` workers by providing it with a `--url` to listen for a manager and `--num-workers` (`-n`) to tell it how many processes to run in this worker.

The `url` argument takes a fully qualified url including protocol. `tcp` is recommended:
@@ -112,7 +112,7 @@ mrun -n 32 -vv my_first_builder.json builder_2_and_3.py last_builder.ipynb

## Reporting Build State

-`mrun` has the ability to report the status of the build pipeline to a user-provided `Store`. To do this, you first have to save the `Store` as a JSON or YAML file. Then you can use the `-r` option to give this to `mrun`. It will then periodicially add documents to the `Store` for one of 3 different events:
+`mrun` has the ability to report the status of the build pipeline to a user-provided `Store`. To do this, you first have to save the `Store` as a JSON or YAML file. Then you can use the `-r` option to give this to `mrun`. It will then periodically add documents to the `Store` for one of 3 different events:

* `BUILD_STARTED` - This event tells us that a new builder started, the names of the `sources` and `targets` as well as the `total` number of items the builder expects to process
* `UPDATE` - This event tells us that a batch of items was processed and is going to `update_targets`. The number of items is stored in `items`.
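As background for the `-r` option mentioned above: maggma `Store`s are MSONable, so saving one as a JSON file is a one-liner with monty. A minimal sketch, where the database and collection names are placeholder assumptions:

```python
# Hypothetical sketch: dump a Store to JSON so it can be handed to `mrun -r`.
# The database/collection names here are placeholders.
from maggma.stores import MongoStore
from monty.serialization import dumpfn

report_store = MongoStore(database="maggma_reports", collection_name="builder_status")
dumpfn(report_store, "report_store.json")  # then: mrun -r report_store.json ...
```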
2 changes: 1 addition & 1 deletion docs/getting_started/simple_builder.md
@@ -71,7 +71,7 @@ Calling the parent class `__init__` is a good practice as sub-classing builders

## `get_items`

-`get_items` is conceptually a simple method to implement, but in practice can easily be more code than the rest of the builder. All of the logic for getting data from the sources has to happen here, which requires some planning. `get_items` should also sort all of the data into induvidual **items** to process. This simple builder has a very easy `get_items`:
+`get_items` is conceptually a simple method to implement, but in practice can easily be more code than the rest of the builder. All of the logic for getting data from the sources has to happen here, which requires some planning. `get_items` should also sort all of the data into individual **items** to process. This simple builder has a very easy `get_items`:

``` python

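# (Hypothetical sketch only -- the tutorial's actual get_items is truncated
# from this diff view. Assumes the builder holds a `source` Store and that
# the field names below exist.)
def get_items(self):
    self.logger.info("Getting items to process")
    # Pull just the fields we need and yield one document (item) at a time
    for doc in self.source.query(properties=["name", "quantity", "minimum"]):
        yield doc
```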
2 changes: 1 addition & 1 deletion setup.py
@@ -13,7 +13,7 @@
name="maggma",
use_scm_version=True,
setup_requires=["setuptools_scm"],
description="Framework to develop datapipelines from files on disk to full dissemenation API",
description="Framework to develop datapipelines from files on disk to full dissemination API",
long_description=long_desc,
long_description_content_type="text/markdown",
url="https://github.com/materialsproject/maggma",
2 changes: 1 addition & 1 deletion src/maggma/api/query_operator/pagination.py
@@ -7,7 +7,7 @@


class PaginationQuery(QueryOperator):
"""Query opertators to provides Pagination"""
"""Query operators to provides Pagination"""

def __init__(self, default_limit: int = 100, max_limit: int = 1000):
"""
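The idea behind a pagination query operator, sketched as a standalone class using the `default_limit`/`max_limit` parameters visible above (an illustration of the technique, not maggma's actual implementation):

```python
# Hypothetical standalone sketch of the pagination idea: clamp the requested
# page size and translate skip/limit into store-level query parameters.
from typing import Dict, Optional


class SimplePagination:
    def __init__(self, default_limit: int = 100, max_limit: int = 1000):
        self.default_limit = default_limit
        self.max_limit = max_limit

    def query(self, skip: int = 0, limit: Optional[int] = None) -> Dict:
        limit = self.default_limit if limit is None else limit
        if limit > self.max_limit:
            raise ValueError(f"limit must be <= {self.max_limit}")
        return {"skip": skip, "limit": limit}
```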
4 changes: 2 additions & 2 deletions src/maggma/api/utils.py
@@ -64,7 +64,7 @@ def attach_signature(function: Callable, defaults: Dict, annotations: Dict):
Args:
function: callable function to attach the signature to
defaults: dictionary of parameters -> default values
-annotations: dictionary of type annoations for the parameters
+annotations: dictionary of type annotations for the parameters
"""

required_params = [
@@ -167,7 +167,7 @@ def validate_monty(cls, v, _):

if len(errors) > 0:
raise ValueError(
"Missing Monty seriailzation fields in dictionary: {errors}"
"Missing Monty serialization fields in dictionary: {errors}"
)

return v
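For context on the `attach_signature` helper touched above: the underlying technique is to build an `inspect.Signature` from the defaults and annotations dicts and attach it to the function, so frameworks that introspect signatures (e.g. FastAPI) see the declared parameters. A minimal sketch of that technique, not maggma's exact code:

```python
# Hypothetical sketch of the signature-attachment technique: build an
# inspect.Signature from defaults/annotations and attach it so
# inspect.signature() reports the declared parameters.
import inspect
from typing import Any, Callable, Dict


def attach_signature_sketch(func: Callable, defaults: Dict[str, Any], annotations: Dict[str, Any]) -> None:
    params = [
        inspect.Parameter(
            name,
            inspect.Parameter.KEYWORD_ONLY,
            default=defaults.get(name, inspect.Parameter.empty),
            annotation=annotations.get(name, inspect.Parameter.empty),
        )
        for name in annotations
    ]
    func.__signature__ = inspect.Signature(params)
```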
2 changes: 1 addition & 1 deletion src/maggma/builders/group_builder.py
@@ -91,7 +91,7 @@ def ensure_indexes(self):

def prechunk(self, number_splits: int) -> Iterator[Dict]:
"""
-Generic prechunk for group builder to perform domain-decompostion
+Generic prechunk for group builder to perform domain-decomposition
by the grouping keys
"""
self.ensure_indexes()
2 changes: 1 addition & 1 deletion src/maggma/builders/map_builder.py
@@ -86,7 +86,7 @@ def ensure_indexes(self):

def prechunk(self, number_splits: int) -> Iterator[Dict]:
"""
-Generic prechunk for map builder to perform domain-decompostion
+Generic prechunk for map builder to perform domain-decomposition
by the key field
"""
self.ensure_indexes()
6 changes: 3 additions & 3 deletions src/maggma/core/builder.py
@@ -54,19 +54,19 @@ def connect(self):
def prechunk(self, number_splits: int) -> Iterable[Dict]:
"""
Part of a domain-decomposition paradigm to allow the builder to operate on
-multiple nodes by divinding up the IO as well as the compute
+multiple nodes by dividing up the IO as well as the compute
This function should return an iterator of dictionaries that can be distributed
to multiple instances of the builder to get/process/update on
Arguments:
number_splits: The number of groups to split the documents to work on
"""
self.logger.info(
f"{self.__class__.__name__} doesn't have distributed processing capabillities."
f"{self.__class__.__name__} doesn't have distributed processing capabilities."
" Instead this builder will run on just one worker for all processing"
)
raise NotImplementedError(
f"{self.__class__.__name__} doesn't have distributed processing capabillities."
f"{self.__class__.__name__} doesn't have distributed processing capabilities."
" Instead this builder will run on just one worker for all processing"
)

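A concrete `prechunk` typically splits the work by key and yields one query dict per chunk; the default above just signals that no such decomposition exists. A minimal sketch of an override, assuming the builder exposes a single `source` Store:

```python
# Hypothetical sketch of a concrete prechunk override: split the source's
# distinct key values into number_splits groups, one query dict per group.
from math import ceil
from typing import Dict, Iterator


def prechunk(self, number_splits: int) -> Iterator[Dict]:
    keys = self.source.distinct(self.source.key)  # assumes a `source` Store
    chunk_size = max(1, ceil(len(keys) / number_splits))
    for start in range(0, len(keys), chunk_size):
        split = keys[start : start + chunk_size]
        yield {"query": {self.source.key: {"$in": split}}}
```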
2 changes: 1 addition & 1 deletion src/maggma/stores/aws.py
@@ -179,7 +179,7 @@ def query(
yield {p: doc[p] for p in properties if p in doc}
else:
try:
-# TODO: THis is ugly and unsafe, do some real checking before pulling data
+# TODO: This is ugly and unsafe, do some real checking before pulling data
data = self.s3_bucket.Object(self.sub_dir + str(doc[self.key])).get()["Body"].read()
except botocore.exceptions.ClientError as e:
# If a client error is thrown, then check that it was a NoSuchKey or NoSuchBucket error.
6 changes: 3 additions & 3 deletions src/maggma/stores/gridfs.py
@@ -38,7 +38,7 @@

class GridFSStore(Store):
"""
-A Store for GrdiFS backend. Provides a common access method consistent with other stores
+A Store for GridFS backend. Provides a common access method consistent with other stores
"""

def __init__(
@@ -58,7 +58,7 @@ def __init__(
**kwargs,
):
"""
-Initializes a GrdiFS Store for binary data
+Initializes a GridFS Store for binary data
Args:
database: database name
collection_name: The name of the collection.
@@ -447,7 +447,7 @@ def __init__(
**kwargs,
):
"""
-Initializes a GrdiFS Store for binary data
+Initializes a GridFS Store for binary data
Args:
uri: MongoDB+SRV URI
database: database to connect to
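A usage sketch for the store initialized above, where the connection values are placeholder assumptions rather than anything from this commit:

```python
# Hypothetical usage sketch for GridFSStore; the host/port/database values
# are placeholders, not from this commit.
from maggma.stores.gridfs import GridFSStore

blobs = GridFSStore(database="my_db", collection_name="binary_blobs", host="localhost", port=27017)
blobs.connect()
docs = list(blobs.query(criteria={}))
```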
