This repository has been archived by the owner on Jul 3, 2023. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 0a92494, commit 309486c
Showing 5 changed files with 80 additions and 8 deletions.
Binary file added
BIN
+3.87 MB
...amilton, a micro framework for creating dataframes, and its application at Stitch Fix.pdf
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,35 @@ | ||
--- | ||
description: Guidebook coming! We appreciate contributions, as always... | ||
description: Hamilton will force you to organize your code! Here are some tips. | ||
--- | ||
|
||
# Code Organization | ||
|
||
Hamilton forces you to put your code into modules that are distinct from where you run your code.  | ||
|
||
You'll soon find that a single Python module does not make sense, so you'll very likely start grouping like functions with like functions, creating domain-specific modules. _Use this to your development advantage!_ | ||
|
||
At Stitch Fix we: | ||
|
||
1. Use modules to model team thinking, e.g. date\_features.py. | ||
2. Use modules to help isolate what you’re working on. | ||
3. Use modules to replace parts of your Hamilton dataflow very easily for different contexts. | ||
|
||
## Team thinking | ||
|
||
You'll need to curate your modules. We suggest orienting this around how teams think about the business.  | ||
|
||
E.g. marketing spend features should be in the same module, or in separate modules but in the same directory/package. | ||
|
||
This will then make it easy for people to browse the code base and discover what is available.  | ||
|
||
## Helps isolate what you're working on | ||
|
||
Grouping functions into modules then helps set the tone for what you're working on. It helps set the "namespace", if you will, for that function. Thus you can have the same function name used in multiple modules, as long as only one of those modules is imported to build the DAG. | ||
|
||
Modules thus help you create boundaries in your code base, isolating the functions whose inputs you'll want to change. | ||
|
||
## Enables you to replace parts of your DAG easily for different contexts | ||
|
||
The names you provide as inputs to functions form a defined "interface", to borrow a computer science term. So if you want to swap, change, or augment an input, defining the function that maps to it in another module (or modules) provides a lot of flexibility. Separating functions into different modules, rather than defining them all in a single module, can be a productivity win. | ||
|
||
Why? Because when you tell Hamilton which functions constitute your dataflow (i.e. DAG), you can simply replace, add, or change the modules being passed. So if you want to compute inputs for certain functions differently, the composability of including or excluding modules when building the DAG gives you flexibility that you can exploit to make your development cycle faster. |
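
To make the module-swapping idea concrete, here is a minimal sketch. It is a toy resolver written for illustration only, not Hamilton's implementation or API. The classes `spend_features`, `prod_inputs`, and `test_inputs` are hypothetical stand-ins for separate module files, and `build_dag` and `execute` are made-up helper names. The sketch assumes that a function's name is the output it produces and that its parameter names are the inputs it depends on.

```python
import inspect


class spend_features:  # hypothetical stand-in for spend_features.py
    @staticmethod
    def spend_per_signup(marketing_spend: float, signups: int) -> float:
        """Marketing spend divided by the number of signups."""
        return marketing_spend / signups


class prod_inputs:  # hypothetical stand-in for prod_inputs.py
    @staticmethod
    def marketing_spend() -> float:
        return 1000.0

    @staticmethod
    def signups() -> int:
        return 40


class test_inputs:  # hypothetical stand-in for test_inputs.py
    @staticmethod
    def marketing_spend() -> float:
        return 10.0

    @staticmethod
    def signups() -> int:
        return 2


def build_dag(*modules):
    """Collect functions by name; refuse the same name from two modules."""
    nodes = {}
    for module in modules:
        for name, fn in inspect.getmembers(module, inspect.isfunction):
            if name in nodes:
                raise ValueError(f"duplicate function name: {name}")
            nodes[name] = fn
    return nodes


def execute(nodes, output, inputs=None):
    """Resolve `output` by recursively computing each parameter by name."""
    inputs = inputs or {}
    if output in inputs:
        return inputs[output]
    fn = nodes[output]
    kwargs = {p: execute(nodes, p, inputs) for p in inspect.signature(fn).parameters}
    return fn(**kwargs)


# Same feature module, different input modules for different contexts.
print(execute(build_dag(spend_features, prod_inputs), "spend_per_signup"))  # 25.0
print(execute(build_dag(spend_features, test_inputs), "spend_per_signup"))  # 5.0
```

Only the list of modules passed to `build_dag` changes between the two contexts; `spend_per_signup` itself is untouched, because it depends on names rather than on a particular module.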
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,45 @@ | ||
--- | ||
description: Guidebook coming! We appreciate contributions, as always... | ||
description: Function Naming is something to focus on | ||
--- | ||
|
||
# Function Naming | ||
|
||
Here are three important points about function naming: | ||
|
||
1. It enables you to define your Hamilton dataflow. | ||
2. It drives collaboration & code reuse. | ||
3. It serves as documentation itself.  | ||
|
||
You don't need to get this right the first time -- search and replace is really easy with Hamilton code bases -- but it is something to converge thinking on! | ||
|
||
## It enables you to define your Hamilton dataflow | ||
|
||
The core of Hamilton is really in how you name your functions. | ||
|
||
Naming something like | ||
|
||
``` | ||
import pandas as pd  # needed for the pd.Series annotations | ||
def foo_bar(input1: int, input2: pd.Series) -> pd.Series: | ||
    """docs...""" | ||
    ... | ||
``` | ||
|
||
`foo_bar` is not helpful: it's unclear what this function produces at all. Remember that you want function names to mean something. A meaningful name makes it clear what is being requested when using Hamilton, and helps document what the function itself is doing. | ||
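
For contrast, here is a small sketch of a function whose name states what it produces. The names `spend_per_signup`, `marketing_spend`, and `signups` are hypothetical, chosen only for illustration, and plain numbers are used instead of `pd.Series` to keep the example dependency-free. The standard library's `inspect` module shows the two pieces of information the names carry: the output (the function name) and the inputs (the parameter names).

```python
import inspect


def spend_per_signup(marketing_spend: float, signups: int) -> float:
    """Marketing spend divided by the number of signups it produced."""
    return marketing_spend / signups


# The function name describes the output; the parameter names describe
# the inputs that output depends on.
output_name = spend_per_signup.__name__
input_names = list(inspect.signature(spend_per_signup).parameters)
print(output_name, "<-", input_names)
```

A reader who sees `spend_per_signup` requested as an output can guess what it is without opening the function body, which is not true of `foo_bar`.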
|
||
## It drives collaboration and reuse | ||
|
||
When people come to encounter your code, they'll need to understand it, add to it, modify it, etc.  | ||
|
||
You'll want to ensure some standardization to enable: | ||
|
||
1. Mapping business concepts to function names. That will help people find things in the code that correspond to things that happen within your business. | ||
2. Ensuring naming uniformity across the code base. People usually follow the precedent of the code around them, so if everything in a particular module for, say, date features has a `D_` prefix, then they will likely follow that naming convention. This is likely something you will iterate on, and it's best to try to converge on a team naming convention once you have a feel for the Hamilton functions being written by the team. | ||
|
||
We suggest that long function names separated by `_` aren't a bad thing. E.g. if you were to come across a function named `life_time_value` versus `ltv` versus `l_t_v`, which one is more obvious as to what it is and what it represents? | ||
|
||
## It serves as documentation itself  | ||
|
||
Remember that your code usually lives a lot longer than you ever think it will. So our suggestion is to always err toward the more obvious way of naming, to ensure it's clear what a function represents. | ||
|
||
Again, if you were to come across a function named `life_time_value` versus `ltv` versus `l_t_v`, which one is more obvious as to what it is and what it represents? | ||
|