This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Commit

GitBook: [#69] No subject
skrawcz authored and gitbook-bot committed Feb 21, 2022
1 parent 0a92494 commit 309486c
Showing 5 changed files with 80 additions and 8 deletions.
Binary file not shown.
31 changes: 30 additions & 1 deletion best-practices/code-organization.md
@@ -1,6 +1,35 @@
---
description: Guidebook coming! We appreciate contributions, as always...
description: Hamilton will force you to organize your code! Here are some tips
---

# Code Organization

Hamilton forces you to put your code into modules that are distinct from where you run your code. 

You'll soon find that a single Python module does not make sense, so you'll very likely start putting like functions with like functions, creating domain-specific modules. _Use this to your development advantage!_

At Stitch Fix we:

1. Use modules to model team thinking, e.g. date\_features.py.
2. Use modules to help isolate what you’re working on.
3. Use modules to replace parts of your Hamilton dataflow very easily for different contexts.

## Team thinking

You'll need to curate your modules. We suggest orienting this around how teams think about the business. 

For example, marketing spend features should live in the same module, or in separate modules within the same directory/package.

This will then make it easy for people to browse the code base and discover what is available. 

## Helps isolate what you're working on

Grouping functions into modules then helps set the tone for what you're working on. It helps set the "namespace", if you will, for that function. Thus you can have the same function name used in multiple modules, as long as only one of those modules is imported to build the DAG.

Modules therefore help you create boundaries in your code base, isolating the functions whose inputs you'll want to change.

## Enables you to replace parts of your DAG easily for different contexts

The names you provide as inputs to functions form a defined "interface", to borrow a computer science term. So if you want to swap, change, or augment an input, having a function that maps to it defined in one or more other modules provides a lot of flexibility. Rather than keeping all functions in a single module, separating them into different modules could be a productivity win.

Why? Because when you tell Hamilton which functions constitute your dataflow (i.e. DAG), you can simply replace, add, or change the modules being passed. So if you want to compute inputs for certain functions differently, this composability of including or excluding modules when building the DAG provides a lot of flexibility that you can exploit to make your development cycle faster.
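
The module-swapping idea can be illustrated with a toy, dependency-free sketch. It is not Hamilton's implementation or API: `make_module`, `compute`, and every module and function name below are hypothetical, and the in-memory modules stand in for separate `.py` files. The sketch only mimics the idea that a function's parameter names are matched to function names, and that duplicate names are allowed only when a single defining module is passed in.

```python
import inspect
import types


def make_module(name, *functions):
    """Build an in-memory module (a stand-in for a separate .py file)."""
    module = types.ModuleType(name)
    for func in functions:
        setattr(module, func.__name__, func)
    return module


def compute(output, *modules):
    """Toy resolver: match each parameter name to a function of the same name."""
    functions = {}
    for module in modules:
        for name, func in inspect.getmembers(module, inspect.isfunction):
            if name in functions:
                raise ValueError(f"{name} is defined in more than one module")
            functions[name] = func
    cache = {}

    def resolve(name):
        if name not in cache:
            func = functions[name]
            kwargs = {p: resolve(p) for p in inspect.signature(func).parameters}
            cache[name] = func(**kwargs)
        return cache[name]

    return resolve(output)


def _shared_features():
    def spend_per_signup(spend: float, signups: int) -> float:
        """Marketing spend per signup."""
        return spend / signups
    return make_module("shared_features", spend_per_signup)


def _production_inputs():
    def spend() -> float:
        return 1000.0

    def signups() -> int:
        return 50
    return make_module("production_inputs", spend, signups)


def _test_inputs():
    # Same function names as production_inputs, different values.
    def spend() -> float:
        return 10.0

    def signups() -> int:
        return 4
    return make_module("test_inputs", spend, signups)


shared_features = _shared_features()
production_inputs = _production_inputs()
test_inputs = _test_inputs()

# Swap the input module to change context; spend_per_signup is untouched.
print(compute("spend_per_signup", shared_features, production_inputs))  # 20.0
print(compute("spend_per_signup", shared_features, test_inputs))        # 2.5
```

Passing both `production_inputs` and `test_inputs` at once makes this toy resolver raise a `ValueError`, which mirrors the rule that only one module defining a given function name should be used to build the DAG.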
41 changes: 40 additions & 1 deletion best-practices/function-naming.md
@@ -1,6 +1,45 @@
---
description: Guidebook coming! We appreciate contributions, as always...
description: Function Naming is something to focus on
---

# Function Naming

Here are three important points about function naming:

1. It enables you to define your Hamilton dataflow.
2. It drives collaboration & code reuse.
3. It serves as documentation itself. 

You don't need to get this right the first time -- search and replace is really easy with Hamilton code bases -- but it is something to converge thinking on!

## It enables you to define your Hamilton dataflow

The core of Hamilton is really in how you name your functions.

Consider a function named like this:

```
import pandas as pd

def foo_bar(input1: int, input2: pd.Series) -> pd.Series:
    """docs..."""
    ...
```

`foo_bar` is not a helpful name: it's unclear what this function produces. You want function names to mean something, because a meaningful name makes clear what is being requested when using Hamilton and helps document what the function itself does.
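
For contrast with `foo_bar`, here is a minimal sketch of a more descriptive name. The function, its parameters, and the calculation are hypothetical and chosen only for illustration; plain floats are used instead of `pd.Series` to keep the sketch dependency-free.

```python
def marketing_spend_per_signup(marketing_spend: float, signups: int) -> float:
    """Marketing spend divided by the number of signups for the same period.

    The parameter names double as the names of the functions (or inputs)
    that would supply these values.
    """
    return marketing_spend / signups


print(marketing_spend_per_signup(marketing_spend=100.0, signups=4))  # 25.0
```

Here the name says what is produced, and the parameter names say what it is produced from.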

## It drives collaboration and reuse

When people encounter your code, they'll need to understand it, add to it, modify it, etc.

You'll want to ensure some standardization to enable:

1. Mapping business concepts to function names. This helps people find the code that corresponds to things that happen within your business.
2. Ensuring naming uniformity across the code base. People usually follow the precedent of the code around them, so if everything in a particular module for, say, date features has a `D_` prefix, then they will likely follow that naming convention. This is likely something you will iterate on -- and it's best to try to converge on a team naming convention once you have a feel for the Hamilton functions being written by the team.

Long function names separated by `_` aren't a bad thing. E.g. if you were to come across a function named `life_time_value` versus `ltv` versus `l_t_v`, which one is more obvious as to what it is and what it represents?

## It serves as documentation itself 

Remember your code usually lives a lot longer than you ever think it will. So our suggestion is to always err toward the more obvious way of naming to ensure it's clear what a function represents.

Again, if you were to come across a function named `life_time_value` versus `ltv` versus `l_t_v`, which one is more obvious as to what it is and what it represents?

1 change: 1 addition & 0 deletions best-practices/migrating-to-hamilton.md
@@ -10,6 +10,7 @@ Create a way to easily & frequently compare results.

1. Integrate with a continuous integration (CI) system if you can.
2. 🔎🐛 Having a means of testing code early & often will help diagnose bugs in your old code (most likely) or your new implementation (less likely).
3. Specifically, have a system that compares the output of your Hamilton code with the output of your existing system.

![Example CI process that we used at Stitch Fix for migrating to Hamilton](<../.gitbook/assets/Hamilton ApplyMeetup 2022 - migration CI (1).svg>)
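
As a minimal sketch of such a comparison step (not the process used at Stitch Fix), the check below assumes both systems' outputs have been loaded as `{column name: list of floats}` mappings. The `find_mismatches` helper and the sample data are hypothetical.

```python
import math


def find_mismatches(old, new, rel_tol=1e-9):
    """Return readable differences between two {column: values} mappings."""
    problems = []
    for column in sorted(set(old) | set(new)):
        if column not in new:
            problems.append(f"{column}: missing from new output")
        elif column not in old:
            problems.append(f"{column}: not produced by old system")
        elif len(old[column]) != len(new[column]):
            problems.append(
                f"{column}: length {len(old[column])} != {len(new[column])}"
            )
        else:
            for row, (a, b) in enumerate(zip(old[column], new[column])):
                # Compare with a tolerance so float rounding is not flagged.
                if not math.isclose(a, b, rel_tol=rel_tol):
                    problems.append(f"{column}[{row}]: {a} != {b}")
    return problems


old_output = {"spend_per_signup": [0.3, 2.5], "signups": [50.0, 4.0]}
new_output = {"spend_per_signup": [0.1 + 0.2, 2.5], "signups": [50.0, 5.0]}

for problem in find_mismatches(old_output, new_output):
    print(problem)  # signups[1]: 4.0 != 5.0
```

A CI job could fail whenever the returned list is non-empty. Note that `math.isclose` treats NaN as unequal to everything, so missing values would need explicit handling.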

15 changes: 9 additions & 6 deletions talks-or-podcasts-or-blogs.md
@@ -4,10 +4,13 @@ description: This page curates talks, podcasts, and blogs about Hamilton

# Talks | Podcasts | Blogs

| Date | Title | Info/Links |
| ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| February 2022 | \[Open Source] Hamilton, a micro framework for creating dataframes, and its application at Stitch Fix @ [Apply(meetup)](https://www.applyconf.com/agenda/open-source-hamilton-a-micro-framework-for-creating-dataframes-and-its-application-at-stitch-fix/) | 30 minute talk about Hamilton, it's origin, tips on using it, and some exciting extensions & plans. [Youtube](https://www.youtube.com/watch?v=GXqK6HlYG6M\&t=6548s) (starts 1:49 mins in). |
| October 2022 | Functions & DAGs: introducing Hamilton, a microframework for dataframe generation | Blog post on Hamilton origins. [https://multithreaded.stitchfix.com/blog/2021/10/14/functions-dags-hamilton/](https://multithreaded.stitchfix.com/blog/2021/10/14/functions-dags-hamilton/) |
| April 2021 | Hamilton: a Micro Framework for Creating Dataframes @ applyconf | 10 minute lightning talk on Hamilton [https://www.youtube.com/watch?v=B5Zp\_30Knoo](https://www.youtube.com/watch?v=B5Zp\_30Knoo) |
| | | |
| Date | Title | Info/Links |
| ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| February 2022 | \[Open Source] Hamilton, a micro framework for creating dataframes, and its application at Stitch Fix @ [Apply(meetup)](https://www.applyconf.com/agenda/open-source-hamilton-a-micro-framework-for-creating-dataframes-and-its-application-at-stitch-fix/) | 30-minute talk about Hamilton, its origin, tips on using it, and some exciting extensions & plans. [YouTube](https://www.youtube.com/watch?v=GXqK6HlYG6M\&t=6548s) (starts at 1:49:08). For slides, see below.    |
| October 2021  | Functions & DAGs: introducing Hamilton, a microframework for dataframe generation                                                                                                                                                                           | Blog post on Hamilton origins. [https://multithreaded.stitchfix.com/blog/2021/10/14/functions-dags-hamilton/](https://multithreaded.stitchfix.com/blog/2021/10/14/functions-dags-hamilton/)                       |
| April 2021 | Hamilton: a Micro Framework for Creating Dataframes @ applyconf | 10 minute lightning talk on Hamilton [https://www.youtube.com/watch?v=B5Zp\_30Knoo](https://www.youtube.com/watch?v=B5Zp\_30Knoo) |
| | | |

## Slides

{% file src=".gitbook/assets/Public ApplyConf2022 - [Open Source] Hamilton, a micro framework for creating dataframes, and its application at Stitch Fix.pdf" %}
