Move starter project into dbt repo #3474

Merged
merged 10 commits into from
Jun 22, 2021
Changes from 6 commits
2 changes: 1 addition & 1 deletion core/MANIFEST.in
@@ -1 +1 @@
-recursive-include dbt/include *.py *.sql *.yml *.html *.md
+recursive-include dbt/include *.py *.sql *.yml *.html *.md .gitkeep
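The manifest change adds `.gitkeep` to the list of included filename patterns. A rough way to see what that changes is to test filenames against the old and new pattern lists. The sketch below uses Python's `fnmatch` as an approximation of how such patterns match basenames; it is not setuptools' actual matching code, and the helper name `is_included` is made up for illustration.

```python
from fnmatch import fnmatchcase

# Pattern lists copied from the two versions of core/MANIFEST.in.
OLD_PATTERNS = ["*.py", "*.sql", "*.yml", "*.html", "*.md"]
NEW_PATTERNS = OLD_PATTERNS + [".gitkeep"]


def is_included(filename: str, patterns: list) -> bool:
    """Return True if the basename matches any pattern (hypothetical helper)."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# .gitkeep is only picked up once it is listed explicitly.
print(is_included(".gitkeep", OLD_PATTERNS))
print(is_included(".gitkeep", NEW_PATTERNS))
```

Presumably the `.gitkeep` files are what keep the otherwise empty starter directories (shown as "Empty file." in this diff) present in the packaged distribution, though the PR does not state this outright.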
15 changes: 15 additions & 0 deletions core/dbt/include/starter_project/README.md
@@ -0,0 +1,15 @@
Welcome to your new dbt project!

### Using the starter project

Try running the following commands:
- dbt run
- dbt test


### Resources:
- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
- Join the [chat](http://slack.getdbt.com/) on Slack for live discussions and support
leahwicz marked this conversation as resolved.
- Find [dbt events](https://events.getdbt.com) near you
- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices
3 changes: 3 additions & 0 deletions core/dbt/include/starter_project/__init__.py
@@ -0,0 +1,3 @@
import os

PACKAGE_PATH = os.path.dirname(__file__)
Empty file.
Empty file.
38 changes: 38 additions & 0 deletions core/dbt/include/starter_project/dbt_project.yml
@@ -0,0 +1,38 @@

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'my_new_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'default'
Comment on lines +5 to +10
Contributor

I would love if we could figure out a way to automatically populate name and profile with the project_name argument supplied to dbt init. Right now, that project_name arg is just used to name the file directory, but not the actual project name. Pretty confusing! Also, we tend to discourage using a profile named default, and yet here we are...

I know all we're really doing here is shutil.copytree. Is there any sane way to try editing the files after copying?

For reference, I was just triaging a related issue, so I mentioned this over there as well: #3462 (comment)

Contributor

We have a lot of expertise with jinja templates :). Perhaps we could make a dbt_project jinja template and substitute the project name. Or just do a python substitution if it's simpler. I guess jinja would only make sense if there are other ways we could leverage it to customize the project file.

Contributor

Heh, you're not wrong. Back in the day, we did a lot of customization for client projects over here: https://github.com/fishtown-analytics/dbt-init

The purpose there was cross-applying some pretty opinionated configs for a given adapter. I think, for our purposes, we shouldn't go much further beyond name and profile.

Contributor Author

I'll take a crack at doing this in another PR to follow this one up. So we want project_name to be the name and profile field in the dbt_project.yml file (just to verify)?

Contributor

Yes, that's right!
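
The thread above settles on plain substitution of `name` and `profile` after the copy. A minimal sketch of that follow-up, assuming the starter file keeps the placeholder `my_new_project` and a single top-level `profile:` line. The helper name `set_project_name` is hypothetical and is not part of this PR.

```python
import re
from pathlib import Path

# Placeholder used for `name` and the `models:` key in the starter dbt_project.yml.
PLACEHOLDER = "my_new_project"


def set_project_name(project_dir: str, project_name: str) -> str:
    """Rewrite name and profile in an already-copied dbt_project.yml (hypothetical helper)."""
    project_file = Path(project_dir) / "dbt_project.yml"
    text = project_file.read_text()
    # Replace the placeholder everywhere, so the key under `models:` stays in sync with `name`.
    text = text.replace(PLACEHOLDER, project_name)
    # Point the profile at the project name instead of 'default'.
    text = re.sub(
        r"^profile:.*$",
        f"profile: '{project_name}'",
        text,
        flags=re.MULTILINE,
    )
    project_file.write_text(text)
    return text
```

Plain string replacement keeps the comments in the file intact, which a parse-and-dump round trip through a YAML library generally would not. A Jinja template, as suggested in the thread, would be the alternative if more fields ever needed customizing.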


# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
  - "target"
  - "dbt_modules"
leahwicz marked this conversation as resolved.


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/ directory
# as tables. These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
  my_new_project:
    # Applies to all files under models/example/
    example:
      materialized: view
leahwicz marked this conversation as resolved.
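
The comments in this file say that project-level model settings can be overridden in an individual model file with `{{ config(...) }}`. As a deliberately simplified illustration of that precedence (not dbt's actual resolution logic, and with a made-up function name), a model-level setting can be thought of as winning over the project-level one:

```python
def resolve_config(project_config: dict, model_config: dict) -> dict:
    """Merge settings so that model-level values win (simplified, hypothetical)."""
    return {**project_config, **model_config}


# Project-level default taken from the starter dbt_project.yml.
project_level = {"materialized": "view"}

# A model with no config() call inherits the project default.
print(resolve_config(project_level, {}))

# A model that calls config(materialized='table') overrides it.
print(resolve_config(project_level, {"materialized": "table"}))
```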
Empty file.
21 changes: 21 additions & 0 deletions core/dbt/include/starter_project/models/example/schema.yml
@@ -0,0 +1,21 @@

leahwicz marked this conversation as resolved.
version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null
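
For readers new to dbt, the `unique` and `not_null` tests declared on `id` can be pictured as checks that pass when they find no offending rows. The sketch below mimics that idea on a plain Python list. It is an analogy only: dbt runs these checks as SQL against the warehouse, the function names are made up, and ignoring nulls in the uniqueness check is an assumption about the built-in test's behaviour.

```python
from collections import Counter


def not_null_failures(values: list) -> list:
    """Return the positions of null values; an empty list means the check passes."""
    return [index for index, value in enumerate(values) if value is None]


def unique_failures(values: list) -> list:
    """Return values that occur more than once, ignoring nulls (assumed behaviour)."""
    counts = Counter(value for value in values if value is not None)
    return sorted(value for value, count in counts.items() if count > 1)


ids = [1, 2, 2, None]
print(not_null_failures(ids))
print(unique_failures(ids))
```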
Empty file.
Empty file.
22 changes: 10 additions & 12 deletions core/dbt/task/init.py
@@ -2,19 +2,21 @@
 import shutil

 import dbt.config
-import dbt.clients.git
 import dbt.clients.system
 from dbt.adapters.factory import load_plugin, get_include_paths
 from dbt.exceptions import RuntimeException

 from dbt.logger import GLOBAL_LOGGER as logger

+from dbt.include.starter_project import PACKAGE_PATH as starter_project_directory

 from dbt.task.base import BaseTask

-STARTER_REPO = 'https://github.com/fishtown-analytics/dbt-starter-project.git'
-STARTER_BRANCH = 'dbt-yml-config-version-2'
 DOCS_URL = 'https://docs.getdbt.com/docs/configure-your-profile'

+# This file is not needed for the starter project but exists for finding the resource path
+IGNORE_FILE = "__init__.py"

ON_COMPLETE_MESSAGE = """
Your new dbt project "{project_name}" was created! If this is your first time
using dbt, you'll need to set up your profiles.yml file (we've created a sample
@@ -36,14 +38,10 @@


 class InitTask(BaseTask):
-    def clone_starter_repo(self, project_name):
-        dbt.clients.git.clone(
-            STARTER_REPO,
-            cwd='.',
-            dirname=project_name,
-            remove_git_dir=True,
-            revision=STARTER_BRANCH,
-        )
+    def copy_starter_repo(self, project_name):
+        logger.debug("Starter project path: " + starter_project_directory)
+        shutil.copytree(starter_project_directory, project_name,
+                        ignore=shutil.ignore_patterns(IGNORE_FILE))

     def create_profiles_dir(self, profiles_dir):
         if not os.path.exists(profiles_dir):
@@ -98,7 +96,7 @@ def run(self):
                 project_dir
             ))

-        self.clone_starter_repo(project_dir)
+        self.copy_starter_repo(project_dir)

         addendum = self.get_addendum(project_dir, profiles_dir, sample_adapter)
         logger.info(addendum)
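
The core of this change is replacing a git clone with a local `shutil.copytree` that skips `__init__.py`. A self-contained sketch of that copy step, using a throwaway directory in place of the real `dbt/include/starter_project` package. The names `copy_starter` and `demo` and the file layout are illustrative assumptions, not code from the PR.

```python
import shutil
import tempfile
from pathlib import Path


def copy_starter(starter_dir: str, project_dir: str) -> None:
    """Copy a starter directory, skipping the package marker file."""
    shutil.copytree(
        starter_dir,
        project_dir,
        ignore=shutil.ignore_patterns("__init__.py"),
    )


def demo() -> list:
    """Build a tiny stand-in starter project, copy it, and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        starter = Path(tmp) / "starter_project"
        (starter / "models" / "example").mkdir(parents=True)
        (starter / "__init__.py").write_text("")
        (starter / "dbt_project.yml").write_text("name: 'my_new_project'\n")
        (starter / "models" / "example" / "schema.yml").write_text("version: 2\n")

        target = Path(tmp) / "my_project"
        copy_starter(str(starter), str(target))
        # Relative paths of everything that was copied, in sorted order.
        return sorted(p.relative_to(target).as_posix() for p in target.rglob("*"))


print(demo())
```

Two details worth keeping in mind: `ignore_patterns` is applied at every directory level, so any nested `__init__.py` would be skipped as well, and `copytree` with default arguments raises `FileExistsError` when the destination already exists.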
12 changes: 1 addition & 11 deletions core/setup.py
@@ -38,17 +38,7 @@ def read(fname):
     author_email="info@fishtownanalytics.com",
     url="https://github.com/fishtown-analytics/dbt",
     packages=find_namespace_packages(include=['dbt', 'dbt.*']),
-    package_data={
-        'dbt': [
-            'include/index.html',
-            'include/global_project/dbt_project.yml',
-            'include/global_project/docs/*.md',
-            'include/global_project/macros/*.sql',
-            'include/global_project/macros/**/*.sql',
-            'include/global_project/macros/**/**/*.sql',
-            'py.typed',
-        ]
-    },
+    include_package_data = True,
Contributor Author

A couple articles I read said to use include_package_data instead of package_data with the MANIFEST.in file so that's why I made this change. You don't have to list the files then.

Example:
https://newbedev.com/how-include-static-files-to-setuptools-python-package

Contributor

I really like it! We did the same in dbt-spark a few months ago: dbt-labs/dbt-spark#151. To quote a wise person, "This is much easier."

Contributor

🙏

     test_suite='test',
     entry_points={
         'console_scripts': [