Git workflows and tips

Sam Maurer edited this page Sep 7, 2018 · 13 revisions

Started by Sam Maurer, Sep 2018. Additions welcome.

Intro and setup

Git is a version control system for collaborative coding projects. It's like Dropbox, but with every sync and merge done manually.

"Git" is a command-line app that runs on your computer. "Github" is a website that hosts Git-based projects.

You'll use Git on the command line to:

  • download ("clone") a project hosted on Github
  • select (or "check out") the code branch you want to work with
  • save ("commit") additions and changes to the tracked files
  • upload ("push") your changes to Github
  • download ("pull") remote changes to your computer

You'll use the Github web interface to:

  • create new code branches
  • initiate a merge ("pull request") to bring changes from a working branch into the master branch
  • document and execute the merge
  • run track-changes ("diffs") to compare versions of the code
  • provide feedback, track issues, edit the wiki
  • run automated tests, create versioned releases

Getting started

Git comes pre-installed on most Mac and Linux systems (use the Terminal app or equivalent to access the command line). Run "git --version" to check. There are various GUI apps, which are sometimes useful but not necessary. (On Mac there's a free one made by Github and a paid one called Git Tower.)

To download a project, first use the command line to navigate to the directory where you want to create a folder for the project. On a Mac, if you drag an icon into the Terminal window it will paste the associated path.

pwd  # display the current directory
cd /absolute/or/relative/path  # change directory (use .. to move up a level)

Then, "clone" a project hosted on Github. This downloads a copy to your computer.

git clone https://github.com/ual/urbansim_parcel_bayarea.git
cd urbansim_parcel_bayarea

(The command line interface is called a shell.)

If Git repeatedly asks for your Github.com login credentials, you can register them permanently through Git's credential helper (the "credential.helper" setting in "git config").
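A minimal sketch of one way to stop the repeated prompts, assuming the built-in "cache" helper is acceptable for your setup. The one-hour timeout is an arbitrary choice, and on a Mac the "osxkeychain" helper is a common alternative that stores credentials in the system keychain.

```shell
# Keep credentials in memory for one hour (3600 seconds is an arbitrary choice)
git config --global credential.helper 'cache --timeout=3600'

# Show the helper that is now configured
git config --global credential.helper
```

This changes your global Git settings, so it applies to every repository on the machine.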

General workflow

This is a workflow that we've found works well for multi-person projects: Before starting a work session, create a new branch. Periodically save your work to this branch. When a set of work is finished, use the web interface to create a "pull request" (PR) to merge your branch into the master codebase. Use the PR message field to explain and document the material you're adding.

Using pull requests will help avoid and resolve code conflicts, will communicate to others on the team what changes you've made, and will help create a clear record of our work. You can either merge a pull request yourself, or ask a colleague to review it first.

Here are the steps in more detail:

  1. In the web interface, create a new "branch". Give it a short name without spaces. This should correspond to a small or medium-sized task, perhaps a day or two of work.

  2. In the terminal, download ("pull") the remote changes and then activate ("check out") the branch.

    git pull
    git branch  # display the active branch
    git checkout branch-name  # activate a different branch
    

    The visible files on your computer will correspond to the branch you've checked out in Git. The other versions are still there, just hidden.

  3. Add and edit files however you like.

  4. Periodically, "commit" your changes to Git's tracking system. New files have to be added before running the "commit" command. Files that are already being tracked can be included automatically.

    git status  # display list of new and edited files
    git add /path/to/a/new/file
    git commit -a -m "One-line description"  # a = all, m = message
    
  5. "Push" your new commits to Github to make them visible in the web interface.

    git push
    

    Repeat steps 3-5 as you continue working.

  6. When you're ready to merge your work into the "master" codebase, go to the web interface, select your branch, and create a "pull request" (PR).

    Use the message field to explain and document your work. You can either merge the pull request yourself, or ask a colleague to review it first.

  7. Cleanup. After merging the PR, the web interface will give you the option to delete your working branch. This is usually a good idea. Back in the command line, switch back to the master branch and download the remote changes.

    git checkout master
    git pull
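
The steps above can be sketched end to end on the command line. This is an illustration rather than our exact process: it creates the branch locally with "git checkout -b" instead of through the web interface, and it uses a throwaway local repository as a stand-in for Github so it can be run safely. The branch name "fix-typos", the file names, and the identity are made up.

```shell
# Stand-in "remote" and working copy in a temporary directory
tmp="$(mktemp -d)"
git init --quiet --bare "$tmp/remote.git"
git init --quiet "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master     # name the first branch "master"
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
git remote add origin "$tmp/remote.git"
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"
git push --quiet -u origin master

# Steps 1-2: create and check out a working branch
git checkout --quiet -b fix-typos

# Steps 3-4: edit a tracked file, add a new one, and commit
echo "second draft" >> notes.txt
echo "todo" > todo.txt
git add todo.txt
git commit --quiet -a -m "Revise notes and add todo list"

# Step 5: push the branch (-u links it to the remote branch of the same name)
git push --quiet -u origin fix-typos

# Step 7: switch back to master when the work is done
git checkout --quiet master
git branch    # lists fix-typos and master, with * next to master
```

Step 6 (the pull request) still happens in the web interface, so it has no equivalent here.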
    

Tips and troubleshooting

Commit messages

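If you forget to include a commit message, Git will drop you into a command line text editor to provide one. Often this is Vim, which has an inscrutable interface. To get out without saving, press Escape and type :q! <return>; Git will abort the commit because the message is empty. To save a message you've typed, use :wq <return> instead.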

If you need to edit the last commit message, ideally before pushing any changes to the web, you can use:

git commit --amend -m "new message"
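
To try the amend command without touching a real project, here's a small sketch in a throwaway repository. The file name, commit messages, and identity are invented for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notse"          # oops, typo in the message

git commit --quiet --amend -m "Add notes"  # rewrite the last commit's message
git log --oneline                          # shows a single commit, "Add notes"
```

Amending replaces the last commit rather than adding a new one, which is why it's best done before pushing.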

Untracked files

It's fine to leave some files untracked, like runtime cache files. To prevent them from coming up every time you run "git status", you can create a text file named ".gitignore" and list them in it.
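
As a sketch of what that looks like, the following sets up a throwaway repository with a ".gitignore" file. The patterns ("__pycache__/" and "*.log") are just typical examples, not a list taken from our projects.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Files we don't want Git to track
mkdir __pycache__
touch __pycache__/model.pyc run.log

# One pattern per line; a trailing slash matches a directory
printf '%s\n' '__pycache__/' '*.log' > .gitignore

git status --short    # only .gitignore shows up as untracked
```

The ".gitignore" file itself is usually committed so that everyone on the project shares the same list.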

Moving, renaming, or deleting tracked files

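If you move, rename, or delete tracked files through the filesystem, Git sees missing files and new untracked ones rather than a single change, which makes the next commit fiddly. If you do it with Git commands instead, the change is staged for you: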

git mv old-path new-path  # move or rename a tracked file
git rm file-path  # delete a tracked file
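
Here's a sketch of both commands in a throwaway repository, to show that the changes are staged and ready to commit. File names and the identity are invented for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "data" > old-name.txt
echo "scratch" > unwanted.txt
git add old-name.txt unwanted.txt
git commit --quiet -m "Add two files"

git mv old-name.txt new-name.txt   # rename, and stage the rename
git rm --quiet unwanted.txt        # delete, and stage the deletion
git status --short                 # both changes are listed as staged
git commit --quiet -m "Rename one file and delete another"
```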

Merging branches on the command line

Some people prefer to maintain a permanent personal working branch instead of creating and deleting branches for each task. If you do this, you'll need to manually merge any changes in the master codebase back into your personal branch to keep it up to date:

git checkout master
git pull
git checkout sam-branch
git pull
git merge master  # merges changes from master into sam-branch
git push

Pruning local branches after merging a PR

Deleting a branch on Github will not delete it from your local copy of the repository. If this gets annoying, you can "prune" your local copy.

This takes two steps. First, Git can automatically remove references to remote Github branches that don't exist any more. This doesn't remove the copies that you've worked on locally, though, so the second step is to delete those manually.

git remote prune origin  # prune references to remote branches
git branch  # list local branches
git branch -d branch-name  # delete a local branch
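
To find out which local branches are safe to delete, "git branch --merged" lists the branches whose commits are already contained in a given branch. A sketch in a throwaway repository, with made-up branch and file names:

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"

# A finished task branch that gets merged into master
git checkout --quiet -b old-task
echo "v2" >> notes.txt
git commit --quiet -a -m "Finish old task"
git checkout --quiet master
git merge --quiet old-task

git branch --merged master    # lists master and old-task
git branch -d old-task        # succeeds because old-task is fully merged
```

The lowercase "-d" refuses to delete a branch that hasn't been merged, which is a useful safety net.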

Understanding "remotes"

Your local copy of a Git repository uses "remotes" to synchronize with other copies. For example, when you clone a repo from Github, its URL is saved as a remote named "origin".

git remote -v  # list remotes

You can edit this if the Github URL changes, or add additional remotes if you want to do things like merge changes from one fork of a repository into another.

git remote set-url origin https://github.com/acct/repo.git

# add a remote named "upstream"
git remote add upstream https://github.com/acct/repo.git

Resolving merge conflicts

If you initiate a merge on the command line and there are conflicts that Git can't resolve, it will automatically mark up the affected files with plain-text banners showing which lines it's unable to reconcile. Open these files and resolve the conflicts (removing the banners and whatever code you don't want to keep), and then commit the changes. This will complete the merge.

There's a similar process with pull requests initiated in the web interface. Github will warn you if there are conflicts, and prompt you to resolve them in an online text editor.

If you want to back out of a command-line merge that involves unexpected conflicts, use this:

git merge --abort
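
To see what those banners look like, here's a sketch that deliberately creates a conflict in a throwaway repository and then backs out of the merge. Branch names, file names, and contents are invented for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "rate = 1" > settings.txt
git add settings.txt
git commit --quiet -m "Initial commit"

# Change the same line differently on two branches
git checkout --quiet -b experiment
echo "rate = 2" > settings.txt
git commit --quiet -a -m "Try rate 2"
git checkout --quiet master
echo "rate = 3" > settings.txt
git commit --quiet -a -m "Try rate 3"

# The merge stops with a conflict (non-zero exit status)
git merge experiment || echo "merge stopped: conflict in settings.txt"

cat settings.txt     # shows both versions between <<<<<<<, =======, >>>>>>> banners
git merge --abort    # back out; settings.txt returns to master's version
```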

Rolling back mistakes

There are many different ways to roll back mistakes, but it's often complicated. One of the advantages of not committing directly to the master branch is that you can always copy files outside the project directory, create a new branch, and re-implement the changes you want to keep. But here are some other options:

# Set aside local changes that haven't been committed yet
git stash  # recover them later with "git stash pop", or discard them with "git stash drop"
git pull

# Unstage files from pending commit, without losing changes
git reset HEAD  # ("git reset --soft HEAD" would change nothing)

# Discard changes to just one file
git checkout <good-commit-hash> -- <file>  # (leave out the angle brackets)
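
A sketch of how a few of these behave, run in a throwaway repository so that nothing real is at risk. It uses "git reset HEAD" to unstage, "git stash" followed by "git stash drop" to discard an edit, and "git checkout HEAD -- <file>" to restore a file from the last commit. File names, contents, and the identity are invented for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "good" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"

# Stage an edit, then unstage it; the edit itself stays in the file
echo "experiment" >> notes.txt
git add notes.txt
git reset --quiet HEAD
git status --short            # " M notes.txt": modified but no longer staged

# Shelve the edit; the working copy goes back to the last commit
git stash
cat notes.txt                 # prints "good"
git stash drop                # throw the shelved edit away for good

# Restore one file from a known-good commit (here, the last one)
echo "mistake" > notes.txt
git checkout HEAD -- notes.txt
```

Until "git stash drop" is run, the shelved edit can still be brought back with "git stash pop".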

Jupyter notebooks in Git

Git manages Jupyter notebooks as raw JSON, which makes diffs hard to interpret and merge conflicts hard to resolve. If there are conflicts, it's generally easiest to just choose one version of the file or the other, for example with "git checkout --ours <file>" or "git checkout --theirs <file>", followed by "git add <file>".
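
A sketch of choosing one side of a conflicted notebook, in a throwaway repository. The file "analysis.ipynb" is a stand-in with simplified placeholder contents rather than a real notebook, and the branch names and identity are made up.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo '{"cells": []}' > analysis.ipynb
git add analysis.ipynb
git commit --quiet -m "Add notebook"

# Two branches change the notebook in incompatible ways
git checkout --quiet -b colleague
echo '{"cells": ["their version"]}' > analysis.ipynb
git commit --quiet -a -m "Colleague's edits"
git checkout --quiet master
echo '{"cells": ["my version"]}' > analysis.ipynb
git commit --quiet -a -m "My edits"

git merge colleague || echo "merge stopped: conflict in analysis.ipynb"

# Keep the incoming branch's copy (use --ours to keep the current branch's copy)
git checkout --theirs -- analysis.ipynb
git add analysis.ipynb
git commit --quiet -m "Merge colleague, keeping their notebook"
```

Whichever side you drop is still available in that branch's history if you need to copy cells over by hand later.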