Introduction to replicable research using version control

Aengus Bridgman

2019-08-30

Introduction and welcome

Agenda for the day

  • Part 1: Introduction to version control and Git
  • Part 2: Working with GitHub
  • Part 3: Collaborating using Git and GitHub
  • Part 4: Tips, tricks, and additional resources

Learning objectives

  • Functional understanding of Git and GitHub for personal use
  • The basics of collaboration using Git through GitHub

Part 1: Introduction to Version Control and Git

Learning objectives

  • Get motivated to learn about version control!
  • Learn the stages and workflow of Git
  • Learn the basic commands of Git

Version Control

The concept of version control is quite simple and we all have different systems and ways of doing it:

  • interesting paper v3b.docx
  • interesting paper_prof edits.docx
  • interesting paper_08262018.docx
  • “Send me that Dropbox link”
  • “Let me look through my email for the latest version”

Don’t let this happen to you…

Source: http://phdcomics.com/comics/archive_print.php?comicid=1531

Or this…

Source: https://www.datamation.com/news/tech-comics-version-control-1.html

Or this…

Source: McTavish and Docteur-Penfield Post

Proper version control

In general, version control systems (VCS) help you keep track of how your files change over time and, if things go south, easily recover.

  • Version control is like an unlimited undo/redo
  • Version control also allows many people to work in parallel

Note: The basic structure of this introductory lesson is from: https://git-scm.com/book/en/v2/

Git

Git is one such system: a distributed version control system.

Git emerged in 2005 out of the Linux development community. It is primarily used for code management in software development, but is also increasingly used by social scientists who collaborate or wish to publicize their data, code, and work more generally.

Git Basics

With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.

If a specific file has not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored. Git treats its data more like a stream of snapshots.
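The idea that identical content is stored only once can be seen with git hash-object, which prints the ID Git would file a piece of content under. This is a minimal sketch in a throwaway repository; the file names draft_a.txt, draft_b.txt, and draft_c.txt are made up for illustration.

```shell
#!/usr/bin/env bash
set -e

# Work in a throwaway repository so nothing on your machine is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Two files with identical content and one that differs (hypothetical names).
printf 'Chapter one\n' > draft_a.txt
printf 'Chapter one\n' > draft_b.txt
printf 'Chapter two\n' > draft_c.txt

# git hash-object prints the ID Git derives from a file's content.
hash_a="$(git hash-object draft_a.txt)"
hash_b="$(git hash-object draft_b.txt)"
hash_c="$(git hash-object draft_c.txt)"

echo "draft_a.txt: $hash_a"
echo "draft_b.txt: $hash_b"
echo "draft_c.txt: $hash_c"
```

The first two IDs match because the content is the same, so Git only needs to keep one copy. The third differs because the content differs.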

Git States

Git has four main states that your files can reside in: unmodified, modified, staged, and committed:

  • Unmodified means that the file is identical to the version stored in your last commit.
  • Modified means that you have changed the file but have not committed it to your database yet.
  • Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
  • Committed means that the data is safely stored in your local database.

Basic workflow

The basic Git workflow goes something like this:

  • You modify files in your working tree.
  • You stage those changes you want to be part of your commit, which adds those changes to the staging area.
  • You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.
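The three steps above can be sketched as follows, with git status --short used to watch a file move between states. This assumes a throwaway repository and a placeholder identity ("Example Author"). The file name thesis.txt matches the workshop file, but the content is invented.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository with a placeholder identity (assumptions for this sketch).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# 1. Modify: a brand-new file starts out untracked.
echo "First draft" > thesis.txt
state_new="$(git status --short)"
echo "new file:     [$state_new]"

# 2. Stage: mark the file for the next commit.
git add thesis.txt
state_staged="$(git status --short)"
echo "after add:    [$state_staged]"

# 3. Commit: store the snapshot; the working tree is now clean.
git commit -q -m "Add first draft of thesis"
state_clean="$(git status --short)"
echo "after commit: [$state_clean]"

# Editing a tracked file puts it in the modified state.
echo "Second paragraph" >> thesis.txt
state_modified="$(git status --short)"
echo "after edit:   [$state_modified]"
```

In the short format, ?? marks an untracked file, a letter in the first column means staged, and a letter in the second column means modified but not yet staged.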

Installing Git

Before going further, we need:

The Command Line

The best way to learn how to use Git is through the Command Line.

Using the Command Line lets you communicate directly with your computer and allows you to perform specific tasks through specific commands.

Git config

  • git config --global user.name "your name"
  • git config --global user.email "your email"
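To check what Git has stored, you can read the values back. The sketch below deliberately sets the values at the repository level in a throwaway repository, rather than with --global, so that running it does not overwrite your real settings. "Your Name" and "you@example.com" are placeholders.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository; settings made here stay inside it.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Same keys as the --global commands, but scoped to this repository only.
git config user.name "Your Name"
git config user.email "you@example.com"

# Read the values back to confirm they were saved.
saved_name="$(git config --get user.name)"
saved_email="$(git config --get user.email)"
echo "name:  $saved_name"
echo "email: $saved_email"
```

On your own machine, git config --global --get user.name does the same check for the global setting.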

Download the folder

The cd command

  • Use cd <your directory> to navigate to the folder you downloaded

Git init and status

Git add and commit

Make some changes to your thesis.txt document

Now that we have made some changes

Git log
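As a rough illustration, git log lists the commits in a repository, newest first. The sketch below builds a small history in a throwaway repository; the identity and commit messages are placeholders.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository with a placeholder identity.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Two commits so there is a history to look at.
echo "Introduction" > thesis.txt
git add thesis.txt
git commit -q -m "Add introduction"

echo "Methods" >> thesis.txt
git add thesis.txt
git commit -q -m "Add methods section"

# Full log: hash, author, date, and message for each commit.
git log

# Compact log: one line per commit.
git log --oneline

commit_count="$(git rev-list --count HEAD)"
latest_message="$(git log -1 --format=%s)"
echo "commits so far: $commit_count"
```

The hash at the start of each entry is what you pass to other commands, such as git checkout, to refer to that commit.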

Git checkout
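One use of git checkout is to look at the project as it was at an earlier commit and then return to the latest version. This is a sketch in a throwaway repository with a placeholder identity and invented file content.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository with a placeholder identity.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "Draft 1" > thesis.txt
git add thesis.txt
git commit -q -m "Save draft 1"
first_commit="$(git rev-parse HEAD)"          # hash of the first commit
branch="$(git symbolic-ref --short HEAD)"     # name of the current branch

echo "Draft 2" > thesis.txt
git commit -q -am "Save draft 2"

# Step back in time: the working tree now shows the first snapshot.
git checkout -q "$first_commit"
old_text="$(cat thesis.txt)"
echo "at first commit: $old_text"

# Return to the newest snapshot on the branch.
git checkout -q "$branch"
new_text="$(cat thesis.txt)"
echo "back on $branch: $new_text"
```

Nothing is lost by looking around: checking out the branch again brings back the latest committed version.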

Basic commands

  • git config
  • git init
  • git add
  • git commit -m "clear message here"
  • git log
  • git checkout

And your best friends:

  • git status
  • git -h
  • Stack Overflow/Google

Break

Quick exercise

  1. Create a new file and add it to your Git repository

  2. Delete everything from your .csv file, save it, then stage and commit that change. Can you retrieve your data?

Part 2: Working with GitHub

Learning objectives

  • Get your GitHub account up and running
  • Set up SSH access to GitHub on your computer
  • Be able to both push and pull repositories from GitHub

GitHub

GitHub is the single largest host for Git repositories, and is the central point of collaboration for millions of developers and projects.

Set up a GitHub account

Students can get the GitHub Student Developer Pack, which gives you unlimited private repositories and numerous other perks.

Generating SSH key

https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/

Add key to ssh-agent

Windows: https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/#platform-windows

Mac: https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/#platform-mac

Let’s put our local project on GitHub - 1

Let’s put our local project on GitHub - 2
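Connecting a local repository to a remote and pushing to it might look like the sketch below. To keep it runnable without a GitHub account, a local bare repository (remote.git) stands in for the empty repository you would create on GitHub. With GitHub, the remote URL would instead be the SSH address shown on the repository page.

```shell
#!/usr/bin/env bash
set -e

sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for an empty GitHub repository (an assumption of this sketch).
git init -q --bare remote.git

# The local project, with one commit and a placeholder identity.
git init -q mock_thesis
cd mock_thesis
git config user.name "Example Author"
git config user.email "author@example.com"
echo "Chapter 1" > thesis.txt
git add thesis.txt
git commit -q -m "Add thesis draft"

# Tell Git where the remote lives, then push the current branch to it.
# -u remembers the remote branch so later pushes and pulls need no arguments.
git remote add origin "$sandbox/remote.git"
git push -q -u origin HEAD

# List the configured remote.
git remote -v

local_head="$(git rev-parse HEAD)"
remote_head="$(git rev-parse '@{u}')"
```

After the push, the local branch and its remote counterpart point at the same commit.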

Sometimes entire folders get deleted accidentally

Source: T. Russell Harris

Cloning an existing remote project

  • Fortunately we now have a remote back-up
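Recovering a deleted project folder from the remote can be sketched as follows. A local bare repository (remote.git) again stands in for GitHub, and the deletion is simulated with rm -rf. Only work that was committed and pushed before the accident comes back.

```shell
#!/usr/bin/env bash
set -e

sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for the GitHub repository (an assumption of this sketch).
git init -q --bare remote.git

# A local project that has been pushed to the remote.
git init -q mock_thesis
(
  cd mock_thesis
  git config user.name "Example Author"
  git config user.email "author@example.com"
  echo "Chapter 1" > thesis.txt
  git add thesis.txt
  git commit -q -m "Add thesis draft"
  git remote add origin "$sandbox/remote.git"
  git push -q -u origin HEAD
)

# The accident: the whole folder disappears.
rm -rf mock_thesis

# The recovery: clone brings back the files and the full history.
git clone -q "$sandbox/remote.git" mock_thesis
cat mock_thesis/thesis.txt
git -C mock_thesis log --oneline

restored_commits="$(git -C mock_thesis rev-list --count HEAD)"
```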

Review

  • Make some changes to your files, stage them, and then commit
  • What does git status tell you?

We can now push those changes to the remote repository

You can also clone others’ projects

Instead we use fork

Go ahead and fork the mock_thesis repository

Break

Quick exercise

  • Clone your forked version of the mock_thesis project
  • Make a few changes and push them to GitHub

Part 3: Collaborating on Projects

Learning objectives

  • Be able to work in small teams on a single project on a single branch
  • Be able to troubleshoot basic problems

Find a team!

  • Get into small teams of 3 (4 if you must)
  • Choose one person’s mock_thesis GitHub project to work on

Add your teammates as collaborators

Time to work on a project together

  • Clone the new team repository
  • One person makes some changes, then adds, commits, and pushes

Updating your local repository

At this point, one member of the team and the remote repository have the most up-to-date version, but the others do not. To get the changes, you use the git pull command. Make sure you are in the correct working directory!
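The situation described above can be reproduced on one machine as a sketch. The two teammates are simulated by two clones in folders named alice and bob (hypothetical names), and a local bare repository stands in for the shared GitHub repository.

```shell
#!/usr/bin/env bash
set -e

sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q --bare remote.git          # stand-in for the team repository

# Alice creates the first version and pushes it.
git clone -q "$sandbox/remote.git" alice 2>/dev/null
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "Chapter 1" > thesis.txt
git add thesis.txt
git commit -q -m "Add chapter 1"
git push -q -u origin HEAD
cd ..

# Bob clones the project at this point.
git clone -q "$sandbox/remote.git" bob
git -C bob config user.name "Bob Example"
git -C bob config user.email "bob@example.com"

# Alice keeps working and pushes again; Bob's copy is now out of date.
cd alice
echo "Chapter 2" >> thesis.txt
git commit -q -am "Add chapter 2"
git push -q
cd ..

# Bob updates his local repository from inside his working directory.
cd bob
git pull -q
cat thesis.txt

bob_head="$(git rev-parse HEAD)"
alice_head="$(git -C ../alice rev-parse HEAD)"
```

After the pull, both clones are at the same commit and Bob's thesis.txt contains Alice's new chapter.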

Important git pull advice

  • Use git pull before you make changes if you think someone else may have modified the original file; after all, you want to work on the most recent version.
  • Also remember the first law of version control systems: a VCS is not a replacement for communication (aka Git and GitHub are not magical)

Pull and check what edits were made

Unsure about the edits being made?

  • One person makes some changes, then adds, commits, and pushes

Pre-check the edits
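One way to inspect incoming edits before they touch your files is to fetch first and compare, then pull once you are satisfied. This sketch assumes that approach. The clones named alice and bob are hypothetical, a local bare repository stands in for GitHub, and @{u} is Git shorthand for the remote branch your current branch tracks.

```shell
#!/usr/bin/env bash
set -e

sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q --bare remote.git          # stand-in for the team repository

# Alice pushes a first version; Bob clones it.
git clone -q "$sandbox/remote.git" alice 2>/dev/null
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "Chapter 1" > thesis.txt
git add thesis.txt
git commit -q -m "Add chapter 1"
git push -q -u origin HEAD
cd ..
git clone -q "$sandbox/remote.git" bob

# Alice pushes a new edit that Bob has not seen yet.
cd alice
echo "Chapter 2" >> thesis.txt
git commit -q -am "Add chapter 2"
git push -q
cd ../bob

# fetch downloads the new commits but leaves Bob's files untouched.
git fetch -q
incoming="$(git rev-list --count 'HEAD..@{u}')"
echo "incoming commits: $incoming"

# Which commits are new, and what exactly do they change?
git log --oneline 'HEAD..@{u}'
git diff HEAD '@{u}'

# Happy with the edits? Now bring them into the working files.
git pull -q
```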

Make changes to different files

  • Each team member changes a different file
  • Add, commit, push
  • What happens?

Workflow 1: Changes to different files

Make minor changes to same file (different line)

  • Each team member changes a different line of the file
  • Add, commit, push
  • What happens?

Workflow 2: Automatic merges

Make major changes to same file

  • Make some big changes - move things around
  • Add, commit, push
  • Try to use the same pull for automatic merges - what happens?

Workflow 3a: Sometimes the auto-merge fails…

Workflow 3b: Sometimes the auto-merge fails…
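A failed auto-merge and one way of resolving it can be sketched as below. Two clones named alice and bob (hypothetical) edit the same line, with a local bare repository standing in for GitHub. The resolution text is invented; in practice the team decides what the final line should be.

```shell
#!/usr/bin/env bash
set -e

sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q --bare remote.git          # stand-in for the team repository

# Shared starting point.
git clone -q "$sandbox/remote.git" alice 2>/dev/null
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "Title: Draft" > thesis.txt
git add thesis.txt
git commit -q -m "Add title"
git push -q -u origin HEAD
cd ..
git clone -q "$sandbox/remote.git" bob
git -C bob config user.name "Bob Example"
git -C bob config user.email "bob@example.com"

# Alice changes the line and pushes first.
cd alice
echo "Title: Alice's version" > thesis.txt
git commit -q -am "Alice rewrites the title"
git push -q
cd ../bob

# Bob changes the same line; his push is rejected because he is behind.
echo "Title: Bob's version" > thesis.txt
git commit -q -am "Bob rewrites the title"
git push -q 2>/dev/null || echo "push rejected: pull first"

# Pulling tries to merge, but both edits touch the same line.
git pull --no-rebase --no-edit || echo "automatic merge failed"

# Git marks the conflict inside the file with <<<<<<<, =======, >>>>>>>.
cat thesis.txt
conflict_markers="$(grep -c '^<<<<<<<' thesis.txt || true)"

# Resolve by editing the file to the agreed text, then stage, commit, push.
echo "Title: Agreed joint version" > thesis.txt
git add thesis.txt
git commit -q -m "Merge title edits"
git push -q
```

The merge commit records that both sets of edits were reconciled, so Alice receives the agreed version on her next pull.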

Part 4: Troubleshooting, tips and tricks, and additional resources

Learning objectives

  • Get comfortable reverting branches and specific files
  • Get an overview of some tips and tricks and additional resources

Exercises (15-20 minutes)

  • Choose one of your projects and initiate a git repository
  • Upload to GitHub (it will be public)
  • Share your repo with a neighbour
  • Both make some changes; start cooperation but then go rogue.

Remember:

  • Use git pull before you start working
  • Be sure to commit locally and then pull before pushing

A few things to try

  • Delete all your new friend's files (ahahaha)
  • Edit a file and reorganize the folder - can you revert the folder changes but keep the file?
  • Try and overwrite the edits the other has made (malicious merge)
  • Have the owner delete the remote and local folder - have the new friend help them recover both a local and remote version
  • Make multiple commits to multiple different files and then simultaneously push - try and sort it out

If you get stuck that is fine - we will go over some ways to fix things.

How bad did it get?

Revert to a previous commit

  • git reflog
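  • git revert HEAD@{index} (adds a new commit that undoes the changes made by that one commit)
  • git reset --hard HEAD@{index} (moves your branch back to that state; later local commits and uncommitted changes are discarded)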
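One way to return the whole project to an earlier state is to find the entry in git reflog and reset to it. This is a sketch in a throwaway repository, and it assumes you are happy to discard the later local commits. HEAD@{1} means "where HEAD was one step ago".

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository with a placeholder identity.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Three commits, each overwriting the file.
for n in 1 2 3; do
  echo "version $n" > thesis.txt
  git add thesis.txt
  git commit -q -m "Save version $n"
done

# The reflog lists every position HEAD has had, newest first.
git reflog

# Move the branch (and the files) back to where HEAD was one step ago.
git reset -q --hard 'HEAD@{1}'

restored_text="$(cat thesis.txt)"
echo "thesis.txt now contains: $restored_text"
```

Because reset --hard throws away uncommitted changes, it is safest to commit or copy anything you still want before running it.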

Or if you just want a single file

  • git diff <commit hash> <filename>
  • git checkout <commit hash> -- <filename>

Just make sure to stage and commit your changes.
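Restoring a single file while leaving everything else alone might look like this sketch. It uses a throwaway repository, and the file data.csv and its contents are invented for illustration.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository with a placeholder identity.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# A good state: data plus a first draft.
printf 'id,score\n1,90\n2,85\n' > data.csv
echo "Draft 1" > thesis.txt
git add data.csv thesis.txt
git commit -q -m "Add data and first draft"
good_commit="$(git rev-parse HEAD)"

# Later: the draft improves, but the data file is emptied by mistake.
: > data.csv
echo "Draft 2" > thesis.txt
git commit -q -am "Update draft"

# Compare the file as it is now against the earlier commit.
git diff "$good_commit" -- data.csv

# Bring back only data.csv from the earlier commit; thesis.txt is untouched.
git checkout "$good_commit" -- data.csv
git status --short

# The restored file is already staged, so commit it.
git commit -q -m "Restore data.csv from earlier commit"
```

The double dash separates the commit from the file name, so Git cannot mistake one for the other.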

And as a last resort

There is no shame in sometimes going nuclear:

  • cd ..
  • rm -rf <messed-up-folder>
  • git clone <thank-goodness-I-had-backup.git>

Remember that the remote repo keeps the .git history so you can revert at any time!

Useful commands

  • git help everyday
  • git help -g
  • git tag

GitHub Desktop

Academic website hosting on GitHub

It is pretty straightforward to set up a personal website with the address <username>.github.io. Hosting is free and you can host presentations, resources, etc., such as this one.

.gitignore

You can place a .gitignore file into your local repository if you want Git to consistently ignore a certain set of files or paths.
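A small .gitignore might look like the sketch below. The patterns and file names (run.log, output/, .Rhistory, analysis.R) are examples only; adapt them to whatever your project generates.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# One pattern per line: a file extension, a folder, and a specific file.
printf '%s\n' '*.log' 'output/' '.Rhistory' > .gitignore

# Some files that should be ignored, and one that should not.
touch run.log .Rhistory analysis.R
mkdir output
touch output/fig1.png

# Only .gitignore itself and analysis.R show up as untracked.
git status --short

# check-ignore -v reports which pattern causes a file to be ignored.
git check-ignore -v run.log
```

Committing the .gitignore file itself means collaborators get the same rules when they clone the project.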

Windows users see: https://stackoverflow.com/questions/10744305/how-to-create-gitignore-file/34995806

R, RStudio, and Markdown

  • RStudio GitHub workflow
  • Markdown documents

Issues and assignment

  • See demo

Forking and pull requests

  • See demo

Multiple local branches
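As a rough sketch, a local branch lets you try out edits without touching your main line of work, and merge them back when ready. The branch name revisions and the file content are hypothetical. The name of the starting branch is looked up rather than assumed, since it differs between Git installations.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository with a placeholder identity.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "Chapter 1" > thesis.txt
git add thesis.txt
git commit -q -m "Add chapter 1"
main_branch="$(git symbolic-ref --short HEAD)"   # name of the starting branch

# Create a new branch and switch to it in one step.
git checkout -q -b revisions
echo "Reviewer-requested paragraph" >> thesis.txt
git commit -q -am "Address reviewer comments"

# Back on the starting branch, the file is as it was before.
git checkout -q "$main_branch"
lines_before="$(wc -l < thesis.txt | tr -d ' ')"

# List local branches; the current one is marked with an asterisk.
git branch

# Merge the revisions into the starting branch.
git merge -q --no-edit revisions
lines_after="$(wc -l < thesis.txt | tr -d ' ')"
echo "lines before merge: $lines_before, after merge: $lines_after"
```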