Host: Bristol Mathematics. Lecturer: Dr Daniel Lawson

Data Science Toolbox

Git, GitHub, GitHub Desktop

Getting to grips with Git is an essential component of the course.

Get a GitHub Account and required software

To get going on GitHub you need to:

  1. Make an account at github.com.
  2. For Windows users:
    • Choose a good text editor; Notepad++ is good, as is Atom. You should not use Notepad (it messes up line endings).
    • Create a working command line, using Git for Windows.
      • The installer will ask about several options. Select your text editor, use the OpenSSL library, “Checkout as-is, commit Unix-style line endings” and MinTTY, and otherwise accept the defaults.
  3. Install GitHub Desktop. There are many places to look for help, including the very good official GitHub Docs, Desktop-specific tutorials, and Git background.
  4. Set up an SSH key and add it to your account. This allows passwordless access and is essential for hassle-free operation.
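
As a rough sketch of the SSH key step from the Git Bash (or any Linux/macOS) command line: the key path below is the OpenSSH default, while the email address is a placeholder and the empty passphrase (`-N ""`) is an assumption made only so the commands run without prompting. Leave `-N ""` out if you would rather be asked for a passphrase.

```shell
set -e

# Default OpenSSH location for an Ed25519 key.
key_file="$HOME/.ssh/id_ed25519"
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Only generate a key if one does not already exist at this path.
if [ ! -f "$key_file" ]; then
    # "myemail@host.com" is a placeholder comment; use your GitHub email.
    ssh-keygen -q -t ed25519 -C "myemail@host.com" -N "" -f "$key_file"
fi

# Print the PUBLIC key; this is the text you paste into your GitHub account settings.
cat "$key_file.pub"

# Once the key has been added on GitHub, this checks the connection (needs network):
# ssh -T git@github.com
```

Only the `.pub` file is ever shared; the file without the extension is the private key and stays on your machine.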

Using the Git/Bash Command Line

Create a Repository for an Assessment

To create an assessment project, one of your team must:

  1. Create a Repository for your project, either: a. from scratch; or b. by forking a previous repository, such as the Example Assessment.
  2. Grant read and write access to your group by Inviting Collaborators.
  3. Finally, each Group Member needs to Add the Repository to their GitHub Desktop.

Working on a Project

To use your GitHub Collaboration Space, you need to:

  1. Make changes to your project, as you would normally.
  2. Press Fetch Origin to get any changes to the repository from your collaborators.
  3. Resolve any conflicts that arise.
  4. Commit your changes to your current branch (typically Master) by selecting Changes->Commit to Master. Remember to give a useful description of the changes.
  5. Press Push Origin to add your changes to the remote repository for others to fetch.
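
For readers who want to see what these buttons roughly correspond to on the command line, here is a minimal sketch that runs entirely on your own machine. It assumes a throwaway sandbox in which a bare repository stands in for GitHub; the collaborator names `alice` and `bob` and the file names are hypothetical.

```shell
set -euo pipefail

# Hypothetical sandbox: a bare repository plays the role of the GitHub remote.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/shared.git"

# Alice clones the (still empty) repository, commits and pushes.
git clone -q "$sandbox/shared.git" "$sandbox/alice" 2>/dev/null
cd "$sandbox/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "Project notes" > README.md
git add README.md
git commit -q -m "Add README"        # Desktop: Commit
git push -q -u origin HEAD           # Desktop: Push Origin

# Bob clones, adds his own file and pushes.
git clone -q "$sandbox/shared.git" "$sandbox/bob"
cd "$sandbox/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "Analysis plan" > plan.md
git add plan.md
git commit -q -m "Add analysis plan"
git push -q origin HEAD

# Alice picks up Bob's work before she continues.
cd "$sandbox/alice"
git fetch -q origin                  # Desktop: Fetch Origin
git pull -q --ff-only                # bring the fetched commits into her branch
git log --oneline                    # both commits are now in Alice's history
```

Pushing `HEAD` avoids assuming whether the default branch is called master or main on your installation.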

Working practices for an easy life

You are free to use the full functionality of GitHub. However, to keep things easy for those who are not Git experts, I have the following advice:

  1. Always Fetch Origin before starting work! This limits the number of conflicts.
  2. Don’t worry about branches. They are very useful but require a strong understanding of Git.
  3. Instead, structure your work to limit conflicts by:
    a. Having your own scratch space, a folder named after you, that only you edit.
    b. Following good file-naming practice: use data/raw for raw data that is immutable (won’t change), data/processed for transient data, and data/output for immutable outputs, should you have them.
    c. Where practical, not committing intermediate content; instead commit the code that generates that content and have your Group run your code if they want the content. Make this easy for them!
    d. Working on separate files for your report and for data generation, again to minimise conflicts. This will not always be possible, so:
      i) Plan your work and report structure together!
      ii) Where possible, do this synchronously, e.g. via Teams with screen sharing.
      iii) Otherwise, agree on a messaging platform and let everyone know if you are making a change that affects them.
  4. Resolve conflicts as they occur and before they get out of hand.
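
The folder layout suggested above could be set up along these lines. This is only a sketch: the scratch folder name `alice` is hypothetical, and ignoring `data/processed/` via `.gitignore` is one possible way (my assumption, not a course requirement) to keep transient data out of commits.

```shell
set -euo pipefail

# Stand-in for your project's working directory.
project=$(mktemp -d)
cd "$project"
git init -q

# Shared data folders plus a personal scratch space ("alice" is hypothetical).
mkdir -p data/raw data/processed data/output alice

# Git does not track empty folders, so add placeholder files where needed.
touch data/raw/.gitkeep data/output/.gitkeep alice/.gitkeep

# Assumption: transient data is regenerated by code, so Git is told to ignore it.
cat > .gitignore <<'EOF'
# Transient data: commit the code that generates it instead
data/processed/
EOF

# A transient file appears in the working tree but is not offered for commit.
echo "x,y" > data/processed/example.csv
git status --short
```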

Conflicts: When things go wrong

Conflicts are inevitable. Some are easy to resolve with GitHub Desktop. When things go wrong:

  1. Try not to push your edits to master until you’ve resolved the problem!
  2. Consider whether you only need to keep one version: either the Origin (remote) copy or your local Master. If so, keep that one and discard the other.
  3. You can always create a backup of your local version, delete the files in question, check them back out, and then merge manually.
    a. I do this by manually copying the files, running git checkout <file> to get the Master version of them, then merging with Meld or KDiff3.
    b. You sometimes need some rather obscure commands to reset the Git repository to the right state.
  4. Google the problem. You are not the first!
  5. A “nuclear” option exists in creating a second, clean copy of the repository, and manually updating that until it is correct.
  6. Some references for issues:
    a. Atlassian merge conflicts
    b. GitHub Command-line merge conflicts
    c. Oh Shit, Git
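
To make the "keep one copy" and "back up first" ideas concrete, here is a self-contained sketch that deliberately creates a conflict in a throwaway sandbox and resolves it by keeping the remote version. The folder names `mine` and `collaborator`, the file `report.md` and the backup file name are all hypothetical.

```shell
set -euo pipefail

# Hypothetical sandbox: a bare repository stands in for GitHub.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/shared.git"

# My clone: create the report and push it.
git clone -q "$sandbox/shared.git" "$sandbox/mine" 2>/dev/null
cd "$sandbox/mine"
git config user.name "Me"
git config user.email "me@example.com"
echo "Title: draft" > report.md
git add report.md
git commit -q -m "Start report"
git push -q -u origin HEAD

# A collaborator changes the same line and pushes first.
git clone -q "$sandbox/shared.git" "$sandbox/collaborator"
cd "$sandbox/collaborator"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo "Title: collaborator version" > report.md
git commit -q -am "Collaborator edits title"
git push -q origin HEAD

# Meanwhile I commit a different edit to that line.
cd "$sandbox/mine"
echo "Title: my version" > report.md
git commit -q -am "I edit title"

# Back up my version outside the repository before touching anything.
cp report.md "$sandbox/report.mine.bak"

# Fetch and merge the remote branch; the merge stops because of the conflict.
git fetch -q origin
git merge '@{u}' > /dev/null || echo "Merge stopped with conflicts:"
git status --short                   # lists report.md as unmerged
# To back out completely at this point instead: git merge --abort

# Keep the remote (collaborator's) copy ...
git checkout --theirs report.md
# ... or keep my own copy instead with: git checkout --ours report.md
git add report.md
git commit -q -m "Resolve conflict: keep collaborator's title"
git push -q origin HEAD
cat report.md
```

The backup copy means nothing is lost if you later decide that parts of your own version should be merged back in by hand.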

Deeper Git

There is no need to use GitHub Desktop; for some things you need command-line Git, and it allows you to work effectively on a remote server, such as BlueCrystal. There are many resources including:

Some video tutorials:

Command Line Git for Noteable

You can use Git when using Jupyter Notebook via Noteable. To do this you need to become familiar with Command Line Git; see the references in Deeper Git above. The process is:

  1. Start Jupyter Hub by going to the Data Science Toolbox on Blackboard and selecting Noteable.
  2. Select “R with Stan”. This creates an instance that includes Python 3 and R. Press Start.
  3. Select the “New” button from the top right and select “Terminal”.
  4. You are now at a Linux Command Line prompt. You then need to set up Git to use your GitHub Account:
    git config --global user.name "Your Name" # Replace with your name
    git config --global user.email "myemail@host.com" # Replace with your GitHub email address
    
  5. Now you need to generate a GitHub Access Token. This is like a limited password. You must give this access token the required permissions. I would recommend:
    • Making a token specifically for Noteable, with a suitable name;
    • Allowing full access to the repo but nothing else;
    • Saving this access token somewhere secure.
  6. You are now ready to clone your project. This is done via the git clone command. We need to know the repository that you want. Go to GitHub, navigate to your project and select Code. You need to choose HTTPS mode as SSH is not currently supported on Noteable. You can then press the copy to clipboard icon to give you a link, which you can paste into the command line. You will take the last part of this to make the following command that additionally contains your username and access token:
    git clone https://username:accesstoken@github.com/projectuser/project.git
    

    for example (with a fake access token):

    git clone https://dsbristol:1234123412341234@github.com/dsbristol/dst_example_project.git
    

    Remember that:

    • username is your own GitHub username;
    • accesstoken is the access token you created for this above;
    • projectuser is the owner of the project;
    • project is the name of the project.
  7. You can now make changes and commit them as normal. For example:
    cd dst_example_project
    emacs README.md ## Make some changes
    git status
    git add -u
    git commit -m "I made some changes from Noteable"
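
One consequence of cloning with the token in the URL is that Git stores that URL, token included, in the repository's configuration. The sketch below shows this offline, reusing the fake URL from the example above; `NEWTOKEN` is a placeholder for a regenerated token. It also notes where `git pull` and `git push` fit, since a commit only reaches GitHub once it is pushed.

```shell
set -euo pipefail

# Throwaway repository so nothing here touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q

# Same fake username and access token as in the clone example.
git remote add origin "https://dsbristol:1234123412341234@github.com/dsbristol/dst_example_project.git"

# The stored remote URL includes the token, so treat this folder as private.
git remote get-url origin

# If you revoke or regenerate the token, update the stored URL
# (NEWTOKEN is a placeholder, not a real token).
git remote set-url origin "https://dsbristol:NEWTOKEN@github.com/dsbristol/dst_example_project.git"
git remote get-url origin

# In a real clone (network required):
# git pull    # fetch and merge collaborators' changes before you start
# git push    # send your commits to GitHub after committing
```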
    

Notes :

Git is Evil! Make it go away!

This is a collaborative process and I am always open to suggestions. If you have a good option, discuss it with me.

I can imagine a few options: