Git, GitHub, GitHub Desktop
- Git is a version control system that is integral to modern data science. It works on your local “repository”, which it can synchronise with a remote repository.
- GitHub is one choice of server for storing your remote repository. Most such services allow you to a) distribute code, and b) share development within a team. We will use GitHub because it also offers some user-friendly interfaces, specifically:
- GitHub Desktop is an easy-to-use solution for working with Git that hides some of the technical detail you would otherwise need to understand.
Getting to grips with Git is an essential component of the course.
Get a GitHub Account and required software
To get going on GitHub you need to:
- Make an account at github.com.
- For Windows users:
- Choose a good text editor; Notepad++ is good, as is Atom. You should not use Notepad (it messes up line endings).
- Create a working command line, using Git for Windows.
- There are several options it will ask you about. Select your text editor; use the OpenSSL library and “Checkout as-is, commit Unix-style line endings”, MinTTY, and other defaults.
- Install GitHub Desktop. There are many places to look for help, including the very good GitHub official Docs, and Desktop-Specific tutorials and Git background.
- Set up an ssh key and add it to your account. This allows passwordless access and is essential for hassle-free operation.
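As a rough sketch of the key-generation step, the commands below create an Ed25519 key pair. To make this safe to try, the key is written to a temporary directory with an empty passphrase; both choices are assumptions for the demonstration only, and the email address is a placeholder. In real use you would normally accept the default location inside `~/.ssh` and consider setting a passphrase.

```shell
# Sketch only: generate an Ed25519 key pair in a throwaway directory.
# The email address is a placeholder; use the one on your GitHub account.
key_dir="$(mktemp -d)"
key_file="$key_dir/id_ed25519"

# -t selects the key type, -C attaches a comment (a label for the key),
# -f sets the output file, -N "" sets an empty passphrase (demo only),
# -q suppresses the progress output.
ssh-keygen -t ed25519 -C "myemail@host.com" -f "$key_file" -N "" -q

# The public half (.pub) is the part you paste into your GitHub account settings.
# Never share the private half (the file without .pub).
cat "$key_file.pub"
```

Once the public key has been added to your account, `ssh -T git@github.com` is the usual way to check that GitHub recognises it.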
Using the Git/Bash Command Line
- Git for Windows installs a pretty good command line, based on Bash. It provides a similar experience to Bash for Linux or Mac, and is well integrated with git.
- There are many tutorials for bash. You can find one yourself, but the UoB Intro to Command Line is very good.
- The important commands are:
- `ls`: list directory. I like `ls -lhrt`, which lists files in long format (l), with human-readable sizes (h), in reverse order (r), sorted by modification time (t), so that the most recently edited files appear last.
- `cd`: change directory. For example, `cd ~/.ssh` changes to the `.ssh` directory inside your home directory (which is what the tilde at the start means).
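To see these commands in action, here is a small sketch that can be pasted into the Git Bash prompt. The temporary directory and the two file names are made up for illustration; the files are given different modification times so that the effect of the `-rt` flags is visible.

```shell
# Sketch only: the directory and file names below are made up for illustration.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Create two empty files with different modification times
# (touch -t takes a timestamp of the form CCYYMMDDhhmm).
touch -t 202001010900 older_notes.txt
touch -t 202101010900 newer_notes.txt

# Long format, human-readable sizes, reverse order, sorted by modification time:
# the most recently edited file is listed last.
ls -lhrt

# The same ordering with names only, keeping just the final (newest) entry.
newest="$(ls -rt | tail -n 1)"
echo "Most recently edited: $newest"

# Show what the tilde expands to (your home directory) and where you are now.
echo ~
pwd
```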
Create a Repository for an Assessment
To create an assessment project, one of your team must:
- Create a Repository for your project. You can do this either: a. from scratch; or b. by forking a previous repository, such as the Example Assessment.
- Grant read and write access to your group by Inviting Collaborators.
- Finally, each Group Member needs to Add the Repository to their GitHub Desktop.
Working on a Project
To use your GitHub Collaboration Space, you need to:
- Make changes to your project, as you would normally.
- Press `Fetch Origin` to get any changes to the repository from your collaborators.
- Resolve any conflicts that arise.
- Commit your changes to your current branch, typically `Master`, by selecting Changes->Commit to Master. Remember to give a useful description of the changes.
- Press `Push Origin` to add your changes to the remote repository for others to fetch.
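For those curious about what GitHub Desktop is doing, the cycle above corresponds roughly to the following command-line steps. This is a sketch under assumptions: a throwaway bare repository in a temporary directory stands in for GitHub, and the file name, user name and email are placeholders.

```shell
# Sketch only: a throwaway bare repository stands in for the remote on GitHub.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"

# Cloning an empty repository prints a harmless warning, hidden here.
git clone -q "$work/remote.git" "$work/project" 2>/dev/null
cd "$work/project"
git config user.name "Your Name"            # placeholder identity, this repository only
git config user.email "myemail@host.com"

# 1. Make changes to your project, as you would normally.
echo "Analysis notes" > README.md

# 2. "Fetch Origin": download any changes your collaborators have pushed.
git fetch -q origin

# 3. Commit your changes with a useful description.
git add README.md
git commit -q -m "Add README with analysis notes"

# 4. "Push Origin": publish the commit so that others can fetch it.
git push -q origin HEAD

# Confirm that the remote now holds the commit.
git -C "$work/remote.git" log --oneline --all
```

On the command line, `git pull` combines the fetch with merging the fetched changes into your current branch, which is the point at which conflicts would show up.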
Working practices for an easy life
You are free to use the full functionality of GitHub. However, to give non-Git experts an easy experience, I have the following advice:
- Always `Fetch Origin` before starting work! This limits the number of conflicts.
- Don’t worry about branches. They are very useful but require a strong understanding of git.
- Instead, structure your work to limit conflicts by:
a. Having your own scratch space, a folder given by your name, that only you edit.
b. Following good practice in file naming: use `data/raw` for raw data that is immutable (won’t change), `data/processed` for transient data, and `data/output` for immutable outputs, should you have them.
c. Where practical, not committing intermediate content, but instead committing the code that generates that content and having your Group run your code if they want the content. Make this easy for them!
d. For your report and for data generation, again minimising conflicts by working on separate files. This will not always be possible so: i) Plan your work and report structure together! ii) Where possible, do this synchronously via e.g. Teams and share your screen. iii) Otherwise agree a messaging platform and let everyone know if you are making a change that affects them.
- Resolve conflicts as they occur and before they get out of hand.
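The folder layout suggested above could be set up along these lines. This is only a sketch: `project` and `alice` are hypothetical names, and the `.gitignore` rule that keeps transient data out of commits is my own assumption about how to apply the advice on intermediate content, not a course requirement.

```shell
# Sketch only: "project" and "alice" are hypothetical names.
project="$(mktemp -d)/project"
mkdir -p "$project/alice"            # personal scratch space, edited by one person only
mkdir -p "$project/data/raw"         # immutable raw data
mkdir -p "$project/data/processed"   # transient data, regenerated by running code
mkdir -p "$project/data/output"      # immutable outputs

# Ask Git to ignore transient data so that it is never committed by accident.
# The .gitkeep exception is a common convention (not a Git feature) for keeping
# an otherwise empty folder in the repository.
cat > "$project/.gitignore" <<'EOF'
data/processed/*
!data/processed/.gitkeep
EOF
touch "$project/data/processed/.gitkeep"

# List everything that was created.
find "$project" -mindepth 1 | sort
```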
Conflicts: When things go wrong
Conflicts are inevitable. Some are easy to resolve with GitHub Desktop. When things go wrong:
- Try not to push your edits to master until you’ve resolved the problem!
- Consider whether you only need to keep the `Origin` copy, or the `Master`. If so, do.
- You can always create a backup of your local version, delete the files in question, check them back out, and then merge manually.
a. I do this by manually copying the files, using `git checkout <file>` to get the `Master` version of them, then merging with Meld or KDiff3.
b. You sometimes need some rather obscure commands to reset the git repository to the right state.
- Google the problem. You are not the first!
- A “nuclear” option exists in creating a second, clean copy of the repository, and manually updating that until it is correct.
- Some references for issues: a. Atlassian merge conflicts b. GitHub Command-line merge conflicts c. Oh Shit, Git
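The backup-then-checkout approach described above might look like this on the command line. It is a sketch in a throwaway repository; the file name `report.md`, the user name and the email are placeholders.

```shell
# Sketch only: a throwaway repository with made-up file names.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name"            # placeholder identity, this repository only
git config user.email "myemail@host.com"

echo "committed text" > report.md
git add report.md
git commit -q -m "Add report"

# A local edit that we are not ready to commit.
echo "my local edit" >> report.md

# 1. Back up the local version as a plain copy.
cp report.md report.md.backup

# 2. Restore the last committed version of the file.
git checkout -- report.md

# 3. Compare the two versions, then merge by hand or with a tool such as Meld or KDiff3.
diff report.md report.md.backup
```

This restores the file from your last local commit. To take the remote copy instead, a variant such as `git checkout origin/master -- report.md` after a fetch should work, assuming the remote branch is called `master`.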
Deeper Git
There is no need to use GitHub Desktop; for some things you need command line git, and it allows you to work effectively on a remote server, such as BlueCrystal. There are many resources including:
- Chrys Woods’ Git Tutorial - Chrys provides this tutorial to students and staff at the University of Bristol. I highly recommend going through it to understand Git more deeply.
- He also covers more advanced usage in Python and Data.
- swcarpentry is another excellent resource with descriptions at several layers of complexity.
- Which notes that RStudio has integrated Git functionality. You are welcome to use this, but don’t become reliant on it, as it will not help you when working in Python.
Command Line Git for Noteable
You can use Git when using Jupyter Notebook via Noteable. To do this you need to become familiar with Command Line Git; see the references in Deeper Git above. The process is:
- Start Jupyter Hub by going to the Data Science Toolbox on Blackboard and selecting Noteable.
- Select “R with Stan”. This creates an instance that includes Python 3 and R. Press Start.
- Select the “New” button from the top right and select “Terminal”.
- You are now at a Linux Command Line prompt. You then need to set up Git to use your GitHub Account:
git config --global user.name "Your Name" # Replace with your name
git config --global user.email "myemail@host.com" # Replace with your GitHub email address
- Now you need to generate a GitHub Access Token. This is like a limited password. You must give this access token the required permissions. I would recommend:
- Making a token specifically for Noteable, with a suitable name;
- Allowing full access to the repo but nothing else;
- Saving this access token somewhere secure.
- You are now ready to clone your project. This is done via the `git clone` command. We need to know the repository that you want. Go to GitHub, navigate to your project and select Code. You need to choose HTTPS mode as SSH is not currently supported on Noteable. You can then press the copy to clipboard icon to give you a link, which you can paste into the command line. You will take the last part of this to make the following command, which additionally contains your username and access token:
git clone https://username:accesstoken@github.com/projectuser/project.git
for example (with a fake access token):
git clone https://dsbristol:1234123412341234@github.com/dsbristol/dst_example_project.git
Remember that:
- `username` is your own GitHub username;
- `accesstoken` is the access token you created for this above;
- `projectuser` is the owner of the project;
- `project` is the name of the project.
- You can now make changes and commit them as normal. For example:
cd dst_example_project
emacs README.md ## Make some changes
git status
git add -u
git commit -m "I made some changes from Noteable"
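If you find the clone URL fiddly to type, one option is to assemble it from its four parts, as sketched below. Every value is a placeholder (the token is the same fake one used above), and the script only prints the command rather than running it.

```shell
# Sketch only: every value below is a placeholder, and the token is fake.
username="dsbristol"              # your own GitHub username
accesstoken="1234123412341234"    # the access token you created (fake here)
projectuser="dsbristol"           # the owner of the project
project="dst_example_project"     # the name of the project

# Assemble the HTTPS clone URL from its parts.
clone_url="https://${username}:${accesstoken}@github.com/${projectuser}/${project}.git"

# Print the command for inspection; copy it or run it yourself once it looks right.
echo "git clone ${clone_url}"
```

Bear in mind that anything typed at the prompt, including the token, may be kept in your shell history.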
Notes:
- Noteable should be secure enough to store an access token, though some might have concerns about security. If you are worried, you can instead ask it to cache your password:
git config --global credential.helper cache
- It is possible to modify a Git repository you already created to use your details, a new password, etc. You do this by editing the `.git/config` file in your repository. There are many online resources explaining this.
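As an alternative to editing the file by hand, Git can rewrite the relevant entry for you. The sketch below uses a throwaway repository and placeholder URLs with fake tokens; the one-hour cache timeout is an arbitrary choice of mine.

```shell
# Sketch only: a throwaway repository, with placeholder URLs and fake tokens.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git remote add origin "https://username:oldtoken@github.com/projectuser/project.git"

# Replace the stored URL, for example after generating a new access token.
# This rewrites the [remote "origin"] entry in .git/config.
git remote set-url origin "https://username:newtoken@github.com/projectuser/project.git"

# Show what is now recorded for this remote.
git remote get-url origin

# Optionally keep cached credentials for one hour (3600 seconds).
# Without --global, the setting applies to this repository only.
git config credential.helper "cache --timeout=3600"
```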
Git is Evil! Make it go away!
This is a collaborative process and I am always open to suggestions. If you have a good option, discuss it with me.
I can imagine a few options:
- OneDrive/GoogleDrive/Dropbox: Used on their own, none of these is designed for co-creation of content. You will have a miserable experience.
- Google Colab or alternatives, specifically those that let all collaborators edit the same document, live. All of these options should support dumping your project into a GitHub Repository, so feel free to try.
- We expect a better collaboration tool to be provided by the University in the future.