Learn Git

  • data-coding
  • learn
  • git
  • summer

Intro

  • Git is a standardised way of tracking your code and analyses, together with their full history of changes.
  • Mrshu says that it's a good idea to learn Git, because it:
    • Helps avoid "versioning hell" (you know, files like essay.doc, essay_v2.doc, essay_final.doc)
    • Gives you the ability to "jump in time"
    • Helps you make your work "reproducible"
    • Makes it a bit more straightforward to work with others on shared (larger) projects
  • And don't forget that Git != GitHub != GitLab. Git is the underlying technology; GitHub and GitLab are businesses that host Git repositories behind a "web frontend" and add features such as pull requests on top.

Learn

  • Start with this article that explains the main principles of Git using a fitting analogy with designing a car.
  • Follow up by reading this tutorial, which sums up the basics of working with Git(Hub).
  • To get a better grasp of some of the terms, it's worth going through this one.
  • When it's time to start playing around with some examples, this interactive website is the perfect place for it.
  • If it still doesn't make sense, maybe a visual tutorial could be the way to go?
  • Before pushing your first commit, it's a good idea to find out how to write a good commit message (see this one, for example).
  • If anything is still unclear, the Git docs/book will almost certainly have an answer.
  • If you want to discover more commands you might encounter and you're a visual learner, this Visual Git Reference might be of interest.
  • Still confused about some terms and hungry for more (sometimes rather technical) explanations? Try "How to explain git in simple words?".

Snippets

  • When anything goes wrong, these code snippets come in particularly handy; alternatively, try these from the CIA.

  • Make .gitignore take effect for files that were already committed before they were added to it (this untracks everything, then re-adds only what .gitignore allows):

$ git rm -r --cached .
$ git add .
$ git commit -m ".gitignore is now working"
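To see why the snippet above is needed, here is a minimal sketch in a throwaway repository (the file name notes.log is just a hypothetical example). It also shows the narrower variant: `git rm --cached <file>` untracks a single file instead of re-adding the whole repository, and the file stays on disk.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # local identity just for this demo
git config user.email "demo@example.com"

echo "secret" > notes.log
git add notes.log
git commit -q -m "Accidentally commit a log file"

echo "*.log" > .gitignore
echo "more" >> notes.log
git status --short                        # still shows " M notes.log": the rule has no effect yet

# Untrack only that file (it stays on disk), then commit the removal with .gitignore
git rm -q --cached notes.log
git add .gitignore
git commit -q -m "Stop tracking *.log files"

git status --short                        # empty: notes.log is now ignored
git ls-files                              # lists only .gitignore
```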
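  • Create a repo from scratch with this init.sh script (based on this), which can be run with zsh init.sh repo-name from the parent directory (or, better still, ./init.sh repo-name after running chmod u+x init.sh once). It needs the GitHub CLI (gh) installed and logged in via gh auth login:
#!/bin/zsh
# Usage: ./init.sh repo-name [--p]    (--p also creates a Python virtual environment)
set -e    # stop at the first failing command

if [ -z "$1" ]; then
    echo "usage: $0 repo-name [--p]" >&2
    exit 1
fi

mkdir "$1"
cd "$1"

if [ "$2" = "--p" ]; then
    python3 -m venv .venv
    source .venv/bin/activate    # only active while this script runs
fi

git init -b main    # -b needs Git >= 2.28
touch README.md
git add README.md
git commit -m "Initial commit"
# Create the GitHub repo from this directory, add it as origin and push
# (--private is a choice here; use --public for a public repo)
gh repo create "$1" --private --source=. --remote=origin --push
atom ./    # open the project in your editor, e.g. code . for VS Code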
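Git itself does not need GitHub at all. As a sketch (all paths are temporary and hypothetical), a local bare repository can play the role of the remote that GitHub would otherwise host, which is handy for trying out push/clone without the GitHub CLI.

```shell
set -e
base=$(mktemp -d)
git init -q --bare -b main "$base/remote.git"   # what GitHub would host for you

git init -q -b main "$base/work"                # -b needs Git >= 2.28
cd "$base/work"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$base/remote.git"

echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q -u origin main                      # -u: track origin/main from now on

git clone -q "$base/remote.git" "$base/copy"    # anyone with access can now clone it
ls "$base/copy"
```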
  • See your log of changes in a nice format:
    git log --pretty=oneline
    # or
    git log --oneline --decorate --graph --all
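To try these log formats on something, here is a small sketch that builds a throwaway repo with a side branch (the branch name feature is hypothetical) that gets merged back, so `--graph` has something to draw.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "first"
git switch -q -c feature
git commit -q --allow-empty -m "feature work"
git switch -q main
git commit -q --allow-empty -m "main work"
git merge -q --no-edit feature                  # branches diverged, so this creates a merge commit

git log --oneline --decorate --graph --all      # ASCII graph of both branches
git log --pretty=oneline                        # full 40-character hashes, no graph
```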
    
  • Copy-paste commits from one branch to another (run it on the branch that should receive the commit):
    git cherry-pick <commit-hash>
    
  • Print some nice stats about your changes (requires the separate diffstat tool):
    git diff main...origin/your-branch | diffstat -Cm
    git diff master 00aa0157f23f50151f74e4ba203deb8f11621946 . | diffstat -Cm
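If diffstat isn't installed, Git's own `--stat`, `--shortstat` and `--numstat` diff options give similar summaries. A sketch on a throwaway repo (the branch name your-branch and file data.txt are hypothetical):

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'a\nb\nc\n' > data.txt
git add data.txt
git commit -q -m "Add data"

git switch -q -c your-branch
printf 'a\nB\nc\nd\n' > data.txt                # change one line, add one
git commit -q -am "Edit data"

git diff --stat main...your-branch              # per-file bar chart, similar to diffstat
git diff --shortstat main...your-branch         # one-line summary
git diff --numstat main...your-branch           # machine-readable: added<TAB>deleted<TAB>file
```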
    
  • Get stats for commits per person per month (CSV rows: author,month,commits):
    git log --pretty=format:'%aN,%ad' --date=format:'%Y/%m' | sort | uniq -c | sed -E 's/^ *([0-9]+) (.*)$/\2,\1/' > output.csv
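To check a per-person-per-month count, here is a sketch on a throwaway repo with two made-up authors and back-dated commits. `GIT_AUTHOR_NAME` and `GIT_AUTHOR_DATE` are standard Git environment variables; the helper `commit_as` is hypothetical. The sed step moves uniq's leading count to the end of each row, which also copes with author names that contain spaces. The built-in `git shortlog -sn` gives per-person totals without the monthly split.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

commit_as() {  # hypothetical helper; usage: commit_as "Name" "ISO date"
    GIT_AUTHOR_NAME="$1" GIT_AUTHOR_DATE="$2" \
        git commit -q --allow-empty -m "work by $1"
}
commit_as "Ada Lovelace" "2024-01-10T12:00:00"
commit_as "Ada Lovelace" "2024-01-20T12:00:00"
commit_as "Alan Turing"  "2024-02-05T12:00:00"

# author,month,commits
git log --pretty=format:'%aN,%ad' --date=format:'%Y/%m' \
    | sort | uniq -c \
    | sed -E 's/^ *([0-9]+) (.*)$/\2,\1/' > output.csv
cat output.csv

git shortlog -sn HEAD     # total commits per author (HEAD given so it doesn't read stdin)
```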
    
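  • Save your work for later:
    git stash # stash changes to tracked files (add -u to include untracked files)
    git stash push -m "message" [file] # stash only the given file(s), with a message
    git stash list # list your stashes
    git stash apply stash@{0} # re-apply a stash (it stays in the list)
    git stash pop # re-apply the latest stash and drop it
    git stash drop stash@{0} # delete that stash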
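A minimal round trip with the stash in a throwaway repo (file name and message are hypothetical): put a half-finished change aside, confirm the working tree is clean, then bring the change back.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > config.txt
git add config.txt
git commit -q -m "Add config"

echo "half-finished change" >> config.txt
git stash push -m "WIP: config tweak"   # working tree is clean again
git status --short                      # prints nothing
git stash list                          # shows the stash with its message

git stash pop                           # apply the latest stash, then drop it if it applied cleanly
```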
    
  • Learn from Mrshu here
  • Use Git in Python using GitPython
  • Version-control large datasets, esp. in ML projects, using DVC
  • Merging vs Rebasing
  • Git's data model
  • Git uses hashing via SHA-1, which maps arbitrary-sized inputs to 160-bit outputs (written as 40 hexadecimal characters, e.g. commit hashes); SHA-1 is no longer considered secure, since practical collisions have been demonstrated. More info in Learn about cryptography
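The SHA-1 point can be seen directly: a file's blob ID in Git is the SHA-1 of a short header (`blob <size>` followed by a NUL byte) plus the file content. This sketch assumes `sha1sum` from GNU coreutils (on macOS, `shasum` prints the same digest) and Git's default SHA-1 object format.

```shell
set -e
# Git's ID for the content "hello\n" (6 bytes)...
git_id=$(printf 'hello\n' | git hash-object --stdin)
# ...equals SHA-1 over "blob 6", a NUL byte (\000), then the content
manual_id=$(printf 'blob 6\000hello\n' | sha1sum | cut -d ' ' -f 1)

echo "$git_id"            # 40 hex characters = 160 bits
echo "$manual_id"
[ "$git_id" = "$manual_id" ] && echo "same hash"
```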
  • There are many different workflows, i.e. practices to follow when working on big projects.
  • Analyse how a Git repo grows over time
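To see the difference the "Merging vs Rebasing" link describes, here is a sketch that builds the same diverged history twice (a hypothetical feature branch off main; the helper `make_diverged_repo` is also hypothetical) and integrates it once with merge and once with rebase.

```shell
set -e
# Build a repo where main and feature have each gained one commit since they split
make_diverged_repo() {   # hypothetical helper; usage: make_diverged_repo <dir>
    git init -q -b main "$1"
    cd "$1"
    git config user.name "Demo User"
    git config user.email "demo@example.com"
    echo base > base.txt; git add base.txt; git commit -q -m "base"
    git switch -q -c feature
    echo feature > feature.txt; git add feature.txt; git commit -q -m "feature work"
    git switch -q main
    echo main > main.txt; git add main.txt; git commit -q -m "main work"
}

base=$(mktemp -d)

make_diverged_repo "$base/merged"
git merge -q --no-edit feature      # keeps both lines of history and adds a merge commit
git log --oneline --graph

make_diverged_repo "$base/rebased"
git switch -q feature
git rebase -q main                  # replays "feature work" on top of main: linear history
git log --oneline --graph
```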
Metadata