Intro
- Git is a standardised way of tracking your code and analyses, including their full history of changes.
- Mrshu says that it's a good idea to learn Git, because it:
  - Helps avoid "versioning hell" (you know, files like essay.doc, essay_v2.doc, essay_final.doc)
  - Gives you the ability to "jump in time"
  - Helps you make your work "reproducible"
  - Makes it a bit more straightforward to work on common (larger) projects with others
- And don't forget that Git != GitHub != GitLab. Git is the technology that powers GitHub and GitLab, which are "web frontends" (and businesses) that added things like pull requests on top of it.
Learn
- Start with this article that explains the main principles of Git using a fitting analogy with designing a car.
- Follow up by reading this tutorial, which sums up the very basics of how to work with Git(Hub).
- To get a better grasp of some of the terms, it's worth going through this one.
- When it's time to start playing around on some examples, this interactive website is the perfect place for it.
- If it still doesn't make sense, maybe a visual tutorial could be the way to go?
- Before pushing your first commit, it might be a good idea to find out how to write a good commit message, like this one, for example.
- If there is still anything unclear, Git docs/book will almost certainly have an answer.
- If you want to discover more commands you might encounter and you are a visual learner, this Visual Git Reference might be of interest to you.
- Still confused about some terms and hungry for more (sometimes a bit technical) explanations? Try How to explain git in simple words?
Snippets
- When anything goes wrong, these code snippets come in particularly handy; alternatively, try these from the CIA.
- Make .gitignore ignore files that were committed before they were added to it:
    git rm -r --cached .    # untrack everything (the files stay on disk)
    git add .               # re-add, this time respecting .gitignore
    git commit -m ".gitignore is now working"
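- Create a repo from scratch with this init.sh script, based on this, which can be run with sh init.sh repo-name from the directory that should contain the new repo (or, better still, ./init.sh repo-name if you run chmod u+x init.sh first); add --p as a second argument to also set up a Python virtual environment:
    #!/bin/zsh
    # Usage: ./init.sh repo-name [--p]
    mkdir "$1" || exit 1
    cd "$1" || exit 1
    if [ "$2" = "--p" ]; then
        python3 -m venv .venv
        . .venv/bin/activate    # "." instead of "source", so "sh init.sh" works too
    fi
    git init
    gh repo create "$1"         # create the GitHub repo with the GitHub CLI
    git pull origin main --rebase
    touch README.md
    atom ./                     # open the new project in the Atom editor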
- See your log of changes in a nice format:
    git log --pretty=oneline
    # or
    git log --oneline --decorate --graph --all
- Copy-paste commits from one branch to another (run it while on the target branch):
    git cherry-pick <commit>
- Print some nice stats about your changes (needs the separate diffstat tool):
    git diff main...origin/your-branch | diffstat -Cm
    git diff master 00aa0157f23f50151f74e4ba203deb8f11621946 . | diffstat -Cm
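- Get stats for commits per person per month (written to output.csv as name,month,count; the last awk step keeps names that contain spaces in one field):
    git log --pretty=format:"%h,%aN,%ad" --date=format:'%Y/%m' \
      | awk -F, '{print $2","$3}' | sort | uniq -c \
      | awk '{n = $1; $1 = ""; sub(/^ +/, ""); print $0 "," n}' > output.csv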
- Save your work for later:
    git stash                             # stash uncommitted changes to tracked files
    git stash push -m "message" [file]    # stash (optionally only [file]) with a message
    git stash list                        # list your stashes
    git stash apply stash@{0}             # re-apply a stash (it stays in the list)
    git stash drop stash@{0}              # delete a stash
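- Find a bad commit when there is a bug (more info here):
    git bisect start
    git bisect bad HEAD        # the current commit has the bug
    git bisect good xxxxxxx    # a commit known to be bug-free
    git bisect good|bad        # type one of the two for each commit Git checks out; repeat
    git bisect reset           # finish and return to where you started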
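The .gitignore snippet above can be tried out safely in a throwaway repository. This is a minimal sketch, assuming a POSIX shell and Git on the PATH; the repository, the file names (config.local, app.py) and the "Demo User" identity are all hypothetical.

```shell
#!/bin/sh
# Sketch: a file committed *before* it was listed in .gitignore stays tracked
# until the index is rebuilt. Repo, files and identity are hypothetical.
set -e

demo=$(mktemp -d)                         # throwaway repository
cd "$demo"
git init -q
git config user.name "Demo User"          # local identity, only for this demo
git config user.email "demo@example.com"

echo "secret=42" > config.local           # file that should never have been committed
echo "print('hi')" > app.py
git add . && git commit -q -m "Initial commit"

echo "config.local" > .gitignore          # ignoring it now has no effect...
git ls-files                              # ...config.local is still tracked

# The snippet: untrack everything (files stay on disk), re-add, commit
git rm -r -q --cached .
git add .
git commit -q -m ".gitignore is now working"

git ls-files                              # now only .gitignore and app.py
```

Keep in mind that the final commit deletes config.local from the repository for anyone who pulls it; only your local copy survives.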
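To illustrate the cherry-pick snippet above, here is a minimal sketch that copies one commit from a feature branch onto the main branch while leaving a second feature commit behind. It assumes a POSIX shell and Git on the PATH; the branch name feature, the files and the identity are hypothetical, and the main branch may be called main or master depending on your Git configuration.

```shell
#!/bin/sh
# Sketch: copy a single commit from a feature branch onto the main branch.
# Repo, branch names, files and identity are hypothetical.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt && git commit -q -m "Initial commit"
main=$(git rev-parse --abbrev-ref HEAD)    # "main" or "master", depending on your Git config

git checkout -q -b feature
echo "fix" > fix.txt
git add fix.txt && git commit -q -m "Add fix"
fix_commit=$(git rev-parse HEAD)           # the commit we also want on main
echo "wip" > wip.txt
git add wip.txt && git commit -q -m "Work in progress"

git checkout -q "$main"
echo "v2" > notes.txt                      # main moves on independently
git commit -q -am "Update notes"

git cherry-pick "$fix_commit"              # applies "Add fix" only, as a new commit
git log --oneline
```

Because the picked commit gets a new parent, it also gets a new hash; adding -x to git cherry-pick records the original hash in the commit message, which helps trace where it came from.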
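The commits-per-person-per-month snippet above can be checked on a throwaway repository with made-up history. This sketch uses a variant of the pipeline whose last awk step keeps multi-word author names in one field and writes name,month,count; the commit_as helper, the authors and the dates are hypothetical, and it assumes a POSIX shell with Git, awk, sort and uniq.

```shell
#!/bin/sh
# Sketch: run the commits-per-person-per-month pipeline on fake history.
# The commit_as helper, authors and dates are hypothetical.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q

# commit_as "Name" "ISO date" "message": empty commit with a chosen author and date
commit_as() {
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="demo@example.com" GIT_AUTHOR_DATE="$2" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="demo@example.com" GIT_COMMITTER_DATE="$2" \
    git commit -q --allow-empty -m "$3"
}
commit_as "Alice Smith" "2024-01-05T10:00:00" "a1"
commit_as "Alice Smith" "2024-01-20T10:00:00" "a2"
commit_as "Bob"         "2024-02-03T10:00:00" "b1"

# Pipeline: author,month per commit -> count duplicates -> name,month,count
git log --pretty=format:"%h,%aN,%ad" --date=format:'%Y/%m' \
  | awk -F, '{print $2","$3}' | sort | uniq -c \
  | awk '{n = $1; $1 = ""; sub(/^ +/, ""); print $0 "," n}' > output.csv
cat output.csv
```

The awk -F, split still assumes that author names contain no commas.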
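The stash commands above fit together roughly like this. A minimal sketch, assuming a POSIX shell and Git 2.13 or newer (for git stash push -m); the repository, the file essay.txt and the identity are hypothetical.

```shell
#!/bin/sh
# Sketch: park uncommitted work with git stash and bring it back later.
# Repo, file and identity are hypothetical.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "draft" > essay.txt
git add essay.txt && git commit -q -m "Initial draft"

echo "half-finished paragraph" >> essay.txt   # uncommitted change to a tracked file
git stash push -m "paragraph in progress"     # working tree is clean again
git stash list                                # shows stash@{0} with our message
clean=$(cat essay.txt)                        # only "draft" is left

git stash pop                                 # re-apply stash@{0} and drop it from the list
restored=$(tail -n 1 essay.txt)               # the paragraph is back
```

Unlike git stash apply, git stash pop also removes the stash from the list; untracked files are only stashed if you add -u (--include-untracked).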
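Instead of typing git bisect good|bad by hand, git bisect run can drive the search with a test command. A minimal sketch, assuming a POSIX shell and Git on the PATH; the five-commit history and the "bug" (the word BUG appearing in a hypothetical app.txt) are made up for illustration.

```shell
#!/bin/sh
# Sketch: let git bisect run find the first bad commit automatically.
# The history and the "BUG" marker in app.txt are hypothetical.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Five commits; commit 3 introduces the bug and it stays from then on
for i in 1 2 3 4 5; do
  if [ "$i" -ge 3 ]; then echo "version $i BUG" > app.txt; else echo "version $i" > app.txt; fi
  git add app.txt && git commit -q -m "Commit $i"
done
good=$(git rev-list --max-parents=0 HEAD)    # the root commit is known to be good

git bisect start
git bisect bad HEAD
git bisect good "$good"
# Test command: exit 0 (good) when BUG is absent, exit 1 (bad) when it is present
git bisect run sh -c '! grep -q BUG app.txt'
first_bad=$(git log -1 --format=%s refs/bisect/bad)   # subject of the first bad commit
git bisect reset
```

For git bisect run, exit code 0 means good, 125 means "skip this commit", and any other code from 1 to 127 means bad.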
Links
- Learn from Mrshu here
- Use lazygit for a simple terminal UI for Git commands
- Use Git in Python using GitPython
- Version-control large datasets, esp. in ML projects, using DVC
- Merging vs Rebasing
- Git's data model
- Git uses hashing via SHA-1, which maps arbitrary-sized inputs to 160-bit outputs (which can be represented as 40 hexadecimal characters, e.g. commit hashes). SHA-1 is, however, no longer considered secure, as practical collisions have been demonstrated (more info in Learn about cryptography).
- There are many different workflows, i.e. practices to follow when working on big projects.
- Analyse how a Git repo grows over time
- You can also store your private data inside a git repo
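The SHA-1 point in the Links list can be checked by hand: for file content (a "blob"), Git hashes a short header, namely the word blob, a space, the size in bytes and a NUL byte, followed by the content itself. A minimal sketch, assuming a POSIX shell, sha1sum (Linux) or shasum (macOS), and Git's default SHA-1 object format (repositories created with git init --object-format=sha256 use SHA-256 instead).

```shell
#!/bin/sh
# Sketch: a Git blob ID is SHA-1("blob <size>\0<content>").
set -e
cd "$(mktemp -d)"    # run outside any existing repository

# Portable SHA-1 helper: sha1sum on Linux, shasum on macOS
if command -v sha1sum >/dev/null 2>&1; then
  sha1() { sha1sum | cut -d ' ' -f 1; }
else
  sha1() { shasum -a 1 | cut -d ' ' -f 1; }
fi

git_id=$(printf 'hello\n' | git hash-object --stdin)   # Git's own computation
manual_id=$(printf 'blob 6\0hello\n' | sha1)           # "hello\n" is 6 bytes
echo "$git_id"
echo "$manual_id"                                      # the same 40 hex characters
```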