Version Control

Tuesday, May 6

Today we will…

  • Open-Ended Analysis from Lab 4
  • Questions about Midterm?
  • New Material
    • git/GitHub
    • Connect GitHub to RStudio
  • PA 6: Merge Conflicts – Collaborating within a GitHub Repo

Open-Ended Analysis from Lab 4

Research Question

  • Keep it general
  • We are doing an exploratory analysis - not statistical testing

Written Description

  • Do not use variable names or R function names in your written text.
  • Breaking up sections with headers can help with organization and flow.
  • Do not print out the data!!!

Open-Ended Analysis from Lab 4

Discussing Data

  • What would you need to tell someone who knew nothing about the data? You should probably include:
    • Data source
    • Observational unit / level (e.g. county and year)
    • Overview of what is included (e.g. demographic information and weekly median childcare costs for each county and year)
    • Years or geographies included (e.g. 2008-2018, CA only)

Open-Ended Analysis from Lab 4

Table Design

  • Think about the number of rows/columns – is it readable?
  • How many decimal points are needed?
  • Change row/column names to be understandable.

Plot Design

  • What can I investigate with a plot that is difficult with a table?
  • What type of plot will best display the data?
  • What order of elements will best display the comparison you want to make?
  • Think about: colors, order of categories, if a legend is needed, etc.

Example: Answering a Research Question

Describing Tables

library(tidyverse)  # dplyr, tidyr, stringr, forcats, ggplot2
library(knitr)      # kable()

# Assumes ca_childcare (the Lab 4 data) is already in your environment.
# Reshape to one row per county, year, childcare type, and age group,
# then classify each row by property taxes per capita.
ca_childcare_long <- ca_childcare |>
  select(county_name, study_year, mc_infant:mfcc_preschool,
         total_property_taxes, total_pop, me_2018) |>
  pivot_longer(cols = starts_with("mc") | starts_with("mfcc"),
               names_to = c("type", "age_group"),
               names_sep = "_",
               values_to = "med_cost") |>
  filter(!is.na(total_property_taxes)) |>
  mutate(tax_per_cap = total_property_taxes / total_pop,
         wealth_level = case_when(tax_per_cap <= quantile(tax_per_cap, .25) ~ "Lower 1/4",
                                  tax_per_cap <= quantile(tax_per_cap, .75) ~ "Middle Half",
                                                                       TRUE ~ "Upper 1/4"))

# Average the median weekly costs within each wealth level, type, and age group.
ca_childcare_long |>
  group_by(wealth_level, type, age_group) |>
  summarize(mean_cost = mean(med_cost)) |>
  pivot_wider(values_from = mean_cost,
              names_from = type) |>
  filter(age_group != "preschool") |>
  mutate(age_group = str_to_title(age_group),
         perc_dif = mc / mfcc) |>  # ratio of center-based to family-based cost
  rename(price_center = mc,
         price_family = mfcc) |>
  kable(digits = 2)
wealth_level  age_group  price_center  price_family  perc_dif
Lower 1/4     Infant           254.67        158.89      1.60
Lower 1/4     Toddler          184.26        148.55      1.24
Middle Half   Infant           272.55        172.92      1.58
Middle Half   Toddler          194.56        159.85      1.22
Upper 1/4     Infant           276.37        182.78      1.51
Upper 1/4     Toddler          196.26        170.12      1.15

Can you tell??

Without looking at the code, what does each cell in this table represent?

Describing Plots

ca_childcare_long |> 
  filter(age_group != "preschool") |> 
  mutate(age_group = str_to_title(age_group),
         type = fct_recode(type,
                           "Center-Based" = "mc",
                           "Family-Based" = "mfcc")) |> 
  group_by(wealth_level, age_group, type, study_year) |> 
  summarize(upper_q = quantile(med_cost, .75),
            lower_q = quantile(med_cost, .25),
            med_cost = median(med_cost),
            ) |> 
  ggplot(aes(x = study_year, y = med_cost,
             group = interaction(type, wealth_level))) +
  geom_ribbon(aes(ymin = lower_q, ymax = upper_q, fill = type),
                alpha = .15,
                linetype = 0) +
  geom_line(aes(color = type)) +
  geom_point(aes(shape = fct_reorder2(wealth_level, 
                                      .x = study_year,
                                      .y = med_cost),
                color = type)) +
  facet_wrap(vars(age_group)) +
  scale_color_manual(name = "Childcare Type",
                     values = c("#045a8d","#fd8d3c")) +
  scale_fill_manual(name = "Childcare Type",
                     values = c("#045a8d","#fd8d3c")) +
  labs(subtitle = "Median Weekly Cost ($)",
       y = "",
       x = "Year",
       shape = "County Wealth") +
  scale_x_continuous(breaks = c(2008, 2010, 2012, 2014, 2016, 2018))

Describing Plots

ca_childcare_long |> 
  filter(age_group != "preschool") |> 
  mutate(age_group = str_to_title(age_group),
         type = fct_recode(type,
                           "Center-Based" = "mc",
                           "Family-Based" = "mfcc")) |> 
  mutate(week_income = me_2018 / 52,
         cost_income_rat = med_cost / week_income) |> 
  filter(study_year == 2018) |> 
  ggplot(aes(x = type,
             y = cost_income_rat,
             fill = wealth_level)) +
  geom_boxplot() +
  facet_wrap(vars(age_group)) +
  labs(x = "Childcare Type",
       subtitle = "Ratio of Median Weekly Childcare Cost to\nMedian Weekly Income",
       y = "",
       fill = "County Tax Income")

Open-Ended Analysis from Lab 4

Discussing Figures

  • Describe / explain what your table / plot is showing before analyzing it
  • If you report a statistical summary, be clear about what the summary is taken over (e.g. all counties within a year)
  • Make it clear what one point on your plot represents

Questions about Midterm Exam?

Version Control

What is version control?

A process of tracking changes to a file or set of files over time so that you can recall specific versions later.

git/GitHub Basics

Git vs GitHub

Git

  • A system for version control that manages a collection of files in a structured way.
  • Uses the command line or a GUI.
  • Git is local.

GitHub

  • A cloud-based service that lets you use Git across many computers.
  • Basic services are free, advanced services are paid (like RStudio!).
  • GitHub is remote.

Why Learn GitHub?

  1. GitHub provides a structured way for tracking changes to files over the course of a project.
    • Think Google Docs or Dropbox history, but more structured and powerful!
  2. Share your work transparently!
  3. Designed for programming collaboration and project management.
  4. You can host a URL of fun things (like the class text, these slides, a personal website, etc.) with GitHub Pages.

In this class…

  • We just want to introduce you to the basics of Git and GitHub
  • There is a lot more cool functionality in both!
  • Come chat with me if you want to learn more

Git Repositories

Git is based on repositories.

  • Think of a repository (repo) as a directory (folder) for a single project.
    • This directory will likely contain the code, documentation, data, to-do lists, etc. associated with the project.
    • You can link a local repo with a remote copy on GitHub.

  • To create a repository, you can start with your local computer or you can start with the remote copy.

Git Repositories

What does Git do?? (very basics)

By default:

  • Git tracks changes in any documents in a given repo
  • Git records any changes to lines in the document since the last version of the document was saved (committed)

You need to:

  • Tell Git if there are files you don’t want it to track (.gitignore)
  • Tell Git when to save changes to the repo (commit)
    • Once you do that, you can always look back on your previous versions and changes!

.gitignore

Sometimes there are files that you do not want to track.

  • A .gitignore file specifies the files that Git should intentionally ignore.
    • Annoyingly, .gitignore is a hidden (“invisible”) file in many file browsers.
  • Often you want to ignore machine generated files (e.g., /bin, .DS_Store) or files/directories that you do not want to be shared (e.g., solutions/).
  • We want to ignore .Rproj files!
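
As a rough sketch of how these patterns behave, the commands below build a throwaway repo in a temporary folder and ask Git which files it will skip. The file names and patterns are made up for illustration and are not taken from any real course repo.

```shell
# Work in a throwaway folder so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# A hypothetical .gitignore: one pattern per line.
cat > .gitignore <<'EOF'
.DS_Store
*.Rproj
solutions/
EOF

# One file we want tracked, three we do not.
mkdir solutions
touch analysis.qmd .DS_Store lab.Rproj solutions/lab4-key.qmd

# Untracked files Git can see (ignored files are left out of this list).
git status --short

# Ask Git directly which of these paths are ignored.
ignored=$(git check-ignore .DS_Store lab.Rproj solutions/lab4-key.qmd)
echo "$ignored"
```

A trailing slash (as in solutions/) matches a whole directory, and * is a wildcard, so *.Rproj covers any RStudio project file.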

.gitignore example

Dr. C’s .gitignore for her STAT 331 materials repo

Actions in Git

Cloning a Repo


Create an exact copy of a remote repo on your local machine.
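
A minimal sketch of cloning from the command line. To keep it runnable without a GitHub account, a local “bare” repo stands in for the remote (that stand-in is an assumption of this demo). With a real repo you would pass its GitHub URL to git clone instead of a folder path.

```shell
# A local bare repo plays the role of the remote on GitHub (demo assumption).
sandbox=$(mktemp -d)
remote="$sandbox/remote.git"
git init --quiet --bare "$remote"

# Seed the stand-in remote with one commit so there is something to copy.
git clone --quiet "$remote" "$sandbox/seed" 2>/dev/null
cd "$sandbox/seed"
git config user.name "Jane Doe"
git config user.email "jane@example.org"
echo "# Lab 4" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet -u origin HEAD

# The clone itself: an exact local copy of the remote, history included.
git clone --quiet "$remote" "$sandbox/my-copy"
cd "$sandbox/my-copy"
git log --oneline   # the commit history came along
git remote -v       # "origin" points back to where we cloned from
```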

Committing Changes

Tell git you have made changes you want to add (save) to the repo.

  • Also provide a commit message – a short label describing what the changes are and why they exist.

The red line is a change we commit (add) to the repo.

The log of these changes (and the file history) is called your git commit history.

  • You can always go back to old copies!
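
Here is a small sketch of two commits in a throwaway repo, followed by a look back at the older version of a file. The file name, name, email, and commit messages are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Jane Doe"           # placeholder identity, this repo only
git config user.email "jane@example.org"

# First change: create a file, stage it, and commit with a short message.
echo 'library(tidyverse)' > analysis.R
git add analysis.R
git commit --quiet -m "Start analysis script"

# Second change: add a line, check what changed, then commit again.
echo 'library(knitr)' >> analysis.R
git diff                                   # the new line is marked with a +
git add analysis.R
git commit --quiet -m "Load knitr for tables"

# The commit history, newest first.
git log --oneline

# An old copy is still there: the file as it was one commit ago.
git show HEAD~1:analysis.R
```

In RStudio, the Git pane's checkboxes and Commit button do the staging (git add) and committing steps for you.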

Commit Tips

  • Use short, but informative commit messages.
  • Commit small blocks of changes – commit every time you accomplish a small task.
    • You’ll have a set of bite-sized changes (with description) to serve as a record of what you’ve done.
  • With frequent commits it’s easier to
    • find the issue when you mess up!
    • read back through what you changed!

Pushing Changes


Update the copy of your repo on GitHub so it has the most recent changes you’ve made on your machine.

Pulling Changes


Update the local copy of your repo (the copy on your computer) with the version on GitHub.

Pushing and Pulling

Workflow

When you have an existing local repo:

  1. Pull the repo (especially if collaborating).
  2. Make some changes locally.
  3. Commit the changes to git.
  4. Pull any changes from the remote repository (again!).
  5. Resolve any merge conflicts.
  6. Push your changes to GitHub.
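
The six steps above can be sketched on the command line as follows. A local bare repo stands in for GitHub (an assumption of this demo), and since nobody else is pushing to it, step 5 has nothing to resolve.

```shell
# Setup: a stand-in remote plus a local clone that is linked to it.
sandbox=$(mktemp -d)
remote="$sandbox/remote.git"
git init --quiet --bare "$remote"
git clone --quiet "$remote" "$sandbox/me" 2>/dev/null
cd "$sandbox/me"
git config user.name "Jane Doe"
git config user.email "jane@example.org"
echo "# Group project" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet -u origin HEAD

# 1. Pull before starting.
git pull --quiet

# 2. Make some changes locally.
echo "Notes from today" > notes.md

# 3. Commit the changes.
git add notes.md
git commit --quiet -m "Add meeting notes"

# 4. Pull again in case a collaborator pushed in the meantime.
git pull --quiet

# 5. No conflicts to resolve in this solo demo.

# 6. Push the new commit to the remote.
git push --quiet

# Local and remote should now be in sync.
git status --short --branch
```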

On the left is a quote from Hadley Wickham and Jenny Bryan: “Using a Git commit is like using anchors and other protection when climbing… if you make a mistake you can’t fall past the previous commit. Commits are also helpful to others, because they show your journey, not just the destination.” On the right, two little monsters climb a cliff face. Their ropes are secured by several anchors, each labeled “Commit”. Three monsters on the ground support the climbers.

Artwork by Allison Horst

Merge Conflicts

These occur when git encounters conflicting changes.


Merge Conflicts

  1. Maybe you are working in real time on the same line of code or text as a collaborator.
  2. Maybe you forgot to push your changes the last time you finished working.
  3. Maybe you forgot to pull your collaborators’ changes before you started working this time.
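
To see what a conflict looks like, here is a sketch with two made-up collaborators, Alice and Bob, who each change the same line. A local bare repo stands in for GitHub, and the --no-rebase flag is passed so that the pull merges on any recent Git version.

```shell
sandbox=$(mktemp -d)
remote="$sandbox/remote.git"
git init --quiet --bare "$remote"

# Alice creates the file and pushes it.
git clone --quiet "$remote" "$sandbox/alice" 2>/dev/null
cd "$sandbox/alice"
git config user.name "Alice"
git config user.email "alice@example.org"
echo "title: Lab 4" > report.txt
git add report.txt
git commit --quiet -m "Add report"
git push --quiet -u origin HEAD

# Bob clones the repo, so both start from the same version.
git clone --quiet "$remote" "$sandbox/bob"
cd "$sandbox/bob"
git config user.name "Bob"
git config user.email "bob@example.org"

# Alice edits the title line and pushes.
cd "$sandbox/alice"
echo "title: Childcare Costs" > report.txt
git commit --quiet -am "Rename report"
git push --quiet

# Bob forgets to pull first and edits the same line.
cd "$sandbox/bob"
echo "title: Lab 4 Analysis" > report.txt
git commit --quiet -am "Expand title"

# Now Bob pulls: Git cannot pick a winner, so it reports a conflict.
git pull --no-rebase --quiet || true

# Git marks both versions of the line inside the file.
cat report.txt
```

The text between <<<<<<< and ======= is Bob's local version, and the text between ======= and >>>>>>> is what came from the remote. To resolve the conflict, edit the file so it contains the line you want, delete the three marker lines, then commit and push.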

Merge Conflicts

We will work on resolving merge conflicts today!


But when all else fails…

delete your local repo and clone again.

A bunny and a mouse, both looking stressed and sweaty, look on at a smoking laptop as flames start to grow from it. Text reads: Plan on it.

Artwork by Allison Horst

Tips for Avoiding Merge Conflicts

  • Always pull before you start working and always push after you are done working!
    • If you do this, you will only have problems if two people are making local changes to the same line in the same file at the same time.
  • If you are working with collaborators in real time, pull, commit, and push often.
  • Git commits lines – lines of code, lines of text, etc.
    • Practice good code format – no overly long lines!

Connect GitHub to RStudio

Install + Load R Packages

Work in your console or an R script for this.

  1. Install and load the usethis package.
install.packages("usethis")
library(usethis)
  2. Install and load the gitcreds package.
install.packages("gitcreds")
library(gitcreds)

Configure git

  1. Tell git your email and GitHub username.
use_git_config(user.name = "JaneDoe2", user.email = "jane@example.org")

(Nothing should happen.)
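
Since the command prints nothing, one way to check that it worked is to read the settings back. This sketch assumes the values were stored in your user-level (global) Git configuration, which is the default for use_git_config(). Run it in a terminal, not the R console.

```shell
# Read back the name and email Git will attach to your commits.
name=$(git config --global --get user.name || echo "user.name is not set")
email=$(git config --global --get user.email || echo "user.email is not set")
echo "$name"
echo "$email"
```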

Generate your Personal Access Token

  1. Generate a PAT.
create_github_token()
  • This will open GitHub and ask you to log in.
  • Fill in a Note and an Expiration (AT LEAST 60 days from now).
  • Click Generate Token.

Store your PAT

  1. Copy your PAT.
  2. Run the following code.
gitcreds_set()

When prompted to Enter password or token:, paste your PAT.

Verify your PAT

  1. Let’s verify.
git_sitrep()

PA 6: Merge Conflicts

You will be completing this activity in groups of 4.

IMPORTANT

This activity will only work if you follow the directions in the exact order that I have specified them. Do not work ahead of your group members!

Cartoon of the GitHub octocat mascot hugging a very sad looking little furry monster while the monster points accusingly at an open laptop with MERGE CONFLICT in red across the entire screen. The laptop has angry eyes and claws and a wicked smile. In text across the top reads gitHUG with a small heart.

Artwork by Allison Horst

To do…

  • PA 6: Merge Conflicts
    • Due Monday, 5/6 at 11:59pm – TODAY.
  • Midterm Exam
    • Wednesday, 5/8 + 48 hours.

Office Hours

Wednesday from 9-10 am and 1-2 pm

None scheduled on Friday but available upon request.