April 18, 2024
The ability for other people with a similar level of skill to reproduce your work.
A fundamental part of research
It is also best practice, because it allows others to reproduce your work.
We need to have confidence that our research is good quality and we are doing good science
Peter Fisher, University of Leicester, UK (1993) compared seven different pieces of GIS software doing a viewshed analysis
and got seven (slightly) different results!
Fisher also discovered a major error in one piece of software which gave completely incorrect results.
Highlights the need for:
Fisher, P. F. (1993). Algorithm and implementation uncertainty in viewshed analysis. International Journal of Geographical Information Systems, 7(4), 331–347. https://doi.org/10.1080/02693799308901965
Riggs & Dean, Colorado State (2007) did a similar investigation on viewshed analysis
Things have improved since 1993, but different software packages still give different results.
Riggs, P.D. and Dean, D.J. (2007), An Investigation into the Causes of Errors and Inconsistencies in Predicted Viewsheds. Transactions in GIS, 11: 175-196. https://doi.org/10.1111/j.1467-9671.2007.01040.x
“[…] when the same analysis steps performed on the same dataset […] produce the same answer.” (Turing Way)
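The definition above can be checked directly in code. The following is a minimal sketch, assuming a made-up analysis (`run_analysis`, a bootstrap estimate of a mean) and made-up data; neither comes from the original slides. It runs the same steps twice on the same dataset and compares a hash of the two answers.

```python
import hashlib
import random
import statistics


def run_analysis(values, seed=42):
    """Hypothetical analysis: bootstrap estimate of the mean."""
    rng = random.Random(seed)  # fixed seed so the random resampling is repeatable
    means = []
    for _ in range(200):
        sample = [rng.choice(values) for _ in values]  # resample with replacement
        means.append(statistics.mean(sample))
    return round(statistics.mean(means), 6)


def fingerprint(result):
    """Hash the result so two runs can be compared exactly."""
    return hashlib.sha256(repr(result).encode("utf-8")).hexdigest()


data = [2.1, 3.4, 1.9, 5.6, 4.2]  # made-up example data
first = fingerprint(run_analysis(data))
second = fingerprint(run_analysis(data))
print("Same answer on both runs:", first == second)
```

Without the fixed seed, the two runs would usually differ slightly, which is the kind of hidden variation that makes work hard to reproduce.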
Findable
Accessible
Interoperable
Reusable
Some journals & conferences ask you to submit code along with your paper
AGILE - https://reproducible-agile.github.io/
Anyone (with a similar level of skills) should be able to reproduce your research and benefit from it.
One reason for open source tools.
If you do analysis in ArcGIS Pro, you need ArcGIS Pro to recreate that analysis.
If you don’t have ArcGIS Pro, what do you do?
Other kinds of work also benefit from being reproducible:
quarterly or annual reports
repeating work over 200 areas, 50 business units, or 365 days
coming back to your work 6 months later - “please can you update this with this new data?”
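Repeating the same work over many areas is where a script pays off. As a rough sketch, assuming a hypothetical `summarise_area` function and made-up measurements (neither is from the slides), one function is written once and applied to every area:

```python
from statistics import mean


def summarise_area(name, values):
    """Return a small summary record for one area."""
    return {"area": name, "n": len(values), "mean": round(mean(values), 2)}


# Made-up example data: area name -> measurements
areas = {
    "Area A": [10, 12, 14],
    "Area B": [20, 22],
    "Area C": [5, 5, 5, 5],
}

# The same steps run for every area; adding a 200th area needs no new code
report = [summarise_area(name, values) for name, values in areas.items()]
for row in report:
    print(row)
```

When new data arrives six months later, only the `areas` input changes and the script is run again.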
Documenting what you did is standard - Methods
If you can do your analysis in a script, then you can also share that script
ArcGIS Pro / QGIS
R / Python
To replicate a piece of work, you need to know what software was used
What version
What libraries / packages
What version of libraries or packages
Can record this in text
Or in code
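Recording versions in code could look something like the sketch below. It uses only the Python standard library (`platform` and `importlib.metadata`); the function name `describe_environment` and the choice of packages are illustrative assumptions, not part of the original slides.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect the Python version, OS, and versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Example package names; replace with whatever your analysis imports
env = describe_environment(["numpy", "matplotlib"])
print(json.dumps(env, indent=2))
```

Saving this output alongside your results gives readers the "what version" answers without relying on memory.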
Docker gives you a big box to put all this in
Then you say - I used this Docker environment
AGILE has a very nice overview
If your project evolves over time, you may need to use version control
Provides a snapshot of your code at a specific point in time - I used this version of my code
Version Control (Git) allows you to do this, while still developing your code, and to see the differences (diff).
GitHub allows you to collaborate with other people on this.
Also works for writing and presentations.
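To give a feel for what a diff shows, here is a small sketch using Python's standard `difflib` module rather than Git itself. The two script versions are made-up text (the `viewshed` call inside them is hypothetical and never executed); the output format is similar to what `git diff` prints.

```python
import difflib

# Two snapshots of the same (made-up) script, as lists of lines
old_version = """buffer = 100
result = viewshed(dem, observer)
""".splitlines(keepends=True)

new_version = """buffer = 250
result = viewshed(dem, observer)
""".splitlines(keepends=True)

# Lines starting with "-" were removed, lines starting with "+" were added
diff = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="analysis.py (old)",
        tofile="analysis.py (new)",
    )
)
print("".join(diff))
```

Git stores each snapshot for you, so you can produce this kind of comparison between any two points in your project's history.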
Markdown allows you to write plain text with tags - stars, hashes, etc.
Can also embed analysis code in it (see RMarkdown and Quarto)
LaTeX is a more fully developed typesetting system (or Markdown is a much simpler alternative to LaTeX)
RMarkdown allows you to run R code
Quarto allows you to run other code (Python, R, etc.)
This presentation is written in Quarto.
Syntax | Output |
---|---|
`*Italic*` | *Italic* |
`**Bold**` | **Bold** |
`~~strikethrough~~` | ~~strikethrough~~ |
`[Link](url)` | [Link](url) |
`$i\hbar \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m} \nabla^2 \Psi + V(\mathbf{r},t) \Psi$` | \(i\hbar \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m} \nabla^2 \Psi + V(\mathbf{r},t) \Psi\) |
```{r}
#| label: "iris-plot"
#| echo: TRUE
#| fig-format: svg
#| cache: TRUE
# Scatter plot of sepal dimensions, coloured by species
data(iris)
plot(iris$Sepal.Length, iris$Sepal.Width,
     main = "Scatter Plot of Sepal Length vs Sepal Width",
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)",
     pch = 16, col = iris$Species)
```
defaults to knitr engine (you can override the engine with engine: jupyter)
```{python}
#| label: fig-polar
#| fig-cap: "A line plot on a polar axis"
import numpy as np
import matplotlib.pyplot as plt
r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(
    subplot_kw = {'projection': 'polar'}
)
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()
```
defaults to jupyter engine
You can use Python and R code together using the reticulate package
viewof bill_length_min = Inputs.range(
  [32, 50],
  {value: 35, step: 1, label: "Bill length (min):"}
)
viewof islands = Inputs.checkbox(
  ["Torgersen", "Biscoe", "Dream"],
  { value: ["Torgersen", "Biscoe"],
    label: "Islands:"
  }
)
Strengths 💪
Weaknesses 😢
By setting up your teaching materials in a reproducible manner, you demonstrate the value of reproducibility directly
Images: Scriberia with The Turing Way community (License: CC BY 4.0)
💻 Slides: Slides are publicly available at github.com/jansim/dra-reproducible-materials
📦 Software: Reproducible slides built with Quarto and deployed to GitHub Pages using GitHub Actions (details in the Quarto docs)
Source: Source code is available at github.com/jansim/dra-reproducible-materials
🖲️ DOI: (generated using GitHub + Zenodo, see GitHub docs)
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
💬 Contact: We welcome any feedback via email or GitHub issues. Thank you!
lennart.wittkuhn@uni-hamburg.de
lennartwittkuhn.com
GitHub
Mastodon
Source: Source code is available at Github.com/nickbearman/reproducibility-replicability-gds-penn