Code, figures, maths... Start writing your reports in R Markdown!
Intro
Sharing the progress of your analysis using Word can be tricky, because there are always things to edit until the very last minute. Copying and pasting those changes between R and Word while managing version control is just unrealistic.
The first time I used R Markdown was also the first time I used any markdown editor, and I fell in love with it instantly. In this blog, I will share my experience of writing reports in R Markdown. I am going to walk you through an rmd document and talk about the essential elements of a professional PDF report for data science projects.
I find that writing down my learnings helps me absorb the knowledge better; this blog is also my personal reference for future projects. As a result, it may feel a bit verbose to you. If you have never used R Markdown before, the two tutorials below may provide a more structured way to learn. So help yourself to whatever suits you best!
What's R Markdown?
To my understanding, R Markdown is a tool that allows you to write documents within the R environment using Markdown syntax, with LaTeX doing the typesetting behind the scenes for PDF output. Instead of highlighting and clicking as you do in Word, you use commands to define fonts, spacing, and so forth. The best thing is, you can include any code and its output in your report by putting it in a code chunk. This is super useful if you need to explain your analysis process to someone!
Later I also found out that you can create presentation slides and dashboards with R Markdown. I personally haven't used it for these purposes and am keen to hear any thoughts on that. Instead, I use Canva (presentation) and Shiny/other BI tools (dashboard).
If you need some inspiration on what R Markdown can accomplish, and want to practise by following others' work, the Gallery is always the best go-to.
R Markdown vs Jupyter Notebook
Code and outputs: can't Jupyter Notebook do the same?
Although Jupyter Notebook can be used to demonstrate the process of your work, I don't think it was built for report writing purposes. If you use Python and want to produce a nicely knit PDF report, it may be worth looking into this article:
(I have no experience combining Python and R Markdown; apologies if this article is not actually useful)
Create an rmd document
Creating an R Markdown file is very simple. Within RStudio, click File > New File > R Markdown and you will see the prompt window below. Remember to select PDF as the output format. That way you can preview how your work looks along the way. Also, some of the HTML configurations won't work for PDF, so don't assume the outputs are interchangeable.
Having said that, if you do want to switch between PDF and HTML, it is easy to do so when you knit the document (see below).
Once you click OK, a default rmd template like the one below will appear. Basically, a report document consists of three parts: YAML, code chunk, and markdown text. By enriching each of them, you eventually reach the report that you would like to produce.
Let's walk through how!
YAML
Wikipedia says YAML is "a human-readable data-serialization language". All I know is, it's something that stays at the top of your rmd file and is used to configure the layout. In the default document, the YAML only has three elements: title, author, and output. The first two are printed on the first page of your report; the last one specifies the format of the output.
Here's the YAML I normally have for academic report writing. I specify the output format as bookdown::pdf_document2, and set the figure location, font and spacing, and reference style. It really is quite human-readable, as Wikipedia suggests, but don't worry: I will explain the settings one by one below.
---
output:
  bookdown::pdf_document2: default
  citation_package: biblatex
toc: yes
header-includes:
  \usepackage{caption}
  \usepackage{float}
  \floatplacement{figure}{H}
  \usepackage{setspace}
  \onehalfspacing
  \hypersetup{colorlinks=true, linkcolor=blue, urlcolor=blue, citecolor=blue}
  \pagenumbering{gobble}
bibliography: ["reference.bib"]
link-citations: true
fontsize: 12pt
fontfamily: times
csl: apa.csl
---
1. Cross-referencing in report
The default PDF output didn't work well with my cross-referencing code (and I have no answer as to why). If you have a lot of tables and figures in your report, and would like them to be clickable when you mention them in the paragraphs, use the bookdown::pdf_document2 output instead. You can read up about bookdown here. Then, you can refer to the tables and figures by their names (which you give them in the code chunks).
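For reference, bookdown's syntax for this is \@ref(fig:label) for figures and \@ref(tab:label) for tables, where the label is the code chunk name. Here is a minimal sketch of how that looks in the markdown text. The chunk names salesPlot and salesTable are hypothetical, and both chunks are assumed to produce a captioned figure and a captioned table respectively, since references only resolve when a caption is set:

```markdown
Sales peaked in the third quarter (see Figure \@ref(fig:salesPlot)).
The underlying numbers are listed in Table \@ref(tab:salesTable).
```

When knitted, each \@ref(...) is replaced by the figure or table number and becomes a clickable link.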
2. Table of Contents depth
The default depth is two levels but you can change it by adding toc options to YAML.
output:
  bookdown::pdf_document2: default
  citation_package: biblatex
toc: yes
toc-depth: 4
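As a point of comparison, the rmarkdown documentation nests format options such as toc and toc_depth under the output format itself. A minimal sketch of that form follows; I have not tested how it interacts with the other settings in my full YAML, so treat it as an alternative to try rather than a drop-in replacement:

```yaml
output:
  bookdown::pdf_document2:
    toc: true       # print a table of contents
    toc_depth: 4    # include headings down to level 4
```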
3. Header-includes
Under header-includes, you can load different LaTeX packages to suit your layout needs. Each one starts with \usepackage{PackageName}, followed by any settings for it. For example, I loaded the float package and specified that figures should always be placed right where they appear in the source, after the corresponding paragraph. Without this setting, LaTeX places each figure wherever it considers optimal in terms of space, so you may have Figure 3 next to a paragraph related to Figure 1, which I find rather confusing.
4. References
Same as LaTeX, you need to store all the references in a bib file and call it. You will also need to download a csl file and put it in the same folder as your project. R Markdown uses it to understand the reference style you are looking for, such as APA 7. Make sure you specify both of them in YAML as I did.
Another tip: you may want to un-number the References section. So instead of having 7. References, you only want References. To remove the section number, you can add {-} to the section name in your markdown text.
# SECTION NAME {-}
Code chunk
The default document has three code chunks: the setup, a table output, and a figure output. Apart from the first one, you can add as many code chunks as you like. Usually, one table/figure each.
1. Setup
Here is where you load all the libraries needed to run the code within the entire rmd document. If you want to print a table variable created within an R script, you also need to source that script. My sample code for the setup:
```{r setup, include=FALSE}
# LOAD PACKAGES ----
library(tidyverse)
library(knitr)
library(kableExtra)
library(ggplot2)
# RUN R SCRIPTS ----
source('sample_code1.R') # must run first
source('sample_code2.R') # must run first
source('sample_code3.R')
source('sample_code4.R')
# stay last
source('sample_code5.R')
```
Note that I have some comments on which scripts to run first and which to keep last. This is because of variable dependencies: a script that uses a variable must run after the script that creates it, so changing the order can cause errors when R cannot find that variable.
The alternative is to copy and paste all the code into the rmd file, but I don't think that is good project management practice unless your project is very small.
2. Tables, or should I say... kables
The kable() function (from the knitr package) and the kableExtra package are must-haves for PDF reports. Here is a snippet of one table, but I highly recommend downloading the kableExtra manual written by Hao Zhu. It provides examples of the most popular styles of tables. It is one of the best things I ever encountered among all those Google searches!
Note that the first line of the code chunk says r codeChunkName, echo = FALSE
The codeChunkName is what you use to refer to the table in paragraphs. echo controls whether the code is printed, and eval controls whether the code is run at all (and therefore whether the table is produced); both default to TRUE.
```{r codeChunkName, echo = FALSE}
kable(tableName, booktabs = TRUE,
      caption = "Caption of table can be entered here") %>%
  kable_styling(font_size = 7,
                latex_options = c("striped", "hold_position"),
                full_width = FALSE) %>%
  pack_rows("row title", 2, 3) %>%
  pack_rows("row title", 4, 11)
```
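If you want to try kable() before wiring it into a report, the sketch below runs in a plain R session. It uses the built-in mtcars dataset as stand-in data (my own choice for illustration, not part of the report above) and sets format = "latex" explicitly, which is the format knitr would otherwise pick by itself when knitting to PDF:

```r
library(knitr)

# Built-in dataset, used here only as stand-in data
sample_table <- head(mtcars[, c("mpg", "cyl", "hp")], 5)

# booktabs = TRUE gives the cleaner rule style used in most papers
tbl <- kable(sample_table, format = "latex", booktabs = TRUE,
             caption = "First five rows of mtcars")

# Print the generated LaTeX to inspect it before adding kableExtra styling
print(tbl)
```

Looking at the raw LaTeX is a quick way to check that the caption and rules are there before adding kable_styling() or pack_rows() on top.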
3. Figures
To include figures in your report, you can either plot them directly in the code chunk, or use the include_graphics() function to call a saved image in your repository. I prefer the latter as I always like to have a copy of the figures; it also keeps the rmd file clearer and more concise.
Similar to the table output, you specify the code chunk name, figure caption, graph width, and alignment as shown below. Note that all chunk options must sit on a single line.
```{r codeChunkName, fig.cap = "Caption of figure can be entered here", out.width = "85%", fig.align = 'center', echo = FALSE}
knitr::include_graphics("/cloud/project/figure1.jpg")
```
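The save-then-include workflow can be sketched in a few lines of base R. The file location and the plot itself are placeholders of mine: the sketch writes to a temporary folder so it runs anywhere, whereas in a real project you would save to a folder inside the project:

```r
library(knitr)

# Hypothetical path: a temp folder so the sketch runs anywhere;
# in a real project this would point inside the project folder
fig_path <- file.path(tempdir(), "figure1.png")

# Save the plot to disk so a copy of the figure is kept
png(fig_path, width = 1600, height = 1200, res = 200)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
dev.off()

# Inside a code chunk, this line embeds the saved image in the report
include_graphics(fig_path)
```

In practice the saving part would live in one of the sourced R scripts, and only the include_graphics() line would stay in the rmd file.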
Markdown text
Finally, let's write some reports!
Configuring the YAML and code chunks may seem troublesome for just one report, but the good thing is you only need to do it once. R Markdown has saved me so much time, and I'd say it's totally worth spending an hour or two to set that up.
Just like Jupyter Notebook and the other markdown editors, writing text in R Markdown is straightforward, and you use different symbols to change the output. This is a screenshot of the R Markdown cheatsheet.
1. Cover page
It took me a while to find out how to add a cover page with images to the rmd file. Turns out it's pretty easy. Here's the code snippet I use, which was found on the internet. I tried to find the original post but had no luck. Whoever shared this out there - thank you!
The \newpage command is important because it forces R Markdown to start a new page. Having it at the beginning and the end of this piece of code creates a cover page. You can also use \newpage when starting a new section.
\newpage
\begin{centering}
```{r logo, echo=F, out.width="100%"}
knitr::include_graphics("/cloud/project/logo.png")
```
\Large
{\bf Sample Report \\Move to the next line}
\vspace{0.5 cm}
\normalsize
By
\vspace{0.5 cm}
{\bf Lillian Lu}
\vspace{6 cm}
\normalsize
Organisation \\
Department \\
Whatever you want to put \\
Just keep adding
\end{centering}
\newpage
2. Reference location
If you include a bib file in the YAML, it's set to display at the very end of your document. This is an issue when you have an Appendix. Use <div id="refs"></div> to specify the location of your reference list.
# References {-}
<div id="refs"></div>
\newpage
3. Math notation
It's super easy to include math symbols and equations in your report. This is a screenshot of a web page written by R Pruim; it provides a lot of examples for you to refer to.
Use $ to start your equation and close it with another $ to have it embedded in your text. If you want it as a standalone paragraph, use $$ for both ends instead.
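As a small illustration (the formulas are my own picks, the sample mean and sample variance), the first line below puts an equation inside a sentence, and the $$ block renders on its own line:

```latex
The sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, which sits inside the sentence.

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
$$
```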
So there you have it! A professional-looking data science report!
Remember to knit it often
All of the knowledge above was collected bit by bit over the last year. It may not represent best practice for R Markdown, but it has certainly been useful to me. You may have your personal style and needs, and it's important you knit often while you write, to ensure the output is as expected.
Knitting is straightforward. You can hit the Knit button to output the format you specified in YAML, or click the down arrow to change to other formats. So don't worry if you selected HTML but want to change to PDF later; this is very easy to do. However, it may cause some errors in your code, so make sure you check for that (for example, table settings that apply to HTML output may not work for PDF).
That's the end of this article. Thank you to all the contributors on StackExchange! The least I can do in return is to spread the word of R. On a reflective note, I tried to keep my content concise, but in the end it always feels verbose. What do you think? Please let me know. Happy writing!
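The Knit button has a console equivalent, rmarkdown::render(), which is handy if you want to knit from a script. A minimal sketch, where report.Rmd is a hypothetical file name and the guard simply skips the calls if that file does not exist:

```r
# Hypothetical file name; replace with your own rmd file
input_file <- "report.Rmd"

if (file.exists(input_file)) {
  # Uses the first output format listed in the YAML header
  rmarkdown::render(input_file)

  # Overrides the YAML and produces HTML instead
  rmarkdown::render(input_file, output_format = "html_document")
}
```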