R Tutorial for Beginners 1.3: Importing Data to R
To kick-start this part of the tutorial, it's important to identify the file types that R can read and manipulate. We have two main options for saving the data:
- Comma Separated Value: ".csv"
- Tab Delimited Text File: ".txt"
Saving the data as a ".csv" file is usually the easier option, since you can do it directly from Excel (File > Save As > .csv). You'll find the file wherever you saved it (the desktop, for example), and it will open in Excel by default.
Importing the data file to R
I. Importing using "read.csv" command:
You can open the help page for any command by typing help with the command name in brackets. The help page shows what the command does and which arguments it takes. For example:
> help(read.csv)
You may also place a question mark in front of the command:
> ?read.csv
Import the data and save it in an object. For the purpose of this exercise, let's call it "test":
> test <- read.csv(file.choose(), header=TRUE)
- Normally R needs the path to the file. Here we used a handy command called "file.choose()" instead, which opens a menu that lets us select the data file directly.
- Regarding the header argument: set it equal to TRUE (in capital letters) to let R know that the first row of the data set contains variable names (headers). If the first row doesn't contain headers, set it equal to FALSE.
Note that you can use "T" instead of "TRUE" as well. Keep in mind that "T" is an ordinary variable that can be overwritten, so spelling out "TRUE" is the safer habit.
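To see what the header argument changes, here is a minimal sketch that passes a path instead of using file.choose(). The temporary file, its column names ("id", "height") and its values are made up for illustration; with your own data you would give the path to your .csv file.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The column names and values are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height", "1,150", "2,162", "3,171"), csv_path)

# Pass the path directly instead of calling file.choose()
with_header <- read.csv(csv_path, header = TRUE)
names(with_header)   # column names taken from the first row
nrow(with_header)    # the header row is not counted as data

# With header = FALSE the first row is treated as data,
# and R assigns the default names V1, V2, ...
no_header <- read.csv(csv_path, header = FALSE)
names(no_header)
nrow(no_header)      # one more row than with_header
```

If the row count looks one too high after an import, a header row read as data is a likely cause.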
Once you run the command and the data is imported into "test", view it as we learned before. The data is now in R and ready for changes and adjustments.
In the workspace you can see the object along with the number of observations and the number of variables it contains.
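The same counts can also be checked from the console. The sketch below assumes a small made-up file as a stand-in for your own data; only the inspection commands at the end matter.

```r
# Stand-in data: a temporary CSV with made-up columns and values
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height,group", "1,150,a", "2,162,b", "3,171,a"), csv_path)
test <- read.csv(csv_path, header = TRUE)

dim(test)    # number of observations (rows), then number of variables (columns)
names(test)  # variable names
str(test)    # type of each variable and its first few values
head(test)   # first rows of the data
```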
II. Importing using "read.table" command:
Import the data and save it in an object. For the purpose of this exercise, let's call it "test1":
> test1 <- read.table(file.choose(), header=T, sep=",")
- As you noticed, we used "sep", which tells R how the data is separated. Since we are importing a comma separated file (.csv), we put a comma between the quotes: ","
- When you run the above, R again prompts you to choose the .csv file and imports the data as before, this time under "test1", with the same number of observations and the same number of variables as "test".
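A quick way to confirm that both commands give the same result is to read one file twice and compare. This is a sketch on a made-up temporary file, not on the tutorial's data set:

```r
# Stand-in data: a temporary CSV with made-up columns and values
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height", "1,150", "2,162", "3,171"), csv_path)

test  <- read.csv(csv_path, header = TRUE)
test1 <- read.table(csv_path, header = TRUE, sep = ",")

identical(dim(test), dim(test1))  # same number of rows and columns?
all.equal(test, test1)            # same contents?
```

For a simple file like this the two agree. Note that read.csv and read.table have different defaults for the "quote", "fill" and "comment.char" arguments, so files containing apostrophes or "#" characters may be read differently.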
III. Importing Tab Delimited Text file (.txt)
Now let's save the original file on the computer as a Tab Delimited Text file (.txt) and see how we can import it. Again, go to File > Save As > .txt
By default the .txt file will open in a text editor.
You can use the "read.delim" command to import a .txt file:
> test2 <- read.delim(file.choose(), header=T)
As before, it will prompt you to choose the file and then import it. It should show the same number of observations and the same number of variables as "test" and "test1", which is a good sign that the file was imported correctly.
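Here is a minimal sketch of read.delim on a made-up tab delimited file. The temporary file and its contents are assumptions for illustration; in R strings, "\t" stands for a tab character.

```r
# Stand-in data: a temporary tab delimited file with made-up values
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id\theight", "1\t150", "2\t162", "3\t171"), txt_path)

# read.delim assumes tab as the separator by default
test2 <- read.delim(txt_path, header = TRUE)
dim(test2)    # rows, then columns
names(test2)  # variable names from the header row
```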
You can also import this data using a more generic "read.table" command:
> test3 <- read.table(file.choose(), header=T, sep="\t")
- As before, "sep" tells R how the data is separated. Here "\t" means that we are importing a Tab Delimited Text File.
- Another way to import data is the "Import Dataset" option in RStudio, which allows you to import from a text file or from a Web URL.
- If the file is a .txt, choosing Import will prompt you to select the file and then set a few options: the name the data will have in R; Heading (Yes/No), for whether the first row contains variable names; Separator (Tab, Semicolon, Comma, Whitespace), where Whitespace means the values are separated by one or more spaces or tabs; Decimal (Period or Comma), for the character that marks the decimal point (many European countries use a comma, we use a period); and Quote (Double quote, Single quote, None), for the character that encloses character strings in the file. Here we'll choose double quote.
- After specifying all the options, go ahead and press "Import".
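The options in the import dialog correspond to arguments of read.table, so the same import can be written as a command. The sketch below assumes a made-up file that uses semicolons as separators, commas as decimal marks and double quotes around strings; the file name, columns and values are hypothetical.

```r
# Hypothetical file: semicolon separated, decimal commas, quoted strings
eu_path <- tempfile(fileext = ".txt")
writeLines(c('name;weight', '"Ann";61,5', '"Bob";80,25'), eu_path)

test4 <- read.table(eu_path,
                    header = TRUE,   # Heading: Yes
                    sep    = ";",    # Separator: Semicolon
                    dec    = ",",    # Decimal: Comma
                    quote  = "\"")   # Quote: Double quote
str(test4)    # weight should be read as numbers, not as text
```

Writing the import as a command also makes it easy to repeat later without clicking through the dialog again.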