I am staring at my MacBook, thinking about the opening lines. As my mind wanders, I look over the window to admire the deceivingly bright sunny morning. The usually pleasant short walk to the coffee shop was rather uncomfortable. I ignored the weatherman’s advice that it will feel like 12 degrees today. Last week, my friend expressed his concern that February was warm. I begin to wonder how often is Saturday cold and how many days in the last month were warm. Can I look at daily temperature data and find these answers? Maybe, if I can get the data, I can ask my companion R to help me with this. Do you think I can ask R to find me the set of all cold days, the set of all warm days and the set of cold Saturdays? Let us check out. Work with me using your RStudio program.
Step 1: Get the data
I obtained the data for this review from the National Weather Service forecast office, New York. For your convenience, I filtered the data with our goals in mind. You will notice that the data is in a table with rows and columns. Each row is a day. The columns indicate the year, month, day, its name, maximum temperature, minimum temperature and average temperature. According to the National Weather Service’s definition, the maximum temperature is the highest temperature for the day in degrees Fahrenheit, the minimum temperature is the lowest temperature for the day, and the average temperature for the day is the rounded value of the average of the maximum and the minimum temperature.
Step 2: Create a new folder on your computer
When you are working with several data files, it is often convenient to have all the files in one folder on your computer. You can instruct R to read the input data from the folder. Download the “nyc_temperature.txt” file into your chosen folder. Let us call this folder “lesson5”.
Step 3: Create a new code in R
Create a new code for this lesson. “File >> New >> R script”.
Save the code in the same folder “lesson5” using the “save” button or by using “Ctrl+S”. Use .R as the extension — “code_lesson5.R”. Now your R code and the data file are in the same folder.
Step 4: Choose your working directory
Make it a practice to start your code with the first line instructing R to set the working directory to your folder. In this lesson, we have a folder named “lesson5”. So we should tell R that “lesson5” is the folder where the data files are stored. You can do this using the “setwd” command.
The path to the folder is given within the quotes. Execute the line by clicking the “Run” button on the top right. When this line is executed, R will read from “lesson5” folder. You can check this by typing “list.files()” in the console. The “list.files()” command will show you the files in the folder. If you followed the above steps, your would see “code_lesson5.R” and “nyc_temperature.txt” on the screen as the listed files in your folder.
Step 5: Read the data into R workspace
The most common way to read the data into R workspace is to use the command “read.table”. This command will import the data from your folder into the workspace. Type the following line in your code and execute it.
Notice that I am giving the file name in quotes. I am also telling R that there is a header (header=TRUE) for the data file. The header is the first row of the data file, the names of each column. If there is no header in the file, you can choose header=FALSE.
Once you execute this line, you will see a new name (nyctemperature) appearing in the environment space (right panel). We have just imported the data file from the “lesson5” folder into R.
Step 6: Use the data to answer the questions
Let us go back to the original questions. How many days in the last month were warm, and how often is Saturday cold.
Let us call data for the months of January and February as the sample space S. S is the set of all data for January and February. Type the following lines in your code to define sample space S.
Notice that S is a table/matrix with rows and columns. In R, S[1,1] is the element in the first row and first column. S[1,7] is the element in the first row and seventh column, i.e. the average temperature data for the first day. If you want to choose the entire first row, you can use S[1, ] (1 followed by a comma followed by space within the square brackets). If you want to select an entire column, for instance, the average temperature data (column 7), you can use S[ ,7] (a space followed by a comma followed by the column number 7).
To address the first question, we should identify warm days in February. We need to define a set A for all February data, and a set B for warm days.
Recall lesson 4 and check whether A and B are subsets of S.
Type the following lines in your code to define set A.
S[ ,2] is selecting the second column (month) from the sample space S. The “which(S[ ,2]==2)” will identify which rows in the month column are equal to 2, i.e. we are selecting the February days. Notice that A will give you numbers 32 to 59, the 32nd row (February 1) to 59th row (February 28) in the data.
Next, we need to define set B as the warm days. For this, we should select a criterion for warm days. For simplicity, let us assume that a warm day is any day with an average temperature greater than or equal to 50 degrees F. Let us call this set B = set of all warm days. Type the following lines in your code and execute to get set B.
S[ ,7] is selecting the 7th column (average temperature) from the sample space S. The “which(S[ ,7]>=50)” will identify which rows have an average temperature greater than or equal to 50. Notice that B will give your numbers 12, 26, 39, 50, 53, 54, 55, 56, and 59; the rows (days) when the average temperature is greater than or equal to 50 degrees F. February 28th, 2017 had an average temperature of 53 degrees F. I believe it was that day when my friend expressed his unhappiness about the winter being warm!
Now that we have set A for all February data, and set B for warm days, we need to identify how many elements are common in A and B; what is the intersection of A and B. The intersection will find the elements common to both the sets (Recall Lesson 4 – intersection = players interested in basketball and soccer). Type the following line to find the intersection of A and B.
The “intersect” command will find the common elements of A and B. You will get the numbers 39, 50, 53, 54, 55, 56, and 59. The days in February that are warm. Seven warm days last month — worthy of my friend’s displeasure.
Can you now tell me how often is Saturday cold based on the data we have? Assume cold is defined as an average temperature less than or equal to 25 degrees F.
Did you realize that I am just whining about Saturday being cold? Check out the “union” command before you sign off.
If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.
Please prof, post a video with the steeps, will be better to understand.