Hello, my name is Fran. I am a function in R.
Hello, my name is D. I am a functionin’ person.
I can perform any task you give me.
Interesting, can you add two numbers?
Yes, I can.
Can you tell me more about how you work?
Sure, I need the inputs and the instruction, i.e. what you want me to do with the inputs.
Okay. I am giving you two numbers, 10 and 15. Can you show me how you will create a function to add them?
This is easy. Let me first give you the structure.
# structure of a function #
functionname = function(inputs)
{
  instructions
  return(output)
}
Select these lines and hit the run button to load the function in R. Once it is loaded, you can call the function by its name with any inputs.
Let us say the two numbers are a and b. These numbers are provided as inputs. I will first assign a name to the function — “add”. Since you are asking me to add two numbers, the instruction will be y = a + b, and I will return the value of y.
Here is a short video showing how to create a function to add two numbers a and b. You can try it in your RStudio program.
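For readers following along without the video, here is a minimal sketch of the "add" function as Fran describes it: two inputs a and b, one instruction y = a + b, and the value of y returned. The test values 10 and 15 are the two numbers D supplied earlier; the variable name result is only for illustration.

```r
# function to add two numbers #
add = function(a, b)
{
  # a and b are the two numbers provided as inputs
  y = a + b   # the instruction
  return(y)   # the output
}

# test the function #
result = add(10, 15)
print(result)
```

Running these lines should print 25.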
Neat. If I give you three numbers, m, x, and b, can you write a function for mx + b?
Yes. I believe you are asking me to write a function for a straight line: y = mx + b. I will assign “line_eq” as the name of the function, the inputs will be m, x, and b and the output will be y.
# function for line equation #
line_eq = function (m, x, b)
{
  # m is the slope of the line
  # b is the intercept of the line
  # x is the point on the x-axis

  y = m*x + b # equation for the line
  return(y)
}

# test the function #
line_eq(0.5, 5, 10)
> 12.5
Can you perform more than one task? For example, if I ask you for y = mx + b and x + y, can you return both values?
Yes, I can. I will have two instructions. In the end, I will combine both the outputs into one vector and return the values. Here is how I do it.
# function for line equation + (x + y) #
two_tasks = function (m, x, b)
{
  # m is the slope of the line
  # b is the intercept of the line
  # x is the point on the x-axis

  y = m*x + b # equation for the line
  z = x + y
  return(c(y,z))
}

# test the function #
two_tasks(0.5, 5, 10)
> 12.5 17.5
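Combining the outputs into one vector works well when each result is a single number. A possible variation, not part of the original lesson, is to return a named list so that each result can be picked out by name. The function name two_tasks_list and the variable out are made up for this sketch.

```r
# variation: return both outputs as a named list (illustrative only) #
two_tasks_list = function(m, x, b)
{
  # m is the slope of the line
  # b is the intercept of the line
  # x is the point on the x-axis

  y = m*x + b   # equation for the line
  z = x + y     # second task
  return(list(y = y, z = z))
}

# test the function #
out = two_tasks_list(0.5, 5, 10)
print(out$y)   # value of y
print(out$z)   # value of z
```

With the same test inputs as before, out$y holds 12.5 and out$z holds 17.5.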
Very impressive. What if some of the inputs are single numbers and some are sets of numbers? For instance, if I give you many points on the x-axis along with the slope m and the intercept b, can you give me the values for y?
No problemo. The same line_eq function will work. Let us say you give me some numbers x = [1, 2, 3, 4, 5], m = 0.5 and b = 10. I will use the same function line_eq(m, x, b).
# use on vectors #
x = c(1,2,3,4,5)
m = 0.5
b = 10
line_eq(m,x,b)
> 10.5 11.0 11.5 12.0 12.5
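The same function works on a set of numbers because arithmetic in R is applied element by element, with the single values m and b reused for every element of x. As a rough check, the sketch below compares the vector call with an explicit loop over the points. The names y_vector and y_loop are introduced here for illustration only.

```r
# line_eq as defined earlier in the lesson #
line_eq = function(m, x, b)
{
  y = m*x + b
  return(y)
}

x = c(1,2,3,4,5)
m = 0.5
b = 10

# one call on the whole vector #
y_vector = line_eq(m, x, b)

# the same calculation, one point at a time #
y_loop = numeric(length(x))
for (i in seq_along(x))
{
  y_loop[i] = line_eq(m, x[i], b)
}

# TRUE when both approaches give the same values #
print(all.equal(y_vector, y_loop))
```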
I am beginning to like you. But maybe you are fooling me with simple tricks. I don’t need a robot to do simple math.
Hey, my name is Fran 😡
Okay Fran. Prove to me that you can do more complicated things.
Bring it on.
It is springtime, and I’d love to get a Citi Bike and ride around the city. I want you to tell me how many people rented a bike on the most popular route, the Central Park Southern Loop, and what the average trip time was.
aargh… your obsession with the city. Give me the data.
Here you go. You can use the March 2017 file. They have data for the trip duration in seconds, check out time and check in time, start station and end station.
Alright. I will name the function “bike_analysis.” The inputs will be the bike ridership data for a month and the name of the station. The function will identify how many people rented a bike at the Central Park S station and returned it to the same station, completing the loop. You asked me for total rides and the average trip time. I threw in the maximum and minimum ride times too. You can use this function with data from any month and for any station.
# function to analyze bike data #
bike_analysis = function(bike_data,station_name)
{
  dum = which (bike_data$Start.Station.Name == station_name & bike_data$End.Station.Name == station_name)
  total_rides = length(dum)

  average_time = mean(bike_data$Trip.Duration[dum])/60 # in minutes
  max_time = max(bike_data$Trip.Duration[dum])/60 # in minutes
  min_time = min(bike_data$Trip.Duration[dum])/60 # in minutes

  output = c(total_rides,average_time,max_time,min_time)
  return(output)
}

# use the function to analyze Central Park South Loop #

# bike data #
bike_data = read.csv("201703-citibike-tripdata.csv",header=TRUE)
station_name = "Central Park S & 6 Ave"
bike_analysis(bike_data,station_name)
> 212.000000 42.711085 403.000000 1.066667
212 trips, with an average trip time of about 42.7 minutes. The maximum trip time is 403 minutes and the minimum trip time is about 1 minute. Change of mind?
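To try bike_analysis without downloading the March 2017 file, it can be run on a small data frame that has the same three column names. The sketch below does that. The four rows in toy_data are made up for illustration and are not real Citi Bike trips, and "Some Other Station" is a placeholder name.

```r
# bike_analysis as defined in the lesson #
bike_analysis = function(bike_data, station_name)
{
  # rows where the trip starts and ends at the same station
  dum = which(bike_data$Start.Station.Name == station_name &
              bike_data$End.Station.Name == station_name)
  total_rides = length(dum)

  average_time = mean(bike_data$Trip.Duration[dum])/60 # in minutes
  max_time = max(bike_data$Trip.Duration[dum])/60 # in minutes
  min_time = min(bike_data$Trip.Duration[dum])/60 # in minutes

  output = c(total_rides, average_time, max_time, min_time)
  return(output)
}

# made-up data: trip durations are in seconds #
toy_data = data.frame(
  Trip.Duration = c(600, 1800, 900, 3000),
  Start.Station.Name = c("Central Park S & 6 Ave", "Central Park S & 6 Ave",
                         "Some Other Station", "Central Park S & 6 Ave"),
  End.Station.Name = c("Central Park S & 6 Ave", "Some Other Station",
                       "Some Other Station", "Central Park S & 6 Ave"),
  stringsAsFactors = FALSE
)

toy_result = bike_analysis(toy_data, "Central Park S & 6 Ave")
print(toy_result)
```

Only the first and fourth rows start and end at Central Park S & 6 Ave, so the function should report 2 rides with an average of 30 minutes, a maximum of 50 minutes and a minimum of 10 minutes.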
Wow. You are truly helpful. I would have spent a lot of time if I were to do this manually. I can use your brains and spend my holiday weekend riding the bike.
Have fun … and Happy Easter.
How did you know that?
Machine Learning man 😉
If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.