Why am I doing this?
What you just read is my blog post from January 1, 2017. It is what happens on New Year’s Day when you don’t have a social life.
Over the last few years, I have concluded that the largest crowd I can ever reach, if I continue delivering lectures in the usual university setting, is 80 per semester. I am sure some of them are not there by choice. Since I believe that I can distill ideas into easily understandable forms, I felt the urge to spread my voice to a larger audience. Hence our Data Analysis Classroom.
I usually emphasize the importance of collecting more data for understanding the uncertainties in the system and using them in the final decision-making process. Practicing what I preach, after 29 lessons, I want to pause, reflect on the readership data and rewind what we learned.
We started this journey on the fifth day of February 2017. After 202 days, i.e., six months and 21 days, here is the monthly readership data.
We more than made up for the slump in July. The total page views in this time are 19,051, with average monthly page views of 2,721. Being very conservative, I would like to think that the blog may only capture the interest of 0.5% of the readers. That is approximately 95 more people I could reach in the six-month period. As I said, that is easily more than I can reach through a class in a semester. I just hope these 95 don’t include the same folks in the class who are already captivated!
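If you want to check the arithmetic behind these numbers, here is a minimal sketch in Python. The start date, the 202 days, the 19051 page views and the 0.5% are from the post. The variable names are mine, and so is the assumption that the monthly average is taken over seven calendar months, February through August.

```python
from datetime import date, timedelta

start = date(2017, 2, 5)     # first lesson, from the post
days_elapsed = 202           # from the post
total_views = 19051          # from the post
months_counted = 7           # assumption: February through August
interest_rate = 0.005        # the "very conservative" 0.5%

# Date on which the readership snapshot was taken
snapshot_date = start + timedelta(days=days_elapsed)

# Average page views per calendar month
avg_monthly_views = total_views / months_counted

# Readers who might actually be captivated
people_reached = total_views * interest_rate

print(snapshot_date.isoformat())
print(int(avg_monthly_views))
print(round(people_reached))
```

Under these assumptions, the snapshot falls on August 26, 2017, the monthly average truncates to 2721, and 0.5% of the page views rounds to 95.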
Here is a map of the readership. If you are in one of the blue countries, thank you for your attention. Let’s get more people on board. They will like it too.
What did we learn so far?
Uncertainties surround us. Understanding where they arise from and recognizing their extent is fundamental to improving our knowledge about anything.
We started off with the fact that one needs to observe the system to understand it better. Ask for data. The more, the merrier.
Lesson 1: When you see something, say data. A total of 412 page views.
Data may be grouped into sets, i.e., collections of elements. We have various forms to visualize the sets. We covered unions, intersections, subsets, and their properties.
Lesson 3: The Setup. 321 page views.
Lesson 4: I am a visual person. 301 page views.
We defined probability and understood that there are some rules (axioms) of probability.
Lesson 6: It’s probably me. 377 page views.
Lesson 7: The nervousness axiom – fight or flight. 253 page views.
We learned conditional events, independent events and how the probability plays out under these events.
Lesson 9: The necessary condition for Vegas. 813 page views.
Lesson 10: The fight for independence. 641 page views.
We now know the law of total probability and the Bayes theorem.
Lesson 12: Total recall. 471 page views.
Lesson 13: Dear Mr. Bayes. 2280 page views — most popular lesson so far.
In lessons 14 through 20, you can learn the basics of exploratory data analysis: visualization techniques and summary statistics.
Lesson 14: The time has come; execute order statistics. A lesson to explain the concept of percentiles and boxplots. 327 page views.
Lesson 15: Another brick in the wall — for building histograms. 297 page views.
Lesson 16: Joe meets the average. Explains the mean of the data. 218 page views.
Lesson 17: We who deviate from the norm. Explains the idea of variance and standard deviation. 159 page views.
Lesson 18: Data comes in all shapes. A short lesson explaining the concept of skewness. 116 page views.
Lesson 19: Voice of the outliers. Outliers are significant. As Nassim Nicholas Taleb puts it: “Don’t be a turkey” by removing them. 87 page views.
Lesson 20: Compared to what? Use the coefficient of variation to compare data. 115 page views.
Lessons 22 to 28 introduce the idea of random variables and probability distributions.
Lesson 22: You are so random. The basic idea of discrete and continuous random variables. 109 page views.
Lesson 23: Let’s distribute the probability. This lesson teaches the concept of a probability distribution. 126 page views.
Lesson 24: What else did you expect? 130 page views.
Lesson 25: More expectation. 95 page views. These two lessons go through the expected value of a random variable.
Lesson 26: The variety in consumption. 175 page views.
Lesson 27: More variety. 568 page views. These two lessons are for understanding the concept of the variance of a random variable.
Lesson 28: Apples and Oranges. How to standardize the data for comparison. 325 page views.
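In the spirit of practicing what we learned, the page-view counts listed above for lessons 1 through 28 can be summarized with the tools from lessons 14 to 20. This is only a sketch in Python; the counts are copied from the list above, and the variable names are my own.

```python
from statistics import mean, median, stdev

# Page views per lesson number, copied from the list above
page_views = {
    1: 412, 3: 321, 4: 301, 6: 377, 7: 253, 9: 813, 10: 641,
    12: 471, 13: 2280, 14: 327, 15: 297, 16: 218, 17: 159,
    18: 116, 19: 87, 20: 115, 22: 109, 23: 126, 24: 130,
    25: 95, 26: 175, 27: 568, 28: 325,
}
views = list(page_views.values())

avg = mean(views)              # lesson 16: the mean
mid = median(views)            # lesson 14: the 50th percentile
cv = stdev(views) / avg        # lesson 20: coefficient of variation
top_lesson = max(page_views, key=page_views.get)  # lesson 19: the outlier

print(f"mean = {avg:.0f}, median = {mid}")
print(f"coefficient of variation = {cv:.2f}")
print(f"most viewed lesson = {top_lesson}")
```

The mean (about 379) sits well above the median (297), which is what a right-skewed dataset looks like. Lesson 13 is the outlier pulling the mean up, and as lesson 19 says, we keep it.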
Lessons in R
As I mentioned in lesson 2, R is your companion in this journey through data. Computer programming is very, very (I cannot emphasize “very” enough) essential for data analysis. Wherever required, I have provided brief lessons on how to use R for data analysis.
Lesson 2: R is your companion. The very first lesson to get going with R. 434 page views.
Lesson 5: Let us Review. Reading data files and more fun stuff. 228 page views.
Lesson 8: The search for wifi. Learning for loops and if-else statements in R. 194 page views.
Lesson 11: Fran, the functionin R-bot. The essentials of writing functions in R. 332 page views.
Lesson 21: Beginners guide to summarize data in R. A step-by-step exploratory data analysis. 1786 page views.
Lesson 29: Large number games in R. 933 page views.
Where do we go from here?
A long way.
While you pause and reflect on these lessons, I will pause and come back with lesson 31 on the ninth day of September 2017. Help spread the word as we build this knowledge platform one lesson at a time while the university system becomes obsolete.
If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.