Data Analysis Assignment 2

I just turned in Assignment 2 for my Data Analysis course, so I can now share it here. (Unfortunately, I’ve been warned that people have been plagiarizing, so I’ve removed my files to prevent cheating… which, ironically, I did not list as a challenge for a MOOC below, but it should be added.) In this assignment, we were given sensor data from the Samsung Galaxy S II, recorded while users performed specific activities. The goal was to develop a model on some training data to predict which activity the test subjects were performing. As usual, I wish I had more time to spend on it because I always feel like there is more I can add. Using random forests, I got a misclassification error rate of about 5% on the test subjects. Not too shabby, but at some point I would like to compare it to other models such as SVMs or neural networks.
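For anyone unfamiliar with the metric, here is a minimal Python sketch of how a misclassification error rate on held-out subjects can be computed. It is not my assignment code and does not use random forests; the toy rows, the field names `subject` and `activity`, the helper names, and the majority-class baseline are all made up for illustration.

```python
from collections import Counter


def split_by_subject(rows, test_subjects):
    """Split rows so that no subject appears in both training and test sets."""
    train = [r for r in rows if r["subject"] not in test_subjects]
    test = [r for r in rows if r["subject"] in test_subjects]
    return train, test


def misclassification_rate(actual, predicted):
    """Fraction of predictions that differ from the true labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute an error rate on an empty set")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Hypothetical toy data: one row per recorded window, labeled with an activity.
rows = [
    {"subject": 1, "activity": "walking"},
    {"subject": 1, "activity": "sitting"},
    {"subject": 2, "activity": "walking"},
    {"subject": 2, "activity": "standing"},
    {"subject": 3, "activity": "sitting"},
    {"subject": 3, "activity": "walking"},
    {"subject": 3, "activity": "walking"},
    {"subject": 3, "activity": "standing"},
]

# Hold out subject 3 entirely, mirroring "train on some people, test on others".
train, test = split_by_subject(rows, test_subjects={3})

# Stand-in "model": always predict the most common activity seen in training.
majority = Counter(r["activity"] for r in train).most_common(1)[0][0]
predicted = [majority for _ in test]
actual = [r["activity"] for r in test]

error_rate = misclassification_rate(actual, predicted)
print(f"Misclassification error rate: {error_rate:.0%}")
```

A real model would replace the majority-class stand-in, but the split by subject and the error calculation would stay the same.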

With only one week left, I can reflect on my overall experience, though this is only the second course I’ve taken. My main goal coming into this course was to get a solid refresher on statistical and machine learning concepts I had not used in a while. I also wanted to familiarize myself with R packages I hadn’t been exposed to yet. The course itself consisted of weekly lectures and quizzes along with two larger assignments. The quizzes are graded automatically by the system, but the longer assignments are peer reviewed. Another great aspect of this course is the sheer number of resources the professor shared with us. Every lecture is followed by links to expand our knowledge. And thanks to the structure of MOOCs, I can pause the lecture to start reading about a specific topic of interest. Finally, there is a buzzing discussion forum where students, TAs, and even the professor will jump in to help.

Personally, I loved the course. It was perfect for me as it didn’t take too much time, and I was exposed to enough new topics to make it worthwhile. This was the first iteration of this particular class and as such it had its share of issues, but I feel that most aren’t difficult problems to fix.

The MOOC Guide lists five challenges of MOOCs, and I wanted to address them with respect to the Data Analysis course.

Possible challenges of a MOOC:

  1. It feels chaotic as participants create their own content
    The discussion forum was the outlet for all participants’ questions, comments, and frustrations. I didn’t actually use it as much as I should have, but the few times I needed a quick answer, that was the first place to look. I think the Coursera platform itself could revamp the search features in the forum, as it got progressively more difficult to find useful discussions. However, it allows for open discussion, and with tens of thousands of students involved, questions are answered very rapidly.
  2. It demands digital literacy
    This course is very inaccessible to individuals not comfortable with computers as it requires some programming knowledge. Though I believe computer literacy (and accessibility) is a massive problem that needs to be solved, I don’t think that is or should be Coursera’s primary goal.
  3. It demands time and effort from the participants
    Yes, like most things worth doing. I think it is fair to say that whatever the required time estimate is per week, programming courses will inevitably take longer. I am a believer in learning by doing and Data Analysis provided just that with the assignments. I feel it helped everyone a lot more to actually implement the code from scratch than to just see a video of it or copy and run it. This is a great way to get practice because you have a goal and a deadline.
  4. It is organic, which means the course will take on its own trajectory (you have got to let go)
    I think this is more for the professors. I will definitely say that the participants will often make their own decisions based on majority. For example, when individuals commented that the peer review rubric wasn’t adequate, some of the questions were reinterpreted (on the discussion forums) to allow for “fairer” scores. If anything, this serves to improve the teaching model rather than detract from it.
  5. As a participant you need to be able to self-regulate your learning and possibly give yourself a learning goal to achieve
    Perhaps something Coursera could add in the future is a schedule. If I take multiple courses, I have to enter each course to see deadlines, but these could be aggregated in a calendar. These simple things would help in planning Coursera around our busy lives.
    Another cool feature would be a learning graph where similar courses can be linked. That way I can progress through multiple courses and get a better understanding of a larger topic.

Now that this is almost over, I’ve already signed up for a few more courses (Model Thinking and Natural Language Processing). I also intend to open a Kaggle account to try out one of the competitions… if I have the time.


  1. Rajdas – thanks for posting the intellectual investigations you are doing. Your paper was remarkably well-written, but re-used by another student (at least one!) in this semester’s version of the course. Wanted to make you aware as I’m sure that wasn’t your intent and you may prefer to take it down in order not to be plagiarized (nor enable cheating). Very disappointed in my fellow student though I’m impressed by your mastery of random forest in advance of the material being taught by Prof Leek. Thanks and all the best!

    1. Hey, Thanks for letting me know – I’ve taken down the material. It’s unfortunate that people are going this route… Hopefully the peer reviewers are aware. Anyway, enjoy the remaining few weeks of the class!
