Word count: 3,000 words
Write a report using R Markdown to introduce, frame, and describe your story and figures.
This document should discuss and demonstrate the steps you have taken to perform exploratory data analysis (EDA) using data visualisation, explaining the reasoning behind each step. Please demonstrate the complete workflow and include all of your code!
The report should include the following:
• Background information and summary of the dataset
• Research Questions (aim for 2 or 3 questions, but they may be related)
• Characteristics of the variables of interest
• Description of the Exploratory Data Analysis techniques used
• At least 3 different types of data visualisation that help answer the research questions
• Explanation/justification, description and code for each individual visualisation
• Conclusions and insights from the analysis, and how these might help answer the research questions.
Neatness, coherence, and clarity will count. All analyses must be done in RStudio, using R.
There is no restriction on which tools or packages you may use, as long as you predominantly use the packages covered in class (tidyverse and ggplot2).
The goal of this coursework is for you to demonstrate proficiency in the techniques covered in this class (and beyond, if you like) and to apply them to a novel dataset in a meaningful way. In other words, you get to show off all the tools you have learned by creating beautiful, truthful, narrative visualisations! You will take a dataset, explore it, play with it, and tell a story about it using at least three different types of graph. You will use RStudio to produce an R Markdown document.
Dataset
You can choose which data you will use for this project, but it must be one of the following:
1. Country-by-country time series data of COVID-19 vaccinations. The data is compiled by Our World in Data from official sources (including, for the United States, the Centers for Disease Control and Prevention). More information on this dataset can be found here: https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations
2. 515K Hotel Reviews Data in Europe. The data is sourced from Booking.com. More information on this dataset can be found here: https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe (you need a Kaggle account to see this link; it’s free!).
3. Melbourne housing data. The data is provided by Tony Pino. More information on this dataset can be found here: https://www.kaggle.com/dansbecker/melbourne-housing-snapshot
4. Find your own dataset. You can use a dataset of your own choosing, but it must meet the following requirements:
• At least 10,000 rows (observations)
• At least 8 columns (variables), including both categorical and numerical (continuous or discrete) variables
• No ready-to-run R code for the dataset is available online.
Reference style: Harvard