TDM 10100: Project 12 — 2022
In the previous project we manipulated dates. In this project we are going to take it a bit further and use the Tidyverse, more specifically the lubridate package.
Working with dates in R can require more attention than working with other object classes. Packages like lubridate help simplify some of the common tasks related to date data.
Dates and times can be complicated: not every year has 365 days, not every day has 24 hours, and not every minute has 60 seconds. Dates are difficult because they have to account for the Earth's rotation and orbit around the sun, as well as time zones, daylight saving time, etc. Suffice it to say that when working with dates and date-times in R, the simpler the better. lubridate helps keep things simple.
Dataset(s)
The following questions will use the following dataset(s):
-
/anvil/projects/tdm/data/zillow/State_time_series.csv
Questions
First, let's load the libraries:
-
data.table
-
lubridate
library(data.table) # make sure to load data.table first
library(lubridate) # and then load lubridate second; it will give you a warning in pink, but that is totally OK
# You need to load `data.table` first and `lubridate` second for this project, because they both define `wday` and we want the version from `lubridate`, so we need to load it second!
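As a quick check of the load order described above, the following sketch asks R which package the unqualified name wday currently resolves to. It assumes both packages are installed; the check itself is only an illustration and is not part of the project requirements.

```r
library(data.table) # loaded first
library(lubridate)  # loaded second, so its wday masks the data.table version

# Ask R which namespace the unqualified name `wday` comes from
wday_source <- environmentName(environment(wday))
wday_source

# The data.table version is still reachable with an explicit prefix
data.table::wday(as.Date("2017-01-01"))
```

If the two packages were loaded in the opposite order, the unqualified wday would come from data.table instead.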
We are going to continue to dig into the Zillow time series data.
ONE
-
Go ahead and read in the dataset as
states
-
Find the class and the type of the column named
Date
-
Are there multiple functions that will return the same or similar information?
Insider Knowledge
Reminder:
- class
shows the class of the object passed as its argument. The most common ones include but are not limited to: "numeric", "character", "logical", "Date".
- typeof
shows you the type or storage mode of objects. The most common ones include but are not limited to: "logical", "integer", "double", "complex", "character", "raw" and "list"
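To illustrate the difference between the two functions, here is a minimal sketch on toy values (hypothetical, not taken from the Zillow data). It shows that a Date object has class "Date" but is stored as a double.

```r
# A toy Date value and a toy character string (both hypothetical)
d <- as.Date("2017-03-31")
s <- "2017-03-31"

class(d)   # "Date"
typeof(d)  # "double": a Date is stored as the number of days since 1970-01-01

class(s)   # "character"
typeof(s)  # "character"
```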
-
Code used to solve this problem.
-
Output from running the code.
TWO
-
In Project 11, we had to convert the
Date
column to a month, day, year format. Now convert the column
Date
into values from the class Date. (You can use lubridate to do so.) What do you think about the methods you have learned (so far) to convert dates?
-
Create a new column in your data.frame
states
named
day_of_the_week
that shows the day of the week (Sunday-Saturday).
-
Let's create another column in the data.frame
states
that shows the days of the week as numbers.
states$Date <- as.Date(states$Date, format="%Y-%m-%d")
Helpful Hint
Take a look at the functions ymd
, mdy
, dym
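The parsing functions named in the hint differ only in the order in which they expect the year, month, and day to appear. A small sketch on made-up strings (the strings are assumptions for illustration, not values from the dataset):

```r
library(lubridate)

# Each function name spells out the expected order of the components
a <- ymd("2017-03-31")   # year, month, day
b <- mdy("03/31/2017")   # month, day, year
c <- dmy("31.03.2017")   # day, month, year

a
class(a)  # "Date"
```

All three calls produce the same Date value, because lubridate accepts a range of separators as long as the component order matches the function name.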
Helpful Hint
-
Take a look at the functions
month
,year
,day
,wday
. -
The label argument is logical. Among these functions, it is available for wday() and month(). TRUE will display the day of the week as an ordered factor of character strings, such as "Sun" (or "Sunday" if you also set abbr = FALSE). FALSE will display the day of the week as a number.
-
The week_start argument sets the day on which the week starts: 1 means Monday and 7 means Sunday (the default). When label = TRUE, this day will be the first level of the returned factor. You can set the lubridate.week.start option to control this parameter.
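The effect of label and week_start can be seen on a single toy date. This is a sketch under two assumptions: the lubridate.week.start option has not been changed, and the session uses an English locale (the label text depends on the locale).

```r
library(lubridate)

d <- ymd("2017-01-01")   # a toy date; January 1, 2017 was a Sunday

wday(d)                              # day number; Sunday is 1 under the default week_start = 7
wday(d, label = TRUE)                # abbreviated label as an ordered factor
wday(d, label = TRUE, abbr = FALSE)  # full day name
wday(d, week_start = 1)              # day number when the week starts on Monday

# Explicit week_start values, so the result does not depend on the option
n_sunday_start <- wday(d, week_start = 7)
n_monday_start <- wday(d, week_start = 1)
```

With the week starting on Sunday this date is day 1, and with the week starting on Monday it is day 7.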
Insider Knowledge
By default, values of class Date in R
are displayed as YYYY-MM-DD
-
Code used to solve this problem.
-
Output from running the code.
THREE
We want to see whether some months are better than others for putting our house on the market.
-
Use
tapply
to compare the average
DaysOnZillow_AllHomes
for all months.
-
Make a barplot showing our results.
-
Code used to solve this problem.
-
Output from running the code.
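As a hedged illustration of the tapply pattern used in this question, the sketch below groups a numeric column by month on a small made-up data frame. The toy data frame and its values are assumptions; only the column names mirror the Zillow data. The na.rm = TRUE argument is included in case the real column contains missing values.

```r
library(lubridate)

# Hypothetical toy data standing in for the real dataset
toy <- data.frame(
  Date = ymd(c("2016-01-31", "2016-02-29", "2017-01-31", "2017-02-28")),
  DaysOnZillow_AllHomes = c(100, 70, 80, NA)
)

# Average the values within each month, ignoring missing values
avg_by_month <- tapply(toy$DaysOnZillow_AllHomes, month(toy$Date), mean, na.rm = TRUE)
avg_by_month

# One bar per month
barplot(avg_by_month, xlab = "Month", ylab = "Average days on Zillow")
```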
FOUR
Find the information only for the year 2017 and call it states2017
. Then create a line plot that shows the average DaysOnZillow_AllHomes
by Date
using the states2017
data. What do you notice? When was the best month/months for posting a home for sale in 2017?
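One possible way to approach this kind of task is to filter on year() and then average by date before plotting. The sketch below does this on made-up data (the data frame toy and its values are hypothetical), so it shows the pattern rather than the answer.

```r
library(lubridate)

# Hypothetical toy data: two states reporting on the same dates
toy <- data.frame(
  Date = ymd(c("2016-12-31", "2017-01-31", "2017-01-31", "2017-02-28")),
  DaysOnZillow_AllHomes = c(120, 100, 80, 70)
)

# Keep only the rows whose year is 2017
toy2017 <- toy[year(toy$Date) == 2017, ]

# Average across rows that share a date
avg_by_date <- tapply(toy2017$DaysOnZillow_AllHomes, toy2017$Date, mean, na.rm = TRUE)

# The names of the tapply result are the dates as text, so convert them back
plot(as.Date(names(avg_by_date)), avg_by_date, type = "l",
     xlab = "Date", ylab = "Average days on Zillow")
```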
FIVE
Now we want to know whether homes sell faster in some states than in others. Let's look at Indiana, Maine, and Hawaii. Create a line plot that uses DaysOnZillow_AllHomes
by Date
with one line per state. Use the states2017
dataset for this question. Make sure to have each state line colored differently and have a legend to identify which is which.
Helpful Hint
Use the lines()
function to add lines to your plot
Use the ylim
argument to show all lines
Use the col
argument to identify and alter colors.
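The three hints fit together as in the following sketch, which uses made-up series (state_a, state_b, state_c and their values are hypothetical placeholders, not Zillow data). The key idea is to compute ylim from all series before the first plot() call, so that lines added later are not cut off.

```r
# Hypothetical toy series on shared dates
dates   <- as.Date(c("2017-01-31", "2017-02-28", "2017-03-31"))
state_a <- c(100, 95, 90)
state_b <- c(130, 120, 125)
state_c <- c(80, 85, 70)

# A y-axis range wide enough for every series
y_range <- range(state_a, state_b, state_c)

# Draw the first line, then add the others on top
plot(dates, state_a, type = "l", col = "blue", ylim = y_range,
     xlab = "Date", ylab = "Days on Zillow")
lines(dates, state_b, col = "red")
lines(dates, state_c, col = "darkgreen")

# The legend colors must be listed in the same order as the labels
legend("topright", legend = c("State A", "State B", "State C"),
       col = c("blue", "red", "darkgreen"), lty = 1)
```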
-
Code used to solve this problem.
-
Output from running the code.
Please make sure to double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted was what you actually submitted. In addition, please review our submission guidelines before submitting your project.