Adithi R. Upadhya
11 August, 2022
Mutating joins, filtering joins, and set operations.
Ctrl + Shift + N
Opens a new document.
lubridate
lubridate is an R package that makes it easier to work with dates and times.
Dates and date-times are represented by their own classes (Date and POSIXct in base R).
There are three types of date/time data that refer to an instant in time:
A date. Tibbles print this as <date>.
A time within a day. Tibbles print this as <time>.
A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as <dttm>.
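A minimal sketch of the date and date-time classes, assuming lubridate is installed; the values are arbitrary. Base R has no built-in class for a time of day, so <time> columns usually come from the hms package, which is not used here.

```r
library(lubridate)

# A date: class "Date" (a tibble column of these prints as <date>)
d <- ymd("2013-01-01")

# A date-time: class "POSIXct" (a tibble column of these prints as <dttm>)
dt <- ymd_hms("2013-01-01 05:17:00")

print(class(d))   # "Date"
print(class(dt))  # "POSIXct" "POSIXt"
```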
nycflights13
On-time data for all flights that departed New York City airports (JFK, LGA, and EWR) in 2013.
The helpers in lubridate automatically work out the format once you specify the order of the components.
To use them, identify the order in which year, month, and day appear in your dates, then arrange ‘y’, ‘m’, and ‘d’ in the same order; that gives the name of the function to call, such as ymd(), mdy(), or dmy().
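For example, with illustrative strings whose components appear in different orders (month names parse as written here assuming an English locale):

```r
library(lubridate)

# Components in year, month, day order -> ymd()
from_ymd <- ymd("2017-01-31")

# Month name first, then day, then year -> mdy()
from_mdy <- mdy("January 31st, 2017")

# Day, abbreviated month, year -> dmy()
from_dmy <- dmy("31-Jan-2017")

# Adding _hms to the name also parses the time of day
from_ymd_hms <- ymd_hms("2017-01-31 20:11:59")

print(c(from_ymd, from_mdy, from_dmy))
print(from_ymd_hms)
```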
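When the individual components of a date-time are spread across several columns, such as the year, month, day, hour, and minute columns of the nycflights13 flights data, use make_date() for dates or make_datetime() for date-times.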
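A sketch with a small hypothetical data frame, flights_demo, whose columns mimic the component columns of nycflights13::flights; the values are made up, and base R is used so that nycflights13 and dplyr are not required:

```r
library(lubridate)

# Hypothetical departure components, one column per component
flights_demo <- data.frame(
  year = c(2013, 2013),
  month = c(1, 12),
  day = c(1, 31),
  hour = c(5, 23),
  minute = c(15, 59)
)

# make_date() combines year, month, and day into a Date
flights_demo$dep_date <- with(flights_demo, make_date(year, month, day))

# make_datetime() also takes hour and minute; the time zone defaults to UTC
flights_demo$departure <- with(flights_demo, make_datetime(year, month, day, hour, minute))

print(flights_demo)
```

With dplyr loaded, the same step is usually written inside mutate().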
Sometimes you may want to switch between a date-time and a date: use as_datetime() to turn a date into a date-time, and as_date() to turn a date-time into a date.
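A short sketch of both conversions on arbitrary values:

```r
library(lubridate)

dt <- ymd_hms("2022-08-11 14:30:00")
d <- ymd("2022-08-11")

# Date-time -> date: the time of day is dropped
print(as_date(dt))

# Date -> date-time: midnight, in UTC by default
print(as_datetime(d))
```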
Use the appropriate lubridate function to parse each of the following dates:
d1 <- "January 1, 2010"
d2 <- "2015-Mar-07"
d3 <- "06-Jun-2017"
d4 <- c("August 19 (2015)", "July 1 (2015)")
d5 <- "12/30/14" # Dec 30, 2014
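One possible set of answers, assuming an English locale so that the month names parse; in each case the function name follows the order of the components in the string:

```r
library(lubridate)

d1 <- "January 1, 2010"
d2 <- "2015-Mar-07"
d3 <- "06-Jun-2017"
d4 <- c("August 19 (2015)", "July 1 (2015)")
d5 <- "12/30/14" # Dec 30, 2014

p1 <- mdy(d1)  # month, day, year
p2 <- ymd(d2)  # year, month, day
p3 <- dmy(d3)  # day, month, year
p4 <- mdy(d4)  # month, day, year
p5 <- mdy(d5)  # month, day, two-digit year

print(list(p1, p2, p3, p4, p5))
```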
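In lubridate you can get and set individual components of a date-time.
Pull out individual parts of the date with the accessor functions year(), month(), mday() (day of the month), yday() (day of the year), and wday() (day of the week).
For month() and wday(), set label = TRUE to return the abbreviated name of the month or day of the week, and add abbr = FALSE to return the full name.
Time components are extracted in the same way with hour(), minute(), and second().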
[1] Jul
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
[1] Friday
7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
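A sketch of the accessors on one arbitrary date-time (a Friday in July, so the labelled calls return values of the kind printed above); English names assume an English locale:

```r
library(lubridate)

dt <- ymd_hms("2016-07-08 12:34:56")

year(dt)    # 2016
month(dt)   # 7
mday(dt)    # 8, day of the month
yday(dt)    # 190, day of the year
wday(dt)    # 6, day of the week (Sunday = 1 by default)

month(dt, label = TRUE)              # Jul, as an ordered factor
wday(dt, label = TRUE, abbr = FALSE) # Friday

# Time components work the same way
hour(dt)    # 12
minute(dt)  # 34
second(dt)  # 56
```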
An alternative approach to plotting individual components is to round the date to a nearby unit of time with floor_date(), round_date(), and ceiling_date().
Each function takes a vector of dates to adjust and then the name of the unit to round down to (floor), round up to (ceiling), or round to.
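A sketch of the three rounding functions on one arbitrary date-time; weeks start on Sunday unless the lubridate.week.start option says otherwise:

```r
library(lubridate)

dt <- ymd_hms("2016-07-08 12:34:56")

floor_date(dt, "week")     # back to the start of the week (Sunday 2016-07-03)
round_date(dt, "hour")     # nearest hour (13:00, since 12:34 is past the half hour)
ceiling_date(dt, "month")  # up to the start of the next month (2016-08-01)
```

Applied to a whole date-time column, floor_date(x, "week") gives a grouping variable for counting events per week.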
You can also use each accessor function to set the components of a date-time.
Alternatively, update() creates a new date-time with one or more components changed at once.
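A sketch of both approaches on an arbitrary value: the replacement form of an accessor modifies a variable in place, while update() returns a new value:

```r
library(lubridate)

dt <- ymd_hms("2016-07-08 12:34:56")

# Set components one at a time with the accessor's replacement form
year(dt) <- 2020
month(dt) <- 1
hour(dt) <- hour(dt) + 1
print(dt)  # 2020-01-08 13:34:56 UTC

# update() changes several components in one call and returns a new value
updated <- update(ymd_hms("2016-07-08 12:34:56"), year = 2020, month = 2, mday = 2, hour = 2)
print(updated)  # 2020-02-02 02:34:56 UTC
```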
Using the data frame created above, our_new_mutated_dataset, set the year values to 2022 instead of 1998.
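A sketch for this exercise. The structure of our_new_mutated_dataset is not shown in this section, so the code builds a hypothetical stand-in, stand_in, with made-up values and a date-time column assumed to be called date; adapt the names to the real data.

```r
library(lubridate)

# Hypothetical stand-in for our_new_mutated_dataset (made-up values)
stand_in <- data.frame(
  id = 1:2,
  date = ymd_hms(c("1998-01-15 08:00:00", "1998-06-30 17:45:00"))
)

# Replace the year; month, day, and time of day are kept
stand_in$date <- update(stand_in$date, year = 2022)

print(stand_in)
```

year(stand_in$date) <- 2022 does the same; in a dplyr pipeline the equivalent is mutate(date = update(date, year = 2022)).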
lubridate provides three classes that represent time spans:
Durations, which represent an exact number of seconds.
Periods, which represent human units like weeks and months.
Intervals, which represent a starting and ending point.
In R, subtracting two dates gives a difftime object.
A difftime object records a time span in seconds, minutes, hours, days, or weeks, depending on the inputs.
Because the unit can vary, difftimes can be a little painful to work with, so lubridate provides an alternative that always uses seconds: the duration.
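A sketch of the difference on two arbitrary dates: base R picks the unit of the difftime, and as.duration() converts it to seconds:

```r
library(lubridate)

# Subtracting two dates gives a difftime; for Date inputs the unit is days
span <- ymd("2022-08-11") - ymd("2022-08-01")
print(span)         # Time difference of 10 days
print(units(span))  # "days"

# as.duration() converts it to a duration, which is always stored in seconds
span_dur <- as.duration(span)
print(span_dur)
```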
Durations come with several convenient constructors:
[1] "15s"
[1] "600s (~10 minutes)"
[1] "43200s (~12 hours)" "86400s (~1 days)"
[1] "0s" "86400s (~1 days)" "172800s (~2 days)"
[4] "259200s (~3 days)" "345600s (~4 days)" "432000s (~5 days)"
[1] "1814400s (~3 weeks)"
[1] "31557600s (~1 years)"
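The calls that produced the output above are not shown in this section; constructor calls along these lines give the same values, although the exact printed form may vary with the lubridate version:

```r
library(lubridate)

dseconds(15)
dminutes(10)
dhours(c(12, 24))
ddays(0:5)
dweeks(3)
dyears(1)  # the output above (31557600s) corresponds to 365.25 days

# Durations are stored as seconds, so arithmetic behaves like plain numbers
two_days <- 2 * ddays(1)
print(two_days)
```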
Pick the simplest data structure that solves your problem.
If you only care about physical time, use a duration; if you need to add human times, use a period; if you need to figure out how long a span is in human units, use an interval.
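A sketch of why the choice matters, using the 2016 daylight-saving change in the America/New_York time zone (this assumes the system time zone database includes it): adding a one-day duration adds exactly 86,400 seconds, while adding a one-day period keeps the clock time. An interval is anchored to a real start and end.

```r
library(lubridate)

# 1 pm on the day before clocks moved forward in New York (13 March 2016)
one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")

# Duration: exactly 24 * 60 * 60 seconds later, so the clock reads 2 pm
after_duration <- one_pm + ddays(1)

# Period: one calendar day later, so the clock still reads 1 pm
after_period <- one_pm + days(1)

print(after_duration)
print(after_period)

# Interval: a span with a fixed start and end; 2016 was a leap year
leap_year <- ymd("2016-01-01") %--% ymd("2017-01-01")
print(time_length(leap_year, "day"))  # 366
```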
You can find out what R thinks your current time zone is with Sys.timezone().
In R, the time zone is an attribute of the date-time that only controls printing. Unless otherwise specified, lubridate always uses UTC.
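A sketch contrasting two lubridate functions for time zones: with_tz() changes only how a date-time prints, while force_tz() keeps the printed clock time and changes the underlying instant. The value is arbitrary; Asia/Kolkata is UTC+5:30.

```r
library(lubridate)

# Parsed without a tz argument, so lubridate uses UTC
x <- ymd_hms("2022-08-11 12:00:00")
print(tz(x))  # "UTC"

# with_tz(): same instant, displayed in another time zone
y <- with_tz(x, "Asia/Kolkata")
print(y)       # 17:30 in India
print(x == y)  # TRUE: only the printing changed

# force_tz(): keeps the clock time (12:00) but changes the instant
z <- force_tz(x, "Asia/Kolkata")
print(difftime(x, z, units = "hours"))  # 5.5 hours apart
```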
Read the two files provided in the link and name them join_data_1 and join_data_2, respectively. Then look at the summary statistics of each file.
Hint: read_csv() and summary().
Identify the keys in both the tables.
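One way to check a candidate key is to confirm that its values identify each row uniquely. The helper is_unique_key() and both data frames below are hypothetical stand-ins with made-up values, since the real columns of join_data_1 and join_data_2 are not shown in this section.

```r
# Hypothetical helper: TRUE when the given columns identify every row uniquely
is_unique_key <- function(df, cols) {
  anyDuplicated(df[cols]) == 0
}

# Hypothetical stand-ins with made-up values
stand_in_1 <- data.frame(
  date = c("2022-08-01 10:00:00", "2022-08-01 11:00:00", "2022-08-01 12:00:00"),
  value = c(1, 2, 3)
)
stand_in_2 <- data.frame(
  Date = c("01-08-2022 10:00:00", "01-08-2022 10:00:00"),
  value = c(4, 5)
)

print(is_unique_key(stand_in_1, "date"))  # TRUE
print(is_unique_key(stand_in_2, "Date"))  # FALSE: the same timestamp appears twice
```

With dplyr, the same check is commonly written as count() on the candidate key followed by filter(n > 1); an empty result means the key is unique.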
Convert the date column in join_data_1 and the Date column in join_data_2 to date-time objects with the time zone "Asia/Kolkata".
Hint: use ymd_hms() for the date column in join_data_1 and dmy_hms() for the Date column in join_data_2, and specify the time zone with the tz argument.
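A sketch of the two parsing calls on hypothetical example strings; the real column formats are assumed from the hint (year-month-day in join_data_1, day-month-year in join_data_2), not taken from the files.

```r
library(lubridate)

# Hypothetical example strings in the formats the hint implies
date_1 <- c("2022-08-11 09:15:00", "2022-08-11 10:15:00")  # like join_data_1$date
date_2 <- c("11-08-2022 09:15:00", "11-08-2022 10:15:00")  # like join_data_2$Date

# tz = "Asia/Kolkata" reads the clock times as Indian Standard Time
parsed_1 <- ymd_hms(date_1, tz = "Asia/Kolkata")
parsed_2 <- dmy_hms(date_2, tz = "Asia/Kolkata")

print(parsed_1)
print(all(parsed_1 == parsed_2))  # TRUE: same instants
```

On the real tables, the same calls would typically go inside mutate(), for example mutate(join_data_1, date = ymd_hms(date, tz = "Asia/Kolkata")).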