Introduction to datetime

Introduction to datetime#

In solar photovoltaic problems, it is very common to have data whose index involves dates and times. For this using the DateTimeIndex is very useful.

Note

If you have not yet set up Python on your computer, you can execute this tutorial in your browser via Google Colab. Click on the rocket in the top right corner and launch “Colab”. If that doesn’t work download the .ipynb file and import it in Google Colab.

Then install pandas and numpy by executing the following command in a Jupyter cell at the top of the notebook.

!pip install pandas numpy

import pandas as pd
import numpy as np

We can create a time index by indicating the start, frequency and the number of periods.

time = pd.date_range(start="2024-01-01", 
                           periods=365, 
                           freq='D')

We can also use the start, frequency and end.

time = pd.date_range(start="2024-01-01", 
                     end="2026-01-01", 
                     freq="D")

We can use the time index to create a series and plot it.

values = np.sin(2 * np.pi * time.dayofyear / 365)
ts = pd.Series(values, index=time)
ts.plot()

<Axes: >

_images/1120fca7cc83f8594d7a0e84acea22b0a3653080f599c920699f89fe8731f786.png

We can use Python’s slicing notation inside .loc to select a date range.

ts.loc["2024-01-01":"2024-07-01"].plot()

<Axes: >

_images/f6b12dcb9036e416043a8ec82a0d84f25d206cc88b62c5c9bc9ba4f638a07c83.png

ts.loc["2024-05"].plot()

<Axes: >

_images/8db32db39f7dbd398f9715bf3eeed18e74292643031a3980b310869c4feca4cd.png

The pd.TimeIndex object has lots of useful attributes

ts.index.month

Index([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
       ...
       12, 12, 12, 12, 12, 12, 12, 12, 12,  1],
      dtype='int32', length=732)

ts.index.day

Index([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
       ...
       23, 24, 25, 26, 27, 28, 29, 30, 31,  1],
      dtype='int32', length=732)

Another common operation is to change the resolution of a dataset by resampling in time. Pandas exposes this through the resample function. The resample periods are specified using pandas offset index syntax.

Below we resample the dataset by taking the mean over each month.

ts.resample("M").mean().head()

/tmp/ipykernel_3790/3719689167.py:1: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.
  ts.resample("M").mean().head()

2024-01-31    0.268746
2024-02-29    0.704299
2024-03-31    0.954333
2024-04-30    0.955056
2024-05-31    0.697250
Freq: ME, dtype: float64

ts.resample("M").mean().plot()

/tmp/ipykernel_3790/2719797856.py:1: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.
  ts.resample("M").mean().plot()

<Axes: >

_images/809c958a5cfedcf6b2d778b2fdc2c99b05234c0d1bbbbdad7c348844bf4d27bd.png

We can now import the same file from previous introduction containing data measured at a weather station, and make its index a DateTimeIndex

fn = "weather_station_data.csv"
df = pd.read_csv(fn, index_col=0)
df.index = pd.to_datetime(df.index, utc=True) 
df.index

DatetimeIndex(['2024-01-01 00:00:00+00:00', '2024-01-01 00:05:00+00:00',
               '2024-01-01 00:10:00+00:00', '2024-01-01 00:15:00+00:00',
               '2024-01-01 00:20:00+00:00', '2024-01-01 00:25:00+00:00',
               '2024-01-01 00:30:00+00:00', '2024-01-01 00:35:00+00:00',
               '2024-01-01 00:40:00+00:00', '2024-01-01 00:45:00+00:00',
               ...
               '2024-07-01 23:10:00+00:00', '2024-07-01 23:15:00+00:00',
               '2024-07-01 23:20:00+00:00', '2024-07-01 23:25:00+00:00',
               '2024-07-01 23:30:00+00:00', '2024-07-01 23:35:00+00:00',
               '2024-07-01 23:40:00+00:00', '2024-07-01 23:45:00+00:00',
               '2024-07-01 23:50:00+00:00', '2024-07-01 23:55:00+00:00'],
              dtype='datetime64[ns, UTC]', length=52704, freq=None)

We can define a time series and use it to select data within the database.

start_date = '2024-05-01 00:00:00'
end_date =  '2024-05-30 00:00:00'
time_index= pd.date_range(start=start_date, 
                                  end=end_date, 
                                  freq='5min',
                                     tz='UTC')
df.loc[time_index].plot()

<Axes: >

_images/6b9b5c3b0ba75670d9ccf58eda168f66b298382047a0bc6f19f9e3c75b15bda9.png

We can also group the values per day, average them and plot them.

df.loc[time_index].groupby(df.loc[time_index].index.day).mean().plot()

<Axes: >

_images/adfeadb638de90ef3dc90aed1b8a5779725691ba24041443ef973388027844b6.png