30 July 2018

Exploratory Data Analysis of The Electricity Price in The Turkish Day-Ahead Market


By: Mohammed Saif Kazamel, Data Scientist

Introduction

In this analysis we try to understand how the price of electricity in the wholesale Turkish electricity market behaves. The data that we'll look at consists of the hourly prices of electricity in the day-ahead market for three years (15/07/2015 - 15/07/2018). The measurement unit of the price is TL/MWh where TL is an abbreviation for the local currency Turkish Lira. The price of the day-ahead market is called the Market clearing price with the abbreviation MCP, and is called Piyasa Takas Fiyatı in Turkish with the abbreviation PTF. We will use the English abbreviation MCP to refer to the price in this analysis. If you want to learn more about the Turkish electricity market, you can read my report which gives a good overview of the market.
I preferred not to go back in time more than three years because considerable changes took place (and are still taking place) in the electricity market structure as well as the country's economy, so the insights gained from old data my not be valid for predicting the future.
In this report, before each analysis I included a brief introduction to the analysis and after the analysis I added a short discussion of the insights we gain from it. The size of the figures in this report is quite large because the number of data points plotted is large and the graphs won't be clear otherwise.

Some of the questions we try to answer here include

  • Does the price show high variability through time?
  • Does it have an increasing or decreasing trend?
  • Does the price change in a seasonal manner and if so, how?
  • Does variance change overtime or does it have a constant variance?
  • Which outside factors have a significant effect on the price?

The main takeaways of the analysis

  • Prices are lower on weekends than weekdays and also have less variance
  • Extreme conditions such as supply crises cause some serious spikes in the price
  • Outliers tend to cluster together
  • USDTRY parity has significant effects on the price
  • The price has a daily cycle where it is lower at night 22:00 - 08:00 than in the rest of the day
  • Average price is increasing every year

Sections of the analysis

  1. Data Preparation and Overview
    1. Preparation
    2. Overview
  2. Analysis of the Price Behavior with Different Time Components
    1. Day of the year
    2. Month
    3. Day of the month
    4. Day of the week
    5. Hour of the day
    6. Heatmap of the year
  3. Analyzing Autocorrelation
    1. Autocorrelation between all hourly prices
    2. Autocorrelation between the daily prices of a single hour
  4. Analyzing Correlation with Some External Variables
    1. USD exchange rate
    2. Market clearing quantity
    3. Wind power generation
  5. Conclusion
  6. About the Author

For a summary of the results of all the analyses you can check the conclusion section.

1. Data Preparation and Overview

This section includes examining some summary statistics about the data, getting the general feel of how it behaves and looking at the long-term trend of the data.

A. Preparation

In [1]:
# Importing the needed libraries
import warnings
warnings.filterwarnings('ignore')
from scipy.stats.stats import pearsonr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import statsmodels.graphics.tsaplots as tsap
%matplotlib inline
In [2]:
# Loading the dataset
url = 'https://raw.githubusercontent.com/Mo-Saif/Turkish-Electricity-Market/master/MCP%20Exploratory%20Analysis%20Files/MCP-15072015-15072018.csv'
df = pd.read_csv(url, parse_dates = [[0,1]], dayfirst=True)
print(df.head(),'\n\n')
print(df.info())
            Date_Hour MCP (TL/MWh)
0 2015-07-15 00:00:00       151.98
1 2015-07-15 01:00:00       154.16
2 2015-07-15 02:00:00       134.99
3 2015-07-15 03:00:00          128
4 2015-07-15 04:00:00       124.99 


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26328 entries, 0 to 26327
Data columns (total 2 columns):
Date_Hour       26328 non-null datetime64[ns]
MCP (TL/MWh)    26328 non-null object
dtypes: datetime64[ns](1), object(1)
memory usage: 411.5+ KB
None

From the information about the data frame object, we see that the price column is not recognized as a numeric data type. We will convert it in the next cell.

In [3]:
df = df.set_index(['Date_Hour'])  # Setting the index to the datetime column
df.rename(columns={"MCP (TL/MWh)" : "MCP"}, inplace=True) # Renaming the column with a short name
df['MCP'] = df['MCP'].str.replace(',','')  # Removing the commas
df['MCP'] = df['MCP'].apply(pd.to_numeric) # Converting MCP to numeric value
df.head() # A view of its final form
Out[3]:
MCP
Date_Hour
2015-07-15 00:00:00 151.98
2015-07-15 01:00:00 154.16
2015-07-15 02:00:00 134.99
2015-07-15 03:00:00 128.00
2015-07-15 04:00:00 124.99

Looks like we're all set

B. Overview

In [4]:
df.describe() # Summary statistics
Out[4]:
MCP
count 26328.000000
mean 156.825280
std 53.932223
min 0.000000
25% 128.327500
50% 160.000000
75% 196.917500
max 1899.990000

As we can see, the average price is 156.8 TL with a standard deviation of 53.9 meaning that 99.7% of the data lie between 156.8 ± 3*53.9 = [0, 318.5]. This range is very large if a company wants to predict electricity in a reasonably accurate manner.
The maximum price is 1900 TL which is very far from the mean suggesting the presence of outliers and high volatility of the price. The minimum price is zero, which means that for some hours the electricity was free and also that the prices doesn't reach negative levels. This is partly because many major electricity consumers (e.g. large factories) and distributors had the opportunity to deal with the public sector producer (EÃœAÅž/TETAÅž) at a better rate than the market. Since demand side was not in the market, supply side (renewables included) matching price was very low. That structure is slowly dissolving and it is still in progress. We can check if this is a frequent occasion.

In [5]:
df[df['MCP'] == 0].count()
Out[5]:
MCP    117
dtype: int64

We see that the price was set to zero for 117 different hours in the past 3 years, which means that this does not happen frequently.

Now let's plot the price over time and see how it behaves.

In [6]:
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.plot(df['MCP'])
ax.set_title("Electricity Price over Time")
ax.set_xlabel('Time')
ax.set_ylabel('MCP')
plt.grid(linestyle='--', alpha=0.4)
plt.xlim(df.index.min() - pd.Timedelta(weeks=2), df.index.max() + pd.Timedelta(weeks=2))
Out[6]:
(735780.0, 736904.9583333334)

Some characteristics of the data can be noticed from this graph. First, the price spikes tend to cluster together and take a couple of days to return to the original behavior, as can be noticed at the end of Jan 2016, end of Dec 2016 and end of Jan 2017. Second, The price is quite volatile. Third, The variance in price seems to be lower after Jan 2017 than before it. Fourth, there seems to be an upward trend in the price over the years. Fifth, the spikes above 400 seem not to happen in the normal conditions and there should be some special reason for it to happen. We will investigate these outliers.

In [7]:
df[df['MCP'] > 400]  # Checking the extreme values
Out[7]:
MCP
Date_Hour
2016-12-20 11:00:00 450.00
2016-12-22 09:00:00 450.01
2016-12-22 10:00:00 450.01
2016-12-22 11:00:00 499.77
2016-12-22 13:00:00 499.78
2016-12-22 14:00:00 500.00
2016-12-22 15:00:00 500.00
2016-12-22 16:00:00 500.01
2016-12-22 17:00:00 1169.55
2016-12-22 18:00:00 500.05
2016-12-22 19:00:00 500.00
2016-12-22 20:00:00 460.06
2016-12-23 09:00:00 699.18
2016-12-23 10:00:00 999.99
2016-12-23 11:00:00 1159.18
2016-12-23 12:00:00 999.98
2016-12-23 13:00:00 799.99
2016-12-23 14:00:00 1899.99
2016-12-23 15:00:00 1500.05
2016-12-23 16:00:00 800.00
2016-12-23 17:00:00 799.99
2016-12-23 18:00:00 777.00
2016-12-23 19:00:00 449.99
2016-12-23 20:00:00 460.06
2017-09-11 14:00:00 967.15

We can see that most of the spikes happened between 20-23 Dec 2016. After investigation, it was found that this price spike took place because of a natural gas crisis that happened in Turkey during that time. You can learn more about the crisis from this link (in Turkish). There is no clear reason for the spike that happened in 11 Sep 2017. We will remove these extreme values to have better analysis results.

In [8]:
df.loc[df['MCP'] > 400, 'MCP'] = np.nan  # Removing the values
df.interpolate(inplace=True)  # Replacing them with interpolated values

To get a better idea about the long-term trend, let's plot the monthly averages of the price.

In [9]:
df_monthly = df.resample('M', label = 'right').mean()  # Creating monthly averages
df_monthly.rename_axis('Month', inplace=True)
df_monthly.head()
Out[9]:
MCP
Month
2015-07-31 136.587721
2015-08-31 154.655349
2015-09-30 160.691597
2015-10-31 137.628065
2015-11-30 133.688139
In [10]:
x = np.array(range(len(df_monthly.MCP)))
z = np.polyfit(x,df_monthly.MCP,deg=1)  # Calculating the coeffecients of the trendline
p = np.poly1d(z)
fig, ax = plt.subplots(figsize=(16,8))
ax.plot(df_monthly, label='Monthly Price Means')  # Plotting the monthly averages
ax.plot(df_monthly.index, p(x), color='r', label='Trendline')  # Plotting the trendline
ax.legend()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.set_title("Average Monthly Price over Time")
ax.set_xlabel('Time')
ax.set_ylabel('MCP')
plt.grid(linestyle='--', alpha=0.4)
print('The slope of the trend line is {:.2f} TL/Month or {:.2f} TL/Year'.format(z[0],z[0]*12))
The slope of the trend line is 1.50 TL/Month or 17.98 TL/Year

We can see that the average electricity price increases by 18 TL approximately each year. We also cannot readily notice any monthly seasonality from the figure. We can do this same analysis using median instead of mean because medians are more robust to outliers. So, it would give us a better estimate for the actual increase.

In [11]:
df_m_median = df.resample('M', label = 'right').median()
xm = np.array(range(len(df_m_median.MCP)))
zm = np.polyfit(xm,df_m_median.MCP,deg=1)  # Calculating the coeffecients of the trendline
pm = np.poly1d(zm)
fig, ax = plt.subplots(figsize=(16,8))
ax.plot(df_m_median, label='Monthly Price Medians')  # Plotting the monthly averages
ax.plot(df_m_median.index, pm(xm), color='r', label='Trendline')  # Plotting the trendline
ax.legend()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.set_title("Median Monthly Price over Time")
ax.set_xlabel('Time')
ax.set_ylabel('MCP')
plt.grid(linestyle='--', alpha=0.4)
print('The slope of the trend line is {:.2f} TL/Month or {:.2f} TL/Year'.format(zm[0],zm[0]*12))
The slope of the trend line is 1.46 TL/Month or 17.46 TL/Year

We see that it gives us pretty much the same result.

2. Analysis of the Price Behavior with Different Time Components

In this section, we check if the price level moves in a certain cycle during the day, week, month, or year.

In [12]:
# Decomposing time components to check their effect on the price
df['day_of_year'] = df.index.dayofyear
df['month'] = df.index.month
df['day_of_month'] = df.index.day
df['week_day'] = df.index.day_name()
df['hour'] = df.index.hour
df.head()
Out[12]:
MCP day_of_year month day_of_month week_day hour
Date_Hour
2015-07-15 00:00:00 151.98 196 7 15 Wednesday 0
2015-07-15 01:00:00 154.16 196 7 15 Wednesday 1
2015-07-15 02:00:00 134.99 196 7 15 Wednesday 2
2015-07-15 03:00:00 128.00 196 7 15 Wednesday 3
2015-07-15 04:00:00 124.99 196 7 15 Wednesday 4

A. Day of the year

In [13]:
# Does the day of the year affect the electricity price?
fig, ax = plt.subplots(figsize=(16,8))
ax.scatter(df['day_of_year'], df['MCP'])
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(25))
ax.set_title("Electricity Price for Each Day of the Year")
ax.set_xlabel('Day of the Year')
ax.set_ylabel('MCP')
plt.xlim(df['day_of_year'].min() - 5, df['day_of_year'].max() + 5)
plt.ylim(-10)
Out[13]:
(-10, 418.9470883002207)

Near the end of January (late 20s in the graph) we see that the price have been over 130 TL consistently for the last three years, and has shown some extreme values. If we look back at the first figure (titled Electricity Price over Time), we see that near the end of January in 2016 and 2017 there where spikes at the price. However, this didn't happen in Jan 2018. Further investigation is needed to determine if this is a recurrent event or non-related events happened to have taken place in the same time in two consecutive years.
The spike at the end of the year in the current figure is caused by the aforementioned natural gas crisis.

B. Month

In [14]:
# Does the month of the year affect the electricity price?
fig, ax = plt.subplots(figsize=(16, 10))
sns.boxplot(x='month', y='MCP', data=df, ax=ax).set(xlabel='Month', title="Monthly Price Distributions")
Out[14]:
[Text(0.5,0,'Month'), Text(0.5,1,'Monthly Price Distributions')]

We can notice some seasonal pattern here where the price is lower in the spring (especially March) and fall (especially October and November) and higher in summer and winter. The rise in summer is probably caused by the use of air conditioners and the rise in winter is probably caused by the use of electric heaters.

C. Day of the month

In [15]:
# Does the day of the month affect the electricity price?
fig, ax = plt.subplots(figsize=(16, 10))
sns.boxplot(x='day_of_month', y='MCP', data=df, ax=ax).set(xlabel='Day of the Month',
                                                            title="Day of the Month Price Distributions")
Out[15]:
[Text(0.5,0,'Day of the Month'),
 Text(0.5,1,'Day of the Month Price Distributions')]

There doesn't seem to be any seasonal pattern within each month according to the graph.

D. Day of the week

In [16]:
# Does the weekday affect the electricity price?
fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(x='week_day', y='MCP', data=df, ax=ax).set(xlabel='Day of the Week', title="Day of the Week Price Distributions")
Out[16]:
[Text(0.5,0,'Day of the Week'),
 Text(0.5,1,'Day of the Week Price Distributions')]

Price on weekends (especially Sunday) tend to be lower than weekdays because a large portion of the industrial sector doesn't work on these days, which causes a large drop in demand.

E. Hour of the day

In [17]:
# Does the hour of the day affect the electricity price?
fig, ax = plt.subplots(figsize=(16, 10))
sns.boxplot(x='hour', y='MCP', data=df, ax=ax).set(xlabel='Hour', title="Hourly Price Distributions")
Out[17]:
[Text(0.5,0,'Hour'), Text(0.5,1,'Hourly Price Distributions')]

There is a clear seasonal pattern within days showing lower prices at night and early morning and higher prices at daytime with more upward spikes. The reason for this is probably the increased demand from the industrial sector during daytime.

F. Heatmap of the year

In this section, we visualize the average daily price of electricity for each day of the year using a heatmap.

In [18]:
df_daily = df['MCP'].resample('D', label = 'left').mean()
df_daily = df_daily.to_frame()
df_daily['Month'] = df_daily.index.month
df_daily['Day of the Month'] = df_daily.index.day
price_pt = pd.pivot_table(df_daily, values='MCP', index='Month',columns='Day of the Month',aggfunc='mean')
plt.figure(figsize=(14,10))
sns.heatmap(price_pt, cmap = sns.cm.rocket_r)
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x27337e32b70>

As we mentioned before, the prices in March and October seem to be lower than other months. We can add to them the second half of February as well. The prices have fallen in the middle of July after reaching high levels, but that doesn't seem to be a recurrent theme.

3. Analyzing Autocorrelation

Correlation coefficient shows how much two variables are related. Correlation ranges from -1.0 to 1.0. Values closer to 1 indicate a strong positive relationship between variables, and values closer to -1 indicate a strong negative relationship between variables. A correlation of zero means that there is no linear relationship between the variables.
Autocorrelation is the correlation between a variable and a past instance of itself. It can be used to detect seasonality.
Partial autocorrelation is the correlation between two points that are separated by some number of periods n, but with the effect of the intervening correlations removed. In the following sections we will analyze the autocorrelation and partial autocorrelation between the hourly prices and also the correlation between the prices and some external variables.

A. Autocorrelation between all hourly prices

We will take the difference of the prices before we plot the autocorrelation function to eliminate the effect of the trend in the data.

In [19]:
# Cheching autocorrelation of differenced hourly prices
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.set_title("Autocorrelation Between All Hourly Prices")
ax.set_xlabel('Lags')
ax.set_ylabel('Correlation Coefficient')
plt.grid(linestyle='--', alpha=0.4)
d = tsap.plot_acf(df['MCP'].diff().dropna(), lags=200, ax=ax)

We can notice the correlation between point h and h-24, which is expected. Because the price at a specific hour is affected by the price at the same hour the previous day.

In [20]:
# Cheching the partial autocorrelation of differenced hourly prices
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.set_title("Partial Autocorrelation Between All Hourly Prices")
ax.set_xlabel('Lags')
ax.set_ylabel('Correlation Coefficient')
plt.grid(linestyle='--', alpha=0.4)
dummy = tsap.plot_pacf(df['MCP'].diff().dropna(), lags=200, ax=ax)

This plot tells us that not only h-24 but also h-48, h-72... independently affect current hour's price. Also there is a weak negative correlation between h and h-20, h-21, h-22, h-23 which is not intuitive.

B. Autocorrelation between the daily prices of a single hour

In this section, we analyze the daily prices of a single hour to see the relationship between daily prices and to see if there is any seasonality. Here we again difference prices to exclude the effects of the trend. I arbitrarily chose the hour 11am to do this analysis but it can be done on any hour.

In [21]:
p11 = df[df['hour'] == 11]['MCP']
p11.head()
Out[21]:
Date_Hour
2015-07-15 11:00:00    170.00
2015-07-16 11:00:00    144.95
2015-07-17 11:00:00     44.19
2015-07-18 11:00:00    100.00
2015-07-19 11:00:00     99.99
Name: MCP, dtype: float64

We can plot the time series to check for any patterns

In [22]:
p11.plot(figsize=(16,8)) # Plotting the time series
plt.grid(linestyle='--', alpha=0.4)

We can see that there is no trend. No cyclic pattern can also be noticed just by looking.

In [23]:
# Cheching autocorrelation of differenced prices (daily at 11:00 hours)
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.set_title("Autocorrelation Between Daily Prices at 11am")
ax.set_xlabel('Lags')
ax.set_ylabel('Correlation Coefficient')
plt.grid(linestyle='--', alpha=0.4)
dummy = tsap.plot_acf(p11.diff().dropna(), lags=200, ax=ax)

We can see some seasonality here. There is weekly positive correlation (i.e. between d and d-7). Also, interestingly, there is moderate negative correlation between d and d-1, d-6.

In [24]:
# Cheching the partial autocorrelation of differenced prices (daily at 11:00 hours)
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.set_title("Partial Autocorrelation Between Daily Prices at 11am")
ax.set_xlabel('Lags')
ax.set_ylabel('Correlation Coefficient')
plt.grid(linestyle='--', alpha=0.4)
dummy = tsap.plot_pacf(p11.diff().dropna(), lags=200, ax=ax)

This graph suggests that there is a weak negative correlation between the price in a day and the prices in the other days of the week except for d-1 and d-6 which have moderate negative correlation. The graph also show weak positive correlation between d and d-7.

4. Analyzing Correlation with Some External Variables

In this section, we will see if there is a relationship between the electricity prices and the USD exchange rate, the market clearing quantity, and the amount of electricity generated from wind power.

A. USD exchange rate

The exchange rate of the USD against the Turkish Lira has considerably increased in the recent years. As Turkey imports its needs of natural gas from outside, this increase is expected to have an effect on the electricity prices. In this section we will investigate if there is correlation between these two variables.

In [25]:
url = 'https://raw.githubusercontent.com/Mo-Saif/Turkish-Electricity-Market/master/MCP%20Exploratory%20Analysis%20Files/EVDS.csv'
exr = pd.read_csv(url, parse_dates = [0], dayfirst=True) # exr --> daily exchange rate
exr.dropna(inplace=True)
exr.set_index('Date', inplace=True)
df1 = pd.concat([exr,df_daily['MCP']], axis=1)
In [26]:
fig, ax = plt.subplots(figsize=(12, 7))
ax.xaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.xaxis.set_minor_locator(plt.MultipleLocator(0.05))
ax.yaxis.set_major_locator(plt.MultipleLocator(10))
ax.yaxis.set_minor_locator(plt.MultipleLocator(5))
ax.set_title("Scatter Plot of MCP and USD Exchange Rate")
ax.set_xlabel('USD Exchange Rate')
ax.set_ylabel('MCP')
ax.scatter(df1['USD_YTL'],df1['MCP'])
plt.grid(linestyle='--', alpha=0.4)
corr1 = pearsonr(df1['USD_YTL'],df1['MCP'])
print("The correlation coeffecient between the USD exchange rate and MCP is {0:.2f} and the p-value is {1:.2e}".format(corr1[0],corr1[1]))
The correlation coeffecient between the USD exchange rate and MCP is 0.54 and the p-value is 1.17e-83

As we can see above the figure, there is a significant moderate correlation between the USD exchange rate and MCP. As we said before, this is because Turkey imports a large amount of its energy sources from outside. The prices of energy sources in TL would be a better predictor of the electricity price than the exchange rate because of their direct effect.
We can also see how they move together over time.

In [27]:
exr_monthly = exr.resample('M', label = 'right').mean()
fig, ax1 = plt.subplots(figsize=(16,8))
ins1 = ax1.plot(df_monthly, color='#1f77b4', label='Monthly Price Averages')  # Plotting the monthly averages
ax1.set_xlabel('Time')
ax1.set_ylabel('MCP', color='#1f77b4')
ax1.tick_params('y', colors='#1f77b4')
ax2 = ax1.twinx()
ins2 = ax2.plot(exr_monthly, color='r', label='Monthly Exchange Rate Averages')  # Plotting the trendline
ax2.set_ylabel('Exchange Rate', color='r')
ax2.tick_params('y', colors='r')
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax1.xaxis.set_minor_locator(mdates.MonthLocator())
ax1.set_title("Average Monthly Price and Exchange Rate over Time")
plt.grid(linestyle='--', alpha=0.4)
ins = ins1+ins2
labs = [l.get_label() for l in ins]
ax1.legend(ins, labs, loc=0)
Out[27]:
<matplotlib.legend.Legend at 0x2733d589be0>

B. Market clearing quantity

The market clearing quantity is how much electricity was sold in the day-ahead market for that day and it is measured in MWh. It is expected that in hours where lots of electricity is sold the price will be higher. We cannot use this piece of information to predict the prices because both pieces of information are determined at the same time. Nevertheless, we can check the correlation to gain a better understanding about the price behavior.

In [28]:
url = 'https://raw.githubusercontent.com/Mo-Saif/Turkish-Electricity-Market/master/MCP%20Exploratory%20Analysis%20Files/clearingQuantity-15072015-15072018.csv'
cq = pd.read_csv(url, parse_dates = [[0,1]], dayfirst=True)
cq.dropna(inplace=True)
cq = cq.set_index(['Date_Hour'])
cq.rename(columns={"Clearing Quantity (MWh)" : "CQ"}, inplace=True)
# Converting CQ to numeric value
cq['CQ'] = cq['CQ'].str.replace(',','')
cq['CQ'] = cq['CQ'].apply(pd.to_numeric)
df2 = pd.concat([cq, df['MCP']], axis=1)
df2.head()
Out[28]:
CQ MCP
Date_Hour
2015-07-15 00:00:00 11065.4 151.98
2015-07-15 01:00:00 10959.6 154.16
2015-07-15 02:00:00 10644.1 134.99
2015-07-15 03:00:00 10214.1 128.00
2015-07-15 04:00:00 9386.0 124.99
In [29]:
fig, ax = plt.subplots(figsize=(18, 9))
ax.set_title("Scatter Plot of MCP and Clearing Quantity")
ax.set_xlabel('Clearing Quantity (MWh)')
ax.set_ylabel('MCP')
ax.scatter(df2['CQ'], df2['MCP'])
corr2 = pearsonr(df2['CQ'], df2['MCP'])
print("The correlation coeffecient between the USD exchange rate and MCP is {0:.2f} and the p-value is {1:.2e}".format(corr2[0],corr2[1]))
The correlation coeffecient between the USD exchange rate and MCP is 0.64 and the p-value is 0.00e+00

We can see a non linear positive relationship between the clearing quantity and price where the relationship flattens as the clearing quantity increases. The correlation test shows a significant moderate positive correlation as shown above the figure.

C. Wind power generation

Wind and solar power are the cheapest in the market. All the electricity generated from these sources is sold in the market with government guarantee. The percentage of electricity generated from solar power is small and data about it is not available. On the other hand, approximately 6% of electricity is generated from wind last year. It is expected that in hours where wind generation is high, the price decreases.

In [30]:
url = 'https://raw.githubusercontent.com/Mo-Saif/Turkish-Electricity-Market/master/MCP%20Exploratory%20Analysis%20Files/Final%20Daily%20Production%20Program%20Wind-15072015-15072018.csv'
wind = pd.read_csv(url, parse_dates = [[0,1]], dayfirst=True)
wind.dropna(inplace=True)
wind = wind.set_index(['Date_Hour'])
# Converting wind to numeric value
wind['Wind'] = wind['Wind'].str.replace(',','')
wind['Wind'] = wind['Wind'].apply(pd.to_numeric)
df3 = pd.concat([wind, df['MCP']], axis=1)
df3.head()
Out[30]:
Wind MCP
Date_Hour
2015-07-15 00:00:00 395.40 151.98
2015-07-15 01:00:00 393.11 154.16
2015-07-15 02:00:00 380.76 134.99
2015-07-15 03:00:00 370.10 128.00
2015-07-15 04:00:00 351.00 124.99
In [31]:
fig, ax = plt.subplots(figsize=(18, 9))
ax.set_title("Scatter Plot of MCP and Wind Generation")
ax.set_xlabel('Wind Generation (MWh)')
ax.set_ylabel('MCP')
ax.scatter(df3['Wind'], df3['MCP'])
corr3 = pearsonr(df3['Wind'], df3['MCP'])
print("The correlation coeffecient between wind production and market clearing price is {0:.2f} and the p-value is {1:.2e}".format(corr3[0],corr3[1]))
The correlation coeffecient between wind production and market clearing price is 0.10 and the p-value is 1.95e-62
The p-value for the correlation coefficient is effectively zero. Meaning that the correlation is statistically significant. The result of this correlation test is rather counterintuitive. It was expected that there would be a negative correlation between wind power generation and clearing price, because renewable resources are the cheapest in the market. Some factors may have contributed to this correlation value; namely, the amount of electricity produced from wind is not large enough to cause significant decrease in price (Electricity from wind constitutes around 6% of the total electricity generated). Secondly, wind production has been increasing in the recent years but also has the USD exchange rate, which has a greater effect on the electricity prices in the positive direction.

5. Conclusion

Although the electricity price is volatile and hard to predict, we can reach the following conclusions:

  • The minimum MCP is zero TL and the average price is 157 TL
  • The standard deviation of the price is 54 TL which is quite high
  • The variance of the prices has increased in the last year compared to before
  • Infrequent price spikes happen, where the price reaches very high levels
  • The average price of electricity increases by about 18 TL every year
  • Price tend to be lower in spring and fall than in summer and winter
  • It reaches its highest levels in December and January
  • The price is lower in weekends than weekdays and show less variance
  • Price is lower at night than during the day
  • The price of an hour is moderately positively correlated with the price at the same hour in the previous day
  • It is also weakly negatively correlated with the price at the following couple of hours of the previous day
  • The price of an hour is weakly positively correlated with the price of the same hour and the dame day in the previous week
  • It is also moderately negatively correlated with the price of the same hour in the previous day and six days before
  • The price is moderately positively correlated with the USD exchange rate and the market clearing quantity

Other factors can be checked to see if they have a greater effect on the price. Fuel prices are a good example.

6. About the Author

Mohammed Saif is a data scientist who also has experience in the mathematical optimization and scientific computing domains. Currently working on forecasting electricity load and price in the Turkish electricity market using machine learning algorithms as part of an internship at Algopoly R&D, Istanbul. He studied for his industrial engineering degree at the Middle East Technical University, Ankara, Turkey.
To contact Mohammed, you can send an email to m.saif.ka@gmail.com
Or through LinkedIn: www.linkedin.com/in/MoSaifK