In this analysis we try to understand how the price of electricity in the wholesale Turkish electricity market behaves.
The data that we'll look at consists of the hourly prices of electricity in the day-ahead market for three years (15/07/2015 - 15/07/2018). The measurement unit of the price is TL/MWh where TL is an abbreviation for the local currency Turkish Lira. The price of the day-ahead market is called the Market clearing price with the abbreviation MCP, and is called Piyasa Takas Fiyatı in Turkish with the abbreviation PTF. We will use the English abbreviation MCP to refer to the price in this analysis. If you want to learn more about the Turkish electricity market, you can read my report which gives a good overview of the market.
I preferred not to go back in time more than three years because considerable changes took place (and are still taking place) in the electricity market structure as well as the country's economy, so the insights gained from old data my not be valid for predicting the future.
In this report, before each analysis I included a brief introduction to the analysis and after the analysis I added a short discussion of the insights we gain from it. The size of the figures in this report is quite large because the number of data points plotted is large and the graphs won't be clear otherwise.
For a summary of the results of all the analyses you can check the conclusion section.
# Importing the needed libraries
import warnings
warnings.filterwarnings('ignore')
from scipy.stats.stats import pearsonr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import statsmodels.graphics.tsaplots as tsap
%matplotlib inline
# Loading the dataset
url = 'https://raw.githubusercontent.com/Mo-Saif/Turkish-Electricity-Market/master/MCP%20Exploratory%20Analysis%20Files/MCP-15072015-15072018.csv'
df = pd.read_csv(url, parse_dates = [[0,1]], dayfirst=True)
print(df.head(),'\n\n')
print(df.info())
From the information about the data frame object, we see that the price column is not recognized as a numeric data type. We will convert it in the next cell.
df = df.set_index(['Date_Hour']) # Setting the index to the datetime column
df.rename(columns={"MCP (TL/MWh)" : "MCP"}, inplace=True) # Renaming the column with a short name
df['MCP'] = df['MCP'].str.replace(',','') # Removing the commas
df['MCP'] = df['MCP'].apply(pd.to_numeric) # Converting MCP to numeric value
df.head() # A view of its final form
Looks like we're all set
df.describe() # Summary statistics
As we can see, the average price is 156.8 TL with a standard deviation of 53.9 meaning that 99.7% of the data lie between 156.8 ± 3*53.9 = [0, 318.5]. This range is very large if a company wants to predict electricity in a reasonably accurate manner.
The maximum price is 1900 TL which is very far from the mean suggesting the presence of outliers and high volatility of the price.
The minimum price is zero, which means that for some hours the electricity was free and also that the prices doesn't reach negative levels. This is partly because many major electricity consumers (e.g. large factories) and distributors had the opportunity to deal with the public sector producer (EÃœAÅž/TETAÅž) at a better rate than the market. Since demand side was not in the market, supply side (renewables included) matching price was very low. That structure is slowly dissolving and it is still in progress. We can check if this is a frequent occasion.
df[df['MCP'] == 0].count()
We see that the price was set to zero for 117 different hours in the past 3 years, which means that this does not happen frequently.
Now let's plot the price over time and see how it behaves.
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.plot(df['MCP'])
ax.set_title("Electricity Price over Time")
ax.set_xlabel('Time')
ax.set_ylabel('MCP')
plt.grid(linestyle='--', alpha=0.4)
plt.xlim(df.index.min() - pd.Timedelta(weeks=2), df.index.max() + pd.Timedelta(weeks=2))
Some characteristics of the data can be noticed from this graph. First, the price spikes tend to cluster together and take a couple of days to return to the original behavior, as can be noticed at the end of Jan 2016, end of Dec 2016 and end of Jan 2017. Second, The price is quite volatile. Third, The variance in price seems to be lower after Jan 2017 than before it. Fourth, there seems to be an upward trend in the price over the years. Fifth, the spikes above 400 seem not to happen in the normal conditions and there should be some special reason for it to happen. We will investigate these outliers.
df[df['MCP'] > 400] # Checking the extreme values
We can see that most of the spikes happened between 20-23 Dec 2016. After investigation, it was found that this price spike took place because of a natural gas crisis that happened in Turkey during that time. You can learn more about the crisis from this link (in Turkish). There is no clear reason for the spike that happened in 11 Sep 2017. We will remove these extreme values to have better analysis results.
df.loc[df['MCP'] > 400, 'MCP'] = np.nan # Removing the values
df.interpolate(inplace=True) # Replacing them with interpolated values
To get a better idea about the long-term trend, let's plot the monthly averages of the price.
df_monthly = df.resample('M', label = 'right').mean() # Creating monthly averages
df_monthly.rename_axis('Month', inplace=True)
df_monthly.head()
x = np.array(range(len(df_monthly.MCP)))
z = np.polyfit(x,df_monthly.MCP,deg=1) # Calculating the coeffecients of the trendline
p = np.poly1d(z)
fig, ax = plt.subplots(figsize=(16,8))
ax.plot(df_monthly, label='Monthly Price Means') # Plotting the monthly averages
ax.plot(df_monthly.index, p(x), color='r', label='Trendline') # Plotting the trendline
ax.legend()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.set_title("Average Monthly Price over Time")
ax.set_xlabel('Time')
ax.set_ylabel('MCP')
plt.grid(linestyle='--', alpha=0.4)
print('The slope of the trend line is {:.2f} TL/Month or {:.2f} TL/Year'.format(z[0],z[0]*12))
We can see that the average electricity price increases by 18 TL approximately each year. We also cannot readily notice any monthly seasonality from the figure. We can do this same analysis using median instead of mean because medians are more robust to outliers. So, it would give us a better estimate for the actual increase.
df_m_median = df.resample('M', label = 'right').median()
xm = np.array(range(len(df_m_median.MCP)))
zm = np.polyfit(xm,df_m_median.MCP,deg=1) # Calculating the coeffecients of the trendline
pm = np.poly1d(zm)
fig, ax = plt.subplots(figsize=(16,8))
ax.plot(df_m_median, label='Monthly Price Medians') # Plotting the monthly averages
ax.plot(df_m_median.index, pm(xm), color='r', label='Trendline') # Plotting the trendline
ax.legend()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.set_title("Median Monthly Price over Time")
ax.set_xlabel('Time')
ax.set_ylabel('MCP')
plt.grid(linestyle='--', alpha=0.4)
print('The slope of the trend line is {:.2f} TL/Month or {:.2f} TL/Year'.format(zm[0],zm[0]*12))
We see that it gives us pretty much the same result.
In this section, we check if the price level moves in a certain cycle during the day, week, month, or year.
# Decomposing time components to check their effect on the price
df['day_of_year'] = df.index.dayofyear
df['month'] = df.index.month
df['day_of_month'] = df.index.day
df['week_day'] = df.index.day_name()
df['hour'] = df.index.hour
df.head()
# Does the day of the year affect the electricity price?
fig, ax = plt.subplots(figsize=(16,8))
ax.scatter(df['day_of_year'], df['MCP'])
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(25))
ax.set_title("Electricity Price for Each Day of the Year")
ax.set_xlabel('Day of the Year')
ax.set_ylabel('MCP')
plt.xlim(df['day_of_year'].min() - 5, df['day_of_year'].max() + 5)
plt.ylim(-10)
Near the end of January (late 20s in the graph) we see that the price have been over 130 TL consistently for the last three years, and has shown some extreme values. If we look back at the first figure (titled Electricity Price over Time), we see that near the end of January in 2016 and 2017 there where spikes at the price. However, this didn't happen in Jan 2018. Further investigation is needed to determine if this is a recurrent event or non-related events happened to have taken place in the same time in two consecutive years.
The spike at the end of the year in the current figure is caused by the aforementioned natural gas crisis.
# Does the month of the year affect the electricity price?
fig, ax = plt.subplots(figsize=(16, 10))
sns.boxplot(x='month', y='MCP', data=df, ax=ax).set(xlabel='Month', title="Monthly Price Distributions")
We can notice some seasonal pattern here where the price is lower in the spring (especially March) and fall (especially October and November) and higher in summer and winter. The rise in summer is probably caused by the use of air conditioners and the rise in winter is probably caused by the use of electric heaters.
# Does the day of the month affect the electricity price?
fig, ax = plt.subplots(figsize=(16, 10))
sns.boxplot(x='day_of_month', y='MCP', data=df, ax=ax).set(xlabel='Day of the Month',
title="Day of the Month Price Distributions")
There doesn't seem to be any seasonal pattern within each month according to the graph.
# Does the weekday affect the electricity price?
fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(x='week_day', y='MCP', data=df, ax=ax).set(xlabel='Day of the Week', title="Day of the Week Price Distributions")
Price on weekends (especially Sunday) tend to be lower than weekdays because a large portion of the industrial sector doesn't work on these days, which causes a large drop in demand.
# Does the hour of the day affect the electricity price?
fig, ax = plt.subplots(figsize=(16, 10))
sns.boxplot(x='hour', y='MCP', data=df, ax=ax).set(xlabel='Hour', title="Hourly Price Distributions")
There is a clear seasonal pattern within days showing lower prices at night and early morning and higher prices at daytime with more upward spikes. The reason for this is probably the increased demand from the industrial sector during daytime.
In this section, we visualize the average daily price of electricity for each day of the year using a heatmap.
df_daily = df['MCP'].resample('D', label = 'left').mean()
df_daily = df_daily.to_frame()
df_daily['Month'] = df_daily.index.month
df_daily['Day of the Month'] = df_daily.index.day
price_pt = pd.pivot_table(df_daily, values='MCP', index='Month',columns='Day of the Month',aggfunc='mean')
plt.figure(figsize=(14,10))
sns.heatmap(price_pt, cmap = sns.cm.rocket_r)
As we mentioned before, the prices in March and October seem to be lower than other months. We can add to them the second half of February as well. The prices have fallen in the middle of July after reaching high levels, but that doesn't seem to be a recurrent theme.
Correlation coefficient shows how much two variables are related. Correlation ranges from -1.0 to 1.0. Values closer to 1 indicate a strong positive relationship between variables, and values closer to -1 indicate a strong negative relationship between variables. A correlation of zero means that there is no linear relationship between the variables.
Autocorrelation is the correlation between a variable and a past instance of itself. It can be used to detect seasonality.
Partial autocorrelation is the correlation between two points that are separated by some number of periods n, but with the effect of the intervening correlations removed. In the following sections we will analyze the autocorrelation and partial autocorrelation between the hourly prices and also the correlation between the prices and some external variables.
We will take the difference of the prices before we plot the autocorrelation function to eliminate the effect of the trend in the data.
# Cheching autocorrelation of differenced hourly prices
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.set_title("Autocorrelation Between All Hourly Prices")
ax.set_xlabel('Lags')
ax.set_ylabel('Correlation Coefficient')
plt.grid(linestyle='--', alpha=0.4)
d = tsap.plot_acf(df['MCP'].diff().dropna(), lags=200, ax=ax)
We can notice the correlation between point h and h-24, which is expected. Because the price at a specific hour is affected by the price at the same hour the previous day.
# Cheching the partial autocorrelation of differenced hourly prices
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.set_title("Partial Autocorrelation Between All Hourly Prices")
ax.set_xlabel('Lags')
ax.set_ylabel('Correlation Coefficient')
plt.grid(linestyle='--', alpha=0.4)
dummy = tsap.plot_pacf(df['MCP'].diff().dropna(), lags=200, ax=ax)
This plot tells us that not only h-24 but also h-48, h-72... independently affect current hour's price. Also there is a weak negative correlation between h and h-20, h-21, h-22, h-23 which is not intuitive.
In this section, we analyze the daily prices of a single hour to see the relationship between daily prices and to see if there is any seasonality. Here we again difference prices to exclude the effects of the trend. I arbitrarily chose the hour 11am to do this analysis but it can be done on any hour.
p11 = df[df['hour'] == 11]['MCP']
p11.head()
We can plot the time series to check for any patterns
p11.plot(figsize=(16,8)) # Plotting the time series
plt.grid(linestyle='--', alpha=0.4)
We can see that there is no trend. No cyclic pattern can also be noticed just by looking.
# Cheching autocorrelation of differenced prices (daily at 11:00 hours)
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.set_title("Autocorrelation Between Daily Prices at 11am")
ax.set_xlabel('Lags')
ax.set_ylabel('Correlation Coefficient')
plt.grid(linestyle='--', alpha=0.4)
dummy = tsap.plot_acf(p11.diff().dropna(), lags=200, ax=ax)
We can see some seasonality here. There is weekly positive correlation (i.e. between d and d-7). Also, interestingly, there is moderate negative correlation between d and d-1, d-6.
# Cheching the partial autocorrelation of differenced prices (daily at 11:00 hours)
fig, ax = plt.subplots(figsize=(16, 10))
ax.xaxis.set_major_locator(plt.MultipleLocator(10))
ax.xaxis.set_minor_locator(plt.MultipleLocator(2))
ax.yaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.set_title("Partial Autocorrelation Between Daily Prices at 11am")
ax.set_xlabel('Lags')
ax.set_ylabel('Correlation Coefficient')
plt.grid(linestyle='--', alpha=0.4)
dummy = tsap.plot_pacf(p11.diff().dropna(), lags=200, ax=ax)
This graph suggests that there is a weak negative correlation between the price in a day and the prices in the other days of the week except for d-1 and d-6 which have moderate negative correlation. The graph also show weak positive correlation between d and d-7.
In this section, we will see if there is a relationship between the electricity prices and the USD exchange rate, the market clearing quantity, and the amount of electricity generated from wind power.
The exchange rate of the USD against the Turkish Lira has considerably increased in the recent years. As Turkey imports its needs of natural gas from outside, this increase is expected to have an effect on the electricity prices. In this section we will investigate if there is correlation between these two variables.
url = 'https://raw.githubusercontent.com/Mo-Saif/Turkish-Electricity-Market/master/MCP%20Exploratory%20Analysis%20Files/EVDS.csv'
exr = pd.read_csv(url, parse_dates = [0], dayfirst=True) # exr --> daily exchange rate
exr.dropna(inplace=True)
exr.set_index('Date', inplace=True)
df1 = pd.concat([exr,df_daily['MCP']], axis=1)
fig, ax = plt.subplots(figsize=(12, 7))
ax.xaxis.set_major_locator(plt.MultipleLocator(0.1))
ax.xaxis.set_minor_locator(plt.MultipleLocator(0.05))
ax.yaxis.set_major_locator(plt.MultipleLocator(10))
ax.yaxis.set_minor_locator(plt.MultipleLocator(5))
ax.set_title("Scatter Plot of MCP and USD Exchange Rate")
ax.set_xlabel('USD Exchange Rate')
ax.set_ylabel('MCP')
ax.scatter(df1['USD_YTL'],df1['MCP'])
plt.grid(linestyle='--', alpha=0.4)
corr1 = pearsonr(df1['USD_YTL'],df1['MCP'])
print("The correlation coeffecient between the USD exchange rate and MCP is {0:.2f} and the p-value is {1:.2e}".format(corr1[0],corr1[1]))
As we can see above the figure, there is a significant moderate correlation between the USD exchange rate and MCP. As we said before, this is because Turkey imports a large amount of its energy sources from outside. The prices of energy sources in TL would be a better predictor of the electricity price than the exchange rate because of their direct effect.
We can also see how they move together over time.
exr_monthly = exr.resample('M', label = 'right').mean()
fig, ax1 = plt.subplots(figsize=(16,8))
ins1 = ax1.plot(df_monthly, color='#1f77b4', label='Monthly Price Averages') # Plotting the monthly averages
ax1.set_xlabel('Time')
ax1.set_ylabel('MCP', color='#1f77b4')
ax1.tick_params('y', colors='#1f77b4')
ax2 = ax1.twinx()
ins2 = ax2.plot(exr_monthly, color='r', label='Monthly Exchange Rate Averages') # Plotting the trendline
ax2.set_ylabel('Exchange Rate', color='r')
ax2.tick_params('y', colors='r')
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax1.xaxis.set_minor_locator(mdates.MonthLocator())
ax1.set_title("Average Monthly Price and Exchange Rate over Time")
plt.grid(linestyle='--', alpha=0.4)
ins = ins1+ins2
labs = [l.get_label() for l in ins]
ax1.legend(ins, labs, loc=0)
The market clearing quantity is how much electricity was sold in the day-ahead market for that day and it is measured in MWh. It is expected that in hours where lots of electricity is sold the price will be higher. We cannot use this piece of information to predict the prices because both pieces of information are determined at the same time. Nevertheless, we can check the correlation to gain a better understanding about the price behavior.
url = 'https://raw.githubusercontent.com/Mo-Saif/Turkish-Electricity-Market/master/MCP%20Exploratory%20Analysis%20Files/clearingQuantity-15072015-15072018.csv'
cq = pd.read_csv(url, parse_dates = [[0,1]], dayfirst=True)
cq.dropna(inplace=True)
cq = cq.set_index(['Date_Hour'])
cq.rename(columns={"Clearing Quantity (MWh)" : "CQ"}, inplace=True)
# Converting CQ to numeric value
cq['CQ'] = cq['CQ'].str.replace(',','')
cq['CQ'] = cq['CQ'].apply(pd.to_numeric)
df2 = pd.concat([cq, df['MCP']], axis=1)
df2.head()
fig, ax = plt.subplots(figsize=(18, 9))
ax.set_title("Scatter Plot of MCP and Clearing Quantity")
ax.set_xlabel('Clearing Quantity (MWh)')
ax.set_ylabel('MCP')
ax.scatter(df2['CQ'], df2['MCP'])
corr2 = pearsonr(df2['CQ'], df2['MCP'])
print("The correlation coeffecient between the USD exchange rate and MCP is {0:.2f} and the p-value is {1:.2e}".format(corr2[0],corr2[1]))
We can see a non linear positive relationship between the clearing quantity and price where the relationship flattens as the clearing quantity increases. The correlation test shows a significant moderate positive correlation as shown above the figure.
Wind and solar power are the cheapest in the market. All the electricity generated from these sources is sold in the market with government guarantee. The percentage of electricity generated from solar power is small and data about it is not available. On the other hand, approximately 6% of electricity is generated from wind last year. It is expected that in hours where wind generation is high, the price decreases.
url = 'https://raw.githubusercontent.com/Mo-Saif/Turkish-Electricity-Market/master/MCP%20Exploratory%20Analysis%20Files/Final%20Daily%20Production%20Program%20Wind-15072015-15072018.csv'
wind = pd.read_csv(url, parse_dates = [[0,1]], dayfirst=True)
wind.dropna(inplace=True)
wind = wind.set_index(['Date_Hour'])
# Converting wind to numeric value
wind['Wind'] = wind['Wind'].str.replace(',','')
wind['Wind'] = wind['Wind'].apply(pd.to_numeric)
df3 = pd.concat([wind, df['MCP']], axis=1)
df3.head()
fig, ax = plt.subplots(figsize=(18, 9))
ax.set_title("Scatter Plot of MCP and Wind Generation")
ax.set_xlabel('Wind Generation (MWh)')
ax.set_ylabel('MCP')
ax.scatter(df3['Wind'], df3['MCP'])
corr3 = pearsonr(df3['Wind'], df3['MCP'])
print("The correlation coeffecient between wind production and market clearing price is {0:.2f} and the p-value is {1:.2e}".format(corr3[0],corr3[1]))
Although the electricity price is volatile and hard to predict, we can reach the following conclusions:
Other factors can be checked to see if they have a greater effect on the price. Fuel prices are a good example.
Mohammed Saif is a data scientist who also has experience in the mathematical optimization and scientific computing domains. Currently working on forecasting electricity load and price in the Turkish electricity market using machine learning algorithms as part of an internship at Algopoly R&D, Istanbul. He studied for his industrial engineering degree at the Middle East Technical University, Ankara, Turkey.
To contact Mohammed, you can send an email to m.saif.ka@gmail.com
Or through LinkedIn: www.linkedin.com/in/MoSaifK