Attempting to Predict the Price of Ethereum using Linear Regression
What are Cryptocurrencies?
I remember first hearing about cryptocurrencies back in 2013 when I was still a college student. The idea sounded interesting at the time but I didn’t understand much about the technology aside from it being a new form of “internet money”.
When the pandemic came along, I suddenly had a lot more free time on my hands than usual, so I decided to use this time to brush up on my investment knowledge. One day while browsing through finance videos on YouTube, I stumbled across Ethereum and immediately became fascinated by it.
After spending countless hours going down the rabbit hole, I realized that Ethereum was much more than just “internet money”. It’s a fully decentralized, open-source platform that gives its users complete transparency into what is happening on the network. Additionally, a wide variety of applications can be built on top of the network, such as databases, games, financial instruments and smart contracts.
One of the biggest drawbacks of cryptocurrencies at the moment is the fact that they are highly volatile. It is not uncommon to see double-digit price fluctuations within the span of a single day. As a current data science student with an interest in this emerging market, I wanted to put my newly-acquired skills to the test in an attempt to predict the future price of Ethereum.
Gathering the Information and EDA
The data for this exercise was collected from Yahoo Finance. There is a Python library called yfinance which makes this process super easy.
# Importing the library
import yfinance as yf

# Creating a variable for the data that is being pulled
data = yf.download(tickers='ETH-USD',
                   start='2017-01-01',
                   end='2021-07-31',
                   progress=False)
The search resulted in 1,669 records without any missing values.
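The record count and missing-value check can be reproduced with a couple of pandas calls. The snippet below is only a minimal sketch: `demo` is a hypothetical, synthetic stand-in for the downloaded frame (not real Ethereum prices), so it runs without a network connection.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded frame: synthetic daily prices,
# NOT real Ethereum data.
dates = pd.date_range("2021-01-01", periods=10, freq="D")
demo = pd.DataFrame(
    {"Close": np.linspace(100.0, 190.0, 10), "Volume": np.arange(10) * 1000},
    index=dates,
)

# Number of records (rows) in the frame
n_records = len(demo)

# Total number of missing values across all columns
n_missing = int(demo.isna().sum().sum())

print(n_records, n_missing)
```

Running the same two calls on the real `data` frame is how a claim like "1,669 records without any missing values" can be checked.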
Building a Linear Regression Model
Since the goal of this exercise was to predict the future price of Ethereum, I created a new variable called ‘projection’ and set it to the number of days into the future that I wanted the model to predict.
# Creating a variable for predicting 'n' days out into the future
projection = 30

# Creating a new column for the model's predictions
data['Prediction'] = data[['Close']].shift(-projection)
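To make the effect of the negative shift concrete, here is a small sketch on a toy frame. The names `toy` and `horizon` are hypothetical and the prices are made up; a horizon of 2 is used instead of 30 so the result is easy to read.

```python
import pandas as pd

# Toy closing prices (made-up values, for illustration only)
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]})

# Hypothetical small horizon standing in for the 30-day projection
horizon = 2

# Shifting by a negative number pulls future values up, so each row's
# 'Prediction' holds the closing price 'horizon' rows later
toy["Prediction"] = toy["Close"].shift(-horizon)

print(toy)
```

The last `horizon` rows end up with NaN in ‘Prediction’, because no closing price exists that far ahead. This is why those rows have to be dropped before training.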
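Next, I created variables for the independent and dependent data sets as NumPy arrays, removed the last 30 rows (their ‘Prediction’ values are empty because no price exists yet 30 days ahead), and split the data into 80% training and 20% testing sets.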
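# Importing NumPy and the train/test split helper
import numpy as np
from sklearn.model_selection import train_test_split

# Creating independent (X) and dependent data sets (y)
X = np.array(data[['Close']])
y = np.array(data['Prediction'])

# Removing the last 30 rows, whose 'Prediction' values are NaN after the shift
X = X[:-projection]
y = y[:-projection]

# Splitting the data into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2)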
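One design choice worth knowing about: train_test_split shuffles the rows by default, so the test days are scattered among the training days. As an aside, and not part of the workflow used in this article, a chronological split can be sketched by passing shuffle=False. The arrays `X_demo` and `y_demo` below are hypothetical toy data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature and target arrays in time order (made-up values)
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = np.arange(10, dtype=float) + 2.0

# shuffle=False keeps the original order, so the test set is the final 20%
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

print(X_tr.shape, X_te.ravel())
```

With this variant the model is always evaluated on days that come after the ones it was trained on, which may give a different score than the shuffled split.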
I then built a Linear Regression model and fit it using the training data.
# Importing the Linear Regression class
from sklearn.linear_model import LinearRegression

# Instantiating a Linear Regression
lr = LinearRegression()

# Training the model
lr.fit(X_train, y_train)
Now it’s time to test how well the model does using the .score method, which tells us how well the regression model approximates the actual data points.
# Testing the model using the .score method which returns the r^2 value
lr_confidence = lr.score(X_test, y_test)
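For a regressor, .score returns the coefficient of determination, which is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on made-up numbers; `y_true` and `y_pred` are hypothetical arrays, not output from the Ethereum model.

```python
import numpy as np

# Made-up actual values and model predictions (illustration only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

# Residual sum of squares: how far the predictions are from the actual values
ss_res = np.sum((y_true - y_pred) ** 2)

# Total sum of squares: how far the actual values are from their own mean
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

# Coefficient of determination
r2 = 1.0 - ss_res / ss_tot

print(r2)
```

A value of 1 means the predictions match the actual values exactly, while a value near 0 means the model does no better than always predicting the mean.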
The model returned an r² score of 0.845, which means that 84.5% of the variance in the price 30 days out (the ‘Prediction’ column) in the test set can be explained by the current closing price. Now let’s take a look at the results!
# Creating a variable for our future price projections and setting it equal to the last 30 days from our original data set
X_projection = np.array(data[['Close']])[-projection:]

# Printing out the Linear Regression's predictions for the next 30 days
lr_prediction = lr.predict(X_projection)
print(lr_prediction)