Finalize project: Complete paper.tex and README.md.

Lenr4 · Lenr4 · commit 5f0e4bc2b75b · 2025-03-05T18:54:16.000+01:00
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-# Apple Stock AR-Process Analysis & Multistep Forecasting
+# Autoregressive Model Analysis and Multistep Forecast of Apple Stock Data
 
 ## Table of Contents
 
@@ -11,26 +11,27 @@ ______________________________________________________________________
 
 ## Overview
 
-In this project, I analyzed apple stock data using time series econometrics methods. The
-final result of my project is a latex file that roughly describes the analysis steps and
-the charts and to what extent it is possible to use/interpret my results.
+In this project, Apple stock data was analyzed using time series econometrics methods.
+The final output of this project is **paper.pdf**, compiled from LaTeX, which outlines
+the analysis steps as well as the figures and tables and discusses to what extent the
+results can be interpreted.
 
 In this project, I:
 
 - fit several AR processes to Apple's historical stock data.
-- compare model performance to identify the best fitting AR process.
-- evaluate the ability of the best model to perform multistep forecasts.
-- investigate the extent to which it is possible to use the AR process for analysis.
-- provide analysis and plots to visualize both the model fit and forecasting
+- compared model performance to identify the best fitting AR process.
+- evaluated the ability of the best model to perform multistep forecasts.
+- investigated the extent to which it is possible to use the AR process for analysis.
+- provided analysis and plots to visualize both the model fit and forecasting
   performance.
 
 ______________________________________________________________________
 
 ## System Prerequisites
 
-To make sure that the project works on your machine you need to have installed *Python*,
-*a modern LaTeX distribution*, *Git*, and if applicable a *text editor*. For a more
-detailed explanation see the
+To make sure that the project works on your device, it is necessary to have installed
+*Python*, *a modern LaTeX distribution*, *Git*, and, if applicable, a *text editor*. For a
+more detailed explanation, see the
 [documentation](https://econ-project-templates.readthedocs.io/en/stable/getting_started/index.html).
 
 ______________________________________________________________________
@@ -43,7 +44,7 @@ First one needs to clone the repository:
 git clone https://github.com/iame-uni-bonn/final-project-Lenr4.git
 ```
 
-Next navigate to the project root and create and activate the environment:
+Next, navigate to the project root and create and activate the environment:
 
 ```bash
 mamba env create lennart_epp
@@ -56,7 +57,7 @@ After the environment is activated, one can run the project by:
 pytask
 ```
 
-> 🛑 **Caution**: If you had trouble with kaleido on windows you need to use this
+> 🛑 **Caution**: If you run into trouble with kaleido on Windows, you need to use this
 > [workaround](https://effective-programming-practices.vercel.app/plotting/why_plotly_prerequisites/objectives_materials.html#windows-workaround):
 >
 > ```bash
@@ -71,22 +72,22 @@ The Project is structured into three different parts.
 
 - **bld**: The Build directory containing all output files.
 
-  - **plots**: top 3 AR models for fitting(1 step forecast), multistep forecast, ACF all
-    as interactive html and pdf
-  - **forecasts**: 10 step forecast using AR(1) as pkl file
-  - **data**: cleaned apple data as pkl file
-  - **memory**: pkl file of ACF, and tex files of *Hurst* and *ADF* statistics
-  - **models** pkl file of all AR models and tex file with top model statistics
+  - **plots**: Top 3 AR models for fitting (1 step forecast), ACF, Multistep forecast;
+    all as interactive .html and .pdf files.
+  - **forecasts**: Multistep forecast using AR(1) as .pkl file.
+  - **data**: Cleaned Apple data as .pkl file.
+  - **memory**: .pkl file of ACF and .tex files of *Hurst* and *ADF* statistics.
+  - **models**: .pkl file of all AR models and .tex file with top model statistics.
 
-- **src**: The source directory containing all python files needed for the analysis.
+- **src**: The Source directory containing all python files needed for the analysis.
 
   - **data**: CSV file containing the raw data for reproducibility.
   - **data_management**: Python files for cleaning and downloading the data from
     [Yahoo Finance](https://de.finance.yahoo.com/).
   - **analysis**: Python files which analyse the data.
   - **final**: Python files which plot the results.
 
-- **tests**: The test directory containing all python files which are used for testing.
+- **tests**: The Test directory containing all python files which are used for testing.
 
   - **data_management**: Python files for testing the data management steps.
   - **analysis**: Python files for testing the analysis steps.
diff --git a/documents/paper.lof b/documents/paper.lof
@@ -1,3 +1,3 @@
-\contentsline {figure}{\numberline {1}{\ignorespaces Comparison of the top-performing AR models.}}{1}{}%
-\contentsline {figure}{\numberline {2}{\ignorespaces Autocorrelation Function (ACF) of the differenced time series with 95\% confidence bands.}}{2}{}%
-\contentsline {figure}{\numberline {3}{\ignorespaces Multi-step forecast for Apple stock price.}}{3}{}%
+\contentsline {figure}{\numberline {1}{\ignorespaces Comparison of the top-performing AR models}}{1}{}%
+\contentsline {figure}{\numberline {2}{\ignorespaces Autocorrelation Function (ACF) of the differenced time series with 95\% confidence bands}}{2}{}%
+\contentsline {figure}{\numberline {3}{\ignorespaces Multi-step forecast for Apple stock price}}{3}{}%
diff --git a/documents/paper.tex b/documents/paper.tex
@@ -37,8 +37,8 @@
 
 \begin{document}
 
-\title{Apple Stock AR-Process Analysis and Multistep Forecasting}
-\author{Lennart Epp}
+\title{Autoregressive Model Analysis and Multistep Forecast of Apple Stock Data}
+\author{Lennart Lülsdorf}
 \date{\today}
 
 \maketitle
@@ -57,73 +57,72 @@
 \section{Introduction}
 
 In this project, Apple stock data is analyzed using time series econometrics methods. This
-paper is structured as follows: first, I present the Top 3 best-fitting AR(p) models for approximating
-i.e for one step forecasting of the Apple stock data. So for this forecast/approximation, only values
-of the original data namley the close price of the apple stock are used as in input for the forecast.\\
-Then, I proceed with an analysis to what extent it is possible to fit an AR(p) process on Apple.
-For this I checked if the differenced Apple data is stationary and after that I concern whether the
-time series has long or short memory.\\
-Lastly, I will present a multi-step forecast, where each forecasted value is used as input
-for the next prediction. I will analyze why this approach fails to capture Apple stock dynamics
-beyond a one-step forecast and discuss the overall feasibility of fitting an AR model to Apple stock data.
+project is structured as follows: first, the three best autoregressive AR(p) models for approximating,
+i.e. one-step forecasting, the Apple stock data are presented. For this approximation, only observed values
+of the original data, namely the close price of the Apple stock, are used as an input for the forecast.\\
+Next, it is analyzed to what extent an AR(p) process can be fitted to the Apple data.
+To do this, the stationarity of the differenced Apple data is examined, as well as whether the time series has long or short memory.\\
+Finally, a multi-step forecast is presented, where each forecasted value is used as an input
+for the next prediction. The failure of this approach to capture Apple stock dynamics beyond a one-step forecast
+is analyzed, and a discussion of the overall feasibility of fitting an AR model to Apple stock data completes the project.
 
 \section{Top AR(p) Approximations}
 
-The following plots shows the top 3 AR model fits in the sense of the Akaike Information
-Criterion. It also shows the residual plot of the top 3 AR models. The figure shows that
-one-step forecasts closely follow the original data. However, the variance of the residuals
-increases over time, suggesting that the model struggles to maintainforecast accuracy over
+In the first graph of Figure 1, the three best AR model fits according to the Akaike Information
+Criterion (AIC) are shown. The second graph shows the residuals of these AR models. Overall, the figure shows
+that the one-step forecasts closely follow the original data. However, the variance of the residuals
+increases over time, suggesting that the models struggle to maintain forecast accuracy over
 longer periods.
 
 \begin{figure}[H]
     \centering
     \includegraphics[scale=1.8, width=\textwidth, trim=10 10 10 10, clip]
     {../bld/plots/top_ar_models_plot.pdf}
-    \caption{Comparison of the top-performing AR models.}
+    \caption{Comparison of the top-performing AR models}
     \label{fig:top_ar_models}
 \end{figure}
 
-\noindent In the following table, you see the metrics of the best AR(P) processes in terms of their AIC.
-First, notice that the AR(p) was fitted on the differenced close price since the P-Value of
-the Augmented Dickey-Fuller test suggested differencing, indicating that the original close
-price is likely not stationary.\\
-Therefore, the AR coefficients had to be integrated to approximate the original time series,
-which could lead to accumulated errors. So given the AIC as you can see in the table, the AR(1) process fitted
-Apple best. In total I tested p values up to 12.
+\noindent Table 1 contains the metrics of the best AR(p) processes in terms of their AIC.
+Note that the AR(p) models were fitted to the differenced close price series of the Apple stock data,
+since the p-value of the Augmented Dickey-Fuller (ADF) test indicated that the original close price time series
+is likely nonstationary.\\
+Therefore, the AR coefficients had to be integrated to approximate the original time series,
+which increases the risk of accumulated errors. Based on the AIC, the AR(1) process fitted
+the Apple data best (Table 1). In total, values of p from 1 to 12 were tested.
 
 \input{../bld/models/top_models.tex}
 
 \section{Memory Analysis}
 
-In this section, I check to what extent it is possible to fit an AR model on the differenced data.
-Therefore, I first used the ADF test to check if the differenced close price is stationary.
-The results are shown in the following table, indicating that the differenced series is likely
-stationary. This is a necessary prerequest for fitting AR models.
+This section addresses the question of to what extent an AR model can be fitted to the differenced data.
+First, the ADF test was used to check whether the differenced close price is stationary.
+The results are shown in Table 2 and indicate that the differenced series is likely
+stationary, which is a necessary prerequisite for fitting AR models.
 
 \input{../bld/memory/diff_close_stat_test.tex}
 
 
 \noindent After confirming stationarity, the next critical question is whether the differenced time series
-exhibits short or long memory. Therefore I computed the Autocorrelation Function of the time series. As
-you can see in the following plot the ACF decreases over time but still has a few outlieres which
-indicates that the time series has the characteristics of a process with a short memory,
-but with potential components with a long memory.
+exhibits short or long memory. To this end, the autocorrelation function (ACF) of the time series was computed. As shown
+in Figure 2, the ACF decays with increasing lag, which is characteristic of a process with short memory.
+However, a few outliers point to potential long-memory components in the time series.
+
 
 \begin{figure}[H]
     \centering
     \includegraphics[scale=1.2, width=\textwidth, trim=10 10 10 10, clip]
     {../bld/plots/acf_plot.pdf}
     \caption{Autocorrelation Function (ACF) of the differenced time series with 95\% confidence
-    bands.}
+    bands}
     \label{fig:acf_plot}
 \end{figure}
 
 
-\noindent Since the differenced close price is likely stationary, but the ACF indictaes some long run effects
-I also cumpted the hurst coefficient which has an value of approximately  .052 which indicates
-that the time series shows characteristics of a process with almost random behavior, as the Hurst coefficient
-close to 0.5 indicates the absence of strong long-term dependencies. However, since the ACF indicates some
-long-term effects, this result could indicate a mixture of short-term autocorrelations with occasional
+\noindent Since the differenced close price is likely stationary, but the ACF indicates some long-run effects,
+the Hurst exponent was computed (Table 3). A Hurst exponent close to 0.5 indicates the absence of strong long-term
+dependencies. For the Apple stock data, a value of approximately 0.52 was computed. This suggests
+that the time series shows characteristics of a process with almost random behavior. However, since the ACF indicates some
+long-term effects (Figure 2), this result might point to a mixture of short-term autocorrelations with occasional
 persistence.
 
 
@@ -134,44 +133,44 @@ \section{Memory Analysis}
 \section{Multistep Forecast}
 
 Although the differenced close price was found to be stationary, the presence of a Hurst
-coefficient of 0.52 and some significant autocorrelation function (ACF) values suggest that
+coefficient of 0.52 and some significant ACF values suggests that
 the series retains some degree of long memory.
 
 \noindent AR models are designed to capture short-term dependencies and assume that the impact of past
 values decays rapidly. However, in a long-memory process, dependencies persist for a longer
 time, meaning that an AR(p) model may fail to account for the full structure of the series
 beyond a few steps ahead.
 
-\noindent In a one-step-ahead forecast, the AR model predicts the next value based solely on observed
-historical data. In a multi-step forecast, each predicted value is used as input for the next
+\noindent In a one-step-ahead forecast, the AR model predicts the next value based solely on observed
+historical data. In a multi-step forecast, by contrast, each predicted value is used as an input for the next
 prediction. This recursive approach leads to error accumulation.
 
-\noindent The following figure illustrates that the AR model fails to capture the long-term structure
+\noindent Figure 3 illustrates that the AR model fails to capture the long-term structure
 of Apple stock price movements when applied to multi-step forecasting. This is due to error accumulation
-and the model's inability to account for evolving market dynamics.
+and the model's inability to account for evolving market dynamics.
 
 
 \begin{figure}[H]
     \centering
     \includegraphics[scale=1.8, width=\textwidth, trim=10 10 10 10, clip]
     {../bld/plots/multistep_forecast.pdf}
-    \caption{Multi-step forecast for Apple stock price.}
+    \caption{Multi-step forecast for Apple stock price}
     \label{fig:apple_forecast}
 \end{figure}
 
 \section{Conclusion}
 
-To conclude my analysis i went through the following steps:\\
+The autoregressive model analysis of Apple stock data can be summarized as follows:
 First, the stationarity analysis, conducted using the Augmented Dickey-Fuller (ADF) test,
-indicated that the original close price series was non-stationary, requiring differencing to
-achieve stationarity.\\
-Second, the evaluation of different AR(p) models based on the Akaike Information Criterion
-(AIC) revealed that an AR(1) model provided the best fit among the examined options.\\
+indicated that the original close price series was nonstationary, requiring differencing to
+achieve stationarity. Second, the evaluation of different AR(p) models based on the
+AIC revealed that an AR(1) model provided the best fit among the examined options.
 Third, the study investigated the memory characteristics of the time series by computing the
 Hurst exponent and analyzing the autocorrelation function (ACF). The results suggested that
-the differenced time series exhibited short-memory behavior.\\
-Finally, the limitations of AR models for multi-step forecasting were assessed. While the AR(1)
-model performs well for short-term forecasts, its accuracy deteriorates over multiple steps
+the differenced time series exhibited a mixture of short-term autocorrelations with occasional
+persistence.\\
+Finally, the limitations of AR models for multi-step forecasting were assessed. While the AR
+models perform well for short-term forecasts, their accuracy deteriorates over multiple steps
 due to error propagation and the inability to capture long-term dependencies.
 
 \end{document}
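
The difference between the one-step and the recursive multi-step forecast described in paper.tex can be illustrated with a minimal sketch. It is not the project's actual implementation: the function names (`fit_ar1`, `one_step_forecasts`, `multistep_forecast`), the plain least-squares fit, the synthetic price series and the horizon of 10 steps are all assumptions made for illustration. The AR(1) model is fitted on the differenced series and the predicted differences are added back to recover price levels.

```python
import numpy as np


def fit_ar1(diffs):
    """Least-squares fit of d_t = c + phi * d_{t-1} + e_t (illustrative only)."""
    x, y = diffs[:-1], diffs[1:]
    design = np.column_stack([np.ones_like(x), x])
    (c, phi), *_ = np.linalg.lstsq(design, y, rcond=None)
    return c, phi


def one_step_forecasts(prices, c, phi):
    """Forecast each next price from observed data only.

    Entry i is the forecast for prices[i + 2]; the last entry is the
    forecast for the first unobserved price.
    """
    diffs = np.diff(prices)
    return prices[1:] + c + phi * diffs


def multistep_forecast(prices, c, phi, steps):
    """Recursive forecast: every predicted difference feeds the next step."""
    last_price = prices[-1]
    last_diff = prices[-1] - prices[-2]
    path = []
    for _ in range(steps):
        last_diff = c + phi * last_diff      # predicted difference
        last_price = last_price + last_diff  # integrate back to a price level
        path.append(last_price)
    return np.array(path)


# Synthetic stand-in for the close price series (assumption, not Apple data).
rng = np.random.default_rng(0)
demo_prices = 100 + np.cumsum(rng.normal(0.1, 1.0, size=250))

c_hat, phi_hat = fit_ar1(np.diff(demo_prices))
print("one-step forecast of next price:", one_step_forecasts(demo_prices, c_hat, phi_hat)[-1])
print("10-step recursive forecast:", multistep_forecast(demo_prices, c_hat, phi_hat, 10))
```

In this sketch the first recursive step coincides with the last one-step forecast. From the second step onward the recursion only sees its own predictions, so any error in an early step is carried into all later price levels, which is the error accumulation discussed in the Multistep Forecast section.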