A colleague recently asked me to perform a time series analysis of gold prices and forecast future prices, on the assumption that they would continue to decline. In this analysis, I will not only forecast gold prices but also look at factors that may impact those prices.

To conduct this analysis, I looked for leading economic indicators and, not being an economist, turned to some of Moody’s top economic indicators, namely consumer price index (CPI), inflation, U.S. gross domestic product (GDP-US), S&P 500, prime rate, and long interest rate. Since the gold prices are available in monthly increments from January 1950 to June 2014, I wanted the other data to cover the same period. All of these, including gold prices but excluding the prime rate, were found on GitHub, https://github.com/datasets?page=1. The prime rate does not change every month. In fact, it can change several times within the same month or go several months without change. I obtained the prime rate data from Mortgage-X at http://mortgage-x.com/general/indexes/prime.asp. Then I filled in the missing data with the last previous value, except when there were multiple changes within a month. In those cases I averaged the changes for the month in question.
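The fill rule for the prime rate (average the changes within a month, then carry the last value forward) can be sketched in base R. The dates and rates below are made-up illustrative values, not the Mortgage-X data, and the helper name `monthly_prime` is hypothetical. The sketch also assumes that the carried-forward value is the monthly average rather than the last individual change.

```r
# Hypothetical prime-rate change log (illustrative values only).
changes <- data.frame(
  date = as.Date(c("2001-01-03", "2001-01-31", "2001-03-20", "2001-06-27")),
  rate = c(9.00, 8.50, 8.00, 6.75)
)

monthly_prime <- function(changes, start, end) {
  # One label per calendar month in the requested range.
  months <- format(seq(as.Date(start), as.Date(end), by = "month"), "%Y-%m")
  # Average the rates when a month has more than one change.
  avg <- tapply(changes$rate, format(changes$date, "%Y-%m"), mean)
  # Look up each month; months with no change come back as NA.
  rate <- as.numeric(avg)[match(months, names(avg))]
  # Carry the last known value forward into the gaps.
  for (i in seq_along(rate)[-1]) {
    if (is.na(rate[i])) rate[i] <- rate[i - 1]
  }
  data.frame(month = months, rate = rate)
}

prime <- monthly_prime(changes, "2001-01-01", "2001-07-01")
print(prime)
```

In this toy input, January has two changes and is averaged, while February, April, May, and July have none and inherit the previous month's value.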

To perform the analysis I used R and built both ARIMA and UCM models, using the following R packages: `forecast`, `stlplus`, and `rucm`. I also did the analysis using SAS Studio University Edition with PROC UCM.

## R Code

I first load the libraries I intend to use. I use the function `require(package)`, but you can just use `library(package)`. Both load the namespace of the package with name package and attach it on the search list. `require()` is designed for use inside other functions; it returns FALSE and gives a warning (rather than an error as `library()` does by default) if the package does not exist. Both functions check and update the list of currently attached packages and do not reload a namespace which is already loaded.
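The difference in failure behavior is easy to see with a package name that does not exist. This is only a small sketch; `notARealPackage123` is a deliberately made-up name.

```r
# "notARealPackage123" is a deliberately nonexistent (hypothetical) package name.
pkg <- "notARealPackage123"

# require() returns FALSE (with a warning, suppressed here) when the package is missing.
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
print(ok)

# library() signals an error instead, which tryCatch() can intercept.
res <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)
print(res)
```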

`require(forecast)`

`require(stlplus)`

`require(rucm)`

Next, I designate the file I want to load, assigning it the name `file`, and then read it into the workspace. I read it in under two different names, `gold` and `mygold`, to show that the two objects are separate copies of the same data and that I can use them interchangeably.

### Setup

`file = "C:/Users/Strickland/Documents/Python Scripts/gold_plus.csv"`

`read.csv(file) -> gold`

`read.csv(file) -> mygold`
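Reading the same file twice yields two separate but identical data frames, which a quick check can confirm. The temporary file and its two columns below are stand-ins with made-up values, not the actual gold_plus.csv.

```r
# Write a small stand-in CSV to a temporary file (hypothetical columns and values).
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(date = c("1950-01", "1950-02"), gold = c(34.73, 34.73)),
  tmp,
  row.names = FALSE
)

# Read the same file into two differently named objects.
a <- read.csv(tmp)
b <- read.csv(tmp)

# identical() compares the two data frames element by element.
same <- identical(a, b)
print(same)

unlink(tmp)  # clean up the temporary file
```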

Now, I set up the time series variable, which is the second column of the file.

`y_var<-mygold[,2]`

`gold.ts <- ts(y_var, start=c(1950, 1), end=c(2014,6), frequency=12)`

`plot(gold.ts,col=4)`
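As a sanity check on the `ts()` call, a monthly series from January 1950 through June 2014 should contain 774 observations (64 full years plus 6 months). Here is a sketch with placeholder values standing in for the gold price column; `demo.ts` is a hypothetical name.

```r
# Placeholder values; the real series would come from the gold price column.
y <- seq_len(774)
demo.ts <- ts(y, start = c(1950, 1), end = c(2014, 6), frequency = 12)

n_obs <- length(demo.ts)
print(n_obs)           # number of monthly observations
print(start(demo.ts))  # first period as (year, month)
print(end(demo.ts))    # last period as (year, month)
```

It is worth confirming that the data column has this length before relying on the `start` and `end` arguments together.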

And the independent variables are the third through fourteenth columns.

`x_vars<-mygold[,3:14]`

`plot(x_vars)`

### Component Analysis

Before we dive into modeling, we want to analyze the time series variable, so we use the `stlplus()` function from the stlplus package and make five plots: one for all components, one for seasonality analysis, one for trend analysis, one for cycle analysis, and one for remainder analysis.

`gold_stl <- stlplus(y_var, t = as.vector(time(y_var)), n.p = 12, l.window = 13, t.window = 19, s.window = 35, s.degree = 1, sub.labels = substr(month.name, 1, 3))`

`plot(gold_stl, ylab = "Gold Prices (USD)", xlab = "Time (months)")`

`plot_seasonal(gold_stl)`

`plot_trend(gold_stl)`

`plot_cycle(gold_stl)`

`plot_rembycycle(gold_stl)`
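For readers without `stlplus` installed, the same kind of decomposition can be tried with base R's `stl()`. This swaps in a different function and a built-in dataset (`AirPassengers`) purely for illustration; only the window settings are borrowed from the call above.

```r
# Base-R stl() on a built-in monthly series, as a stand-in for stlplus() on gold prices.
y <- log(AirPassengers)
fit <- stl(y, s.window = 35, t.window = 19, l.window = 13, s.degree = 1)

# The fitted object holds three columns: seasonal, trend, remainder.
comp <- fit$time.series

# The components add back up to the original series.
recon <- rowSums(comp)
max_err <- max(abs(recon - as.numeric(y)))
print(max_err)

# Share of variance in each component gives a rough sense of what dominates.
print(round(apply(comp, 2, var) / var(as.numeric(y)), 3))
```

A very small seasonal share relative to the trend and remainder would be one numerical hint that seasonality can be left out of a model.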

By observation, I see no seasonality but do observe two distinct cycles, one occurring at roughly 376 months (Aug 1979, just before the 1980 recession) and one after 693 months (Sep 2007, just before the 2008 recession). So, in my UCM model I take seasonality out by setting `season` equal to FALSE and set the cycle period to 376.
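When reading month indices off the plots, a small helper makes it easy to translate an index into a calendar month. This sketch assumes index 1 corresponds to January 1950, the first observation; `index_to_month` is a hypothetical helper name.

```r
# Convert a 1-based month index into a calendar label,
# assuming index 1 corresponds to January 1950.
index_to_month <- function(i, start_year = 1950) {
  year  <- start_year + (i - 1) %/% 12
  month <- (i - 1) %% 12 + 1
  paste(month.abb[month], year)
}

print(index_to_month(1))
print(index_to_month(693))
```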

### Initial Model

`gold.model <- ucm(gold~cpi+inflation+SP500+dividend+earnings+lg_int_rate, data = gold, level = TRUE, slope = TRUE, season = FALSE, cycle = TRUE, cycle.period=376)`

`gold.model`

`Call:`

`ucm(formula = gold ~ cpi + inflation + SP500 + dividend + earnings + lg_int_rate, data = gold, level = TRUE, slope = TRUE, season = FALSE, cycle = TRUE, cycle.period = 376)`

`Parameter estimates:`

`Estimate Approx.StdErr t.val p.value`

`cpi 7.25700 1.82142 3.9843 7.408e-05 ***`

`inflation 4.80970 2.44910 1.9639 0.04990 *`

`SP500 -0.11200 NA NA NA`

`dividend -9.77310 6.58572 -1.4840 0.13822`

`earnings 1.36070 0.73976 1.8394 0.06624 .`

`lg_int_rate -6.25060 3.98984 -1.5666 0.11761`

`---`

`Signif. codes:`

`0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`

`Estimated variance:`

`Irregular_Variance Level_Variance Slope_Variance`

`0.0002 0.0810 0.0000`

`Cycle_Variance`

`479.8603`

It appears that CPI has the greatest effect on gold prices. However, I know that inflation is calculated from CPI, so I keep only its more relevant form, inflation, and remodel, adding GDP-US and keeping S&P 500, whose standard error could not be calculated (it is reported as NA). I also construct a forecast for 48 months into the future.
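The t values in the parameter table are simply each estimate divided by its approximate standard error, which can be checked by hand. The numbers below are copied from the cpi and inflation rows of the output; the p-values use a normal approximation, which is my assumption and may not match the reference distribution the package uses, so the digits can differ slightly.

```r
# Values copied from the cpi and inflation rows of the model output.
est <- c(cpi = 7.25700, inflation = 4.80970)
se  <- c(cpi = 1.82142, inflation = 2.44910)

# t value = estimate / standard error.
t_val <- est / se
print(round(t_val, 4))

# Two-sided p-values under a normal approximation (an assumption; the
# package may use a different reference distribution).
p_approx <- 2 * pnorm(-abs(t_val))
print(signif(p_approx, 3))
```

This also shows why a missing standard error, as for SP500, leaves the t value and p-value undefined.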

`gold.model2 <- ucm(gold~SP500+inflation+gdp_us, data = gold, level = TRUE, slope = TRUE, season = FALSE, cycle = TRUE, cycle.period=420)`

`gold.model2`

`gold.for2<-forecast(gold.model2$s.cycle,h=48,lambda=NULL)`

`plot(gold.for2)`

`Call:`

`ucm(formula = gold ~ SP500 + inflation + gdp_us, data = gold, level = TRUE, slope = TRUE, season = FALSE, cycle = TRUE, cycle.period = 420)`

`Parameter estimates:`

`Estimate Approx.StdErr t.val p.value`

`SP500 -0.0971000 0.0334187 -2.9056 0.0037700 **`

`inflation 8.3858000 2.3025877 3.6419 0.0002886 ***`

`gdp_us 0.0108000 0.0072548 1.4887 0.1369798`

`---`

`Signif. codes:`

`0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`

`Estimated variance:`

`Irregular_Variance Level_Variance Slope_Variance`

`0.0015 0.1563 0.0000`

`Cycle_Variance`

`492.1741`

Now, inflation shows its importance, and the S&P 500 coefficient has an estimated standard error and is significant. GDP-US does not appear to be a factor. So, I keep this model and check the accuracy of its forecast.

`accuracy(gold.for2)`

`ME RMSE MAE MPE`

`Training set 0.4900831 22.78806 11.78584 11.84575`

`MAPE MASE ACF1`

`Training set 37.04665 0.9895334 0.1014228`
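For readers unfamiliar with these accuracy measures, here is how three of them (MAPE, RMSE, and MAE) are computed from actual and fitted values. The numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical actual and fitted values, for illustration only.
actual <- c(100, 200, 400, 800)
fitted <- c(110, 190, 420, 760)

err  <- actual - fitted
mape <- mean(abs(err / actual)) * 100   # mean absolute percentage error
rmse <- sqrt(mean(err^2))               # root mean squared error
mae  <- mean(abs(err))                  # mean absolute error

print(c(MAPE = mape, RMSE = rmse, MAE = mae))
```

Because MAPE divides each error by the actual value, it becomes very large whenever the actual values are close to zero, even if RMSE and MAE look reasonable.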

Keying in on MAPE, this model does not produce a very accurate forecast. So, I develop another model to challenge it.

## Challenger Model

In this instance, I build an ARIMA(1,1,1) model: one autoregressive term, one order of differencing, and one moving-average term. The `Arima()` function from the forecast package is largely a wrapper around the basic `arima()` function that adds options such as a drift term and a Box-Cox transformation (`lambda`). Both functions can also model seasonality explicitly with seasonal AR, differencing, and MA orders. Since I saw no seasonality, I omit these from the model.
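To make the three orders and the role of the regressors concrete, here is a sketch using only base R's `arima()` on simulated data. Everything in it (the series, the regressor `x`, the horizon) is made up for illustration. Taking `log(y)` explicitly plays the role of a Box-Cox transformation with `lambda = 0`, and forecasting from a model with regressors needs future regressor values, which are naively held constant here.

```r
set.seed(1)

# Simulated ARIMA(1,1,1) noise on the log scale (illustrative parameters).
noise <- as.numeric(arima.sim(list(order = c(1, 1, 1), ar = 0.5, ma = 0.3),
                              n = 199, sd = 0.02))
n <- length(noise)

# Simulated regressor and a positive series driven by it.
x <- cumsum(rnorm(n))
y <- exp(3 + 0.05 * x + noise)

# ARIMA(1,1,1) on log(y): one AR term, one difference, one MA term.
fit <- arima(log(y), order = c(1, 1, 1), xreg = x)
print(coef(fit))

# Forecasting 12 steps ahead needs future regressor values; as a naive
# assumption, hold the last observed value constant.
h <- 12
new_x <- rep(x[n], h)
pred <- predict(fit, n.ahead = h, newxreg = new_x)

# Back-transform from the log scale to the original scale.
fc <- exp(as.numeric(pred$pred))
print(round(fc, 2))
```

In practice the future regressor values would themselves have to be forecast or supplied from a scenario, which is often the hardest part of using a model like this.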

`x_vars3<-mygold[,4:5]`

`gold.model3<- Arima(y_var,order=c(1,1,1),lambda=0,xreg=x_vars3)`

`gold.model3`

`gold.for3<-forecast(gold.model3$x,h=48,lambda=NULL)`

`plot(gold.for3)`

`accuracy(gold.for3)`

`Series: y_var`

`ARIMA(1,1,1)`

`Box Cox transformation: lambda= 0`

`Coefficients:`

`ar1 ma1 SP500 inflation`

`-0.1389 0.4784 -1e-04 0.0061`

`s.e. 0.1127 0.1020 1e-04 0.0034`

`sigma^2 estimated as 0.001549: log likelihood=1415.32`

`AIC=-2820.64 AICc=-2820.56 BIC=-2797.34`

`ME RMSE MAE MPE`

`Training set 0.7340026 22.82116 10.88003 0.1975731`

`MAPE MASE ACF1`

`Training set 2.400621 1.006989 0.01282057`

Both inflation and S&P 500 are significant, but the latter contributes very little. The MAPE of 2.4 demonstrates an accurate forecast based on the historical data and model selection.

Looking more closely at the forecast, we see the price of gold continuing to go down but then starting to level off at the end of the forecast period (which is what it is actually doing).

## Variable Contribution

Looking at the contribution of the components on gold prices, we see that inflation and S&P 500 both have negative effects, but the contribution of the S&P 500 does not appear until after the 2008 recession. GDP_US seems to make no contribution at all, which is why it was not significant in the model. Now, that is not to say that the world GDP or Western European GDP does not, because we did not include them in the model.

## Conclusion

So the cycle for gold is to stay relatively constant until just before a recession, then increase in value, only to fall after the recession. Once the price levels out, it behaves as it did before the recession until the next recession, and the cycle repeats. However, the price of gold after the recession is higher than it was before. Also, there seems to be a delayed reaction to the recession. If you think of a recession as an intervention, how would you change the model?
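One possible starting point for that question, offered only as a sketch, is to build a 0/1 intervention indicator for recession months and add it as another regressor. The example below flags the NBER-dated recession of December 2007 through June 2009 on a monthly index matching the gold series; the variable names are hypothetical.

```r
# Monthly index matching the gold series (Jan 1950 to Jun 2014).
idx <- ts(seq_len(774), start = c(1950, 1), frequency = 12)

# Year and month of each observation (small offset guards against rounding).
year  <- floor(as.numeric(time(idx)) + 1e-8)
month <- as.numeric(cycle(idx))
ym    <- year * 12 + month

# Indicator: 1 from Dec 2007 through Jun 2009, 0 otherwise.
recession <- as.numeric(ym >= 2007 * 12 + 12 & ym <= 2009 * 12 + 6)
print(sum(recession))  # number of months flagged
```

A column like this could be bound to the other regressors, and lagged copies of it would be one way to explore the delayed reaction noted above.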

**Authored by:** **Jeffrey Strickland, Ph.D.**

Jeffrey Strickland, Ph.D., is the author of *Predictive Analytics Using R* and a Senior Analytics Scientist with Clarity Solution Group. He has performed predictive modeling, simulation and analysis for the Department of Defense, NASA, the Missile Defense Agency, and the Financial and Insurance Industries for over 20 years. Jeff is a Certified Modeling and Simulation Professional (CMSP) and an Associate Systems Engineering Professional (ASEP). He has published nearly 200 blogs on LinkedIn, is a frequently invited guest speaker, and is the author of 20 books including:

*Operations Research using Open-Source Tools*, *Discrete Event simulation using ExtendSim*, *Crime Analysis and Mapping*, *Missile Flight Simulation*, *Mathematical Modeling of Warfare and Combat Phenomenon*, *Predictive Modeling and Analytics*, *Using Math to Defeat the Enemy*, *Verification and Validation for Modeling and Simulation*, *Simulation Conceptual Modeling*, *System Engineering Process and Practices*
