Applying Machine Learning Techniques to Analyze and Explore Precious Metals
1. Introduction
Precious metals such as gold, silver, palladium, and platinum have long been valued for their scarcity, industrial uses, and historical importance as a store of wealth. Understanding the factors that affect the prices of these metals is important for investors, policymakers, and academics. Machine learning techniques provide powerful new tools for analyzing and forecasting the behavior of precious metal markets. The significance of this research therefore lies in using these methods to gain a deeper understanding of the price fluctuations of precious metals, ultimately supporting better decision-making and strategy development for market participants.
The structure of the study is as follows:
Introduction: This section explains the importance of the precious metals gold, silver, palladium, and platinum. It covers their historical significance, their practical uses in industry, and their value as investments, and it motivates the use of machine learning methods to analyze and forecast their prices.
Literature Review: This section examines prior research, conventional analytical techniques, and the recent adoption of machine learning in financial analysis. It assesses previous studies and identifies the gaps that this study intends to fill.
Gaussian Mixture Model (GMM): This section explains the Gaussian Mixture Model (GMM) and its use in the analysis of financial data. It covers the mathematical definition of the GMM, the use of the Expectation-Maximization (EM) algorithm for parameter estimation, and the benefits of using a GMM to represent the complex price distributions of precious metals.
Results and Discussion: This section applies the GMM to the price data of gold, silver, palladium, and platinum from 2017 to 2023. The method involves data preprocessing, model training and validation, and the assessment of model performance using BIC, AIC, ICL, log-likelihood, NEC, and entropy. The patterns identified in the price dynamics of the selected precious metals are then discussed.
Conclusion: The concluding section summarizes the study's findings and their significance for investors, legislators, and scholars. It discusses the advantages of using machine learning techniques to analyze precious metal markets, proposes avenues for future research, and evaluates the study's contribution to existing knowledge and its practical implications for financial decisions and strategy.
2. Literature Review
The study by Çelik and Başarır (2017) examines the use of ANN for predicting the prices of precious metals and shows encouraging outcomes. The research utilized RapidMiner software to analyze the daily prices of gold, silver, palladium, and platinum, together with 15 macroeconomic factors, spanning from January 2010 to December 2015. The performance of the model was assessed using metrics like root mean squared error, absolute error, relative error, Spearman’s Rho, and Kendall’s Tau. The results suggest that ANN can accurately and effectively forecast the prices of gold and palladium. However, the forecasts for silver and platinum were less accurate. The study highlights the potential of ANN as a prediction tool for forecasting precious metal prices. It suggests that future research could enhance accuracy by including additional data and making comparisons with other machine learning algorithms and time series analysis approaches.
The objective of the study by Varshini et al. (2024) is to forecast the future prices of mineral commodities, including gold, silver, copper, platinum, palladium, and aluminum, by employing a blend of machine learning and deep learning models. The covered models include stacked LSTM, bidirectional LSTM, SVR with EGB, and gated recurrent units. The evaluation of model performance relies on three metrics: RMSE, MAE, and MAPE. The selection of mineral, duration of the sample period, and choice of input data significantly impact the precision of forecasts.
The study by Zhou and Xu (2023) introduces an innovative three-stage hybrid learning method to precisely predict the values of silver, palladium, and platinum. The hybrid model surpasses other models, emphasizing the significance of integrating diverse components to improve prediction accuracy and generalization power. These findings provide useful information for investing in and producing precious metals.
The study by Ulsami and Junejo (2017) provides evidence that machine learning is successful in forecasting gold prices, revealing that the stock valuation of a renowned company has a more significant influence on gold rates than the condition of the US economy. The study conducted by Sarangi et al. (2021) assesses the effectiveness of a hybrid model driven by machine learning in predicting gold prices. This model utilizes an ANN-PSO model, which integrates Particle Swarm Optimisation. The study applied the ANN-PSO model to monthly gold prices in India from January 2012 to June 2021. The findings demonstrate that this model can effectively forecast future gold prices.
The study by Oztoprak and Orman (2021) examines the application of Explainable Artificial Intelligence (XAI) to forecasting the prices of precious metals and offers a thorough survey of existing research in this field. As noted above, Çelik and Başarır (2017) use an ANN model built with RapidMiner to forecast the prices of gold, silver, palladium, and platinum from daily data covering January 2010 to December 2015. Their forecasts are precise and efficient for gold and palladium but less accurate for silver and platinum. Their model is intended to forecast the prices of materials used by jewelry manufacturers.
The study by Phitthayanon and Rungreunganun (2019) presents two separate deep-learning models for predicting the prices of gold, silver, and diamonds. It highlights that the proposed approach can be simply implemented in small jewelry enterprises in Thailand, as pricing data is readily available. The model attains a level of accuracy that is comparable to what is found in existing literature, while using less data. However, there is a notable difference in accuracy between the models for precious metals and diamond prices. This difference is related to the greater availability of data for precious metals.
Kangalli Uyar et al.’s (2024) study outlines a two-step approach for forecasting price bubbles in precious metals. The technique begins with a right-tailed unit root test to identify the presence of bubbles. Subsequently, a variety of machine learning techniques, such as logistic regression, SVM, CART, random forests, EGB, and neural networks, are employed to detect and assess the factors that contribute to the formation of bubbles. The study found that macroeconomic factors influenced the monthly price swings of gold, silver, palladium, and platinum from 1990 to 2022, resulting in the emergence of price bubbles. The study affirms that the US consumer confidence index has a positive effect on speculative expansion in the prices of gold, platinum, and silver. In a different application domain, Ling and Zhu (2017) used a GMM to estimate the probability of precipitation on a specific day and time from historical meteorological data.
As described above, Varshini et al. (2024) forecast future metal prices in commodity markets using machine learning and deep learning models such as stacked LSTM, convolutional LSTM, bidirectional LSTM, SVR with EGB, and gated recurrent units. Performance is assessed using three indicators: RMSE, MAE, and MAPE. The accuracy of predictions can vary significantly with the particular mineral, the sampling period, and the input data.
More broadly, finance, healthcare, education, transportation, and other sectors have used AI to innovate and advance (Liu & Zheng, 2020). Forecasting in financial markets is difficult because of their non-linear, dynamic nature and the presence of several uncertainties (Vaidya, 2020). Technical analysis indicators are effective instruments for quickly earning gains, especially when dealing with intricate data that has non-linear relationships, without the need for specialized financial knowledge (Chandar, 2022). The study by Díaz et al. (2022) investigated the influence of the COVID-19 pandemic on the volatility of the global stock market, revealing a notable and adverse impact. An in-depth investigation of classified stock markets helps clarify the correlation between worldwide markets (Li et al., 2022). Machine learning, a subset of artificial intelligence, is efficient at tackling a wide range of problems in different domains (Chan et al., 2022). Researchers have also made predictions about the potential failure of certain firms in the wake of the economic slowdown caused by the coronavirus pandemic (Elhoseny et al., 2022).
3. Basic Concepts
The Gaussian Mixture Model (GMM) is a statistical model that describes the probability distribution of a dataset as a combination of several Gaussian (normal) distributions (Bishop, 2006). It is commonly employed for clustering and density estimation in machine learning and statistics (Lee & McLachlan, 2013).
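The probability density function of a Gaussian Mixture Model (GMM) is expressed as a weighted sum of $K$ distinct Gaussian distributions (Du Roy de Chaumaray & Marbac, 2024):
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{1}$$
where $\pi_k$ denotes the weight of the k-th Gaussian component and $\mu_k$ represents its mean (the center of the component). The weights satisfy
$$0 \le \pi_k \le 1 \tag{2}$$
$$\sum_{k=1}^{K} \pi_k = 1 \tag{3}$$
where $\Sigma_k$ represents the covariance of the k-th component.
The mixing coefficients $\pi_k$ give the relative magnitudes of each Gaussian component in the mixture.
$\mathcal{N}(x \mid \mu, \Sigma)$ is the Gaussian distribution with mean $\mu$ and covariance $\Sigma$; for a $d$-dimensional observation $x$:
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right) \tag{4}$$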
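As a minimal sketch of the mixture density, the univariate case (d = 1, so the covariance reduces to a variance) can be evaluated directly. The two-component parameters below are hypothetical and chosen only for illustration; the final check confirms numerically that the weighted sum is itself a valid density.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate Gaussian density (the d = 1 case of the Gaussian distribution)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(x, weights, means, variances):
    """Mixture density: weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

# Hypothetical two-component mixture (illustrative values, not from the study).
weights = [0.6, 0.4]
means = [0.0, 5.0]
variances = [1.0, 2.0]

# Crude Riemann sum on [-20, 20): a valid density should integrate to about 1.
step = 0.01
area = sum(gmm_pdf(-20.0 + i * step, weights, means, variances) * step
           for i in range(4000))
print(round(area, 4))
```

Because the weights are non-negative and sum to one, the mixture inherits the unit integral of its components.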
Expectation-Maximization (EM) algorithm
The Expectation-Maximization (EM) algorithm is a widely used technique for obtaining maximum likelihood estimates of parameters in statistical models that involve hidden (latent) variables (Dempster et al., 1977). The EM algorithm is highly effective when applied to the Gaussian Mixture Model (GMM) (McLachlan & Krishnan, 2007).
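The following is a minimal sketch of EM for a univariate GMM, not the implementation used in this study. It alternates an E-step, which computes the posterior probability of each component for every observation, with an M-step, which re-estimates the weights, means, and variances from those probabilities. The synthetic data, the initialization from the sample minimum and maximum, and the fixed iteration count are all assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, init_means, n_iter=100):
    """Fit a univariate GMM by EM, starting from the given initial means."""
    k, n = len(init_means), len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k          # start with equal weights
    means = list(init_means)
    variances = [overall_var] * k    # start with the pooled variance
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: weighted re-estimation of the parameters.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing component
    return weights, means, variances

# Synthetic data: 300 draws around 10 and 200 draws around 20 (illustrative only).
random.seed(0)
data = ([random.gauss(10.0, 1.0) for _ in range(300)]
        + [random.gauss(20.0, 2.0) for _ in range(200)])
weights, means, variances = em_gmm_1d(data, init_means=[min(data), max(data)])
print(weights, means, variances)
```

In practice a convergence test on the log-likelihood would replace the fixed iteration count, and several initializations would be tried because EM only finds a local maximum.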
Maximum A Posteriori (MAP)
MAP classification is a statistical method employed in pattern recognition and machine learning to make choices using observable data (Murphy, 2012). The approach is based on Bayesian probability theory and its objective is to maximize the posterior probability of a class, given the available data (Müller et al., 2015).
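As a sketch of how the MAP rule operates on a fitted univariate mixture, the snippet below computes the posterior probability of each class by Bayes' rule and assigns a price to the class with the largest posterior. It uses the gold parameters reported in Table 2; the example prices are hypothetical query points, and the function names are illustrative.

```python
import math

# Gold parameters as reported in Table 2 (proportions, means, variances).
weights = [0.360, 0.100, 0.067, 0.285, 0.189]
means = [1271.832, 1498.097, 1686.743, 1802.348, 1939.292]
variances = [2084.211, 2675.235, 2321.233, 2718.252, 1896.110]

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posteriors(x):
    """Posterior class probabilities: prior weight times likelihood, normalized."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

def map_class(x):
    """Return the 1-based index of the class with the highest posterior."""
    post = posteriors(x)
    return 1 + max(range(len(post)), key=post.__getitem__)

# Hypothetical query prices, one near each class mean.
for price in (1250.0, 1500.0, 1700.0, 1800.0, 1950.0):
    print(price, map_class(price))
```

Because the posterior weighs each class likelihood by its proportion, a small class such as Class 3 wins only in the narrow price band where its density clearly dominates its larger neighbors.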
4. Results and Discussion
4.1. Statistical Summary
From 2017 to 2023, the prices of precious metals experienced fluctuations. During the COVID-19 period, the prices increased due to a combination of factors, including the recovery of industrial demand, disruptions in the supply chain, increased interest in investment, and favorable developments in regulations and technology.
During the COVID-19 pandemic, gold prices exhibited a consistent upward trend, followed by a period of stabilization. Palladium also saw a significant rise during the pandemic, but this was followed by a gradual decline. In contrast, silver and platinum prices experienced noticeable volatility before the pandemic, rose during the pandemic period, and declined afterward. These patterns can be observed in Figure 1.
Figure 1. The series for the price of the precious metals over the period of 2 Jan. 2017 to 2 Jan. 2023.
Table 1 displays the minimum and maximum values, the mean, and the standard deviation for the four precious metals: gold, palladium, platinum, and silver. The price of gold ranged from a minimum of 1156.110 to a maximum of 2056.100, with a mean of 1599.357 and a standard deviation of 275.813. Palladium exhibited a wider range, from a minimum of 670.000 to a maximum of 3015.000, with a mean of 1647.291 and a standard deviation of 596.739. Platinum had a minimum of 593.000, a maximum of 1294.000, a mean of 941.534, and a standard deviation of 101.390. Silver, the least expensive of the four, had a minimum of 12.130, a maximum of 28.990, a mean of 19.924, and a standard deviation of 4.041.
Table 1. Summary statistics.
| Variable | Observations | Minimum | Maximum | Mean | Std. deviation |
| --- | --- | --- | --- | --- | --- |
| GOLD | 1800 | 1156.110 | 2056.100 | 1599.357 | 275.813 |
| PALLADIUM | 1800 | 670.000 | 3015.000 | 1647.291 | 596.739 |
| PLATINUM | 1800 | 593.000 | 1294.000 | 941.534 | 101.390 |
| SILVER | 1800 | 12.130 | 28.990 | 19.924 | 4.041 |
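Because the four metals trade at very different price levels, their standard deviations are not directly comparable. One simple way to compare dispersion, offered here only as an illustrative sketch and not as part of the original analysis, is the coefficient of variation (standard deviation divided by mean), computed from the values in Table 1.

```python
# (mean, standard deviation) per metal, as reported in Table 1.
table1 = {
    "GOLD": (1599.357, 275.813),
    "PALLADIUM": (1647.291, 596.739),
    "PLATINUM": (941.534, 101.390),
    "SILVER": (19.924, 4.041),
}

# Coefficient of variation: dispersion relative to the price level.
cv = {metal: std / mean for metal, (mean, std) in table1.items()}

for metal, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{metal:10s} {value:.3f}")
```

On this relative scale palladium shows the largest dispersion and platinum the smallest, which is consistent with the wider range noted for palladium.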
4.2. Analysis of Data Using GMM
Table 2 displays the classification of the gold price data into five distinct classes and reports the proportion, mean, and variance of each class:
Class 1, which accounts for 36.0% of the data, has a mean gold price of 1271.832 and a variance of 2084.211.
Class 2, which represents 10.0% of the data, has a mean of 1498.097 and a variance of 2675.235.
Class 3, which represents 6.7% of the observations, has a mean of 1686.743 and a variance of 2321.233.
Class 4, which accounts for 28.5% of the data, has a mean of 1802.348 and a variance of 2718.252.
Class 5, which accounts for 18.9% of the observations, has the highest mean of all the classes, 1939.292, and a variance of 1896.110.
Table 2. The proportions, the mean, and the variance by class (GOLD).
| Class | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| Proportions | 0.360 | 0.100 | 0.067 | 0.285 | 0.189 |
| Mean | 1271.832 | 1498.097 | 1686.743 | 1802.348 | 1939.292 |
| Variance | 2084.211 | 2675.235 | 2321.233 | 2718.252 | 1896.110 |
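A quick consistency check on the fitted classes, sketched here under the assumption that the rounded values in Table 2 are used as-is, is to recombine them into the overall mean and standard deviation of the mixture. The mixture mean is the weighted average of the class means, and the mixture variance follows from the law of total variance (within-class plus between-class variance). The proportions are renormalized because the rounded values sum to 1.001.

```python
import math

# Gold class parameters as reported in Table 2.
weights = [0.360, 0.100, 0.067, 0.285, 0.189]
means = [1271.832, 1498.097, 1686.743, 1802.348, 1939.292]
variances = [2084.211, 2675.235, 2321.233, 2718.252, 1896.110]

# Renormalize the rounded proportions so that they sum to exactly one.
total = sum(weights)
weights = [w / total for w in weights]

# Mixture mean: weighted average of the class means.
mix_mean = sum(w * m for w, m in zip(weights, means))

# Law of total variance: within-class variance plus between-class variance.
within = sum(w * v for w, v in zip(weights, variances))
between = sum(w * (m - mix_mean) ** 2 for w, m in zip(weights, means))
mix_std = math.sqrt(within + between)

print(round(mix_mean, 3), round(mix_std, 3))
```

Both values land close to the sample mean (1599.357) and standard deviation (275.813) in Table 1, and almost all of the overall variance comes from the spread between class means rather than from variation within classes.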
Figure 2 illustrates the MAP classification of the gold data. Each class corresponds to a range of gold prices, and every observation is assigned to the class with the highest posterior probability under the Maximum A Posteriori (MAP) criterion. Class 1 (green), with a mean of 1271.832, covers the lowest price range, while Class 5 (grey), with a mean of 1939.292, covers the highest. In order, Class 1 (green) is the most affordable price tier, Class 2 (pink) is affordable to moderately priced, Class 3 (orange) is the mid-price range, Class 4 (blue) is the high-mid price range, and Class 5 (grey) is the highest price range.
The figure also shows how closely the fitted model agrees with the gold data, enabling a visual comparison between the observed data and the model. The fitted model for gold prices is a GMM, which assumes that the data are generated from a mixture of several Gaussian (normal) distributions, each representing a distinct class or component of the overall distribution. The Cumulative Distribution Functions (CDFs) for gold summarize the probability distribution and variability of prices and show a close correspondence between the estimated and empirical distributions.
Figure 2. The MAP classification, fitted model, and cumulative distribution function for gold.
Table 3 displays the classification of palladium price data into four separate classes, each with its own proportion, mean, and variance. Class 1 accounts for 28.7% of the data, with a mean of 945.296 and a variance of 13310.870. This class is the most affordable price range for palladium, and a large share of the observations falls within it. The variance of 13310.870 indicates that prices within this class fluctuated considerably.
Figure 3 displays the MAP classification for each class of palladium. There are four classes, each represented by a different color. Green, representing the lowest class, ranges in price from 670 to 1170. The other classes are represented by pink, orange, and blue, with the highest class covering prices from 2170 to 2670. The figure also shows the fitted model for palladium and its cumulative distribution function.
Table 3. The proportions, the mean, and the variance by class (PALLADIUM).
| Class | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Proportions | 0.287 | 0.239 | 0.131 | 0.343 |
| Mean | 945.296 | 1404.649 | 1896.126 | 2309.525 |
| Variance | 13310.870 | 24457.115 | 11435.411 | 89477.510 |
Figure 3. The MAP classification, fitted model, and cumulative distribution function for palladium.
Table 4 displays the classification of platinum price data into three classes, which helps characterize the price distribution and market behavior. Most platinum price data (51.8%) fall in Class 2, indicating that prices stayed mostly in the mid-range and that the market was relatively stable. Class 1 has the lowest mean price and the lowest variance, indicating steady prices. Class 3, the highest pricing bracket, is highly variable. This suggests market volatility and dynamics that could be affected by supply and demand shocks, economic conditions, or speculative behavior.
Table 4. The proportions, the mean, and the variance by class (PLATINUM).
| Class | 1 | 2 | 3 |
| --- | --- | --- | --- |
| Proportions | 0.137 | 0.518 | 0.345 |
| Mean | 820.014 | 939.615 | 992.808 |
| Variance | 731.694 | 2484.984 | 17253.166 |
Market trends: Class 2’s large number of observations and moderate variance suggest that platinum prices are stable, occasionally fluctuating as evidenced by Classes 1 and 3. The large variance increase from Class 2 to Class 3 suggests rapid price movements during market turbulence.
Figure 4 illustrates the MAP classification for each of the three price classes of platinum. It visually represents the fitted model for this metal, with different colors indicating the distinct components. Additionally, the figure displays the cumulative distribution function for platinum.
Figure 4. The MAP classification, fitted model, and cumulative distribution function for platinum.
Table 5 divides silver price data into three classes with different proportions, means, and variances. Class 1 has modest price variability (mean 16.618, variance 1.986, proportion 55.6%). Class 3 has the highest mean price but the lowest price variability (mean 27.318, variance 0.421, proportion 4.0%).
Figure 5 illustrates the MAP classification for each class of silver prices. There are three distinct classes, each denoted by a unique color. Green, which represents the lowest class, covers prices from 12 to 21. The remaining classes are denoted by pink and orange, with the uppermost class covering prices from 26 to 27. In the fitted model, the color of each peak matches the color of the corresponding MAP class. The fitted model represents the Probability Density Function (PDF) of silver prices.
Table 5. The proportions, the mean, and the variance by class (SILVER).
| Class | 1 | 2 | 3 |
| --- | --- | --- | --- |
| Proportions | 0.556 | 0.404 | 0.040 |
| Mean | 16.618 | 23.747 | 27.318 |
| Variance | 1.986 | 2.575 | 0.421 |
Figure 5. The MAP classification, fitted model, and cumulative distribution function for silver.
The colors used in the model correspond to the three Maximum A Posteriori (MAP) classes. The black color represents the joint distribution. Furthermore, the Cumulative Distribution Functions (CDFs) for the prices of silver are displayed.
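The fitted CDF of a univariate mixture is the weighted sum of the component normal CDFs. The sketch below evaluates it for silver using the parameters in Table 5; the query prices are hypothetical, and comparing against the empirical CDF would additionally require the raw price series, which is not reproduced here.

```python
import math

# Silver class parameters as reported in Table 5.
weights = [0.556, 0.404, 0.040]
means = [16.618, 23.747, 27.318]
variances = [1.986, 2.575, 0.421]

def normal_cdf(x, mean, var):
    """CDF of a univariate Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def gmm_cdf(x):
    """Mixture CDF: weighted sum of the component CDFs."""
    return sum(w * normal_cdf(x, m, v) for w, m, v in zip(weights, means, variances))

# Hypothetical query prices spanning the observed range of silver.
for price in (14.0, 18.0, 21.0, 25.0, 28.0):
    print(price, round(gmm_cdf(price), 3))
```

Each value is the model's estimate of the fraction of trading days on which silver closed at or below the given price.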
Between 2017 and 2023, the prices of precious metals saw periodic changes. Amid the COVID-19 pandemic, prices surged as a result of a confluence of events, including the rebound in industrial demand, interruptions in the supply chain, heightened interest in investment, and positive advancements in regulations and technology. The pandemic highlighted the intricate and varied demand dynamics for precious metals, encompassing both industrial and financial considerations (see Figure 1 and Table 1).
Tables 2-5 report the proportion, mean, and variance of each class for the four precious metals: gold, silver, palladium, and platinum. Figures 2-5 illustrate the Maximum A Posteriori (MAP) classification for each class of the four metals, together with the fitted model and the cumulative distribution function for each metal. These Cumulative Distribution Function (CDF) charts provide information about the probability distribution and price volatility of each precious metal during the chosen time period. The estimated curve closely matches the empirical curve, indicating a strong alignment and implying a high level of accuracy.
Table 6 presents the selection criteria used to evaluate the models fitted for each of the four precious metals: gold, palladium, platinum, and silver. The criteria include the Bayesian Information Criterion (BIC), Akaike Information Criterion (AIC), Integrated Completed Likelihood (ICL), log-likelihood, Normalized Entropy Criterion (NEC), and entropy. The results offer insight into the performance of the model for each individual precious metal. In general, smaller values of BIC, AIC, and ICL indicate a more accurate fit, whereas lower entropy and NEC values indicate a higher level of confidence in the model’s class assignments.
Palladium exhibits the most favorable model fit as determined by BIC, AIC, and ICL, since it demonstrates the lowest values compared to the other metals.
Table 6. The selection criterion for the selected model.
| Metal | DF | Entropy | NEC | Log-likelihood | ICL | AIC | BIC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GOLD | 14.000 | 334.094 | 0.380 | −11789.643 | −24352.411 | −23607.285 | −23684.223 |
| PALLADIUM | 11.000 | 447.767 | 1.473 | −13754.281 | −28486.547 | −27530.562 | −27591.013 |
| PLATINUM | 8.000 | 1038.578 | 10.327 | −10767.177 | −23671.474 | −21550.354 | −21594.319 |
| SILVER | 8.000 | 120.567 | 0.241 | −4567.057 | −9435.213 | −9150.114 | −9194.078 |
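The columns of Table 6 are linked by simple formulas, and the sketch below recomputes them from the log-likelihood and entropy. The formulas are inferred from the tabulated numbers and are assumptions, since the text does not state them: DF = 3K − 1 for a univariate mixture with K classes and unconstrained variances, AIC = 2 logL − 2 DF, BIC = 2 logL − DF ln(n), ICL = BIC − 2 Entropy, and NEC = Entropy / (logL(K) − logL(1)), where logL(1) is the log-likelihood of a single Gaussian computed from the Table 1 standard deviation (assumed to be the n − 1 sample estimate).

```python
import math

N_OBS = 1800  # observations per metal (Table 1)

# Per metal: (classes K, std. deviation from Table 1, entropy, log-likelihood).
inputs = {
    "GOLD": (5, 275.813, 334.094, -11789.643),
    "PALLADIUM": (4, 596.739, 447.767, -13754.281),
    "PLATINUM": (3, 101.390, 1038.578, -10767.177),
    "SILVER": (3, 4.041, 120.567, -4567.057),
}

# Reported (DF, NEC, ICL, AIC, BIC) from Table 6, for comparison.
reported = {
    "GOLD": (14, 0.380, -24352.411, -23607.285, -23684.223),
    "PALLADIUM": (11, 1.473, -28486.547, -27530.562, -27591.013),
    "PLATINUM": (8, 10.327, -23671.474, -21550.354, -21594.319),
    "SILVER": (8, 0.241, -9435.213, -9150.114, -9194.078),
}

def criteria(k, std, entropy, loglik, n=N_OBS):
    """Recompute DF, NEC, ICL, AIC, BIC under the assumed conventions."""
    df = 3 * k - 1                       # k means + k variances + (k - 1) weights
    aic = 2.0 * loglik - 2.0 * df
    bic = 2.0 * loglik - df * math.log(n)
    icl = bic - 2.0 * entropy
    ml_var = std ** 2 * (n - 1) / n      # maximum likelihood variance of one Gaussian
    loglik_one = -0.5 * n * (math.log(2.0 * math.pi * ml_var) + 1.0)
    nec = entropy / (loglik - loglik_one)
    return df, nec, icl, aic, bic

for metal, args in inputs.items():
    print(metal, [round(value, 3) for value in criteria(*args)])
```

Under these assumptions the recomputed values agree with Table 6 to within rounding. The sign convention matters when reading the table: with the 2 logL − penalty form, the values are the negatives of those given by the more common −2 logL + penalty form. Information criteria also depend on the scale of the data, so comparisons are most informative between candidate models fitted to the same series.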
5. Conclusion
This study used machine learning models to analyze the price trends of gold, silver, palladium, and platinum from 2017 to 2023. The palladium model showed the most favorable fit among the precious metals, as indicated by the lowest BIC, AIC, and ICL values. In contrast, the platinum model had the highest entropy and NEC values, suggesting greater uncertainty and complexity. The Cumulative Distribution Function (CDF) plots for each metal provided insight into the probability distribution and price volatility within the selected time frame, and the strong agreement between the empirical and estimated CDFs supports the assumptions and dependability of the selected models.
During the COVID-19 pandemic, the prices of precious metals, particularly gold and silver, rose substantially, which is consistent with their role as safe-haven investments during times of economic turmoil. Palladium and platinum prices also exhibited volatility; however, their patterns were more diverse and less pronounced than those of gold and silver. The results emphasize the value of sophisticated machine learning methods for financial modeling and forecasting. Investors and policymakers can use these observations to make well-informed decisions about investments in precious metals and about risk management.
Nomenclature
| Abbreviation | Definition |
| --- | --- |
| ANN | Artificial Neural Networks |
| LSTM | Long Short-Term Memory |
| SVR | Support Vector Regressor |
| MAE | Mean Absolute Error |
| RMSE | Root Mean Squared Error |
| MAPE | Mean Absolute Percentage Error |
| EGB | Extreme Gradient Boosting |
| CART | Classification and Regression Trees |