Forecasting Hotel Prices in Selected Middle East and North Africa Region (MENA) Cities with New Forecasting Tools
1. Introduction
According to the World Tourism Organization, tourism is considered a sector of hope. Tourism accounts for 10% of world GDP. It contributes to world societies by promoting cultures, as well as by adding more jobs to the economy. In 2016, 1.2 billion tourists were recorded worldwide and this number is expected to grow to 1.8 billion (50% growth) by 2030 [1] . This research paper focuses on one aspect of the tourism and hospitality sector, which is accommodation and, more specifically, hotels. Practitioners as well as researchers have been motivated to study the dynamic prices of hotel rooms and understand the determinants of these changes and, accordingly, be able to forecast prices more effectively. Akm [2] , Drew et al. [3] , Hassani et al. [4] , Padhi and Aggarwal [5] , Yang et al. [6] , Youn and Gu [7] , Jovanovic et al. [8] , Uysal [9] , and Magnini et al. [10] have all forecasted hotel or tourism demand using neural networks jointly with support vector analysis, the autoregressive integrated moving average (ARIMA), logistic regression, fuzzy goal programming, and decision trees.
This study introduces a novel approach to hospitality forecasting in this region, specifically to forecasting hotel rooms’ average daily rates (ADR) in eight major cities in the Middle East and North Africa (MENA) region. The methods include linear models (simple moving average and ARIMA) and nonlinear models (radial basis function (RBF) and support vector machine (SVM)). Research on hotels using advanced machine learning forecasting techniques is very limited in this region, and this study adds to the literature in this field.
The rest of the paper is organized as follows: the second section reviews the literature on tourism and hospitality studies and the different models used for forecasting; the third section covers the problem definition and research objective of this study; the fourth section presents the conceptual framework, which visualizes the study objective and hypotheses; the fifth section is the methodology, which covers the data collection, a detailed data analysis, a list of the key variables and the models used in the research; the sixth section covers the statistical performance evaluation and the measures employed to select the best model. Finally, a conclusion section summarizes the study outcomes and provides recommendations for future studies.
2. Literature Review
In the tourism and hospitality industry, it is very important to understand the variables affecting demand and eventually the performance of the industry (or a particular hotel or restaurant). In the past, managers and decision-makers relied on simple forms of data analysis (simple linear regression or multiple regression) to investigate the influence of those variables. More recently, big data analytics has unlocked the potential for studying complex business situations to understand the correlation between variables or causes and eventually forecast performance. Machine learning techniques have made it possible to analyze tens or hundreds (even thousands) of data variables that are either stored or live-streamed to help shape a time bound decision-making process.
Hospitality and tourism studies using machine learning to predict demand have gained momentum lately. Simple neural networks (NN) (or artificial neural networks (ANN)) have been the most-used machine learning technique. Philips et al. [11] , Pattie and Snyder [12] , Govers et al. [13] , and Law [14] have all used NN or ANN to forecast tourism or hotel demand using historical tourists’ arrival data or room occupancy data. Some researchers have combined neural networks with other ML techniques, either to pick the best model or to improve the performance of the model.
Others have explored different machine learning techniques as well. Vu et al. [15] , Tkaczynski et al. [16] , Toral et al. [17] , Dolnicar and Leisch [18] , Brochado et al. [19] , and Geetha et al. [20] have used clustering while dealing with market or customer segmentations and consumer behaviors. Hadavandi et al. [21] , Yu and Schwartz [22] and Sohrabi et al. [23] have used fuzzy systems to predict hotel demand using arrival data. Fuzzy systems are also widely used in planning and decision-making in retail and banking. Li and Sun [24] used support vectors to predict firm failure using financial and non-financial data, while Chen and Wang [25] used the support vector technique to forecast the demand. Pantano et al. [26] used tourist attraction characteristics and the random forest method to predict tourist response, while Shapoval et al. [27] used inbound visitors numbers and the decision tree technique to develop effective destination marketing.
Rong et al. [28] , Xiang et al. [29] , Li et al. [30] , Yang et al. [31] , Guo et al. [32] , Versichele et al. [33] , Sun et al. [34] , Athanasopoulos et al. [35] , Schwartz and Hiemstra [36] , Yang et al. [37] , Tussyadiah and Wang [38] , Pereira [39] , Liu et al. [40] , Lim et al. [41] , Wu et al. [42] , Zhang and Zhang [43] , and Chiu et al. [44] have all introduced new “unconventional” variables while studying hospitality performance. Their results are evidence that machine learning techniques are extremely powerful in enhancing the accuracy of analysis and prediction. In an industry as globalized as hospitality and tourism, machine learning has indeed proven its potential.
3. Problem Definition and Research Objective
Prior studies have focused mostly on tourism demand in countries, that is, at the macro level. Tourism arrivals have been extensively used as a forecasting component and most of the studies are country-specific. Macroeconomic factors take longer to respond to changes, which is known as the lag effect. On the other hand, firm-specific microeconomic factors, which represent the fundamental factor model, respond to changes much faster and are under the control of hotel managers/owners. They often lead to significant characterization of the dependent variable under investigation [45] . The first objective of this study is to combine macro and micro elements while studying dynamic hotel prices. Several cities from the MENA region, which are assumed to go through the same economic effects, are included in the study. In addition, this study explores the benefits of using machine learning techniques in forecasting.
4. Research Framework and Proposed Hypotheses
Hotel practitioners have used inventory planning and pricing as major inputs in their revenue management systems, which has led to successful hotel performance [39] . Lee [46] found while measuring hotel room rates that prices were affected by both internal (hotel-specific) and external (economic) factors.
Driven by literature findings, hotel attributes have a direct effect on hotel performance. This leads to the first hypothesis:
H1: Big data on hotels leads to better price prediction.
Moreover, various economic factors were investigated in hotel performance studies. This has led to the inclusion of economic factors as a moderating effect to be tested in this study:
H2: Economic factors moderate the relation between hotels’ attributes and hotel room pricing.
The following diagram (Figure 1) provides a visual representation of the study direction.
5. Methodology
Data collection and analysis
The daily hotel data used in this research paper came from STR and covered eight cities in the MENA region (at our request): Dubai, Jeddah, Manama, Muscat, Kuwait, Beirut, Amman, and Sharm El Sheikh. The hotels were split into three categories (Luxury and Upper Upscale Class, Upscale and Upper Midscale Class, Midscale and Economy Class). However, due to some missing data and the need for uniform analysis across all eight cities, it was decided to deal only with the Luxury and Upper Upscale class since data were available for this class for all cities. The sample contained 2800 observations of daily room sales covering the period between January 2010 and August 2017. The data set was split in the following way (Table 1).
Figure 1. Hotel performance determinants in selected MENA cities.
The data split is done to test the models’ performance in predicting unseen observations (the test data are held out and play no part in training) after determining all network/model parameters using the training data.
Economic variables were obtained from other credible sources such as the World Bank, the World Tourism Organization, the World Economic Forum and the US Energy Information Administration. The only challenge was that most of these data had different frequencies (monthly, yearly, and once every two years). To deal with different frequencies, international tourist arrival data were converted to daily rates by dividing the annual rate by 365, while country-level annual GDP growth percentage rate, inflation rate (average consumer prices), oil price (WTI and Brent), and index data on each country’s business environment, safety and security, health and hygiene, human resources, and labor market were all converted to daily rates by maintaining the same rate throughout the year. The aim was to use these variables and measures to gain insight into the determinants of hotel room prices that would help us, and eventually decision-makers, to predict these prices with more accuracy. Innovative ways of handling mixed data frequencies could be an opportunity for future research. This study would also be a good piece of research to validate the hotel performance determinants (HPD) model suggested by Assaf et al. [47] . The table below (Table 2) represents a list of the variables found in the literature that we utilized in our study based on data availability/accessibility.
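The two frequency conversions described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual procedure: the function names and the numeric values are hypothetical, and the fixed divisor of 365 simply follows the text.

```python
from datetime import date, timedelta

def days_in_year(year):
    """Return every calendar date in the given year."""
    start = date(year, 1, 1)
    end = date(year + 1, 1, 1)
    return [start + timedelta(days=i) for i in range((end - start).days)]

def annual_total_to_daily(year, annual_total, divisor=365):
    """Spread an annual count (e.g. tourist arrivals) evenly across the days."""
    return {d: annual_total / divisor for d in days_in_year(year)}

def annual_rate_to_daily(year, annual_rate):
    """Repeat an annual rate or index value (e.g. GDP growth) on every day."""
    return {d: annual_rate for d in days_in_year(year)}

# Hypothetical figures, for illustration only.
arrivals = annual_total_to_daily(2015, 3_650_000)
gdp_growth = annual_rate_to_daily(2015, 4.1)
```

Both helpers return a mapping from calendar date to value, so the converted series can be joined to the daily hotel observations by date.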
6. Models in the Research
This research predicts dynamic hotel room prices by selecting the best forecasting model. The candidate models are linear (simple moving average and ARIMA) and non-linear (RBF and SVM) in form. The goal is to compare these models using the model performance measures to determine the best model or a combination of them.
6.1. Time Series Forecast
Using historical ADR values, time series forecasting helps predict the future ADR of hotels. The main goal is to find the model that best fits the data, hence reducing the noise or error. The models that we used are the simple moving average and the Box-Jenkins ARIMA model. These models are widely used in tourism and hospitality research.
6.2. Simple Moving Average
The simple moving average method uses the average of previous n-periods as a forecast value [59] .
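As a minimal sketch of this method, the one-step-ahead forecast is the arithmetic mean of the last n observations. The function name and the ADR values below are hypothetical and serve only to illustrate the calculation.

```python
def simple_moving_average_forecast(series, n):
    """Forecast the next value as the mean of the last n observations."""
    if n <= 0 or n > len(series):
        raise ValueError("n must be between 1 and the length of the series")
    window = series[-n:]
    return sum(window) / n

# Hypothetical daily ADR values, for illustration only.
adr = [200.0, 210.0, 190.0, 220.0, 230.0]
forecast = simple_moving_average_forecast(adr, 3)  # mean of 190, 220, 230
```

A larger n smooths out more noise but reacts more slowly to recent price changes.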
Table 2. Summary table of variables used in tourism and hospitality literature.
ARIMA
ARIMA (p, d, q) consists of the autoregressive component AR(p), the moving average component MA(q), and d, which represents the order of differencing used to achieve stationarity. ARIMA is one of the models most widely used to forecast with time series data [60] .
(1)
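For reference, a common textbook way of writing an ARIMA(p, d, q) model uses the backshift operator B, defined by B y_t = y_{t-1}. The notation below is assumed for illustration and may differ from the form used in [60]: y_t is the ADR series, c is a constant, the phi_i and theta_j are the AR and MA coefficients, and epsilon_t is a white-noise error term.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right) (1 - B)^{d} \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \varepsilon_t
```

With d = 1, the factor (1 - B) y_t reduces to y_t - y_{t-1}, that is, the first difference of the series.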
6.3. Machine Learning Models
6.3.1. RBF
RBF is one of the most widely used neural network techniques. Used for classification as well as regression, the RBF model is a feed-forward neural network that is based on three layers: input, hidden and output [61] . RBF models gained interest due to their advantage in achieving faster convergence with fewer errors while also being reliable [62] .
(2)
where C represents the center of the neuron; the second parameter of the function is its width, or radius [61] .
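To make the three-layer structure concrete, the sketch below assumes a Gaussian basis function, which is the most common choice but is not confirmed by the source. The parameter names `center` and `width`, the factor of 2 in the exponent, and both function names are illustrative assumptions.

```python
import math

def gaussian_rbf(x, center, width):
    """Assumed Gaussian basis: exp(-||x - center||^2 / (2 * width^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbf_network_output(x, centers, widths, weights, bias=0.0):
    """Output layer: bias plus the weighted sum of hidden-layer activations."""
    hidden = [gaussian_rbf(x, c, w) for c, w in zip(centers, widths)]
    return bias + sum(wt * h for wt, h in zip(weights, hidden))
```

Each hidden neuron responds most strongly when the input lies at its center, and the width controls how quickly that response decays with distance.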
6.3.2. SVM/SVR
SVM was introduced by Vapnik in the early 1990s. SVM is a statistical technique used widely for classification and more recently for regression (support vector regression (SVR)). Unlike other models, SVM aims to minimize the generalization error (structural risk minimization). When visualized, SVM works to maximize the margin of a hyperplane that separates two (or more) classes. SVR is a version of SVM for regression that was proposed by Drucker, Burges, Kaufman, Smola, and Vapnik [63] . SVM can also work in higher dimensions if a kernel function is applied, which allows it to solve non-linear problems [25] .
(3)
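Support vector regression is usually described in terms of the epsilon-insensitive loss, which ignores errors smaller than a tolerance epsilon and penalizes larger errors linearly. The sketch below illustrates only that loss; the function name and the example values are hypothetical, and it is not a full SVR implementation.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Zero inside the epsilon tube, linear in the error outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)

# Hypothetical ADR values: a 3-unit error is tolerated, a 10-unit error is not.
inside_tube = epsilon_insensitive_loss(100.0, 103.0, 5.0)
outside_tube = epsilon_insensitive_loss(100.0, 110.0, 5.0)
```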
7. Data Analysis
As a first step, we ran a descriptive statistics analysis of the data, which highlights the mean, maximum, minimum and standard deviation of the variables used in the study. The data used in this study represent daily observations obtained from STR, which provided insights into the internal or microeconomic factors used by the industry to study the hospitality sector. Other macroeconomic factors, which have also appeared in several studies within the tourism and hospitality field, were obtained from different sources such as the World Bank, the World Tourism Organization, the World Economic Forum and the US Energy Information Administration. However, those factors were country-based, since capturing them at city level and on a daily basis was impossible and out of the scope of this study. The table below (Table 3) provides a summary of the descriptive statistics for significant variables in the Dubai Luxury and Upper Upscale data, produced using IBM SPSS:
With skewness close to zero and less than 1 for most variables, the data appear approximately symmetric and hence close to normally distributed, even though we are dealing with a very dynamic environment.
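As an illustration of the statistic being interpreted here, the sketch below computes the simple moment-based skewness. This is an assumption for illustration: statistical packages such as SPSS report a bias-adjusted variant, so values will differ slightly for small samples. The function name is hypothetical.

```python
def moment_skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant series")
    return m3 / m2 ** 1.5
```

A value near zero indicates a roughly symmetric distribution, while a clearly positive value indicates a long right tail, as would happen if a few days had unusually high rates.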
Using the steps documented in the literature for each proposed model, the data were then used in each model to produce the forecasts and to carry out further analysis. The aim is to compare the models and choose the best model for ADR prediction (or perhaps a combination of models) based on the performance criteria reported in Table 4.
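Forecast comparisons of this kind typically rely on error measures computed between actual and predicted values. The three measures below (RMSE, MAE and MAPE) are assumed for illustration only and are not necessarily the criteria used in this study; the function names are hypothetical.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
```

Lower values indicate a more accurate model for all three measures, which makes them convenient for ranking models city by city.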
As a first step in the time series analysis, the data were plotted to check visually for any seasonal patterns or trends throughout the year (refer to Figure 2 for ADR). The aim was to remove such patterns and make the data stationary so that they could be modeled with ARIMA. Many of these cities showed acceptable stationarity in the data, while some (e.g., Jeddah) showed increasing trends over a number of years, which necessitated some treatment of the data to make the mean constant. As a result, and to maintain uniformity, first-order differencing was applied to all cities’ data to make them stationary, following the 1970 Box-Jenkins method (see Figure 3) [64] .
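First-order differencing replaces each observation with its change from the previous day. A minimal sketch follows, with a hypothetical function name and illustrative values.

```python
def difference(series, order=1):
    """Apply differencing `order` times: y[t] - y[t-1] at each pass."""
    result = list(series)
    for _ in range(order):
        result = [b - a for a, b in zip(result, result[1:])]
    return result

# Hypothetical ADR values with an upward trend.
adr = [100.0, 105.0, 103.0, 110.0]
d1 = difference(adr)  # day-to-day changes
```

Each pass shortens the series by one observation, and a series with a steady upward trend becomes one that fluctuates around a constant mean.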
After dealing with stationarity, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were employed. Different ARIMA specifications were tested for each city, and the best model was selected for data analysis based on these criteria.
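The selection step can be sketched as follows, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximized log-likelihood. The candidate orders and log-likelihood values below are hypothetical, and selecting by AIC alone is a simplification.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_likelihood

def select_by_aic(candidates):
    """candidates maps an ARIMA order (p, d, q) to (log_likelihood, k)."""
    return min(candidates, key=lambda order: aic(*candidates[order]))

# Hypothetical fitted models for one city, for illustration only.
candidates = {
    (1, 1, 0): (-520.0, 2),
    (1, 1, 1): (-510.0, 3),
    (2, 1, 2): (-509.5, 5),
}
best_order = select_by_aic(candidates)
```

Both criteria reward fit and penalize extra parameters; BIC penalizes complexity more heavily once n exceeds about eight observations.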
8. Statistical Performance
The following table (Table 4) reports the models’ performance measures for each city as a measure of forecasting accuracy.
8.1. Conventional Techniques
By employing the ARIMA and simple moving average techniques to forecast future room rates, the study found that the simple moving average performed poorly according to the performance measures, while ARIMA was a significantly better predictor. From Table 4, it appears that Amman, Dubai, and Sharm El Sheikh have the lowest model errors compared to the other cities, while Manama produced the highest errors. Overall, ARIMA performed better than the simple moving average method in terms of forecasting accuracy using conventional techniques.
Figure 2. Average daily rates for Luxury and Upper Upscale hotels in selected cities in the MENA region (StataSE 13).
Figure 3. D1 of average daily rates for Luxury and Upper Upscale hotels in selected cities in the MENA region (StataSE 13).
Table 3. Descriptive statistics for Dubai Luxury & upper sample data.
Table 4. Performance for all cities.
8.2. Innovative Techniques
One of the contributions of this paper is the use of innovative machine learning tools to forecast room prices. Both RBF and SVM were utilized for prediction, which resulted in significant improvements in performance. The inclusion of external economic factors could also be one reason why these models outperformed the conventional models.
When comparing the forecasting accuracy of the different models across the eight cities, it was found that SVM and RBF performed better than ARIMA or the simple moving average. The results show the superiority of machine learning techniques in prediction compared to conventional forecasting models.
9. Conclusions
The use of innovative tools in hotel performance forecasting would help researchers as well as practitioners in planning effectively. Hotel internal attributes positively affect hotel performance and more specifically prices. External economic factors moderate the relationship between the hotel attributes and hotel performance.
The main objective of the study was to predict hotel room prices using new tools. The study shows that SVM is the leading model in “luxury and upscale” hotel room price forecasting, followed by RBF and then ARIMA, while the simple moving average is found in this study to be the inferior model.
Machine learning is insufficiently studied in the hotel and tourism sector in the MENA region, and this study adds to the academic literature. Given the abundance and rapid development of machine learning and artificial intelligence models in recent years, future studies could explore other such models and compare their performance against traditional models. Other performance measures such as precision and speed of returning results could be used for model evaluation as well. Given the dynamic environment within the tourism and hospitality sector, policy makers and hotel operators could use these tools to maintain their strategic lead. The SVM model or a combination of forecasting models can be utilized to forecast short-term and long-term market direction based on the model strength.