1. Introduction
As in any safety-critical industry, safety is at the heart of aviation. The rapid growth of air traffic and recent technological advancements have led to the development of more robust ways of making commercial air travel the safest means of travel. According to the latest safety report by the International Civil Aviation Organization (ICAO) [1], the global accident rate dropped from 2.14 accidents per million departures in 2020 to 1.93 accidents per million departures in 2021, a decrease of 9.8 percent and the lowest rate of the last five years. Over the last two decades, aviation deaths around the world have been falling steadily, mainly due to the implementation of more robust and proactive safety procedures. Notwithstanding these notable improvements, safety standards in General Aviation (GA) remain substantially lower than those of commercial aviation. According to an estimate by the International Air Transport Association (IATA) [2], the overall number of travelers is expected to grow to 4 billion by 2024, reaching 103 percent of pre-COVID levels. This anticipated growth in air travel will not be limited to commercial aviation; it will also have a significant impact on GA, and especially on the upcoming Urban Air Mobility (UAM) industry. With increased demand comes a growing need to improve operational safety in GA. Safety improvements used to rely on accidents to generate mitigation strategies, but recent advances in computational power and the abundance of data have driven a shift from a reactive to a predictive approach to safety, in which potentially hazardous events are identified ahead of time from operational data and mitigation strategies are implemented to prevent such accidents. Machine learning has been at the forefront of this paradigm shift, using historical data to find unknown anomalies with the help of domain experts.
One such means of improving aircraft and operational safety is trajectory prediction. According to the Single European Sky ATM Research (SESAR) programme [3], proposed by EUROCONTROL, and the Next Generation Air Transportation System (NextGen) [4] [5], flight trajectory prediction will play a key role in optimizing air traffic control and improving overall flight safety [6]. Trajectory prediction involves estimating an aircraft’s future position from its altitude, latitude, longitude, and other parameters on the basis of existing time-series data. The overall trajectory depends on multiple factors, including external ones such as the weather, wind direction, and the aircraft’s weight, which complicates the prediction task. Trajectory prediction is usually divided into two parts, short-term and long-term prediction, depending on the scale at which the prediction is made. In short-term prediction, also known as tactical trajectory prediction, the goal is to predict the aircraft’s position over short time intervals, such as a few seconds or minutes; because the horizon is short, the current values of the relevant variables strongly constrain the outcome, making these predictions much more accurate. In contrast, long-term trajectory prediction, also known as strategic trajectory prediction, deals with the prediction of a complete flight and uses historical flight data to produce the potential trajectory for the current flight.
In this paper, we propose a way of predicting the short- to mid-term trajectory of aircraft using a meta-learning approach based on historical real flight data collected from multiple GA aircraft. Meta-learning is a machine learning approach that combines multiple models to improve prediction accuracy; here, we use it to predict aircraft trajectories. In the worst case, the accuracy of the output will match that of one of the selected models, but in most cases, the output accuracy should be better. The majority of current research uses automatic dependent surveillance-broadcast (ADS-B) [7] [8] as the main data source, which can be noisy and contain many gaps within each flight. In contrast, we leverage complete flight data extracted from actual flights, enabling our model to better capture the variations during flight and deliver stable and consistent results.
This paper is divided into two parts: data preprocessing, which is crucial for any machine learning model, and the meta-learning part, where we stack two models, namely Random Forest Regression (RFR) and Long Short-Term Memory (LSTM), using k-Nearest Neighbors (k-NN). The outputs of the two base models are combined to provide the final prediction, giving our model an edge over single-model predictions. We begin with a literature review, followed by a discussion of the selected data and data pre-processing in Section 3. Section 4 explains the proposed methodology for aircraft trajectory forecasting, while Section 5 presents the results of our meta-learner and evaluates its performance against the individual models. Section 6 discusses the limitations and possible improvements of the proposed method. Finally, Section 7 concludes this paper with a summary of our findings.
2. Literature Review
In this paper, trajectory prediction refers to the process of calculating an aircraft’s future trajectory with the help of certain aircraft parameters, such as positional data, heading, and velocity, using historical data collected from multiple airports. Several approaches are available for predicting aircraft trajectories, and they can be classified as physics-based models, estimation methods [9], data-driven methods, or a combination of these. A physics-based model, which can also be classified as an aerodynamic model-based method, relies on kinetic assumptions, where the parameters of the model are selected on the basis of the flight route, aircraft performance, anticipated weather conditions, and pre-flight commands. The Base of Aircraft Data (BADA) Family 4 [10], introduced by EUROCONTROL, is considered one of the most mature aircraft performance models. It was designed to support research and development in Air Traffic Management (ATM) by simulating and predicting aircraft trajectories. Xing et al. [11] proposed a unique way of using a Markov statistical model, in which a kernel variable length Markov model (KVLMM) was employed to forecast the trajectory of an aircraft. Another paper [12] proposed the Kalman Filtering (KF) algorithm as an efficient location tracking technique; in that case, the authors used its inherent fixed-coefficient feature to efficiently cycle the location information between the prediction phase and the correction phase. Junfeng et al. [13] proposed a four-dimensional trajectory prediction (4D-TP) method that relied on four components, namely performance parameters, aircraft intent, a computation model, and environmental conditions. In another paper, Vilardaga Prats [14] presented a method using the time of arrival at a fixed navigation point to implement sub-optimal trajectories in dense traffic areas. Optimized trajectory forecasting could result in reduced delays, more efficient fuel consumption, and the resolution of air traffic conflicts. However, physics-based methods tend to perform poorly in real-world scenarios due to the high number of unknown or partially known factors affecting the actual trajectory.
With the technological and research advancements in the field of machine learning, a data-driven approach that learns the complex laws of aerodynamics from historically available datasets of aircraft trajectories has become much more feasible. These models are constructed with weak, or sometimes no, assumptions and, in most cases, show better prediction capabilities than physics-based models.
For instance, in 2013, Leege et al. [15] proposed a Supervised Learning Regression (SLR) technique, in which they used Generalized Linear Models (GLMs), Artificial Neural Networks (ANN), and Support Vector Regression (SVR) for trajectory prediction. They reported results from the GLM for fixed arrival routes with a prediction horizon of 15 NM to 45 NM. The model’s input was based on the aircraft’s ground speed, altitude, type, surface wind, and altitude wind.
In 2018, Shi et al. [16] proposed Long Short-Term Memory (LSTM)-based trajectory prediction, which uses four interacting layers and a sliding window to maintain the continuity of predictions. They used a timestamp, longitude, latitude, and altitude for the trajectory predictions [17]. Siami-Namini et al. [18] used Bidirectional Long Short-Term Memory (BiLSTM), a variation of LSTM. In their research, they added a training step that traverses the input data both from left to right and from right to left, and this extension resulted in better prediction accuracy than the original LSTM-based model.
Zhou et al. [19] took an interesting approach, in which they implemented multiple trajectory forecasting methods, such as LSTMs, back-propagation (BP) neural networks, and flight plan interpolation, and created a hybrid trajectory prediction model using all of them. The proposed method selected the best-performing method at different prediction time spans. The authors also compared the results obtained from the machine learning models with those of the Kalman Filter (KF), a physics-based prediction model widely used for short-term trajectory prediction [20] [21].
In another paper, Puranik et al. [22] proposed a novel framework for the analysis of aviation flight data using supervised machine learning, specifically an offline Random Forest Regression (RFR). The proposed framework was demonstrated through a practical use case: predicting true aircraft landing airspeed and ground speed during the approach phase. The authors used historical data from commercial airline operations to train a global prediction model. The main goal was to assist the pilot with a direct prediction of landing airspeed or ground speed during the approach phase, when stabilized approach criteria become operationally complex.
Another study [23] introduced novel machine learning-based approaches, including a Multi-Layer Perceptron (MLP) architecture with residual connections, a Random Forest (RF), and a Recurrent Neural Network (RNN) regression approach, for maneuvering classification and direct prediction of the next state in aerial robotics. The authors demonstrated that their proposed RF algorithm outperforms the ornithopter Segmentation-based Planning Approach (OSPA), even in the landing scenario, and that their algorithms are robust against simulated sensor noise. Additionally, the study proposed an RNN for obtaining trajectories without using the mathematical model of the ornithopter, which is particularly useful for re-planning when the target state changes during the flight. Overall, this study provides valuable insights into the use of machine learning in online scenarios for aerial robotics. Random Forest is a widely used algorithm for predicting the trajectories of various moving objects [24] [25].
Choi et al. [26] presented a grid prediction model that predicts the trajectory of surrounding vehicles and determines the position of the grid using RF and LSTM encoder-decoder architectures. The model is trained using a dataset recorded with a vehicle-to-vehicle (V2V) communication device, a camera sensor, and LIDAR. The proposed method shows high positional accuracy after 1 second but has limitations in areas where V2V communication is not possible. The paper shows the combined prediction capability of LSTM and RFR for real-time trajectory prediction around the ego vehicle to help prevent collisions with surrounding vehicles.
Following this trend, in this paper we explore an extendable meta-learning approach to trajectory prediction, stacking RFR and LSTM with the help of k-NN to see whether prediction accuracy improves over that of the individual models.
3. Data and Data Pre-Processing
3.1. Data Source
The use of real flight data is crucial for the development and testing of flight prediction algorithms. However, obtaining such data is often challenging due to regulations and the sensitive nature of the data. In this paper, we used a private dataset of 5643 flights from multiple different airports, covering the period of June 2021 to February 2022. The data distribution can be seen in Figure 1. The data was collected from the onboard Garmin 1000 (G1000) device, an integrated glass cockpit for business and smaller aircraft. To diversify the dataset, we used three aircraft types: the Diamond DA42 NG, Piper PA-28 181 Archer II, and Cessna 172 Skyhawk, all of which are very popular GA aircraft. Using the G1000, we logged more than 60 flight parameters for each flight, including the timestamp, GPS positional data, pitch, yaw, heading, barometric altitude, vertical speed, ground speed, COM frequency, etc. All these parameters were recorded at one-second intervals. The actual flight data had some missing values and redundant parameters; therefore, to improve the quality of the predictions, we started by cleaning and resampling the data. A sample flight trajectory can be seen in Figure 2.
3.2. Data Preparation
Initial data preprocessing involved removing all parameters with more than a certain percentage of missing records. Specifically, we examined each parameter to determine its missing-data percentage, and any parameter that exceeded the 40% threshold was removed. Next, we refactored the data based on its data type (e.g., numerical, categorical) to prepare it for further analysis. A quick summary of the raw data and missing columns can be seen in Figure 3. Each data file, depending on the length of the flight, contained approximately 30,000 to 60,000 records, and because the remaining features had significantly fewer missing values, we chose to remove the records with missing or empty values altogether rather than rely on interpolation techniques, such as linear interpolation, to fill the missing data. This resulted in a more concise and clean starting point (Figure 4).
Figure 1. Flight dataset visualization on a geographical map based on the frequency of airports used.
Figure 2. A sample aircraft trajectory path from Aeródromo Municipal de Ponte de Sor (LPSO).
Figure 3. Raw data availability matrix showing the availability of flight parameters and the missing values from the collected data. We found 10 columns with over 40 percent missing data.
Figure 4. Post-filtering data availability matrix showing the availability of flight data after applying data cleaning and filtering techniques.
The basic features required for trajectory prediction are altitude, latitude, longitude, and a timestamp [19] [20]. Because the aircraft trajectory is determined by the time sequence, the aircraft’s current position is heavily influenced by changes that occurred at previous timestamps. Therefore, in addition to the basic features, we also added helper features such as the heading and ground speed, which improved the model’s accuracy and understanding of the data. The generated dataset was in chronological order, which simplified the cleaning procedure. We used the aircraft’s positional data, heading, and speed for the trajectory prediction, which can be represented as $T = \{r_1, r_2, r_3, \ldots, r_n\}$, where T refers to the trajectory, n represents the total number of records in each data file, and each r contains the latitude, longitude, altitude, heading, and speed of a point on the aircraft trajectory. Because our dataset contained full flights instead of just chunks of flights collected via ADS-B, we first removed the invalid and empty values from each data file. Second, we resampled the data at 1 Hz to keep it consistent across all the flights.
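For illustration, the cleaning steps above can be expressed as a short pandas sketch. The column names, including the single timestamp column, are assumptions, since the exact G1000 log headers are not reproduced here.

```python
import pandas as pd

# Features retained for trajectory prediction (names are illustrative).
FEATURES = ["latitude", "longitude", "altitude", "heading", "ground_speed"]

def preprocess_flight(csv_path: str) -> pd.DataFrame:
    df = pd.read_csv(csv_path, parse_dates=["timestamp"])

    # Drop any parameter with more than 40% missing records.
    missing_ratio = df.isna().mean()
    df = df.drop(columns=missing_ratio[missing_ratio > 0.40].index)

    # Remove the remaining rows with missing or empty values (no interpolation).
    df = df.dropna()

    # Resample to a consistent 1 Hz rate across all flights.
    df = df.set_index("timestamp").resample("1s").nearest().reset_index()

    # Keep only the columns used as model inputs.
    return df[["timestamp"] + [c for c in FEATURES if c in df.columns]]
```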
4. Methodology
Meta-learning, also known as compositional learning, is an approach to machine learning that gained traction after Schmidhuber’s work was published in 1994 [27]. In that work, the author proposed a system for a time-varying environment based on the principle of divide and conquer, designed to learn from generated sub-goals using “time-bridging” adaptive models. Time-bridging refers to modeling the dependencies between sub-tasks and using the resulting knowledge to guide future decision-making. By learning from the experiences of previous tasks, a meta-learner can adaptively bridge the gap between tasks, allowing it to generalize to new tasks more effectively. Time-bridging is particularly useful in situations where there is significant variability between tasks or where data is scarce, as it allows the meta-learner to leverage previous learning experiences to improve its performance on new tasks. A meta-learning algorithm is specifically designed to learn from the outputs and metadata of other machine learning algorithms. During training, the algorithm is fed the predictions made by the individual models as well as their metadata. Once trained, the model is tested and used to make the final predictions. In this section, we first discuss the two individually selected models, LSTM and RFR, which are often used in trajectory prediction tasks. We then explain how these models can be combined to improve the final prediction. The overall flow can be seen in Figure 5.
4.1. Individual Selected Models
LSTM, which is a variation of a Recurrent Neural Network (RNN), consists of hidden and cell states, usually denoted by h and c, respectively. In practice, an LSTM contains multiple layers to increase the functional complexity represented by the network. Another difference between RNNs and LSTMs is the memory function, which passes information selectively across timesteps and thereby improves the accuracy of time-series prediction. In addition to using LSTM, we also implemented sliding windows (Figure 6), which helps convert our prediction problem from an unsupervised one, where we did not have any labels due to the nature of the prediction, to a supervised learning problem, where every nth data record is our label. The final input for the LSTM is constructed with a sliding window of size 10, identified after testing multiple window sizes, where every 11th data record becomes the label for the previous 10 records.
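A minimal sketch of this windowing step, assuming the cleaned flight data is already a NumPy array with one row per second and one column per selected feature:

```python
import numpy as np

def make_windows(flight: np.ndarray, window: int = 10):
    """Turn a (time, features) array into supervised pairs: each sample is
    `window` consecutive records and the label is the record that follows
    (the 11th record for a window of size 10)."""
    X, y = [], []
    for i in range(len(flight) - window):
        X.append(flight[i:i + window])   # shape (window, features)
        y.append(flight[i + window])     # the next record is the label
    return np.asarray(X), np.asarray(y)  # shapes (x, s, y) and (x, y)
```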
Figure 5. Flowchart of the evaluation of the accuracy, precision, and recall of the prediction model.
Figure 6. Overlapping sliding windows for sequence prediction.
The final data shape can be represented as (x, s, y), where x is the number of records, s is the size of the sliding window, and y represents the selected features for trajectory prediction. Subsequently, the data was divided into two partitions, namely, the training set and the validation set. In order to ensure that our model was exposed to the entire range of data, we divided our dataset into a training set and a testing set instead of dividing each individual flight into separate sets for training, testing, and validation. Additionally, to further diversify our model’s exposure, we selected a random sequence from each flight to use as the validation set, helping our model to process a full range of inputs across multiple iterations.
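One way to realize this per-flight hold-out is sketched below. The validation length and the fixed random seed are assumptions, since the paper does not specify them.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility (assumption)

def split_flight(X: np.ndarray, y: np.ndarray, val_len: int = 600):
    """Hold out one randomly chosen contiguous sequence of `val_len` windows
    from a single flight for validation; the rest is used for training."""
    start = int(rng.integers(0, max(1, len(X) - val_len)))
    val_idx = np.arange(start, min(start + val_len, len(X)))
    train_idx = np.setdiff1d(np.arange(len(X)), val_idx)
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])
```

Applying this to every flight and concatenating the pieces yields a validation set that still spans the full range of flights.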
Random Forest (RF) has demonstrated its predictive power on numerous occasions [28] . RF, which is an ensemble method, contains multiple decision trees, which rely on the inferred decision rules to make predictions. However, decision trees face the problem of over-fitting, also known as the generalization problem. This issue can be addressed by using ensemble methods, such as Random Forest, which reduce variance through bagging. This helps overcome the risk of over-fitting by combining multiple weak learners into one strong learner. Generally, the performance of Random Forest improves with an increase in the number of decision trees, although this also increases the computation cost.
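As a sketch of the Random Forest branch (hyperparameters such as the tree count are assumptions, since they are not reported here), the windowed samples from the earlier sketches are flattened to 2-D before fitting, because scikit-learn's RandomForestRegressor expects a flat feature vector per sample:

```python
from sklearn.ensemble import RandomForestRegressor

# X_train has shape (samples, window, features); flatten each window.
# X_train/y_train and X_test come from the windowing and split sketches above.
X_rf_train = X_train.reshape(len(X_train), -1)

rfr = RandomForestRegressor(
    n_estimators=100,   # more trees generally help, at a higher compute cost
    n_jobs=-1,          # train trees in parallel
    random_state=42,
)
rfr.fit(X_rf_train, y_train)  # y_train: next-record targets (lat, lon, alt, heading, speed)
y_pred_rfr = rfr.predict(X_test.reshape(len(X_test), -1))
```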
The implementation flow for these models can be seen in Figure 7, where the initial steps (formatting the data files for readability, cleaning the data, and formatting it for the model input) are the same as described in the data preprocessing section. Next, the data is fed to the RF and LSTM models for training. The same sliding-window configuration was used in both models.
Figure 7. Workflow of LSTM and RF models with sliding window implementation for aircraft trajectory prediction.
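A hypothetical Keras implementation of the LSTM branch in this workflow is shown below; the layer sizes, optimizer, and loss are assumptions, since the paper does not list them. The input shape follows the window layout described above, i.e. 10 timesteps of 5 features.

```python
import tensorflow as tf

def build_lstm(window: int = 10, n_features: int = 5) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window, n_features)),
        tf.keras.layers.LSTM(64, return_sequences=True),  # stacked LSTM layers
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(n_features),                # predict the next record
    ])
    model.compile(optimizer="adam", loss="mae")
    return model

# X_train/y_train and X_val/y_val are the windowed splits from the earlier sketches.
lstm = build_lstm()
history = lstm.fit(X_train, y_train,
                   validation_data=(X_val, y_val),
                   epochs=50, batch_size=64)
```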
4.2. Meta-Learning: A Model Ensemble Technique
Meta-Learning is a technique in machine learning, in which a model is trained to learn how to learn. It involves combining multiple pre-trained models to achieve improved generalization ability, allowing the model to quickly adapt to new tasks. This leads to better performance on unseen data, even if the task is different from what the model has been trained on. One of the key advantages of meta-learning is transfer learning, which enables the model to transfer knowledge from one task to another, making it easier to solve new problems.
In this paper, we leverage meta-learning to improve the final prediction of the aircraft trajectory. To evaluate the effectiveness of this approach, we trained two individual machine learning models, Long Short-Term Memory (LSTM) and Random Forest Regression (RFR), on the selected dataset. Subsequently, we combined the predictions generated by these models through a k-Nearest Neighbors (k-NN) meta-model to obtain the final prediction. The complete workflow of this approach is illustrated in Figure 8. The combined output of the LSTM and RFR models serves as the input for the k-NN model, and our results show that this approach leads to a significant improvement in the final prediction compared to that of the individual models.
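A minimal sketch of this stacking step follows. The neighbour count and the use of a separate held-out split (X_meta, y_meta) to fit the meta-model are assumptions; the per-sample predictions of the two base models are concatenated and used as the feature matrix for the k-NN regressor.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Base-model predictions on a held-out meta-training split (avoids leakage).
pred_lstm = lstm.predict(X_meta)                          # shape (n, n_features)
pred_rfr = rfr.predict(X_meta.reshape(len(X_meta), -1))   # shape (n, n_features)

# Stack the two base predictions side by side as meta-features.
meta_X = np.hstack([pred_lstm, pred_rfr])

meta_model = KNeighborsRegressor(n_neighbors=5)           # k is an assumption
meta_model.fit(meta_X, y_meta)

# Final prediction on unseen test windows.
final_pred = meta_model.predict(
    np.hstack([lstm.predict(X_test),
               rfr.predict(X_test.reshape(len(X_test), -1))])
)
```

k-NN is a natural choice of meta-model here because it makes no parametric assumption about how the two base predictions should be weighted.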
5. Results
This section presents the results of our evaluation of the LSTM, Random Forest, and meta-learner models. The dataset used for this evaluation comprised over 5000 complete flights, with each flight consisting of approximately 30,000 to 60,000 data points recorded at 1-second intervals using the Garmin G1000’s flight data logging capability. To ensure the validity of the results, the dataset was randomly divided into training and testing sets with an 80:20 split, and the training set was further divided into training and validation sets with a 90:10 split for each flight. This enabled us to effectively evaluate the performance of the models on complete flights. All experiments were conducted on an 11th Gen Intel Core i7-1185G7 CPU with 16 GB of internal memory and an NVIDIA MX 450 GPU, running a 64-bit Windows Enterprise operating system.
Figure 8. Meta-learning flow diagram with pre-trained LSTM and RFR models.
We started by evaluating the performance of Long Short-Term Memory (LSTM) and Random Forest models using three accuracy indicators: Mean Absolute Error (MAE), Absolute Altitude Error (AAE), and Root Mean Squared Error (RMSE). The formulas for these metrics are presented in Formulas (1)-(3).
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \qquad (1)$$

$$\mathrm{AAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{h}_i - h_i\right| \qquad (2)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \qquad (3)$$

where n is the number of samples, $\hat{y}_i$ is the i-th predicted value, and $y_i$ is the i-th actual value at the given timestep. AAE, on the other hand, calculates the absolute altitude distance between the predicted altitude value ($\hat{h}_i$) and the actual altitude value ($h_i$) at that timestep. Lower values of MAE, AAE, and RMSE indicate more accurate predictions for a model.
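These three metrics can be computed directly from the prediction and label arrays, as in the following NumPy sketch:

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_pred - y_true)))

def aae(alt_true: np.ndarray, alt_pred: np.ndarray) -> float:
    # Absolute Altitude Error: mean absolute difference on the altitude channel only.
    return float(np.mean(np.abs(alt_pred - alt_true)))

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```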
The models were trained on 80% of the flight data and evaluated on the remaining 20%, with 10% of each flight used for validation. The progress of the LSTM model training is depicted in Figure 9. To save resources and time, we employed early stopping based on the validation loss, whereby training stops automatically if there is no improvement for 5 consecutive epochs. On average, the optimal number of epochs was found to be 21; as a result, we report the performance based only on these epochs.
Figure 9. Model loss and validation loss progression for LSTM.
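This early-stopping rule maps directly onto a standard Keras callback; a sketch is shown below (restoring the best weights is an assumption).

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="val_loss",         # watch the validation loss
    patience=5,                 # stop after 5 epochs without improvement
    restore_best_weights=True,  # assumption: keep the weights of the best epoch
)

history = lstm.fit(X_train, y_train,
                   validation_data=(X_val, y_val),
                   epochs=100, callbacks=[early_stop])
```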
A comparison of the MAE, AAE, and RMSE is presented in Table 1. The distribution of these metrics is depicted in Figure 10, which plots the histograms of the errors for the selected evaluation criteria: part (a) shows the results for the LSTM model, while part (b) shows the results for the RFR model. The histograms for the LSTM model skew towards the left, indicating that the error rate converges to zero. In the case of RFR, most of the values lie around the 0.10 error mark, with a few well below 0.02 and others extending beyond 0.10. Overall, the errors are consistent, and the histograms capture the performance of the RFR effectively. As illustrated in Figure 10, the performance of the LSTM model was significantly more consistent than that of the RFR model, and the LSTM model performed much better in terms of MAE, AAE, and RMSE.
While the LSTM and RFR models were effective in making accurate trajectory predictions, we wanted to see whether combining them could result in even better predictions. In the second phase of this work, we therefore aimed to enhance the prediction accuracy by combining the previously trained models into a meta-learner. Although individual machine learning models have shown great potential in predicting aircraft trajectories, they still face limitations when it comes to handling the variability in flight patterns. Hence, the idea behind this approach is to leverage the strengths of multiple models to overcome their individual weaknesses and produce a more accurate and robust prediction. We implemented a meta-learner, a higher-level model that learns how to combine the predictions generated by the individual models. The meta-learner was trained on the predictions generated by the LSTM and RFR models using a k-Nearest Neighbors (k-NN) algorithm, which resulted in improved accuracy in the final prediction.
Table 1 compares the MAE, AAE, and RMSE for each individual model and the improvements obtained with the meta-learner approach. MAE measures the average magnitude of the errors in the predictions, AAE measures the average absolute difference between the predicted and actual altitude values, and RMSE measures the standard deviation of the errors; the lower these values, the better the performance of the model. According to Table 1, our meta-learner approach outperforms both LSTM and RFR by a significant margin. The RFR model has the highest MAE and RMSE, indicating that its predictions have a higher magnitude of errors than those of the other two models. It also has the highest AAE, indicating that its predictions present the highest absolute difference between the predicted and actual altitude values. The LSTM model performs better than the RFR model in terms of MAE, AAE, and RMSE, showing a reduction in errors and absolute differences compared to RFR. The meta-learner model performs better still in terms of MAE, AAE, and RMSE, indicating that it makes more accurate and precise predictions than the other two models.
Figure 10. MAE, MSE, RMSE distribution for (a) showing results for LSTM; (b) showing results for RFR.
Table 1. Mean performance comparison for each model (MAE, AAE and RMSE).
Comparing these models, the results showed that the meta-learner model outperformed the other two models by a significant margin. Specifically, the MAE of the meta-learner model (0.00163) was 83.29% and 86.53% lower than that of the RFR (0.09768) and LSTM (0.01238) models, respectively. The AAE of the meta-learner model (9.23 ft) was also much lower than that of the other two models, representing a 95.79% and 80.76% improvement over the RFR (218.54 ft) and LSTM (47.96 ft) models, respectively. Similarly, the RMSE of the meta-learner model (0.00292) was 70.21% and 84.09% lower than that of the RFR (0.09799) and LSTM (0.01834) models, respectively. This indicates that the meta-learner model is significantly more accurate and precise in its predictions than the individual models. The suggested approach enabled us to predict the short- to mid-term trajectory with reasonable accuracy, highlighting the effectiveness of the meta-learner model.
To evaluate the performance of both the individual models and the final model, we selected a random trajectory from the test dataset and analyzed a subsection of it. The selected trajectory represented a typical scenario in which the aircraft is in its final approach and landing phase. We compared the predicted trajectories of the LSTM and RFR models with those of the final meta-learner model to determine which method was more effective at predicting the aircraft’s path. Figure 11 shows the individual comparison of latitude, longitude, and altitude for all the models. Figure 12 illustrates the comparison between the original labels and the final predictions for latitude, longitude, and altitude separately. Finally, Figure 13 shows the same comparison in 3D space, which helps better visualize the performance of the meta-learner relative to LSTM and RFR. By analyzing the actual trajectories alongside the predicted ones, we evaluated the accuracy and precision of each model in capturing the nuances of the trajectory. Our analysis focused on metrics such as MAE, AAE, and RMSE to determine the most effective model. The visualizations in the figures clearly show that the final meta-learner model outperforms both the LSTM and RFR models in terms of trajectory prediction accuracy and precision, indicating its superiority in predicting the short- to mid-term aircraft trajectory.
6. Discussion
In this paper, we proposed a meta-learning approach to improve the accuracy and stability of trajectory prediction for general aviation aircraft. Our method involved combining multiple complementary individual machine learning models to create a higher-level learner that could leverage the strengths of each model and produce better predictions by combining their outputs. To this end, we used two machine learning models, Long Short-Term Memory (LSTM) and Random Forest Regression (RFR). We used a private dataset of over 5000 real flights collected from three different aircraft types, the Diamond DA42 NG, Piper PA-28 181 Archer II, and Cessna 172 Skyhawk, to train all the models. In this paper, we focused only on the parameters available from the aircraft cockpit (altitude, latitude, longitude, heading, and speed). We used historical aircraft data as inputs for the initial training of the individual models. The meta-learner combines the outputs of both models and learns to estimate an output based on the expected values.
Figure 11. Aircraft’s latitude, longitude and altitude comparison for LSTM, RFR and meta-learner.
Figure 12. Meta-learner comparison for latitude, altitude and longitude.
Figure 13. 3D trajectory comparison with RFR, LSTM and Meta-Model.
The final output accuracy was significantly improved, as evidenced by our results. We evaluated each individually selected model, LSTM and RFR, as well as our approach, using three evaluation criteria: Mean Absolute Error (MAE), Absolute Altitude Error (AAE), and Root Mean Squared Error (RMSE). The results showed that RFR was the least accurate model, while LSTM performed relatively better. However, our approach outperformed both, with an 83.29% improvement in MAE, an 80.76% improvement in AAE, and a 70.21% improvement in RMSE, demonstrating that combining multiple methods can result in a more accurate and stable prediction.
While our study provides valuable insights, it also has some limitations. For instance, we only combined two data-driven machine learning models. To enhance the stability of our approach, integrating other models that overcome the shortcomings of these two could be beneficial. Furthermore, incorporating physics-based prediction models such as Kalman Filters (KF) into our data-driven model may further increase the accuracy and stability of the meta-learner on unseen data. Additionally, incorporating weather data in the trajectory prediction could help the machine learning model better understand the data and generalize the predictions. Moreover, with recent advancements in technology, leveraging simulated data to train these models on appropriately large datasets could result in more robust and improved prediction quality. Further research could explore the effectiveness of these approaches and expand upon our findings.
7. Conclusion
In this paper, we proposed a meta-learning approach to predict short- to mid-term aircraft trajectories using historical real-flight data collected from multiple GA aircraft. Random Forest Regression (RFR) and Long Short-Term Memory (LSTM) are combined using k-Nearest Neighbors (k-NN) to improve prediction accuracy and to output the final prediction based on the combined output of the individual models. The proposed methodology for aircraft trajectory forecasting was discussed in detail, along with a literature review and an overview of the data preprocessing techniques. We evaluated each individually selected model, LSTM and RFR, as well as our approach, using three evaluation criteria: MAE, AAE, and RMSE. The results demonstrate that the proposed meta-learner outperforms the individual models in terms of accuracy, providing a more robust and proactive approach to improving operational safety in GA. Our findings suggest that combining multiple models can significantly improve the accuracy and stability of trajectory prediction. Future work could include incorporating other models or physics-based prediction models to improve the stability of the proposed approach. Incorporating weather data in the trajectory prediction could help the machine learning model better understand the data and generalize the predictions, while the use of simulated data on appropriately large datasets could result in more robust and improved prediction quality.
Acknowledgements
I would like to sincerely thank my supervisor, Prof. Rene Jr. Landry, who provided all the necessary resources and equipment to complete this research. I also sincerely thank Jamal H. Markani for his inspiration and support.