Flexible Stock Market Algorithm

Alexander Rubchinsky; Daria Chubarova

doi:10.4236/ti.2025.164013

Technology and Investment > Vol.16 No.4, November 2025

Flexible Stock Market Algorithm

Alexander Rubchinsky, Daria Chubarova
National Research University Higher School of Economics, Moscow, Russia.
DOI: 10.4236/ti.2025.164013 PDF HTML XML 21 Downloads 142 Views

Abstract

The article considers one of the most famous examples of socio-economic systems characterized by significant uncertainty—the S&P-500 stock market, where shares of 500 largest US companies are traded. The flexible algorithm for daily trading has been developed. It is based on known fixed data about cost of shares in previous days as well as on some previously calculated values. Each day, one of the two algorithms for tomorrow trading is selected, depending on assumptions about the cost changes of some stocks at the next day. It can be similar to the change on the previous day or the opposite of it. The validity of the assumptions made is confirmed by the experimental results of trading using the proposed algorithm for 20 years from 1995 to 2014.

Keywords

Flexible Algorithm, Stock Market, Graph Decomposition, Clustering

Share and Cite:

Rubchinsky, A. and Chubarova, D. (2025) Flexible Stock Market Algorithm. Technology and Investment, 16, 211-240. doi: 10.4236/ti.2025.164013.

1. Introduction

Stock market analysis and stock trading algorithms based on it have been of great interest to researchers for many decades. We would like only to mention some review and research publications of recent years (Mohammed Almasani et al., 2017; Aldridge, 2020; Cover, 1991; Bruni, 2017; Chan, 2021; Chen & Tsang, 2021; Clenow, 2023; Dashore et al., 2010; Donadio & Ghosh, 2019; Fazli & Jafari, 2012; Fung, 2021; Georgieva et al., 2015; Jallo et al., 2013; Jing et al., 2023; Kissell, 2014; Mayanja, Mataramvura, & Wilson, 2013; Pakpahan et al., 2019; Prado, 2018; Ruke et al., 2024; Seth, 2023; Guo, 2020; Ziemba, Lleo, & Zhitlukhin, 2018; Zou et al., 2023). The purpose of this article is not to analyze the achievements and shortcomings in this broad and important field (any reasonable review will require not only a large text amount, but also serious research work). It can only be noted that the wide variety of proposed methods, problem statements, and algorithms most likely indicates a certain dissatisfaction with both the existing formal models and the financial results obtained on their basis.

Trying to at least superficially describe the main direction of various works in this area, we can come to the following conclusion. Almost all stock market trading algorithms proposed in the literature are based on the following scheme.

1) The behavior of the stock market over a certain period of time (up to its end, including the current moment t) is analyzed, and a mathematical model (stochastic, deterministic, multicriteria, fuzzy, etc.) describing this behavior is proposed.

2) It is explicitly or implicitly assumed that the market behavior (i.e., price changes for stocks from at least some groups) in the next period will correspond to the previously constructed model.

3) The specified groups of stocks are offered for subsequent trading.

Of course, all the details mentioned may differ significantly from each other within the framework of this general scheme.

The proposed article also fits into this scheme. It relies to some extent on recent publications (Rubchinsky & Baikova, 2023). At the same time, the previously proposed approaches are noticeably developing, which has allowed us to obtain new results related to trading on the stock market. The essence of the matter lies in the new implementation of several steps within the framework of the general scheme mentioned above. Let’s focus on two of them.

А) When determining groups of stocks to trade for the next period, one of two mutually exclusive algorithms is chosen: one of them corresponds to the previously elaborated model (see steps 1 and 2 above), and the other is opposite to it (so, in any case the amount of income for each of the stocks under consideration, as well as for total income for this period, is 0). One of these algorithms is selected based on the previous calculations.

В) In many stock market trading algorithms, there is a clustering of the set of stocks under consideration based on the so-called market graph (Jallo et al., 2013). The proposed approach suggests a recently designed graph decomposition algorithm (Rubchinsky, 2018), applicable, generally speaking, to arbitrary undirected graphs. In relation to the stock market, the clusters found define groups of stocks with similar behavior. Unlike other well-known algorithms, the mentioned clusterization algorithm does not create the only “most correct” graph partition into subgraphs, but rather offers a family of such partitions, from which one can be selected based on various considerations. In particular, the method does not reject clusters with a very large ratio of the powers of the parts, when this is an objective property of the modeled systems.

Naturally, as in any other approach to trading on the stock market, there are no guarantees of winning, however, experimental results over a 20-year period (1995-2014) demonstrate an almost constant increase of accumulated income. One of the graphs is shown in Figure 1. Note that incomes grow fastest just in the quarters preceding the great crises of 2001 and 2008. This topic is discussed and explained in detail when analyzing experimental results.

The work is structured as follows. Section 2 provides a brief description of the S&P-500 stock market and provides a general flowchart of the quarterly trading algorithm (QTA) over a fixed period of time (in this case, one quarter). Its blocks are described in detail in Section 3. In Section 4, experimental results of the algorithms described in Section 3 are presented and the necessary comments are provided. In conclusion, possible modifications of the proposed approach and its basic algorithms are indicated, as well as some areas of further research and the main conclusions are given.

Figure 1. Typical graph of accumulated income.

For ease of reading, the following Table 1 provides a list of the main notations and definitions used in the article. Of course, they are further explained as they appear.

Table 1. Parameters and designations of the proposed general scheme.

#	Name	Value	Explanation
0	QTA		Quarterly trading algorithm
1	p	58 - 64	Number of working days per quarter
2	t	0 − (p − 1)	The index of the working day since the beginning of the quarter
3	l	15	Viewing depth
4	S		Current value of accumulated income
5	sg	{0, 1}	Type of algorithm
6	V(t)		Technical value at day t of the quarter
7	S(t)		Current income at day t of the quarter
8	K	5	Number of parallel runs
9	D = (d_ij)		Distance matrix
10	G		Market graph
11	r	12	Number of market graph clusters
12	T	500	Number of constructed dichotomies of one graph
13	P		Currently processed graph
14	g	5	Maximum initial load on graph edges
15	f_j	>0	Load in j-th edge of the graph
16	F_max	>0	Maximum current load on graph edges
17	F_p	>0	Maximum edge load along the path p
18	S(X)	≥0	Average value of the distance between vertices in the set X

2. The S&P 500 Stock Market and the Structure of the Quarterly Trading Algorithm

Let’s start with the concepts and definitions necessary for a formal description of the stock market. The paper considers a fragment of the US stock market—S&P-500 (500 largest companies in the USA). Daily data on stock prices at the close of trading was used for 20 years and one month—from 12/01/1994 to 12/31/2014. Individual quarters were considered as the main periods. The number of working days per quarter is in the range of 58 - 64. The composition of the shares has changed over the years, but these changes were quite rare (no more than 10 shares left the market per quarter and the same number of new ones appeared, but in most cases, it was 1 - 4 companies). In addition, all constructions in this section belong to one quarter (there are 80 quarters all in all).

Let t be the fixed working day of a given quarter (t = 0, 1, 2, ..., p − 1). It seems natural to assume that all stocks traded on the S&P-500 market before day t, as well as their closing prices, are known starting from day x, on l business days preceding day t (including the first business day 0). Therefore, at the beginning of the quarter, the days of the previous quarter are included in the number l of the preceding days. The number l is 15. The choice of the number l, as well as all other parameters used in the proposed general algorithm, will be discussed below, in subsection 4.3. It is equally natural to assume that stock prices on day t and all subsequent days are unknown.

In many publications on the stock market, it was assumed that the change in stock prices on the stock market is described by one or another random process (known, partially known, or completely unknown). However, these processes can be considered as probabilistic with a with great caution. In the proposed trading algorithm, no probabilistic as well as any other assumptions are made about previous stock prices. All of them are considered simply as fixed numerical data that is fully known at the end of the next working day.

The enlarged flowchart in Figure 2 shows the QTA. The output of the QTA is the income S received in this quarter (it can be negative).

Figure 2. The enlarged flowchart of the QTA.

Before proceeding to the description of the blocks of the presented flowchart, let’s take a closer look at the time sequence of operations performed over two business days. This sequence is shown in Table 2.

Table 2. Time flowchart of the trading algorithm.

Trading ends on day t − 1
	Initialization
	Data preparation
	Clusterization
	Highlighting promising stocks
	Forward and backward trading algorithm
Trading starts on day t
	The algorithm of real trading
	Recalculation of accumulated results
	Moving on the next day
Trading ends on day t

The importance of this table lies in the fact that it is the actual sequence of operations performed that is associated with the uncertainty of the stock market. It is not possible to find out the income from operations on day t until all previous operations are completed. And when performing all these operations, the value of the shares at the end of trading is unknown and, therefore, the future gain or loss on that day is unknown. This is discussed in more detail in subsection 3.5. Initialization is performed only on the first business day of the quarter (it is convenient to take its number as 0).

3. Description of the Algorithms of the QTA Flowchart

Further, the basic algorithms of the QTA flowchart in Figure 2 are described in subsections 3.1 - 3.8. We will describe them in the same order in which they are executed in the specified flowchart. Let’s pay attention to the fact that all the current results of all blocks of the algorithm are saved and can be used in further operations.

3.1. Initialization

The initial day number t = 0 and the integer parameter sg = 1 are set. Two accumulative scalar quantities V(t) and S(t) are determined. Both at t = 0 are initialized with zeros. The number of parallel runs that are performed with the same initial data using the same algorithm is indicated by K. Since the algorithm uses a standard random seed, the results are different. The next three blocks 1 - 3 are executed in parallel for each index i = 1, ..., K. In the article K = 5.

3.2. Data Preparation

The input for block 1 is the prices of all stocks available on the market (at the time of the termination of trading day before the business day with the number t) for the preceding l business days (see Figure 2 and Table 2).

The output of this block is:

а) The matrix D of the distances between stock prices for l business days (the number 1 − r_ij is taken as the distance, where r_ij is the correlation coefficient between the i-th and j-th stock prices for the last l days). Thus, for correlated stocks, the distance is close to 0, and for strongly dissimilar stocks, it is close to 2.

b) A graph G constructed using this matrix, the vertices of which correspond to the considered set of stocks, and each vertex is connected to the 4 nearest ones by a specified distance. Note that this definition does not imply that the degree of each vertex is 4, but only that it is at least 4.

Here, just for clarity, is a well-known algorithm for constructing this graph, usually called the neighborhood graph (in relation to the stock market, this graph is called the market graph).

Algorithm for constructing a neighborhood graph

The input of this algorithm is a distance matrix D of size n × n (n denotes the number of objects in the system under consideration—in this case, the number of shares traded on the day t − 1 and on the previous 14 days).

Step 1. Let’s define an integer matrix A of size n × n and let all the elements a_ij be equal to 0.

Step 2. For each i = 1, ..., n, we perform the following operations.

2.1) Let us determine the indices i₁, i₂, i₃, i₄, k₁ ..., k_s from the distance matrix D, such that

$d_{i i_{1}} \leq d_{i i_{2}} \leq d_{i i_{3}} \leq d_{i i_{4}} = d_{i k_{1}} = \dots = d_{i k_{s}}$ , (1)

and all other elements of the i-th row of the matrix D are strictly larger than those written out in (1). Thus, objects i₁, i₂, i₃, i₄, k₁ ..., k_s are the closest to object i. In this case, objects with numbers k₁, ..., k_s are located at the same distance from object i. Note that the very presence of objects k₁, ..., k_s occupying places after the fourth in the chain (1) is not guaranteed: objects satisfying the required inequalities may simply not exist. At the same time, the existence of objects i₁, i₂, i₃, i₄ occupying the first four places in the chain (1) is guaranteed (in case n > 4).

2.2) For all j = i₁, i₂, i₃, i₄, k₁ …, k_s let a_ij = 1 and a_ji = 1.

The output of the algorithm is the adjacency matrix of the neighborhood graph. It can be clearly seen that the constructed graph does not depend on specific numbering satisfying the above conditions.

3.3. Clusterization

In this subsection we describe the mentioned in section 1 the new graph decomposition algorithm. Its inputs are a graph G and a matrix D (see subsection 3.2). The set of stocks is divided into r clusters (r = 12). The following provides a detailed description of the clustering algorithm. It was

1) Initialization. Definition of the required parameters. As stated above, the number of clusters is set to 12. The initial clustering consists of a single element—the original graph G. The number of subgraphs is assumed to be 1, and the number of the last subgraph in the clustering under construction is also 1.

2) Selection of the next subgraph from the already constructed family. The subgraph with the maximum number of vertices is selected (arbitrary, if there are several of them).

3) Construction of a family of dichotomies for the subgraph chosen in Step 2. This procedure is described in detail in Subsection 3.3.1.

4) Selection of a dichotomy from the family obtained in Step 3. The selection procedure is discussed in Subsection 3.3.2.

5) Construction of a new clustering. The subgraph selected in Step 2 is replaced by the subgraph from the dichotomy chosen in Step 4 that contains a greater number of vertices. It retains the same index as the replaced subgraph. Additionally, the other subgraph from the dichotomy (identified in Step 4) is incorporated into the clustering and assigned the last index n + 1, where n is the number of subgraphs in the previous clustering.

6) Updating the number of subgraphs. The index is updated as n = n + 1, ensuring that n now is equal to the number of subgraphs in the newly formed clustering.

7) Verification of the inequality n < r. If the condition holds, the algorithm returns to Step 2. Otherwise, the desired clustering is completed. The initial graph G is now partitioned into r disjoint subgraphs, and the algorithm stops.

The output of the presented algorithm consists of three sets:

a) A set of clusters (subsets of the vertex set of the original graph);

b) A set of numerical values representing the density of each constructed cluster;

c) A set of numerical values representing the cardinality of each constructed cluster.

3.3.1. Algorithm for Constructing a Family of Dichotomies (ACFD)

This subsection presents one of the fundamental algorithms used in the proposed general trading algorithm. The ACFD constructs a family of dichotomies F. It consists of an initialization stage and a main stage, which involves T-fold sequential repetition of a general step. The parameter T is one of the predefined parameters of the ACFD, and in this case, T = 500. Below, we proceed with its formal description.

A) Initialization. Each edge e_j of a given graph P is assigned a randomly generated integer f_j in the range from 1 to g (inclusive), sampled using a standard generator of uniformly distributed random variables (which is used here for the first time). The parameter g is also predefined within the ACFD, and in this study, it is set to g = 5. The value f_j is referred to as the load on the corresponding edge (or edge load). The maximum of the randomly assigned values f_j is denoted as F_max. The output of the initialization stage is a set of load values assigned to all edges of the given graph P. It is important to note that P is one of the subgraphs of the original graph G. During the first execution of the ACFD, the graph P coincides with the graph G.

B) Main Stage. The inputs to the main stage include:

a) The given graph P;

b) The current set of load values assigned to all edges of graph P;

c) The current maximum value F_max.

The output of the main stage is described below, immediately following the description of its steps. The flowchart of the main stage is presented in Figure 3. The steps of the algorithm at this stage are detailed below.

Steps of the ACFD

1) Using a standard generator of uniformly distributed random variables, two distinct vertices of the graph are selected (this is the second and the last instance of random numbers utilization in the general trading algorithm).

2) Dijkstra algorithm is applied to determine the shortest path connecting the two selected vertices. The length of an edge is defined as its current load. The path length is determined not by summing the edge lengths but by taking the maximum edge length along the path. It is well known that Dijkstra algorithm can be applied in such cases with a single modification: when determining the extended path, instead of summing the length of the initial segment and the newly added edge, the maximum of these two values is recorded. It is important to emphasize that the length of the initial segment is not the sum of its edge loads but the maximum load among those edges.

3) The maximum edge load F_p along the shortest path p found in Step 2 is determined.

4) If F_p < F_max, proceed to Step 5. Otherwise (i.e., if F_p ≥ F_max), proceed to Step 6.

5) The edge loads along the path identified in Step 2 are incremented by 1. Then, return to Step 1.

6) The connected components of the given graph P are determined as if all edges with the maximum load were removed. Note that, in reality, no edges are actually removed from the graph, and P remains unchanged.

7.1) The connected component with the maximum number of vertices is designated as the first subgraph of the constructed dichotomy. The union of all remaining components (or the single remaining component, if only one exists) is designated as the second subgraph of the constructed dichotomy.

7.2) When constructing both components, all edges of the original graph are preserved except those connecting vertices from the first component to vertices of the second component. It is evident that this procedure effectively removes a certain minimal cut of the given graph. Consequently, the next dichotomy of graph P is formed. It is also clear that if the original graph was connected, both constructed subgraphs remain connected, too.

8) All edges of the most recently identified path in the original graph are incremented by 1. As a result, at least one of these edges will have a load equal to F_max + 1. Assign F_max = F_max + 1.

The current execution of the main stage is now complete. Its outputs are:

a) The modified set of current edge loads in graph P;

b) The updated maximum edge load F_max in graph P;

c) An additional dichotomy of the original graph.

Figure 3. ACFD flowchart. 1. Choose two vertices. 2. Find path. 3. Determine F_p. 4. Check inequality F_p < F_max. 5. Modify loads. 6. Find connectivity components. 7. Find new dichotomy. 8. Modify loads and F_max.

Illustration of Steps 7.1, 7.2, and 8 of the main stage is presented in Figure 3. Below, we provide some explanations and comments on these steps. Prior to executing the comparison of loads in Step 4, three distinct cases—denoted as cases A, B, and C—are considered. In Figure 4, bold segments represent edges with the maximum load, while thin lines indicate paths connecting the vertex pair a and b.

In case A, the set of all edges with the maximum load does not constitute a cut of the given graph. Consequently, the shortest path found in Step 2 does not contain edges with the maximum load due to the minimax definition of path length. Therefore, the maximum load F_p determined in Step 3 is less than F_max, and the algorithm proceeds to Step 5, where the loads of all edges in the identified path are increased by 1. Subsequently, execution returns to Step 1 of the general stage. This consideration is central to the proposed algorithm. Indeed, if the maximum load F_p in the edges of the identified path were equal to F_max, this would imply that the set of edges with load F_max forms a cut of the graph, such that the constructed path tra-verses an edge of this cut. If these edges did not form a cut, the minimax Dijkstra algorithm would have found a path in which all edges have a load strictly less than F_max.

Case A Case B

Case C

Figure 4. Cuts and paths on the graph.

In case B, the set of all edges with the maximum load does contain a cut of the original graph. However, the identified path does not include edges with the maximum load because its endpoints are located on the same side of the cut. The process thus continues analogously to case A.

In case C, the set of all edges with the maximum load constitutes a cut of the original graph, and the endpoints of the identified path are positioned on opposite sides of this cut. Consequently, at least one edge along this path belongs to the specified cut and has a load equal to F_max. Therefore, after the comparison in Step 4, the process follows a different trajectory (through Steps 6 - 8), resulting in the construction of the next dichotomy of the original graph at Step 7.

Note that all the considerations in this subsection can be applied to arbitrary graphs, not just to graphs modeling systems of a specific type. Once again, we emphasize that all edge removals are virtual, and the original graph P remains unchanged throughout all T executions of the main stage.

3.3.2. Selection of a Dichotomy

Let us introduce the necessary formal concepts. Let X be a certain set of vertices of the considered graph P, and let n denote the number of vertices in P. Define D = (d_ij) as the distance matrix between the vertices of P (i, j = 1, …, n). We assume that d_ij ≥ 0, d_ij = d_ji и d_ii = 0, however, the triangle inequality is not required. The elements of matrix D are interpreted as the degree of dissimilarity between the objects corresponding to vertices i and j of graph P. Let S(X) denote the average distance between the vertices within the set X. Formally,

$S (X) = \frac{1}{0.5 \times p \times (p + 1)} \sum d_{i j}$ , (2)

where summation is taken over all unordered pairs of distinct vertices from the set X, and p = |X|. Additionally, we define S(X) = 0 for single-element sets X.

It is conceptually evident that the function S(X) characterizes the density of objects within the set X and their mutual proximity. A lower value of S(X) indicates a higher degree of similarity among the vertices in X, and, consequently, among the objects they represent within the system modeled by graph P.

Now, we can formulate the criterion for selecting a single dichotomy from the constructed set of dichotomies. Each dichotomy consists of two subgraphs with vertex sets X₁ and X₂. Let us consider the densities S(X₁) and S(X₂). As the quality criterion of a dichotomy (X₁, X₂), we take the larger (i.e., the worst-case) value among the two numbers—S(X₁) and S(X₂). Accordingly, the optimal dichotomy among all available dichotomies is the one that minimizes this criterion.

Let us draw attention to the fact that the assumption of setting distance matrix is used only at a single stage of the ACFD process—the stage of selecting one dichotomy from the constructed set. This naturally raises the question of what to do if the distance matrix is not provided. Essentially, some reasonable characteristic of graph density is required. Such a characteristic can be taken as the ratio of the number of edges in the graph to their maximum possible number, 0.5 × (n − 1) × n, and then applying a minimax approach—i.e., selecting the worst (minimum) density value from the two subgraphs forming the dichotomy and then choosing the dichotomy for which this worst value is maximized. Finally, one can select the minimum number of vertices across the two subgraphs and then choose the dichotomy for which this value is maximized. This method expresses the desire to divide the graph into two parts as “equally” as possible.

In relation to the financial market considered in this paper, a well-known approach is naturally applicable: calculating the correlation coefficients r_ij between the price time series of two stocks i and j and determining the distance between them using formula (1). Thus, the output of block 2 (clustering) in the flowchart of the QTA (Figure 2) consists of:

а) a set of clusters;

b) the mean pairwise distances between elements within each cluster;

с) the number of elements in each cluster.

It should be emphasized that we are dealing with K realizations of these sets. All of them collectively serve as input to Block 3, where the algorithm determines the best cluster within each realization.

3.4. Highlighting Promising Sets of Shares

The following is done in each of the K cluster groups.

1) All clusters with fewer than 20 or more than 200 elements (shares) or an average distance greater than 0.6 (with a possible maximum of 2) are deleted.

2) The remaining clusters in the group are compared according to two criteria: the number of elements (maximization) and the average distance (minimization). The optimal cluster is selected using the ideal point method.

Therefore, there are K sets of shares that are considered promising (see subsection 3.5 below). If there are simply no clusters satisfying the conditions of item 1 in the i-th group, then the i-th set of promising stocks is declared empty. This fact is taken into account in blocks 5 and 6 when calculating the one-day gain or loss. Such situations actually occur in many neighborhoods. The situation when for some days t all promising sets turn out to be empty is not excluded.

Note that the same stocks may be included in different prospective sets.

3.5. Choosing a trading algorithm

This subsection is mandatory before moving on to forward trading (see Table 2). Before discussing the operations of block 4, which is one of the key QTA blocks, it is necessary to introduce the relevant concepts.

3.5.1. Forward and Backward Algorithms of Daily Trading

The main idea in developping various stock market trading algorithms was to assume that the processes were repeatable for at least one day. A certain set of stocks with fairly similar behavior is determined for several days preceding day t (which can be called a set of promising stocks) and it is assumed that the prices of promising stocks will change monotonously. Formally, this means that one of three situations is possible for each promising stock:

А: c(t − 2) > с(t − 1),

B: с(t − 2) < с(t − 1),

C: с(t − 2) = с(t − 1),

and at the same time, the direction of price change in cases A and B will remain the same, i.e. in case A it turns out that

c(t − 2) > с(t − 1) > с(t), (3a)

and in case B, it turns out that

с(t − 2) < с(t − 1) < с(t), (3b)

where с(t) denotes the stock price at market closing on day t.

It is not assumed that the second inequalities in (3a) and (3b) will be fulfilled for all promising stocks, but it is assumed that they will be fulfilled for most of them.

Without discussing the various methods of determining promising stocks, let’s pay special attention to the fact that the values of c(t − 2) and c(t − 1) at the close of trading on day t − 1 are known, but the price of c(t) is naturally not. This is the uncertainty of the stock market.

The trading algorithm, which can be called forward, is connected with the simple reasoning carried out. Each stock from group A is sold at the opening of the market on day t and is bought at the closing of the market on day t at the price c(t), which will be known on day t. Each stock from the group B is bought at the market opening on day t and sold at the market closing on day t at the price c(t), which will be known on day t. For stocks from group C (which are very few or none at all) nothing is being done.

Indeed, in most cases, a forward algorithm generates revenue. However, not always. Let’s look at these situations in more detail. Let inequalities (3a) and (3b) be replaced by inequalities (4a) and (4b):

c(t − 2) > с(t − 1) < с(t), (4a)

с(t − 2) < с(t − 1) > с(t). (4b)

A trading algorithm can be associated with this situation, which is naturally called a backward one. Each stock from group A is bought at the opening of the market on day t and sold at the closing of the market on day t at the price c(t), which will be known on day t. Each stock from group B is sold at the market opening on day t and bought at the market closing on day t at the price c(t), which will be known on day t. A simple fact immediately follows from formulas (3) and (4):

Statement 1. Under any circumstances, the total gain from the forward and backward algorithms is always zero.

At first glance, inequalities (4) might seem relatively rare. However, this is not the case. In complex stock market conditions, the price evolution process becomes chaotic. Drawing on a hydraulic analogy, one could say that inequalities (3) correspond to waves, while inequalities (4) give rise to so-called “dead swell,” which is more dangerous for ships than even large waves. Negative outcomes when applying the forward algorithm occur quite frequently; moreover, the absolute magnitude of losses in these instances far exceeds the gains observed during more regular intervals. All these effects will be demonstrated using real data from the S&P-500 stock market in subsection 4.3.

Of course, we do not know in advance which scenario will unfold on any given day. Nevertheless, it is possible to propose a certain heuristic, whose sole justification is the consistently positive results that it has produced over the 20-year period described in Section 2. Experimental findings are provided in section 4.

Before proceeding further, we will illustrate the application of both algorithms in simple model scenarios.

Example 1. Demonstration of gains and losses under the forward and inverse algorithms.

Case 1. Let the price of a certain stock be c(t − 2) = 5, c(t − 1) = 3, c(t) = 2.

Forward algorithm. The stock is sold on the morning of day t at the previous evening price of 3 and bought at the current evenings price of 2. The stock remains in the same hands, yielding a profit of +1.

Backward algorithm. The stock is bought on the morning of day t at the previous evening price of 3 and sold at the current evening price of 2. The profit in this case is −1.

Case 2. Let the price of a certain stock be c(t − 2) = 3, c(t − 1) = 4, c(t) = 6.

Forward algorithm. The stock is bought on the morning of day t at the previous evening price of 4 and sold at the current evening price of 6. The profit is +2.

Backward algorithm. The stock is sold on the morning of day t at the previous evening price of 4 and then bought at the current evening price of 6. The stock remains in the same hands, resulting in a profit of −2.

Case 3. Let the price of a certain stock be c(t − 2) = 5, c(t − 1) = 2, c(t) = 3.

Forward algorithm. The stock is sold on the morning of day t at the previous evening price of 2 and bought at the current evening price of 3. The stock remains in the same hands, and the profit is −1.

Backward algorithm. The stock is bought on the morning of day t at the previous evening price of 2 and sold at the current evening price of 3. The profit in this scenario is +1.

Case 4. Let the price of a certain stock be c(t − 2) = 2, c(t − 1) = 5, c(t) = 3.

Forward algorithm. The stock is bought on the morning of day t at the previous evening price of 5 and sold at the current evening’s price of 3. The profit is −2.

Backward algorithm. The stock is sold on the morning of day t at the previous evening price of 5 and then bought at the current evenings price of 3. The stock remains in the same hands, resulting in a profit of +2.

3.5.2. Description of the Proposed Heuristic (Block 4)

Recall that by the start of day t, the accumulated quantities V(t) and S(t) (see subsection 3.1) are known for t − 1, t − 2, …, 0 (these are calculated in block 6 immediately after the completion of each day operations). Consider the four most recent known values V(t − 1), V(t − 2), V(t − 3), V(t − 4) and check the following four inequalities:

V(t − 1) < −100, V(t − 2) < −100, V(t − 3) < −100, V(t − 4) < −100. (5)

If all these inequalities are satisfied, then from day t until the end of the quarter, the backward algorithm will be used to determine the trading algorithm, and conditions (5) will no longer be checked. Otherwise, on day t, the forward algorithm will be executed. Subsequently, at each step until the end of the quarter, a similar check will be performed under the same outputs. Note that the same value sg is used for all promising sets available on a given day.

Formally, this procedure is expressed by modifying the quantity sg defined in block 0. If conditions (5) are satisfied, sg is set to −1 for the remainder of the quarter; otherwise, sg = 1. The value of sg is the output of block 4 of the algorithm under consideration. In determining sg, the values of V(t) previously obtained in Block 6 are used.

The rationale behind the proposed heuristic is that if four consecutive negative values arise under the forward algorithm, it may be assumed that subsequent negative returns will predominantly decrease. By virtue of Statement 1 in subsection 3.5.1, this implies that returns from the backward algorithm will increase accordingly. Examples using real data will be presented in Section 4.

3.6. The Real Trading Algorithm

The inputs to this algorithm, whose flowchart is shown in Figure 2, are K sets of promising stocks identified in block 3, along with the value of sg obtained in Block 4. For each non-empty set of promising stocks, the operations described at the beginning of subsection 3.5 are performed:

When sg = 1, each stock from group A is sold at the market opening on day t and repurchased at the market close on day t at the price c(t), which is determined on day t. Each stock from group B is purchased at the market opening on day t and sold at the market closing on day t at the price c(t), which is determined on day t.

When sg = −1, each stock from group A is purchased at the market opening on day t and sold at the market close on day t at the price c(t). Each stock from group B is sold at the market opening on day t and repurchased at the market close on day t at the price c(t). The proposed algorithm in section 1 was called flexible (a well-known term in numerous situations not related to stock markets).

In the proposed algorithm, the sale and purchase of shares are separated by the time between the opening and closing of trading (either all shares are sold or all shares are bought). It can be assumed that stock prices are quite close to those officially announced at the close of trading (prices at the opening of trading are discussed in paragraph 4 in the conclusion).

The output of block 5 is the collection of K stock sets together with the respective purchase and sale prices for each stock at the opening and closing of trading on day t.

3.7. Recalculation of Accumulated Gains/Losses

In block 6, the first step is a straight-forward operation that sums the gains/losses realized from actual trading on day t (see block 5). For empty sets of prospective stocks, no calculation is performed. The gains/losses from each stock are counted as many times as that stock appeared in the identified non-empty promising sets. Denote the resulting sum by U(t). Let us return to the quantities V(τ) and S(τ) introduced in block 1. These have already been determined for all τ < t (recall that for τ = 0, they are initialized to zero in block 0).

We set:

V(t) = V(t − 1) + U(t), S(t) = S(t − 1) + U(t), if sg = 1, (6a)

V(t) = V(t − 1) − U(t), S(t) = S(t − 1) + U(t), if sg = −1. (6b)

An important clarification is in order. The values S(t) represent the actual profits/losses obtained using the proposed QTA. In contrast, the values V(t) are purely virtual. They correspond to the profits/losses that would have been obtained if only the forward trading algorithm been used. Note that the signs for the new values of S(t) coincide. This arises from the fact that the gains U(t) themselves are calculated using different formulas.

The daily income is the difference between the values of the specified shares at the opening and closing of trading on a given day (in dollars). Further use of income (including as an input of any investments) within the framework of the proposed model is not considered.

3.8. End of the Period

In block 7, which differs from the other blocks in that it is a logic block, the system either proceeds to the next day t + 1 or terminates the QTA. In the latter case, the real profit S(t) accumulated over the quarter constitutes the output of the QTA.

4. Experimental Results

4.1. General Trading Scheme

The QTA is applied sequentially to each of the 80 quarters. As noted earlier, for each quarter and each trading day, only the stock price data for preceding days and certain already computed quantities (see formulas (5) and (6)) are used. The trading results for each quarter, for each year, and the principal outcome—the sequence of accumulated annual returns—are collectively referred to as a variant. These variants are denoted by letters a, b, c, d, …. Recall that all variants were computed using the same initial data (i.e., daily closing stock prices) and the same software. The difference in the results is determined solely by multiple use of a standard random number generator when selecting vertex pairs as input to Dijkstra algorithm (see subsection 3.3.1).

4.2. Parameter Selection

It should be noted that the developed algorithms make use of numerous parameters, as mentioned in the description of the QTA. Their selection is justified by little more than common sense. This is because the present work makes no attempt to simulate the processes of price formation or trading on the stock market. Instead, all the algorithms analyze already known, fixed real-world data from previous days. The only justification for choosing one set of parameters over another is the potential for obtaining positive outcomes. Some experimental results—subject to natural qualifications—can be regarded as favorable. These are presented below. An attempt to determine the optimal values of these parameters in conditions of a very high level of uncertainty in the stock market is unlikely to lead to any reliable results.

4.3. Examples of the Flexible Algorithm in Operations

Below we present detailed experimental data for variant a. First, let us consider in which quarters the flexible algorithm was actually applied, i.e., in which quarters inequalities (5) were satisfied. The use of the backward algorithm can either increase or decrease the profit for a given quarter compared to the forward algorithm.

The quarters that produced negative or positive outcomes (referring solely to the sign of the difference between the results of the backward and forward algorithms) are listed in Table 3.

Table 3. Use of the flexible algorithm.

Sign	Number of quarters	List
−	15	1995_2, 1997_1, 1998_1, 1998_3, 1998_4, 2001_2, 2002_2, 2003_1, 2003_4, 2004_1, 2006_4, 2010_1, 2011_2, 2011_3, 2012_2
+	33	1998_1, 1999_4, 2000_2, 2000_4, 2002_4, 2003_2, 2003_3, 2004_3, 2004_4, 2005_2, 2005_3, 2005_4, 2013_1, 2007_1, 2007_2, 2007_3, 2007_4, 2008_1, 2008_2, 2008_3, 2009_1, 2009_2, 2009_4, 2010_2, 2010_3, 2010_4, 2011_1, 2011_4, 2013_1, 2013_2, 2013_4, 2014_1, 2014_3

Note that in the remaining 32 quarters (out of 80), the backward algorithm was not used.

We now provide examples of value accumulation in selected quarters from the positive group. It is worth noting the various sign combinations of the accumulated sum (apart from the case of positive virtual accumulation and negative actual accumulation). By construction, such a scenario can only occur in quarters belonging to the negative group. The days on which a transition from the forward algorithm to the backward algorithm took place are underlined in Table 4.

Table 4. Examples of switching between algorithms.

2003_3				2007_1
0	0	0	0	0	0	0	0
1	55.39	55.40	55.40	1	19.84	19.84	19.84
2	−72.77	−17.38	−17.38	2	3.30	23.14	23.14
3	−200.98	−218.36	−218.36	3	−3.34	19.80	19.80
4	69.23	−149.13	−149.13	4	22.72	42.52	42.52
5	−14.06	−163.19	−163.19	5	93.32	135.84	135.84
6	81.46	−81.72	−81.72	6	40.27	176.11	176.11
7	−47.68	−129.41	−129.41	7	1.72	177.83	177.83
8	57.02	−72.39	−72.39	8	−24.80	153.03	153.03
9	−37.41	−109.80	−109.80	9	4.60	157.63	157.63
10	40.58	−69.22	−69.22	10	35.10	192.73	192.73
11	55.60	−13.61	−13.61	11	−28.44	164.29	164.29
12	−103.93	−117.54	−117.54	12	−48.89	115.40	115.40
13	−81.01	−198.55	−198.55	13	−66.46	48.94	48.94
14	−55.10	−253.66	−253.66	14	−45.60	3.34	3.34
15	−51.38	−305.05	−305.05	15	−178.24	−174.90	−174.90
16	−29.80	−334.85	−275.24	16	5.63	−169.28	−169.28
17	−49.23	−384.08	−226.01	17	−0.62	−169.89	−169.89
18	3.79	−380.30	−229.80	18	1.14	−168.75	−168.75
19	4.14	−376.15	−233.94	19	46.11	−122.64	−214.86
20	1.57	−374.59	−235.51	20	47.90	−74.73	−262.76
21	0.28	−374.31	−235.79	21	30.53	−44.21	−293.29
22	−22.38	−396.68	−213.41	22	−15.34	−59.55	−277.95
23	−2.04	−398.72	−211.38	23	0.27	−59.28	−278.22
24	3.00	−395.72	−214.38	24	60.50	1.22	−338.71
25	−0.39	−396.11	−213.99	25	−29.20	−27.98	−309.51
26	−8.55	−404.65	−205.44	26	12.21	−15.77	−321.73
27	18.51	−386.15	−223.95	27	24.06	8.29	−345.78
28	−1.76	−387.91	−222.19	28	−29.72	−21.44	−316.06
29	19.66	−368.25	−241.84	29	110.33	88.89	−426.39
30	−24.27	−392.52	−217.58	30	31.46	120.36	−457.85
31	−4.28	−396.80	−213.29	31	−20.32	100.04	−437.54
32	−18.84	−415.64	−194.45	32	25.94	125.98	−463.48
33	−13.48	−429.12	−180.97	33	−112.26	13.72	−351.21
34	44.68	−384.44	−225.65	34	3.19	16.90	−354.40
35	4.38	−380.06	−230.03	35	25.24	42.15	−379.64
36	12.14	−367.93	−242.17	36	31.44	73.58	−411.08
37	−76.43	−444.36	−165.74	37	106.36	179.94	−517.44
38	34.29	−410.07	−200.03	38	−80.72	99.22	−436.72
39	0.06	−410.01	−200.09	39	−44.77	54.45	−391.95
40	−1.64	−411.65	−198.44	40	115.45	169.90	−507.40
41	39.54	−372.11	−237.98	41	305.51	475.41	−812.91
42	57.71	−314.41	−295.69	42	−419.52	55.89	−393.39
43	90.75	−223.66	−386.44	43	−111.83	−55.93	−281.56
44	24.23	−199.43	−410.67	45	25.45	−106.74	−230.75
45	−27.08	−226.50	−383.59	46	0.00	−106.74	−230.75
46	39.65	−186.86	−423.24	47	−211.47	−318.22	−19.28
47	−58.98	−245.84	−364.25	48	−166.62	−484.84	147.34
48	−69.64	−315.48	−294.62	49	79.61	−405.23	67.73
49	119.56	−195.91	−414.18	50	−96.14	−501.37	163.87
50	−63.61	−259.52	−350.57	51	−94.26	−595.63	258.13
51	13.02	−246.50	−363.60	52	59.04	−536.59	199.09
52	−0.38	−246.88	−363.21	53	147.68	−388.91	51.41
53	−74.14	−321.03	−289.07	54	−25.28	−414.19	76.69
54	−14.06	−335.09	−275.00	55	0.39	−413.80	76.30
55	−59.55	−394.64	−215.45	56	6.11	−407.69	70.19
56	−1.46	−396.11	−213.99	57	31.74	−375.95	38.45
57	63.70	−332.40	−277.69	58	134.71	−241.24	−96.26
58	−54.26	−386.66	−223.44	59	−61.34	−302.58	−34.92
59	−97.76	−484.41	−125.68	60	−67.15	−369.73	32.23
60	55.80	−428.61	−181.48
61	82.44	−346.18	−263.92
62	−67.65	−413.82	−196.27
63	−70.00	−483.82	−126.27
2008_3				2011_4
0	0	0	0	0	0	0	0
1	73.17	73.17	73.17	1	−824.32	−824.32	−824.32
2	−41.09	32.08	32.08	2	247.09	−577.23	−577.23
3	5.86	37.95	37.95	3	361.88	−215.35	−215.35
4	0.00	37.95	37.95	4	−339.87	−555.22	−555.22
5	−729.20	−691.25	−691.25	5	−427.53	−982.75	−27.69
6	−132.32	−823.58	−823.58	6	2.53	−980.22	−30.22
7	−24.97	−848.54	−848.54	7	−6.10	−986.32	−24.12
8	229.02	−619.52	−619.52	8	−33.19	−1019.50	−90.93
9	97.31	−522.22	−716.83	9	−212.70	−1232.20	121.77
10	−240.99	−763.21	−475.84	10	−286.47	−1518.67	408.23
11	664.27	−98.94	−1140.11	11	−586.64	−2105.31	994.88
12	−41.81	−140.75	−1098.31	12	−347.13	−2452.44	1342.01
13	−23.41	−164.16	−1074.89	13	−168.37	−2620.82	1510.38
14	−186.12	−350.27	−888.78	14	388.70	−2232.11	1121.68
15	176.02	−174.25	−1064.80	15	0.00	−2232.11	1121.68
16	−563.39	−737.64	−501.41	16	−416.53	−2648.64	1538.21
17	29.37	−708.27	−530.78	17	−286.30	−2934.95	1824.51
18	−10.28	−718.55	−520.50	18	434.81	−2500.14	1389.70
19	−477.57	−1196.12	−42.93	19	−14.77	−2514.91	1404.47
20	96.52	−1099.60	−139.46	20	−183.02	−2697.93	1587.49
21	−125.61	−1225.20	−13.85	21	740.47	−1957.46	847.02
22	52.84	−1172.37	−66.68	22	−453.23	−2410.68	1300.25
23	57.92	−1114.45	−124.60	23	528.46	−1882.23	771.79
24	−4.23	−1118.68	−120.37	24	−89.06	−1971.29	860.85
25	19.06	−1099.62	−139.43	25	−56.72	−2028.00	917.57
26	−38.93	−1138.55	−100.50	26	46.55	−1981.45	871.01
27	−322.29	−1460.83	221.78	27	−722.88	−2704.33	1593.89
28	158.88	−1301.95	62.90	28	−152.51	−2856.84	1746.40
29	−84.93	−1386.88	147.82	29	247.57	−2609.27	1498.83
30	14.18	−1372.70	133.65	30	−168.99	−2778.26	1667.82
31	−160.47	−1533.16	294.11	31	−68.64	−2846.90	1736.46
32	123.65	−1409.51	170.46	32	−119.63	−2966.53	1856.10
33	−232.76	−1642.27	403.22	33	192.70	−2773.83	1663.40
34	264.68	−1377.59	138.54	34	8.27	−2765.56	1655.13
35	74.79	−1302.80	63.75	35	−13.22	−2778.79	1668.35
36	−1.05	−1303.85	64.80	36	59.36	−2719.43	1608.99
37	−19.63	−1323.47	84.42	37	−0.21	−2719.64	1609.20
38	−201.26	−1524.74	285.69	38	2.69	−2716.95	1606.52
39	−29.53	−1554.26	315.21	39	0.00	−2716.95	1606.52
40	19.30	−1534.96	295.91	40	0.00	−2716.95	1606.52
41	155.18	−1379.78	140.73	41	43.91	−2673.04	1562.61
42	−174.29	−1554.07	315.02	42	−58.64	−2731.68	1621.25
43	−38.74	−1592.80	353.75	43	9.32	−2722.36	1611.92
44	54.28	−1538.53	299.48	44	9.54	−2712.82	1602.39
45	−295.99	−1834.51	595.46	45	−41.62	−2754.44	1644.01
46	−219.47	−2053.99	814.94	46	−6.42	−2760.86	1650.43
47	−5.24	−2059.23	820.18	47	−131.31	−2892.17	1781.74
48	−251.12	−2310.36	1071.31	48	−191.43	−3083.60	1973.16
49	−114.83	−2425.19	1186.14	49	0.00	−3083.60	1973.16
50	140.81	−2284.38	1045.33	50	187.49	−2896.10	1785.67
51	32.28	−2252.10	1013.05	51	0.00	−2896.10	1785.67
52	−546.24	−2798.34	1559.29	52	−18.22	−2914.32	1803.88
53	−384.09	−3182.43	1943.38	53	60.84	−2853.48	1743.04
54	−230.89	−3413.32	2174.27	54	−40.26	−2893.74	1783.31
55	−475.68	−3889.00	2649.94	55	−637.00	−3530.75	2420.31
56	1002.92	−2886.08	1647.03	56	−23.09	−3553.84	2443.41
57	−339.10	−3225.18	1986.13	57	−5.72	−3559.56	2449.13
58	89.15	−3136.03	1896.98	58	40.24	−3519.32	2408.89
59	−106.24	−3242.27	2003.22	59	8.80	−3510.53	2400.09
60	27.52	−3214.75	1975.69	60	−150.79	−3661.32	2550.89
61	96.61	−3118.13	1879.08	61	−219.53	−3880.85	2770.42
62	−372.78	−3490.92	2251.87	62	−173.38	−4054.24	2943.80
63	−476.44	−3967.35	2728.30

4.4. Accumulated Sums for the 1995-2014 Period

A complete overview of the quarterly, annual, and cumulative results over the 20-year span for variants a, b, c, d is presented in Table 5. Despite minor negative outcomes in certain quarters and even some entire years, the final column demonstrates a pattern of rapid growth in all the variants.

Table 5. Trading results of flexible algorithm for four variants.

Variant a
	1 quarter	2 quarter	3 quarter	4 quarter	Year income	Accum. income
1995	194	−201	193	25	211	211
1996	417	146	554	385	1502	1713
1997	−1331	−165	266	848	−382	1331
1998	345	1447	−1487	−782	−477	854
1999	94	268	295	250	907	1761
2000	291	727	809	84	1911	3672
2001	−384	−829	1617	1310	1714	5386
2002	608	−975	1206	1429	2268	7654
2003	−430	137	−126	−196	−615	7039
2004	−640	184	73	179	−204	6835
2005	−234	−99	327	254	248	7083
2006	417	541	5	−279	684	7767
2007	32	156	1317	330	1835	9602
2008	2259	1048	2728	911	6946	16,548
2009	−52	−261	1128	−34	781	17,329
2010	−609	281	439	−227	−116	17,213
2011	−60	−821	−1803	2943	259	17,472
2012	203	−400	1152	−314	641	18,113
2013	32	−193	773	−439	173	18,286
2014	−2	157	−17	577	715	19,001
Variant b
	1 quarter	2 quarter	3 quarter	4 quarter	Year income	Accum. income
1995	145	−285	128	150	138	138
1996	340	153	630	184	1307	1445
1997	−1467	51	125	876	−415	1030
1998	46	1339	−1234	−951	−800	230
1999	177	−464	423	107	243	473
2000	561	1138	622	−182	2139	2612
2001	−672	−1313	1304	1924	1243	3855
2002	792	−734	1111	1123	2292	6147
2003	763	138	−58	−351	492	6639
2004	−322	271	163	−106	6	6645
2005	−157	−74	199	170	138	6783
2006	641	−97	5	−329	220	7003
2007	−704	−312	1243	446	673	7676
2008	1568	960	4177	43	6748	14,424
2009	2760	−234	792	115	3433	17,857
2010	−501	335	140	101	75	17,932
2011	−363	−1161	−2054	3615	37	17,969
2012	−1	−168	936	−306	461	18,430
2013	−209	−153	578	438	654	19,084
2014	33	−47	−85	965	866	19,950
Variant c
	1 quarter	2 quarter	3 quarter	4 quarter	Year income	Accum. income
1995	142	−477	−301	116	−520	−520
1996	193	143	561	190	1087	567
1997	209	−505	65	643	412	979
1998	288	962	973	−160	2063	3042
1999	53	149	−424	414	192	3234
2000	272	1064	879	−1223	992	4226
2001	−11	−1311	1424	1433	1535	5762
2002	745	−750	1395	1600	2990	8752
2003	−403	416	−41	−626	−654	8098
2004	−686	136	45	−193	−698	7400
2005	−171	−162	254	54	−26	7374
2006	298	326	−632	94	86	7460
2007	−314	−59	910	599	1136	8596
2008	1829	1342	2887	686	6744	15,340
2009	1960	835	−446	−666	1683	17,032
2010	293	714	331	−237	1101	18,124
2011	−247	−1477	−93	3058	1241	19,365
2012	−150	303	1108	−1066	195	19,560
2013	103	−526	−338	−472	−1233	18,327
2014	237	432	−264	13	418	18,747
Variant d
	1 quarter	2 quarter	3 quarter	4 quarter	Year income	Accum. income
1995	184	−293	−232	−19	−360	−360
1996	345	44	795	66	1250	890
1997	902	−343	−6	1093	1646	2546
1998	207	1007	4	−104	1114	3660
1999	−11	355	465	138	947	4607
2000	723	1279	720	−501	2221	6828
2001	−572	−928	1295	1253	1048	7876
2002	971	−601	1172	1207	2749	10,625
2003	36	107	331	−603	−129	10,496
2004	−988	328	−767	−259	−1686	8810
2005	−205	−382	376	−267	−478	8332
2006	305	398	101	−329	475	8807
2007	−90	−263	1383	308	1338	10,145
2008	1737	1294	2671	441	6143	16,288
2009	2779	−234	−153	−139	2253	18,541
2010	313	646	661	208	1828	20,369
2011	−240	−1338	575	3283	2280	22,649
2012	−203	455	1183	−256	1179	23,828
2013	−193	−48	438	−277	−80	23,748
2014	139	−310	597	−391	35	23,783

The data in Tables 5(a)-(d) point to a similar pattern in the evolution of accumulated returns across different calculation variants. Once again, it should be emphasized that the quantitative results presented in these tables were obtained using the same software and the same input data. The only differences stem from the use of a standard random number generator in the clustering algorithm.

A visual representation of these results is provided in Figures 5(a)-(d). The summary visual data for all four variants are presented in Figure 1. Table 6 presents the general quantitative data (in contrast to Table 5, quarterly data are not included). The variants can be implemented simultaneously, and the overall profit is the sum of the individual profits from these variants.

In Table 6, the values of the correlation coefficients are given for the last five columns of Table 5, i.e., for the accumulated incomes of variants a, b, c, d and the sum of them.

(a)

(b)

(c)

(d)

Figure 5. Accumulated totals and regression lines for 4 variants.

Table 6. Comparison of the accumulated returns for variants a, b, c, d, and their sum.

Year	Inc. a	Inc. b	Inc. c	Inc. d	Ac. a	Ac. b	Ac. c	Ac. d	Ac. Sum
1995	211	138	−520	−360	211	138	−520	−360	−531
1996	1502	1307	1087	1250	1713	1445	567	890	4615
1997	−382	−415	412	1646	1331	1030	979	2546	5886
1998	−477	−800	2063	1114	854	230	3042	3660	7786
1999	907	243	192	947	1761	473	3234	4607	10,075
2000	1911	2139	992	2221	3672	2612	4226	6828	17,338
2001	1714	1243	1535	1048	5386	3855	5762	7876	22,879
2002	2268	2292	2990	2749	7654	6147	8752	10,625	33,178
2003	−615	492	−654	−129	7039	6639	8098	10,496	32,272
2004	−204	6	−698	−1686	6835	6645	7400	8810	27,852
2005	248	138	−26	−478	7083	6783	7374	8332	29,572
2006	684	220	86	475	7767	7003	7460	8807	31,037
2007	1835	673	1136	1338	9602	7676	8596	10,145	36,019
2008	6946	6748	6744	6143	16,548	14,424	15,340	16,288	62,600
2009	781	3433	1683	2253	17,329	17,857	17,032	18,541	70,759
2010	−116	75	1101	1828	17,213	17,932	18,124	20,369	73,638
2011	259	37	1241	2280	17,472	17,969	19,365	22,649	77,455
2012	641	461	195	1179	18,113	18,430	19,560	23,828	79,931
2013	173	654	−1233	−80	18,286	19,084	18,327	23,748	79,445
2014	715	866	418	35	19,001	19,950	18,745	23,783	81,479

Table 7. Correlation coefficients among variants a, b, c, d and their sum.

	a	b	c	d	sum
a	1	0.994	0.990	0.978	0.995
b		1	0.986	0.979	0.994
c			1	0.992	0.997
d				1	0.993
sum					1

From Tables 5-7, it is clear that “similarity” increases with the level of aggregation. A comparison of cumulative results over a prolonged period demonstrates a degree of stability that, at first glance, might be unexpected in an inherently uncertain environment such as the stock market. Nevertheless, the rise in stability with greater levels of aggregation is a well-known systemic effect. As an example, one might point to temperature variations across large regions: on individual days and at specific locations, no discernible patterns are apparent. However, when examining longer time spans and broader geographic areas, stability increases—reaching as much as 12˚C for extended periods on a global average. It is important to note a significant distinction, however. The claim here does not pertain to the stability of the S&P 500 stock market per se; rather, it concerns the stability of trading outcomes derived from the proposed algorithm, applied to actual daily stock prices.

5. Conclusion

1) The sums reported in Table 5 and Table 6 may seem insignificant in the context of the multibillion-dollar S&P-500 stock market. However, recall that QTA considers no more than five sets of promising stocks, and in each set, only one share of each type is traded. If one were to buy and sell 100 shares instead of just one, the returns would be multiplied by 100, and so on. Furthermore, the different trading variants run simultaneously, and the total profit equals the sum of the profits accumulated across each variant (see Table 6). Their number can be substantially increased.

2) Let’s focus on choosing one of the parameters of the proposed algorithm, namely, the depth l of viewing stock price data. In the article, l = 15, i.e., stock prices (at the time of closing) during the previous 15 days were considered for trading the next day. Without giving precise statements, we will give meaningful arguments. With a significant increase in the parameter l (for example, to 60), the clustering results will practically not depend on the price behavior in the last few days, and we will miss the beginning of areas of rapid growth or decrease of these prices. If the values of this parameter are too small (for example, 5), the clustering results will depend on small price fluctuations, and we will again incorrectly determine reasonable actions for the next day. Note that the choice of viewing depth is largely determined by the speed of trading on the stock market itself, i.e., in fact, the rate of the management programs of the stock market. This rate has been increased by a factor of ten or more since July 1, 2014. Therefore, for the last half-year of the 20-year period under review, almost all clusterizations found with the same parameters turned out to be degenerate. This means that one of the 12 clusters consisted of 450 or more shares out of 500, and the remaining 11 consisted of very small groups. In this case, no reasonable results can be obtained using only the prices at the time of closing the trade. It is required to use price data at much shorter intervals (for example, every hour or every half hour). Fortunately, such data on the S&P-500 market is available free, and the proposed algorithm can be used without modification. This is expected to be done in more detail in subsequent works.

The questions have not been raised before, although many publications are devoted to stock market analysis for the period including July 1, 2014.

The relationship between the frequency of information changes and the frequency of its collection is systemic in nature. It is well known in the theory of Fourier series as the Nyquist-Shannon theorem:

If a signal contains no frequency components above a frequency f_max, the signal can be uniquely represented by equally spaced samples if the sampling frequency f_s is greater than twice f_max. That is, the sampling frequency must satisfy the inequality f_s > 2f_max.

3) The idea of switching trading to a backward mode was inspired by a well-known scheme in optimal control theory: bringing a rocket to zero velocity by alternating forward and backward thrust. In the algorithm under consideration, such a switch can occur at most once. The results might be improved by considering trading not on a quarterly basis but over longer intervals (e.g., years). In that case, one could employ multiple mode switches, depending on the evolving situation—one that may not fully play out within a single quarter.

4) A careful reader may note the following. When trading opens on day t, the chosen stocks prices can differ slightly from their closing prices on the previous day. Yet the profit expression involves differences between closing prices on days t − 1 and t. As a result, the gains realized may differ marginally from those computed by the proposed algorithm.

The correction becomes straightforward once stock prices are made available more frequently. The algorithms themselves require no modification (see item 2).

5) The daily selection of promising sets of stocks (see subsection 3.4) can be considered as determining the optimal portfolio. While a high level of sustainability has been demonstrated for accumulated income, issues related to the sustainability of the found portfolios were not considered in the article. This is supposed to be done in the future.

6) The graphs in Figure 1 and Figure 5, as well as the numbers in Table 5, indicate a sharp increase in revenues in the 3rd quarter of 2001 and in the 3rd quarter of 2008. They are accompanied by exactly the same (see statement 1 in subsection 3.5.1) decrease in income according to the forward algorithm. Therefore, these features make it possible to accurately predict major crises based on these changes during the relevant periods. The task of predicting major crises has not been considered in this article. Naturally, it will require special research. However, this fact also demonstrates the promise of the proposed approach for stock market analysis.

Acknowledgements

The authors extend their gratitude to Professor F. T. Aleskerov for his support of this work, to Professor B. G. Mirkin for his assistance in preparing this article, to M.V. Zhitlukhin for useful recommendations, and to A.I. Shirokov for computer-aid. This article is an output of a research project “Models and methods for decision-making under deep uncertainty” implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1]	Aldridge, I. (2020). A Practical Guide to Algorithmic Strategies and Trading Systems. Wiley.
[2]	Bruni, R. (2017). Stock Market Index Data and Indicators for Day Trading as a Binary Classification Problem. Data in Brief, 10, 569-575. [Google Scholar] [CrossRef] [PubMed]
[3]	Chan, E. P. (2021). Quantitative Trading: How to Build Your Own Algorithmic Trading Business (Wiley Trading) (2nd ed.). Wiley.
[4]	Chen, J., & Tsang, P. K. (2021). Detecting Regime Change in Computational Finance Data Science, Machine Learning and Algorithmic Trading. Chapman & Hall.
[5]	Clenow, A. F. (2023). Following the Trend: Diversified Managed Futures Trading (2nd ed.). Wiley. [Google Scholar] [CrossRef]
[6]	Cover, T. M. (1991). Universal Portfolios. Mathematical Finance, 1, 1-29.
[7]	Dashore, P. et al. (2010). Fuzzy Rule Based System to Characterize the Decision-Making Process in Share Market. International Journal on Computer Science and Engineering, 2, 1973-1979.
[8]	Donadio, S., & Ghosh, S. (2019). Learn Algorithmic Trading: Build and Deploy Algorithmic Trading Systems and Strategies Using Python and Advanced Data Analysis. Packt Publishing.
[9]	Fazli, S., & Jafari, H. (2012). Developing a Hybrid Multi-Criteria Model for Investment in Stock Exchange. Management Science Letters, 2, 457-468. [Google Scholar] [CrossRef]
[10]	Fung, S. P. Y. (2021). Online Two-Way Trading: Randomization and Advice. Theoretical Computer Science, 856, 41-50. [Google Scholar] [CrossRef]
[11]	Georgieva, P. V., Popchev, I. P., & Stoyanov, S. N. (2015). A Multi-Step Procedure for Asset Allocation in Case of Limited Resources. Cybernetics and Information Technologies, 15, 41-51. [Google Scholar] [CrossRef]
[12]	Guo, Y. (2020). Stock Trading Based on Principal Component Analysis and Clustering Analysis. IOP Conference Series: Materials Science and Engineering, 740, Article ID: 012129. [Google Scholar] [CrossRef]
[13]	Jallo, D., Budai, D., Boginski, V., Goldengorin, B., & Pardalos, P. M. (2013). Network-Based Representation of Stock Market Dynamics: An Application to American and Swedish Stock Markets. In B. Goldengorin, et al. (Eds.), Models, Algorithms, and Technologies for Network Analysis (pp. 93-106). Springer. [Google Scholar] [CrossRef]
[14]	Jing, D., Imeni, M., Edalatpanah, S. A., Alburaikan, A., & Khalifa, H. A. E. (2023). Optimal Selection of Stock Portfolios Using Multi-Criteria Decision-Making Methods. Mathematics, 11, 415-435. [Google Scholar] [CrossRef]
[15]	Kissell, R. (2014). The Science of Algorithmic Trading and Portfolio Management. Elsevier Inc.
[16]	Mayanja, F., Mataramvura, S., & Charles, W. M. (2013). A Mathematical Approach to a Stocks Portfolio Selection: The Case of Uganda Securities Exchange (Use). Journal of Mathematical Finance, 3, 487-501. [Google Scholar] [CrossRef]
[17]	Mohammed Almasani, S. A., Finaev, V. I., Abdo Qaid, W. A., & Tychinsky, A. V. (2017). The Decision-Making Model for the Stock Market under Uncertainty. International Journal of Electrical and Computer Engineering, 7, 2782-2790. [Google Scholar] [CrossRef]
[18]	Pakpahan, E. E. et al. (2019). Investment Decision Making Methods on Stock Market. DLSU Business & Economics Review, 28, 104-108.
[19]	Prado, L. (2018). Advances in Financial Machine Learning. Wiley.
[20]	Rubchinsky, A. (2018). Graph Dichotomy Algorithm and Its Applications to Analysis of Stocks Market. In V. Kalyagin, P. Pardalos, O. Prokopyev, & I. Utkina (Eds.), Computational Aspects and Applications in Large-Scale Networks (pp. 75-111). Springer International Publishing. [Google Scholar] [CrossRef]
[21]	Rubchinsky, A., & Baikova, K. (2023). Algorithm of Trading on the Stock Market, Providing Satisfactory Results. In B. Goldengorin, & S. Kuznetsov (Eds.), Data Analysis and Optimization (pp. 331-347). Springer. [Google Scholar] [CrossRef]
[22]	Ruke, A., Gaikwad, S., Yadav, G., Buchade, A., Nimbarkar, S., & Sonawane, A. (2024). Predictive Analysis of Stock Market Trends: A Machine Learning Approach. In 2024 4th International Conference on Data Engineering and Communication Systems (ICDECS) (pp. 1-6). IEEE. [Google Scholar] [CrossRef]
[23]	Seth, S. (2023). Basics of Algorithmic Trading: Concepts and Examples. UtradeAlgos.
[24]	Ziemba, W. T., Lleo, S., & Zhitlukhin, M. (2018). Stock Market Crashes. Predictable and Unpredictable and What to Do about Them. World Scientific.
[25]	Zou, J., Lou, J., Wang, B., & Liu, S. (2023). A Novel Deep Reinforcement Learning Based Automated Stock Trading System Using Cascaded LSTM Networks. Expert Systems with Applications, 242, Article ID: 122801. [Google Scholar] [CrossRef]

Journals Menu

Follow SCIRP

	customer@scirp.org
	+86 18163351462(WhatsApp)
	1655362766

	Paper Publishing WeChat

Journals Menu

Home

About SCIRP

Service

Policies