A Program Study of the Union of Semilattices on the Set of Subsets of Grids of Waterloo Language ()
1. Introduction and Preliminaries
It is well known that there exist different complete invariants for describing a regular language: not only well-known canonical automata [1] [2] [3] [4], but also basis automata [5] [6] [7] and universal automata [8] [9] [10] [11]. When constructing basis and universal automata, it is necessary to construct canonical automata both for a given regular language and for its mirror language. In the process of such construction, we can obtain, among other objects, a special binary relation # defined on pairs of states of these two canonical automata. This relation is also invariant (but incomplete) for the language in question.
Of course, at present, the most interesting language for research is the Waterloo language. The term “Waterloo automaton” is used in most of the articles, both in English and in Russian. However, it will be clear from what follows that the term “Waterloo language” is better.
A universal automaton constructed for this language has the following property: there exists a non-equivalent automaton among its corresponding covering automata [11] [12] [13]. It is still unknown whether a Waterloo language is minimal, but most likely it is.
And, of course, it is desirable to define “minimality” mentioned in the previous paragraph strictly; it can be considered according to different norms. The most natural norm is the number of states of an equivalent canonical automaton, but we are more interested in other variants:
・ either the number of states of an equivalent basis automaton―for the Waterloo language this value is 20;
・ or the product of the number of states of two equivalent canonical automata (for the given regular language and for the language mirroring the given one)―for Waterloo this value is 80.
Covering automata have already been mentioned above. Of course, for any automaton, they form a semilattice by union, but it can be shown that they do not, generally speaking, form a semilattice by intersection. More concretely, instead of one intersection semilattice, a union of several such semilattices is formed. This paper is devoted to consideration of such a construction for the Waterloo language.
Thus, the topic under consideration is adjacent to the study of the set of all possible arcs of a nondeterministic finite automaton defining a given regular language; such a study was started by one of the authors of this paper back in the late 1990s [14] [15] and was subsequently continued in [10] [11] [13] [16].
Thus, as it has already been mentioned, when we study the set of possible arcs of an automaton (in other words, the arcs of a COM automaton, the arcs of a universal automaton), we obtain a special binary relation # defined on pairs of states of these two canonical automata as an auxiliary construction. This relation is also invariant (but incomplete) for the language in question. For each such binary relation, there exists a whole subclass of the class of regular languages that possesses it. Hence, on the set of all regular languages, it is possible to define (another) binary relation; it is valid for some two languages if and only if they have the same binary relation #. Obviously, the binary relation defined in this way (in some of our previous works, [17] [18] [19], it was denoted by R) is an equivalence relation on the set of all regular languages. This raises the question of the “most typical” language, which is an element of the class of equivalence under the relation in question.
In some previous papers, we have studied in detail regular languages and their corresponding canonical automata, which can be considered as such “typical elements” of their class, and considered some of their properties [19] [20] [21]. Apparently, the most interesting of these properties is the following: using specially described transformations, we can obtain from such a “typical” automaton any canonical automaton whose language corresponds to the binary relation # under consideration.
Let us repeat that so far we know only one example of a Waterloo-like language (i.e. a language for which there exists a covering automaton which is not equivalent to it); the article [22] can be considered as a formulation of the problem of finding other such languages. Let us also note the possible connection of Waterloo-like languages with infinite iterations of finite languages [18] [23] [24] and others. We also note that many of the problems listed here can be considered as container packing problems (see [25] and many others); of course, it is not expedient to solve them in this way; they require other models similar to those considered in this paper.
Most of the terminology related to nondeterministic finite automata corresponds to [17] (see also references from that paper). Here we add references to some terms directly related to automata minimization. The vertex minimization algorithm of nondeterministic finite automata considered in [4; Section 6] is based on the analysis of the binary relation # [4; Section 3.3] connecting the sets of states X and Y of two canonical automata which are constructed on the basis of the analyzed nondeterministic finite automaton K (defining some regular language L) and a mirror to it. A set of grids [4; Section 3.4] is associated with relation #. Each grid is defined by a pair of subsets X0 Ì X and Y0 Ì Y satisfying the following condition: for any states x Î X0 and y Î Y0, the relation x # y holds, and the subsets X0 and Y0 cannot be expanded while keeping the condition. We denote such a grid by X0 × Y0. Note that grids can be considered as states of a universal automaton.
A set M of grids is called a covering grid if for any elements x Î X and y Î Y such that x # y, there exists a grid X0 × Y0 of M for which x Î X0 and y Î Y0. Obviously, the complete set of grids constructed by mapping # is a covering set.
In [11], we describe an algorithm for constructing from the complete set of grids a COM (L) automaton (actually isomorphic to a universal automaton) which defines the same regular language L as the original automaton K, and each grid corresponds to some state of the COM (L) automaton. Based on a COM (L) automaton, we can define a family of covering automata, each of which is obtained by removing some states of a COM (L) automaton, and the remaining states correspond to grids which form the covering set.
The minimization algorithm for the original automaton K consists in choosing a covering grid set M0 of minimal size, for which the correspondent covering automaton is equivalent to the automaton K, i.e. defines the same regular language. Examples show that not every covering grid set leads to a covering automaton equivalent to the original one. A well-known example is the Waterloo automaton first given in [26] and analyzed in detail in [4; Sec. 6].
This paper presents the results of an additional study of covering automata based on the Waterloo automaton. The study was performed using the library for working with nondeterministic automata NFALib implemented by one of the authors in C#, [27] et al.
3. The Program Study of Automata for Waterloo-Like Languages
Before describing the obtained results, let us describe the objects used and give the fragments of the program code that allow us to obtain them. An initial automaton (a canonical automaton for a regular Waterloo-like language) is defined by means of its text description contained in the Waterloo.txt file (the description corresponds to [4; Table 13]), Figure 1.
This is a deterministic automaton whose canonicalization leads to an automaton of the same kind. The canonical automaton for a mirror automaton, after renaming the states, takes the form given in [4; Table 16], Figure 2.
The States column gives the sets of states before their reassignment; each state also corresponds to the set of states of the original mirror automaton obtained as a result of its determinization. On the basis of the obtained canonical automata, a matrix of the relation # is constructed, the rows of which correspond to the states of the canonical automaton w, and the columns to the states of the automaton w2, which is canonical to the mirror automaton. This matrix corresponds to [4; Table 17], Figure 3.
Next, a complete set of 14 grids is defined for the found relation #. After a small change in the order of their order, we obtain a set that is completely identical with the one described in [4; p. 107], Figure 4.
Figure 1. A canonical automaton for a regular Waterloo-like language.
Figure 2. The canonical automaton for a mirror automaton.
On the basis of the obtained complete set of grids, a complete automaton is constructed, the representation of which coincides with the one given in [4; Table 20], Figure 5.
The constructed complete automaton is equivalent to the original Waterloo automaton, which can be shown by performing its determinization, see Figure 6. After renaming the states we get an automaton whose representation is the same as the original Waterloo automaton, Figure 7. Then the minimal covering set of grids is defined. This set contains grids 1, 3, 5, 6, 8, 10, 12 of the full set, see
Figure 8. A covering automaton is constructed based on the minimal covering set of grids, Figure 9. After the canonization procedure, the resulting covering automaton takes the form shown in Figure 10. By renaming the states, similar to what was done for the deterministic representation of the full automaton, we obtain an automaton that is not equivalent to the original Waterloo automaton (the line with differences is underlined in the automaton representation), Figure 11. So, the covering automaton based on the minimal covering set of grids is not equivalent to the original one.
Figure 6. A canonical automaton for a complete automaton.
Figure 7. A canonical automaton with renamed states.
Figure 8. The minimal covering set of grids.
4. Computation Results for Covering Automata
In the final part of the article we describe results obtained when analyzing other covering automata that can be constructed on the basis of the minimal covering grid set complemented with some other grids from the initial complete grid set.
As will be seen later, it would be reasonable to change a little the order of additional grids, moving grid 9 of the complete set of grids (see Figure 4) to the beginning of the set of additional grids, Figure 12.
Each covering automaton will be denoted by a 7-digit binary number in which the digit 1 marks the additional grids included in the covering set of grids on which it is based. In particular, automaton 1111111 corresponds to the complete automaton (see Figure 5) based on the complete set of grids, and automaton 0000000 corresponds to the covering automaton based on the minimal covering set (see Figures 9-11). The order of digits corresponds to the order of additional grids from Figure 12.
Based on the NFALib library tools described above, a set of 128 covering automata was constructed and their equivalence was investigated. To check the
Figure 10. A covering automaton after canonization.
Figure 11. A covering automaton after canonization and renaming the states.
equivalence of two automata we used an algorithm in which
・ initially, the automata are reduced to a canonical form,
・ then, for each state of automata, some hash characteristic is constructed,
・ and, if the hash-characteristics coincide, comparison of transition functions for different combinations of new names of the second automaton is performed using the back-tracking algorithm.
As a result, we obtained four sets of pairwise equivalent automata whose properties are shown in Table 1.
In addition, situations were investigated in which the addition of some additional grid results in a transition from an automaton that is not equivalent to the original Waterloo automaton to an equivalent automaton. In total, there are 80 transitions, and for each of the 48 non-equivalent automata there are one or two such transitions:
・ one transition for automata from set A,
・ two transitions for automata from sets B and C.
The results can be represented graphically: depict a 7-dimensional hypercube, highlighting in light red the vertices that correspond to covering automata which are not equivalent to the original Waterloo automaton, and also the edges that correspond to the transition from non-equivalent to equivalent automata (Figure 13). In this Figure,
・ the outer “circle” contains the first half of vertices (0000000 to 0111111) traversed clockwise from the rightmost vertex,
・ and inner “circle” contains the second half of vertices (from 1000000 to 1111111) traversed in the same order.
Even more illustrative representation of the sets of automata and transitions between them can be obtained if we represent each of the sets A, B, C as a 4-dimensional hypercube and set D as five 4-dimensional hypercubes, and then combine these hypercubes into one 7-dimensional hypercube (Figure 14).
Figure 14 shows that the vertices of 7-dimensional hypercube are decomposed into equivalence classes of 16 elements each, the first 3 binary digits of which coincide. The properties of automata from each of these equivalence classes are the same, so to fully illustrate our results it is sufficient to use a three-dimensional
Table 1. Four sets of pairwise equivalent automata.
Figure 13. A set of 128 covering automata (first representation).
Figure 14. A set of 128 covering automata (second representation).
“factor cube”, in which vertices are determined by the first three binary digits or, which is the same, by the presence or absence of grids 9, 2 and 4 in the corresponding covering automaton (Figure 15).
It follows from these results that a minimal covering automaton equivalent to the Waterloo automaton can be obtained by adding an additional grid 9 to the minimal covering grid set, i.e. by passing from automaton 0000000 to automaton 1,000,000.
Additional calculations show that, besides the minimal covering automaton 1,000,000, it is possible to obtain 4 more minimal covering automata equivalent to the original Waterloo automaton (of 11 different covering automata with 8 states), but to obtain each of them one or two grids from the minimal covering
Figure 15. A set of 128 covering automata represented as a factor cube.
set must be replaced by two or, respectively, three grids from the additional set:
・ grid 3 must be replaced by grids 2 and 4,
・ grid 8 must be replaced by grids 7 and 9,
・ grid 10 must be replaced by grids 9 and 11,
・ grids 8 and 10 must be replaced by grids 7, 9, and 11.
Founding
This work was supported in part by the Higher Education Stability Support Program of Chinese Universities (section “Shenzhen 2022 - Science, Technology and Innovation Commission of Shenzhen Municipality”).