1. Introduction
Classically, a word over an alphabet Ω is a finite sequence of letters taken from Ω and a run in a word is a maximal subsequence of consecutive identical letters in, see, for example, Feller [1], or Mood [2]. A slightly different point of view consists of interpreting a word over Ω as the result of the replacement of each element of a linear digraph by an arbitrary letter of Ω. A run in a word is then interpreted as a maximal linear sub-digraph whose letters are all identical. Here are four runs, , in a 9-letter word over the alphabet) obtained under this interpretation (see below).
(1)
Using this point of view, words can now be easily “structured” in various ways. For example, starting from the digraph of a circular permutation (instead of a linear digraph, as above) gives rize to the concept of a circular word studied probabilistically in the basic paper of Koutras et al. [3]. In a similar way, any combinatorial species F, in the sense of Joyal [4], leads to a corresponding notion of F-structured word (or F-word, for short). Technically speaking, a F-word of length n over an alphabet Ω is an orbit of the following natural action of the symmetric group,
(2)
defined by
where is the set of F-structures on
,
is the result of relabelling the -structure along the permutation, and is an ordinary word of length over the alphabet Ω. Figure 1 illustrates this action for, the left drawing shows the ordered pair where is a graph on [5] and, the -th letter of the word, is written just under its associated vertex in the graph, for; while the right drawing shows the resulting under the action of.
It is important to note that under this action, the labels of the underlying set of the structure are permuted while each letter is kept in a fixed position. This is a general fact and it constitutes the reason why the orbits of action (2) can be though of as unlabelled F-structures whose nodes are replaced (or filled) by letters of the alphabet, that is, F-words. Summing the orbits of (2) over, gives rize to the concept of an analytic functor in the sense of Joyal [5],
(1) |
of the category of sets into itself, sending each set Ω (= alphabet) to the set of all F-words over Ω.
There exist special classes of F-words, for which a natural notion of “run” can be defined. This is the case when F is a graphical species, whose structures are of the form of a simple graph possibly equipped with extra structures. Standard examples of graphical species include: the species of digraphs, of circular permutations, of tree-like structures (e.g., when F is the species of -enriched rooted trees or the species of R-enriched trees in the sense of Labelle [6]), etc.
Definition 1 Let F be a graphical species and be a F-word over an alphabet Ω. A run in is a maximal connected subgraph of having identical letters.
This definition makes sense since the relabellings of F-structures under the action (2) is letter-invariant. Figure 2 shows the runs in a circular word and in a tree-like word over the alphabet.
The study of runs in graphical words appears far from being trivial, under such a general setting. For example, a simple graph word whose runs are all singletons is equivalent to a proper coloring of the graph. So that, even in this special case, the whole theory of chromatic polynomials must be used. In the present text, our run analysis will be concentrated on classes of arborescent words.
In Section 2, we study various types of runs in Fwords over an alphabet of k letters, where F is the graphical species of R-enriched rooted trees, called here R-arborescences, for short. To do so, we make use of functional equations on multisort species and introduce a special 2-sort species, that we call a runs selector species, to classify the types of runs under study. Various enu-
Figure 2. Four runs in a circular word and five runs in a tree-like word.
merative results about runs (e.g., number of F-words whose runs are all of a given type, distribution of the biggest run, etc) then follow by making use of appropriate underlying cycle index series. Section 3 illustrates the theory of R-arborescent words by analysing specific examples and their associated series. Section 4 indicates some extensions to be developed in the future. For an introduction to species, the reader can consult the basic paper of Joyal [4] or the book by Bergeron, Labelle, and Leroux [7].
2. Runs in Enriched Arborescent Words
An arborescence is a directed rooted tree in which all edges point away from the root. Let be a given species of structures having at least one structure on the empty set. Recall that an -arborescence is an arborescence in which the (possibly empty) set of direct successors of each vertex is equipped with an - structure. Figure 3(a) shows an -arborescence, where the black dots represent the elements of its underlying set. The small arc at each vertex represents the -structure placed on the set of successors of the vertex. The species of -arborescences is characterized by the combinatorial equation,
(3)
and, depending on the choice of the “enriching” species, , various species of tree-like structures are obtained. Now, take a -letter alphabet and associate a sort of singletons to letter , for . The structures belonging to the -sort species are -arborescences whose vertices are of arbitrary sorts taken among the’s. The first step in our run analysis is to classify these multisort structures according the runs of sorts they contain. From (3), we obviously have,
(4)
where is the species of the -structures having a root of sort. Note that this root of sort is part of a run of vertices of sort. To reflect this fact, we now give an alternate expression for in terms of an auxiliary two-sort species, , where is an extra sort of singletons.
Lemma 2 Let be the two-sort species defined by
(5)
Then, the species satisfy the combinatorial equations,
(6)
Proof. Figure 3(b) shows a -structure, where singletons of sort (resp.,) are drawn as black dots (resp., white triangles). The result follows from the fact that an -structure is a -structure in which the blackdots are replaced by singletons of sort and the white triangles are replaced by arbitrary -structures, , (see Figure 4(a), where, , -singletons are white dots, -singletons are black dots, -singletons are gray dots).
Definition 3 Any subspecies is called a runs selector species. The species
of -arborescences having only runs of sorts of type is defined by
(7)
An -arborescent word having only runs of type over the alphabet is the result of the replacement of each vertex of sort in a -structure, by the letter, for.
It is worthwhile to note that a runs selector species does not only select the shape of the runs under interest but also specifies how these runs should be interconnected. For example• selects unrestricted runs (see Figure 3(b))• selects linear runs (see Figure 5(a))• selects linear runs of length• selects singleton runs (see Figure 5(b))• selects maximal runs (see Figure 5(c)), etc.
The main interest in Lemma 2 and Definition 3 is that they give rise to a variety of enumerative results about runs of various types in -arborescent words by the introduction of special weight counters in (5)-(7). For example, taking, where is the species of cyclic permutations, Equation (3) takes the form
whose solution is the species of
(a) (b)
Figure 3. (a) -arborescence; (b) -structure.
(a) (b)
Figure 4. (a) -structure; (b) Mobile word with linear runs.
(a) (b) (c)
Figure 5. Selectors for (a) Linear runs; (b) Singleton runs; and (c) Maximal runs.
mobiles in the terminology of Bergeron, Labelle, and Leroux [7]. The reason is the following. Putting the root on top, the descendents of each node can be thought as turning around along horizontal circles. In this case, -words are called mobile words. Figure 4(b) illustrates a mobile word on a 2-letter alphabet
having only linear (type (ii)) runs.
The introduction of a run counter and the simultaneous substitutions, in (5)-(7), where is a letter counter lead to the following general proposition.
Proposition 4
Let be the species of R-arborescences and be a runs selector species. Give a weight to each -word with runs and let be the total weight of -words of length having only runs of type over the -letter alphabet. Then the generating series
is given by,
(8)
where is the cycle index series of. In the special case, , corresponding to unrestricted runs, the generating series is explicitely given by,
(9)
where is the cycle index series of.
Proof. The series is the total weight of all unlabelled -structures in which each node is replaced by the variable counter and each run is given a weight. Now, consider the species
obtained by the substitutions, , in the species weighted as above. Then, from Pólya Theory in the context of species, , where is the cycle index series of . By symmetry, all the species, , are equal. So that they can all be identified with a single species, , say. By (4), taking into account the run counter, ,
(10)
(11)
Define the species by.
Then, , and the recursive scheme (8) follows by taking the cycle index series of both sides of this last equation. Now, consider the special choice,. Then, using the fact that
we have,
(12)
Hence,
This implies that the species
satisfies the equation
Since, we must have
by the implicit species theorem [4]. So that,
and (9) follows by taking the cycle index series on both sides of this combinatorial equation.
The following corollary is immediate from Proposition 4, via the substitution.
Corollary 5 Let be the species of arborescences and be a runs selector species. Let be the number of -words of length having only runs of type over the - letter alphabet. Then the generating series, , is given by,
(13)
In the special case, , corresponding to unrestricted runs, the series is explicitely given by,
(14)
The use of cycle index series above comes from the fact that R-enriched arborescences have non-trivial automorphisms, in general. This occurs when R-structures have non-trivial automorphisms. In the asymmetric case, however, that is when the automorphism group of each -structure is trivial, it is well-known that cycle index series reduce to exponential generating series.
Corollary 6 If the species, , is asymmetric, then
and with
(15)
where is the exponential generating series of. Moreover, if, then
(16)
where is the exponential generating series of.
Various computational general methods exist to obtain successive approximations (or explicit expressions, sometimes) for and. Simple iteration is the most obvious one, but more sophisticated methods include Good-Lagrange inversion, the use of eclosion operators and Newton-Raphson iteration adapted to cycle index series (see Labelle [8,9]). There is not enough space here to describe these methods in detail in the present context. Instead, we have chosen to illustrate our analysis of runs by focusing on specific examples.
3. Specific Examples
3.1. Runs in Ordinary Words
With, Equation (3) takes the form whose solution is the asymmetric species of non empty linear orders. In this case, -words are simply (ordinary non empty) words and Equation (5) takes the form whose solution is
(17)
• In the case, of unrestricted runs, (16) immediately gives,
(18)
• Take the runs selector
of linear runs of length. Then recurrences (15) take the form,
(19)
Solving for, we get the well known g.f.’s of words whose runs are of size,
(20)
Note that taking in (20) can be called the Fibonacci case since gives,
(21)
where is the -th Fibonacci number. This corresponds to twice the well known fact that a sequence of adjacent squares can be tiled by monominos or dominos in ways. Taking in (20) and substracting gives the g.f. of words whose maximal runs are of size,
(22)
from which one can compute or estimate the expected size of the largest run (see, for example, Erdos and Revesz [10], Guibas and Odlyzko [11], Kong [12]).
• Take any subset and the selector
of runs of length in. Then,
(23)
3.2. Runs in Binary Arborescent Words
Planar case. With, where is the species of linearly ordered 2-point sets, Equation (3) takes the form whose solution is the asymmetric species of binary plane arborescences. Accordingly, -words are binary plane arborescent words and (5) takes the form.
• Take the linear runs selector
corresponding to (ii) above. Then by (16), the g.f.’s of plane binary arborescent words having only linear runs are given by,
(24)
Coefficient extractions give, for odd,
where, are the Catalan numbers.
• The Fibonacci case, k = 2, m = 2, corresponds, in the present context, to case (iii) above with a 2-letter alphabet. Here,. So that,
(25)
The first terms of and of are given by,
Non-planar case. With, where is the species of 2-element sets, Equation (3) takes the form whose solution is the species of (free) binary arborescences. Accordingly, -words are binary arborescent words and (5) takes the form. Note that in this case, Binstructures have non trivial symmetries in general, so that cycle-index series must be used instead of exponential generating series.
• Take the linear runs selector
corresponding to (ii) above. Then
and by (8), where,
(26)
For example, the first terms of are given by•
• In the Fibonacci case, k = 2, m = 2, the runs selector (iii) above is given by
.
So that,
(27)
The first terms of and of are given by,
3.3. Runs in Cayley Arborescent Words
With, where is the species of finite sets, Equation (3) takes the form whose solution is the species of standard Cayley arborescences. In this case, -words are Cayley arborescent words and equation (5) takes the form. Since, this equation can be rewritten as. Hence, from the unicity of solution in the implicit species theorem, we have,
(28)
• Take the runs selector. Then since , (9) and (14) can be rewritten as,
(29)
For example, the first terms of and are given by•
• Take the linear runs selector corresponding to (ii) above. Then,
(30)
More generally, the runs selector
corresponds to linear runs of size. Here,
(31)
• For Cayley arborescences, the Fibonacci case, k = 2, m = 2, corresponds to the runs selector . So that,
(32)
• For singleton runs (case (iv) above), . So that,
(33)
• For maximal runs (case (v) above), , since . So that,
(34)
where is the ordinary generating series of unlabelled Cayley arborescences.
3.4. Runs in Mobile Words
Recall that with, where is the species of cyclic permutations, Equation (3) takes the form whose solution is the species of mobiles. In this case, -words are called mobile words (see Figure 4(b)). Since
the above case (ii) of linear runs, for example, corresponds to the runs selector,
(35)
and the generating series, , of mobile words having only linear runs, weighted by the run counter is given by,
(36)
since, where is the Euler function. For example, the first terms of, for a 2-letter alphabet are given by,
4. Some Extensions
The present approach to runs in structured words can be extended in various directions. For example, a probabilistic analysis of runs can be made by adding a probability of occurence to letter in the alphabet. Runs in other classes of graphical words can also be considered. For example, the class of - enriched tree words is a tractable one. Here, the species of -enriched arborescences is replaced by the species of -enriched trees. This case needs a more complex analysis since extra symmetries of the structures have to be taken into account (the structures being unrooted). The (enriched) dissymmetry formula,
(37)
which expresses the species of -enriched trees in terms of the species of -enriched arborescences (see Bergeron, Labelle, and Leroux [7]), is the tool to be used for the analysis of the multisort species leading to an analogue of Proposition 4 for -enriched tree words. These extensions will be developed elsewere.
5. Acknowledgements
The authors would like to thank Jérôme Tremblay for his LaTeX and Maple help.