1. Introduction
A random walk on a graph G is a random process on the vertices of G in which, at each step in the walk, we choose uniformly at random among the neighbors of the current vertex. Random walks have been studied extensively, and are used in a variety of algorithms involving graphs. For a comprehensive survey on random walks on graphs, see [2] , and for applications of spectral techniques to random walk theory, see [3] . Random walks on graphs have the useful property that given any initial distribution on the vertex set, the random walk converges to a unique stationary distribution as long as the graph is connected and not bipartite. The speed at which this convergence takes place is referred to as the mixing rate of the random walk. In a graph where a random walk has a fast mixing rate, vertices can be sampled quickly using this random process, making this a useful tool in theoretical computer science.
A non-backtracking random walk on a graph is a random walk with the added condition that, on a given step, we are not allowed to return to the vertex visited on the previous step. Viewed as a walk on vertices, a non-backtracking random walk loses the property of being a Markov chain, making its analysis somewhat more difficult. Nevertheless, such walks have received increased interest in recent years. Angel, Friedman, and Hoory [4] studied non-backtracking walks on the universal cover of a graph. Fitzner and van der Hofstad [5] studied the convergence of non-backtracking random walks on lattices and tori. Krzakala et al. [6] used a matrix related to non-backtracking walks to study spectral clustering algorithms. Most pertinent to the current paper, Alon, Benjamini, Lubetzky, and Sodin [1] studied the mixing rate of non-backtracking walks on regular graphs. In particular, they prove that in most cases, a non-backtracking random walk on a regular graph has a faster mixing rate than a random walk that allows backtracking.
In this paper, we study the mixing rate for a non-backtracking random walk, with the goal of removing the condition of regularity needed in the results of Alon et al. in [1] . We take a different approach than Alon et al. by looking at the non-backtracking walk as a walk along directed edges of a graph, as is done in [4] . This allows us to turn the non-backtracking random walk into a Markov chain, but on a larger state space, which in turn allows us to determine the stationary distribution to which a non-backtracking walk converges for a general graph, whether or not it is regular. In the case of regular graphs, our approach allows us to compute the spectrum of the transition probability matrix for a non-backtracking random walk, expressed in terms of the eigenvalues of the adjacency matrix. This allows for easy comparison of the mixing rates of a non-backtracking random walk, and an ordinary random walk. As a corollary, this gives us an alternate proof of the result in [1] for regular graphs. Our approach gives more information than the approach in [1] , since we give the full spectrum of the transition probability matrix. In addition, we are able to compute the spectrum of the non-backtracking transition probability matrix for biregular graphs. As a corollary, we generalize the result in [1] for regular graphs to an analogous result for biregular graphs.
A key component in our proof is a weighted version of a result known as Ihara’s Theorem, also called the Ihara zeta identity, which relates an operator indexed by the directed edge set of a graph to an operator indexed by the vertex set of the graph. Ihara’s Theorem was first considered in the study of number theoretic zeta functions on graphs, and was first proved for regular graphs by Ihara in 1966 (see [7] ). Numerous other proofs have been given since, along with generalizations to irregular graphs, by Hashimoto ( [8] , 1989), Bass ( [9] , 1992), Stark and Terras ( [10] , 1996), Kotani and Sunada ( [11] , 2000), and others. We will give an elementary proof of Ihara’s Theorem that, to our knowledge, is original. In addition, we follow ideas similar to those in [11] to obtain a version of Ihara’s Theorem with weights that allows us to study the relevant transition probability matrices for random walks.
The remainder of this paper is organized as follows. In Section 2, we give the necessary background and preliminary information on random walks, and develop the corresponding theory for non-backtracking walks, including the convergence of a non-backtracking walk to a stationary distribution for a general graph. We accomplish this via walks on the directed edges of a graph. We also investigate bounds obtained from the normalized Laplacian for a directed graph, and we give the relevant background on Ihara’s Theorem together with a new elementary proof. In Section 3, we prove our weighted version of Ihara’s formula. Finally, in Section 4, we use this formula to obtain the spectrum of the transition probability matrix for a non-backtracking random walk on regular and biregular graphs. This gives a new proof of the result of Alon et al. concerning the mixing rate of a non-backtracking random walk on a regular graph, and generalizes this result to the class of biregular graphs.
2. Preliminaries
2.1. Random Walks
Throughout this paper, we will let $G = (V, E)$ denote a graph with vertex set $V$ and (undirected) edge set $E$, and we will let $n = |V|$, $m = |E|$, and $\mathrm{vol}(G)$ denote the sum of the degrees of all the vertices of $G$. A random walk on a graph is a sequence $v_0, v_1, v_2, \ldots$ of vertices where $v_{i+1}$ is chosen uniformly at random among the neighbors of $v_i$. Random walks on graphs are well-studied, and considerable literature exists about them. See in particular [2] and [3] for good surveys, especially on the use of spectral techniques in studying random walks on graphs.
The adjacency matrix $A$ of $G$ is the $n \times n$ matrix with rows and columns indexed by $V$ given by
$$A(u,v) = \begin{cases} 1 & \text{if } u \sim v,\\ 0 & \text{otherwise.}\end{cases}$$
It is a well-known fact that the $(u,v)$ entry of $A^k$ is the number of walks of length $k$ starting at vertex $u$ and ending at vertex $v$. Define $D$ to be the $n \times n$ diagonal matrix with rows and columns indexed by $V$ with $D(v,v) = d_v$, where $d_v$ denotes the degree of vertex $v$. A random walk on a graph $G$ is a Markov process with transition probability matrix $P = D^{-1}A$, so
$$P(u,v) = \begin{cases} \dfrac{1}{d_u} & \text{if } u \sim v,\\ 0 & \text{otherwise.}\end{cases}$$
Given any starting probability distribution $f_0$ on the vertex set $V$, the resulting expected distribution $f_k$ after applying $k$ random walk steps is given by $f_k = f_0 P^k$. Here we are considering $f_0$ and $f_k$ as row vectors in $\mathbb{R}^n$.
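As a concrete illustration (our own sketch, not part of the original paper), the following Python snippet builds $P = D^{-1}A$ for a small example graph and iterates a starting distribution; the example graph and all names in the code are our own choices.

```python
import numpy as np

# Small connected, non-bipartite example graph: a 5-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

d = A.sum(axis=1)                  # degrees d_v
P = np.diag(1.0 / d) @ A           # transition probability matrix P = D^{-1} A

f = np.zeros(n)
f[0] = 1.0                         # f_0: start the walk at vertex 0
for _ in range(100):               # f_k = f_0 P^k, applied one step at a time
    f = f @ P

print(np.round(f, 4))              # compare with d / d.sum(), the stationary
print(np.round(d / d.sum(), 4))    # distribution discussed below
```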
Note that, in general, $P$ is not symmetric for an irregular graph, but it is similar to the symmetric matrix $D^{-1/2} A D^{-1/2}$. Thus, the eigenvalues of $P$ are real, and if we order them as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, then it is easy to see that $\lambda_1 = 1$, with the all-ones vector as a corresponding eigenvector, and $\lambda_n \ge -1$. By Perron–Frobenius theory, if the matrix $P$ is irreducible, then we have that $\lambda_2 < 1$, and if $P$ is aperiodic, then $\lambda_n > -1$. The matrix $P$ being irreducible and aperiodic corresponds to the graph $G$ being connected and non-bipartite.
The stationary distribution for a random walk on $G$ is given by
$$\pi(v) = \frac{d_v}{\mathrm{vol}(G)}.$$
The stationary distribution has the important property that $\pi P = \pi$, so that a random walk with initial distribution $\pi$ will remain distributed according to $\pi$ at each step. An important fact about the stationary distribution is that if $G$ is a connected graph that is not bipartite, then for any initial distribution $f_0$ on $V$, we have
$$\lim_{k \to \infty} f_k(v) = \pi(v)$$
for all $v$ (see [2]).
Knowing that a random walk will converge to some stationary distribution, a fundamental question to consider is to determine how quickly the random walk approaches the stationary distribution, or in other words, to determine the mixing rate. In order to make this question precise, we need to consider how to measure the distance between two distribution vectors.
Several measures for defining the mixing rate of a random walk have been given (see [3]). Classically, the mixing rate is defined in terms of the pointwise distance (see [2]). That is, the mixing rate is
$$\rho = \limsup_{k \to \infty}\, \max_{u,v}\, \big|P^k(u,v) - \pi(v)\big|^{1/k}.$$
Note that a small mixing rate corresponds to fast mixing. Alternatively, the mixing rate can be considered in terms of the standard $\ell^2$ (Euclidean) norm, the relative pointwise distance, the total variation distance, or the $\chi$-squared distance. In general, these measures can yield different distances, but spectral bounds on the mixing rate are essentially the same for each. See [3] for a detailed comparison of each. For our purposes, we will primarily be concerned with the $\chi$-squared distance, which will be defined below.
The mixing rate of a random walk is directly related to the eigenvalues of P.
Theorem 1 (Corollary 5.2 of [2]). Let $G$ be a connected non-bipartite graph with transition probability matrix $P$, and let the eigenvalues of $P$ be $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > -1$. Then the mixing rate is $\rho = \max\{|\lambda_2|, |\lambda_n|\}$.
Thus, the smaller the eigenvalues of P, the faster the random walk converges to its stationary distribution.
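A quick numerical companion to Theorem 1 (again our own illustration, reusing the same example graph as above, which is not from the paper): the eigenvalues of $P$ are real, and the mixing rate can be read off from the second largest and the smallest of them.

```python
import numpy as np

# Same illustrative graph as before: a 5-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
P = np.diag(1.0 / A.sum(axis=1)) @ A

# P is similar to the symmetric matrix D^{-1/2} A D^{-1/2}, so its eigenvalues
# are real; order them as 1 = lambda_1 >= lambda_2 >= ... >= lambda_n.
lam = np.sort(np.linalg.eigvals(P).real)[::-1]
rho = max(abs(lam[1]), abs(lam[-1]))    # the mixing rate in Theorem 1
print(lam)
print(rho)
```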
2.2. Non-Backtracking Random Walks
A non-backtracking random walk on $G$ is a sequence $v_0, v_1, v_2, \ldots$ of vertices where $v_{i+1}$ is chosen uniformly at random among the neighbors of $v_i$ such that $v_{i+1} \ne v_{i-1}$ for $i \ge 1$. In other words, a non-backtracking random walk is a random walk in which a step is not allowed to go back to the immediately previous state. A non-backtracking random walk on a graph is not a Markov chain since, in any given state, we need to remember the previous step in order to take the next step. In order for this process to be well defined, we assume throughout the remainder of the paper that the minimum degree of $G$ is at least 2.
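The definition can be made concrete with a short simulation; the following sketch is our own illustration (it assumes a simple adjacency-list representation of the graph) and is not code from the paper.

```python
import random

def nb_random_walk(adj, start, steps, seed=0):
    """Simulate a non-backtracking random walk on a graph with min degree >= 2.

    adj   : dict mapping each vertex to a list of its neighbors
    start : starting vertex v_0
    """
    rng = random.Random(seed)
    walk, prev = [start], None
    for _ in range(steps):
        cur = walk[-1]
        # choose uniformly among the neighbors of cur, excluding the vertex
        # visited on the previous step
        nxt = rng.choice([w for w in adj[cur] if w != prev])
        prev, walk = cur, walk + [nxt]
    return walk

# Example: a 5-cycle with one chord (minimum degree 2).
adj = {0: [1, 2, 4], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [0, 3]}
print(nb_random_walk(adj, start=0, steps=12))
```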
Define $P^{(k)}$ to be the $n \times n$ transition probability matrix for a $k$-step non-backtracking random walk on the vertices. That is, $P^{(k)}(u,v)$ is the probability that a non-backtracking random walk starting at vertex $u$ ends up at vertex $v$ after $k$ steps. Note that $P^{(1)} = P$, where $P$ is the transition matrix for an ordinary random walk on $G$. However, $P^{(k)}$ is not simply $P^k$, since a non-backtracking random walk is not a Markov chain.
This process can be turned into a Markov chain, however, by changing the state space from the vertices of the graph to the directed edges of the graph. That is, replace each edge in $E$ with two directed edges (one in each direction). Then the non-backtracking random walk is a sequence of directed edges $e_1, e_2, e_3, \ldots$ where if $e_i = (u,v)$ and $e_{i+1} = (x,y)$, then $x = v$ and $y \ne u$. That is, the non-backtracking condition restricts the walk from moving from an edge to the edge going in the opposite direction. Denote the set of directed edges by $\vec{E}$. The transition probability matrix for this process we will call $\tilde{P}$. Observe that
$$\tilde{P}\big((u,v),(x,y)\big) = \begin{cases} \dfrac{1}{d_v - 1} & \text{if } x = v \text{ and } y \ne u,\\[2pt] 0 & \text{otherwise.}\end{cases}$$
Note that $\tilde{P}$ is a $2m \times 2m$ matrix. Note also that $\tilde{P}^k$ is the transition matrix for a walk of $k$ steps on the directed edges.
Lemma 1. Given any graph $G$, the matrix $\tilde{P}$ as defined above is doubly stochastic.
Proof. Observe first that the rows of the matrix $\tilde{P}$ sum to 1, as it is a transition probability matrix. In addition, the columns of $\tilde{P}$ sum to 1. To see this, consider the column indexed by the directed edge $(x,y)$. The entry of this column in the row indexed by $(u,v)$ is $\frac{1}{d_v - 1}$ if $v = x$ and if $u \ne y$. Since $v = x$, this is equal to $\frac{1}{d_x - 1}$. Otherwise, the entry is 0. There are exactly $d_x - 1$ such rows, namely those indexed by $(u,x)$ with $u$ a neighbor of $x$ other than $y$. Thus the column sum is
$$\sum_{\substack{u \sim x \\ u \ne y}} \frac{1}{d_x - 1} = (d_x - 1)\cdot\frac{1}{d_x - 1} = 1,$$
as claimed. □
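To make the construction concrete, the following sketch (ours, with an illustrative example graph; the variable names are our own) builds the $2m \times 2m$ matrix $\tilde{P}$ explicitly and checks Lemma 1 numerically.

```python
import numpy as np

# Illustrative graph with minimum degree 2: a 5-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)

# Directed edge set: each undirected edge in both orientations (2m "darts").
darts = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(darts)}

P_t = np.zeros((len(darts), len(darts)))          # the matrix P-tilde
for (u, v) in darts:
    for w in range(n):
        if A[v, w] and w != u:                    # step (u,v) -> (v,w), no backtracking
            P_t[idx[(u, v)], idx[(v, w)]] = 1.0 / (deg[v] - 1)

# Lemma 1: every row and every column of P-tilde sums to 1.
print(np.allclose(P_t.sum(axis=1), 1.0), np.allclose(P_t.sum(axis=0), 1.0))
```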
Define the distribution $\tilde{\pi}$ by
$$\tilde{\pi} = \frac{1}{2m}\mathbf{1},$$
where $\mathbf{1}$ is the vector of length $2m$ with each entry equal to 1.
Lemma 2. Let $\tilde{f}_0$ be any distribution on the directed edges of $G$. If the matrix $\tilde{P}$ is irreducible and aperiodic, then
$$\tilde{f}_0 \tilde{P}^k \to \tilde{\pi}$$
as $k \to \infty$.
Proof. It follows from Lemma 1 that $\tilde{\pi}$ is a stationary distribution for $\tilde{P}$: since the columns of $\tilde{P}$ sum to 1, we have
$$\tilde{\pi}\tilde{P} = \frac{1}{2m}\mathbf{1}\tilde{P} = \frac{1}{2m}\mathbf{1} = \tilde{\pi}.$$
Therefore, if the sequence $\tilde{f}_0\tilde{P}^k$ converges, it must converge to $\tilde{\pi}$. Now, $\tilde{P}$ being irreducible and aperiodic are precisely the conditions for this sequence to converge. □
Let $f$ be a probability distribution on the vertices of $G$. Then $f$ can be turned into a distribution $\tilde{f}$ on $\vec{E}$ as follows. Define
$$\tilde{f}(u,v) = \frac{f(u)}{d_u}.$$
Conversely, given a distribution $\tilde{g}$ on $\vec{E}$, define a distribution $g$ on the vertices by
$$g(v) = \sum_{u : u \sim v} \tilde{g}(u,v).$$
Thus, given any starting distribution $f_0$ on the vertex set of $G$, we can compute the distribution $f_k$ after $k$ non-backtracking random walk steps as follows. First compute the distribution $\tilde{f}_0$ on the directed edges as above, then compute $\tilde{f}_k = \tilde{f}_0\tilde{P}^k$, and then $f_k$ is given by $f_k(v) = \sum_{u : u\sim v}\tilde{f}_k(u,v)$. The following theorem tells us that this converges to the same stationary distribution as an ordinary random walk on a graph.
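The following sketch (our own illustration, on a small example graph) carries out exactly this recipe: lift $f_0$ to the directed edges, apply $\tilde{P}^k$, and project back to the vertices; the result approaches $\pi(v) = d_v/\mathrm{vol}(G)$, as Theorem 2 below asserts.

```python
import numpy as np

# Illustrative graph (5-cycle plus a chord), minimum degree 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)

darts = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(darts)}
P_t = np.zeros((len(darts), len(darts)))
for (u, v) in darts:
    for w in range(n):
        if A[v, w] and w != u:
            P_t[idx[(u, v)], idx[(v, w)]] = 1.0 / (deg[v] - 1)

f0 = np.zeros(n)
f0[0] = 1.0                                            # start at vertex 0
f_t = np.array([f0[u] / deg[u] for (u, v) in darts])   # lift to the directed edges
for _ in range(200):
    f_t = f_t @ P_t                                    # k non-backtracking steps

fk = np.zeros(n)
for (u, v) in darts:
    fk[v] += f_t[idx[(u, v)]]                          # project back to the vertices
print(np.round(fk, 4))
print(np.round(deg / deg.sum(), 4))                    # pi(v) = d_v / vol(G)
```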
Theorem 2. Given a graph $G$ and a starting distribution $f_0$ on the vertices of $G$, define $f_k$ to be the distribution on the vertices after $k$ non-backtracking random walk steps. Define the distribution $\pi$ by $\pi(v) = d_v/\mathrm{vol}(G)$ (note that this is the stationary distribution for an ordinary random walk on $G$). If the matrix $\tilde{P}$ is irreducible and aperiodic, then for any starting distribution $f_0$ on $V$, we have
$$\lim_{k\to\infty} f_k(v) = \pi(v) \quad \text{for all } v \in V.$$
Proof. As described above, take the distribution $f_0$ on vertices to the corresponding distribution $\tilde{f}_0$ on directed edges, and define $\tilde{f}_k = \tilde{f}_0\tilde{P}^k$. Then by Lemma 2, $\tilde{f}_k$ converges to $\tilde{\pi}$. Now $f_k(v) = \sum_{u : u \sim v}\tilde{f}_k(u,v)$, and observe that
$$\sum_{u : u\sim v}\tilde{\pi}(u,v) = \sum_{u : u\sim v}\frac{1}{2m} = \frac{d_v}{2m} = \frac{d_v}{\mathrm{vol}(G)} = \pi(v).$$
So pulling the distribution $\tilde{\pi}$ on directed edges back to a distribution on the vertices yields $\pi$, and the result follows. □
Definition 1. The $\chi$-squared distance for measuring convergence of a random walk is defined by
![]()
Notice that since
,
![]()
Theorem 3. Let $\mu_1, \mu_2, \ldots, \mu_{2m}$ be the eigenvalues of $\tilde{P}$. Then the convergence rate for the non-backtracking random walk with respect to the $\chi$
-squared distance is bounded above by ![]()
Proof. We have
![]()
Observe that $\tilde{f}_0 - \tilde{\pi}$ is orthogonal to $\mathbf{1}$, which is the eigenvector for $\mu_1 = 1$, so we see that
![]()
Therefore,
![]()
□
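As an empirical illustration (ours, not the paper’s), one can compare the observed per-step decay of the distance to $\tilde{\pi}$ with the second largest eigenvalue modulus of $\tilde{P}$; the example graph is the same small one used above, and the observed ratio may oscillate when the subdominant eigenvalues are complex.

```python
import numpy as np

# Build P-tilde for the illustrative 5-cycle-with-a-chord graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)
darts = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(darts)}
P_t = np.zeros((len(darts), len(darts)))
for (u, v) in darts:
    for w in range(n):
        if A[v, w] and w != u:
            P_t[idx[(u, v)], idx[(v, w)]] = 1.0 / (deg[v] - 1)

mu = np.linalg.eigvals(P_t)
mu = mu[np.argsort(-np.abs(mu))]            # sort by modulus; mu[0] = 1
pi_t = np.full(len(darts), 1.0 / len(darts))

f_t = np.zeros(len(darts))
f_t[0] = 1.0                                # start concentrated on one directed edge
dists = []
for _ in range(40):
    f_t = f_t @ P_t
    dists.append(np.linalg.norm(f_t - pi_t))

print(abs(mu[1]))                           # second largest eigenvalue modulus
print(dists[-1] / dists[-2])                # observed per-step contraction
```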
2.3. Non-Backtracking Walks as Walks on a Directed Graph
The transition probability matrix $\tilde{P}$ for the walk on directed edges can be thought of as the transition matrix for a random walk on a directed line graph of $G$. In this way, the theory of random walks on directed graphs can be applied to analyze non-backtracking random walks. Random walks on directed graphs have been studied by Chung in [12] by way of a directed version of the normalized graph Laplacian matrix. In [12], the Laplacian for a directed graph is defined as follows. Let $P$ be the transition probability matrix for a random walk on the directed graph, and let $\phi$ be its Perron vector, that is, $\phi P = \phi$. Then let $\Phi$ be the diagonal matrix with the entries of $\phi$ along the diagonal. The Laplacian of the directed graph is then defined as
$$\mathcal{L} = I - \frac{\Phi^{1/2} P \Phi^{-1/2} + \Phi^{-1/2} P^{*} \Phi^{1/2}}{2}.$$
This produces a symmetric matrix, which therefore has real eigenvalues, and those eigenvalues are related to the convergence rate of a random walk on the directed graph. In particular, the convergence rate is bounded above by $2/\lambda_1$, where $\lambda_1$ is the second smallest eigenvalue of $\mathcal{L}$ (see Theorem 7 of [12]).
Applying this now to non-backtracking random walks, define $\tilde{P}$ as before. As seen above, the Perron vector $\phi$ is the constant vector with $\phi(e) = \frac{1}{2m}$ for every directed edge $e$. The directed Laplacian for a non-backtracking walk therefore becomes
$$\tilde{\mathcal{L}} = I - \frac{\tilde{P} + \tilde{P}^{*}}{2}.$$
Then Theorem 1 of [12], applied to the matrix $\tilde{\mathcal{L}}$ as defined, gives the Rayleigh quotient for a function $f : \vec{E} \to \mathbb{R}$ as
$$R(f) = \frac{\sum_{e_1 \to e_2} \big(f(e_1) - f(e_2)\big)^2\, \tilde{P}(e_1, e_2)}{2\sum_{e \in \vec{E}} f(e)^2},$$
where the numerator is summed over ordered pairs of directed edges $e_1, e_2$ for which $e_2$ can follow $e_1$ in a non-backtracking walk. From this it is clear that $\tilde{\mathcal{L}}$ is positive semidefinite with smallest eigenvalue $0$. If $0 = \tilde{\lambda}_0 \le \tilde{\lambda}_1 \le \cdots \le \tilde{\lambda}_{2m-1}$ are the eigenvalues of $\tilde{\mathcal{L}}$, then Theorem 7 from [12] implies that the convergence rate for the corresponding random walk is bounded above by
$$\frac{2}{\tilde{\lambda}_1}.$$
We remark that for an ordinary random walk on an undirected graph $G$, the convergence rate is also on the order of $2/\lambda_1$, where $\lambda_1$ now denotes the second smallest eigenvalue of the normalized Laplacian $\mathcal{L}$ of the undirected graph $G$. Note that
$$\lambda_1 = \min_{\substack{f \ne 0 \\ \sum_v d_v f(v) = 0}} R_{\mathcal{L}}(f), \qquad \text{where } R_{\mathcal{L}}(f) = \frac{\sum_{u \sim v}\big(f(u) - f(v)\big)^2}{\sum_{v} d_v f(v)^2}$$
denotes the Rayleigh quotient with respect to $\mathcal{L}$, and
$$\tilde{\lambda}_1 = \min_{\substack{\tilde f \ne 0 \\ \sum_{e} \tilde f(e) = 0}} R(\tilde f),$$
with $R$ the Rayleigh quotient given above.
The following result shows that the Laplacian bound does not give an improvement for non-backtracking random walks over ordinary random walks.
Proposition 1. Let $G$ be any graph, let $\mathcal{L}$ be the normalized graph Laplacian of $G$, and let $\tilde{\mathcal{L}}$ be the non-backtracking Laplacian defined above, with second smallest eigenvalues $\lambda_1$ and $\tilde{\lambda}_1$ respectively. Then we have
$$\tilde{\lambda}_1 \le \lambda_1.$$
Proof. Let $f$ be the function orthogonal to the degree vector (that is, $\sum_v d_v f(v) = 0$) that achieves the minimum in the Rayleigh quotient for $\mathcal{L}$, so
$$\lambda_1 = R_{\mathcal{L}}(f) = \frac{\sum_{u\sim v}\big(f(u)-f(v)\big)^2}{\sum_v d_v f(v)^2}.$$
Define $\tilde{f} : \vec{E} \to \mathbb{R}$ by $\tilde{f}(u,v) = f(u)$. Observe that
$$\sum_{(u,v)\in\vec{E}} \tilde{f}(u,v) = \sum_{u\in V} d_u f(u) = 0.$$
So $\tilde{f}$ is orthogonal to $\mathbf{1}$. Therefore
$$\tilde{\lambda}_1 \le R(\tilde{f}) = \frac{\sum_{(u,v)\to(v,w)}\big(f(u)-f(v)\big)^2\,\tilde{P}\big((u,v),(v,w)\big)}{2\sum_{(u,v)\in\vec{E}} f(u)^2} = \frac{\sum_{u\sim v}\big(f(u)-f(v)\big)^2}{\sum_v d_v f(v)^2} = \lambda_1.$$
□
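Under the reduction above (uniform Perron vector, so that $\tilde{\mathcal{L}} = I - (\tilde{P}+\tilde{P}^{*})/2$), Proposition 1 is easy to check numerically; the following sketch is our own and uses an illustrative example graph.

```python
import numpy as np

# Illustrative graph (5-cycle plus a chord); build the normalized Laplacian
# L = I - D^{-1/2} A D^{-1/2} and the edge matrix P-tilde as before.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)
L = np.eye(n) - np.diag(deg**-0.5) @ A @ np.diag(deg**-0.5)

darts = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(darts)}
P_t = np.zeros((len(darts), len(darts)))
for (u, v) in darts:
    for w in range(n):
        if A[v, w] and w != u:
            P_t[idx[(u, v)], idx[(v, w)]] = 1.0 / (deg[v] - 1)

# With the uniform Perron vector, the directed Laplacian is I - (P~ + P~^T)/2.
L_t = np.eye(len(darts)) - (P_t + P_t.T) / 2.0

lam1 = np.sort(np.linalg.eigvalsh(L))[1]        # second smallest eigenvalue of L
lam1_t = np.sort(np.linalg.eigvalsh(L_t))[1]    # second smallest eigenvalue of L~
print(lam1_t <= lam1 + 1e-12, lam1_t, lam1)     # Proposition 1: lam1_t <= lam1
```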
2.4. Ihara’s Theorem
The transition probability matrix $\tilde{P}$ defined above is a weighted version of an important matrix that comes up in the study of zeta functions on finite graphs. We define $B$ to be the $2m \times 2m$ matrix with rows and columns indexed by the set of directed edges of $G$ as follows:
$$B\big((u,v),(x,y)\big) = \begin{cases} 1 & \text{if } x = v \text{ and } y \ne u,\\ 0 & \text{otherwise.}\end{cases}$$
The matrix $B$ can be thought of as a non-backtracking edge adjacency matrix, and the entries of $B^k$ count the number of non-backtracking walks of length $k$ from one directed edge to another, in the same way that the entries of the powers $A^k$ of the adjacency matrix count the number of walks of length $k$ from one vertex to another. The expression $\det(I - uB)$ is closely related to zeta functions on finite graphs. A result known as Ihara’s Theorem further relates such zeta functions to a determinant expression involving the adjacency matrix. While we will not go into zeta functions on finite graphs in this paper, the following result, equivalent to Ihara’s theorem, will be of interest to us.
Ihara’s Theorem. For a graph $G$ on $n$ vertices and $m$ edges, let $B$ be the matrix defined above, let $A$ denote the adjacency matrix, $D$ the diagonal degree matrix, and $I$ the identity. Then
$$\det(I - uB) = \big(1 - u^2\big)^{m-n}\det\big(I - uA + u^2(D - I)\big).$$
We remark that the expression $\det(I - uB)$ is, up to a power of $u$, the characteristic polynomial of $B$ evaluated at $1/u$. In this way the complete spectrum of the matrix $B$ is given by the reciprocals of the roots of the polynomial $(1-u^2)^{m-n}\det\big(I - uA + u^2(D-I)\big)$. Numerous proofs of this result exist in the literature [7] - [11]. For completeness, we will include here an elementary proof that uses only basic linear algebra. To the knowledge of the author, this proof is original. To begin, we will need a lemma giving a well-known property of determinants.
Lemma 3. Let $M$ be an $r \times s$ matrix, $N$ an $s \times r$ matrix, and $A$ an invertible $r \times r$ matrix. Then
$$\det(A + MN) = \det(A)\,\det\big(I_s + N A^{-1} M\big).$$
Proof. Note that
$$\begin{pmatrix} A & -M\\ N & I \end{pmatrix} = \begin{pmatrix} I & -M\\ 0 & I\end{pmatrix}\begin{pmatrix} A + MN & 0\\ N & I\end{pmatrix} = \begin{pmatrix} A & 0\\ N & I\end{pmatrix}\begin{pmatrix} I & -A^{-1}M\\ 0 & I + NA^{-1}M\end{pmatrix}.$$
Taking determinants of both sides gives the result. □
Proof of Ihara’s Theorem. Define $S$ to be the $n \times 2m$ matrix with
$$S(v, e) = \begin{cases} 1 & \text{if } v \text{ is the terminal vertex of the directed edge } e,\\ 0 & \text{otherwise,}\end{cases}$$
so $S$ is the endpoint incidence operator. Define $T$ to be the $n \times 2m$ matrix given by
$$T(v, e) = \begin{cases} 1 & \text{if } v \text{ is the initial vertex of the directed edge } e,\\ 0 & \text{otherwise,}\end{cases}$$
so $T$ is the starting point incidence operator. We will also define $\tau$ to be the $2m \times 2m$ matrix giving the reversal operator that switches a directed edge with its opposite. That is,
$$\tau\big((u,v),(x,y)\big) = \begin{cases} 1 & \text{if } x = v \text{ and } y = u,\\ 0 & \text{otherwise.}\end{cases}$$
Now, a straightforward computation verifies that
(1)
(2)
and
(3)
Then from Lemma 3 and (1) we obtain
![]()
where $u$ is chosen so that the matrix $I + u\tau$ is invertible.
Observe that $\tau^2 = I$, so that $(I + u\tau)(I - u\tau) = (1 - u^2)I$, so $(I + u\tau)^{-1} = \frac{1}{1-u^2}(I - u\tau)$. Thus, applying (2) and (3), the above becomes
![]()
where the last step is obtained by observing that $\det(I + u\tau) = (1-u^2)^m$. This is the desired equality for our choice of $u$. Both sides are polynomials of finite degree in $u$, and there are infinitely many $u$ that make $I + u\tau$ invertible, so the equality holds for all $u$. □
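Ihara’s Theorem itself is easy to confirm numerically; the sketch below (ours, with an illustrative example graph) builds $B$ and checks the determinant identity at one generic value of $u$.

```python
import numpy as np

# Illustrative graph: 5-cycle with one chord; n = 5 vertices, m = 6 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n, m = 5, len(edges)
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
D = np.diag(A.sum(axis=1))

darts = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(darts)}
B = np.zeros((2 * m, 2 * m))
for (u, v) in darts:
    for w in range(n):
        if A[v, w] and w != u:                 # (u,v) feeds into (v,w), no backtracking
            B[idx[(u, v)], idx[(v, w)]] = 1.0

u_val = 0.37                                   # any generic value of u works
lhs = np.linalg.det(np.eye(2 * m) - u_val * B)
rhs = (1 - u_val**2) ** (m - n) * np.linalg.det(
    np.eye(n) - u_val * A + u_val**2 * (D - np.eye(n)))
print(np.isclose(lhs, rhs), lhs, rhs)
```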
3. A Weighted Ihara’s Theorem
In this section, we will give a weighted version of Ihara’s Theorem. The proof presented in the previous section does not lend itself well to generalization to the weighted setting, so we will not follow that strategy. Rather, we will follow the main ideas of the proof of Ihara’s theorem found in [11] to obtain our weighted version of this result.
To each vertex $x \in V$ we assign a weight $w_x > 0$, and let $W$ be the $n \times n$ diagonal matrix given by $W(x,x) = w_x$.
Define $S$ and $T$ to be the matrices from the proof of Ihara’s Theorem in the previous section, and define weighted versions $S_W$ and $T_W$ of them, so that $S_W$ is the weighted version of the endpoint vertex-edge incidence operator, and $T_W$ is the weighted version of the starting point vertex-edge incidence operator. Define $\tau$ as in the proof of Ihara’s Theorem, and define $\tau_W$ to be the weighted version of $\tau$, that is,
![]()
Finally, define the $2m \times 2m$ matrix $B_W$ by
$$B_W\big((u,v),(x,y)\big) = \begin{cases} w_v & \text{if } x = v \text{ and } y \ne u,\\ 0 & \text{otherwise.}\end{cases} \qquad (4)$$
Then $B_W$ is the weighted version of the non-backtracking edge adjacency matrix $B$ seen above in Ihara’s theorem, with $w_v$ the weight on the directed edge $(u,v)$. We remark that if we take $w_x = \frac{1}{d_x - 1}$ for each $x \in V$, then $B_W$ is exactly the transition probability matrix $\tilde{P}$ for a non-backtracking random walk on the directed edges of $G$ defined in Section 2.2. This case is our primary focus, but we note that our computations apply for arbitrary positive weights assigned to the vertices.
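A quick sanity check of this remark (our own sketch, with an illustrative example graph): taking $w_x = 1/(d_x - 1)$ in (4) reproduces the matrix $\tilde{P}$ of Section 2.2 exactly.

```python
import numpy as np

# Illustrative graph (5-cycle plus a chord), minimum degree 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)
w = 1.0 / (deg - 1.0)                          # the weights w_x = 1/(d_x - 1)

darts = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(darts)}
B_w = np.zeros((len(darts), len(darts)))       # weighted edge matrix from (4)
P_t = np.zeros((len(darts), len(darts)))       # transition matrix from Section 2.2
for (u, v) in darts:
    for x in range(n):
        if A[v, x] and x != u:
            B_w[idx[(u, v)], idx[(v, x)]] = w[v]
            P_t[idx[(u, v)], idx[(v, x)]] = 1.0 / (deg[v] - 1)

print(np.array_equal(B_w, P_t))
```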
Now, a straightforward computation verifies that
(5)
and
(6)
We will define
. Note that
, so this is the adjacency matrix for the weighted graph with edge weights
. The matrix
is similar to
, so when
, this is the matrix whose entries are the transition probabilities for a single step of a non-backtracking random walk on G.
From (5) and (6) we obtain the following equations.
(7)
(8)
We define
to be the diagonal
matrix
and observe that
. It then follows that
(9)
(10)
We remark that in the proof in [11] , they use the unweighted versions of each of these matrices, so
rather than
yields
. Hence S and T will factor through
, so that the
term stays on the right hand side of the above equations. Here we have
is a
diagonal matrix with
Depending on the $w_x$’s, this matrix might not behave nicely with respect to the action of $S$ and $T$, hence the extra terms that need to stay on the left-hand side above. This difference from [11] is one of the primary difficulties in generalizing this result.
We will now perform a change of basis to see how the operator
behaves with respect to the decomposition of the space of functions
as the direct sum of
and
. To this end, fix any basis of the subspace
, and let R be the
matrix whose columns are the vectors of that basis (note that
has rank n). Define
. This will be our change of basis
matrix. To obtain the inverse of M, form the matrix
and observe that
![]()
Therefore we have that
.
Applying this change of basis, a direct computation using (7) and (9) yields
(11)
Therefore, the matrix
is similar to the matrix
, so they have the same determinant. Thus, we have
proven a weighted version of Ihara’s Theorem, which we state as the following.
Theorem 4. Let $G$ be a graph on $n$ vertices and $m$ edges, and assign an arbitrary positive weight $w_x$ to each vertex $x$. Let $B_W$ be the $2m \times 2m$ weighted non-backtracking edge adjacency matrix with edge weight $w_v$ assigned to the directed edge $(u,v)$, as defined in (4). Let
be the weighted
adjacency matrix with edge weight
assigned to each edge. Let $\tau_W$ be the weighted reversal operator defined above, and
the
diagonal matrix with
as defined above. Then we have
![]()
As a corollary to the decomposition in Equation (11), if we take $w_x = 1$ for all $x$, then $W = I$ and $B_W = B$, and the usual unweighted Ihara’s Theorem falls out immediately.
If we take $w_x = \frac{1}{d_x - 1}$, then $B_W$ becomes the transition probability matrix $\tilde{P}$ for the non-backtracking walk on directed edges, and
. This is
clearly similar to the matrix
. So in this case
is similar to the matrix whose entries are the transition probabilities for a single step in a non-backtracking random walk. (Note, however, that
is not the transition probability matrix for a non-backtracking random walk.)
4. The Mixing Rate of Non-Backtracking Random Walks
4.1. An Alternate Proof for Regular Graphs
Applying the results of the previous section to regular graphs yields a different proof of the results from [1] on the mixing rate of non-backtracking random walks on regular graphs.
Let $G$ be a regular graph in which each vertex has degree $d$. Then choosing $w_x = \frac{1}{d-1}$ for all $x$ gives us that $B_W = \tilde{P}$ is the transition probability matrix for the non-backtracking random walk on $G$. We remark that, from the previous
section, we have
,
,
, and
.
Therefore, the decomposition in (11) becomes
![]()
Noting that
can be thought of as block diagonal with m blocks of the form
, then taking determinants, we find that
![]()
and hence
$$\det\big(I - u\tilde{P}\big) = \left(1 - \frac{u^2}{(d-1)^2}\right)^{m-n}\prod_{\lambda}\left(1 - \frac{\lambda u}{d-1} + \frac{u^2}{d-1}\right),$$
where the product ranges over all the eigenvalues $\lambda$ of the adjacency matrix $A$ of $G$. As remarked previously, the left hand side $\det(I - u\tilde{P})$ is, up to a power of $u$, the characteristic polynomial of $\tilde{P}$ evaluated at $1/u$, so from this we obtain the spectrum of $\tilde{P}$.
Theorem 5. Let $G$ be a $d$-regular graph with $m$ edges and $n$ vertices, and let $\tilde{P}$ be the $2m \times 2m$ transition probability matrix for a non-backtracking random walk as defined above. Then the eigenvalues of $\tilde{P}$ are
$$\frac{\lambda \pm \sqrt{\lambda^2 - 4(d-1)}}{2(d-1)},$$
where $\lambda$ ranges over the eigenvalues of the adjacency matrix $A$, together with $\frac{1}{d-1}$ and $-\frac{1}{d-1}$, which each have multiplicity $m - n$.
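Theorem 5 can be checked directly; the sketch below (ours) uses the complete graph $K_5$ (which is 4-regular with $n = 5$ and $m = 10$, an example of our own choosing) and compares the eigenvalues of $\tilde{P}$ with the values the theorem predicts.

```python
import numpy as np

# K_5 is 4-regular with n = 5 vertices and m = 10 edges.
n, d = 5, 4
A = np.ones((n, n)) - np.eye(n)
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
m = len(edges)

darts = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(darts)}
P_t = np.zeros((2 * m, 2 * m))
for (u, v) in darts:
    for w in range(n):
        if A[v, w] and w != u:
            P_t[idx[(u, v)], idx[(v, w)]] = 1.0 / (d - 1)

# Predicted spectrum from Theorem 5.
pred = []
for lam in np.linalg.eigvalsh(A).astype(complex):
    root = np.sqrt(lam**2 - 4 * (d - 1))
    pred += [(lam + root) / (2 * (d - 1)), (lam - root) / (2 * (d - 1))]
pred += [1.0 / (d - 1)] * (m - n) + [-1.0 / (d - 1)] * (m - n)

key = lambda z: (round(z.real, 6), round(z.imag, 6))
pred = sorted(np.array(pred, dtype=complex), key=key)
got = sorted(np.linalg.eigvals(P_t), key=key)
print(np.allclose(pred, got, atol=1e-6))
```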
From this we obtain the result from [1] .
Corollary 1. Let $G$ be a non-bipartite, connected $d$-regular graph on $n$ vertices for $d \ge 3$, and let $\rho$ and $\tilde{\rho}$ denote the mixing rates of a simple and a non-backtracking random walk on $G$, respectively. Let $\lambda$ be the second largest eigenvalue of the adjacency matrix of $G$ in absolute value.
If $\lambda \ge 2\sqrt{d-1}$, then
![]()
If $\lambda < 2\sqrt{d-1}$
and
, then
![]()
Proof. We remark that the expression $\frac{\lambda \pm \sqrt{\lambda^2 - 4(d-1)}}{2(d-1)}$ is precisely the expression derived by Alon et al. in [1] for the mixing rate of a non-backtracking random walk on a regular graph, and we may proceed with the analysis of the convergence rate in the same way they do. The convergence rate is given by the second largest eigenvalue of $\tilde{P}$ in absolute value, which is obtained by setting $\lambda$ to be the second largest eigenvalue of $A$ in absolute value. Let $\lambda$ denote this eigenvalue.
For $\lambda \ge 2\sqrt{d-1}$ we have
![]()
So $\tilde{\rho} = \frac{\lambda + \sqrt{\lambda^2 - 4(d-1)}}{2(d-1)}$. Since $\frac{\lambda}{d}$ is the second largest eigenvalue of the transition probability matrix $P$ for the usual walk, the first case follows.
For $\lambda < 2\sqrt{d-1}$, the expression $\frac{\lambda \pm \sqrt{\lambda^2 - 4(d-1)}}{2(d-1)}$ is complex, and we obtain
![]()
so $\tilde{\rho} = \frac{1}{\sqrt{d-1}}$.
We remark that in this case, a classic result of Nilli [13] related to
the Alon-Boppana Theorem implies that we are never too far below this bound. Indeed, the result states that if G is d-regular with diameter at least
, then
. With the restriction that
, then the diameter is at
least
, and so
, and the second case follows. □
4.2. Biregular Graphs
A graph $G$ is called $(c,d)$-biregular if it is bipartite and each vertex in one part of the bipartition has degree $c$, and each vertex of the other part has degree $d$. In the weighted
Ihara’s Theorem, we have
, so in the case where G is
-biregular, then we have
. So since
is a multiple of the
identity, as with regular graphs, in the decomposition (11), the
term can be taken to the other side of the equation. Note that
is diagonal with
if u has degree c, or
if
u has degree d. Then
is diagonal with entry
or
Hence
the decomposition (11) becomes
![]()
where $A$ is the adjacency matrix of $G$.
Note that
is similar to a block diagonal matrix with blocks of the form
, so taking the determinant above we obtain
![]()
so
![]()
We will look at the matrix
. Suppose the first part in the
bipartition of G has size r, and the second part has size s, where without loss of generality,
$r \ge s$. By row reduction, this has the same determinant as the matrix
![]()
which is
![]()
Now, the above determinant is given by the product of the eigenvalues of the matrix. Observe that if $\lambda$ is an eigenvalue of the adjacency matrix $A$, then $\lambda^2$ is an eigenvalue of $A^2$. Therefore, in all we have
![]()
where the product ranges over the $s$ largest eigenvalues of $A$ (or, in other words, $\lambda^2$ ranges over the $s$ largest eigenvalues of $A^2$). Therefore the characteristic polynomial is given by
![]()
Thus we can explicitly obtain the eigenvalues of $\tilde{P}$.
Theorem 6. Let $G$ be a $(c,d)$-biregular graph, let the part with degree $c$ have size $r$, and the part with degree $d$ have size $s$, and assume without loss of generality that $r \ge s$. Suppose $G$ has $n$ vertices and $m$ edges. Then the eigenvalues of the non-backtracking transition probability matrix $\tilde{P}$ defined above are
$$\pm\frac{1}{\sqrt{(c-1)(d-1)}}, \ \text{each with multiplicity } m - n, \qquad \text{and} \qquad \pm\frac{i}{\sqrt{d-1}}, \ \text{each with multiplicity } r - s,$$
as well as the 4 roots of the polynomial
$$(c-1)(d-1)\,x^4 + \big(c + d - 2 - \lambda^2\big)x^2 + 1$$
for each value of $\lambda$ ranging over the $s$ largest eigenvalues of the adjacency matrix $A$. These roots are
$$x = \pm\sqrt{\frac{\lambda^2 - c - d + 2 \pm \sqrt{\big(\lambda^2 - c - d + 2\big)^2 - 4(c-1)(d-1)}}{2(c-1)(d-1)}}. \qquad (12)$$
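As in the regular case, this can be confirmed numerically; the following sketch (ours) uses the complete bipartite graph $K_{2,3}$, an example of our own choosing which is $(2,3)$-biregular, and compares the eigenvalues of $\tilde{P}$ with the values predicted above.

```python
import numpy as np

# K_{2,3}: vertices 0,1 have degree 3 and vertices 2,3,4 have degree 2, so in
# the notation of Theorem 6 we have c = 2, d = 3, r = 3, s = 2, n = 5, m = 6.
c, d, r, s = 2, 3, 3, 2
edges = [(u, v) for u in (0, 1) for v in (2, 3, 4)]
n, m = 5, len(edges)
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)

darts = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(darts)}
P_t = np.zeros((2 * m, 2 * m))
for (u, v) in darts:
    for w in range(n):
        if A[v, w] and w != u:
            P_t[idx[(u, v)], idx[(v, w)]] = 1.0 / (deg[v] - 1)

# Predicted spectrum from Theorem 6.
pred = [1 / np.sqrt((c - 1) * (d - 1)), -1 / np.sqrt((c - 1) * (d - 1))] * (m - n)
pred += [1j / np.sqrt(d - 1), -1j / np.sqrt(d - 1)] * (r - s)
for lam in np.sort(np.linalg.eigvalsh(A))[::-1][:s]:
    pred += list(np.roots([(c - 1) * (d - 1), 0, c + d - 2 - lam**2, 0, 1]))

key = lambda z: (round(z.real, 6), round(z.imag, 6))
pred = sorted(np.array(pred, dtype=complex), key=key)
got = sorted(np.linalg.eigvals(P_t), key=key)
print(np.allclose(pred, got, atol=1e-6))
```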
We can now give a version of Corollary 1 for $(c,d)$-biregular graphs.
Corollary 2. Let $G$ be a $(c,d)$-biregular graph with
. Let $\rho$ be the square of the second largest eigenvalue of the transition probability matrix $P$ for a random walk on $G$, and let $\tilde{\rho}$ be the square of the second largest modulus of an eigenvalue of $\tilde{P}$. Let $\lambda$ be the second largest eigenvalue of the adjacency matrix of $G$. Then we have the following cases.
If $\lambda \ge \sqrt{c-1} + \sqrt{d-1}$, then
![]()
If $\lambda < \sqrt{c-1} + \sqrt{d-1}$
and both c and d are
, then
![]()
Proof. We need to compare the eigenvalues of $\tilde{P}$ to the eigenvalues of $P$. Note that for $\lambda$ an eigenvalue of $A$, we have
![]()
which implies
and
. Then observe
![]()
so the eigenvalues of $P$ are $\frac{\lambda}{\sqrt{cd}}$, where $\lambda$ ranges over the eigenvalues of $A$. Note that the largest eigenvalue of $A$ is $\sqrt{cd}$.
Let $x$ equal the expression (12), and consider the following cases.
If $\lambda \ge \sqrt{c-1} + \sqrt{d-1}$, then $x$ is real. Direct computation verifies that, evaluating the expression (12) at
yields
and
for
in this range. Therefore, in this case the eigenvalue of $\tilde{P}$ always has smaller absolute value than the corresponding eigenvalue of $P$, implying $\tilde{\rho} \le \rho$. The lower bound follows from (12) by ignoring the square root inside. Thus the first case follows.
If $\lambda < \sqrt{c-1} + \sqrt{d-1}$, then $x$ is complex, and direct computation shows
![]()
so $\tilde{\rho} = \frac{1}{\sqrt{(c-1)(d-1)}}$.
A version of the Alon–Boppana Theorem exists for $(c,d)$-biregular graphs as well, proven by Feng and Li in [14] (see also [15]).
Theorem 7 ([14]). Let $G$ be a $(c,d)$-biregular graph, and let $\lambda$ be the second largest eigenvalue of the adjacency matrix $A$ of $G$. Then
![]()
where the diameter of G is greater than
.
Observe that certainly the diameter is at least
, so that the condition on the degrees and Theorem 7 imply that
![]()
As $n \to \infty$, this gives the result for the second case. □
5. Conclusions
We have looked at non-backtracking random walks from the point of view of walking along directed edges. For the special cases of regular and biregular graphs, our weighted version of Ihara’s Theorem (Theorem 4) has given us the complete spectrum of the transition probability matrix for the non-backtracking walk, allowing for easy comparison between the non-backtracking mixing rate and the mixing rate of the usual random walk. Clearly, it would be desirable to extend these results to more general classes of graphs. The difficulty in applying Theorem 4 directly is with the term involving the weighted reversal operator $\tau_W$. As seen in Section 3, the relevant matrix is a $2m \times 2m$ diagonal matrix whose entry for the directed edge $(u,v)$ is
$$w_u w_v.$$
In the case of regular and biregular graphs, this expression is constant (we get $\frac{1}{(d-1)^2}$ and $\frac{1}{(c-1)(d-1)}$ for the $d$-regular and $(c,d)$-biregular cases respectively), making this matrix simply a multiple of the identity. This allows the difficulty to be handled relatively easily. Regular and biregular graphs are in fact the only graphs for which this matrix is a multiple of the identity, suggesting that these exact techniques will not work as nicely on more general classes of graphs. If a cleaner version of Theorem 4 could be proven, then, aside from being interesting in its own right, it could potentially be used to extend our results on non-backtracking random walks.
Acknowledgements
The author would like to thank Fan Chung for numerous helpful discussions throughout the process of writing this paper.