A Novel Design Method for Protein-Like Molecules from the Perspective of Sheaf Theory
1. Introduction
Proteins are folded sequences of amino acids that perform a variety of functions in cells. They perform these functions by interacting with other proteins and with small-molecule ligands (as in enzyme-substrate interactions).
In protein-protein interactions, proteins interact with each other by forming temporary complexes called “reaction intermediates”. The stability of a reaction intermediate then depends on shape complementarity at the protein-protein interface (i.e., the contact area on the surface).
In protein-ligand interactions, proteins bind to one or more small molecule ligands at pockets (or grooves) on their surfaces. Specificity and affinity of the interactions then depend on shape complementarity at the ligand-binding pockets.
In both cases, the functions of proteins are largely determined by their shape. Since structural data for thousands of protein-protein interfaces and ligand-binding pockets are available in the PDB database [1], it is conceivable that artificial proteins could be created by combining these known structures. On the other hand, mathematics offers Sheaf Theory as a framework for patching local data together to obtain global data.
In this paper, we propose a novel design method for artificial protein-like molecules (i.e., folded sequences of basic units) with a given shape. In the method, a new molecule is obtained from a given set of known molecules using the framework of Sheaf Theory. The design of protein-like molecules is carried out in two steps:
1) Specify the shape of a new molecule.
2) Find a folded sequence of basic units that forms the specified shape.
Note that it is not trivial to combine proteins with known structures to form a new protein (i.e., a folded sequence of amino acids). For example, since a local surface structure is often formed by multiple amino acid fragments which are distant in the amino acid sequence, the local surface structure may be unfolded in the new molecule if the corresponding fragments are not arranged adequately in the new amino acid sequence (in other words, proteins are neither “rigid” like holomorphic functions nor “flexible” like continuous functions).
In this paper, protein-like molecules are represented as a closed trajectory in a flow of n-simplices. Due to page limitations, only the two-dimensional case (i.e., flows of 2-simplices) is considered. We then propose a novel design method, called the “incremental design method”, which uses the framework of Sheaf Theory to compute a closed trajectory (i.e., a new molecule) from a given set of shorter closed trajectories (i.e., smaller known molecules). We believe this method is essential, especially when designing hybrids of known proteins.
In the past, mathematical studies of protein structure have been concerned mostly with the classification and characterization of protein structures [2]-[7]. The author is unaware of any other mathematical studies on the design of protein-like molecules. For an overview of protein-like molecules, see [8]. No prior knowledge of Protein Science, Sheaf Theory [9], or Category Theory [10] is required.
A quick review of Sheaf Theory is given: Let U be a subset of a 2D Euclidean space R2. Let
be a covering of U, i.e., a set of subsets in R2 such that
. Suppose that each subset V of R2 is associated with a set F(V) of mathematical data. Let σ be a function
defined “consistently” on A. In Sheaf Theory, we can compute a value of F(U) by patching together the values
on A. For example, in the case of the sheaf of continuous functions on R2, F(V) is the set of continuous functions defined on an open set V in R2. We then obtain a “global” continuous function on an open set U by patching together “local” continuous functions
on
.
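The patching step can be illustrated with a minimal sketch. It assumes a toy setting that is not part of the paper: each "local" datum is a Python dict from sample points to values, and two local data are "consistent" when they agree on every shared point. The function name `glue` is hypothetical.

```python
def glue(local_sections):
    """Patch local sections (dicts point -> value) into one global section.

    Returns None when two local sections disagree on a shared point,
    i.e. when the local data are not "consistent" on an overlap.
    """
    glued = {}
    for section in local_sections:
        for point, value in section.items():
            if point in glued and glued[point] != value:
                return None  # inconsistent on an overlap: nothing to glue
            glued[point] = value
    return glued


# Two local samples of f(x) = x * x on overlapping finite "patches".
s1 = {0: 0, 1: 1, 2: 4}
s2 = {2: 4, 3: 9}
consistent = glue([s1, s2])
inconsistent = glue([s1, {2: 5, 3: 9}])
print(consistent, inconsistent)
```

For continuous functions the gluing always succeeds once the overlaps agree; the design method discussed in this paper differs precisely in that patching can fail for some coverings.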
Figure 1 illustrates the design method proposed in this paper. In our case, F(V) is a set of closed trajectories on V. Figure 1(a) is an example of our design method. Given a subset U of R2 (left end) and a covering
of U (second from left). Suppose that closed trajectories
and
are given (third from left). We then obtain a closed trajectory
(right end) by patching together the two closed trajectories
and
(enclosed closed trajectories are considered part of the enclosing closed trajectory).
This is where the difficulty arises. In the case of sheaves, we can compute the global data on U by patching together the local data on “any” covering of U (if they are “consistent”). In our case, on the other hand, the computation fails for some coverings A. (Note that
can be the empty set because the restriction of an element of F(U) on
may not be contained in
.) Figure 1(b) is an example of unsuccessful computation. Given a subset U of R2 (left end) and a covering
of U (second from left). Suppose that
and
are given (third from left). Then, patching together
and
, we obtain two closed trajectories. In Section 5, we consider sufficient conditions for “local” flows on a covering to produce a single closed trajectory.
This paper is organized as follows. Section 2 explains the loop model of protein-like molecules. Section 3 defines a differential geometric structure on a triangular mesh B. Section 4 formulates the protein design problem from the perspective of Sheaf Theory, where the design problem is rephrased into the “incremental design problem”. Section 5 studies the incremental design problem. Due to page limitations, we only consider the case where a covering consists of two smaller molecules. Finally, Section 6 presents discussion and future directions.
2. The Loop Model of Protein-Like Molecules
The shape of a molecule is given as a region on a hexagonal mesh H. A molecule then corresponds to a closed trajectory on a triangular mesh B, which is a subdivision of H. New molecules are designed using a differential structure defined on B.
2.1. Regions on a Hexagonal Mesh H
Figure 2 is explained in this subsection. Shown in Figure 2(a) is the honeycomb mesh obtained by dividing a 2D Euclidean plane R2 into a set of regular hexagons. H denotes the set of all hexagons of the mesh. A subset S of H is called connected if each
shares a side with another
(i.e., for each
, there exists another
such that
and
share a side).
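The connectivity condition above can be checked mechanically. The sketch below is only an illustration under stated assumptions: hexagons are labeled by axial coordinates (q, r), which the paper does not use, and a single hexagon is treated as connected because the paper later counts single hexagons as integral regions. The function names are hypothetical. Note that the condition as worded (each hexagon shares a side with another) is weaker than path-connectivity.

```python
# Axial coordinates (q, r) for hexagons are an assumption of this sketch.
AXIAL_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def shares_side_with_another(hexagon, subset):
    """True if some side-neighbor of `hexagon` also lies in `subset`."""
    q, r = hexagon
    return any((q + dq, r + dr) in subset for dq, dr in AXIAL_NEIGHBORS)


def is_connected_as_worded(subset):
    """Literal reading of the text: every hexagon shares a side with another.

    A subset with at most one hexagon is accepted (assumption, see lead-in).
    """
    subset = set(subset)
    if len(subset) <= 1:
        return True
    return all(shares_side_with_another(h, subset) for h in subset)


print(is_connected_as_worded({(0, 0), (1, 0), (1, -1)}))
print(is_connected_as_worded({(0, 0), (2, 0)}))
```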
Shown in Figure 2(b) is a connected subset
of H. Since hexagons of H do not overlap each other, we write
Figure 1. The design method for protein-like molecules proposed in this paper.
Figure 2. The mathematical model of the shape of protein-like molecules.
. (1)
If S consists of only one hexagon
, we write either
or
.
Shown in Figure 2(c) is an integral region on H, defined as follows. (Addition “+” of hexagons will be defined later in this subsection.)
Definition 2.1. (Integral Region) An integral region
on H is a hole-free subset of R2 covered by a connected finite subset
of H. The hexagonal base of
is then defined by
. (2)
For example, the hexagonal base of
of Figure 2(c) is shown in Figure 2(b).
denotes the set of all integral regions on H. Note that
, i.e., hexagons of H are integral regions.
The set difference between two integral regions
is defined by
, (3)
where
. The hexagonal base
of
is defined in the same way as for integral regions (it may have holes).
Lemma 2.2. Let
. Then,
if
.
Shown in Figure 2(d) are region-intermediates on H, defined as follows. Let
.
is called connected if each
shares a side with another
.
is called disjoint if
’s do not overlap each other. If
is disjoint, we write
. (4)
If
consists of only one integral region
, we write either
or
.
Definition 2.3. (Region-Intermediate) A region-intermediate
on H is a connected finite disjoint subset
of
. Since S is disjoint, we write
. (5)
The hexagonal base of
is then defined by
. (6)
RI denotes the set of all region-intermediates on H.
Finally, addition of integral regions is defined using addition of directed polygonal chains as shown below.
Definition 2.4. (Directed Polygonal Chain) Let
be points in R2. A directed polygonal chain
in R2 is a set of directed line segments defined by
, (7)
where
denotes the directed line segment from
to
. If
, we obtain a closed directed polygonal chain in R2.
denotes the area of R2 bounded by
in R2.
Let
be a set of closed directed polygonal chains in R2.
is called disjoint if
’s do not overlap each other. If
is disjoint, we write
. (8)
If
is disjoint, the area
of R2 bounded by
is defined by
. (9)
Let
. Since
has no hole, we have
(10)
for some
, where the vertices are labeled counter-clockwise. In the case of Figure 2(c),
. (11)
Definition 2.5. (The Boundary Operator ∂ on RI) Let
, where the vertices are labeled counter-clockwise. The boundary
of
is defined by
. (12)
In this paper, the boundary of an integral region is always given the counter-clockwise orientation. Let
. Since
is disjoint, the boundary
of
is defined by
(13)
(see Equation (8)). Note that
.
Addition of integral regions is defined as follows.
Definition 2.6. (
) Let
such that they do not overlap. Addition of
and
is defined by
. (14)
In other words, the same line segments in opposite directions (i.e.,
and
) are cancelled when added. Addition of
and
is then defined by
. (15)
In the case of Figure 2(c), we have
. (16)
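The cancellation rule for boundaries can be sketched in a few lines. This is an illustration under assumptions, not the paper's construction: a closed directed polygonal chain is stored as a set of directed segments (pairs of points), unit squares are used instead of hexagons to keep the coordinates simple, and the helper names are hypothetical.

```python
def closed_chain(vertices):
    """Directed segments of the closed polygonal chain through `vertices`."""
    n = len(vertices)
    return {(vertices[i], vertices[(i + 1) % n]) for i in range(n)}


def add_chains(chain_a, chain_b):
    """Union of directed segments, dropping each segment that meets its reverse."""
    edges = chain_a | chain_b
    return {(p, q) for (p, q) in edges if (q, p) not in edges}


# Two non-overlapping unit squares sharing the side x = 1,
# both with counter-clockwise boundaries.
left = closed_chain([(0, 0), (1, 0), (1, 1), (0, 1)])
right = closed_chain([(1, 0), (2, 0), (2, 1), (1, 1)])
total = add_chains(left, right)
print(sorted(total))
```

The shared side appears once in each direction and disappears from the sum, so `total` is the counter-clockwise boundary of the 2-by-1 rectangle.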
We denote the set of all natural numbers
by
.
Lemma 2.7.
The set of all integral regions on H is given by
. (17)
The set of all region-intermediates on H is given by
. (18)
Proof. They follow immediately from the definitions.∎
Remark 2.8. Integral regions play the role that “integers” do for rational numbers. That is, “rational” regions are obtained by dividing integral regions into loops of triangles [11].
2.2. Loops on a Triangular Mesh B
Figure 3 is explained in this subsection. Shown in Figure 3(a) is the triangular mesh obtained by dividing every hexagon of H into 6 equilateral triangles. B denotes the set of all triangles of the mesh.
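The subdivision of a hexagon into six equilateral triangles can be written out explicitly. The sketch below assumes a regular hexagon of side 1 centered at a given point with one vertex on the positive x-axis; the orientation and the function names are choices made for this example only.

```python
import math


def hexagon_vertices(center, side=1.0):
    """Vertices of a regular hexagon, listed counter-clockwise."""
    cx, cy = center
    return [(cx + side * math.cos(math.pi / 3 * k),
             cy + side * math.sin(math.pi / 3 * k)) for k in range(6)]


def subdivide(center, side=1.0):
    """Six triangles (center, v_k, v_{k+1}) that tile the hexagon."""
    v = hexagon_vertices(center, side)
    return [(center, v[k], v[(k + 1) % 6]) for k in range(6)]


triangles = subdivide((0.0, 0.0))
for a, b, c in triangles:
    # Every edge has the same length, so each triangle is equilateral.
    print([round(math.hypot(p[0] - q[0], p[1] - q[1]), 6)
           for p, q in ((a, b), (b, c), (c, a))])
```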
Definition 2.9. (Trajectories of Triangles) A trajectory of triangles on B is a sequence of triangles of B in which consecutive triangles share a common edge. (No direction is assigned to a trajectory.) The edges not used to connect adjacent triangles are called the normal edges (of the trajectory) at the triangle (i.e., the “normal vector” of the trajectory). In figures, normal edges are indicated by thick line segments.
Definition 2.10. (Loops of Triangles) A loop on B is a closed trajectory of triangles on B. In this paper, protein-like molecules are represented as a loop of triangles of B.
denotes the area in R2 swept by a loop
, where the area enclosed by
is also included in
(for example,
of Figure 1(a)).
A loop
of length 6 is called a hexagonal loop. A hexagonal loop
is denoted by
if
. In other words,
denotes both a loop of length 6 and a hexagon of H, i.e.,
.
Figure 3. The mathematical model of protein-like molecules.
Remark 2.11. The hexagonal base
of
defined above is a region-intermediate consisting of hexagons as well as a loop-intermediate consisting of loops of length 6.
Shown in Figure 3(b) is a set
of seven hexagonal loops on H. Since
’s do not overlap each other, we write
. (19)
If L consists of only one hexagon
, we write either
or
.
Shown in Figure 3(c) is an integral loop on B, defined as follows.
Definition 2.12. (Integral Loop) A loop
on B is called integral if
is an integral region on H.
denotes the set of all integral loops, i.e.,
. (20)
is called an implementation of
if
. For example,
of Figure 3(c) is an implementation of
of Figure 2(c).
Let
. The set
of integral regions associated with L is defined by
. (21)
If
is disjoint, we write
. (22)
If L consists of only one integral loop
, we write either
or
.
Shown in Figure 3(d) top is a loop-intermediate on B, defined as follows.
Definition 2.13. (Loop-Intermediate) A loop-intermediate on B is a finite subset
of
such that
is a region-intermediate on H. Since
is disjoint, we write
. (23)
LI denotes the set of all loop-intermediates on B, i.e.,
. (24)
Let
.
is called an implementation of
if
. Note that some region-intermediates have no implementation. For example,
of Figure 2(d) has no implementation (Figure 3(d) bottom).
Now, fusion and fission of integral loops are defined using addition of the corresponding integral regions (addition of integral loops is considered in Section 4).
Definition 2.14. (Fusion and Fission of Integral Loops) Let
. Let
.
is called the fusion of
if
. (25)
is then called a fission of
. In Figure 3, both
and
are fissions of
.
Finally, we define flows of triangles on B.
Definition 2.15. (Flows of Triangles on B) A flow ψ of triangles on B is an assignment of normal edges to triangles of B, i.e.,
(26)
ψ is called regular at t if
consists of one edge (i.e., t is connected to exactly two adjacent triangles). ψ is called regular if ψ is regular at all triangles of B.
denotes the set of all regular flows on B.
A triangle t of B is called singular if it is not regular. Singular triangles are called branch, terminal, or isolated triangles when they have no normal edges (i.e., connected to all the adjacent triangles), two normal edges (i.e., connected to only one adjacent triangle), or three normal edges (i.e., connected to no adjacent triangles), respectively.
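The classification of a triangle by its number of normal edges can be summarized as a small lookup. This is a minimal sketch; the function name is hypothetical, and the only input is the count of normal edges assigned to the triangle by the flow.

```python
def classify_triangle(num_normal_edges):
    """Name a triangle by how many of its three edges are normal edges."""
    names = {
        0: "branch",    # connected to all three adjacent triangles
        1: "regular",   # connected to exactly two adjacent triangles
        2: "terminal",  # connected to only one adjacent triangle
        3: "isolated",  # connected to no adjacent triangle
    }
    if num_normal_edges not in names:
        raise ValueError("a triangle has between 0 and 3 normal edges")
    return names[num_normal_edges]


print([classify_triangle(k) for k in range(4)])
```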
Remark 2.16. We often consider trajectories of triangles without explicit reference to the corresponding flow ψ. A triangle t is then called regular if ψ is regular at t.
Definition 2.17. (Disjoint Unions
and
) Let
.
denotes the set of all the loops of ψ. Since trajectories of ψ do not overlap, we have
, (27)
where
’s are the loops of ψ.
is called the loops associated with ψ.
denotes the associated regions, i.e.,
, (28)
is called the regions associated with ψ.
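On a finite patch, the loops of a regular flow can be recovered by following connections. The sketch below assumes a representation that is not in the paper: `connections` maps each triangle label to the two triangles it is connected to, and the patch is finite and closed under these connections. Under that assumption every triangle has exactly two connected neighbors, so the flow splits into disjoint closed trajectories.

```python
def loops_of_regular_flow(connections):
    """Split a finite regular flow into its loops.

    `connections` maps each triangle label to the two triangles it is
    connected to (those across its two non-normal edges).
    """
    for t, nbrs in connections.items():
        if len(set(nbrs)) != 2 or any(t not in connections.get(u, ()) for u in nbrs):
            raise ValueError(f"flow is not regular or not symmetric at {t!r}")
    seen, loops = set(), []
    for start in connections:
        if start in seen:
            continue
        loop, stack = [], [start]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                loop.append(t)
                stack.extend(connections[t])
        loops.append(loop)
    return loops


def ring(name, length=6):
    """Connections of one closed trajectory with `length` triangles."""
    return {(name, k): ((name, (k - 1) % length), (name, (k + 1) % length))
            for k in range(length)}


# Two disjoint hexagonal loops (loops of length 6).
flow = {**ring("A"), **ring("B")}
loops = loops_of_regular_flow(flow)
print(len(loops), [len(loop) for loop in loops])
```

Counting the returned loops is the check that matters for the design problem: a successful design yields exactly one loop.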
Lemma 2.18. Let
.
1)
if
is finite, connected, and hole-free.
2)
if
is finite and connected.
2.3. Design Problem for Protein-Like Molecules
In the loop model of protein-like molecules, the shape of a new molecule is an integral region
. A new molecule of the shape
then is an implementation
of
. The problem we consider in this paper is now defined as follows.
Problem 2.19. (Design of Protein-like Molecules) Given
, find
such that
.
In the next section, the problem is rephrased using a differential geometric structure on B.
3. Differential Geometric Structure on B
A differential geometric structure on B is naturally obtained by embedding the honeycomb mesh H in a 3D Euclidean space R3, as shown in this section. We denote the set of all real numbers by R.
3.1. Embedding of H in R3
Shown in Figure 4(a) is a unit cube in R3 and its orthogonal projection on the plane H0 in R3 defined by
. (29)
H is embedded in H0 using unit cubes in R3, as explained below.
Definition 3.1. (Unit Cubes in R3) Let
.
denotes the unit cube at
, i.e.,
, (30)
where
is the closed interval in R between x and y. If
, then
is also written as
. UC denotes the set of all unit cubes at the integer lattice
, i.e.,
(31)
The height
of
is defined by
. (32)
Remark 3.2.
is given by
(33)
Shown in Figure 4(a) top is a unit cube
with vertices
,
,
,
,
,
,
, and
. The vertical diagonals OU, OV, and OW are drawn as thick line segments.
Definition 3.3. (Projection π of R3 onto H0) π is the orthogonal projection of R3 onto
defined by
. (34)
Shown in Figure 4(a) bottom is the orthogonal projection of OU, OV, and OW onto
, forming part of a hexagonal mesh on
.
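The link between unit cubes and hexagons can be checked numerically. Because the defining equation of H0 is not reproduced in this text, the sketch assumes that H0 is the plane x + y + z = 0 through the origin; under that assumption the orthogonal projection removes the component along (1, 1, 1). The function name `project` is hypothetical.

```python
import math


def project(p):
    """Orthogonal projection onto the plane x + y + z = 0 (assumed form of H0)."""
    s = sum(p) / 3.0
    return tuple(c - s for c in p)


cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
radii = {v: math.sqrt(sum(c * c for c in project(v))) for v in cube}

# (0,0,0) and (1,1,1) land on the center; the other six vertices land on a
# circle of radius sqrt(2/3), i.e. on the corners of a regular hexagon.
for v in cube:
    print(v, round(radii[v], 6))
```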
Definition 3.4. (“Bumpy” Mesh
)
is the “bumpy” honeycomb mesh defined on the top surfaces of
Figure 4. Differential geometric structure on the triangular mesh B.
(35)
The edges of the mesh are the vertical diagonals.
denotes the set of “bumpy” hexagons drawn on
(Figure 4(b) top).
Shown in Figure 4(b) bottom is the projection of
onto
by π. In the following, we identify H with
. An embedding of B in
is then obtained by dividing every hexagon of
into 6 equilateral triangles.
3.2. Tangent Cones
Shown in Figure 4(c) top-left is a tangent cone
to a region-intermediate
, defined as follows. Roughly speaking, a 3D cone with multiple tops is obtained by stacking unit cubes diagonally (from
to
).
Definition 3.5. (Tangent Cone Cone A) Let
. The tangent cone
generated by A is defined by
(36)
TCONE denotes the set of all tangent cones, i.e.,
(37)
P (TCONE) denotes the set of all subsets of TCONE, i.e., the power set of TCONE.
Definition 3.6. (Tops and Bottoms of Cone A) Let
. The top vertices of
are the peaks of the cone.
denotes the set of all top vertices of
. Note that
. The bottom vertices of
are the dents of the cone which are peaks if we look up the cone from
.
denotes the set of all bottom vertices of
.
We define flows of triangles on B using tangent cones in R3.
Definition 3.7. (The Flow
on B) Let
. Note that the surfaces of
consist of the top faces of unit cubes of UC. Taking their vertical diagonals as normal edges, we obtain a regular flow of triangles on the surface of
(Figure 4(c) top-right). Projecting the regular flow onto
by π, we obtain a regular flow on B. The regular flow is called the flow on B induced by
and denoted by
.
Definition 3.8.
denotes the set of all regular flows on B induced by tangent cones, i.e.,
. (38)
and
denote the corresponding set of region-intermediates and loop-intermediates, respectively, i.e.,
. (39)
. (40)
The author has no proof of the following claim.
Claim 3.9.
.
In this paper, only flows of FLWTCONE are considered.
Remark 3.10.
. For example,
of Figure 2(d) is not contained in
.
3.3. Tangent Cones to M0
Here we define a tangent “space” to
.
Definition 3.11. (The Boundary Cone
) Let
, (41)
where
. (42)
The boundary cone
to
is defined by
, (43)
where
’s are points on the top surfaces of
such that
. (44)
Remark 3.12. Since π is a one-to-one mapping between the top surfaces of
and
, the boundary cone
exists for all
.
Definition 3.13. (The Set TM0 of Tangent Cones) Let
. The set
of tangent cones to
is defined by
, (45)
where
is the surfaces of C, i.e.,
. (46)
Note that
.
Remark 3.14.
does not imply
.
Lemma 3.15. (The Base Tangent Cone
) Let
. There exists
such that
, (47)
for
. (48)
is called the base tangent cone associated with
.
Proof. Since π gives a one-to-one mapping between the top surfaces of
and
, the result follows immediately.∎
Definition 3.16. (Mapping T) Assigning
to each
, we obtain a mapping T from
to P (TCONE). Let
. A section σ of T on S is a mapping from S to P (TCONE) such that
for all
.
denotes the set of all sections of T on S.
The design problem is now rephrased as follows.
Problem 3.17. (Design of Protein-like Molecules) Given
, find
such that
.
4. Loop Design Problem from the Perspective of Sheaf Theory
To mimic Sheaf Theory, “subsets” of a region-intermediate
are defined using a binary relation over
. A “covering”
of
is then defined as a set of region-intermediates such that
is the least upper bound of S with respect to the binary relation. An implementation of
is obtained as the sum of implementations of
, where addition is defined using transformations on
as shown below.
4.1. Binary Relation over RITCONE and LITCONE
Shown in Figure 5(a) is a binary relation over
, defined as follows.
Definition 4.1. (Binary Relation ≤ over
) Let
. Then,
if and only if, for any
,
there exist
such that
. (49)
In figures, we often use the arrow
to indicate
.
Shown in Figure 5(b) is the binary relation over
induced by the binary relation ≤ over
. That is,
Definition 4.2. (Binary Relation ≤ over
) Let
. Then,
if and only if, for any
,
there exists
such that
. (50)
In figures, we often use the arrow
to indicate
.
Remark 4.3. Notations such as
and
are used to explicitly indicate the binary relation equipped with a set.
Lemma 4.4. Let
. Then,
if
. (51)
That is, T is a “covariant” mapping from
to
.
Shown in Figure 5(c) are examples of the greatest lower bound of loop-intermediates, defined as follows.
Definition 4.5. (⋀S and ⋁S) Let
. The greatest lower bound ⋀S of S is the greatest element of
that is less than or equal to each element
Figure 5. Binary relation ≤ over
and
.
of S. The least upper bound ⋁S of S is the least element of RI that is greater than or equal to each element of S. ⋀S and ⋁S for
are also defined similarly.
Remark 4.6. In general, there are multiple candidates for ⋀S and ⋁S. In such cases, select one of them arbitrarily. Because of this uncertainty, “
for all
” does not imply M0 ≤ ⋀S.
Remark 4.7.
for any
, where
denotes the empty set.
We use the following lemma to find “subsets” of a region-intermediate.
Lemma 4.8. Let
and
. Then,
if and only if
. (52)
Proof.
if and only if
.
if and only if
. The result follows immediately. ∎
4.2. Coverings of a Region-Intermediate
Two types of coverings are defined as follows.
Definition 4.9. (Coverings of a Region-Intermediate) Let
. Let
. S is called a covering of M0 if ⋁S = M0.
Definition 4.10. (Topological Coverings of an Integral Region) Let
. Let
. V is called a topological covering of
if 1)
, and 2) for each
, there exists another
such that
.
Remark 4.11. Since some integral regions have no implementation (i.e., there exists
such that
for any
), topological coverings may have no sections on them.
Lemma 4.12. Let
. Let
be a topological covering of . A covering of
is then obtained by
. (53)
Example 4.13. In Figure 5(c),
is a topological covering of X.
is a covering of X.
The proposed design method uses a specific type of covering (in Problem 4.38).
Definition 4.14. (Hexagonal Covering SV of an Integral Region) Let
be a topological covering of
. The hexagonal covering
of
associated with V is defined by
. (54)
Lemma 4.15.
is a covering of
.
The design problem is now rephrased as follows.
Problem 4.16. (Incremental Design of Protein-like Molecules) Given 1) a target shape
:
, 2) a topological covering V of
:
, 3) a section σ of T on V:
. Then, compute
such that
by patching “local” loop-intermediates
,
,
, and
together.
4.3. Transformations on LITCONE Induced by UC
To patch loop-intermediates together, we define addition of loop-intermediates using transformations on TCONE, defined as follows.
Definition 4.17. (TRANS (TCONE)) A transformation on TCONE is a mapping from TCONE to TCONE. TRANS (TCONE) denotes the set of all transformations on TCONE.
Let
and
. We use the symbol “
” to denote the transformation of C by A, i.e.,
.
is also called the action of A on C. Let
. We use the symbol “
” to denote the composition of transformations, i.e.,
. (55)
Example 4.18. Unit cubes induce transformations on TCONE as follows. Let
, where
, i.e.,
. (56)
Taking the unit cube
at
from C, we obtain another tangent cone
, (57)
where
,
, and
. Conversely, putting the unit cube
on
, we obtain the original cone C.
Definition 4.19. (The minimal L-cone CL) Let
and
. C is called an L-cone if
. The tangent cone
is the minimal L-cone with respect to set inclusion, i.e.,
for any L-cone C. Since
,
always exists and is uniquely determined by L.
Transformations on TCONE induce transformations on
as follows.
Definition 4.20. (Transformations on
) Let
and
. The transformation of L by A is defined by
. (58)
is called the action of A on L.
Definition 4.21. (Transformations
,
, and
) Let
. Two transformations
and
on TCONE induced by
are defined by
, (59)
, (60)
where
.
is called the put & fill-action by
on C.
is called the take & clear-action by
on C. We denote the composition of P after T by PT, i.e.,
. (61)
is called the take & put-action by
on C.
Remark 4.22. After the action of
on
, the cube
is always visible from
. On the other hand, after the action of
on C,
may not be visible from
.
Lemma 4.23. Let
and
. Then,
if
, (62)
if
. (63)
Definition 4.24. (
)
denotes the set of all the transformations on TCONE generated by finite compositions of
and
(
), i.e.,
(64)
In general,
for
and
Example 4.25. Let
,
such that
. Then,
, (65)
. (66)
Definition 4.26. (Well-defined Transformations) Let
. G is called well-defined if the action of G on TCONE does not depend on the order of
’s, i.e.,
for all
. (67)
for any permutation ρ of
.
Remark 4.27. If G is well-defined, removed unit cubes are removed forever and placed unit cubes are placed forever.
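Order-independence of a composition can be tested by brute force on small examples. The sketch below is a toy model and not the paper's put & fill or take & clear actions: a "cone" is reduced to a frozenset of cube labels, `put` adds a cube, `take` removes one, and no filling or clearing is modeled. All names are hypothetical.

```python
from functools import reduce
from itertools import permutations


def apply_in_order(transforms, state):
    """Apply the transformations one after another, left to right."""
    return reduce(lambda s, f: f(s), transforms, state)


def is_order_independent(transforms, states):
    """True if every ordering of `transforms` gives the same result on each state."""
    for state in states:
        results = {apply_in_order(order, state) for order in permutations(transforms)}
        if len(results) > 1:
            return False
    return True


def put(cube):
    return lambda s: s | {cube}   # toy "put": add a cube label


def take(cube):
    return lambda s: s - {cube}   # toy "take": remove a cube label


states = [frozenset(), frozenset({"a", "b"})]
print(is_order_independent([put("a"), take("b")], states))  # different cubes
print(is_order_independent([put("a"), take("a")], states))  # same cube twice
```

Putting and taking the same cube gives different results depending on the order, which matches the intuition of Remark 4.27 that in a well-defined composition a cube is either placed for good or removed for good.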
4.4. Addition on LITCONE
Addition of loop-intermediates is now defined using transformations on TCONE.
Definition 4.28. (Transformations on TM0) Let
. The set
of transformations on
is defined by
. (68)
Lemma 4.29. Let
and
. Then,
. (69)
Lemma 4.30. Let
and
. Then,
If
, then
. (70)
Definition 4.31. (The Relative Transformation
) Let
. The relative transformation
of L with respect to H is defined by
, (71)
where
, (72)
. (73)
is defined by
for
.
Lemma 4.32. Let
and
such that
. Then,
is well-defined and
. (74)
Remark 4.33. The hexagonal base
is a loop-intermediate consisting of loops of length 6 as well as a region-intermediate consisting of hexagons.
Lemma 4.34. Let
and
such that
. Then,
. (75)
Addition of loop-intermediates is now defined as follows.
Definition 4.35. (Addition of Loop-Intermediates) Let
and
such that
. Then,
. (76)
Remark 4.36. Addition
is defined with respect to
, which is not explicitly indicated in the formula.
Definition 4.37. (Section
on
) Let
be a topological covering of
. Let
. The section
of T on the hexagonal covering
is defined by
. (77)
Note that
. (78)
Since
, we can define addition
(79)
by Definition 4.35.
Using addition of loop-intermediates, the design problem is now rephrased as follows.
Problem 4.38. (Incremental Design of Protein-like Molecules) Given 1) a target shape
:
; 2) a topological covering V of
:
; 3) a section σ of T on V:
. Then, we obtain
such that
by
(80)
where
. The question here is “when does L0 consist of a single loop?”
5. Incremental Design of Protein-Like Molecules (N = 2)
In general, the sum of loops is not a loop. In this section, we consider sufficient conditions for the L0 of Problem 4.38 to be a loop. Due to page limitations, we only consider a topological covering consisting of two integral regions. The incremental design problem is then given as follows.
Problem 5.1. (Incremental Design of Protein-like Molecules (N = 2)) Given 1) a target shape
:
; 2) a topological covering V of
:
; 3) a section σ of T on V:
. Then, we obtain
such that
by
(81)
where
. Find sufficient conditions for
to be a loop.
5.1. Closer Look at the Action of
Figure 6 shows the effect of the action of
on
using the height of normal edges, defined as follows.
Definition 5.2. (Height of Normal Edges) Let
be a vertical diagonal of a top face of
. The height
of
is defined by
. (82)
Let
. Let
be a normal edge of the induced flow
such that
, where
is the corresponding vertical diagonal on the surfaces of C. The height
of
(with respect to C) is defined by
Figure 6. The action of
on
.
. (83)
Shown in Figure 6(a) top is the top view of the base tangent cone
(84)
of some
. Shown in Figure 6(a) bottom is the loop-intermediate
(85)
on B. By definition, the heights of all normal edges of
are 0.
Shown in Figure 6(b) top is the tangent cone
. (86)
In the upper part, two Y-shaped sets of normal edges are replaced by two inverted Y-shaped sets of normal edges (thick line segments) by putting the two unit cubes
and
. In the lower part, a Y-shaped set of normal edges is replaced by an inverted Y-shaped set of normal edges (thick line segments) by taking the unit cube
.
Shown in Figure 6(b) bottom is the loop-intermediate
. (87)
The light grey area indicates the projection image of
and
by π. The dark grey area indicates the projection image of the removed
by π. Note that normal edges of different heights are not directly connected.
Lemma 5.3. If two normal edges
and
of a flow of triangles are connected, the difference of their heights is even, i.e.,
. (88)
In Figure 6(c) top, the normal edges of height −1 are connected to the normal edges of height 1 by putting the unit cube
of height −1 on the tangent cone of Figure 6(b). In Figure 6(c) bottom, the light grey area (height −1) and the dark grey area (height 1) are now in contact.
In Figure 6(d) top, the normal edges of height −2 are connected to the normal edges of height 0 by putting the unit cube
of height −2 on the tangent cone of Figure 6(c). In Figure 6(d) bottom, the white area indicates the projection image of
. Note that the grey area (height 0) and the white area (height −2) are in contact.
5.2. Computation of Loops from the Hexagonal Base
In the loop model, it is easier to design an integral loop from scratch than to design a “hybrid” of known integral loops, since the area enclosed by a loop
is included in
. (In protein science, it is a formidable task to design a novel artificial protein from scratch.)
Let
. Disconnecting normal edges of
along the boundary, we obtain
such that
as explained below.
Shown in Figure 7(a) top is the top view of the base tangent cone
(89)
of some
. Shown in Figure 7(a) bottom is the loop-intermediate
(90)
on B. By definition, the heights of all normal edges of
are 0.
In Figure 7(b), normal edges of height 0 are disconnected by putting unit cubes of height −1 along the boundary on
. The light grey area in Figure 7(b) bottom indicates the projection image of the added unit cubes.
In Figure 7(c), normal edges of height 0 are disconnected by taking unit cubes of height 0 along the boundary from
. The dark grey area in Figure 7(c) bottom indicates the projection image of the removed unit cubes.
In Figure 7(d) left, a loop-intermediate consisting of three integral loops is obtained by putting 8 unit cubes of height −1 and taking a unit cube of height 0 along the boundary. Then, taking another unit cube of height 0 at the meeting point of the boundaries of the three loops, we obtain the integral loop shown in Figure 7(d) right.
5.3. Sufficient Conditions for L0 to be a Loop
Sufficient conditions for L0 of Problem 5.1 to be a loop are given using the two concepts defined below.
Definition 5.4. (The set
of normal edges) Let
.
denotes the set of all the normal edges of height 0 contained in L.
Definition 5.5. (Rifts of a Loop-Intermediate) A crack of
is a polygonal chain of normal edges of L connected to the boundary of L. A crack is called a rift if it consists of more than one normal edge.
Lemma 5.6. Let
. Let
such that
. Let
. If
is well-defined, then
. (91)
Figure 7. Computation of loops from the hexagonal base.
In general, the set
shrinks monotonically as more loop-intermediates are added.
Proof. Because of Remark 4.27, removed normal edges of height 0 are removed forever. ∎
Proposition 5.7. (Sufficient Conditions to be a Loop) Settings are the same as for Problem 5.1. Let
, i.e.,
. (92)
Then,
consists of a single loop if
is well-defined and one of the following three conditions is satisfied:
1) No cracks of
and
connect to
. There is at most one rift of
that penetrates into
through
. No rift of
penetrates into
through
(Figure 8(a) and Figure 8(b)).
2) No cracks of
and
connect to
. There is at most one rift of
that penetrates into
through
. No rift of
penetrates into
through
.
3) Both
and
have a crack connected to
. No rift of
penetrates into
through
. No rift of
penetrates into
through
(Figure 8(c)).
Proof. Since
is well-defined, normal edges of height 0 are not added as a result of addition by Lemma 5.6. That is, cracks are extended only by normal edges of height n, where
such that
and n is even. “No cracks of
and
connect to
” implies
is not separated by a polygonal chain contained in
.
Since
and
bring no normal edges of height n (
) into
and
, respectively, the result follows. ∎
Shown in Figure 9 are examples where the sufficient conditions are not satisfied.
In Figure 9(a), both
and
have a rift that penetrates into
through
and
, respectively. In Figure 9(b),
has two rifts that penetrate into
through
. In Figure 9(c), both
and
have a crack connected to
, and
has a rift that penetrates into
through
.
Figure 10(a) is the case given in Figure 1(a). Shown in Figure 10(a) bottom are all normal edges of height 0, where both
and
have a rift consisting of
Figure 8. Sufficient conditions for L0 of Problem 5.1 to be a loop.
Figure 9. Examples where the sufficient conditions are not satisfied.
Figure 10. Incremental design of protein-like molecules (Examples given in Figure 1).
normal edges of height 0. All other normal edges have height 1. As a result of addition, some of the normal edges of height 0 are removed and we obtain the loop
.
Figure 10(b) is the case given in Figure 1(b). Shown in Figure 10(b) bottom are all normal edges of height 0 and height −1 (thick line segments). All other normal edges have height 1. In this case,
is not well-defined and we cannot use Lemma 5.6. As a result of addition, the rift of
is extended by normal edges of height 0 and we obtain two loops
and
.
6. Discussion
A novel design method for protein-like molecules is proposed from the perspective of Sheaf Theory. In this method, a new molecule of a given shape is obtained as the sum of smaller molecules. Since the sum of loops is not a loop in general, sufficient conditions for a sum to be a loop are also considered. We believe this method is essential, especially when designing hybrids of known proteins.
Previous mathematical studies of protein structure have focused primarily on the characterization and classification of structures, and the author is aware of no other mathematical research on protein design. As such, this study is still in its infancy and leaves much room for improvement. The author hopes that this paper will encourage more mathematicians to take an interest in mathematical research on protein design.
There are two directions for future research. One is the study of the three-dimensional case, in which protein-like molecules are represented as loops of tetrahedra [12]. The other is the study of loops on hexagonal meshes other than the “flat” mesh H considered in this paper [13]. Examples include hexagonal meshes on the surface of 3D molecules (i.e., loops of tetrahedra). Note that a 2D triangular flow is induced on the surface of a complex of loops of tetrahedra.
In the three-dimensional case, two difficulties arise. First, the shape of a molecule is given on a mesh of dodecahedra, where a dodecahedron can be divided into four loops of tetrahedra (a hexagon cannot be divided into more than one loop of triangles). Second, the heights of normal edges of tetrahedra are classified into three congruence classes modulo 3, not two congruence classes modulo 2.