A Novel Design Method for Protein-Like Molecules from the Perspective of Sheaf Theory
Naoto Morikawa
Genocript, Zama, Japan.
DOI: 10.4236/ojdm.2023.133007

Abstract

Proteins perform a variety of functions in living organisms and their functions are largely determined by their shape. In this paper, we propose a novel mathematical method for designing protein-like molecules of a given shape. In the mathematical model, molecules are represented as loops of n-simplices (2-simplices are triangles and 3-simplices are tetrahedra). We design a new molecule of a given shape by patching together a set of smaller molecules that cover the shape. The covering set of small molecules is defined using a binary relation between sets of molecules. A new molecule is then obtained as a sum of the smaller molecules, where addition of molecules is defined using transformations acting on a set of (n + 1)-dimensional cones. Due to page limitations, only the two-dimensional case (i.e., loops of triangles) is considered. No prior knowledge of Sheaf Theory, Category Theory, or Protein Science is required. The author hopes that this paper will encourage further collaboration between Mathematics and Protein Science.

Morikawa, N. (2023) A Novel Design Method for Protein-Like Molecules from the Perspective of Sheaf Theory. Open Journal of Discrete Mathematics, 13, 63-85. doi: 10.4236/ojdm.2023.133007.

1. Introduction

Proteins are folded sequences of amino acids that perform a variety of functions in cells. They perform their functions by interacting with other proteins as well as with small-molecule ligands (as in enzyme-substrate interactions).

In protein-protein interactions, proteins interact with each other by forming temporary complexes called “reaction intermediates”. The stability of a reaction intermediate then depends on shape complementarity at the protein-protein interface (i.e., the contact area on the surface).

In protein-ligand interactions, proteins bind to one or more small molecule ligands at pockets (or grooves) on their surfaces. Specificity and affinity of the interactions then depend on shape complementarity at the ligand-binding pockets.

In both cases, the functions of proteins are largely determined by their shape. Since structural data for thousands of protein-protein interfaces and ligand-binding pockets are available in the PDB database [1], it is conceivable that artificial proteins could be created by combining these known structures. Mathematics, for its part, offers Sheaf Theory as a framework for patching local data together to obtain global data.

In this paper, we propose a novel design method for artificial protein-like molecules (i.e., folded sequences of basic units) with a given shape. In the method, a new molecule is obtained from a given set of known molecules using the framework of Sheaf Theory. The design of protein-like molecules is carried out in two steps:

1) Specify the shape of a new molecule.

2) Find a folded sequence of basic units that forms the specified shape.

Note that it is not trivial to combine proteins with known structures to form a new protein (i.e., a folded sequence of amino acids). For example, since a local surface structure is often formed by multiple amino acid fragments which are distant in the amino acid sequence, the local surface structure may be unfolded in the new molecule if the corresponding fragments are not arranged adequately in the new amino acid sequence (in other words, proteins are neither “rigid” like holomorphic functions nor “flexible” like continuous functions).

In this paper, protein-like molecules are represented as a closed trajectory in a flow of n-simplices. Due to page limitations, only the two-dimensional case (i.e., flows of 2-simplices) is considered. We then propose a novel design method, called the “incremental design method”, which uses the framework of Sheaf Theory to compute a closed trajectory (i.e., a new molecule) from a given set of shorter closed trajectories (i.e., smaller known molecules). We believe this method is essential, especially when designing hybrids of known proteins.

In the past, mathematical studies of protein structure have been concerned mostly with the classification and characterization of structures [2]-[7]. The author is unaware of any mathematical studies by other researchers on the design of protein-like molecules. For an overview of protein-like molecules, see [8]. No prior knowledge of Protein Science, Sheaf Theory [9], or Category Theory [10] is required.

A quick review of Sheaf Theory is given here. Let $U$ be a subset of the 2D Euclidean space $\mathbb{R}^2$. Let $A = \{V_1, \dots, V_n\}$ be a covering of $U$, i.e., a set of subsets of $\mathbb{R}^2$ such that $U = \bigcup_{i=1}^{n} V_i$. Suppose that each subset $V$ of $\mathbb{R}^2$ is associated with a set $F(V)$ of mathematical data. Let $\sigma$ be a function with $\sigma(V_i) \in F(V_i)$ defined “consistently” on $A$. In Sheaf Theory, we can compute a value of $F(U)$ by patching together the values $\{\sigma(V_1), \dots, \sigma(V_n)\}$ on $A$. For example, in the case of the sheaf of continuous functions on $\mathbb{R}^2$, $F(V)$ is the set of continuous functions defined on an open set $V$ in $\mathbb{R}^2$. We then obtain a “global” continuous function on an open set $U$ by patching together “local” continuous functions $\sigma(V_i)$ on $V_i$.
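As a rough illustration of the patching step, the following Python sketch glues “local” functions given on finite point sets into one “global” function, provided they agree wherever their domains overlap. Representing a local function as a dictionary and the name `glue` are assumptions made here for illustration; they are not part of the paper's formalism.

```python
def glue(local_sections):
    """Patch local functions (dicts: point -> value) into one global function.

    Returns None when two local functions disagree on a shared point,
    i.e., when the family is not "consistent".
    """
    glued = {}
    for section in local_sections:
        for point, value in section.items():
            if point in glued and glued[point] != value:
                return None  # inconsistent on an overlap
            glued[point] = value
    return glued


# Two local functions that agree on the shared point 2.
f1 = {1: "a", 2: "b"}
f2 = {2: "b", 3: "c"}
print(glue([f1, f2]))        # {1: 'a', 2: 'b', 3: 'c'}
print(glue([f1, {2: "z"}]))  # None
```

For sheaves this patching succeeds on any consistent family; the point of the paper is that the analogous step for closed trajectories can fail.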

Figure 1 illustrates the design method proposed in this paper. In our case, $F(V)$ is a set of closed trajectories on $V$. Figure 1(a) is an example of our design method. We are given a subset $U$ of $\mathbb{R}^2$ (left end) and a covering $\{V_1, V_2\}$ of $U$ (second from left). Suppose that closed trajectories $\psi_1 \in F(V_1)$ and $\psi_2 \in F(V_2)$ are given (third from left). We then obtain a closed trajectory $\phi \in F(U)$ (right end) by patching together $\psi_1$ and $\psi_2$ (enclosed closed trajectories are considered part of the enclosing closed trajectory).

This is where the difficulty arises. In the case of sheaves, we can compute the global data on $U$ by patching together the local data on “any” covering of $U$ (provided they are “consistent”). In our case, on the other hand, the computation fails for some coverings $A$. (Note that $\sigma(V_i)$ can be the empty set because the restriction of an element of $F(U)$ to $V_i$ may not be contained in $F(V_i)$.) Figure 1(b) is an example of an unsuccessful computation. We are given a subset $U$ of $\mathbb{R}^2$ (left end) and a covering $\{V_1, V_2\}$ of $U$ (second from left). Suppose that $\psi_1 \in F(V_1)$ and $\psi_2 \in F(V_2)$ are given (third from left). Then, patching together $\psi_1$ and $\psi_2$, we obtain two closed trajectories rather than one. In Section 5, we consider sufficient conditions for “local” flows on a covering to produce a single closed trajectory.

This paper is organized as follows. Section 2 explains the loop model of protein-like molecules. Section 3 defines a differential geometric structure on a triangular mesh B. Section 4 formulates the protein design problem from the perspective of Sheaf Theory, where the design problem is rephrased into the “incremental design problem”. Section 5 studies the incremental design problem. Due to page limitations, we only consider the case where a covering consists of two smaller molecules. Finally, Section 6 presents discussion and future directions.

2. The Loop Model of Protein-Like Molecules

Shapes of molecules are given as a region on a hexagonal mesh H. Molecules then correspond to a closed trajectory on a triangular mesh B, which is a subdivision of H. New molecules are designed using a differential structure defined on B.

2.1. Regions on a Hexagonal Mesh H

Figure 2 is explained in this subsection. Shown in Figure 2(a) is the honeycomb mesh obtained by dividing the 2D Euclidean plane $\mathbb{R}^2$ into a set of regular hexagons. $H$ denotes the set of all hexagons of the mesh. A subset $S$ of $H$ is called connected if each $h_a \in S$ shares a side with another $h_b \in S$ (i.e., for each $h_a \in S$, there exists another $h_b \in S$ such that $h_a$ and $h_b$ share a side).
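The connectivity condition can be checked mechanically. The sketch below addresses hexagons by axial coordinates $(q, r)$, which is an assumption of this example and not notation from the paper, and implements the definition literally: every hexagon of $S$ must share a side with some other hexagon of $S$. Treating a single hexagon as connected is a further assumption, made so that hexagons of $H$ can themselves be integral regions.

```python
# The six axial-coordinate offsets to the side-sharing neighbours of a hexagon.
NEIGHBOR_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def shares_side(ha, hb):
    """True if two hexagons of the mesh share a side."""
    return (hb[0] - ha[0], hb[1] - ha[1]) in NEIGHBOR_STEPS


def is_connected(S):
    """Literal reading of the definition of a connected subset of H."""
    S = set(S)
    if len(S) == 1:
        return True  # assumption: a single hexagon counts as connected
    return all(any(shares_side(ha, hb) for hb in S if hb != ha) for ha in S)


# A centre hexagon together with its six neighbours (seven hexagons).
flower = {(0, 0)} | set(NEIGHBOR_STEPS)
print(is_connected(flower))            # True
print(is_connected({(0, 0), (5, 5)}))  # False
```

Note that this condition is weaker than graph connectivity: two separate pairs of adjacent hexagons also pass the test.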

Shown in Figure 2(b) is a connected subset $S = \{h_1, h_2, \dots, h_7\}$ of $H$. Since hexagons of $H$ do not overlap each other, we write

$$S = h_1 \sqcup h_2 \sqcup \cdots \sqcup h_7. \tag{1}$$

Figure 1. The design method for protein-like molecules proposed in this paper.

Figure 2. The mathematical model of the shape of protein-like molecules.

If $S$ consists of only one hexagon $h_1$, we write either $S = h_1$ or $S = \{h_1\}$.

Shown in Figure 2(c) is an integral region on H, defined as follows. (Addition “+” of hexagons will be defined later in this subsection.)

Definition 2.1. (Integral Region) An integral region $m_0$ on $H$ is a hole-free subset of $\mathbb{R}^2$ covered by a connected finite subset $S = \{h_1, h_2, \dots, h_n\}$ of $H$. The hexagonal base of $m_0$ is then defined by

$$(m_0)_H := h_1 \sqcup h_2 \sqcup \cdots \sqcup h_n. \tag{2}$$

For example, the hexagonal base of $m_0$ of Figure 2(c) is shown in Figure 2(b). $I_H$ denotes the set of all integral regions on $H$. Note that $H \subset I_H$, i.e., hexagons of $H$ are integral regions.

The set difference between two integral regions $m_1, m_2 \in I_H$ is defined by

$$m_1 \setminus m_2 := \{h \in H \mid h \subset m_1 \text{ and } h \not\subset m_2\}, \tag{3}$$

where $\{h_1, h_2, \dots, h_n\} := h_1 \cup h_2 \cup \cdots \cup h_n$. The hexagonal base $(m_1 \setminus m_2)_H$ of $m_1 \setminus m_2$ is defined in the same way as for integral regions (it may have holes).

Lemma 2.2. Let $m_1, m_2 \in I_H$. Then, $m_1 \setminus m_2 \in I_H$ if $m_2 \subset m_1$.

Shown in Figure 2(d) are region-intermediates on $H$, defined as follows. Let $M_0 = \{m_1, m_2, \dots\} \subset I_H$. $M_0$ is called connected if each $m_i \in M_0$ shares a side with another $m_j \in M_0$. $M_0$ is called disjoint if the $m_i$'s do not overlap each other. If $M_0$ is disjoint, we write

$$M_0 = m_1 \sqcup m_2 \sqcup \cdots. \tag{4}$$

If $M_0$ consists of only one integral region $m_1$, we write either $M_0 = m_1$ or $M_0 = \{m_1\}$.

Definition 2.3. (Region-Intermediate) A region-intermediate $M_0$ on $H$ is a connected finite disjoint subset $S = \{m_1, m_2, \dots, m_n\}$ of $I_H$. Since $S$ is disjoint, we write

$$M_0 = m_1 \sqcup m_2 \sqcup \cdots \sqcup m_n. \tag{5}$$

The hexagonal base of $M_0$ is then defined by

$$(M_0)_H := (m_1)_H \sqcup (m_2)_H \sqcup \cdots \sqcup (m_n)_H. \tag{6}$$

$RI$ denotes the set of all region-intermediates on $H$.

Finally, addition of integral regions is defined using addition of directed polygonal chains as shown below.

Definition 2.4. (Directed Polygonal Chain) Let $P_1, P_2, \dots, P_n, P_{n+1}$ be points in $\mathbb{R}^2$. A directed polygonal chain $P_1 P_2 \cdots P_{n+1}$ in $\mathbb{R}^2$ is a set of directed line segments defined by

$$P_1 P_2 \cdots P_{n+1} := \{\overrightarrow{P_1 P_2}, \overrightarrow{P_2 P_3}, \dots, \overrightarrow{P_n P_{n+1}}\}, \tag{7}$$

where $\overrightarrow{P_i P_j}$ denotes the directed line segment from $P_i$ to $P_j$. If $P_{n+1} = P_1$, we obtain a closed directed polygonal chain in $\mathbb{R}^2$. $|P_1 P_2 \cdots P_n P_1|$ denotes the area of $\mathbb{R}^2$ bounded by $P_1 P_2 \cdots P_n P_1$.

Let $C_0 = \{c_1, c_2, \dots\}$ be a set of closed directed polygonal chains in $\mathbb{R}^2$. $C_0$ is called disjoint if the $|c_i|$'s do not overlap each other. If $C_0$ is disjoint, we write

$$C_0 = c_1 \sqcup c_2 \sqcup \cdots. \tag{8}$$

If $C_0$ is disjoint, the area $|C_0|$ of $\mathbb{R}^2$ bounded by $C_0$ is defined by

$$|C_0| := |c_1| \sqcup |c_2| \sqcup \cdots. \tag{9}$$

Let $m_0 \in I_H$. Since $m_0$ has no hole, we have

$$m_0 = |P_1 P_2 \cdots P_n P_1| \tag{10}$$

for some $P_1, P_2, \dots, P_n \in \mathbb{R}^2$, where the vertices are labeled counter-clockwise. In the case of Figure 2(c),

$$m_0 = |P_1 P_2 \cdots P_{20} P_1|. \tag{11}$$

Definition 2.5. (The Boundary Operator $\partial$ on $RI$) Let $m_0 = |P_1 P_2 \cdots P_n P_1| \in I_H$, where the vertices are labeled counter-clockwise. The boundary $\partial m_0$ of $m_0$ is defined by

$$\partial m_0 := P_1 P_2 \cdots P_n P_1. \tag{12}$$

In this paper, the boundary of an integral region is always given the counter-clockwise orientation. Let $M_0 = m_1 \sqcup m_2 \sqcup \cdots \sqcup m_n \in RI$. Since $\{m_1, m_2, \dots, m_n\}$ is disjoint, the boundary $\partial M_0$ of $M_0$ is defined by

$$\partial M_0 := \partial m_1 \sqcup \partial m_2 \sqcup \cdots \sqcup \partial m_n \tag{13}$$

(see Equation (8)). Note that $|\partial M_0| = M_0$.

Addition of integral regions is defined as follows.

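Definition 2.6. ($m_1 + m_2$) Let $m_1 = |P_1 P_2 \cdots P_n P_1|$, $m_2 = |Q_1 Q_2 \cdots Q_n Q_1| \in I_H$ such that they do not overlap. Addition of $\partial m_1$ and $\partial m_2$ is defined by

$$\partial m_1 + \partial m_2 := \{\overrightarrow{P_i P_j} \in \partial m_1 \mid \overrightarrow{P_j P_i} \notin \partial m_2\} \cup \{\overrightarrow{Q_i Q_j} \in \partial m_2 \mid \overrightarrow{Q_j Q_i} \notin \partial m_1\}. \tag{14}$$

In other words, the same line segments in opposite directions (i.e., $\overrightarrow{P_i P_j}$ and $\overrightarrow{P_j P_i}$) are cancelled when added. Addition of $m_1$ and $m_2$ is then defined by

$$m_1 + m_2 := |\partial m_1 + \partial m_2|. \tag{15}$$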
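The cancellation rule can be sketched in a few lines of Python. A boundary is represented as a set of directed edges (pairs of points), and an edge is dropped whenever its reverse appears in the other boundary. The function names `boundary` and `add_boundaries`, and the use of unit squares in place of hexagons, are assumptions made only to keep the example short.

```python
def boundary(vertices):
    """Directed edges of the closed chain P1 P2 ... Pn P1 (vertices listed counter-clockwise)."""
    n = len(vertices)
    return {(vertices[i], vertices[(i + 1) % n]) for i in range(n)}


def add_boundaries(b1, b2):
    """Keep a directed edge unless its reverse lies in the other boundary."""
    keep1 = {(p, q) for (p, q) in b1 if (q, p) not in b2}
    keep2 = {(p, q) for (p, q) in b2 if (q, p) not in b1}
    return keep1 | keep2


# Two non-overlapping unit squares that share the side from (1, 0) to (1, 1).
left = boundary([(0, 0), (1, 0), (1, 1), (0, 1)])
right = boundary([(1, 0), (2, 0), (2, 1), (1, 1)])
total = add_boundaries(left, right)

print(len(total))                   # 6
print(((1, 0), (1, 1)) in total)    # False: the shared side has been cancelled
```

The six remaining edges form the counter-clockwise boundary of the 2-by-1 rectangle, which is the region obtained as the sum of the two squares.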

In the case of Figure 2(c), we have

$$m_0 = h_1 + h_2 + \cdots + h_7. \tag{16}$$

We denote the set of all natural numbers $\{1, 2, 3, \dots\}$ by $\mathbb{N}$.

Lemma 2.7.

The set of all integral regions on $H$ is given by

$$I_H = \{h_1 + \cdots + h_n \mid n \in \mathbb{N},\ \{h_i\} \subset H \text{ is connected and hole-free}\}. \tag{17}$$

The set of all region-intermediates on $H$ is given by

$$RI = \{m_1 \sqcup \cdots \sqcup m_n \mid n \in \mathbb{N},\ \{m_i\} \subset I_H \text{ is connected and disjoint}\}. \tag{18}$$

Proof. They follow immediately from the definitions.∎

Remark 2.8. Integral regions play the role that “integers” do for rational numbers. That is, “rational” regions are obtained by dividing integral regions into loops of triangles [11].

2.2. Loops on a Triangular Mesh B

Figure 3 is explained in this subsection. Shown in Figure 3(a) is the triangular mesh obtained by dividing every hexagon of H into 6 equilateral triangles. B denotes the set of all triangles of the mesh.

Definition 2.9. (Trajectories of Triangles) A trajectory of triangles on $B$ is a sequence of triangles of $B$ in which consecutive triangles are connected by a common edge. (No direction is assigned to a trajectory.) At each triangle, the edges not used to connect it to adjacent triangles are called the normal edges (of the trajectory) at the triangle (i.e., the “normal vector” of the trajectory). In figures, normal edges are indicated by thick line segments.

Definition 2.10. (Loops of Triangles) A loop on $B$ is a closed trajectory of triangles on $B$. In this paper, protein-like molecules are represented as loops of triangles of $B$. $|lp|$ denotes the area in $\mathbb{R}^2$ swept by a loop $lp$, where the area enclosed by $lp$ is also included in $|lp|$ (for example, $\phi$ of Figure 1(a)).

A loop $lp_0$ of length 6 is called a hexagonal loop. A hexagonal loop $lp_0$ is denoted by $h_0$ if $|lp_0| = h_0 \in H$. In other words, $h_0 \in H$ denotes both a loop of length 6 and a hexagon of $H$, i.e., $|h_0| = h_0$.

Figure 3. The mathematical model of protein-like molecules.

Remark 2.11. The hexagonal base $(M_0)_H$ of $M_0 \in RI$ defined above is a region-intermediate consisting of hexagons as well as a loop-intermediate consisting of loops of length 6.

Shown in Figure 3(b) is a set $L = \{h_1, h_2, \dots, h_7\}$ of seven hexagonal loops on $H$. Since the $h_i$'s do not overlap each other, we write

$$L = h_1 \sqcup h_2 \sqcup \cdots \sqcup h_7. \tag{19}$$

If $L$ consists of only one hexagon $h_1$, we write either $L = h_1$ or $L = \{h_1\}$.

Shown in Figure 3(c) is an integral loop on B, defined as follows.

Definition 2.12. (Integral Loop) A loop $lp_0$ on $B$ is called integral if $|lp_0|$ is an integral region on $H$. $I_B$ denotes the set of all integral loops, i.e.,

$$I_B = \{lp_0 \mid lp_0 \text{ is a loop on } B \text{ such that } |lp_0| \in I_H\}. \tag{20}$$

$lp_0 \in I_B$ is called an implementation of $m_0 \in I_H$ if $|lp_0| = m_0$. For example, $lp_0$ of Figure 3(c) is an implementation of $m_0$ of Figure 2(c).

Let $L = \{lp_1, lp_2, \dots\} \subset I_B$. The set $|L|$ of integral regions associated with $L$ is defined by

$$|L| := \{|lp_1|, |lp_2|, \dots\} \subset I_H. \tag{21}$$

If $|L|$ is disjoint, we write

$$L = lp_1 \sqcup lp_2 \sqcup \cdots. \tag{22}$$

If $L$ consists of only one integral loop $lp_1$, we write either $L = lp_1$ or $L = \{lp_1\}$.

Shown in Figure 3(d) top is a loop-intermediate on B, defined as follows.

Definition 2.13. (Loop-Intermediate) A loop-intermediate on $B$ is a finite subset $L_0 = \{lp_1, lp_2, \dots, lp_n\}$ of $I_B$ such that $|L_0|$ is a region-intermediate on $H$. Since $|L_0|$ is disjoint, we write

$$L_0 = lp_1 \sqcup lp_2 \sqcup \cdots \sqcup lp_n. \tag{23}$$

$LI$ denotes the set of all loop-intermediates on $B$, i.e.,

$$LI := \{lp_1 \sqcup \cdots \sqcup lp_n \mid n \in \mathbb{N},\ lp_i \in I_B\ (i = 1, 2, \dots, n),\ |lp_1 \sqcup \cdots \sqcup lp_n| \in RI\}. \tag{24}$$

Let $M_0 \in RI$. $L_0 \in LI$ is called an implementation of $M_0$ if $|L_0| = M_0$. Note that some region-intermediates have no implementation. For example, $m_3$ of Figure 2(d) has no implementation (Figure 3(d) bottom).

Now, fusion and fission of integral loops are defined using addition of the corresponding integral regions (Addition of integral loops is considered in Section 4 below).

Definition 2.14. (Fusion and Fission of Integral Loops) Let $lp_0 \in I_B$. Let $L_0 = lp_1 \sqcup lp_2 \sqcup \cdots \sqcup lp_n \in LI$. $lp_0$ is called a fusion of $L_0$ if

$$|lp_0| = |lp_1| + |lp_2| + \cdots + |lp_n|. \tag{25}$$

$L_0$ is then called a fission of $lp_0$. In Figure 3, both $h_1 \sqcup h_2 \sqcup \cdots \sqcup h_7$ and $lp_1 \sqcup h_4 \sqcup lp_2$ are fissions of $lp_0$.

Finally, we define flows of triangles on $B$.

Definition 2.15. (Flows of Triangles on $B$) A flow $\psi$ of triangles on $B$ is an assignment of normal edges to triangles of $B$, i.e.,

$$\psi(t) := \text{the set of normal edges of } t \quad (t \in B). \tag{26}$$

$\psi$ is called regular at $t$ if $\psi(t)$ consists of one edge (i.e., $t$ is connected to exactly two adjacent triangles). $\psi$ is called regular if $\psi$ is regular at all triangles of $B$. $FLW_R$ denotes the set of all regular flows on $B$.

A triangle t of B is called singular if it is not regular. Singular triangles are called branch, terminal, or isolated triangles when they have no normal edges (i.e., connected to all the adjacent triangles), two normal edges (i.e., connected to only one adjacent triangle), or three normal edges (i.e., connected to no adjacent triangles), respectively.
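The classification of triangles by their number of normal edges can be written down directly. In the sketch below a flow is a dictionary from triangle labels to sets of normal-edge labels; the labels and the function names are hypothetical, and only a finite patch of $B$ can be represented this way.

```python
# Number of normal edges -> kind of triangle.
KINDS = {0: "branch", 1: "regular", 2: "terminal", 3: "isolated"}


def classify(normal_edges):
    """Kind of a triangle, given the set of its normal edges (at most three)."""
    return KINDS[len(set(normal_edges))]


def is_regular_flow(flow):
    """flow maps each triangle of a finite patch to its set of normal edges."""
    return all(classify(edges) == "regular" for edges in flow.values())


print(classify({"a"}))                             # regular
print(classify(set()))                             # branch
print(is_regular_flow({"t1": {"a"}, "t2": {"c"}}))  # True
print(is_regular_flow({"t1": {"a", "b"}}))         # False
```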

Remark 2.16. We often consider trajectories of triangles without explicit reference to the corresponding flow ψ. A triangle t is then called regular if ψ is regular at t.

Definition 2.17. (Disjoint Unions $L(\psi)$ and $M(\psi)$) Let $\psi \in FLW_R$. $L(\psi)$ denotes the set of all the loops of $\psi$. Since trajectories of $\psi$ do not overlap, we have

$$L(\psi) := lp_1 \sqcup lp_2 \sqcup \cdots, \tag{27}$$

where the $lp_i$'s are the loops of $\psi$. $L(\psi)$ is called the loops associated with $\psi$. $M(\psi)$ denotes the associated regions, i.e.,

$$M(\psi) := |lp_1| \sqcup |lp_2| \sqcup \cdots. \tag{28}$$

$M(\psi)$ is called the regions associated with $\psi$.

Lemma 2.18. Let $\psi \in FLW_R$.

1) $L(\psi) \in LI$ if $L(\psi)$ is finite, connected, and hole-free.

2) $M(\psi) \in RI$ if $M(\psi)$ is finite and connected.

2.3. Design Problem for Protein-Like Molecules

In the loop model of protein-like molecules, the shape of a new molecule is an integral region $m_0$. A new molecule of the shape $m_0$ is then an implementation $lp_0$ of $m_0$. The problem we consider in this paper is now defined as follows.

Problem 2.19. (Design of Protein-like Molecules) Given $M_0 = \{m_0\} \in RI$, find $L_0 = \{lp_0\} \in LI$ such that $|lp_0| = m_0$.

In the next section, the problem is rephrased using a differential geometric structure on B.

3. Differential Geometric Structure on B

A differential geometric structure on $B$ is naturally obtained by embedding the honeycomb mesh $H$ in the 3D Euclidean space $\mathbb{R}^3$, as shown in this section. We denote the set of all real numbers by $\mathbb{R}$.

3.1. Embedding of H in R3

Shown in Figure 4(a) is a unit cube in $\mathbb{R}^3$ and its orthogonal projection onto the plane $H_0$ in $\mathbb{R}^3$ defined by

$$H_0 := \{(x, y, z) \in \mathbb{R}^3 \mid x + y + z = 0\}. \tag{29}$$

$H$ is embedded in $H_0$ using unit cubes in $\mathbb{R}^3$, as explained below.

Definition 3.1. (Unit Cubes in $\mathbb{R}^3$) Let $(a, b, c) \in \mathbb{R}^3$. $[a, b, c]$ denotes the unit cube at $(a, b, c)$, i.e.,

$$[a, b, c] := [a, a+1] \times [b, b+1] \times [c, c+1] \subset \mathbb{R}^3, \tag{30}$$

where $[x, y]$ is the closed interval in $\mathbb{R}$ between $x$ and $y$. If $P = (a, b, c) \in \mathbb{R}^3$, then $[a, b, c]$ is also written as $[P]$. $UC$ denotes the set of all unit cubes at the integer lattice $\mathbb{Z}^3$, i.e.,

$$UC := \{[a, b, c] \mid (a, b, c) \in \mathbb{Z}^3\}. \tag{31}$$

The height $ht_{UC}([a, b, c])$ of $[a, b, c] \in UC$ is defined by

$$ht_{UC}([a, b, c]) := a + b + c. \tag{32}$$

Remark 3.2. $[a, b, c]$ is given by

$$[a, b, c] = \{(a, b, c) + (u, 0, 0) + (0, v, 0) + (0, 0, w) \in \mathbb{R}^3 \mid (u, v, w) \in \mathbb{R}^3 \text{ such that } 0 \le u, v, w \le 1\}. \tag{33}$$

Shown in Figure 4(a) top is a unit cube $[a, b, c] \in UC$ with vertices $O = (a, b, c)$, $P = (a+1, b, c)$, $Q = (a, b+1, c)$, $R = (a, b, c+1)$, $U = (a+1, b+1, c)$, $V = (a, b+1, c+1)$, $W = (a+1, b, c+1)$, and $X = (a+1, b+1, c+1)$. The vertical diagonals $OU$, $OV$, and $OW$ are drawn as thick line segments.

Definition 3.3. (Projection $\pi$ of $\mathbb{R}^3$ onto $H_0$) $\pi$ is the orthogonal projection of $\mathbb{R}^3$ onto $H_0$ defined by

$$\pi(x, y, z) := \big((2x - y - z)/3,\ (-x + 2y - z)/3,\ (-x - y + 2z)/3\big). \tag{34}$$
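The projection is easy to check numerically. The following sketch implements the orthogonal projection onto the plane $x + y + z = 0$ with exact rational arithmetic and applies it to the vertical diagonals $OU$, $OV$, and $OW$ of the unit cube at the origin; the function name `project` is an assumption of the example.

```python
from fractions import Fraction


def project(x, y, z):
    """Orthogonal projection of (x, y, z) onto the plane x + y + z = 0."""
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    return ((2 * x - y - z) / 3, (-x + 2 * y - z) / 3, (-x - y + 2 * z) / 3)


# End points U, V, W of the vertical diagonals of the unit cube [0, 0, 0].
U, V, W = (1, 1, 0), (0, 1, 1), (1, 0, 1)

for point in (U, V, W):
    image = project(*point)
    squared_length = sum(coord * coord for coord in image)
    print(image, sum(image), squared_length)
```

Each image lies on $H_0$ (its coordinates sum to 0) and the three projected diagonals have the same squared length $2/3$, as expected for three edges of a regular hexagonal mesh meeting at the image of $O$.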

Shown in Figure 4(a) bottom is the orthogonal projection of $OU$, $OV$, and $OW$ onto $H_0$, forming part of a hexagonal mesh on $H_0$.

Definition 3.4. (“Bumpy” Mesh $H_{bump}$) $H_{bump}$ is the “bumpy” honeycomb mesh defined on the top surfaces of

$$UC_0 := \{[x, y, z] \in UC \mid ht_{UC}([x, y, z]) = 0\}. \tag{35}$$

The edges of the mesh are the vertical diagonals. $H_{bump}$ denotes the set of “bumpy” hexagons drawn on $UC_0$ (Figure 4(b) top).

Figure 4. Differential geometric structure on the triangular mesh B.

Shown in Figure 4(b) bottom is the projection of $H_{bump}$ onto $H_0$ by $\pi$. In the following, we identify $H$ with $\pi(H_{bump}) \subset H_0$. An embedding of $B$ in $H_0$ is then obtained by dividing every hexagon of $\pi(H_{bump})$ into 6 equilateral triangles.

3.2. Tangent Cones

Shown in Figure 4(c) top-left is a tangent cone $C_0$ to a region-intermediate $M_0$, defined as follows. Roughly speaking, a 3D cone with multiple tops is obtained by stacking unit cubes diagonally (from $(\infty, \infty, \infty)$ to $(-\infty, -\infty, -\infty)$).

Definition 3.5. (Tangent Cone $Cone\,A$) Let $A \subset \mathbb{Z}^3$. The tangent cone $Cone\,A$ generated by $A$ is defined by

$$Cone\,A := \Big\{(x, y, z) \in \mathbb{R}^3 \;\Big|\; \max_{(a, b, c) \in A} \{\min\{x - a,\ y - b,\ z - c\}\} \ge 0\Big\}. \tag{36}$$

$TCONE$ denotes the set of all tangent cones, i.e.,

$$TCONE := \{Cone\,A \mid A \subset \mathbb{Z}^3\}. \tag{37}$$

$\mathcal{P}(TCONE)$ denotes the set of all subsets of $TCONE$, i.e., the power set of $TCONE$.
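A membership test for a tangent cone follows directly from the max-min formula. The sketch below assumes the reading of Equation (36) in which a point $(x, y, z)$ belongs to $Cone\,A$ when $\max_{(a,b,c) \in A} \min\{x - a, y - b, z - c\} \ge 0$, so that $Cone\,A$ is the union of the octants $\{x \ge a, y \ge b, z \ge c\}$ over the generators; the function name `in_cone` and the requirement that $A$ be a non-empty finite list are assumptions of the example.

```python
def in_cone(point, generators):
    """True if point lies in the cone generated by a non-empty finite set of lattice points."""
    x, y, z = point
    return max(min(x - a, y - b, z - c) for (a, b, c) in generators) >= 0


print(in_cone((1, 1, 1), [(0, 0, 0)]))                 # True
print(in_cone((-1, 5, 5), [(0, 0, 0)]))                # False
print(in_cone((-1, 5, 5), [(0, 0, 0), (-1, 2, 2)]))    # True
```

Adding a generator can only enlarge the cone, which is why the third call succeeds where the second fails.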

Definition 3.6. (Tops and Bottoms of $Cone\,A$) Let $C_0 \in TCONE$. The top vertices of $C_0$ are the peaks of the cone. $top(C_0)$ denotes the set of all top vertices of $C_0$. Note that $C_0 = Cone\ top(C_0)$. The bottom vertices of $C_0$ are the dents of the cone, which are peaks if we look up at the cone from $(\infty, \infty, \infty)$. $bottom(C_0)$ denotes the set of all bottom vertices of $C_0$.

We define flows of triangles on B using tangent cones in R3.

Definition 3.7. (The Flow $\psi_{C_0}$ on $B$) Let $C_0 \in TCONE$. Note that the surfaces of $C_0$ consist of the top faces of unit cubes of $UC$. Taking their vertical diagonals as normal edges, we obtain a regular flow of triangles on the surface of $C_0$ (Figure 4(c) top-right). Projecting the regular flow onto $H_0$ by $\pi$, we obtain a regular flow on $B$. This regular flow is called the flow on $B$ induced by $C_0$ and is denoted by $\psi_{C_0}$.

Definition 3.8. $FLW_{TCONE}$ denotes the set of all regular flows on $B$ induced by tangent cones, i.e.,

$$FLW_{TCONE} := \{\psi_C \mid C \in TCONE\} \subset FLW_R. \tag{38}$$

$RI_{TCONE}$ and $LI_{TCONE}$ denote the corresponding sets of region-intermediates and loop-intermediates, respectively, i.e.,

$$RI_{TCONE} := \{M \in RI \mid \exists\, C \in TCONE \text{ such that } M = M(\psi_C)\}. \tag{39}$$

$$LI_{TCONE} := \{L \in LI \mid \exists\, C \in TCONE \text{ such that } L = L(\psi_C)\}. \tag{40}$$

The author has no proof of the following claim.

Claim 3.9. $FLW_{TCONE} = FLW_R$.

In this paper, only flows of $FLW_{TCONE}$ are considered.

Remark 3.10. $RI_{TCONE} \subsetneq RI$. For example, $m_3$ of Figure 2(d) is not contained in $RI_{TCONE}$.

3.3. Tangent Cones to M0

Here we define a tangent “space” to $M_0 \in RI$.

Definition 3.11. (The Boundary Cone $BC(M_0)$) Let

$$M_0 = m_1 \sqcup m_2 \sqcup \cdots \sqcup m_n \in RI_{TCONE}, \tag{41}$$

where

$$m_i = |P_{i1} P_{i2} \cdots P_{ik_i} P_{i1}| \quad (i = 1, 2, \dots, n). \tag{42}$$

The boundary cone $BC(M_0)$ to $M_0$ is defined by

$$BC(M_0) := Cone\{Q_{i1}, Q_{i2}, \dots, Q_{ik_i} \mid i = 1, 2, \dots, n\}, \tag{43}$$

where the $Q_{ij}$'s are points on the top surfaces of $UC_0$ such that

$$\pi(Q_{ij}) = P_{ij}. \tag{44}$$

Remark 3.12. Since $\pi$ is a one-to-one mapping between the top surfaces of $UC_0$ and $H_0$, the boundary cone $BC(M_0)$ exists for all $M_0 \in RI_{TCONE}$.

Definition 3.13. (The Set $TM_0$ of Tangent Cones) Let $M_0 \in RI_{TCONE}$. The set $TM_0$ of tangent cones to $M_0$ is defined by

$$TM_0 := \{C \in TCONE \mid top(BC(M_0)) \cup bottom(BC(M_0)) \subset sur(C)\}, \tag{45}$$

where $sur(C)$ is the surface of $C$, i.e.,

$$sur(C) := \Big\{(x, y, z) \in \mathbb{R}^3 \;\Big|\; \max_{(a, b, c) \in top(C)} \{\min\{x - a,\ y - b,\ z - c\}\} = 0\Big\}. \tag{46}$$

Note that $BC(M_0) \in TM_0$.

Remark 3.14. $C \in TM_0$ does not imply $top(BC(M_0)) \subset top(C)$.

Lemma 3.15. (The Base Tangent Cone $C_{base}(M_0)$) Let $M_0 \in RI_{TCONE}$. There exists $C_{base}(M_0) \in TM_0$ such that

$$L(\psi_{C_{base}(M_0)}) = (M_0)_H, \tag{47}$$

$$ht_{UC}([a, b, c]) = 0 \text{ for } (a, b, c) \in top(C_{base}(M_0)). \tag{48}$$

$C_{base}(M_0)$ is called the base tangent cone associated with $M_0$.

Proof. Since $\pi$ gives a one-to-one mapping between the top surfaces of $UC_0$ and $H_0$, the result follows immediately.∎

Definition 3.16. (Mapping $T$) Assigning $TM_0$ to each $M_0 \in RI_{TCONE}$, we obtain a mapping $T$ from $RI_{TCONE}$ to $\mathcal{P}(TCONE)$. Let $S \subset RI_{TCONE}$. A section $\sigma$ of $T$ on $S$ is a mapping from $S$ to $\mathcal{P}(TCONE)$ such that $M(\psi_{\sigma(M_i)}) = M_i$ for all $M_i \in S$. $\Gamma_T(S)$ denotes the set of all sections of $T$ on $S$.

The design problem is now rephrased as follows.

Problem 3.17. (Design of Protein-like Molecules) Given $M_0 = \{m_0\} \in RI_{TCONE}$, find $C_0 \in TM_0$ such that $M(\psi_{C_0}) = M_0$.

4. Loop Design Problem from the Perspective of Sheaf Theory

To mimic Sheaf Theory, “subsets” of a region-intermediate $M_0$ are defined using a binary relation over $RI_{TCONE}$. A “covering” $S = \{M_1, M_2, \dots, M_n\}$ of $M_0$ is then defined as a set of region-intermediates such that $M_0$ is the least upper bound of $S$ with respect to the binary relation. An implementation of $M_0$ is obtained as the sum of implementations of $M_i$ ($i = 1, 2, \dots, n$), where addition is defined using transformations on $TM_0$ as shown below.

4.1. Binary Relation over RITCONE and LITCONE

Shown in Figure 5(a) is a binary relation over $RI_{TCONE}$, defined as follows.

Definition 4.1. (Binary Relation $\le$ over $RI_{TCONE}$) Let $M_a, M_b \in RI_{TCONE}$. Then, $M_a \le M_b$ if and only if, for any $m_b \in M_b$,

$$\text{there exist } m_1, \dots, m_n \in M_a \text{ such that } m_b = m_1 + \cdots + m_n. \tag{49}$$

In figures, we often use the arrow $M_a \to M_b$ to indicate $M_a \le M_b$.
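The relation can be tested on small examples with a simplified model. In the sketch below an integral region is a frozenset of hexagon labels and a region-intermediate is a set of pairwise disjoint regions; connectedness, hole-freeness, and membership in $RI_{TCONE}$ are not checked. Both the representation and the name `region_leq` are assumptions of the example. Because the regions of $M_a$ are disjoint, $m_b$ is a sum of regions of $M_a$ exactly when the regions of $M_a$ contained in $m_b$ cover it.

```python
def region_leq(Ma, Mb):
    """Ma <= Mb: every region of Mb is a sum of regions of Ma."""
    for mb in Mb:
        parts = [m for m in Ma if m <= mb]      # regions of Ma lying inside mb
        covered = frozenset().union(*parts)     # empty frozenset if parts is empty
        if covered != mb:
            return False
    return True


fine = {frozenset({1}), frozenset({2}), frozenset({3})}
coarse = {frozenset({1, 2}), frozenset({3})}

print(region_leq(fine, coarse))   # True
print(region_leq(coarse, fine))   # False
```

In this model the hexagonal base of a region-intermediate is always below it, since every region is a sum of its own hexagons.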

Shown in Figure 5(b) is the binary relation over $LI_{TCONE}$ induced by the binary relation $\le$ over $RI_{TCONE}$. That is,

Definition 4.2. (Binary Relation $\le$ over $LI_{TCONE}$) Let $L_a, L_b \in LI_{TCONE}$. Then, $L_a \le L_b$ if and only if, for any $lp_b \in L_b$,

$$\text{there exist } lp_1, \dots, lp_n \in L_a \text{ such that } |lp_b| = |lp_1| + \cdots + |lp_n|. \tag{50}$$

In figures, we often use the arrow $L_a \to L_b$ to indicate $L_a \le L_b$.

Remark 4.3. Notations such as $(RI_{TCONE}, \le)$ and $(LI_{TCONE}, \le)$ are used to explicitly indicate the binary relation with which a set is equipped.

Lemma 4.4. Let $M_1, M_2 \in RI_{TCONE}$. Then,

$$T(M_1) \subset T(M_2) \text{ if } M_1 \le M_2. \tag{51}$$

That is, $T$ is a “covariant” mapping from $(RI_{TCONE}, \le)$ to $(\mathcal{P}(TCONE), \subset)$.

Shown in Figure 5(c) are examples of the greatest lower bound of loop-intermediates, defined as follows.

Definition 4.5. ($\bigwedge S$ and $\bigvee S$) Let $S \subset RI_{TCONE}$. The greatest lower bound $\bigwedge S$ of $S$ is the greatest element of $RI_{TCONE}$ that is less than or equal to each element of $S$. The least upper bound $\bigvee S$ of $S$ is the least element of $RI$ that is greater than or equal to each element of $S$. $\bigwedge S$ and $\bigvee S$ for $S \subset LI_{TCONE}$ are defined similarly.

Figure 5. Binary relation $\le$ over $RI_{TCONE}$ and $LI_{TCONE}$.

Remark 4.6. In general, there are multiple candidates for $\bigwedge S$ and $\bigvee S$. In such cases, we select one of them arbitrarily. Because of this ambiguity, “$M_0 \le M_i$ for all $M_i \in S$” does not imply $M_0 \le \bigwedge S$.

Remark 4.7. H M 0 for any M 0 R I T C O N E , where denotes the empty set.

We use the following lemma to find “subsets” of a region-intermediate.

Lemma 4.8. Let $M_0 \in RI_{TCONE}$ and $C \in TCONE$. Then,

$$M(\psi_C) \le M_0 \text{ if and only if } C \in TM_0. \tag{52}$$

Proof. $M(\psi_C) \le M_0$ if and only if $\partial M_0 \subset \partial M(\psi_C)$. $\partial M_0 \subset \partial M(\psi_C)$ if and only if $top(BC(M_0)) \cup bottom(BC(M_0)) \subset sur(C)$. The result follows immediately. ∎

4.2. Coverings of a Region-Intermediate

Two types of coverings are defined as follows.

Definition 4.9. (Coverings of a Region-Intermediate) Let $M_0 \in RI_{TCONE}$. Let $S \subset RI_{TCONE}$. $S$ is called a covering of $M_0$ if $\bigvee S = M_0$.

Definition 4.10. (Topological Coverings of an Integral Region) Let $m_0 \in I_H$. Let $V = \{c_1, c_2, \dots, c_n\} \subset I_H$. $V$ is called a topological covering of $m_0$ if 1) $m_0 = \bigcup_{i=1}^{n} c_i$, and 2) for each $c_i \in V$, there exists another $c_j \in V$ such that $c_i \cap c_j \ne \emptyset$.

Remark 4.11. Since some integral regions have no implementation (i.e., there exists $c_i \in I_H$ such that $c_i \ne |lp|$ for any $lp \in I_B$), topological coverings may have no sections on them.

Lemma 4.12. Let $m_0 \in I_H$. Let $\{c_1, c_2, \dots, c_n\} \subset I_H$ be a topological covering of $m_0$. A covering of $m_0$ is then obtained by

$$\{m_0 \wedge c_1,\ m_0 \wedge c_2,\ \dots,\ m_0 \wedge c_n\}. \tag{53}$$

Example 4.13. In Figure 5(c), $\{A_1, B_1\}$ is a topological covering of $X$. $\{A_1 \wedge X, B_1 \wedge X\}$ is a covering of $X$.

The proposed design method uses a specific type of covering (in Problem 4.38).

Definition 4.14. (Hexagonal Covering $S_V$ of an Integral Region) Let $V = \{c_1, c_2, \dots, c_n\} \subset I_H$ be a topological covering of $m_0 \in I_H$. The hexagonal covering $S_V$ of $m_0$ associated with $V$ is defined by

$$S_V = \{c_1 \sqcup (m_0 \setminus c_1)_H,\ c_2 \sqcup (m_0 \setminus c_2)_H,\ \dots,\ c_n \sqcup (m_0 \setminus c_n)_H\}. \tag{54}$$

Lemma 4.15. $S_V$ is a covering of $m_0$.

The design problem is now rephrased as follows.

Problem 4.16. (Incremental Design of Protein-like Molecules) Given 1) a target shape $m_0$: $M_0 = \{m_0\} \in RI_{TCONE}$, 2) a topological covering $V$ of $m_0$: $V = \{c_1, c_2, \dots, c_n\}$, and 3) a section $\sigma$ of $T$ on $V$: $\sigma \in \Gamma_T(V)$, compute $C_0 \in TM_0$ such that $M(\psi_{C_0}) = M_0$ by patching the “local” loop-intermediates $L(\psi_{\sigma(c_1)})$, $L(\psi_{\sigma(c_2)})$, …, and $L(\psi_{\sigma(c_n)})$ together.

4.3. Transformations on LITCONE Induced by UC

To patch loop-intermediates together, we define addition of loop-intermediates using transformations on TCONE, defined as follows.

Definition 4.17. ($TRANS(TCONE)$) A transformation on $TCONE$ is a mapping from $TCONE$ to $TCONE$. $TRANS(TCONE)$ denotes the set of all transformations on $TCONE$.

Let $A \in TRANS(TCONE)$ and $C \in TCONE$. We use the symbol “$\cdot$” to denote the transformation of $C$ by $A$, i.e., $A \cdot C$. $A \cdot C$ is also called the action of $A$ on $C$. Let $A_1, A_2, \dots, A_n \in TRANS(TCONE)$. We use the symbol “$\circ$” to denote the composition of transformations, i.e.,

$$A_1 \circ A_2 \circ \cdots \circ A_n \cdot C := A_1 \cdot (A_2 \cdot (\cdots (A_n \cdot C) \cdots)). \tag{55}$$

Example 4.18. Unit cubes induce transformations on $TCONE$ as follows. Let $C \in TCONE$, where $top(C) = \{P_1, P_2, \dots, P_n\}$, i.e.,

$$C = Cone\{P_1, P_2, \dots, P_n\}. \tag{56}$$

Taking the unit cube $[a, b, c]$ at $P_1 = (a, b, c)$ from $C$, we obtain another tangent cone

$$C' = Cone\{P_1', P_2', P_3', P_2, \dots, P_n\}, \tag{57}$$

where $P_1' = (a+1, b, c)$, $P_2' = (a, b+1, c)$, and $P_3' = (a, b, c+1)$. Conversely, putting the unit cube $[a, b, c]$ on $C'$, we obtain the original cone $C$.

Definition 4.19. (The Minimal $L$-cone $C_L$) Let $L \in LI_{TCONE}$ and $C \in T|L|$. $C$ is called an $L$-cone if $L(\psi_C) = L$. The tangent cone $C_L$ is the minimal $L$-cone with respect to set inclusion, i.e., $C_L \subset C$ for any $L$-cone $C$. Since $L \in LI_{TCONE}$, $C_L \in T|L|$ always exists and is uniquely determined by $L$.

Transformations on $TCONE$ induce transformations on $LI_{TCONE}$ as follows.

Definition 4.20. (Transformations on $LI_{TCONE}$) Let $A \in TRANS(TCONE)$ and $L \in LI_{TCONE}$. The transformation of $L$ by $A$ is defined by

$$A \cdot L := L(\psi_{A \cdot C_L}). \tag{58}$$

$A \cdot L$ is called the action of $A$ on $L$.

Definition 4.21. (Transformations $P[a, b, c]$, $T[a, b, c]$, and $PT[a, b, c]$) Let $[a, b, c] \in UC$. Two transformations $P[a, b, c]$ and $T[a, b, c]$ on $TCONE$ induced by $[a, b, c]$ are defined by

$$P[a, b, c] \cdot C := C \cup Cone\{(a, b, c)\}, \tag{59}$$

$$T[a, b, c] \cdot C := \bigcup \{[x, y, z] \subset C \mid [a, b, c] \not\subset Cone\{(x, y, z)\}\}, \tag{60}$$

where $C \in TCONE$. $P[a, b, c] \cdot C$ is called the put & fill-action by $[a, b, c]$ on $C$. $T[a, b, c] \cdot C$ is called the take & clear-action by $[a, b, c]$ on $C$. We denote the composition of $P$ after $T$ by $PT$, i.e.,

$$PT[a, b, c] := P[a, b, c] \circ T[a, b, c] \quad ([a, b, c] \in UC). \tag{61}$$

$PT[a, b, c] \cdot C$ is called the take & put-action by $[a, b, c]$ on $C$.

Remark 4.22. After the action of $PT[a,b,c]$ on $C \in \mathit{TCONE}$, the cube $[a,b,c]$ is always visible from $(-\infty, -\infty, -\infty)$. On the other hand, after the action of $P[a,b,c]$ on $C$, $[a,b,c]$ may not be visible from $(-\infty, -\infty, -\infty)$.
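The difference between the put & fill-action and the take & put-action in Remark 4.22 can be illustrated on a small example. This is a sketch under assumptions rather than the paper's construction: cones are taken to extend in the positive coordinate directions and are truncated to a hypothetical window, the viewpoint is assumed to lie in the negative direction, and a cube is treated as visible when at least one of its three neighbours on the viewer's side is absent. All function names are illustrative.

```python
from itertools import product

N = 4  # hypothetical window size; real tangent cones are infinite


def cone(p, n=N):
    """Cubes of the window lying weakly above the point p (assumed orientation)."""
    a, b, c = p
    return {(x, y, z) for x, y, z in product(range(n), repeat=3)
            if x >= a and y >= b and z >= c}


def put(u, C):
    """Put & fill: add the whole cone above u."""
    return C | cone(u)


def take(u, C):
    """Take & clear: drop every cube whose own cone contains u."""
    return {v for v in C if not all(vi <= ui for vi, ui in zip(v, u))}


def take_put(u, C):
    """Take & put: apply take first, then put."""
    return put(u, take(u, C))


def visible(u, C):
    """Assumed visibility test: some neighbour facing the viewer is missing."""
    x, y, z = u
    front = [(x - 1, y, z), (x, y - 1, z), (x, y, z - 1)]
    return u in C and any(v not in C for v in front)


C = cone((0, 0, 0))
u = (1, 1, 1)
after_put = visible(u, put(u, C))
after_take_put = visible(u, take_put(u, C))
print(after_put)       # False: the cube is buried inside C
print(after_take_put)  # True: the cubes in front of it were cleared first
```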

Lemma 4.23. Let $[a,b,c] \in \mathit{UC}$ and $C \in \mathit{TCONE}$. Then,

$$P[a,b,c] \triangleright C = C \quad \text{if } [a,b,c] \in C, \tag{62}$$

$$T[a,b,c] \triangleright C = C \quad \text{if } [a,b,c] \notin C. \tag{63}$$
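The lemma is stated without proof. A short justification can be sketched as follows, assuming that every tangent cone is a union of cones $\mathrm{Cone}\{P\}$, so that $[x,y,z] \in C$ implies $\mathrm{Cone}\{(x,y,z)\} \subset C$; the symbol $\triangleright$ for the action is a notational assumption.

```latex
\begin{aligned}
[a,b,c] \in C
  &\;\Longrightarrow\; \mathrm{Cone}\{(a,b,c)\} \subset C
   \;\Longrightarrow\; P[a,b,c] \triangleright C
     = C \cup \mathrm{Cone}\{(a,b,c)\} = C, \\
[a,b,c] \notin C
  &\;\Longrightarrow\; [a,b,c] \notin \mathrm{Cone}\{(x,y,z)\} \subset C
     \ \text{ for every } [x,y,z] \in C
   \;\Longrightarrow\; T[a,b,c] \triangleright C = C .
\end{aligned}
```

In the second line no cube of $C$ satisfies the removal condition, so the take & clear-action leaves $C$ unchanged.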

Definition 4.24. ($\mathit{TRANS}_{T,PT}(\mathit{TCONE})$) $\mathit{TRANS}_{T,PT}(\mathit{TCONE})$ denotes the set of all the transformations on TCONE generated by finite compositions of $T[a,b,c]$ and $PT[a,b,c]$ ($[a,b,c] \in \mathit{UC}$), i.e.,

$$\mathit{TRANS}_{T,PT}(\mathit{TCONE}) := \{A_1 \circ \cdots \circ A_n \mid n \in \mathbb{Z},\ A_i = T\,u \text{ or } PT\,u \text{ for some } u \in \mathit{UC}\}. \tag{64}$$

In general, $G_1 \circ G_2 \triangleright C \neq G_2 \circ G_1 \triangleright C$ for $G_1, G_2 \in \mathit{TRANS}_{T,PT}(\mathit{TCONE})$ and $C \in \mathit{TCONE}$.

Example 4.25. Let $u_1 = [a,b,c]$, $u_2 = [a',b',c'] \in \mathit{UC}$ such that $(a,b,c) \in \mathrm{Cone}\{(a',b',c')\}$. Then,

$$PT\,u_1 \circ PT\,u_2 \triangleright \mathrm{Cone}\{(a,b,c)\} \neq PT\,u_2 \circ PT\,u_1 \triangleright \mathrm{Cone}\{(a,b,c)\}, \tag{65}$$

$$T\,u_1 \circ PT\,u_2 \triangleright \mathrm{Cone}\{(a,b,c)\} \neq PT\,u_2 \circ T\,u_1 \triangleright \mathrm{Cone}\{(a,b,c)\}. \tag{66}$$
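The order dependence in this example can be reproduced numerically. The following sketch makes several assumptions that are not taken from the paper: cones extend in the positive coordinate directions, they are truncated to a hypothetical window, the second cube is chosen so that the first one lies in its cone, and the starting cone is the one generated by the first cube. The names `put`, `take`, and `take_put` are illustrative stand-ins for the put & fill, take & clear, and take & put actions.

```python
from itertools import product

N = 4  # hypothetical window size; real tangent cones are infinite


def cone(p, n=N):
    """Cubes of the window lying weakly above the point p (assumed orientation)."""
    a, b, c = p
    return {(x, y, z) for x, y, z in product(range(n), repeat=3)
            if x >= a and y >= b and z >= c}


def put(u, C):
    """Put & fill: union with the cone above u."""
    return C | cone(u)


def take(u, C):
    """Take & clear: keep only cubes whose own cone does not contain u."""
    return {v for v in C if not all(vi <= ui for vi, ui in zip(v, u))}


def take_put(u, C):
    """Take & put: take first, then put."""
    return put(u, take(u, C))


u1 = (1, 1, 0)  # first cube
u2 = (0, 0, 0)  # second cube, chosen so that u1 lies in the cone above u2
C = cone(u1)    # starting cone generated by u1

# Compositions act right to left: the innermost call is applied first.
lhs_65 = take_put(u1, take_put(u2, C))
rhs_65 = take_put(u2, take_put(u1, C))
lhs_66 = take(u1, take_put(u2, C))
rhs_66 = take_put(u2, take(u1, C))

print(lhs_65 != rhs_65)  # True
print(lhs_66 != rhs_66)  # True
```

In both comparisons the side that applies the action of the second cube last ends with the full cone above that cube, while the other side has the cubes lying weakly below the first cube cleared (except the first cube itself when it is put back).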

Definition 4.26. (Well-defined Transformations) Let $G = A_1 \circ A_2 \circ \cdots \circ A_n \in \mathit{TRANS}_{T,PT}(\mathit{TCONE})$. $G$ is called well-defined if the action of $G$ on TCONE does not depend on the order of the $A_i$’s, i.e.,

$$G \triangleright C = A_{\rho(1)} \circ A_{\rho(2)} \circ \cdots \circ A_{\rho(n)} \triangleright C \quad \text{for all } C \in \mathit{TCONE} \tag{67}$$

for any permutation $\rho$ of $\{1, 2, \ldots, n\}$.
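Order independence can be probed by brute force over all permutations. The sketch below is only a partial check under assumptions: the definition quantifies over every tangent cone, whereas the code tests a few hand-picked sample cones on a hypothetical finite window, so it can refute well-definedness but not prove it. Cones are assumed to extend in the positive coordinate directions, and the names `T`, `PT`, `apply_all`, and `order_independent` are introduced here for illustration.

```python
from itertools import permutations, product

N = 4  # hypothetical window size; real tangent cones are infinite


def cone(points, n=N):
    """Cubes of the window lying weakly above some point of `points`."""
    return {(x, y, z) for x, y, z in product(range(n), repeat=3)
            if any(x >= a and y >= b and z >= c for a, b, c in points)}


def T(u):
    """Take & clear by u, returned as a function on cones."""
    return lambda C: {v for v in C if not all(vi <= ui for vi, ui in zip(v, u))}


def PT(u):
    """Take & put by u: take & clear first, then add the cone above u."""
    return lambda C: T(u)(C) | cone({u})


def apply_all(steps, C):
    """Apply a composition A_1, ..., A_n to C; the last factor acts first."""
    for step in reversed(steps):
        C = step(C)
    return C


def order_independent(steps, samples):
    """True if every ordering of `steps` gives the same result on every sample."""
    return all(apply_all(p, C) == apply_all(steps, C)
               for C in samples for p in permutations(steps))


samples = [cone({(0, 0, 0)}), cone({(1, 1, 0)}), cone({(0, 2, 0), (2, 0, 0)})]

only_takes = order_independent([T((0, 0, 0)), T((1, 0, 0))], samples)
incomparable = order_independent([PT((1, 0, 0)), PT((0, 1, 0))], samples)
comparable = order_independent([PT((1, 1, 0)), PT((0, 0, 0))], samples)
print(only_takes, incomparable, comparable)  # True True False
```

On these samples, compositions of take & clear actions alone and take & put actions at incomparable cubes pass the check, while take & put actions at two comparable cubes fail it.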

Remark 4.27. If G is well-defined, removed unit cubes are removed forever and placed unit cubes are placed forever.

4.4. Addition on LITCONE

Addition of loop-intermediates is now defined using transformations on TCONE.

Definition 4.28. (Transformations on $T_{M_0}$) Let $M_0 \in RI$. The set $\mathit{TRANS}(T_{M_0})$ of transformations on $T_{M_0}$ is defined by

$$\mathit{TRANS}(T_{M_0}) := \{G \in \mathit{TRANS}_{T,PT}(\mathit{TCONE}) \mid G \triangleright C \in T_{M_0} \text{ for all } C \in T_{M_0}\}. \tag{68}$$

Lemma 4.29. Let $M_0 \in RI_{\mathit{TCONE}}$ and $G_1, G_2 \in \mathit{TRANS}(T_{M_0})$. Then,

$$G_1 \circ G_2 \in \mathit{TRANS}(T_{M_0}). \tag{69}$$

Lemma 4.30. Let L 0 , L L I T C O N E and G T R A N S ( T | L 0 | ) . Then,

If L L 0 , then G L L 0 . (70)

Definition 4.31. (The Relative Transformation $G_H(L)$) Let $L \in LI_{\mathit{TCONE}}$. The relative transformation $G_H(L)$ of $L$ with respect to $H$ is defined by

$$G_H(L) := PT[P_1] \circ \cdots \circ PT[P_n] \circ T[Q_1 - (1,1,1)] \circ \cdots \circ T[Q_k - (1,1,1)], \tag{71}$$

where

{ P 1 , , P n } = { P t o p ( C L ) | h t U C ( [ P ] ) 0 } , (72)

{ Q 1 , , Q k } = { Q b o t t o m ( C L ) | h t U C ( [ Q ] ) 2 } . (73)

$[Q - (1,1,1)]$ is defined by $[Q - (1,1,1)] := (a - 1, b - 1, c - 1)$ for $Q = (a, b, c)$.

Lemma 4.32. Let L L I T C O N E and M R I T C O N E such that | L | M . Then, G H ( L ) is well-defined and

L = G H ( L ) L H G H ( L ) M H . (74)

Remark 4.33. The hexagonal base M H is a loop-intermediate consisting of loops of length 6 as well as a region-intermediate consisting of hexagons.

Lemma 4.34. Let L L I T C O N E and M R I T C O N E such that | L | M . Then,

G H ( L ) T R A N S ( T M ) . (75)

Addition of loop-intermediates is now defined as follows.

Definition 4.35. (Addition of Loop-Intermediates) Let L 1 , L 2 , , L n L I T C O N E and M R I T C O N E such that | L 1 | , | L 2 | , , | L n | M . Then,

$$L_1 + L_2 + \cdots + L_n := G_H(L_1) \circ G_H(L_2) \circ \cdots \circ G_H(L_n) \triangleright M_H. \tag{76}$$

Remark 4.36. Addition $L_1 + L_2 + \cdots + L_n$ is defined with respect to $M_H$, which is not explicitly indicated in the formula.

Definition 4.37. (Section σ S V on S V ) Let V = { c 1 , c 2 , , c n } I H be a topological covering of m 0 I H . Let σ Γ T ( V ) . The section σ S V of T on the hexagonal covering S V is defined by

σ S V ( c i ( m 0 \ c i ) H ) : = σ ( c i ) C b a s e ( ( m 0 \ c i ) H ) . (77)

Note that

L ( ψ σ S V ( c i ( m 0 \ c i ) H ) ) = L ( ψ σ ( c i ) ) ( m 0 \ c i ) H ( i = 1 , , n ) . (78)

Since c i ( m 0 \ c i ) H m 0 ( i = 1 , , n ) , we can define addition

i = 1 n L ( ψ σ S V ( c i ( m 0 \ c i ) H ) ) (79)

by Definition 4.35.

Using addition of loop-intermediates, the design problem is now rephrased as follows.

Problem 4.38. (Incremental Design of Protein-like Molecules) Given 1) a target shape m 0 : M 0 = { m 0 } R I T C O N E ; 2) a topological covering V of m 0 : V = { c 1 , c 2 , , c n } ; 3) a section σ of T on V: σ Γ T ( V ) . Then, we obtain L 0 L I T C O N E such that | L 0 | M 0 by

L 0 : = i = 1 n L ( ψ σ S V ( c i ( m 0 \ c i ) H ) ) = G H ( L 1 ) G H ( L 2 ) G H ( L n ) ( M 0 ) H , (80)

where L i : = L ( ψ σ S V ( c i ( m 0 \ c i ) H ) ) . The question here is “when does $L_0$ consist of a single loop?”

5. Incremental Design of Protein-Like Molecules (N = 2)

In general, the sum of loops is not a loop. In this section, we consider sufficient conditions for the $L_0$ of Problem 4.38 to be a loop. Due to page limitations, we only consider a topological covering consisting of two integral regions. The incremental design problem is then given as follows.

Problem 5.1. (Incremental Design of Protein-like Molecules (n = 2)) Given 1) a target shape m 0 : M 0 = { m 0 } R I T C O N E ; 2) a topological covering V of m 0 : V = { c 1 , c 2 } ; 3) a section σ of T on V: σ Γ T ( V ) . Then, we obtain L 0 L I T C O N E such that | L 0 | M 0 by

L 0 : = L ( ψ σ S V ( c 1 ( m 0 \ c 1 ) H ) ) + L ( ψ σ S V ( c 2 ( m 0 \ c 2 ) H ) ) = G H ( L 1 ) G H ( L 2 ) ( M 0 ) H , (81)

where L i : = L ( ψ σ S V ( c i ( m 0 \ c i ) H ) ) . Find sufficient conditions for L 0 to be a loop.

5.1. Closer Look at the Action of $\mathit{TRANS}_{T,PT}(\mathit{TCONE})$

Figure 6 shows the effect of the action of $\mathit{TRANS}(T_{M_0})$ on $L(\psi_{C_{base}(M_0)})$ using the height of normal edges, defined as follows.

Definition 5.2. (Height of Normal Edges) Let $P_1P_2$ be a vertical diagonal of a top face of $[a,b,c] \in \mathit{UC}$. The height $\mathrm{ht}_E(P_1P_2)$ of $P_1P_2$ is defined by

$$\mathrm{ht}_E(P_1P_2) := \mathrm{ht}_{UC}([a,b,c]). \tag{82}$$

Let $C \in \mathit{TCONE}$. Let $Q_1Q_2$ be a normal edge of the induced flow $\psi_C$ such that $\pi(P_1P_2) = Q_1Q_2$, where $P_1P_2$ is the corresponding vertical diagonal on the surfaces of $C$. The height $\mathrm{ht}_E(Q_1Q_2)$ of $Q_1Q_2$ (with respect to $C$) is defined by

$$\mathrm{ht}_E(Q_1Q_2) := \mathrm{ht}_E(P_1P_2). \tag{83}$$

Figure 6. The action of $\mathit{TRANS}(T_{M_0})$ on $L(\psi_{C_{base}(M_0)})$.

Shown in Figure 6(a) top is the top view of the base tangent cone

$$C_{base}(M_0) \in T_{M_0} \tag{84}$$

of some $M_0 \in RI$. Shown in Figure 6(a) bottom is the loop-intermediate

$$L(\psi_{C_{base}(M_0)}) \tag{85}$$

on $B$. By definition, the heights of all normal edges of $\psi_{C_{base}(M_0)}$ are 0.

Shown in Figure 6(b) top is the tangent cone

$$PT[P_1] \circ PT[P_2] \circ T[P_3] \triangleright C_{base}(M_0). \tag{86}$$

In the upper part, two Y-shaped sets of normal edges are replaced by two inverted Y-shaped sets of normal edges (thick line segments) by putting the two unit cubes $[P_1]$ and $[P_2]$. In the lower part, a Y-shaped set of normal edges is replaced by an inverted Y-shaped set of normal edges (thick line segments) by taking the unit cube $[P_3]$.

Shown in Figure 6(b) bottom is the loop-intermediate

$$PT[P_1] \circ PT[P_2] \circ T[P_3] \triangleright L(\psi_{C_{base}(M_0)}). \tag{87}$$

The light grey area indicates the projection image of $[P_1]$ and $[P_2]$ by $\pi$. The dark grey area indicates the projection image of the removed $[P_3]$ by $\pi$. Note that normal edges of different heights are not directly connected.

Lemma 5.3. If two normal edges $n_1$ and $n_2$ of a flow of triangles are connected, the difference of their heights is even, i.e.,

$$\mathrm{ht}_E(n_1) \equiv \mathrm{ht}_E(n_2) \pmod 2. \tag{88}$$

In Figure 6(c) top, the normal edges of height −1 are connected to the normal edges of height 1 by putting the unit cube [ P 4 ] of height −1 on the tangent cone of Figure 6(b). In Figure 6(c) bottom, the light grey area (height −1) and the dark grey area (height 1) are now in contact.

In Figure 6(d) top, the normal edges of height −2 are connected to the normal edges of height 0 by putting the unit cube [ P 5 ] of height −2 on the tangent cone of Figure 6(c). In Figure 6(d) bottom, the white area indicates the projection image of [ P 5 ] . Note that the grey area (height 0) and the white area (height −2) are in contact.

5.2. Computation of Loops from the Hexagonal Base

In the loop model, it is easier to design an integral loop from scratch than to design a “hybrid” of known integral loops, since the area enclosed by a loop l p is included in | l p | . (In protein science, it is a formidable task to design a novel artificial protein from scratch.)

Let $M_0 = \{m_0\} \in RI_{\mathit{TCONE}}$. Disconnecting normal edges of $L(\psi_{C_{base}(M_0)})$ along the boundary, we obtain $lp_0 \in I_B$ such that $|lp_0| = m_0$ as explained below.

Shown in Figure 7(a) top is the top view of the base tangent cone

$$C_{base}(M_0) \in T_{M_0} \tag{89}$$

of some $M_0 \in RI_{\mathit{TCONE}}$. Shown in Figure 7(a) bottom is the loop-intermediate

$$L(\psi_{C_{base}(M_0)}) \tag{90}$$

on $B$. By definition, the heights of all normal edges of $\psi_{C_{base}(M_0)}$ are 0.

In Figure 7(b), normal edges of height 0 are disconnected by putting unit cubes of height −1 along the boundary on $C_{base}(M_0)$. The light grey area in Figure 7(b) bottom indicates the projection image of the added unit cubes.

In Figure 7(c), normal edges of height 0 are disconnected by taking unit cubes of height 0 along the boundary from $C_{base}(M_0)$. The dark grey area in Figure 7(c) bottom indicates the projection image of the removed unit cubes.

In Figure 7(d) left, a loop-intermediate consisting of three integral loops is obtained by putting 8 unit cubes of height −1 and taking a unit cube of height 0 along the boundary. Then, taking another unit cube of height 0 at the meeting point of the boundaries of the three loops, we obtain the integral loop shown in Figure 7(d) right.

5.3. Sufficient Conditions for L0 to be a Loop

Sufficient conditions for L0 of Problem 5.1 to be a loop are given using the two concepts defined below.

Definition 5.4. (The set $N_{ht=0}(L)$ of normal edges) Let $L \in LI_{\mathit{TCONE}}$. $N_{ht=0}(L)$ denotes the set of all the normal edges of height 0 contained in $L$.

Definition 5.5. (Rifts of a Loop-Intermediate) A crack of $L \in LI_{\mathit{TCONE}}$ is a polygonal chain of normal edges of $L$ connected to the boundary of $L$. A crack is called a rift if it consists of more than one normal edge.

Lemma 5.6. Let M 0 R I . Let L 1 , L 2 L I T C O N E such that | L 1 | , | L 2 | M 0 . Let σ Γ T ( { | L 1 | , | L 2 | } ) . If G H ( L 1 ) G H ( L 2 ) is well-defined, then

$$N_{ht=0}\big(L(\psi_{\sigma(|L_1|)}) + L(\psi_{\sigma(|L_2|)})\big) = N_{ht=0}\big(L(\psi_{\sigma(|L_1|)})\big) \cap N_{ht=0}\big(L(\psi_{\sigma(|L_2|)})\big). \tag{91}$$

In general, the set $N_{ht=0}\big(\sum_i L(\psi_{\sigma(|L_i|)})\big)$ shrinks monotonically as more loop-intermediates are added.

Proof. Because of Remark 4.27, removed normal edges of height 0 are removed forever. ∎

Figure 7. Computation of loops from the hexagonal base.

Proposition 5.7. (Sufficient Conditions to be a Loop) Settings are the same as for Problem 5.1. Let l p i = L ( ψ σ ( c i ) ) I B ( i = 1 , 2 ) , i.e.,

L ( ψ σ S V ( c i ( m 0 \ c i ) H ) ) = l p i ( m 0 \ c i ) H ( i = 1 , 2 ) . (92)

Then, $L_0$ consists of a single loop if $G_H(L_1) \circ G_H(L_2)$ is well-defined and one of the following three conditions is satisfied:

1) No cracks of l p 1 and l p 2 connect to M 0 ( c 1 c 2 ) . There is at most only one rift of l p 1 that penetrates into c 1 c 2 through c 1 \ c 2 . No rift of l p 2 penetrates into c 1 c 2 through c 2 \ c 1 (Figure 8(a) and Figure 8(b)).

2) No cracks of l p 1 and l p 2 connect to M 0 ( c 1 c 2 ) . There is at most only one rift of l p 2 that penetrates into c 1 c 2 through c 2 \ c 1 . No rift of l p 1 penetrates into c 1 c 2 through c 1 \ c 2 .

3) Both l p 1 and l p 2 have a crack connected to M 0 ( c 1 c 2 ) . No rift of l p 1 penetrates into c 1 c 2 through c 1 \ c 2 . No rift of l p 2 penetrates into c 1 c 2 through c 2 \ c 1 (Figure 8(c)).

Proof. Since G H ( L 1 ) G H ( L 2 ) is well-defined, normal edges of height 0 are not added as a result of addition by Lemma 5.6. That is, cracks are extended only by normal edges of height n, where n N such that n 0 and n is even. “No cracks of l p 1 and l p 2 connect to M 0 ( c 1 c 2 ) ” implies M 0 is not separated by a polygonal chain contained in c 1 c 2 .

Since l p 1 and l p 2 bring no normal edges of height n ( n 0 ) into c 2 \ c 1 and c 1 \ c 2 , respectively, the result follows. ∎

Shown in Figure 9 are examples where the sufficient conditions are not satisfied.

In Figure 9(a), both l p 1 and l p 2 have a rift that penetrates into c 1 c 2 through c 1 \ c 2 and c 2 \ c 1 , respectively. In Figure 9(b), l p 1 has two rifts that penetrate into c 1 c 2 through c 1 \ c 2 . In Figure 9(c), both l p 1 and l p 2 have a crack connected to M 0 ( c 1 c 2 ) , and l p 1 has a rift that penetrates into c 1 c 2 through c 1 \ c 2 .

Figure 10(a) is the case given in Figure 1(a). Shown in Figure 10(a) bottom are all normal edges of height 0, where both $lp_1$ and $lp_2$ have a rift consisting of normal edges of height 0. All other normal edges are of height 1. As a result of addition, some of the normal edges of height 0 are removed and we obtain the loop $lp_0$.

Figure 8. Sufficient conditions for $L_0$ of Problem 5.1 to be a loop.

Figure 9. Examples where the sufficient conditions are not satisfied.

Figure 10. Incremental design of protein-like molecules (Examples given in Figure 1).

Figure 10(b) is the case given in Figure 1(b). Shown in Figure 10(b) bottom are all normal edges of height 0 and height −1 (thick line segments). All other normal edges are of height 1. In this case, $G(L_1, M_0) \circ G(L_2, M_0)$ is not well-defined and we cannot use Lemma 5.6. As a result of addition, the rift of $lp_2$ is extended by normal edges of height 0 and we obtain two loops $lp_3$ and $lp_4$.

6. Discussion

A novel design method for protein-like molecules is proposed from the perspective of Sheaf Theory. In this method, a new molecule of a given shape is obtained as the sum of smaller molecules. Since the sum of loops is not a loop in general, sufficient conditions for a sum to be a loop are also considered. We believe this method is essential, especially when designing hybrids of known proteins.

Previous mathematical studies of protein structure have focused primarily on characterization and classification of structures, and the author is aware of no other mathematical research on protein design. As such, there is much room for improvement in this study, which is still in its infancy. The author hopes that this paper will inspire more mathematicians to become interested in the mathematical research on protein design.

There are two directions for future research. One is the study of the three-dimensional case, in which protein-like molecules are represented as loops of tetrahedra [12]. The other is the study of loops on various hexagonal meshes other than the “flat” mesh H considered in this paper [13]. Examples include hexagonal meshes on the surface of 3D molecules (i.e., loops of tetrahedra). Note that a 2D triangular flow is induced on the surface of a complex of loops of tetrahedra.

In the three-dimensional case, two difficulties arise. First, the shape of a molecule is given on a mesh of dodecahedra, where a dodecahedron can be divided into four loops of tetrahedra (a hexagon cannot be divided into more than one loop of triangles). Second, the heights of normal edges of tetrahedra fall into three congruence classes modulo 3, not two congruence classes modulo 2.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Berman, H.M., Westbrook, J., Feng, G., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Research, 28, 235-242.
https://doi.org/10.1093/nar/28.1.235
[2] Taylor, W.R., May, A.C.W., Brown, N.P. and Aszódi, A. (2001) Protein Structure: Geometry, Topology and Classification. Reports on Progress in Physics, 64, Article No. 517.
https://doi.org/10.1088/0034-4885/64/4/203
[3] Albou, L.-P., Schwarz, B., Poch, O., Wurtz, J.M. and Moras, D. (2009) Defining and Characterizing Protein Surface Using Alpha Shapes. Proteins, 76, 1-12.
https://doi.org/10.1002/prot.22301
[4] Gromov, M. (2011) Crystals, Proteins, Stability and Isoperimetry. Bulletin of the American Mathematical Society, 48, 229-257.
https://doi.org/10.1090/S0273-0979-2010-01319-7
[5] Xia, K. and Wei, G.-W. (2014) Persistent Homology Analysis of Protein Structure, Flexibility and Folding. International Journal for Numerical Methods in Biomedical Engineering, 30, 814-844.
https://doi.org/10.1002/cnm.2655
[6] Penner, R.C. (2016) Moduli Spaces and Macromolecules. Bulletin of the American Mathematical Society, 53, 217-268.
https://doi.org/10.1090/bull/1524
[7] Zhao, R., Wang, M., Chen, J., Tong, Y. and Wei, G.-W. (2021) The de Rham-Hodge Analysis and Modeling of Biomolecules. Bulletin of Mathematical Biology, 82, Article No. 108.
https://doi.org/10.1007/s11538-020-00783-2
[8] Morikawa, N. (2019) Design of Self-Assembling Molecules and Boundary Value Problem for Flows on a Space of n-Simplices. Applied Mathematics, 10, 907-946.
https://doi.org/10.4236/am.2019.1011065
[9] Hartshorne, R. (1977) Algebraic Geometry. Springer-Verlag, New York.
https://doi.org/10.1007/978-1-4757-3849-0
[10] Milewski, B. (2016) Category Theory for Programmers 1.1: Motivation and Philosophy.
http://youtube.com/watch?v=I8LbkfSSR58
[11] Morikawa, N. (2020) On the Defining Equations of Protein’s Shape from a Category Theoretical Point of View. Applied Mathematics, 11, 890-916.
https://doi.org/10.4236/am.2020.119058
[12] Morikawa, N. (2018) Global Geometrical Constraints on the Shape of Proteins and Their Influence on Allosteric Regulation. Applied Mathematics, 9, 1116-1155.
https://doi.org/10.4236/am.2018.910076
[13] Morikawa, N. (2022) Discrete Exterior Calculus of Proteins and Their Cohomology. Open Journal of Discrete Mathematics, 12, 47-63.
https://doi.org/10.4236/ojdm.2022.123004

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.