A Novel Design Method for Protein-Like Molecules from the Perspective of Sheaf Theory

Naoto Morikawa

doi:10.4236/ojdm.2023.133007

Open Journal of Discrete Mathematics > Vol.13 No.3, July 2023

Genocript, Zama, Japan.

Abstract

Proteins perform a variety of functions in living organisms and their functions are largely determined by their shape. In this paper, we propose a novel mathematical method for designing protein-like molecules of a given shape. In the mathematical model, molecules are represented as loops of n-simplices (2-simplices are triangles and 3-simplices are tetrahedra). We design a new molecule of a given shape by patching together a set of smaller molecules that cover the shape. The covering set of small molecules is defined using a binary relation between sets of molecules. A new molecule is then obtained as a sum of the smaller molecules, where addition of molecules is defined using transformations acting on a set of (n + 1)-dimensional cones. Due to page limitations, only the two-dimensional case (i.e., loops of triangles) is considered. No prior knowledge of Sheaf Theory, Category Theory, or Protein Science is required. The author hopes that this paper will encourage further collaboration between Mathematics and Protein Science.

Keywords

Discrete Differential Geometry, Protein Design, Sheaf Theory, Protein Structure

Share and Cite:

Morikawa, N. (2023) A Novel Design Method for Protein-Like Molecules from the Perspective of Sheaf Theory. Open Journal of Discrete Mathematics, 13, 63-85. doi: 10.4236/ojdm.2023.133007.

1. Introduction

Proteins are folded sequences of amino acids that perform a variety of functions in cells. They perform their functions by interacting with other proteins as well as with small-molecule ligands (as in enzyme-substrate interactions).

In protein-protein interactions, proteins interact with each other by forming temporary complexes called “reaction intermediates”. The stability of a reaction intermediate then depends on shape complementarity at the protein-protein interface (i.e., the contact area on the surface).

In protein-ligand interactions, proteins bind to one or more small molecule ligands at pockets (or grooves) on their surfaces. Specificity and affinity of the interactions then depend on shape complementarity at the ligand-binding pockets.

In both cases, the functions of proteins are largely determined by their shape. Since structural data for thousands of protein-protein interfaces and ligand-binding pockets are available in the PDB database [1], it is conceivable that artificial proteins could be created by combining these known structures. On the other hand, Mathematics has Sheaf Theory as a framework for patching local data together to obtain global data.

In this paper, we propose a novel design method for artificial protein-like molecules (i.e., folded sequences of basic units) with a given shape. In the method, a new molecule is obtained from a given set of known molecules using the framework of Sheaf Theory. The design of protein-like molecules is carried out in two steps:

1) Specify the shape of a new molecule.

2) Find a folded sequence of basic units that forms the specified shape.

Note that it is not trivial to combine proteins with known structures to form a new protein (i.e., a folded sequence of amino acids). For example, since a local surface structure is often formed by multiple amino acid fragments which are distant in the amino acid sequence, the local surface structure may be unfolded in the new molecule if the corresponding fragments are not arranged adequately in the new amino acid sequence (in other words, proteins are neither “rigid” like holomorphic functions nor “flexible” like continuous functions).

In this paper, protein-like molecules are represented as a closed trajectory in a flow of n-simplices. Due to page limitations, only the two-dimensional case (i.e., flows of 2-simplices) is considered. We then propose a novel design method, called the “incremental design method”, which uses the framework of Sheaf Theory to compute a closed trajectory (i.e., a new molecule) from a given set of shorter closed trajectories (i.e., smaller known molecules). We believe this method is essential, especially when designing hybrids of known proteins.

In the past, mathematical studies of protein structure have been concerned mostly with the classification and characterization of structures [2]-[7]. The author is unaware of any mathematical studies on the design of protein-like molecules by other researchers. For an overview of protein-like molecules, see [8]. No prior knowledge of Protein Science, Sheaf Theory [9], or Category Theory [10] is required.

We first give a quick review of Sheaf Theory. Let U be a subset of the 2D Euclidean space R². Let $A = \{V_{1}, \dots, V_{n}\}$ be a covering of U, i.e., a set of subsets of R² such that $U = \bigcup_{i = 1}^{n} V_{i}$ . Suppose that each subset V of R² is associated with a set F(V) of mathematical data. Let σ be a function with $σ (V_{i}) \in F (V_{i})$ defined “consistently” on A. In Sheaf Theory, we can compute an element of F(U) by patching together the values $\{σ (V_{1}), \dots, σ (V_{n})\}$ on A. For example, in the case of the sheaf of continuous functions on R², F(V) is the set of continuous functions defined on an open set V in R². We then obtain a “global” continuous function on an open set U by patching together “local” continuous functions $σ (V_{i})$ on $V_{i}$ .
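The patching step can be illustrated with a minimal Python sketch. It assumes a finite toy setting in which a “local function” on $V_{i}$ is a dictionary from the points of $V_{i}$ to values; the function name `glue` and the sample covering are hypothetical and do not come from the paper.

```python
def glue(local_sections):
    """Patch local sections (dicts: point -> value) into one global section.

    Raises ValueError if two local sections disagree on a shared point,
    i.e. if the family is not consistent on overlaps.
    """
    global_section = {}
    for section in local_sections:
        for point, value in section.items():
            if point in global_section and global_section[point] != value:
                raise ValueError(f"inconsistent values at {point!r}")
            global_section[point] = value
    return global_section


# Toy covering of U = {0, 1, 2, 3} by V1 = {0, 1, 2} and V2 = {2, 3}.
sigma_v1 = {0: 0.0, 1: 1.0, 2: 4.0}
sigma_v2 = {2: 4.0, 3: 9.0}
sigma_u = glue([sigma_v1, sigma_v2])
```

The two local sections agree on the overlap {2}, so gluing succeeds; changing one of the values at the point 2 would make `glue` raise an error.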

Figure 1 illustrates the design method proposed in this paper. In our case, F(V) is a set of closed trajectories on V. Figure 1(a) is an example of our design method. We are given a subset U of R² (left end) and a covering $\{V_{1}, V_{2}\}$ of U (second from left). Suppose that closed trajectories $ψ_{1} \in F (V_{1})$ and $ψ_{2} \in F (V_{2})$ are given (third from left). We then obtain a closed trajectory $ϕ \in F (U)$ (right end) by patching together $ψ_{1}$ and $ψ_{2}$ (enclosed closed trajectories are considered part of the enclosing closed trajectory).

This is where a difficulty arises. In the case of sheaves, we can compute the global data on U by patching together the local data on “any” covering of U (provided they are “consistent”). In our case, on the other hand, the computation fails for some coverings A. (Note that $σ (V_{i})$ can be the empty set because the restriction of an element of F(U) to $V_{i}$ may not be contained in $F (V_{i})$ .) Figure 1(b) is an example of an unsuccessful computation. We are given a subset U of R² (left end) and a covering $\{{V^{'}}_{1}, {V^{'}}_{2}\}$ of U (second from left). Suppose that ${ψ^{'}}_{1} \in F ({V^{'}}_{1})$ and ${ψ^{'}}_{2} \in F ({V^{'}}_{2})$ are given (third from left). Then, patching together ${ψ^{'}}_{1}$ and ${ψ^{'}}_{2}$ , we obtain two closed trajectories rather than one. In Section 5, we consider sufficient conditions for “local” flows on a covering to produce a single closed trajectory.

This paper is organized as follows. Section 2 explains the loop model of protein-like molecules. Section 3 defines a differential geometric structure on a triangular mesh B. Section 4 formulates the protein design problem from the perspective of Sheaf Theory, where the design problem is rephrased into the “incremental design problem”. Section 5 studies the incremental design problem. Due to page limitations, we only consider the case where a covering consists of two smaller molecules. Finally, Section 6 presents discussion and future directions.

2. The Loop Model of Protein-Like Molecules

The shape of a molecule is given as a region on a hexagonal mesh H. A molecule then corresponds to a closed trajectory on a triangular mesh B, which is a subdivision of H. New molecules are designed using a differential structure defined on B.

2.1. Regions on a Hexagonal Mesh H

Figure 2 is explained in this subsection. Shown in Figure 2(a) is the honeycomb mesh obtained by dividing a 2D Euclidean plane R² into a set of regular hexagons. H denotes the set of all hexagons of the mesh. A subset S of H is called connected if each $h_{a} \in S$ shares a side with another $h_{b} \in S$ (i.e., for each $h_{a} \in S$ , there exists another $h_{b} \in S$ such that $h_{a}$ and $h_{b}$ share a side).
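The connectedness condition can be checked mechanically. The following Python sketch assumes that hexagons are labelled by axial coordinates (q, r), a common convention for hexagonal grids that the paper does not use; the names `NEIGHBOUR_OFFSETS`, `shares_side`, and `is_connected` are hypothetical. A single hexagon is treated as connected, since single hexagons are later counted as integral regions.

```python
# Assumed labelling: each hexagon is a pair (q, r) of axial coordinates.
# These six offsets reach exactly the hexagons that share a side.
NEIGHBOUR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def shares_side(h_a, h_b):
    """True if the two hexagons are distinct and share a side."""
    offset = (h_b[0] - h_a[0], h_b[1] - h_a[1])
    return offset in NEIGHBOUR_OFFSETS


def is_connected(hexagons):
    """Literal reading of the definition: every hexagon of S shares a
    side with another hexagon of S (a single hexagon is accepted)."""
    hexagons = set(hexagons)
    if len(hexagons) <= 1:
        return True
    return all(
        any(shares_side(h_a, h_b) for h_b in hexagons)
        for h_a in hexagons
    )


# A central hexagon together with its six neighbours (seven hexagons).
flower = {(0, 0)} | set(NEIGHBOUR_OFFSETS)
```

This follows the wording of the definition literally; a check based on paths between every pair of hexagons would be a stricter criterion.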

Shown in Figure 2(b) is a connected subset $S = \{h_{1}, h_{2}, \dots, h_{7}\}$ of H. Since hexagons of H do not overlap each other, we write

$S = h_{1} \oplus h_{2} \oplus \dots \oplus h_{7}$ . (1)

Figure 1. The design method for protein-like molecules proposed in this paper.

Figure 2. The mathematical model of the shape of protein-like molecules.

If S consists of only one hexagon $h_{1}$ , we write either $S = h_{1}$ or $S = \{h_{1}\}$ .

Shown in Figure 2(c) is an integral region on H, defined as follows. (Addition “+” of hexagons will be defined later in this subsection.)

Definition 2.1. (Integral Region) An integral region $m_{0}$ on H is a hole-free subset of R² covered by a connected finite subset $S = \{h_{1}, h_{2}, \dots, h_{n}\}$ of H. The hexagonal base of $m_{0}$ is then defined by

${(m_{0})}_{H} : = h_{1} \oplus h_{2} \oplus \dots \oplus h_{n}$ . (2)

For example, the hexagonal base of $m_{0}$ of Figure 2(c) is shown in Figure 2(b). $I_{H}$ denotes the set of all integral regions on H. Note that $H \subset I_{H}$ , i.e., hexagons of H are integral regions.

The set difference between two integral regions $m_{1}, m_{2} \in I_{H}$ is defined by

$m_{1} \setminus m_{2} := \bigcup \{h \in H \mid h \subset m_{1} \text{ and } h ⊄ m_{2}\}$ , (3)

where $\bigcup \{h_{1}, h_{2}, \dots, h_{n}\} := h_{1} \cup h_{2} \cup \dots \cup h_{n}$ . The hexagonal base ${(m_{1} \setminus m_{2})}_{H}$ of $m_{1} \setminus m_{2}$ is defined in the same way as for integral regions (it may have holes).

Lemma 2.2. Let $m_{1}, m_{2} \in I_{H}$ . Then, $m_{1} \setminus m_{2} \in I_{H}$ if $m_{2} ⊄ m_{1}$ .

Shown in Figure 2(d) are region-intermediates on H, defined as follows. Let $M_{0} = \{m_{1}, m_{2}, \dots\} \subset I_{H}$ . $M_{0}$ is called connected if each $m_{i} \in M_{0}$ shares a side with another $m_{j} \in M_{0}$ . $M_{0}$ is called disjoint if the $m_{i}$ ’s do not overlap each other. If $M_{0}$ is disjoint, we write

$M_{0} = m_{1} \oplus m_{2} \oplus \dots$ . (4)

If $M_{0}$ consists of only one integral region $m_{1}$ , we write either $M_{0} = m_{1}$ or $M_{0} = \{m_{1}\}$ .

Definition 2.3. (Region-Intermediate) A region-intermediate $M_{0}$ on H is a connected finite disjoint subset $S = \{m_{1}, m_{2}, \dots, m_{n}\}$ of $I_{H}$ . Since S is disjoint, we write

$M_{0} = m_{1} \oplus m_{2} \oplus \dots \oplus m_{n}$ . (5)

The hexagonal base of $M_{0}$ is then defined by

${(M_{0})}_{H} : = {(m_{1})}_{H} \oplus {(m_{2})}_{H} \oplus \dots \oplus {(m_{n})}_{H}$ . (6)

RI denotes the set of all region-intermediates on H.

Finally, addition of integral regions is defined using addition of directed polygonal chains as shown below.

Definition 2.4. (Directed Polygonal Chain) Let $P_{1}, P_{2}, \dots, P_{n}, P_{n + 1}$ be points in R². A directed polygonal chain $P_{1} P_{2} \dots P_{n + 1}$ in R² is a set of directed line segments defined by

$P_{1} P_{2} \dots P_{n + 1} := \{P_{1} P_{2}, P_{2} P_{3}, \dots, P_{n} P_{n + 1}\}$ , (7)

where $P_{i} P_{j}$ denotes the directed line segment from $P_{i}$ to $P_{j}$ . If $P_{n + 1} = P_{1}$ , we obtain a closed directed polygonal chain in R². $| P_{1} P_{2} \dots P_{n} P_{1} |$ denotes the area of R² bounded by $P_{1} P_{2} \dots P_{n} P_{1}$ in R².

Let $C_{0} = \{c_{1}, c_{2}, \dots\}$ be a set of closed directed polygonal chains in R². $C_{0}$ is called disjoint if the $| c_{i} |$ ’s do not overlap each other. If $C_{0}$ is disjoint, we write

$C_{0} = c_{1} \oplus c_{2} \oplus \dots$ . (8)

If $C_{0}$ is disjoint, the area $| C_{0} |$ of R² bounded by $C_{0}$ is defined by

$| C_{0} | : = | c_{1} | \oplus | c_{2} | \oplus \dots$ . (9)

Let $m_{0} \in I_{H}$ . Since $m_{0}$ has no hole, we have

$m_{0} = | P_{1} P_{2} \dots P_{n} P_{1} |$ (10)

for some $P_{1}, P_{2}, \dots, P_{n} \in R^{2}$ , where the vertices are labeled counter-clockwise. In the case of Figure 2(c),

$m_{0} = | P_{1} P_{2} \dots P_{20} P_{1} |$ . (11)

Definition 2.5. (The Boundary Operator ∂ on RI) Let $m_{0} = | P_{1} P_{2} \dots P_{n} P_{1} | \in I_{H}$ , where the vertices are labeled counter-clockwise. The boundary $\partial m_{0}$ of $m_{0}$ is defined by

$\partial m_{0} : = P_{1} P_{2} \dots P_{n} P_{1}$ . (12)

In this paper, the boundary of an integral region is always given the counter-clockwise orientation. Let $M_{0} = m_{1} \oplus m_{2} \oplus \dots \oplus m_{n} \in R I$ . Since $\{\partial m_{1}, \partial m_{2}, \dots, \partial m_{n}\}$ is disjoint, the boundary $\partial M_{0}$ of $M_{0}$ is defined by

$\partial M_{0} := \partial m_{1} \oplus \partial m_{2} \oplus \dots \oplus \partial m_{n}$ (13)

(see Equation (8)). Note that $| \partial M_{0} | = M_{0}$ .

Addition of integral regions is defined as follows.

Definition 2.6. ( $m_{1} + m_{2}$ ) Let $m_{1} = | P_{1} P_{2} \dots P_{n} P_{1} |, m_{2} = | Q_{1} Q_{2} \dots Q_{n} Q_{1} | \in I_{H}$ such that they do not overlap. Addition of $\partial m_{1}$ and $\partial m_{2}$ is defined by

$\partial m_{1} + \partial m_{2} := \{P_{i} P_{j} \in \partial m_{1} \mid P_{j} P_{i} \notin \partial m_{2}\} \cup \{Q_{i} Q_{j} \in \partial m_{2} \mid Q_{j} Q_{i} \notin \partial m_{1}\}$ . (14)

In other words, the same line segments in opposite directions (i.e., $P_{i} P_{j}$ and $P_{j} P_{i}$ ) are cancelled when added. Addition of $m_{1}$ and $m_{2}$ is then defined by

$m_{1} + m_{2} : = | \partial m_{1} + \partial m_{2} |$ . (15)
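The cancellation rule of Equation (14) can be sketched in a few lines of Python. For brevity the example uses two unit squares rather than hexagons, which is an illustrative substitution; the function names `boundary` and `add_boundaries` are hypothetical. A directed line segment is represented as a pair (start, end) of points.

```python
def boundary(vertices):
    """Directed edges of the closed chain P1 P2 ... Pn P1.

    `vertices` must be listed counter-clockwise, matching the
    orientation convention for boundaries of integral regions.
    """
    n = len(vertices)
    return {(vertices[i], vertices[(i + 1) % n]) for i in range(n)}


def add_boundaries(b1, b2):
    """Keep an edge only if its reverse does not occur in the other
    boundary, so shared segments in opposite directions cancel."""
    kept_from_b1 = {(p, q) for (p, q) in b1 if (q, p) not in b2}
    kept_from_b2 = {(p, q) for (p, q) in b2 if (q, p) not in b1}
    return kept_from_b1 | kept_from_b2


# Two non-overlapping unit squares sharing the side from (1, 0) to (1, 1).
square_left = boundary([(0, 0), (1, 0), (1, 1), (0, 1)])
square_right = boundary([(1, 0), (2, 0), (2, 1), (1, 1)])
merged = add_boundaries(square_left, square_right)
```

The shared side appears as (1, 0) to (1, 1) in the left square and as (1, 1) to (1, 0) in the right square, so both copies are dropped and the six remaining segments bound the 2-by-1 rectangle.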

In the case of Figure 2(c), we have

$m_{0} = h_{1} + h_{2} + \dots + h_{7}$ . (16)

Let $N$ denote the set of all natural numbers $\{1, 2, 3, \dots\}$ .

Lemma 2.7.

The set of all integral regions on H is given by

$I_{H} = \{h_{1} + \dots + h_{n} \mid n \in N, \{h_{i}\} \subset H \text{ is connected and hole-free}\}$ . (17)

The set of all region-intermediates on H is given by

$R I = \{m_{1} \oplus \dots \oplus m_{n} \mid n \in N, \{m_{i}\} \subset I_{H} \text{ is connected and disjoint}\}$ . (18)

Proof. They follow immediately from the definitions.∎

Remark 2.8. Integral regions play the role that “integers” do for rational numbers. That is, “rational” regions are obtained by dividing integral regions into loops of triangles [11].

2.2. Loops on a Triangular Mesh B

Figure 3 is explained in this subsection. Shown in Figure 3(a) is the triangular mesh obtained by dividing every hexagon of H into 6 equilateral triangles. B denotes the set of all triangles of the mesh.

Definition 2.9. (Trajectories of Triangles) A trajectory of triangles on B is a sequence of triangles of B in which consecutive triangles are connected by a common edge. (No direction is assigned to a trajectory.) The edges of a triangle that are not used to connect it to adjacent triangles are called the normal edges (of the trajectory) at the triangle (i.e., the “normal vector” of the trajectory). In figures, normal edges are indicated by thick line segments.

Definition 2.10. (Loops of Triangles) A loop on B is a closed trajectory of triangles on B. In this paper, protein-like molecules are represented as a loop of triangles of B. $| l p |$ denotes the area in R² swept by a loop $l p$ , where the area enclosed by $l p$ is also included in $| l p |$ (for example, $ϕ$ of Figure 1(a)).

A loop $l p_{0}$ of length 6 is called a hexagonal loop. A hexagonal loop $l p_{0}$ is denoted by $h_{0}$ if $| l p_{0} | = h_{0} \in H$ . In other words, $h_{0} \in H$ denotes both a loop of length 6 and a hexagon of H, i.e., $| h_{0} | = h_{0}$ .

Figure 3. The mathematical model of protein-like molecules.

Remark 2.11. The hexagonal base ${(M_{0})}_{H}$ of $M_{0} \in R I$ defined above is a region-intermediate consisting of hexagons as well as a loop-intermediate consisting of loops of length 6.

Shown in Figure 3(b) is a set $L = \{h_{1}, h_{2}, \dots, h_{7}\}$ of seven hexagonal loops on H. Since the $h_{i}$ ’s do not overlap each other, we write

$L = h_{1} \oplus h_{2} \oplus \dots \oplus h_{7}$ . (19)

If L consists of only one hexagon $h_{1}$ , we write either $L = h_{1}$ or $L = \{h_{1}\}$ .

Shown in Figure 3(c) is an integral loop on B, defined as follows.

Definition 2.12. (Integral Loop) A loop $l p_{0}$ on B is called integral if $| l p_{0} |$ is an integral region on H. $I_{B}$ denotes the set of all integral loops, i.e.,

$I_{B} = \{l p_{0} \mid l p_{0} \text{ is a loop on } B \text{ such that } | l p_{0} | \in I_{H}\}$ . (20)

$l p_{0} \in I_{B}$ is called an implementation of $m_{0} \in I_{H}$ if $| l p_{0} | = m_{0}$ . For example, $l p_{0}$ of Figure 3(c) is an implementation of $m_{0}$ of Figure 2(c).

Let $L = \{l p_{1}, l p_{2}, \dots\} \subset I_{B}$ . The set $| L |$ of integral regions associated with L is defined by

$| L | := \{| l p_{1} |, | l p_{2} |, \dots\} \subset I_{H}$ . (21)

If $| L |$ is disjoint, we write

$L = l p_{1} \oplus l p_{2} \oplus \dots$ . (22)

If L consists of only one integral loop $l p_{1}$ , we write either $L = l p_{1}$ or $L = \{l p_{1}\}$ .

Shown in Figure 3(d) top is a loop-intermediate on B, defined as follows.

Definition 2.13. (Loop-Intermediate) A loop-intermediate on B is a finite subset $L_{0} = \{l p_{1}, l p_{2}, \dots, l p_{n}\}$ of $I_{B}$ such that $| L_{0} |$ is a region-intermediate on H. Since $| L_{0} |$ is disjoint, we write

$L_{0} = l p_{1} \oplus l p_{2} \oplus \dots \oplus l p_{n}$ . (23)

LI denotes the set of all loop-intermediates on B, i.e.,

$L I := \{l p_{1} \oplus \dots \oplus l p_{n} \mid n \in N, l p_{i} \in I_{B} (i = 1, 2, \dots, n), | l p_{1} \oplus \dots \oplus l p_{n} | \in R I\}$ . (24)

Let $M_{0} \in R I$ . $L_{0} \in L I$ is called an implementation of $M_{0}$ if $| L_{0} | = M_{0}$ . Note that some region-intermediates have no implementation. For example, $m_{3}$ of Figure 2(d) has no implementation (Figure 3(d) bottom).

Now, fusion and fission of integral loops are defined using addition of the corresponding integral regions. (Addition of integral loops is considered in Section 4 below.)

Definition 2.14. (Fusion and Fission of Integral Loops) Let $l p_{0} \in I_{B}$ . Let $L_{0} = l p_{1} \oplus l p_{2} \oplus \dots \oplus l p_{n} \in L I$ . $l p_{0}$ is called the fusion of $L_{0}$ if

$| l p_{0} | = | l p_{1} | + | l p_{2} | + \dots + | l p_{n} |$ . (25)

$L_{0}$ is then called a fission of $l p_{0}$ . In Figure 3, both $h_{1} \oplus h_{2} \oplus \dots \oplus h_{7}$ and $l p_{1} \oplus h_{4} \oplus l p_{2}$ are fissions of $l p_{0}$ .

Finally, let’s define flows of triangles on B.

Definition 2.15. (Flows of Triangles on B) A flow ψ of triangles on B is an assignment of normal edges to triangles of B, i.e.,

$ψ (t) := \text{the set of normal edges of } t \quad (t \in B) .$ (26)

ψ is called regular at t if $ψ (t)$ consists of one edge (i.e., t is connected to exactly two adjacent triangles). ψ is called regular if ψ is regular at all triangles of B. $F L W_{R}$ denotes the set of all regular flows on B.

A triangle t of B is called singular if it is not regular. Singular triangles are called branch, terminal, or isolated triangles when they have no normal edges (i.e., connected to all the adjacent triangles), two normal edges (i.e., connected to only one adjacent triangle), or three normal edges (i.e., connected to no adjacent triangles), respectively.
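The classification above depends only on the number of normal edges assigned to a triangle, which the following Python sketch makes explicit. The function name `classify_triangle` and the edge labels are hypothetical; the input plays the role of the set $ψ (t)$ .

```python
def classify_triangle(normal_edges):
    """Classify a triangle by the number of its normal edges.

    0 normal edges -> branch   (connected to all three neighbours)
    1 normal edge  -> regular  (connected to exactly two neighbours)
    2 normal edges -> terminal (connected to only one neighbour)
    3 normal edges -> isolated (connected to no neighbour)
    """
    count = len(set(normal_edges))
    if count > 3:
        raise ValueError("a triangle has only three edges")
    return ["branch", "regular", "terminal", "isolated"][count]


# Edges of one triangle, labelled "a", "b", "c" for illustration.
kind = classify_triangle({"a"})
```

A flow is regular exactly when every triangle of B falls in the `"regular"` case.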

Remark 2.16. We often consider trajectories of triangles without explicit reference to the corresponding flow ψ. A triangle t is then called regular if ψ is regular at t.

Definition 2.17. (Disjoint Unions $L (ψ)$ and $M (ψ)$ ) Let $ψ \in F L W_{R}$ . $L (ψ)$ denotes the set of all the loops of ψ. Since trajectories of ψ do not overlap, we have

$L (ψ) : = l p_{1} \oplus l p_{2} \oplus \dots$ , (27)

where $l p_{i}$ ’s are the loops of ψ. $L (ψ)$ is called the loops associated with ψ. $M (ψ)$ denotes the associated regions, i.e.,

$M (ψ) : = | l p_{1} | \oplus | l p_{2} | \oplus \dots$ , (28)

$M (ψ)$ is called the regions associated with ψ.

Lemma 2.18. Let $ψ \in F L W_{R}$ .

1) $L (ψ) \in L I$ if $L (ψ)$ is finite, connected, and hole-free.

2) $M (ψ) \in R I$ if $M (ψ)$ is finite and connected.

2.3. Design Problem for Protein-Like Molecules

In the loop model of protein-like molecules, the shape of a new molecule is an integral region $m_{0}$ . A new molecule of the shape $m_{0}$ then is an implementation $l p_{0}$ of $m_{0}$ . The problem we consider in this paper is now defined as follows.

Problem 2.19. (Design of Protein-like Molecules) Given $M_{0} = \{m_{0}\} \in R I$ , find $L_{0} = \{l p_{0}\} \in L I$ such that $| l p_{0} | = m_{0}$ .

In the next section, the problem is rephrased using a differential geometric structure on B.

3. Differential Geometric Structure on B

A differential geometric structure on B is naturally obtained by embedding the honeycomb mesh H in the 3D Euclidean space R³, as shown in this section. We denote the set of all real numbers by R.

3.1. Embedding of H in R³

Shown in Figure 4(a) is a unit cube in R³ and its orthogonal projection on the plane H₀ in R³ defined by

$H_{0} := \{(x, y, z) \in R^{3} \mid x + y + z = 0\}$ . (29)

H is embedded in H₀ using unit cubes in R³, as explained below.

Definition 3.1. (Unit Cubes in R³) Let $(a, b, c) \in R^{3}$ . $[a, b, c]$ denotes the unit cube at $(a, b, c)$ , i.e.,

$[a, b, c] : = [a, a + 1] \times [b, b + 1] \times [c, c + 1] \subset R^{3}$ , (30)

where $[x, y]$ is the closed interval in R between x and y. If $P = (a, b, c) \in R^{3}$ , then $[a, b, c]$ is also written as $[P]$ . UC denotes the set of all unit cubes at the integer lattice $Z^{3}$ , i.e.,

$U C := \{[a, b, c] \mid (a, b, c) \in Z^{3}\} .$ (31)

The height $h t_{U C} ([a, b, c])$ of $[a, b, c] \in U C$ is defined by

$h t_{U C} ([a, b, c]) : = a + b + c$ . (32)

Remark 3.2. $[a, b, c]$ is given by

$[a, b, c] = \{(a, b, c) + (u, 0, 0) + (0, v, 0) + (0, 0, w) \in R^{3} \mid (u, v, w) \in R^{3} \text{ such that } 0 \leq u, v, w \leq 1\} .$ (33)

Shown in Figure 4(a) top is a unit cube $[a, b, c] \in U C$ with vertices $O = (a, b, c)$ , $P = (a + 1, b, c)$ , $Q = (a, b + 1, c)$ , $R = (a, b, c + 1)$ , $U = (a + 1, b + 1, c)$ , $V = (a, b + 1, c + 1)$ , $W = (a + 1, b, c + 1)$ , and $X = (a + 1, b + 1, c + 1)$ . The vertical diagonals OU, OV, and OW are drawn as thick line segments.

Definition 3.3. (Projection π of R³ onto H₀) π is the orthogonal projection of R³ onto $H_{0}$ defined by

$π (x, y, z) : = ((2 x - y - z) / 3, (- x + 2 y - z) / 3, (- x - y + 2 z) / 3)$ . (34)
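Equation (34) can be checked numerically with a short Python sketch. Exact rational arithmetic from the standard `fractions` module is used to avoid rounding; the function name `project` is hypothetical.

```python
from fractions import Fraction


def project(x, y, z):
    """Orthogonal projection of (x, y, z) onto the plane x + y + z = 0,
    following Equation (34)."""
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    return (
        (2 * x - y - z) / 3,
        (-x + 2 * y - z) / 3,
        (-x - y + 2 * z) / 3,
    )


# Image of the cube vertex U = (a + 1, b + 1, c) for (a, b, c) = (0, 0, 0).
image_of_u = project(1, 1, 0)
```

The coordinates of every image sum to zero, so the image lies on $H_{0}$ , and applying `project` a second time leaves the point unchanged, as expected of a projection.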

Shown in Figure 4(a) bottom is the orthogonal projection of OU, OV, and OW onto $H_{0}$ , forming part of a hexagonal mesh on $H_{0}$ .

Figure 4. Differential geometric structure on the triangular mesh B.

Definition 3.4. (“Bumpy” Mesh $H_{b u m p}$ ) $H_{b u m p}$ is the “bumpy” honeycomb mesh defined on the top surfaces of the cubes in

$U C_{0} := \{[x, y, z] \in U C \mid h t_{U C} ([x, y, z]) = 0\} .$ (35)

The edges of the mesh are the vertical diagonals. $H_{b u m p}$ denotes the set of “bumpy” hexagons drawn on $U C_{0}$ (Figure 4(b) top).
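As a small check of the height condition in Equation (35), the following Python sketch lists the cubes of height 0 whose base points lie in a finite box. A cube $[a, b, c]$ is represented by its base point (a, b, c); the helper names and the `radius` parameter are hypothetical and serve only to keep the enumeration finite.

```python
from itertools import product


def height(cube):
    """Height a + b + c of the unit cube [a, b, c], as in Equation (32)."""
    a, b, c = cube
    return a + b + c


def height_zero_cubes(radius):
    """Base points (a, b, c) with |a|, |b|, |c| <= radius and height 0."""
    coords = range(-radius, radius + 1)
    return [cube for cube in product(coords, repeat=3) if height(cube) == 0]


cubes = height_zero_cubes(1)
```

For `radius = 1` the list contains seven cubes: $[0, 0, 0]$ and the six cubes whose base points are the permutations of (1, -1, 0).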

Shown in Figure 4(b) bottom is the projection of $H_{b u m p}$ onto $H_{0}$ by π. In the following, we identify H with $π (H_{b u m p}) \subset H_{0}$ . An embedding of B in $H_{0}$ is then obtained by dividing every hexagon of $π (H_{b u m p})$ into 6 equilateral triangles.

3.2. Tangent Cones

Shown in Figure 4(c) top-left is a tangent cone $C_{0}$ to a region-intermediate $M_{0}$ , defined as follows. Roughly speaking, a 3D cone with multiple tops is obtained by stacking unit cubes diagonally (from $(\infty, \infty, \infty)$ to $(- \infty, - \infty, - \infty)$ ).

Definition 3.5. (Tangent Cone Cone A) Let $A \subset Z^{3}$ . The tangent cone $C o n e A$ generated by A is defined by

$C o n e A := \{(x, y, z) \in R^{3} \mid \max_{(a, b, c) \in A} \{\min \{x - a, y - b, z - c\}\} \geq 0\} .$ (36)

TCONE denotes the set of all tangent cones, i.e.,

$T C O N E := \{C o n e A \mid A \subset Z^{3}\} .$ (37)

P (TCONE) denotes the set of all subsets of TCONE, i.e., the power set of TCONE.
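The membership condition of Equation (36) translates directly into code. The following Python sketch assumes a finite, non-empty generating set A; the function names `cone_value` and `in_cone` are hypothetical.

```python
def cone_value(point, generators):
    """Evaluate max over (a, b, c) in A of min(x - a, y - b, z - c)."""
    x, y, z = point
    return max(min(x - a, y - b, z - c) for (a, b, c) in generators)


def in_cone(point, generators):
    """True if `point` satisfies the inequality in Equation (36)."""
    return cone_value(point, generators) >= 0


one_generator = {(0, 0, 0)}
two_generators = {(0, 0, 0), (-1, 1, 0)}

inside = in_cone((1, 2, 3), one_generator)
outside = in_cone((-1, 5, 5), one_generator)
```

With the single generator (0, 0, 0), the point (-1, 5, 5) fails the test because its first coordinate is negative; adding a second generator enlarges the cone, so that (-1, 1, 0) belongs to the cone generated by `two_generators`.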

Definition 3.6. (Tops and Bottoms of Cone A) Let $C_{0} \in T C O N E$ . The top vertices of $C_{0}$ are the peaks of the cone. $t o p (C_{0})$ denotes the set of all top vertices of $C_{0}$ . Note that $C_{0} = C o n e t o p (C_{0})$ . The bottom vertices of $C_{0}$ are the dents of the cone which are peaks if we look up the cone from $(\infty, \infty, \infty)$ . $b o t t o m (C_{0})$ denotes the set of all bottom vertices of $C_{0}$ .

We define flows of triangles on B using tangent cones in R³.

Definition 3.7. (The Flow $ψ_{C_{0}}$ on B) Let $C_{0} \in T C O N E$ . Note that the surfaces of $C_{0}$ consist of the top faces of unit cubes of UC. Taking their vertical diagonals as normal edges, we obtain a regular flow of triangles on the surface of $C_{0}$ (Figure 4(c) top-right). Projecting the regular flow onto $H_{0}$ by π, we obtain a regular flow on B. This regular flow is called the flow on B induced by $C_{0}$ and is denoted by $ψ_{C_{0}}$ .

Definition 3.8. $F L W_{T C O N E}$ denotes the set of all regular flows on B induced by tangent cones, i.e.,

$F L W_{T C O N E} := \{ψ_{C} \mid C \in T C O N E\} \subset F L W_{R}$ . (38)

$R I_{T C O N E}$ and $L I_{T C O N E}$ denote the corresponding sets of region-intermediates and loop-intermediates, respectively, i.e.,

$R I_{T C O N E} := \{M \in R I \mid \exists C \in T C O N E \text{ such that } M = M (ψ_{C})\}$ . (39)

$L I_{T C O N E} := \{L \in L I \mid \exists C \in T C O N E \text{ such that } L = L (ψ_{C})\}$ . (40)

The author has no proof of the following claim.

Claim 3.9. $F L W_{T C O N E} = F L W_{R}$ .

In this paper, only flows of FLW_TCONE are considered.

Remark 3.10. $R I_{T C O N E} \neq R I$ . For example, $m_{3}$ of Figure 2(d) is not contained in $R I_{T C O N E}$ .

3.3. Tangent Cones to M₀

Here we define a tangent “space” to $M_{0} \in R I$ .

Definition 3.11. (The Boundary Cone $B C (\partial M_{0})$ ) Let

$M_{0} = m_{1} \oplus m_{2} \oplus \dots \oplus m_{n} \in R I_{T C O N E}$ , (41)

where

$m_{i} = | P_{i 1} P_{i 2} \dots P_{i k_{i}} P_{i 1} | (i = 1, 2, \dots, n)$ . (42)

The boundary cone $B C (\partial M_{0})$ to $\partial M_{0}$ is defined by

$B C (\partial M_{0}) := C o n e \{Q_{i 1}, Q_{i 2}, \dots, Q_{i k_{i}} \mid i = 1, 2, \dots, n\}$ , (43)

where $Q_{i j}$ ’s are points on the top surfaces of $U C_{0}$ such that

$π (Q_{i j}) = P_{i j}$ . (44)

Remark 3.12. Since π is a one-to-one mapping between the top surfaces of $U C_{0}$ and $H_{0}$ , the boundary cone $B C (\partial M_{0})$ exists for all $M_{0} \in R I_{T C O N E}$ .

Definition 3.13. (The Set TM₀ of Tangent Cones) Let $M_{0} \in R I_{T C O N E}$ . The set $T M_{0}$ of tangent cones to $M_{0}$ is defined by

$T M_{0} := \{C \in T C O N E \mid t o p (B C (\partial M_{0})) \cup b o t t o m (B C (\partial M_{0})) \subset s u r (C)\}$ , (45)

where $s u r (C)$ is the surface of C, i.e.,

$s u r (C) := \{(x, y, z) \in R^{3} \mid \max_{(a, b, c) \in t o p (C)} \{\min \{x - a, y - b, z - c\}\} = 0\}$ . (46)

Note that $B C (\partial M_{0}) \in T M_{0}$ .

Remark 3.14. $C \in T M_{0}$ does not imply $t o p (B C (\partial M_{0})) \subset t o p (C)$ .

Lemma 3.15. (The Base Tangent Cone $C_{b a s e} (M_{0})$ ) Let $M_{0} \in R I_{T C O N E}$ . There exists $C_{b a s e} (M_{0}) \in T M_{0}$ such that

$L (ψ_{C_{b a s e} (M_{0})}) = {(M_{0})}_{H}$ , (47)

$h t_{U C} ([a, b, c]) = 0$ for $(a, b, c) \in t o p (C_{b a s e} (M_{0}))$ . (48)

$C_{b a s e} (M_{0})$ is called the base tangent cone associated with $M_{0}$ .

Proof. Since π gives a one-to-one mapping between the top surfaces of $U C_{0}$ and $H_{0}$ , the result follows immediately.∎

Definition 3.16. (Mapping T) Assigning $T M_{0}$ to each $M_{0} \in R I_{T C O N E}$ , we obtain a mapping T from $R I_{T C O N E}$ to P (TCONE). Let $S \subset R I_{T C O N E}$ . A section σ of T on S is a mapping from S to P (TCONE) such that $M (ψ_{σ (M_{i})}) = M_{i}$ for all $M_{i} \in S$ . $Γ_{T} (S)$ denotes the set of all sections of T on S.

The design problem is now rephrased as follows.

Problem 3.17. (Design of Protein-like Molecules) Given $M_{0} = \{m_{0}\} \in R I_{T C O N E}$ , find $C_{0} \in T M_{0}$ such that $M (ψ_{C_{0}}) = M_{0}$ .

4. Loop Design Problem from the Perspective of Sheaf Theory

To mimic Sheaf Theory, “subsets” of a region-intermediate $M_{0}$ are defined using a binary relation over $R I_{T C O N E}$ . A “covering” $S = \{M_{1}, M_{2}, \dots, M_{n}\}$ of $M_{0}$ is then defined as a set of region-intermediates such that $M_{0}$ is the least upper bound of S with respect to the binary relation. An implementation of $M_{0}$ is obtained as the sum of implementations of $M_{i} (i = 1, 2, \dots, n)$ , where addition is defined using transformations on $T M_{0}$ as shown below.

4.1. Binary Relation over RI_TCONE and LI_TCONE

Shown in Figure 5(a) is a binary relation over $R I_{T C O N E}$ , defined as follows.

Definition 4.1. (Binary Relation ≤ over $R I_{T C O N E}$ ) Let $M_{a}, M_{b} \in R I_{T C O N E}$ . Then, $M_{a} \leq M_{b}$ if and only if, for any $m_{b} \in M_{b}$ ,

there exist $m_{1}, \dots, m_{n} \in M_{a}$ such that $m_{b} = m_{1} + \dots + m_{n}$ . (49)

In figures, we often use the arrow $M_{a} \to M_{b}$ to indicate $M_{a} \leq M_{b}$ .
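Definition 4.1 can be tried out on small examples with the following Python sketch. It assumes a simplified representation that is not part of the paper: an integral region is a frozenset of hexagon labels (its hexagonal base), and a region-intermediate is a collection of pairwise disjoint frozensets. Under this assumption, $m_{b}$ is a sum of members of $M_{a}$ exactly when the members of $M_{a}$ contained in $m_{b}$ together cover $m_{b}$ . Geometric conditions such as hole-freeness are not checked, and the function name `leq` is hypothetical.

```python
def leq(m_a_family, m_b_family):
    """Return True if M_a <= M_b in the sense of Definition 4.1.

    Both arguments are collections of pairwise disjoint frozensets of
    hexagon labels (assumed representation of integral regions).
    """
    for m_b in m_b_family:
        # Members of M_a that fit inside m_b; since M_a is disjoint,
        # these are the only candidates for the summands m_1, ..., m_n.
        parts = [m for m in m_a_family if m <= m_b]
        if frozenset().union(*parts) != m_b:
            return False
    return True


fine = {frozenset({1}), frozenset({2}), frozenset({3})}
coarse = {frozenset({1, 2}), frozenset({3})}

fine_below_coarse = leq(fine, coarse)
coarse_below_fine = leq(coarse, fine)
```

Here `fine` lies below `coarse` because the region {1, 2} is the sum of the hexagons 1 and 2, while the converse fails because the hexagon 1 is not a sum of members of `coarse`.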

Shown in Figure 5(b) is the binary relation over $L I_{T C O N E}$ induced by the binary relation ≤ over $R I_{T C O N E}$ . That is,

Definition 4.2. (Binary Relation ≤ over $L I_{T C O N E}$ ) Let $L_{a}, L_{b} \in L I_{T C O N E}$ . Then, $L_{a} \leq L_{b}$ if and only if, for any $l p_{b} \in L_{b}$ ,

there exist $l p_{1}, \dots, l p_{n} \in L_{a}$ such that $| l p_{b} | = | l p_{1} | + \dots + | l p_{n} |$ . (50)

In figures, we often use the arrow $L_{a} \to L_{b}$ to indicate $L_{a} \leq L_{b}$ .

Remark 4.3. Notations such as $(R I_{T C O N E}, \leq)$ and $(L I_{T C O N E}, \leq)$ are used to explicitly indicate the binary relation equipped with a set.

Lemma 4.4. Let $M_{1}, M_{2} \in R I_{T C O N E}$ . Then,

$T (M_{1}) \subset T (M_{2})$ if $M_{1} \leq M_{2}$ . (51)

That is, T is a “covariant” mapping from $(R I_{T C O N E}, \leq)$ to $(P (T C O N E), \subset)$ .

Shown in Figure 5(c) are examples of the greatest lower bound of loop-intermediates, defined as follows.

Figure 5. Binary relation ≤ over $R I_{T C O N E}$ and $L I_{T C O N E}$ .

Definition 4.5. (⋀S and ⋁S) Let $S \subset R I_{T C O N E}$ . The greatest lower bound ⋀S of S is the greatest element of $R I_{T C O N E}$ that is less than or equal to each element of S. The least upper bound ⋁S of S is the least element of RI that is greater than or equal to each element of S. ⋀S and ⋁S for $S \subset L I_{T C O N E}$ are defined similarly.

Remark 4.6. In general, there are multiple candidates for ⋀S and ⋁S. In such cases, select one of them arbitrarily. Because of this uncertainty, “ $M_{0} \leq M_{i}$ for all $M_{i} \in S$ ” does not imply M₀ ≤ ⋀S.

Remark 4.7. $H \leq M_{0} \leq \emptyset$ for any $M_{0} \in R I_{T C O N E}$ , where $\emptyset$ denotes the empty set.

We use the following lemma to find “subsets” of a region-intermediate.

Lemma 4.8. Let $M_{0} \in R I_{T C O N E}$ and $C \in T C O N E$ . Then,

$M (ψ_{C}) \leq M_{0}$ if and only if $C \in T M_{0}$ . (52)

Proof. $M (ψ_{C}) \leq M_{0}$ if and only if $\partial M_{0} \subset \partial M (ψ_{C})$ . $\partial M_{0} \subset \partial M (ψ_{C})$ if and only if $t o p (B C (\partial M_{0})) \cup b o t t o m (B C (\partial M_{0})) \subset s u r (C)$ . The result follows immediately. ∎

4.2. Coverings of a Region-Intermediate

Two types of coverings are defined as follows.

Definition 4.9. (Coverings of a Region-Intermediate) Let $M_{0} \in R I_{T C O N E}$ . Let $S \subset R I_{T C O N E}$ . S is called a covering of M₀ if ⋁S = M₀.

Definition 4.10. (Topological Coverings of an Integral Region) Let $m_{0} \in I_{H}$ . Let $V = \{c_{1}, c_{2}, \dots, c_{n}\} \subset I_{H}$ . V is called a topological covering of $m_{0}$ if 1) $m_{0} = \bigcup_{i = 1}^{n} c_{i}$ , and 2) for each $c_{i} \in V$ , there exists another $c_{j} \in V$ such that $c_{i} \cap c_{j} \neq \emptyset$ .

Remark 4.11. Since some integral regions have no implementation (i.e., there exists $c_{i} \in I_{H}$ such that $c_{i} \neq | l p |$ for any $l p \in I_{B}$ ), topological coverings may have no sections on them.

Lemma 4.12. Let $m_{0} \in I_{H}$ . Let ${c_{1}, c_{2}, \dots, c_{n}} \subset I_{H}$ be a topological covering of $m_{0}$ . A covering of $m_{0}$ is then obtained by

${m_{0} \land c_{1}, m_{0} \land c_{2}, \dots, m_{0} \land c_{n}}$ . (53)

Example 4.13. In Figure 5(c), ${A_{1}, B_{1}}$ is a topological covering of X. ${A_{1} \land X, B_{1} \land X}$ is a covering of X.

The proposed design method uses a specific type of covering (in Problem 4.38).

Definition 4.14. (Hexagonal Covering S_V of an Integral Region) Let $V = {c_{1}, c_{2}, \dots, c_{n}} \subset I_{H}$ be a topological covering of $m_{0} \in I_{H}$ . The hexagonal covering $S_{V}$ of $m_{0}$ associated with V is defined by

$S_{V} = {c_{1} \oplus {(m_{0} \ c_{1})}_{H}, c_{2} \oplus {(m_{0} \ c_{2})}_{H}, \dots, c_{n} \oplus {(m_{0} \ c_{n})}_{H}}$ . (54)

Lemma 4.15. $S_{V}$ is a covering of $m_{0}$ .

The design problem is now rephrased as follows.

Problem 4.16. (Incremental Design of Protein-like Molecules) Given 1) a target shape $m_{0}$ : $M_{0} = {m_{0}} \in R I_{T C O N E}$ ; 2) a topological covering V of $m_{0}$ : $V = {c_{1}, c_{2}, \dots, c_{n}}$ ; 3) a section σ of T on V: $σ \in Γ_{T} (V)$ . Then, compute $C_{0} \in T M_{0}$ such that $M (ψ_{C_{0}}) = M_{0}$ by patching “local” loop-intermediates $L (ψ_{σ (c_{1})})$ , $L (ψ_{σ (c_{2})})$ , $\dots$ , and $L (ψ_{σ (c_{n})})$ together.

4.3. Transformations on LI_TCONE Induced by UC

To patch loop-intermediates together, we define addition of loop-intermediates using transformations on TCONE, defined as follows.

Definition 4.17. (TRANS (TCONE)) A transformation on TCONE is a mapping from TCONE to TCONE. TRANS (TCONE) denotes the set of all transformations on TCONE.

Let $A \in T R A N S (T C O N E)$ and $C \in T C O N E$ . We use the symbol “ $\cdot$ ” to denote the transformation of C by A, i.e., $A \cdot C$ . $A \cdot C$ is also called the action of A on C. Let $A_{1}, A_{2}, \dots, A_{n} \in T R A N S (T C O N E)$ . We use the symbol “ $\circ$ ” to denote the composition of transformations, i.e.,

$A_{1} \circ A_{2} \circ \dots \circ A_{n} \cdot C : = A_{1} \cdot (A_{2} \cdot (\dots \cdot (A_{n} \cdot C) \dots))$ . (55)
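Equation (55) fixes the order of application: the rightmost transformation acts first. A minimal sketch of this convention follows, assuming transformations are modelled as ordinary Python callables; `compose`, `add_x`, and `drop_x` are hypothetical names, and the two toy maps on sets are stand-ins rather than transformations on TCONE.

```python
from functools import reduce


def compose(*transforms):
    """Return A1 ∘ A2 ∘ ... ∘ An, which acts as A1(A2(...An(C)...))."""
    def composed(C):
        # Fold from the right so that An is applied first.
        return reduce(lambda acc, A: A(acc), reversed(transforms), C)
    return composed


# Toy stand-ins for two transformations (not actual maps on TCONE).
def add_x(C):
    return C | {"x"}


def drop_x(C):
    return C - {"x"}


first_drop_then_add = compose(add_x, drop_x)(frozenset())
first_add_then_drop = compose(drop_x, add_x)(frozenset())
```

The two results differ, which is why the order of composition has to be tracked in what follows.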

Example 4.18. Unit cubes induce transformations on TCONE as follows. Let $C \in T C O N E$ , where $t o p (C) = {P_{1}, P_{2}, \dots, P_{n}}$ , i.e.,

$C = C o n e {P_{1}, P_{2}, \dots, P_{n}}$ . (56)

Taking the unit cube $[a, b, c]$ at $P_{1} = (a, b, c)$ from C, we obtain another tangent cone

$C^{'} = C o n e {{P^{'}}_{1}, {P^{'}}_{2}, {P^{'}}_{3}, P_{2}, \dots, P_{n}}$ , (57)

where ${P^{'}}_{1} = (a + 1, b, c)$ , ${P^{'}}_{2} = (a, b + 1, c)$ , and ${P^{'}}_{3} = (a, b, c + 1)$ . Conversely, putting the unit cube $[a, b, c]$ on $C^{'}$ , we obtain the original cone C.

Definition 4.19. (The minimal L-cone C_L) Let $L \in L I_{T C O N E}$ and $C \in T | L |$ . C is called an L-cone if $L (ψ_{C}) = L$ . The tangent cone $C_{L}$ is the minimal L-cone with respect to set inclusion, i.e., $C_{L} \subset C$ for any L-cone C. Since $L \in L I_{T C O N E}$ , $C_{L} \in T | L |$ always exists and is uniquely determined by L.

Transformations on TCONE induce transformations on $L I_{T C O N E}$ as follows.

Definition 4.20. (Transformations on $L I_{T C O N E}$ ) Let $A \in T R A N S (T C O N E)$ and $L \in L I_{T C O N E}$ . The transformation of L by A is defined by

$A \cdot L : = L (ψ_{A \cdot C_{L}})$ . (58)

$A \cdot L$ is called the action of A on L.

Definition 4.21. (Transformations $P [a, b, c]$ , $T [a, b, c]$ , and $P T [a, b, c]$ ) Let $[a, b, c] \in U C$ . Two transformations $P [a, b, c]$ and $T [a, b, c]$ on TCONE induced by $[a, b, c]$ are defined by

$P [a, b, c] \cdot C : = C \cup C o n e {(a, b, c)}$ , (59)

$T [a, b, c] \cdot C : = {[x, y, z] \in C | [a, b, c] \notin C o n e {(x, y, z)}}$ , (60)

where $C \in T C O N E$ . $P [a, b, c] \cdot C$ is called the put & fill-action by $[a, b, c]$ on C. $T [a, b, c] \cdot C$ is called the take & clear-action by $[a, b, c]$ on C. We denote the composition of P after T by PT, i.e.,

$P T [a, b, c] : = P [a, b, c] \circ T [a, b, c] ([a, b, c] \in U C)$ . (61)

$P T [a, b, c] \cdot C$ is called the take & put-action by $[a, b, c]$ on C.

Remark 4.22. After the action of $P T [a, b, c]$ on $C \in T C O N E$ , the cube $[a, b, c]$ is always visible from $(- \infty, - \infty, - \infty)$ . On the other hand, after the action of $P [a, b, c]$ on C, $[a, b, c]$ may not be visible from $(- \infty, - \infty, - \infty)$ .

Lemma 4.23. Let $[a, b, c] \in U C$ and $C \in T C O N E$ . Then,

$P [a, b, c] \cdot C = C$ if $[a, b, c] \in C$ , (62)

$T [a, b, c] \cdot C = C$ if $[a, b, c] \notin C$ . (63)
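The actions of Definition 4.21 can be tried out on a small computational model. The sketch below rests on an assumption read off from Example 4.18 and Remark 4.22: $C o n e {(a, b, c)}$ is taken to be the set of unit cubes $[x, y, z]$ with $x \geq a$ , $y \geq b$ , $z \geq c$ , and a cone is stored as the finite set of its minimal corner points. The additional conditions that make such a set a tangent cone in TCONE are not checked, and all function names are hypothetical.

```python
def leq(p, q):
    """Componentwise order on integer triples."""
    return all(a <= b for a, b in zip(p, q))


def minimal(points):
    """Keep only the points that are not dominated by another point."""
    pts = set(points)
    return frozenset(p for p in pts
                     if not any(q != p and leq(q, p) for q in pts))


def cone(*tops):
    """Cone generated by the given corner points (assumed: x>=a, y>=b, z>=c)."""
    return minimal(tops)


def contains(C, cube):
    """True if the unit cube [a, b, c] belongs to the cone C."""
    return any(leq(g, cube) for g in C)


def put(cube, C):
    """P[a,b,c]: C united with Cone{(a,b,c)}, as in Equation (59)."""
    return minimal(set(C) | {cube})


def take(cube, C):
    """T[a,b,c]: drop every cube [x,y,z] of C with x<=a, y<=b, z<=c (Eq. (60))."""
    shifted = set()
    for g in C:
        for axis in range(3):
            h = list(g)
            h[axis] = max(g[axis], cube[axis] + 1)  # push past the removed block
            shifted.add(tuple(h))
    return minimal(shifted)


def take_put(cube, C):
    """PT[a,b,c] = P[a,b,c] ∘ T[a,b,c], as in Equation (61)."""
    return put(cube, take(cube, C))


C = cone((0, 0, 0))
C_prime = take((0, 0, 0), C)  # Example 4.18 with (a, b, c) = (0, 0, 0)
```

In this model, `C_prime` has the three corner points of Equation (57), `put((0, 0, 0), C_prime)` recovers `C`, and both statements of Lemma 4.23 can be spot-checked on individual cubes.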

Definition 4.24. ( $T R A N S_{〈 T, P T 〉} (T C O N E)$ ) $T R A N S_{〈 T, P T 〉} (T C O N E)$ denotes the set of all the transformations on TCONE generated by finite compositions of $T [a, b, c]$ and $P T [a, b, c]$ ( $[a, b, c] \in U C$ ), i.e.,

$T R A N S_{〈 T, P T 〉} (T C O N E) : = {A_{1} \circ \dots \circ A_{n} | n \in Z, A_{i} = T u \text{ or } P T u \text{ for some } u \in U C} .$ (64)

In general, $G_{1} \circ G_{2} \cdot C \neq G_{2} \circ G_{1} \cdot C$ for $G_{1}, G_{2} \in T R A N S_{〈 T, P T 〉} (T C O N E)$ and $C \in T C O N E$ .

Example 4.25. Let $u_{1} = [a, b, c]$ , $u_{2} = [a^{'}, b^{'}, c^{'}] \in U C$ such that $(a, b, c) \in C o n e {(a^{'}, b^{'}, c^{'})}$ . Then,

$P T u_{1} \circ P T u_{2} \cdot C o n e {(a^{'}, b^{'}, c^{'})} \neq P T u_{2} \circ P T u_{1} \cdot C o n e {(a^{'}, b^{'}, c^{'})}$ , (65)

$T u_{1} \circ P T u_{2} \cdot C o n e {(a^{'}, b^{'}, c^{'})} \neq P T u_{2} \circ T u_{1} \cdot C o n e {(a^{'}, b^{'}, c^{'})}$ . (66)

Definition 4.26. (Well-defined Transformations) Let $G = A_{1} \circ A_{2} \circ \dots \circ A_{n} \in T R A N S_{〈 T, P T 〉} (T C O N E)$ . G is called well-defined if the action of G on TCONE does not depend on the order of the $A_{i}$ ’s, i.e.,

$G \cdot C = A_{ρ (1)} \circ A_{ρ (2)} \circ \dots \circ A_{ρ (n)} \cdot C$ for all $C \in T C O N E$ (67)

for any permutation ρ of ${1, 2, \dots, n}$ .

Remark 4.27. If G is well-defined, removed unit cubes are removed forever and placed unit cubes are placed forever.
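Order dependence in the sense of Definition 4.26 can be probed by brute force over permutations. The sketch below reuses the same assumed model as before (a cone is the set of cubes componentwise above its minimal corner points) and tests only a finite sample of cones, so a positive answer is evidence rather than a proof of well-definedness. The names `apply` and `order_independent_on` and the encoding of steps as `("T", cube)` or `("PT", cube)` pairs are hypothetical.

```python
from itertools import permutations


def leq(p, q):
    return all(a <= b for a, b in zip(p, q))


def minimal(points):
    pts = set(points)
    return frozenset(p for p in pts
                     if not any(q != p and leq(q, p) for q in pts))


def put(cube, C):
    return minimal(set(C) | {cube})


def take(cube, C):
    shifted = set()
    for g in C:
        for axis in range(3):
            h = list(g)
            h[axis] = max(g[axis], cube[axis] + 1)
            shifted.add(tuple(h))
    return minimal(shifted)


def apply(steps, C):
    """Apply A1 ∘ ... ∘ An to C; the rightmost step acts first."""
    for kind, cube in reversed(steps):
        C = take(cube, C)          # both T and PT start by taking
        if kind == "PT":
            C = put(cube, C)       # PT then puts the cube back
    return C


def order_independent_on(steps, sample_cones):
    """True if every ordering of `steps` agrees on each sampled cone."""
    return all(
        len({apply(list(p), C) for p in permutations(steps)}) == 1
        for C in sample_cones
    )


# Example 4.25 with u2 = [0, 0, 0] and u1 = [1, 0, 0], acting on Cone{(0, 0, 0)}.
u1, u2 = (1, 0, 0), (0, 0, 0)
C0 = frozenset({u2})
lhs_65 = apply([("PT", u1), ("PT", u2)], C0)
rhs_65 = apply([("PT", u2), ("PT", u1)], C0)
lhs_66 = apply([("T", u1), ("PT", u2)], C0)
rhs_66 = apply([("PT", u2), ("T", u1)], C0)
```

In this model the two sides of (65) and of (66) differ, whereas a composition consisting only of take & clear-actions, such as `[("T", (1, 0, 0)), ("T", (0, 1, 0))]`, gives the same result in either order on the sampled cone.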

4.4. Addition on LI_TCONE

Addition of loop-intermediates is now defined using transformations on TCONE.

Definition 4.28. (Transformations on TM₀) Let $M_{0} \in R I$ . The set $T R A N S (T M_{0})$ of transformations on $T M_{0}$ is defined by

$T R A N S (T M_{0}) : = {G \in T R A N S_{〈 T, P T 〉} (T C O N E) | G \cdot C \in T M_{0} \text{ for all } C \in T M_{0}}$ . (68)

Lemma 4.29. Let $M_{0} \in R I_{T C O N E}$ and $G_{1}, G_{2} \in T R A N S (T M_{0})$ . Then,

$G_{1} \circ G_{2} \in T R A N S (T M_{0})$ . (69)

Lemma 4.30. Let $L_{0}, L \in L I_{T C O N E}$ and $G \in T R A N S (T | L_{0} |)$ . Then,

If $L \leq L_{0}$ , then $G \cdot L \leq L_{0}$ . (70)

Definition 4.31. (The Relative Transformation $G_{H} (L)$ ) Let $L \in L I_{T C O N E}$ . The relative transformation $G_{H} (L)$ of L with respect to H is defined by

$G_{H} (L) : = P T [P_{1}] \circ \dots \circ P T [P_{n}] \circ T [Q_{1} - (1, 1, 1)] \circ \dots \circ T [Q_{k} - (1, 1, 1)]$ , (71)

where

${P_{1}, \dots, P_{n}} = {P \in t o p (C_{L}) | h t_{U C} ([P]) \neq 0}$ , (72)

${Q_{1}, \dots, Q_{k}} = {Q \in b o t t o m (C_{L}) | h t_{U C} ([Q]) \neq 2}$ . (73)

$[Q - (1, 1, 1)]$ is defined by $[Q - (1, 1, 1)] : = (a - 1, b - 1, c - 1)$ for $Q = (a, b, c)$ .

Lemma 4.32. Let $L \in L I_{T C O N E}$ and $M \in R I_{T C O N E}$ such that $| L | \leq M$ . Then, $G_{H} (L)$ is well-defined and

$L = G_{H} (L) \cdot L_{H} \leq G_{H} (L) \cdot M_{H}$ . (74)

Remark 4.33. The hexagonal base $M_{H}$ is a loop-intermediate consisting of loops of length 6 as well as a region-intermediate consisting of hexagons.

Lemma 4.34. Let $L \in L I_{T C O N E}$ and $M \in R I_{T C O N E}$ such that $| L | \leq M$ . Then,

$G_{H} (L) \in T R A N S (T M)$ . (75)

Addition of loop-intermediates is now defined as follows.

Definition 4.35. (Addition of Loop-Intermediates) Let $L_{1}, L_{2}, \dots, L_{n} \in L I_{T C O N E}$ and $M \in R I_{T C O N E}$ such that $| L_{1} |, | L_{2} |, \dots, | L_{n} | \leq M$ . Then,

$L_{1} + L_{2} + \dots + L_{n} : = G_{H} (L_{1}) \circ G_{H} (L_{2}) \circ \dots \circ G_{H} (L_{n}) \cdot M_{H}$ . (76)

Remark 4.36. Addition $L_{1} + L_{2} + \dots + L_{n}$ is defined with respect to $M_{H}$ , which is not explicitly indicated in the formula.

Definition 4.37. (Section $σ_{S_{V}}$ on $S_{V}$ ) Let $V = {c_{1}, c_{2}, \dots, c_{n}} \subset I_{H}$ be a topological covering of $m_{0} \in I_{H}$ . Let $σ \in Γ_{T} (V)$ . The section $σ_{S_{V}}$ of T on the hexagonal covering $S_{V}$ is defined by

$σ_{S_{V}} (c_{i} \oplus {(m_{0} \ c_{i})}_{H}) : = σ (c_{i}) \cup C_{b a s e} ({(m_{0} \ c_{i})}_{H})$ . (77)

Note that

$L (ψ_{σ_{S_{V}} (c_{i} \oplus {(m_{0} \ c_{i})}_{H})}) = L (ψ_{σ (c_{i})}) \oplus {(m_{0} \ c_{i})}_{H} (i = 1, \dots, n)$ . (78)

Since $c_{i} \oplus {(m_{0} \ c_{i})}_{H} \leq m_{0} (i = 1, \dots, n)$ , we can define addition

$\sum_{i = 1}^{n} L (ψ_{σ_{S_{V}} (c_{i} \oplus {(m_{0} \ c_{i})}_{H})})$ (79)

by Definition 4.35.

Using addition of loop-intermediates, the design problem is now rephrased as follows.

Problem 4.38. (Incremental Design of Protein-like Molecules) Given 1) a target shape $m_{0}$ : $M_{0} = {m_{0}} \in R I_{T C O N E}$ ; 2) a topological covering V of $m_{0}$ : $V = {c_{1}, c_{2}, \dots, c_{n}}$ ; 3) a section σ of T on V: $σ \in Γ_{T} (V)$ . Then, we obtain $L_{0} \in L I_{T C O N E}$ such that $| L_{0} | \leq M_{0}$ by

$\begin{array}{l} L_{0} : = \sum_{i = 1}^{n} L (ψ_{σ_{S_{V}} (c_{i} \oplus {(m_{0} \ c_{i})}_{H})}) \\ = G_{H} (L_{1}) \circ G_{H} (L_{2}) \circ \dots \circ G_{H} (L_{n}) \cdot {(M_{0})}_{H}, \end{array}$ (80)

where $L_{i} : = L (ψ_{σ_{S_{V}} (c_{i} \oplus {(m_{0} \ c_{i})}_{H})})$ . The question here is “when does L₀ consist of a single loop?”

5. Incremental Design of Protein-Like Molecules (N = 2)

In general, the sum of loops is not a loop. In this section, we consider sufficient conditions for the L₀ of Problem 4.38 to be a loop. Due to page limitations, we only consider a topological covering consisting of two integral regions. The incremental design problem is then given as follows.

Problem 5.1. (Incremental Design of Protein-like Molecules (n = 2)) Given 1) a target shape $m_{0}$ : $M_{0} = {m_{0}} \in R I_{T C O N E}$ ; 2) a topological covering V of $m_{0}$ : $V = {c_{1}, c_{2}}$ ; 3) a section σ of T on V: $σ \in Γ_{T} (V)$ . Then, we obtain $L_{0} \in L I_{T C O N E}$ such that $| L_{0} | \leq M_{0}$ by

$\begin{array}{l} L_{0} : = L (ψ_{σ_{S_{V}} (c_{1} \oplus {(m_{0} \ c_{1})}_{H})}) + L (ψ_{σ_{S_{V}} (c_{2} \oplus {(m_{0} \ c_{2})}_{H})}) \\ = G_{H} (L_{1}) \circ G_{H} (L_{2}) \cdot {(M_{0})}_{H}, \end{array}$ (81)

where $L_{i} : = L (ψ_{σ_{S_{V}} (c_{i} \oplus {(m_{0} \ c_{i})}_{H})})$ . Find sufficient conditions for $L_{0}$ to be a loop.

5.1. Closer Look at the Action of $T R A N S_{〈 T, P T 〉} (T C O N E)$

Figure 6 shows the effect of the action of $T R A N S (T M_{0})$ on $L (ψ_{C_{b a s e} (M_{0})})$ using the height of normal edges, defined as follows.

Definition 5.2. (Height of Normal Edges) Let $P_{1} P_{2}$ be a vertical diagonal of a top face of $[a, b, c] \in U C$ . The height $h t_{E} (P_{1} P_{2})$ of $P_{1} P_{2}$ is defined by

$h t_{E} (P_{1} P_{2}) : = h t_{U C} ([a, b, c])$ . (82)

Let $C \in T C O N E$ . Let $Q_{1} Q_{2}$ be a normal edge of the induced flow $ψ_{C}$ such that $π (P_{1} P_{2}) = Q_{1} Q_{2}$ , where $P_{1} P_{2}$ is the corresponding vertical diagonal on the surfaces of C. The height $h t_{E} (Q_{1} Q_{2})$ of $Q_{1} Q_{2}$ (with respect to C) is defined by

$h t_{E} (Q_{1} Q_{2}) : = h t_{E} (P_{1} P_{2})$ . (83)

Figure 6. The action of $T R A N S (T M_{0})$ on $L (ψ_{C_{b a s e} (M_{0})})$ .

Shown in Figure 6(a) top is the top view of the base tangent cone

$C_{b a s e} (M_{0}) \in T M_{0}$ (84)

of some $M_{0} \in R I$ . Shown in Figure 6(a) bottom is the loop-intermediate

$L (ψ_{C_{b a s e} (M_{0})})$ (85)

on B. By definition, the heights of all normal edges of $ψ_{C_{b a s e} (M_{0})}$ are 0.

Shown in Figure 6(b) top is the tangent cone

$P T [P_{1}] \circ P T [P_{2}] \circ T [P_{3}] \cdot C_{b a s e} (M_{0})$ . (86)

In the upper part, two Y-shaped sets of normal edges are replaced by two inverted Y-shaped sets of normal edges (thick line segments) by putting the two unit cubes $[P_{1}]$ and $[P_{2}]$ . In the lower part, a Y-shaped set of normal edges is replaced by an inverted Y-shaped set of normal edges (thick line segments) by taking the unit cube $[P_{3}]$ .

Shown in Figure 6(b) bottom is the loop-intermediate

$P T [P_{1}] \circ P T [P_{2}] \circ T [P_{3}] \cdot L (ψ_{C_{b a s e} (M_{0})})$ . (87)

The light grey area indicates the projection image of $[P_{1}]$ and $[P_{2}]$ by π. The dark grey area indicates the projection image of the removed $[P_{3}]$ by π. Note that normal edges of different heights are not directly connected.

Lemma 5.3. If two normal edges $n_{1}$ and $n_{2}$ of a flow of triangles are connected, the difference of their heights is even, i.e.,

$h t_{E} (n_{1}) \equiv h t_{E} (n_{2}) \mod 2$ . (88)
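Lemma 5.3 gives a simple consistency test for any data set of normal edges. The sketch below assumes that heights are stored in a dictionary keyed by edge labels and that connections are listed as pairs of labels; the labels and the function name `parity_violations` are hypothetical, and the sample heights only mimic the pairs mentioned for Figure 6(c) and Figure 6(d).

```python
def parity_violations(heights, connections):
    """Return the connected pairs of normal edges whose heights differ in parity."""
    return [(e, f) for e, f in connections
            if (heights[e] - heights[f]) % 2 != 0]


# Hypothetical labels: heights -1 and 1 are connected, as are -2 and 0.
heights = {"n1": -1, "n2": 1, "n3": -2, "n4": 0}
consistent = parity_violations(heights, [("n1", "n2"), ("n3", "n4")])
inconsistent = parity_violations(heights, [("n1", "n2"), ("n2", "n4")])
```

An empty result is compatible with Equation (88); any reported pair indicates that the input cannot come from a flow of triangles.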

In Figure 6(c) top, the normal edges of height −1 are connected to the normal edges of height 1 by putting the unit cube $[P_{4}]$ of height −1 on the tangent cone of Figure 6(b). In Figure 6(c) bottom, the light grey area (height −1) and the dark grey area (height 1) are now in contact.

In Figure 6(d) top, the normal edges of height −2 are connected to the normal edges of height 0 by putting the unit cube $[P_{5}]$ of height −2 on the tangent cone of Figure 6(c). In Figure 6(d) bottom, the white area indicates the projection image of $[P_{5}]$ . Note that the grey area (height 0) and the white area (height −2) are in contact.

5.2. Computation of Loops from the Hexagonal Base

In the loop model, it is easier to design an integral loop from scratch than to design a “hybrid” of known integral loops, since the area enclosed by a loop $l p$ is included in $| l p |$ . (In protein science, it is a formidable task to design a novel artificial protein from scratch.)

Let $M_{0} = {m_{0}} \in R I_{T C O N E}$ . Disconnecting normal edges of $L (ψ_{C_{b a s e} (M_{0})})$ along the boundary, we obtain $l p_{0} \in I_{B}$ such that $| l p_{0} | = m_{0}$ as explained below.

Shown in Figure 7(a) top is the top view of the base tangent cone

$C_{b a s e} (M_{0}) \in T M_{0}$ (89)

of some $M_{0} \in R I_{T C O N E}$ . Shown in Figure 7(a) bottom is the loop-intermediate

$L (ψ_{C_{b a s e} (M_{0})})$ (90)

on B. By definition, the heights of all normal edges of $ψ_{C_{b a s e} (M_{0})}$ are 0.

In Figure 7(b), normal edges of height 0 are disconnected by putting unit cubes of height −1 along the boundary on $C_{b a s e} (M_{0})$ . The light grey area in Figure 7(b) bottom indicates the projection image of the added unit cubes.

In Figure 7(c), normal edges of height 0 are disconnected by taking unit cubes of height 0 along the boundary from $C_{b a s e} (M_{0})$ . The dark grey area in Figure 7(c) bottom indicates the projection image of the removed unit cubes.

In Figure 7(d) left, a loop-intermediate consisting of three integral loops is obtained by putting 8 unit cubes of height −1 and taking a unit cube of height 0 along the boundary. Then, taking another unit cube of height 0 at the meeting point of the boundaries of the three loops, we obtain the integral loop shown in Figure 7(d) right.

5.3. Sufficient Conditions for L₀ to be a Loop

Sufficient conditions for L₀ of Problem 5.1 to be a loop are given using the two concepts defined below.

Definition 5.4. (The set $N_{h t = 0} (L)$ of normal edges) Let $L \in L I_{T C O N E}$ . $N_{h t = 0} (L)$ denotes the set of all the normal edges of height 0 contained in L.

Definition 5.5. (Rifts of a Loop-Intermediate) A crack of $L \in L I_{T C O N E}$ is a polygonal chain of normal edges of L connected to the boundary of L. A crack is called a rift if it consists of more than one normal edge.

Lemma 5.6. Let $M_{0} \in R I$ . Let $L_{1}, L_{2} \in L I_{T C O N E}$ such that $| L_{1} |, | L_{2} | \leq M_{0}$ . Let $σ \in Γ_{T} ({| L_{1} |, | L_{2} |})$ . If $G_{H} (L_{1}) \circ G_{H} (L_{2})$ is well-defined, then

$N_{h t = 0} (L (ψ_{σ (| L_{1} |)}) + L (ψ_{σ (| L_{2} |)})) = N_{h t = 0} (L (ψ_{σ (| L_{1} |)})) \cap N_{h t = 0} (L (ψ_{σ (| L_{2} |)}))$ . (91)

Figure 7. Computation of loops from the hexagonal base.

In general, the set $N_{h t = 0} (\sum_{i} L (ψ_{σ (| L_{i} |)}))$ shrinks monotonically as more loop-intermediates are added.

Proof. Because of Remark 4.27, removed normal edges of height 0 are removed forever. ∎
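The bookkeeping behind Lemma 5.6 amounts to a running intersection of edge sets. The sketch below assumes that each summand is summarized by the set of labels of its height-0 normal edges and that the composition of the relative transformations is well-defined; the labels and the function name `height_zero_edges_of_sum` are hypothetical.

```python
from functools import reduce


def height_zero_edges_of_sum(edge_sets):
    """Intersect the height-0 edge sets of the summands (at least one required)."""
    return reduce(lambda acc, s: acc & s, map(frozenset, edge_sets))


# Hypothetical height-0 edge labels of three loop-intermediates.
edge_sets = [{"e1", "e2", "e3"}, {"e2", "e3", "e4"}, {"e3"}]
result = height_zero_edges_of_sum(edge_sets)

# Sizes after adding one, two, and three summands.
sizes = [len(height_zero_edges_of_sum(edge_sets[:k]))
         for k in range(1, len(edge_sets) + 1)]
```

The list `sizes` is non-increasing, matching the monotone shrinking stated above.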

Proposition 5.7. (Sufficient Conditions to be a Loop) Settings are the same as for Problem 5.1. Let $l p_{i} = L (ψ_{σ (c_{i})}) \in I_{B} (i = 1, 2)$ , i.e.,

$L (ψ_{σ_{S_{V}} (c_{i} \oplus {(m_{0} \ c_{i})}_{H})}) = l p_{i} \oplus {(m_{0} \ c_{i})}_{H} (i = 1, 2)$ . (92)

Then, $L_{0}$ consists of a single loop if $G_{H} (L_{1}) \circ G_{H} (L_{2})$ is well-defined and one of the following three conditions is satisfied:

1) No cracks of $l p_{1}$ and $l p_{2}$ connect to $\partial M_{0} \cap \partial (c_{1} \cap c_{2})$ . There is at most one rift of $l p_{1}$ that penetrates into $c_{1} \cap c_{2}$ through $c_{1} \ c_{2}$ . No rift of $l p_{2}$ penetrates into $c_{1} \cap c_{2}$ through $c_{2} \ c_{1}$ (Figure 8(a) and Figure 8(b)).

2) No cracks of $l p_{1}$ and $l p_{2}$ connect to $\partial M_{0} \cap \partial (c_{1} \cap c_{2})$ . There is at most one rift of $l p_{2}$ that penetrates into $c_{1} \cap c_{2}$ through $c_{2} \ c_{1}$ . No rift of $l p_{1}$ penetrates into $c_{1} \cap c_{2}$ through $c_{1} \ c_{2}$ .

3) Both $l p_{1}$ and $l p_{2}$ have a crack connected to $\partial M_{0} \cap \partial (c_{1} \cap c_{2})$ . No rift of $l p_{1}$ penetrates into $c_{1} \cap c_{2}$ through $c_{1} \ c_{2}$ . No rift of $l p_{2}$ penetrates into $c_{1} \cap c_{2}$ through $c_{2} \ c_{1}$ (Figure 8(c)).

Proof. Since $G_{H} (L_{1}) \circ G_{H} (L_{2})$ is well-defined, normal edges of height 0 are not added as a result of addition by Lemma 5.6. That is, cracks are extended only by normal edges of height n, where $n \in N$ such that $n \neq 0$ and n is even. “No cracks of $l p_{1}$ and $l p_{2}$ connect to $\partial M_{0} \cap \partial (c_{1} \cap c_{2})$ ” implies $M_{0}$ is not separated by a polygonal chain contained in $c_{1} \cap c_{2}$ .

Since $l p_{1}$ and $l p_{2}$ bring no normal edges of height n ( $n \neq 0$ ) into $c_{2} \ c_{1}$ and $c_{1} \ c_{2}$ , respectively, the result follows. ∎

Shown in Figure 9 are examples where the sufficient conditions are not satisfied.

In Figure 9(a), both $l p_{1}$ and $l p_{2}$ have a rift that penetrates into $c_{1} \cap c_{2}$ through $c_{1} \ c_{2}$ and $c_{2} \ c_{1}$ , respectively. In Figure 9(b), $l p_{1}$ has two rifts that penetrate into $c_{1} \cap c_{2}$ through $c_{1} \ c_{2}$ . In Figure 9(c), both $l p_{1}$ and $l p_{2}$ have a crack connected to $\partial M_{0} \cap \partial (c_{1} \cap c_{2})$ , and $l p_{1}$ has a rift that penetrates into $c_{1} \cap c_{2}$ through $c_{1} \ c_{2}$ .
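The three conditions of Proposition 5.7 can be written as a single predicate once the geometric facts have been extracted. The sketch below assumes that, for each loop, it is already known whether a crack connects to $\partial M_{0} \cap \partial (c_{1} \cap c_{2})$ and how many rifts penetrate into $c_{1} \cap c_{2}$ from outside; computing these inputs is not addressed here. The class `LoopSummary` and the function `guaranteed_single_loop` are hypothetical names, and the Figure 9 inputs assume that no boundary cracks are present in cases (a) and (b).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSummary:
    """Hypothetical summary of one loop lp_i; fields are computed elsewhere."""
    crack_on_shared_boundary: bool  # a crack connects to ∂M0 ∩ ∂(c1 ∩ c2)
    penetrating_rifts: int          # rifts entering c1 ∩ c2 from outside it


def guaranteed_single_loop(well_defined, lp1, lp2):
    """True if one of conditions 1)-3) holds; False means 'not guaranteed'."""
    if not well_defined:
        return False
    no_cracks = not (lp1.crack_on_shared_boundary
                     or lp2.crack_on_shared_boundary)
    both_cracks = (lp1.crack_on_shared_boundary
                   and lp2.crack_on_shared_boundary)
    r1, r2 = lp1.penetrating_rifts, lp2.penetrating_rifts
    cond1 = no_cracks and r1 <= 1 and r2 == 0
    cond2 = no_cracks and r2 <= 1 and r1 == 0
    cond3 = both_cracks and r1 == 0 and r2 == 0
    return cond1 or cond2 or cond3


# Inputs mimicking condition 1) and the three cases of Figure 9.
cond1_case = guaranteed_single_loop(True, LoopSummary(False, 1), LoopSummary(False, 0))
fig9a = guaranteed_single_loop(True, LoopSummary(False, 1), LoopSummary(False, 1))
fig9b = guaranteed_single_loop(True, LoopSummary(False, 2), LoopSummary(False, 0))
fig9c = guaranteed_single_loop(True, LoopSummary(True, 1), LoopSummary(True, 0))
```

Since the conditions are only sufficient, a `False` result does not by itself show that $L_{0}$ splits into several loops.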

Figure 10(a) is the case given in Figure 1(a). Shown in Figure 10(a) bottom are all normal edges of height 0, where both $l p_{1}$ and $l p_{2}$ have a rift consisting of normal edges of height 0. All other normal edges have height 1. As a result of addition, some of the normal edges of height 0 are removed and we obtain the loop $l p_{0}$ .

Figure 8. Sufficient conditions for L₀ of Problem 5.1 to be a loop.

Figure 9. Examples where the sufficient conditions are not satisfied.

Figure 10. Incremental design of protein-like molecules (Examples given in Figure 1).

Figure 10(b) is the case given in Figure 1(b). Shown in Figure 10(b) bottom are all normal edges of height 0 and height −1 (thick line segments). All other normal edges have height 1. In this case, $G_{H} (L_{1}) \circ G_{H} (L_{2})$ is not well-defined and we cannot use Lemma 5.6. As a result of addition, the rift of $l {p^{'}}_{2}$ is extended by normal edges of height 0 and we obtain two loops $l {p^{'}}_{3}$ and $l {p^{'}}_{4}$ .

6. Discussion

A novel design method for protein-like molecules is proposed from the perspective of Sheaf Theory. In this method, a new molecule of a given shape is obtained as the sum of smaller molecules. Since the sum of loops is not a loop in general, sufficient conditions for a sum to be a loop are also considered. We believe this method is essential, especially when designing hybrids of known proteins.

Previous mathematical studies of protein structure have focused primarily on characterization and classification of structures, and the author is aware of no other mathematical research on protein design. As such, there is much room for improvement in this study, which is still in its infancy. The author hopes that this paper will inspire more mathematicians to become interested in mathematical research on protein design.

There are two directions for future research. One is the study of the three-dimensional case, in which protein-like molecules are represented as loops of tetrahedra [12] . The other is the study of loops on various hexagonal meshes other than the “flat” mesh H considered in this paper [13] . Examples include hexagonal meshes on the surface of 3D molecules (i.e., loops of tetrahedra). Note that a 2D triangular flow is induced on the surface of a complex of loops of tetrahedra.

In the three-dimensional case, two difficulties arise. First, the shape of a molecule is given on a mesh of dodecahedra, where a dodecahedron can be divided into four loops of tetrahedra (a hexagon cannot be divided into more than one loop of triangles). Second, the heights of normal edges of tetrahedra are classified into three congruence classes modulo 3, not two congruence classes modulo 2.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1]	Berman, H.M., Westbrook, J., Feng, G., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Research, 28, 235-242. https://doi.org/10.1093/nar/28.1.235
[2]	Taylor, W.R., May, A.C.W., Brown, N.P. and Aszódi, A. (2001) Protein Structure: Geometry, Topology and Classification. Reports on Progress in Physics, 64, Article No. 517. https://doi.org/10.1088/0034-4885/64/4/203
[3]	Albou, L.-P., Schwarz, B., Poch, O., Wurtz, J.M. and Moras, D. (2009) Defining and Characterizing Protein Surface Using Alpha Shapes. Proteins, 76, 1-12. https://doi.org/10.1002/prot.22301
[4]	Gromov, M. (2011) Crystals, Proteins, Stability and Isoperimetry. Bulletin of the American Mathematical Society, 48, 229-257. https://doi.org/10.1090/S0273-0979-2010-01319-7
[5]	Xia, K. and Wei, G.-W. (2014) Persistent Homology Analysis of Protein Structure, Flexibility and Folding. International Journal for Numerical Methods in Biomedical Engineering, 30, 814-844. https://doi.org/10.1002/cnm.2655
[6]	Penner, R.C. (2016) Moduli Spaces and Macromolecules. Bulletin of the American Mathematical Society, 53, 217-268. https://doi.org/10.1090/bull/1524
[7]	Zhao, R., Wang, M., Chen, J., Tong, Y. and Wei, G.-W. (2021) The de Rham-Hodge Analysis and Modeling of Biomolecules. Bulletin of Mathematical Biology, 82, Article No. 108. https://doi.org/10.1007/s11538-020-00783-2
[8]	Morikawa, N. (2019) Design of Self-Assembling Molecules and Boundary Value Problem for Flows on a Space of n-Simplices. Applied Mathematics, 10, 907-946. https://doi.org/10.4236/am.2019.1011065
[9]	Hartshorne, R. (1977) Algebraic Geometry. Springer-Verlag, New York. https://doi.org/10.1007/978-1-4757-3849-0
[10]	Milewski, B. (2016) Category Theory for Programmers 1.1: Motivation and Philosophy. http://youtube.com/watch?v=I8LbkfSSR58
[11]	Morikawa, N. (2020) On the Defining Equations of Protein’s Shape from a Category Theoretical Point of View. Applied Mathematics, 11, 890-916. https://doi.org/10.4236/am.2020.119058
[12]	Morikawa, N. (2018) Global Geometrical Constraints on the Shape of Proteins and Their Influence on Allosteric Regulation. Applied Mathematics, 9, 1116-1155. https://doi.org/10.4236/am.2018.910076
[13]	Morikawa, N. (2022) Discrete Exterior Calculus of Proteins and Their Cohomology. Open Journal of Discrete Mathematics, 12, 47-63. https://doi.org/10.4236/ojdm.2022.123004
