Using Parallel Filtering Algorithms to Solve the 0-1 Knapsack Problem on DNA-based Computing

Adleman first showed that deoxyribonucleic acid (DNA) strands could be used to compute a solution to an instance of the NP-complete Hamiltonian Path Problem (HPP). Lipton then demonstrated that Adleman's techniques could also be used to solve the satisfiability (SAT) problem. This paper demonstrates how the DNA operations presented by Adleman and Lipton can be used to develop a DNA-based algorithm for solving the 0-1 Knapsack Problem.


Introduction
In 1961, Feynman first proposed bio-molecular computation, but his idea was not implemented experimentally for a few decades [1]. In 1994, Adleman [2] succeeded in solving an instance of the Hamiltonian path problem in a test tube, just by handling DNA strands. It was indicated in [5] that an optimal solution of every NP-complete or NP-hard problem is determined from its characteristics. DNA-based algorithms have been proposed to solve many computational problems, including the satisfiability problem [3], the maximal clique problem [6], three-vertex-coloring [7], the set-splitting problem [8], the set-cover problem and the problem of exact cover by 3-sets [9], the dominating-set problem [10], the maximum cut problem [11], the binary integer programming problem [12] and the set-partition problem [24]. One potentially significant area of application for DNA algorithms is the breaking of encryption schemes [13,14]. DNA-based arithmetic algorithms are proposed in [15][16][17]. Furthermore, DNA-based algorithms for constructing DNA databases are offered in [18].
The rest of the paper is organized as follows. Section 2 introduces in detail the DNA models of computation proposed by Adleman and his co-authors. Section 3 introduces the DNA program that solves the 0-1 knapsack problem from solution spaces of DNA strands. Conclusions are drawn in Section 4.

DNA Model of Computation
In the last decade there have been revolutionary advances in the field of biomedical engineering, particularly in recombinant DNA and RNA manipulation. Due to the industrialization of the biotechnology field, laboratory techniques for recombinant DNA and RNA manipulation are becoming highly standardized. Basic principles of recombinant DNA can be found in [22][23][24][25][26]. In this section we describe eight biological operations useful for solving the 0-1 knapsack problem. The method of constructing the DNA solution space for the 0-1 knapsack problem is based on the method proposed in [20,21].
A (test) tube is a set of molecules of DNA (a multi-set of finite strings over the alphabet {A, C, G, T}). Given a tube, one can perform the following operations:
1. Extract: Given a tube T and a short single strand of DNA, "s", produce two tubes +(T, s) and −(T, s), where +(T, s) is all of the molecules of DNA in T which contain the strand "s" as a sub-strand and −(T, s) is all of the molecules of DNA in T which do not contain the short strand "s".
2. Merge: Given tubes T_1 and T_2, yield ∪(T_1, T_2), where ∪(T_1, T_2) = T_1 ∪ T_2. This operation pours two tubes into one, without any change to the individual strands.
3. Amplify: Given a tube T, the operation Amplify (T, T_1, T_2) produces two new tubes T_1 and T_2 that are identical copies of T, and T becomes an empty tube.
4. Append: Given a tube T and a short strand of DNA, "s", the operation appends the short strand "s" onto the end of every strand in the tube T. It is denoted by Append (T, s).
5. Append-head: Given a tube T and a short strand of DNA, "s", the operation appends the short strand "s" onto the head of every strand in the tube T. It is denoted by Append-head (T, s).
6. Detect: Given a tube T, say 'yes' if T includes at least one DNA molecule, and say 'no' if it contains none. It is denoted by Detect (T).
7. Discard: Given a tube T, the operation discards the tube T. It is denoted by Discard (T).
8. Read: Given a tube T, the operation describes a single molecule contained in the tube T. Even if T contains many different molecules, each encoding a different set of bases, the operation gives an explicit description of exactly one of them. It is denoted by Read (T).
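The eight operations can be mimicked in software to make their semantics concrete. The following is only a minimal sketch under stated assumptions: a tube is modelled as a Python `Counter` (a multiset) of strings over {A, C, G, T}, and the function names are illustrative choices, not part of the DNA model itself.

```python
from collections import Counter

# Assumption: a tube is a multiset of strands, each strand a str over "ACGT".

def extract(tube, s):
    """Return (+(T, s), -(T, s)): strands containing s, and strands without s."""
    plus = Counter({strand: n for strand, n in tube.items() if s in strand})
    minus = Counter({strand: n for strand, n in tube.items() if s not in strand})
    return plus, minus

def merge(t1, t2):
    """Pour two tubes into one without changing any strand."""
    return t1 + t2

def amplify(tube):
    """Return two identical copies of the tube; the original becomes empty."""
    t1, t2 = Counter(tube), Counter(tube)
    tube.clear()
    return t1, t2

def append(tube, s):
    """Attach s to the end of every strand."""
    out = Counter()
    for strand, n in tube.items():
        out[strand + s] += n
    return out

def append_head(tube, s):
    """Attach s to the head of every strand."""
    out = Counter()
    for strand, n in tube.items():
        out[s + strand] += n
    return out

def detect(tube):
    """True ('yes') if the tube holds at least one molecule."""
    return sum(tube.values()) > 0

def discard(tube):
    """Throw the contents of the tube away."""
    tube.clear()

def read(tube):
    """Describe exactly one strand in the tube (None if the tube is empty)."""
    return next(iter(tube), None)

# Small usage example with made-up strands.
tube = Counter({"ACGT": 2, "TTGA": 1})
plus, minus = extract(tube, "CG")
```

In the laboratory each operation acts on all molecules in a tube at once; the loops here only emulate that parallelism sequentially.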

The DNA Algorithms for
knapsack and x_i = 0 otherwise. Without loss of generality we also assume that w_i ≤ M for 1 ≤ i ≤ q, so each item fits into the knapsack.
For example, suppose we have three items expressed by a finite set S = {item 1, item 2, item 3}; the weight and profit of each item are listed in Table 1. The value of M, the maximum weight that the knapsack can carry, is 10 lb. The subsets of S are, respectively, ∅, {item 1}, {item 2}, {item 3}, {item 1, item 2}, {item 2, item 3}, {item 1, item 3} and {item 1, item 2, item 3}. According to the definition above of the 0-1 knapsack problem, their corresponding binary values of {x_3, x_2, x_1} are, in the same order, 000, 001, 010, 100, 011, 110, 101, 111. The feasible solutions satisfying the constraint w_1 x_1 + w_2 x_2 + w_3 x_3 ≤ 10 are ∅, {item 1}, {item 2}, {item 3}, {item 1, item 2} and {item 1, item 3}. The optimal solution is the feasible subset whose total profit is the largest. The optimal solution to this problem instance is therefore A = {item 1, item 3}, because its total profit of $11 is the maximum among the subsets satisfying the constraint that the total weight of all selected items does not exceed 10 lb.
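The example can be checked by brute-force enumeration of all 2^q subsets. The sketch below is illustrative only: the weights and profits are hypothetical values, chosen so that the feasible subsets and the optimal profit of $11 agree with the text; they are not taken from Table 1.

```python
from itertools import product

# Hypothetical item data (NOT from Table 1), consistent with the example text.
weights = [3, 5, 7]   # w_1, w_2, w_3 in lb
profits = [4, 5, 7]   # p_1, p_2, p_3 in dollars
M = 10                # knapsack capacity in lb

def solve(weights, profits, capacity):
    """Enumerate every 0/1 assignment and keep the best feasible one."""
    feasible, best_items, best_profit = [], (), 0
    for bits in product((0, 1), repeat=len(weights)):   # bits = (x_1, ..., x_q)
        total_weight = sum(w * x for w, x in zip(weights, bits))
        if total_weight > capacity:
            continue                                    # violates the constraint
        items = tuple(i + 1 for i, x in enumerate(bits) if x)
        total_profit = sum(p * x for p, x in zip(profits, bits))
        feasible.append(items)
        if total_profit > best_profit:
            best_items, best_profit = items, total_profit
    return feasible, best_items, best_profit

feasible, best_items, best_profit = solve(weights, profits, M)
```

This sequential search takes time exponential in q, which is the cost the DNA-based approach tries to absorb into the parallelism of the strands.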

Construct the Solution Space of DNA Strands for the 0-1 Knapsack Problem
Assume that x_q x_{q-1} … x_2 x_1 is a q-bit binary number, which is used to represent one of the 2^q subsets of a q-element set S. From [20,21], for every bit x_k representing the kth element in S for 1 ≤ k ≤ q, two distinct 15-base value sequences are designed. One represents the value "0" for x_k, and the other represents the value "1" for x_k. For the sake of convenience in our representation, assume that x_k^1, which represents the kth item being selected in set S, denotes the value of x_k to be 1, and x_k^0, which represents the kth item not being selected in set S, denotes the value of x_k to be 0. The following DNA-based algorithm is used to construct the solution space for the 2^q possible subsets of a q-element set S. The result generated by Init (T_0, q) for our example in the previous section is shown in Table 2.

Table 2. The result generated by Init (T_0, q), listed by tube.

Lemma 1:
The algorithm Init (T_0, q) constructs the solution space of the 2^q possible subsets of a q-element set S.
Proof: The algorithm Init (T_0, q) is implemented via the amplify, append and merge operations.
Step (1) and Step (2) append the DNA sequences representing the value "1" for x_q and the value "0" for x_q, respectively, onto the end of every strand in tube T_1 and tube T_2. This means that subsets containing the qth element appear in tube T_1, and subsets not containing the qth element appear in tube T_2. Next, Step (3) pours tubes T_1 and T_2 into tube T_0. This indicates that the DNA strands in tube T_0 include the DNA sequences of x_q = 1 and x_q = 0.
Each time Step (4a) is performed, it uses the amplify operation to copy the contents of tube T_0 into two new tubes, T_1 and T_2, which are copies of T_0. Tube T_0 becomes empty. Step (4b) and Step (4c) then append the DNA sequences representing the value "1" for x_k and the value "0" for x_k, respectively, onto the end of every strand in tube T_1 and tube T_2. This implies that subsets containing the kth element appear in tube T_1 and subsets not containing the kth element appear in tube T_2. Next, Step (4d) pours tubes T_1 and T_2 into tube T_0. This indicates that the DNA strands in tube T_0 include the DNA sequences of x_k = 1 and x_k = 0. After repeated execution of Step (4a) through Step (4d), the algorithm finally produces a tube T_0 that consists of 2^q DNA sequences representing the 2^q possible subsets. Therefore, the 2^q possible subsets of a q-element set S can be constructed with DNA strands via this algorithm.
Init (T_0, q) takes (q − 1) amplify operations, 2 × q append operations, q merge operations and three test tubes to construct the solution space for a q-element set S. A q-bit binary number corresponds to a subset. A value sequence for every bit contains 15 bases. Therefore, the length of a DNA strand encoding a subset is 15 × q bases, which is the concatenation of one value sequence for each bit.
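The steps referred to in the proof of Lemma 1 can be sketched in software as follows. This is a reconstruction from the proof, not the original listing: it assumes that tubes T_1 and T_2 start with a single empty strand, that Step (4) runs for k = q−1 down to 1, and that each value sequence is represented by a symbolic label such as `x3^1` rather than 15 real bases.

```python
def init(q):
    """Sketch of Init(T_0, q); returns the tube T_0 and the operation counts."""
    ops = {"amplify": 0, "append": 0, "merge": 0}

    def append(tube, symbol):
        ops["append"] += 1
        return [strand + (symbol,) for strand in tube]

    def merge(t1, t2):
        ops["merge"] += 1
        return t1 + t2

    def amplify(tube):
        ops["amplify"] += 1
        return list(tube), list(tube)

    # Steps (1)-(3): encode x_q = 1 in T_1, x_q = 0 in T_2, then pour into T_0.
    # Assumption: both tubes initially hold one empty strand.
    t1 = append([()], f"x{q}^1")
    t2 = append([()], f"x{q}^0")
    t0 = merge(t1, t2)

    # Step (4): repeat for the remaining bits (assumed order k = q-1, ..., 1).
    for k in range(q - 1, 0, -1):
        t1, t2 = amplify(t0)           # (4a) copy T_0 into T_1 and T_2
        t1 = append(t1, f"x{k}^1")     # (4b) kth element selected
        t2 = append(t2, f"x{k}^0")     # (4c) kth element not selected
        t0 = merge(t1, t2)             # (4d) pour both back into T_0
    return t0, ops

solution_space, op_counts = init(3)
```

Under these assumptions the counters reproduce the stated costs of (q − 1) amplify, 2 × q append and q merge operations, and the tube ends up with 2^q distinct strands of q symbols each.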

Solution Space of the Value for Every Element of Each Subset for Solving the 0-1 Knapsack Problem of a Finite Set
For the purpose of appending the DNA strands that encode the weight w_m or profit p_m of all selected items, an element w_m (p_m) for 1 ≤ m ≤ q, representing the weight (profit) of item m, can be converted to an n-bit binary number, w_{m,n}, w_{m,n-1}, …, w_{m,2}, w_{m,1} (p_{m,n}, p_{m,n-1}, …, p_{m,2}, p_{m,1}). Suppose that w_{m,n} (p_{m,n}) is the most significant bit, while w_{m,1} (p_{m,1}) is the least significant bit. For every bit w_{m,k} (p_{m,k}), 1 ≤ m ≤ q and 1 ≤ k ≤ n, two distinct DNA sequences are designed from [20,21]. One corresponds to the value "0" for w_{m,k} (p_{m,k}), and the other corresponds to the value "1" for w_{m,k} (p_{m,k}). For the sake of convenience in our representation, assume that w_{m,k}^1 (p_{m,k}^1) denotes the value of w_{m,k} (p_{m,k}) to be 1 and w_{m,k}^0 (p_{m,k}^0) denotes the value of w_{m,k} (p_{m,k}) to be 0. The following algorithm is employed to construct the binary values of weight and profit for each element in the 2^q subsets of a q-element set S. The partial result generated by ValueWT_PT (T_0, q, n) for our example in subsection 3.1 is shown in Table 3.
End For
End Procedure
Lemma 2:
The binary value of weight and profit for each element in the 2^q subsets of a q-element set S can be constructed from the algorithm ValueWT_PT (T_0, q, n).
Proof: Refer to Lemma 1.
ValueWT_PT (T_0, q, n) takes q extract operations, 4 × n × q append operations, q merge operations and three test tubes to construct the solution space for the elements in the 2^q subsets of a q-element set S. A q-bit binary number corresponds to a choice of items and an n-bit binary number encodes the weight or profit of an item. A value sequence for every bit contains 15 bases. Therefore, the length of a DNA strand, encoding the corresponding weight and profit for 2^q possible choices of q items, is 15 × (4 × n × q) bases, which consist of the concatenation of one value sequence for each bit.
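One way such a procedure could be organised, consistent with the stated counts of q extract, 4 × n × q append and q merge operations, is sketched below. This is a hypothetical reconstruction, not the original ValueWT_PT listing: it assumes that strands of unselected items receive all-zero weight and profit bits, that bits are appended from most to least significant, and that value sequences are symbolic labels. The item data are made-up values.

```python
from itertools import product

def value_wt_pt(tube, weights, profits, n):
    """Append n weight bits and n profit bits per item to every strand."""
    ops = {"extract": 0, "append": 0, "merge": 0}

    def append(t, symbol):
        ops["append"] += 1
        return [s + (symbol,) for s in t]

    for m, (w, p) in enumerate(zip(weights, profits), start=1):
        ops["extract"] += 1
        t1 = [s for s in tube if f"x{m}^1" in s]        # item m selected
        t2 = [s for s in tube if f"x{m}^1" not in s]    # item m not selected
        for k in range(n, 0, -1):                       # weight bits, MSB first
            t1 = append(t1, f"w{m},{k}^{(w >> (k - 1)) & 1}")
            t2 = append(t2, f"w{m},{k}^0")              # assumed: zero if unselected
        for k in range(n, 0, -1):                       # profit bits, MSB first
            t1 = append(t1, f"p{m},{k}^{(p >> (k - 1)) & 1}")
            t2 = append(t2, f"p{m},{k}^0")
        ops["merge"] += 1
        tube = t1 + t2
    return tube, ops

# Usage with hypothetical data: q = 3 items, n = 4 bits per value.
q, n = 3, 4
space = [tuple(f"x{k}^{b}" for k, b in zip(range(q, 0, -1), bits))
         for bits in product((0, 1), repeat=q)]
strands, ops = value_wt_pt(space, [3, 5, 7], [4, 5, 7], n)
```

Encoding zeros for unselected items means that a later addition over all q items yields the total weight or profit of exactly the selected ones.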

The Construction of a Parallel One-bit Adder
A one-bit adder is a Boolean function that forms the arithmetic sum of three inputs. It has three inputs and two outputs. Two of the input bits represent the augend and the addend, respectively. The third input represents the carry from the previous, less significant position. The first output gives the value of the least significant bit of the sum of the augend, the addend and the previous carry. The second output gives the output carry, which is transferred into the input carry of the next one-bit adder. The truth table of the one-bit adder is shown in Table 4. Suppose that two one-bit binary numbers, α_{m-1,k} and α_{m,k}, represent the first input (addend) and the first output (sum) of a one-bit adder for 1 ≤ m ≤ q and 1 ≤ k ≤ n, respectively. A one-bit binary number, β_{m,k}, represents the second input (augend) of a one-bit adder. Two one-bit binary numbers, γ_{m,k-1} and γ_{m,k}, represent the third input (previous carry) and the second output (carry) of a one-bit adder, respectively. From [20,21], two distinct DNA sequences are designed to represent the values "0" and "1" for every corresponding bit. For the sake of convenience in our representation, assume that β_{m,k}^1 denotes the value of β_{m,k} to be 1, and β_{m,k}^0 denotes the value of β_{m,k} to be 0. The procedure ParallelOneBitAdder (T_0, α_{m-1,k}, β_{m,k}, γ_{m,k-1}, m, k) can be applied to perform the Boolean function of a parallel one-bit adder.
ParallelOneBitAdder (T_0, α_{m-1,k}, β_{m,k}, γ_{m,k-1}, m, k) takes seven extract operations, eight detect operations, sixteen append-head operations, one merge operation, and fifteen test tubes to compute the addition of three input bits. The two output bits of a one-bit adder encode the sum and the carry of the addition. A value sequence for every output bit contains 15 base pairs. Therefore, a DNA strand encoding the two output bits has a length of 30 base pairs, consisting of the concatenation of one value sequence for each output bit.
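A possible realisation of the one-bit adder with tube operations is sketched below. It is an assumption-laden illustration rather than the original procedure: strands are tuples of symbolic labels, three levels of extract split the tube into the eight input combinations, and each non-empty tube receives its sum and carry labels via append-head before a final merge. The parameter names `a`, `b`, `c`, `s_out` and `c_out` are hypothetical.

```python
from itertools import product

def parallel_one_bit_adder(tube, a, b, c, s_out, c_out):
    """Attach sum and carry labels to every strand according to bits a, b, c."""
    ops = {"extract": 0, "detect": 0, "append_head": 0, "merge": 0}

    def extract(t, name):
        ops["extract"] += 1
        on = [s for s in t if f"{name}^1" in s]
        off = [s for s in t if f"{name}^1" not in s]
        return on, off

    def append_head(t, symbol):
        ops["append_head"] += 1
        return [(symbol,) + s for s in t]

    # 1 + 2 + 4 = 7 extracts split the tube by the three input bits.
    groups = {}
    for va, ta in zip((1, 0), extract(tube, a)):
        for vb, tb in zip((1, 0), extract(ta, b)):
            for vc, tc in zip((1, 0), extract(tb, c)):
                groups[(va, vb, vc)] = tc

    result = []
    for (va, vb, vc), t in groups.items():
        ops["detect"] += 1                   # one detect per candidate tube
        if t:
            total = va + vb + vc
            t = append_head(t, f"{s_out}^{total % 2}")    # sum bit
            t = append_head(t, f"{c_out}^{total // 2}")   # carry bit
            result.extend(t)
    ops["merge"] += 1                        # pour all tubes back together
    return result, ops

# Usage: one strand for each of the eight input combinations.
tube = [(f"a^{x}", f"b^{y}", f"c^{z}") for x, y, z in product((0, 1), repeat=3)]
out, ops = parallel_one_bit_adder(tube, "a", "b", "c", "s", "g")
```

When all eight input combinations are present, the counters in this sketch come out as seven extract, eight detect, sixteen append-head and one merge operation, matching the costs stated above.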

The Construction of a Parallel N-bit Adder
The parallel one-bit adder introduced in subsection 3.4 computes the arithmetic sum of two bits and a previous carry. Similarly, a binary parallel n-bit adder computes the arithmetic sum of two n-bit input operands and the input carry by applying this one-bit adder n times. The following algorithm is proposed to perform the arithmetic sum for a parallel n-bit adder.
The procedure ParallelAdder (T_0, α, β, γ, q, n) can be applied to perform the Boolean function of a binary parallel adder of n bits.
Proof: Refer to Lemma 1.
ParallelAdder (T_0, α, β, γ, q, n) takes 7 × n × q extract operations, (n + q + 2 × n × q) append operations, n × q merge operations and fifteen test tubes to compute the sum of weight for the elements in the 2^q subsets of a q-element set S. A q-bit binary number corresponds to a subset. An n-bit binary number encodes the weight of an element in S. Therefore, (q + 1) × n bits correspond to the sum of weight for q elements and one accumulator element (α), and q × (n + 1) bits encode the carry of the sum. A value sequence for every bit contains 15 bases. Therefore, the length of a DNA strand encoding the total weight of the selected items is 15 × n base pairs, consisting of the concatenation of one value sequence for each bit.
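The arithmetic carried out by the n-bit adder can be illustrated with a plain ripple-carry sketch. This is a sequential software analogue for a single strand, not the DNA procedure itself: `alpha` plays the role of the accumulator α, `beta` the bits of the item being added, and the carry is threaded through n applications of a one-bit adder. The weights are hypothetical values and n = 4 is assumed large enough to hold every total.

```python
from itertools import product

def one_bit_add(a, b, c):
    """Full adder: return (sum bit, carry bit) for three input bits."""
    total = a + b + c
    return total % 2, total // 2

def ripple_add(alpha, beta):
    """Add two n-bit numbers given as LSB-first bit lists."""
    carry, out = 0, []
    for a, b in zip(alpha, beta):
        s, carry = one_bit_add(a, b, carry)
        out.append(s)
    return out, carry

def total_weight_bits(selection, weights, n):
    """Accumulate the weights of the selected items, one n-bit addition per item."""
    alpha = [0] * n                                     # accumulator starts at zero
    for x, w in zip(selection, weights):
        beta = [((w >> k) & 1) * x for k in range(n)]   # zero if item not selected
        alpha, _ = ripple_add(alpha, beta)              # final carry assumed unused
    return alpha

# Usage with hypothetical weights: every subset of three items, n = 4 bits.
weights = [3, 5, 7]
totals = {
    sel: sum(bit << k for k, bit in enumerate(total_weight_bits(sel, weights, 4)))
    for sel in product((0, 1), repeat=3)
}
```

In the DNA setting the same q × n one-bit additions are applied to every strand in the tube simultaneously, which is why the operation count grows with n × q rather than with the number of subsets.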

Parallel Comparator for Comparing the Sum of Weight Corresponding to Subsets of a Finite Set with Any Given Positive Integer
Any given positive integer M can be written as n one-bit binary numbers, M_n M_{n-1} … M_2 M_1. The main advantage is that it is feasible for bit operations of the DNA … From the comparator procedure (… , T_0^<, q, n), it takes 2 × n extract and detect operations, 4 × n + 1 merge operations, one discard operation and seven test tubes to carry out the function of an n-bit parallel comparator.
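A bitwise comparison of this kind can be sketched as follows. The sketch is a generic most-significant-bit-first comparator written under assumptions, since the procedure itself is not reproduced here: each strand is modelled as a (label, total weight) pair, testing bit k stands in for an extract on the corresponding value sequence, and the three result lists stand in for tubes holding strands greater than, equal to and less than M.

```python
def parallel_compare(tube, M, n):
    """Split strands by comparing their n-bit value with M, MSB first."""
    greater, less, equal = [], [], list(tube)
    for k in range(n - 1, -1, -1):
        on = [s for s in equal if ((s[1] >> k) & 1) == 1]    # bit k is 1
        off = [s for s in equal if ((s[1] >> k) & 1) == 0]   # bit k is 0
        if (M >> k) & 1:
            less += off        # strand has 0 where M has 1: strictly smaller
            equal = on
        else:
            greater += on      # strand has 1 where M has 0: strictly larger
            equal = off
    return greater, equal, less

# Usage: hypothetical total weights of the eight subsets, capacity M = 10.
tube = [("000", 0), ("001", 3), ("010", 5), ("100", 7),
        ("011", 8), ("110", 12), ("101", 10), ("111", 15)]
greater, equal, less = parallel_compare(tube, 10, 4)
feasible = equal + less
```

Strands that end up in the "equal" or "less" group satisfy the capacity constraint; those in the "greater" group can be discarded.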

Search and Calculation of the Maximum Total Profit under the Restriction That the Total Weight Does Not Exceed the Capacity of Knapsack M
The following algorithm is applied to determine which strand has the maximum profit among those choices of items that satisfy the constraint of the problem. After this algorithm is performed, the strand that remains in tube T_0 has the greatest binary value of profit.
SearchMaxi (T_0, q, n) takes n extract operations, n detect operations, 2 × n merge operations, n discard operations and three test tubes to search for the maximum total profit subject to the condition that the total weight of the selected items must not exceed the capacity of the knapsack.
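The idea of finding the maximum by filtering on one bit at a time can be sketched as below. This is an illustrative reconstruction under assumptions, not the original SearchMaxi listing: strands are (label, total profit) pairs, the bit test stands in for an extract, the emptiness check for a detect, and the profits are the hypothetical values used for illustration only.

```python
def search_max(tube, n):
    """Keep only the strands with the largest n-bit profit, scanning MSB first."""
    for k in range(n - 1, -1, -1):
        on = [s for s in tube if ((s[1] >> k) & 1) == 1]    # profit bit k is 1
        off = [s for s in tube if ((s[1] >> k) & 1) == 0]   # profit bit k is 0
        if on:             # detect: some strand has a 1 in this position
            tube = on      # strands with 0 cannot be maximal, so drop them
        else:
            tube = off     # no strand has a 1, keep everything
    return tube

# Usage: feasible subsets with hypothetical total profits, n = 4 bits.
feasible = [("{}", 0), ("{1}", 4), ("{2}", 5), ("{3}", 7),
            ("{1,2}", 9), ("{1,3}", 11)]
best = search_max(feasible, 4)
```

Each of the n rounds uses a constant number of tube operations, so the search cost depends on the bit width n and not on the number of strands in the tube.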

Conclusions
The knapsack problem, proved to be NP-complete by restriction, has been solved by a number of different algorithms with exponential time complexity on conventional silicon-based computers [27]. The algorithm proposed here for solving the 0-1 knapsack problem is based on basic biological operations. The number of tubes, the number of biological operations, the number of memory strands and the longest length of memory strands are, respectively, O(1), O(q × n), O(2^n) and O(q × n).
The presented algorithm has several advantages arising from its massive parallelism, as described below. First, the biological operations needed in the proposed algorithm are experimentally feasible at the laboratory level [20,21]. Second, DNA-based algorithms of polynomial time complexity are proposed for searching for the maximum profit among a group of items. Third, the Adleman program [20,21] can be applied to generate good DNA sequences for constructing the solution space of our problem, which indicates that the proposed algorithm has a lower rate of hybridization errors. Fourth, the DNA-based algorithms developed herein can be applied to solve addition-related problems.