Quantum Algorithm for Mining Frequent Patterns for Association Rule Mining ()
1. Introduction
We live in a digitalized era where vast amounts of data are collected daily. Data mining is a process of extracting interesting information, which is called knowledge discovery (KDD) in a database. According to [1] , data mining is a field related to machine learning, but data mining is more applied than machine learning. Data mining has a broader spectrum of applications in engineering, science, business, medical applications, and many other areas. Data mining is used to process the raw data on a larger scale of data. There are various types of data mining techniques, such as association rules, classification, and clustering [2] . Association rule mining (ARM) is one of the most important research topics in data mining. ARM aims to discover interesting correlations, frequent patterns, associations, or causal structures between items hidden in a large database. Association rule mining is a branch of unsupervised learning processes that discover hidden patterns in data in the form of easily recognizable rules. Association rule mining is often termed as market basket analysis which studies the buying behaviors of customers by searching for sets of items that are frequently purchased together. Association rule mining is widely used in the retail analysis of transactions [3] [4] , recommendation engines [5] [6] [7] , web mining [8] [9] , medical diagnosis, bioinformatics [10] , and other applications [11] in various areas.
• Retail analysis of transactions: The data from past transactions can be used to generate items that the customers most like to be purchased together. The retailer can then adjust the store layout, sales strategy, bundling prices, and inventory control to take advantage of the extracted rules that are generated from association rule mining.
• Recommendation engines: Recommendation systems such as entertainment, news, and social media can be designed using association rule mining to recommend the most interesting content based on the user’s past behavior.
• Web mining: Web usage mining is used in e-commerce applications to get useful information about the behavior of customers. Association rule mining is applied to the data from past behavior of the customers, and then rules based on customer preferences are generated. The developers can optimize and improve websites to personalize the web portals based on the association rules.
• Medical diagnosis: Association rule mining can be used to help diagnose patients. The symptoms of diseases and illnesses can be identified to find the conditional probability of the occurrence of illness using associate rule mining.
• Bioinformatics: There are many applications in bioinformatics problems, such as protein interaction networks, gene expression data, and others [10] , that can be applied to association rule mining to identify biologically relevant patterns. These patterns can be translated into a biological context.
Association rule mining increases revenue by ensuring customer satisfaction based on customized web portals, as well as enhances medical treatment by relating the severity of the sickness and its symptoms.
Given a set of transactions in a database, the goal is to find the association rules that connect between itemsets. For example,
(X implies Y), which means the customer who buys the items in X also tends to buy the items in Y. The implication means the co-occurrence of X and Y items. If patterns within transaction data tell us that baby formula and diapers are usually purchased together in the same transaction, a retailer can take advantage of this association for bundle pricing, product placement, and even shelf space optimization within the store layout. Association rules are considered interesting and called strong if they satisfy the user-predefined thresholds, which are minimum support (minsup) and minimum confidence (minconf) thresholds. These threshold values are pre-defined by users or domain experts.
The original association rule mining problem was first introduced in [12] . Let
be the set of all items. Let T be a set of database transactions where
and
. Each transaction is associated with an identifier, called a TID. A set of items is referred to as an itemset. An itemset that contains k items is called a k-itemset. k-itemsets that occur frequently are called a frequent k-itemsets. Association rules of the form
(X implies Y) is measured by support, the percentage of transactions that contain both X and Y, the union of itemsets X and Y. The support is taken to be the probability,
.
The support in the above equation is called relative support, where the frequency or occurrence of an item is called an absolute support or a support count. Another objective measure for association rules is confidence, which assesses the degree of certainty of the detected association [13] . The rule
has confidence, the percentage of transactions that containing X also contains Y. The confidence is taken to be the conditional probability,
.
The main objective of ARM is to discover the itemsets that frequently appear in the transactions. The support (occurrence frequency) for each of these itemsets is generated from a number of candidate itemsets and not less than a pre-defined threshold [13] [14] . In Figure 1, association rule mining is decomposed into two phases: First, find out all the frequent itemsets such that each of these itemsets will occur at least as frequently as the pre-defined minimum support threshold. Those itemsets are called frequent or large itemsets. Second, generate association rules from those frequent itemsets with the constraints of minimum confidence threshold. There are many algorithms for mining frequent itemsets; the first phase of the association rule mining task. This first phase dominates the complexity of the whole process.
Figure 1. Association rule mining phases.
2. Related Works
2.1. Classical Algorithms for Association Rule Mining
The AIS (Agrawal, Imielinski, Swami) algorithm was the first algorithm proposed for mining association rules in [12] . In the AIS algorithm, the database was scanned many times to get maximum frequent or large itemsets. During the first pass, C1, the candidate 1-itemsets are generated, and the support count of each individual item was accumulated. From candidate 1-itemsets, F1 frequent 1-itemsets are generated by eliminating itemsets whose support count less than the value of minsup. Candidate 2-itemsets are generated by extending frequent 1-itemsets with other items in the same transaction. During the second pass over the database, the support counts of those candidate 2-itemsets are accumulated and checked against the minsup. Similarly, those candidate (k + 1)-itemsets are generated by extending frequent k-itemsets with items in the same transaction. However, extending the itemsets that are not present in the previous pass results in unnecessarily generating and counting too many candidate itemsets that turn out to be small.
A number of ARM algorithms were proposed. Among these, the most well-known algorithm is the Apriori algorithm [14] that makes additional use of prune property to those candidates which have an infrequent subset before counting their supports. This optimization is possible because the support values of all subsets of a candidate are known in advance. The Apriori algorithm employs an iterative approach known as level-wise search, where k-itemsets are used to explore (k + 1)-itemsets. First, the set of candidate 1-itemsets is found by scanning the database to accumulate the count for each item and collecting those items that satisfy the minimum support threshold. The resulting set is denoted by F1, the set of frequent 1-itemsets. Next, F1 is used to find C2, the set of candidate 2-itemsets, which is used to find F2, and so on, until no more frequent k-itemsets can be found. The finding of each Ck requires one full scan of the database, which becomes the dilemma of performance in ARM when the transaction database is very large.
In the Apriori algorithm, finding of each Ck requires one full scan of the database. To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property is used to reduce the search space. Apriori property is:
• If an itemset is frequent, then all its subsets must also be frequent.
• If an itemset is not frequent, then all its supersets cannot be frequent.
To construct candidate Ck combine frequent itemsets of size k. If
, take all 1-itemset. If
, join pairs of itemset that differ by just one item. For each generated candidate itemset ensure that all subsets of size k are frequent. The first pass of the Apriori algorithm simply counts item occurrences to determine the frequent 1-itemsets. A subsequent pass, say pass k, consists of two phases. First, the frequent itemsets
found in the
pass are used to generate the candidate itemsets Ck, using two step processes: self-join and prune.
• Self-join
: Generate set Ck by joining
itemsets that share the first
items.
• Prune: Remove from Ck the itemsets that contain a subset k-itemset that is not frequent.
Second, the database is scanned and the support of candidates in Ck is counted.
Example: Let a store promote certain computer accessories, and the store manager bundles some accessories with other discounted accessories. The manager wants to know which accessories the customers pay mostly when purchased together. Table 1 contains transactions with items.
Let minsup = 2 and mincof = 70%. We need first to generate a maximum frequent itemset. The transaction database in Table 1 is applied in the Apriori algorithm, as can be seen in Figure 2. The set of candidate 1-itemsets, C1, obtained and then scanned the database to count the support of each itemset. The set of frequent 1-itemsets, F1, whose support is equal to or greater than 2, the minsup value, are generated from C1. Items in F1 are joined to get candidate 2-itemsets, C2. The database is scanned, and the support of each candidate itemset in C2 is counted. The set of frequent 2-itemsets, F2, is determined based on the support count of each candidate 2-itemset in C2. From F2 pairs of itemset are joined that differ by just one item to get candidate 3-itemsets, C3. The support of each itemset
Figure 2. Generating maximum frequent itemsets using the apriori algorithm.
in C3 is counted by scanning the database and F3 is generated based on the minsup value. The candidate of 4-itemset, C4, is created by joining itemsets that differ by just two items in F3. The support count of C4 itemset is less than the minsup value, so there is no frequent 4-itemsets, F4. Thus, in this case the maximum frequent itemset is F3. So, the maximum frequent itemset are: {Mouse, Keyboard, Monitor}, {Mouse, Keyboard, Headphone} and {Mouse, Monitor, Headphone} as can see in Figure 2.
Let generate association rules based on mincof = 70% for {Mouse, Keyboard, Monitor}:
Rule 1: {Mouse, Keyboard} → {Monitor}
Rule 2: {Mouse, Monitor} → {Keyboard}
Rule 3: {Keyboard, Monitor} → {Mouse}
Rule 4: {Mouse} → {Keyboard, Monitor}
Rule 5: {Keyboard} → {Mouse, Monitor}
Rule 6: {Monitor} → {Mouse, Keyboard}
If the desired mincof is 70%, then all interested rules that satisfy the mincof are rules 1, 3, and 5. Based on rule 1, 75% of the customers who purchase {Mouse, Keyboard} also purchase a Monitor. Also, based on rule 3, 100% of the customers who purchase {Keyboard, Monitor} always purchase {Mouse}.
Although the research in the Apriori algorithm has grown the recent years [15] , the drawback of the Apriori algorithm is the necessity of scanning the whole database many times. Based on the Apriori algorithm, many new algorithms were designed with some modifications or improvements. Generally, there were two approaches: one is to reduce the number of passes over the whole database or replace the whole database with only part of it based on the current frequent itemsets, and another approach is to explore different kinds of pruning techniques to make the number of candidate itemsets much smaller [16] . Direct hash and pruning (DHP) [17] and partitioning [18] are modifications of the Apriori algorithm. Another mining technique is to simultaneously mine both frequent and infrequent itemsets [19] .
Associate rule mining aims to extract interesting correlations from a raw dataset that contains a huge number of database transactions and items in each transaction. Computing such a large scale of raw data in classical computing is very expensive. Leveraging the quantum search algorithm can solve such a problem significantly faster than the classical algorithm. Grover’s search algorithm gives a quadratic speedup compared to an exhaustive classical algorithm for the same problem.
2.2. Quantum Algorithms for Association Rule Mining
Quantum algorithms for association rule mining [20] [21] was proposed using a quantum counting algorithm [22] . The quantum circuit design that was proposed in [20] [21] presented experimental implementation for 2 × 2 (two transactions and two items). The experiment for 2 × 2 required 9 qubits. Let T, the number of the transaction and I, the number of the items, the required number of qubits is equal to 2TI + 1 for each iteration to find the maximum frequent k-itemset. Also, for each iteration, the whole database is a search space.
To find the maximum frequent k-itemsets from n items, we are interested only in the subspace of states of Hamming weight k. We propose an efficient method that the search space is reduced from 2n-dimensional Hilbert space to an
-dimensional subspace. Also, we presented an efficient quantum circuit design that the required number of qubits reduced significantly compared with [20] [21] . We present a new quantum design method for association rule mining to generate the maximum frequent k-itemsets, which required fewer qubits, and the search space based only on the candidate k-itemset to discover the maximum frequent k-itemsets. We modified Grover’s search algorithm [12] by employing the Dicke state [23] to create the superposition for k-itemset, quantum counter [24] [25] to count the frequent of k-itemset and quantum comparator to check the frequent k-itemset equal or greater than to minsup threshold. We start to transform the transaction database into a binary matrix such that the item value is “1” if the item is present in a transaction and “0” otherwise. The database is converted to binary matrix
, where each row corresponds to transaction in T, each column corresponds to an item in the set of all items I, and N is the number of transactions, and M is the number of items. A Boolean function in the sum-of-product (SOP) form is generated from the binary matrix such that each row of the matrix corresponds to one product term of the SOP function expression. In the presented case, the term is a product of variables for all items that have a value of one.
Apriori algorithm for associate rule mining, the maximum frequent k-itemset is required to scan the database for every 1-itemset, 2-itemset, etc. until to find the maximum k-itemset. The candidate itemset generated during an early iteration are generally of larger magnitude than the maximum frequent k-itemset, which likelihood can be found in later iterations. Therefore, the initial candidate itemset generation is the key issue in improving the performance of association rule mining [17] . In this case, we started the search from the maximum k-itemsets as the candidate to search the maximum frequent k-itemsets, which may be a better chance to be the actual or close to the maximum frequent k-itemsets rather than starting the search from candidate 1-itemsets. Thus, the database size and the computation cost are reduced substantially. We choose only all the terms that have the maximum k-itemset from the SOP function. We built the quantum oracle design from the optimized SOP function that contained only the maximum k-itemset combined with the quantum counter and quantum comparator. We run an experiment and perform analysis using QISKIT, an IBM quantum simulator [26] .
3. Quantum Oracle Design
First, let us create a binary matrix in Table 2 such that the item value is “1” if the item is present in a transaction and “0” otherwise. The binary matrix is based on items from Table 1.
To simplify, we change each item name to a letter so that we can build the SOP function. Let us observe that the SOP expressions used here are different from those applied in binary circuit synthesis. This is because here we can use repeated products or products included in other products, which can’t happen in SOP expressions used for binary circuit synthesis. Therefore, our expression compiled from Table 2 is as follows:
Let minsup = 2 and we need to find the maximum frequent 3-itemset. First, extract all itemsets equal to or greater than the 3-itemset.
Table 2. Binary matrix corresponds from Table 1.
Second, we decompose any itemset that is greater than 3-itemset. In this case, we have
which is decomposed into
such that:
Third, we build the quantum oracle for SOP function expression:
(1)
Each term in the traditional quantum oracle design for the SOP function is a Toffoli gate, and the outcome is stored in an additional ancilla qubit. One large Toffoli gate that is controlled for all results is used to compute the output of the SOP function. The problem with the traditional quantum oracle architecture is that one additional ancilla qubit is needed for each term. We proposed that each term of the SOP function be connected to a quantum counter and then mirror the term back before computing the next term. In [13] [14] , more information on this design is provided.
3.1. Grover’s Search Algorithms
Grover’s Algorithm [22] searches an unordered array of N elements to find a particular element with a given property. In classical computations, in the worst case, this search takes N queries (tests, evaluations of the classical oracle). In the average case, the particular element will be found in N/2 queries. Grover’s algorithm can find the element in
queries. Thus, Grover’s algorithm can be used to search the maximum frequent k-itemset for associate rule mining. Grover’s algorithm is a quantum search algorithm, which speeds up a classical search algorithm of complexity
to
in the space of N objects, hence Grover gives a quadratic speed up.
The SOP Boolean function in Equation (1) contains n variables from the given binary matrix in Table 2, which is used to represent the search space of
elements. To apply the SOP Boolean function in Grover’s algorithm, these N elements are applied in a superposition state which is the input to the oracle. If the oracle recognizes an element as the solution, then the phase of the desired state is inverted. This is called the Phase inversion of the marked element. Grover’s search algorithm uses another trick called inversion about the mean (average), which is also known as diffusion operation or amplitude amplification. Inversion about the mean amplifies the amplitude of the marked states and shrinks the amplitudes of other items. The amplitude amplification increases the probability of marked states, so that measuring the final states will return the target solution with a high probability near 1.
As shown in Figure 3(a), the n qubits in the superposition state result from applying a vector of Hadamard gates to initial state
. Next applied is repeated operator G which is called the Grover Loop. After the iteration of the Grover Loop operator
times the output is measured for all input qubits. Oracle can use an arbitrary number of ancilla qubits, but all these qubits must be returned to value
inside the oracle. The number of required iterations
(a)(b)
Figure 3. (a) Schematic circuit for Grover’s algorithm [22] . (b) Grover operator G.
for the Grover operator is:
where N is number of all search space elements and M is number of solutions. The Grover Loop G is a quantum subroutine which can be broken into four steps as shown in Figure 3(b).
1) Phase inversion: apply the oracle. If the oracle recognizes the solution, then the phase of the desired state is inverted.
2) Apply the Hadamard transform
(
).
3) Zero state phase shift: Perform the condition phase shift, in which all states receive a phase shift of -1 except for the zero state
.
4) Apply the Hadamard transform
.
Grover’s search algorithm is started to create the superposition by using Hadamard operator H. Hadamard operator of n qubits creates
quantum states. Hadamard operator of
applied
:
For maximum frequent k-itemsets, all N states are not solutions, but the solution would be only in Hamming weight k-itemsets. Hamming weight in binary is the number of ones in the binary number. k-itemsets is Hamming weight of k-itemset. For instance, finding the maximum frequent 2-itemset from 4 items, the solution will be only in 6 states with Hamming weight 2 which are equal to 0011, 0101, 0110, 1001, 1010, 1100. So, using the Hadamard operator for four items creates 16 quantum states (0000, 0001, …, 1111) which are not required to search from the possible solutions, but the possible solutions are only in 6 quantum states of Hamming weight 2.
3.2. Dicke State
The search space of search algorithms has a critical role in terms of performance in both classical and quantum computing. In Grover’s search algorithm starts preparing superposition states for the search problems. The Hadamard operator is the traditional way to create 2n superposition states for n qubits. There are many real problems such as symmetric Boolean function [27] , Johnson graph [28] , and frequent patterns for associate rule mining that the possible solutions are in the form of Hamming weight, a pure symmetric state in
states. For problems such as these, a proper quantum superposition state can be achieved using the Dicke state [23] . Finding the maximum frequent k-itemsets from n items, we are interested only in the subspace of states of Hamming weight k. The Dicke state
is an equal-weight superposition of all n-qubit states with Hamming Weight k (i.e., all strings of length n with exactly k ones over a binary).
creates
symmetric states. Below we illustrate practical examples of the Dicke state.
where
and
. There are n qubits and
of them are 0 and k are 1. For instance,
is Dicke state of 4 qubits that has 6 symmetric states and each state have 2 ones.
is Dicke state of 5 qubits that has 10 symmetric states and each state have 3 ones.
Dicke state creates arbitrary symmetric pure states
. Using the Dicke state increases the rate of the correct solution since the search is only in subspace. The classical bitstring can be converted to the Dicke state by a recursive unitary operation called Split & Cyclic Shift (SCS) unitary
[23] . To build
state, we start from
where the original sequence is multiplied by a factor of
and then shift the first zero to the end and shift the whole sequence forward by multiplying by a factor of
.
Dicke state given as input string
generates the entire Dicke states
by recursive unitary operation
.
The transformation or mapping of
is constructed by 1-controlled and 2-controlled Y-rotation
gate between two CNOT, where
. Dicke state is constructed recursively by smaller
, where
.
In Figure 4 and Figure 5 presented are the explicit constructions of
. To get more in-depth view of this construction can refer to [23] . The Ry gate is a single qubit rotation gate through angle θ radian. The Ry gate rotation is around Y-axis by angle θ. If the controlled qubits for Ry gate are all active, then the original sequence is multiplied by a factor of
and then shift the first zero to end and shift the whole sequence forward by multiplying by a factor of
. If the controlled qubits for Ry gate are not all active, then the two CNOT gates cancel each other.
The full circuit of Dicke state
is constructed recursively by
as can be seen in Figure 6 which contains Ry gate in between two CNOT gates. The Ry gates are controlled either one-qubit or two-qubits.
Figure 4. Construction of
of a two-qubit gate [23] .
Figure 5. Construction of
of a three-qubit gate [23] .
Figure 6. Dicke state
circuit, where
gates are shorthand for Y-roatation
[23] .
3.3. Quantum Counter
Quantum counter [24] [25] is used to count the number of terms in the SOP Boolean function. Quantum counter block is a sequence of n-Toffoli followed by Feynman (CNOT) gates which is called Peres gate [29] , as shown in Figure 7. The first qubit is applied a constant 1 with other variables combined, and the Peres gate is then turned into a quantum counter. (This qubit will be next taken from the term of the SOP formula to activate the counter block realized from Peres gates). For simplicity of explanation, we assume that the counter block is built from Toffoli and CNOT gates, as shown in Figure 7.
Here z is the least significant qubit and x the most significant. The outputs of CNOT and two of the Toffoli gates are
,
, and
, respectively. When
, the first Toffoli gate outputs
and the second
.
The outputs of the qubits y and x are both zeros. The output of the qubit z is
. Hence the circuit incremented 000 by 1 to 001. Quantum counter circuit indeed outputs the value input +1.
If we connect the first control input of the quantum counter block to a circuit, then the output of the connected circuit (a term of the SOP) will either activate or deactivate the counter. When the output of the connected circuit is equal to 1, the output of the counter block is incremented by 1. When the output of the circuit is equal to 0, the output of the counter block is unchanged.
3.4. Quantum Comparator
In Table 3, the truth table of one-bit comparator which contains equal, greater than and less than.
For 1-qubit comparator, to check equal (
) or greater than (
), we add both values
. To build this circuit requires 5 qubits, 2 Toffoli, 4NOT, and 2CNOT gates, as can be seen in Figure 8(a). For more than 1-qubit comparator, the circuit would be huge in terms of the number of qubits and gates. To minimize the required qubits and gates, we use “not less than” such
which required only 3 qubits, 1 Toffoli, and 2 NOT gates as can be seen in Figure 8(b). The output from the quantum counter would be compared with the minsup value. In this case we have 4-bit quantum counter compared with minsup = 2 (0010). Every qubit from the quantum counter output is compared with every bit of the minsup value. For instance, let
quantum counter output and
minsup values, we build the quantum circuit for 4-bit comparator
. In Figure 9 for 4-bit quantum comparator, we rename
to
and
to
. The quantum comparator output is out0.
In Figure 9, the four-bit comparator is compared the minsup value 0010 with the out of quantum counter. Every n-qubit comparator requires 3n + 1 qubits. The n-
qubits are used to store the obtained value from the comparison.
Figure 10 is the complete quantum oracle circuit design for associate rule mining. The oracle circuit contains the circuit for each term in the SOP function expression connected with the quantum counter for each term and the quantum comparator connected with the quantum counter output. The
represent the items and control_counter_0 is the control bit for the quantum
Table 3. Truth Table of one-bit comparator.
Figure 8. One-qubit comparator (a) equal or greater than. (b) Not less than.
Figure 9. Four-bit quantum comparator build using IBM Qiskit simulator.
Figure 10. Full quantum oracle circuit for
.
counter. If the control bit is active (1), then the counter value is incremented by 1. If the control bit is 0, then the counter keeps its value. The number of required qubits for the quantum counter is equal to
, where the number of required qubits for the quantum comparator required is
. Note that if
is integer value, then add 1 to the
value. The SOP function
contains 9 terms that requires
qubits for the quantum counter. The quantum comparator compares the output of the quantum counter 4-qubit with minsup value 4-qubit. The quantum comparator required four additional ancilla qubits for computation and one qubit for the output. In associate rule mining, we need the maximum frequent of k-itemset that is equal to or greater than to minsup = 2. In this case, 4-qubit from the quantum counter compared with 0010. The output is
is equal to 1 when frequent of k-itemset equal to or greater than 2.
Let us observe that the SOP function above is not used as a logical function, but it used as a pattern matching for our problem. This general idea can be used for other problems that require counting matching patterns or counting satisfied constraints. Some constraint satisfaction problems that can be formulated as such the SOP function above can refer to [30] .
Figure 11 is the complete quantum algorithm circuit design for the associate rule mining, which is a modification of Grover’s search algorithm. The Dicke state is used for superposition preparation, oracle, and diffuser to recognize the solution states. We applied this oracle in Grover’s search algorithms for R = 2
Figure 11. Full quantum algorithm circuit design for the associate rule mining.
iterations from this formula:
where M = 3 is the number of
solutions in Figure 2 in the Apriori algorithm, and N = 10 is the number of all search space elements from the Dicke state of
. We measured only
for the items but for verification the measurement can be added to out0 which is equal to 1 only if the k-itemset is greater than or equal to the minsup. As can be seen in Figure 12 the values with high probability are 10,011, 11,001 and 11,010 for
respectively based on this SOP function:
In Figure 13, we generalized our design to handle for associate rule mining to generate the maximum frequent k-itemset. As can be seen from Figure 13, the input of Dicke state is
, where n is the number of items and k is the
Figure 12. Histogram of measured value from Figure 11.
Figure 13. Proposed algorithm design for associate rule mining.
large itemset to check whether is the maximum frequent k-itemset or not. Each term from the SOP function is connected to the first qubit (Control Counter) of the quantum counter. The remaining number of qubits for the quantum counter is equal to
, and the number of qubits of the quantum comparator is equal to
, where T is the number of terms in the SOP function. Note that if the
is integer value, then add 1 to the
value. The
is connected to the diffuser to amplify the solution. If
is equal to one, then the oracle has recognized the solution. Finally, the Dicke state qubits for the items are measured. For checking purposes, the
can be added to the measurement in order to check the solutions with a high probability that
is equal to 1.
Scaling the required number of qubits in design is critical in the current generation of quantum computers because the number of qubits determines the degree of computational complexity. The more qubits a quantum processor possesses, the more complex and valuable the quantum circuits can run [31] . Thus, to scale and optimize the number of qubits that can accommodate in quantum algorithm design is directly reflected on the performance.
We compare the number of qubits that need in our design and the proposed design in [20] [21] . The number of required qubits in [20] [21] is equal to:
(2)
Our proposed design, the number of required qubits is:
(3)
where T is the number of transactions and I is the number of items. For instance, 8 × 5 (8 transactions and 5 items), the number of required qubits in [20] [21] is equal to 81 qubits, while our design requires only 19 qubits.
Figure 14. Histogram between our purposed design and other design in [20] [21] .
As can be seen in the histogram in Figure 14, based on the equations in (2) and (3), for 10 transactions and 5 items, 10 × 5 requires 101 qubits for the design in [20] [21] , while our proposed design just requires 19 qubits. According to the design in [20] [21] , one qubit is needed for every item, as well as one qubit for every transaction and one qubit for the output qubit. A large transaction database typically contains massive transactions and large item sets. As a result, the number of qubits will be unreasonably high, even for large quantum computers. Our proposed quantum architecture employs the Dicke state, which reduces the search space into a sub-search space. In addition, we employ a quantum counter and a quantum comparator that can handle more transactions and items while still performing well with a small number of qubits.
4. Conclusion
We presented a novel quantum design for association rule mining to discover the maximum frequent k-itemset. We converted the transaction database into a new type of SOP Boolean function and then reduced the SOP function into large k-items with only Hamming weight that satisfies the minimum support threshold. In addition, we proposed using the Dicke state to prepare the superposition
states for Grover’s search algorithm instead of the conventional Hadamard operator, which would require 2n states. We reduced the SOP function to large k-items where the likelihood of the maximum frequent k-itemset can be found. Then we design an advanced quantum oracle with a quantum counter and quantum comparator. Using these three different quantum blocks, Dicke state, quantum counter, and quantum comparator, we achieved to reduce search space and the required number of qubits in the design. We have then shown a full implementation of our design on the IBM Qiskit simulator and observed the correct results. We compared our proposal design with [20] [21] design that our design requires fewer qubits. We can solve many transactions and items by scaling and optimizing the search space and required qubits. For future improvement, a fully quantum design for association rule mining can be developed using our proposed design to generate association rules from a maximum frequent k-itemset.