
Implementation of a Hamming-Distance-Like Genomic Quantum Classifier Using Inner Products on IBMQX4 and IBMQX16

Kunal Kathuria 1,a,*, Aakrosh Ratan 2, Michael McConnell 1, and Stefan Bekiranov 1,b,*

1 Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
2 Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
* corresponding author: a kk7t@virginia.edu, b sb3de@virginia.edu

arXiv:1907.08267v1 [quant-ph] 18 Jul 2019

ABSTRACT

Motivated by the problem of classifying individuals with a disease versus controls using functional genomic attributes as input, we encode the input as a string of 1s (presence) or 0s (absence) of the genomic attribute across the genome. Blocks of physical regions in the subdivided genome serve as the feature dimensions, which takes full advantage of remaining in the computational basis of a quantum computer. Given that a natural distance between two binary strings is the Hamming distance and that this distance shares properties with an inner product between qubits, we developed two Hamming-distance-like classifiers which apply two different kinds of inner products ("active" and "symmetric") to directly and efficiently classify a test input into either of two training classes. To account for multiple disease and control samples, each training class can be composed of an arbitrary number of bit strings (i.e., number of samples) that can be compressed into one input string per class. Thus, our circuits require the same number of qubits regardless of the number of training samples. These algorithms, which implement a training bisection decision plane, were simulated and implemented on IBMQX4 and IBMQX16. The latter allowed for encoding of 64 training features across the genome for 2 (disease and control) training classes.

Introduction

Quantum computing algorithms have been developed that promise significant improvements over existing classical equivalents, particularly in the area of machine learning. Exponential speedups have been shown for implementing least squares fitting [1], quantum Boltzmann machines [2, 3], quantum principal components analysis [4] and quantum support vector machines [5] on a quantum computer, over their classical counterparts [6]. Quadratic speedups have been demonstrated for Bayesian inference [7, 8], online perceptron [9], classical Boltzmann machines [10] and quantum reinforcement learning [6, 11]. However, these speedups presume a low-error-rate, universal quantum computer with hundreds to thousands of qubits. In addition, the speedup of a subset of these algorithms (e.g. quantum support vector machines) requires quantum RAM, which would enable a quantum coherent mapping of a classical vector into a quantum state [12], but this does not currently exist. Recent progress has been made in exploiting a relatively natural connection between kernel-based classification and quantum computing [13-15]. One set of approaches maps a large feature space to a quantum state space in order to compute a kernel function on a quantum computer, which is then used to optimally classify input data using, for example, a support vector machine (SVM) [14, 15]. In another set of approaches, variational quantum circuits are used to directly classify the data in the large quantum feature space on a quantum computer, similar to the approach employed by an SVM [14, 15].
An even more direct and simple implementation of kernel-based classification on quantum computers exploits quantum interference of the training and test vectors, which, when properly prepared as quantum states, execute distance-based classification upon measurement [13, 16]. An advantage of these kernel-based approaches is that they can be, and have been, implemented on existing quantum computers. However, in the case of the quantum interference circuit [13], the distance measure was Euclidean-based and only two training vectors were used to build the classifier. Motivated by the problem of classifying individuals with a disease (e.g., Alzheimer's disease) given single-cell neuronal genomic copy number variation (CNV) data [17-20], we developed a set of quantum classifier circuits which exploit a biologically relevant encoding of the training and test vectors that allows us to take full advantage of the computational basis of the quantum computer. Specifically, the training and test vectors encode the presence and absence of a CNV in a given genomic window with a 1 and 0, respectively, in non-overlapping windows across the human genome (i.e., they can be represented as binary strings). These genomic windows, which are nothing but ordered physical subdivisions of the genome, act as our feature dimensions (more precisely, the CNVs in the respective genomic regions constitute the different features). The classification is efficient as it is based directly on the highest inner product between the test vector and the training vector group of each class. This is possible because the class label qubit (which is eventually measured) is directly entangled with the training vectors, rendering the usual ancilla qubit unnecessary. Furthermore, we take advantage of many of the shared properties between a natural distance measure between binary strings, the Hamming distance, and the inner product between vectors composed of many binary strings to arrive at the simplest implementation of a near-term quantum disease-control classifier using genomic input data. The classifier is broad in its scope and can be applied to any situation amenable to using inner products in a feature space composed of divisions of physical space, time, etc., or of any correspondingly comparable attributes, as discussed in the following sections. Our circuits are employed to execute binary (disease/normal) classification of genomic samples on 5-qubit (IBMQX4) and 14-qubit (IBMQX16) architectures using two different inner-product metrics. Genomics is an exciting, rapidly advancing field that is much in need of effective machine-learning solutions to keep up with fast-growing technological advancements in sequencing, the large amount of data generated in each sequencing run and the vast array of different relevant data types. Development of these and other near-term quantum algorithms will enable quantum computation to solve demanding, data-driven problems in genomics as we approach the development of powerful, low-error-rate, universal quantum computers.

Results and Discussion

If one is interested in quantifying how well two arbitrary binary strings (of equal length n) match each other, the most natural way would be to calculate their Hamming distance. The Hamming distance is simply the number of positional mismatches between the two bit strings. Thus, the Hamming distance of identical strings is 0 and that of two strings that are binary complements is n.
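A minimal Python sketch of this definition (our own illustrative helper, assuming bit strings of equal length):

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    assert len(a) == len(b), "strings must have equal length"
    return sum(x != y for x, y in zip(a, b))

# Identical strings -> 0; binary complements of length n -> n.
print(hamming_distance("1011", "1011"))  # 0
print(hamming_distance("1011", "0100"))  # 4
```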
The Hamming distance has broad utility in solving classification problems that can use binary or binarized inputs representing a more complex event. Our classifier is similar in spirit to the Hamming distance as it is based on the inner product of bit strings. In fact, it is exactly equivalent to a Hamming-distance-based classifier provided some caveats on the input data (see Methods for proof). As a simple warm-up example without any technical implementation details, we examine how our inner-product-based classifier can be applied to a schedule-matching problem. There are certain time blocks in the day during which individuals are either available or not available, and the goal is to determine which individual (or set of individuals) a given "test" individual's schedule matches best (so that, for example, they can find the most common time among all individual pairs to work together on a new issue). Each time block is represented uniquely by a "block state" and a coefficient/prefactor attached to each such state represents a person's availability during that time of day. For example, if we had 8 time blocks, we would represent them with 3 qubits, where the block state |000⟩ would represent the first of these 8 blocks and the block state |111⟩ would represent the last block. Thus the time blocks constitute the feature space of the problem. In the simplest case, the state coefficients would be binary themselves (1 indicating the person's availability and 0 indicating non-availability). The prefactored block states for each individual are summed to yield that individual's total state vector. An inner product between two individuals' state vectors would then yield a score quantifying how well the two schedules agree. A class is defined generally as a collection of state vectors (henceforth referred to as "training vectors") satisfying a unique grouping property or pattern. The training vectors of each class are summed to yield its class vector. The classifier simultaneously executes an inner product between the test input and each class vector, which is equal to the sum of the inner products between the test vector and the class's training vectors. Based on measurement probabilities, the classifier then classifies the test input into the training class whose class vector yields the highest inner product with the test. In general the class vector is allowed to be a vector sum of many individuals' state vectors that belong to the same class (and is prepared based on a classically precomputed sum for the actual circuit). Let us suppose for this example that there were 3 schedules in all: 2 training schedules represented by vectors A and B, each playing the role of a class vector for its respective class, and 1 test input represented by vector T. Suppose that there are 3 time blocks and A has an opening only in the first time block, B only in the second and T in both the second and the third. Thus A = (1, 0, 0), B = (0, 1, 0) and T = (0, 1, 1). We immediately see that T · B > T · A and therefore the test vector is classified into the class of class vector B. This classification is exactly implemented by our inner product classifier. This problem is of course just one example of a broad range of possible applications for this classifier. The specific motivation for the development of our classifier was a DNA copy-number-variation (CNV)-based disease classification problem in genomics.
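A few lines of Python (a sketch under the assumptions of this toy problem, with the inner products computed classically) reproduce the schedule-matching decision:

```python
import numpy as np

# Availability over 3 time blocks: 1 = available, 0 = not available.
A = np.array([1, 0, 0])   # class vector for class A (one training schedule)
B = np.array([0, 1, 0])   # class vector for class B (one training schedule)
T = np.array([0, 1, 1])   # test schedule

# Classify the test schedule into the class with the larger inner product.
scores = {"A": T @ A, "B": T @ B}
print(scores)                                         # {'A': 0, 'B': 1}
print("classified as", max(scores, key=scores.get))   # classified as B
```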
The human genome (which is a collection of chromosomes) can be subdivided into smaller regional blocks in chromosomal coordinate space. This is done effectively in the reference genome, which is a consensus of a relatively small number of individuals' genomes and is represented as one large sequence of "A," "T," "C" and/or "G" bases strung together in a line. Each coordinate block of this reference genome is marked for the presence or absence of a CNV as found in a sequenced individual sample genome (any given sample genome itself is not as well studied as the reference and does not have a robust coordinate representation by itself, and hence is "mapped" to the reference coordinate space; for technical details, see for example ref. [21]). A CNV in a given regional block indicates a deviation from the number of times (referred to as the "copy number") the genomic string/sequence of that region in the reference genome is expected to occur in the sample genome. The default expected copy number of any region is 2 (there are two copies of each chromosome, one inherited from each parent). Copy-number variations are associated with a variety of phenotypes and can be strongly correlated with various diseases [22]. We looked at CNVs in neuronal cell samples from healthy individuals as well as those affected by Alzheimer's disease (AD) and developed this classifier as an attempt to classify genomes as healthy or affected by AD. This is in fact a very natural specific application for our generic classifier in the regional/spatial domain. Thus the genome is divided into regional blocks (see Fig. 1) and each block state represents exactly one such region in order. Similar to above, if we subdivided the reference into 64 genomic regions, we would represent them with 6 qubits, where a block state of |000000⟩ would represent the 1st region and the block state |111111⟩ would represent the 64th region. Thus, we will identify our ordered n-qubit computational "basis vectors" (or block states) with basis vectors in feature space in order to encode training and test vectors (referred to as "sample vectors") in the usual fashion [13]. If there is a copy-number variation in the ith genomic region of a sample, the sample vector has a projection of 1 onto the ith feature dimension, and if a CNV is absent there, its projection onto the same dimension is 0. As an example, if we had a sample with 4 genomic regions such that the first three contained a CNV and the last one did not, the sample vector in the feature basis would be S = (1, 1, 1, 0).

The Classification Metrics

We employ two classification metrics: the active inner product (AIP) and the symmetric inner product (SIP). First we will list some common features shared by the two metrics. Both metrics have the special merit of being able to handle an arbitrary number of training samples/inputs in the summed class-vector form. This is because both are linear: the sum of the inner products between the test vector and each training vector is the same as the inner product between the test vector and the class vector (see Methods for proof). This means that in one mathematical operation arbitrarily many inner products can be calculated between the test and training vectors and summed together for each class.
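This linearity is easy to check numerically; the short Python sketch below (illustrative values only, not data from the paper) sums several binary training vectors into a class vector and confirms that the test vector's inner product with the class vector equals the sum of its inner products with the individual training vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Four binary training vectors over 8 feature dimensions (toy values).
training = rng.integers(0, 2, size=(4, 8))
test = rng.integers(0, 2, size=8)

class_vector = training.sum(axis=0)          # summed class vector

lhs = sum(test @ d for d in training)        # sum of individual inner products
rhs = test @ class_vector                    # single inner product with the class vector
print(lhs, rhs, lhs == rhs)                  # identical, by linearity of the dot product
```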
This is the implementational advantage of our classifier over a Hamming-distance-based classifier, as the Hamming distance is not a linear measure for multiple bit strings in the sense above and would not allow for simultaneous calculation of arbitrarily many mutual distances. Further, for both inner products, the sample vectors are easily represented in the computational basis using only unit-magnitude (or zero-valued) prefactors. As a note on efficiency, the swap test applied to n-qubit states can simultaneously evaluate the inner product of two 2^n-dimensional vectors in the feature basis, which is the formulation we use, without the need to store each vector component separately. This has the potential to become much faster than the classical inner product, which in the worst case will compute 2^n vector-component products and add them together. Finally, in our framework, there is no theoretical limitation on the number of dimensions/regions encoded in feature space or on the kind of binary-level/CNV-level training data that can be encoded. The difference between the two classification metrics lies in the initial state preparation and not in the inner product implementation itself. The classes of the training data represent, in our case, disease and normal samples. If the data is separable in some sense, the training vectors for the disease class will have a different regional block pattern of CNV presence as compared to those of the normal class. Given a test sample and sufficient separable training data, both classifiers would classify the test sample into the appropriate category, thus making disease diagnosis of unknown random samples possible. As a technical note, the CNV-level data at this initial stage of the classifier does not allow for distinctions between genomic deletions and duplications.

Metric 1: Active Inner Product

The active inner product (AIP) is defined as the total number of times the same region in the test and training vectors for a given class contains a CNV. It is simply the dot product between the sample and class vector in the feature basis. For example, if one of our class vectors was C = (5, 4, 3, 2) and the test vector was T = (0, 1, 1, 0) in the feature basis, the AIP between them would be 7 (for simplicity, we are ignoring normalization). The test sample is then classified into the class yielding the higher inner product with the test vector. The sample vectors are encoded in the computational basis just as they are in the feature basis, without any change in their respective components. Thus, for each sample vector, all block states that correspond to genomic regions containing a CNV are assigned a coefficient of 1, whereas regions not containing a CNV are assigned a coefficient of 0. These block states are summed in the usual way to produce the state vector for each sample in the computational basis.

Figure 1. Examples of feature-space vectors of genomic samples containing different CNV patterns. Each column in the genome represents a feature dimension or physical genomic region. "1" indicates the presence of a CNV in a given region, and "0" indicates its absence. (a) shows a case where there are two genomic regions (setup of 5-qubit Example Problem 1), while (b) and (c) show 64 genomic regions (setups of 14-qubit Example Problems 1 and 2, respectively). Ellipsis points in a dimensional label imply that the vector has the same CNV value for all the unindicated dimensions. Unnormalized wavefunctions for the disease and normal class vectors, as well as for the test vector, are written with subscripts "D," "N" and "T" respectively. The unnormalized wavefunctions simply show summed block states in the computational basis pertaining to the respective vector qubits only (not class, swapper, etc.). They are all shown in the AIP framework for simplicity, though the situation depicted in (c) is solved in the manuscript using the SIP framework.
For example, if there were a single sample divided into four genomic regions and only the first three had a CNV each, its normalized state vector |ψ⟩ would be:

$$|\psi\rangle = \frac{1}{\sqrt{3}}\,\big(|00\rangle + |01\rangle + |10\rangle\big) \qquad (1)$$

where the state |11⟩ is not present due to a coefficient of 0. And if there were a class with two such identical samples, its class vector under the AIP framework in the computational basis would read (using ψ as a generic placeholder again):

$$|\psi\rangle = \frac{1}{\sqrt{12}}\,\big(2\,|00\rangle + 2\,|01\rangle + 2\,|10\rangle\big) \qquad (2)$$

The inner product is calculated between the test vector and the class vector (the vector sum of the training vectors in each class) simultaneously for the two training classes by overlapping their states via the swap test [23]. The test sample is then classified into its rightful class (implementation details shown shortly). We note that AIP gives preference to "1-matches" over "0-matches." Though both AIP and SIP (described next) are well suited for our genomic problem depending upon the context, AIP is the naturally applicable metric for the schedule-matching problem. In fact, it happens to be even better suited than the Hamming distance for problems of that nature. This is because in such cases one is interested in how well people's availabilities match, which the AIP calculates by summing the total number of 1-matches without any regard for the non-availabilities or 0-matches (unlike the Hamming distance). For our genomic problem, though the Hamming distance is the natural classifier of choice, AIP may be better suited for samples or diseases where the existence of a CNV in the same region in different samples is considered more significant than the absence of a CNV. However, even if this CNV bias were absent, the data that we deal with in such contexts is typically in a form such that AIP turns out to be equivalent to the Hamming distance as a classification metric, as previously mentioned (see Methods).

Metric 2: Symmetric Inner Product

The symmetric inner product (SIP) is defined as the total number of times the same region in the test and training vectors for a given class "match" in terms of CNV presence, minus the number of times they do not match. "Matching" refers to two vectors both having a CNV, or both not having a CNV, in the same region. In the SIP framework, the sample vectors are represented a bit differently in the computational basis as compared to the feature basis. For each sample vector, all block states that correspond to genomic regions containing a CNV are assigned a coefficient of 1, whereas regions not containing a CNV are assigned a coefficient of −1. For example, the sample vector in (1) would be represented by:

$$|\psi\rangle = \frac{1}{2}\,\big(|00\rangle + |01\rangle + |10\rangle - |11\rangle\big) \qquad (3)$$

The rest of the routine proceeds identically to the AIP routine. The reason for the coefficient −1 in the SIP is that it introduces a penalty for unlike CNV events. Two non-CNV regions or two CNV regions will both contribute +1 to the inner product, but one CNV and one non-CNV region will contribute −1 to the inner product, to account for mismatches in regional CNV events.
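The two encodings of equations (1)–(3) can be sketched directly in Python; the amplitudes and normalization follow those equations, while the helper names are our own:

```python
import numpy as np

def aip_state(bits):
    """AIP encoding: amplitude 1 for regions with a CNV, 0 otherwise (then normalized)."""
    amps = np.array(bits, dtype=float)
    return amps / np.linalg.norm(amps)

def sip_state(bits):
    """SIP encoding: amplitude +1 for CNV regions, -1 for non-CNV regions (then normalized)."""
    amps = np.where(np.array(bits) == 1, 1.0, -1.0)
    return amps / np.linalg.norm(amps)

sample = [1, 1, 1, 0]                 # CNVs in the first three of four regions
print(aip_state(sample))              # [0.577, 0.577, 0.577, 0.0]  -> eq. (1)
print(sip_state(sample))              # [0.5, 0.5, 0.5, -0.5]       -> eq. (3)

# The inner product of two normalized SIP states is (matches - mismatches) / normalization.
other = [1, 1, 0, 0]
print(np.dot(sip_state(sample), sip_state(other)))   # (3 matches - 1 mismatch) / 4 = 0.5
```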
SIP is exactly equivalent to the Hamming distance as a classification measure (see Methods) and is thus very well suited to CNV-based genome classification (and many other similar problems).

The Inner-Product Decision Plane

The decision plane for both the active and the symmetric inner product is the same. As shown in Fig. 2 for 2 feature dimensions, the decision plane is the bisector plane of the two class vectors, as the test vector is classified along with the class vector with which it yields the higher dot product. This of course means that its projection onto that vector is larger than its projection onto the other class vector. For a higher number of feature dimensions, the decision plane is the bisecting hyperplane that is orthogonal to the plane in which the class vectors lie. One advantage of this formulation is that one effectively compares the test vector with arbitrarily many training vectors for each class in one mathematical operation, as these training vectors are summed together into one final class vector for each class. This is possible due to the linearity of the inner product metrics we employ (more in the following section). As an aside, for multiple class vectors the decision boundary is not a simple hyperplane; this is elaborated upon in the Supplementary Notes.

Figure 2. Inner Product Decision Plane in 2 feature dimensions. There are two classes, represented in red and green respectively. The class vector for each class is the sum of the class's respective training vectors, shown with dotted lines. The decision plane, shown in blue, bisects the two class vectors. The test vector will of course be classified according to the side of the decision plane in which it is located. For simplicity, only positive feature coordinates are shown.

The Inner Product Circuit

The state manipulation routines and circuits can be formulated as generic functions of data parameters (see Methods), but we utilize circuit optimizations as suitable to the form of input data in any given problem. In terms of future vision, when the equivalent stage of "quantum hardware assembly" arrives, optimizations pertaining to specific kinds of data operations would hopefully be effectively applied in some shape in the design of the quantum processor. We will now look at the different general components/stages of the inner product circuit. In the state equations of this section, we note that the order of the states on the R.H.S. corresponds to the physical order of the qubits that the states occupy on the real hardware (this will be relevant for the upcoming swap operations that are part of the inner product calculation). In addition, the subscript following a state denotes the qubits' functional role (as opposed to the physical qubit position above) and is written in the order of the qubits that compose the state vector. The class vector qubit(s) is (are) labelled by the subscript "d," the class index qubit by "m," the test vector qubit(s) by "t" and the swapper qubit by "s." For example, the two-qubit state |00⟩_md denotes that the "left" qubit in the state serves to encode the class index and the "right" one serves to encode the class vector. Further, when |ψ⟩ is used with a subscript, it represents the state of the circuit when only the qubits referred to by the subscript labels are considered (e.g. |ψ⟩_md). When |ψ⟩ is used with a superscript (0, f or c), it represents the temporal state of all combined circuit elements ("initial," "final" or "current" in the given context, respectively).
We now write the generic initial state for all inner-product circuits. The initial state is given by:

$$|\psi^0\rangle = |s\rangle_s\,|t\rangle_t\,\frac{1}{\sqrt{M}}\sum_{i=1}^{M}|i\rangle_m|d_i\rangle_d \qquad (4)$$

where |s⟩ is the state of the swapper qubit, |t⟩ is the state of the test vector, |i⟩ indexes the computational basis state representing the ith class, |d_i⟩ is the normalized class vector state for the ith class, M is the total number of classes and 1/√M is an overall normalization constant stemming from the number of classes. The "s" etc. labels used both in the subscript and in the state label itself are not redundant, as they serve different purposes and are retained for clarity. Now, |d_i⟩ can be specifically written in the feature basis as |d_i⟩ = Σ_{f=1}^{F} c_f^i |f⟩, where f represents a feature dimension, the c_f^i are the projections of the class vector onto the fth feature dimension, F is the total number of feature dimensions and |f⟩ is a placeholder for the computational basis-vector state for the fth feature dimension. Analogously, |t⟩ = Σ_{f=1}^{F} c_f^t |f⟩. Of course, log2 F qubits are required in general to index F feature dimensions. A key point in the circuit is that the usual role of the ancilla qubit (see for example [13]) is subsumed in the class index qubit (we only need 1 qubit to index 2 input classes), which is now directly entangled with the summed training vectors via the sum in (4). All circuits proceed in the same general way through the following stages:

1. The feature-space data is encoded in the circuit.
2. The well-known swap test is applied to calculate the inner product between the test and entangled class vectors.
3. The probabilities encoding the final inner product result are measured in the computational basis.

As mentioned, the data encoding may vary from circuit to circuit and will be specifically presented with the example problems. The Fredkin (controlled-swap) gate is used in the swap test as presented in ref. [24]. The swapper qubit s is acted upon by a Hadamard gate, used as a control qubit to swap |t⟩_t and Σ_{i=1}^{M} |d_i⟩_d via the Fredkin gate, and again acted upon by a Hadamard gate. This effectively calculates the inner product between the states contained in the qubits it swaps (in our case the test vector and the class vectors). For clarity, the exact inner product between the test and the entangled class vectors is given by:

$$\langle t|\phi\rangle = \frac{1}{\sqrt{M}}\sum_{i=1}^{M}\sum_{f=1}^{F} c_f^t\, c_f^i\,|i\rangle_m \qquad (5)$$

where |φ⟩_md ≡ (1/√M) Σ_{i=1}^{M} |i⟩_m |d_i⟩_d and the overlap above is understood to be between the test state and the class vector state, i.e. "between" indices "t" and "d." We will soon see how this closely matches the monotonically equivalent form of the inner product the circuit calculates. The final state vector after the swap test reads (please note the ordering of states):

$$|\psi^f\rangle = \frac{1}{2}\,|0\rangle_s\!\left(|t\rangle_t\,\frac{1}{\sqrt{M}}\sum_{i=1}^{M}|i\rangle_m|d_i\rangle_d + \frac{1}{\sqrt{M}}\sum_{i=1}^{M}|i\rangle_m|d_i\rangle_d\,|t\rangle_t\right) + \frac{1}{2}\,|1\rangle_s\!\left(|t\rangle_t\,\frac{1}{\sqrt{M}}\sum_{i=1}^{M}|i\rangle_m|d_i\rangle_d - \frac{1}{\sqrt{M}}\sum_{i=1}^{M}|i\rangle_m|d_i\rangle_d\,|t\rangle_t\right) \qquad (6)$$

The interchange of the order of the test and class-training states in each term in parentheses signifies the swapping of the qubits holding these state vectors (see the examples below for more details). Repeated measurements of the swapper and class index qubits are made to arrive at the measurement probability encoding the solution to our problem. To see how this is so, let us calculate some relevant theoretical measurement probabilities. Define ρ_sk as the probability of measuring the swapper qubit in state |s = s⟩ and the class index qubit in state |i = k⟩.
For reasons pertaining to the better statistical fidelity observed when compared to using s = 0, we will focus on the measurement probabilities ρ10 and ρ11. For a detailed analysis of the error modes associated with IBM processors, see for example ref. [25]. First, the projection operator projecting the above state onto states corresponding to fixed values of the swapper (s = 1) and class index (i = k), the |s = 1⟩_s |i = k⟩_m basis, yields:

$$P(|s=1\rangle_s, |i=k\rangle_m)\,|\psi^f\rangle = \frac{1}{2\sqrt{M}}\,|1\rangle_s\!\left(|t\rangle_t\sum_{i=1}^{M}\delta_{ik}|i\rangle_m|d_i\rangle_d - \sum_{i=1}^{M}\delta_{ik}|i\rangle_m|d_i\rangle_d\,|t\rangle_t\right) = \frac{1}{2\sqrt{M}}\,|1\rangle_s\!\Big(|t\rangle_t|k\rangle_m|d_k\rangle_d - |k\rangle_m|d_k\rangle_d\,|t\rangle_t\Big) \qquad (7)$$

where the Kronecker delta function δ_ik serves to pick out the terms corresponding to class k. The squared norm of the above is

$$\rho_{1k} = \frac{1}{4M}\Big(\langle t|_t\langle k|_m\langle d_k|_d - \langle k|_m\langle d_k|_d\langle t|_t\Big)\Big(|t\rangle_t|k\rangle_m|d_k\rangle_d - |k\rangle_m|d_k\rangle_d|t\rangle_t\Big) = \frac{1}{2M}\Big(1 - |A_k|^2\Big) \qquad (8)$$

where A_k ≡ ⟨t|d_k⟩ is exactly equal to the inner product between the test vector and the class vector of class k. Thus, the directly measurable probability ρ_1k is a constant minus the squared inner product (scaled by 1/2M); it is always a non-negative number (as is easily verified in (8)) and a monotonically decreasing function of the magnitude of the true inner product of the normalized states. All the circuits thus end with a measurement of the probabilities ρ10 and ρ11, with our classifier choosing the class k that yields the lower measured probability and thus the higher inner product with the test vector. In terms of data encoding, though specific routines for encoding data will be employed in our circuits, the kind of data that can be entered is in principle not limited (up to the normalization constraints mentioned in Methods). The general recipe for encoding arbitrary data involves solving a system of non-linear equations. However, circuits encoding non-trivial data may have extremely large gate depth, in large part due to the limitations of CNOT connectivity in currently available quantum processors. Therefore, for relevance and clarity, we will present problems involving simplified data that can be input feasibly into IBM machines. These solutions themselves require non-trivial gate depth at times, as they need several swap operations. We now present our circuits with their respective example problems. We will first describe the 5-qubit circuit and then the 14-qubit version. For each, problems are solved using the simulator as well as the real processor, using the AIP and/or SIP framework as applicable (problems that are solved by the simulator are also solvable by the real processors, which present additional architectural constraints, and we present such solution counterparts for most of the cases). For each case, we will first introduce the generic circuit and state notation, and then present specific example problems.

Five-qubit Generic Circuit

AIP problems involving an artificial genome containing two regional blocks can be solved with the 5-qubit circuit. The SIP framework does not have much relevance or power here due to the small feature space. Only 1 qubit was used to encode the class vector ("d"), which spans the two genomic regions. The other 3 qubits were taken by the class index ("m"), the test vector ("t"), which also resides in the 2-dimensional genomic feature space, and the swapper qubit ("s"). As we have 1 qubit to index the class, |0⟩_m is the first computational basis state representing class 0 and |1⟩_m the second, representing class 1. This class index qubit is first entangled with the training qubit using a "rotation routine" so as to separate the training vectors for each class.
As an example, the state |ψ⟩_md = |00⟩_md + |11⟩_md (normalization omitted) represents the simple case in which class 0 (the disease class) has a training vector with a CNV only in the first of two regions (block state |0⟩_d) and class 1 (the normal class) has a CNV only in the second region, given by block state |1⟩_d (using the qubit labelling convention described previously).

Example Problem 1: AIP on 2-Block Genome

We will now solve a very simple problem with the 5-qubit circuit in the AIP framework. In this example the normal class vector (class 1) has one training vector that contains a CNV in each of the 2 total genomic regions. The disease class vector (class 0) has only 1 CNV, present in the second region. The test vector also has only 1 CNV, present in the first region (see Fig. 1(a)). Thus, the test sample should be classified as normal. All qubits start in the ground state |0⟩. To encode this data, a specific rotation routine is defined, similar to other works [13], which allows one to prepare a broad class of entangled two-qubit states involving the class and training qubits. In these states, the disease class vector always has one regional block that contains a CNV while the other does not, and the normal class vector contains CNVs in both regions in any desired mutual ratio, Rc. Concretely, Rc is the ratio of the number of CNVs present in the normal class vector in the first region to the number present in the second region (in this case, Rc, for no reason other than simplicity, was chosen to be 1). Our routine is applied to the initial state |ψ^0⟩_md = |00⟩_md to render it into the generic state |ψ^f⟩_md ∝ |01⟩_md + sin θ |10⟩_md + cos θ |11⟩_md after employing six 1-qubit gates (as shown in the module labelled "Initial State Preparation" in Fig. 3(a)). This signifies that class 0 has a training vector that lies wholly in the second genomic region, whereas class 1 has a training vector with sin θ CNVs in the first region and cos θ CNVs in the second region, up to a normalization scaling factor (see Methods). The "rotation angle" θ can thus simply be written as θ = tan⁻¹ Rc. The absence of a CNV in the first region of the disease class vector implies that the block state |00⟩_md does not feature above. Setting Rc = 1 yields the post-rotation-routine state for this circuit: |ψ^c⟩_md = (1/√2)|01⟩_md + (1/2)(|10⟩_md + |11⟩_md). At this point, the states of the other qubits are |t⟩ = |0⟩_t (corresponding to the desired test sample vector) and |s⟩ = |0⟩_s, yielding the state:

$$|\psi^c\rangle = |0\rangle_s\,|0\rangle_t\left(\frac{1}{\sqrt{2}}\,|0\rangle_m|1\rangle_d + \frac{1}{\sqrt{2}}\,|1\rangle_m\,\frac{1}{\sqrt{2}}\,\big(|0\rangle_d + |1\rangle_d\big)\right) \qquad (9)$$

The second stage of the circuit is the swap test, now effected on the training and test qubits with the swapper qubit as control, which encapsulates the inner product between the class vector and test vector qubit states (see Fig. 3). This inner product value is given by the measurement probabilities of the states of the "s" and "m" qubits as measured in the computational basis, which leads us to the last phase of the computation. 8192 "shots" are taken, and ρ10^M and ρ11^M are measured and plotted on a histogram (the superscript "M" denotes a measured probability, as opposed to a theoretical one).
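A compact way to reproduce this example outside the IBM Composer is sketched below in Qiskit (our own gate decomposition for the initial state, not the authors' six-gate routine; qubit roles follow the text and the printed values match the theoretical probabilities given next):

```python
# Sketch of 5-qubit Example Problem 1, assuming a recent Qiskit installation.
# Qubit roles: q0 = class index (m), q1 = class vector (d), q2 = test vector (t), q3 = swapper (s).
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

m, d, t, s = 0, 1, 2, 3
qc = QuantumCircuit(4)

# Initial state preparation (our decomposition):
# target is (1/sqrt(2))|0>_m|1>_d + (1/2)|1>_m(|0>_d + |1>_d), with the test qubit left in |0>.
qc.h(m)            # superpose the two classes
qc.x(m)
qc.cx(m, d)        # if m = 0, set d = |1>  (disease class: CNV in the second region)
qc.x(m)
qc.ch(m, d)        # if m = 1, put d in (|0>+|1>)/sqrt(2)  (normal class: CNV in both regions)

# Swap test between the test (t) and class-vector (d) qubits, controlled by the swapper.
qc.h(s)
qc.cswap(s, t, d)
qc.h(s)

# Ideal measurement statistics for the swapper and class-index qubits.
probs = Statevector.from_instruction(qc).probabilities_dict()
rho = {(sv, mv): 0.0 for sv in "01" for mv in "01"}
for bits, p in probs.items():          # Qiskit bitstrings are little-endian: bits = q3 q2 q1 q0
    rho[(bits[0], bits[3])] += p       # bits[0] = swapper, bits[3] = class index
print(round(rho[("1", "0")], 3))       # rho_10 = 0.25
print(round(rho[("1", "1")], 3))       # rho_11 = 0.125 (lower, so the sample is classified as normal)
```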
As a prior, the theoretical values for the inner product are set by (8), yielding ρ10 = 1/4 and ρ11 = 1/8, showing that the test vector yields a higher inner product with class 1 and should be classified as such.

Table 1. The ratio ρ11/ρ10 for each example problem.

Problem               Theory   Simulation   Real
5-qubit Example 1     1/2      0.53         0.77
14-qubit Example 1    2        1.9          0.26
14-qubit Example 2    0        0.0          n/a

This problem is solved first on the IBM simulator using a version of the circuit not limited by architectural constraints, then on the simulator using a circuit appropriate for the IBMQX4 architecture, and finally using the same circuit on the IBMQX4 processor itself. The latter circuit, along with the histograms of measured probabilities for the simulation and the real execution, is shown in Fig. 3. The unconstrained simulation circuit is shown in Supplementary Figure 2. The real and simulated circuits differ only in the order of the qubits used to encode the data, and in the swap operations required to satisfy the constraints of the real processor. In Table 1 we compare probability values from the theoretical calculation, "measurements" from the architecturally constrained simulation and measurements from the real execution for the various (5-qubit and 14-qubit) circuit examples. The simulation in Fig. 3(b) yields ρ11/ρ10 = 12.78/24.16 = 0.53, while IBMQX4 in Fig. 3(c) gives ρ11/ρ10 = 19.14/24.7 = 0.77. As expected, the simulation probability is almost identical to the theoretical one for this example and classifies the test sample correctly as normal. The real measured probability, though 45% away from the expected value, classifies the sample correctly.

Figure 3. 5-qubit Example Problem 1 as drawn on IBM Composer. We show (a) the circuit, (b) measurement probabilities on the simulator and (c) measurement probabilities on IBMQX4. In (a), sequential circuit modules are boxed and labelled. The swap gate, which is a necessity for IBMQX4 CNOT-connectivity constraints, swaps the class index and swapper qubits during execution; the "IR" box contains gates that do not alter the measurement probabilities as they are placed after measurement, but are phase-altering components from the Fredkin gate retained for completeness; "CI" stands for "class index." In (b) and (c), the measured values of only the swapper and class index qubit states are shown for clarity, in that order.

Fourteen-qubit Generic Circuit

With the 14-qubit circuit, we can solve problems involving a simple genome containing up to 2^6 = 64 regional blocks (with some optimizations this can be extended if there are any CNV-absent regions; see Zero-coefficient Exclusion in Methods). Using the same notation scheme as for the 5-qubit circuit, 1 qubit was used to encode the class index, 6 qubits were used for the class vectors (labelled individually by "d1,...,d6" in decreasing order of bit place-value, or simply by "d" collectively), 6 qubits for the test vector ("t1,...,t6" individually or "t" collectively) and 1 for the swapper qubit ("s").

Example Problem 1: AIP on 64-Block Genome

For this problem a single CNV is present in each of the first 32 regions (1–32) only in the class vector for class 0, and in each of the last 32 regions (33–64) only in the class vector for class 1. The test vector represents a test sample that contains a CNV in each of regions 1–16 only (see Fig. 1(b)). As a practical matter, the class vectors of this (and the following) example would be composed of multiple sample vectors to have this particular form in our context.
The test sample here should be classified into class 0 as a disease sample. As previously, |0⟩_m is the first computational basis state representing class 0 and |1⟩_m the second, representing class 1. As a note, in the simulated version of the problem, the first qubit (q0) is used for the class index, q1–q6 for the class vectors, q7–q12 for the test vector (both in decreasing order of bit place-value), and q13 for the swapper qubit. This changes in the real-processor version, as shown in Supplementary Figure 1, for ease of the required swap operations. The class index qubit is first entangled with the 6 training qubits as follows (see also Fig. 4). A Hadamard gate is applied to m, followed by a CNOT gate with m as control and d1 as target. This puts the first 7 qubits in the following superposition, effectively entangling the class index qubit with the 6 training qubits:

$$|\psi\rangle_{md} = \frac{1}{\sqrt{2}}\,\big(|0\rangle_m|000000\rangle_d + |1\rangle_m|100000\rangle_d\big) \qquad (10)$$

To encode our data optimally, a Hadamard gate is applied to each of d2–d6 and not to d1. This puts |ψ⟩_md in the state:

$$|\psi\rangle_{md} = \left(\frac{1}{\sqrt{2}}\right)^{6}\Big(|0\rangle_m|0\rangle_{d1}(|0\rangle + |1\rangle)^{\otimes 5}_{d2\text{-}d6} + |1\rangle_m|1\rangle_{d1}(|0\rangle + |1\rangle)^{\otimes 5}_{d2\text{-}d6}\Big) \qquad (11)$$

where the subscripted tensor product (|0⟩ + |1⟩)^{⊗5}_{d2-d6} ≡ (|0⟩ + |1⟩)_{d2}(|0⟩ + |1⟩)_{d3}(|0⟩ + |1⟩)_{d4}(|0⟩ + |1⟩)_{d5}(|0⟩ + |1⟩)_{d6}, etc. This state exactly represents the CNV configuration in the problem definition per the AIP framework (see Fig. 1). At this point, the states of the other qubits are |t⟩ = |000000⟩_t and |s⟩ = |0⟩_s. Now, using the same technique as above to achieve the desired test vector, a Hadamard gate is applied to each of t3–t6, yielding the pre-swap-test state:

$$|\psi^c\rangle = \left(\frac{1}{\sqrt{2}}\right)^{10} |0\rangle_s\,|00\rangle_{t1\text{-}t2}(|0\rangle + |1\rangle)^{\otimes 4}_{t3\text{-}t6}\Big(|0\rangle_m|0\rangle_{d1}(|0\rangle + |1\rangle)^{\otimes 5}_{d2\text{-}d6} + |1\rangle_m|1\rangle_{d1}(|0\rangle + |1\rangle)^{\otimes 5}_{d2\text{-}d6}\Big) \qquad (12)$$

The swap test is executed as usual to evaluate the inner product via the measurement probabilities of the states of the "s" and "m" qubits in the computational basis. 8192 "shots" are taken, and ρ10^M and ρ11^M are measured and plotted on a histogram. In this case, the theoretical measurement probabilities from (8) are ρ10 = 1/8 and ρ11 = 1/4, showing that the test vector yields a higher inner product with class 0 and should be classified as a disease sample. As before, the problem is first straightforwardly solved on the simulator, then on the simulator using an architecturally constrained circuit, and finally on the IBMQX16 processor. The measured results, along with the circuit, are shown in Fig. 4. Again, in Table 1 the theoretical probabilities, the architecturally simulated probabilities and the real measured probabilities can be seen. Both simulated probabilities are very close to their respective theoretical values, as expected. However, due to current limitations in hardware, measurements from the real processor yielded a ratio of 0.26 that did not show correlation with the theoretical value of 2 and did not classify the sample correctly. As a last note, one can see in the circuit that the swap test was applied only to some of the test and class vector qubits and not to all. This is an optimization specific to this circuit, as mentioned in "Inner Product Circuits," employing the principle of summing coefficient products of only like-valued qubits to calculate the inner product (please refer to Optimization Techniques in Methods).

Figure 4. 14-qubit Example Problem 1 as drawn on IBM Composer. We show (a) the simulator circuit, (b) measurement probabilities on the simulator and (c) measurement probabilities on IBMQX16. The circuit adapted to satisfy IBMQX16 connectivity constraints is too cumbersome to show here and is shown in Supplementary Figure 1. The measurement probabilities shown pertain to the adapted version. "CV" refers to "class vector" and "TV" to "test vector." In the histograms, only the measured values of the swapper and class index qubit states are shown for clarity, in that order.
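A quick classical cross-check of the Example Problem 1 numbers (our own numpy sketch; the encoding and the probability formula (8) follow the text):

```python
import numpy as np

def rho_1k(test, class_vec, n_classes=2):
    """Ideal swap-test probability rho_1k = (1 - |<t|d_k>|^2) / (2M) for normalized real vectors."""
    t = test / np.linalg.norm(test)
    d = class_vec / np.linalg.norm(class_vec)
    return (1.0 - np.dot(t, d) ** 2) / (2 * n_classes)

regions = np.arange(1, 65)                        # 64 genomic regions
class0 = (regions <= 32).astype(float)            # CNVs in regions 1-32 (disease class)
class1 = (regions >= 33).astype(float)            # CNVs in regions 33-64 (normal class)
test   = (regions <= 16).astype(float)            # test sample: CNVs in regions 1-16

print(rho_1k(test, class0))   # 0.125 -> higher inner product with class 0
print(rho_1k(test, class1))   # 0.25
# The lower rho wins, so the test sample is classified as disease (class 0).
```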
Example Problem 2: SIP on 64-Block Genome

As shown in Fig. 1(c), for this problem a single CNV is present in each of the 64 regions in the class vector for class 0, and in each of the last 32 regions (33–64) only in the class vector for class 1. The test vector represents a test sample that contains a CNV in each of regions 1–32. Thus, the test sample should be classified into class 0. As mentioned in Methods, one limitation of SIP is that one has to know a priori whether more total matches or more total mismatches are expected between the test vector and class vector in general, as only the square of the SIP, i.e., the square of the difference between matches and mismatches, is actually measurable. This is no problem for the data we used to guide the development of our classification metrics [17] (pioneering single-cell whole-genome sequencing data for the context of our quantum classifier attempting to classify neuronal disease based on CNVs in samples, hereafter referred to as the "contextual data"), where the number of matches always exceeds the number of mismatches (see SIP in Methods). This example is fabricated differently from the actual data to highlight certain features of the circuit (the data motivated the development of the classifier, but example problems showing specific circuit functionality can certainly be set up with different-looking genomes), and here we will posit that the number of matches is not expected to exceed the number of mismatches, as shown in the genomic setup in Fig. 1(c). This means that the measured inner product between the test and class-1 vectors (stemming from an SIP of absolute value 64, due to 0 matches and 64 mismatches) is expected to be higher than the inner product between the test and class-0 vectors (stemming from an SIP of 0, due to 32 matches and 32 mismatches). In the same way as in the above example (see Fig. 5), we create the state:

$$|\psi\rangle_{md} = \frac{1}{\sqrt{2}}\,\big(|0\rangle_m|000000\rangle_d + |1\rangle_m|100000\rangle_d\big) \qquad (13)$$

and then, after Hadamard gates are applied to d2–d6 and t2–t6 as before, apply Ry(π/2) to d1 and Ry(−π/2) to t1 to arrive at the full state:

$$|\psi^c\rangle = \left(\frac{1}{\sqrt{2}}\right)^{13} |0\rangle_s\,(|0\rangle - |1\rangle)_{t1}(|0\rangle + |1\rangle)^{\otimes 5}_{t2\text{-}t6}\Big(|0\rangle_m(|0\rangle + |1\rangle)^{\otimes 6}_{d1\text{-}d6} + |1\rangle_m(|1\rangle - |0\rangle)_{d1}(|0\rangle + |1\rangle)^{\otimes 5}_{d2\text{-}d6}\Big) \qquad (14, 15)$$

The swap test was applied and the simulated probabilities measured as usual. We compare them with their theoretical counterparts in Table 1 and see that they are extremely close, as expected. The test sample was classified into class 0, which yielded the lower inner product, due to the precondition indicating fewer mismatches with class 0. This example was not run on the real processor simply because, given current hardware limitations, it would add no value to our results. We have already demonstrated how a very similar design can be mounted on IBMQX16.

Figure 5. 14-qubit Example Problem 2 as drawn on IBM Composer. We show (a) the circuit and (b) measurement probabilities on the simulator. "CV" refers to "class vector" and "TV" to "test vector." In the histograms, only the measured values of the swapper and class index qubit states are shown for clarity, in that order.

Methods

Hamming Distance and Inner Product Equivalence

All individual sample vectors (training or test) have binary-valued components (indicating, in our context, the presence or absence of a CNV in the regional dimensions of feature space). For what follows, we define the "bit-string equivalent" (BSE) of a binary-valued sample vector such that the vector's ith component is simply treated as the value of the ith bit of its BSE. Then, the Hamming distance between two binary-valued sample vectors in feature space is naturally defined to be the Hamming distance between their respective BSEs.
We will work out below the equivalence between the Hamming distance and the SIP/AIP of two sample vectors, in the sense of their being equivalent classification measures. We will also show how SIP and AIP are linear in the sample vectors while the Hamming distance is not.

SIP

We will first prove the linear behavior of SIP in terms of sample vectors and show how it is implemented as a veritable classifier on a quantum machine. Then, the proof of its equivalence with the Hamming distance will follow. An n-dimensional class vector in feature space takes the general form C = (1/N_C) Σ_{f=1}^{F} c_f f̂, where f̂ is the indexed standard basis vector in F-dimensional feature space, the c_f are un-normalized vector components or regional prefactors of C, and N_C is an overall normalization constant. Suppose C is composed of an arbitrary number of training vectors, which in analogous notation take the form D = (1/N_D) Σ_{f=1}^{F} a_f^D f̂, where we note that D serves as a training index and a_f^D is a binary regional prefactor in feature space for the Dth training vector. Suppose also that we have a test sample given by the vector B = (1/N_B) Σ_{f=1}^{F} b_f f̂, with notation exactly analogous to D. The Hamming distance between D and B is now, by definition:

$$H(B, D) = \sum_{f=1}^{F} \big(1 - \delta_{a_f^D b_f}\big) \qquad (16)$$

where δ_{a_f^D b_f} is the standard Kronecker delta function applied to a_f^D and b_f. We are interested in minimizing the Hamming distance between sample vectors for optimal classification, which is equivalent to maximizing the quantity −H. For clarity, we will focus on one training class only for this proof and generalize at the end. Let us first define S11(B, D) as the total number of times corresponding vector components/prefactors of D and the test vector B are both 1, i.e.

$$S_{11}(B, D) = \sum_{f=1}^{F} \delta_{a_f^D 1}\,\delta_{b_f 1} \qquad (17)$$

Similarly, we define S00(B, D) as the total number of times corresponding components of D and B are both 0, i.e. S00(B, D) = Σ_{f=1}^{F} δ_{a_f^D 0} δ_{b_f 0}. We also define the "1−0 mismatches" in the self-evident way following the above: S01(B, D) = Σ_{f=1}^{F} δ_{a_f^D 0} δ_{b_f 1} and S10(B, D) = Σ_{f=1}^{F} δ_{a_f^D 1} δ_{b_f 0}. Let us now define the quantity S(B, D) as the total number of corresponding prefactor matches minus the total number of prefactor mismatches between the Dth training vector and the test vector B, i.e. S(B, D) = S11(B, D) + S00(B, D) − S10(B, D) − S01(B, D). We note that SIP was defined as the natural extension of S, as the total number of prefactor matches in corresponding regions between the test vector and all the training vectors in the class minus the total number of mismatches. Define σ11(B, C) = Σ_{D=1}^{W} S11(B, D), etc. Thus, the SIP between test B and class C is written as:

$$\sigma(B, C) = \sigma_{11}(B, C) + \sigma_{00}(B, C) - \sigma_{10}(B, C) - \sigma_{01}(B, C) \qquad (18)$$
$$= \sum_{D=1}^{W} S(B, D) \qquad (19)$$

where W is the total number of training vectors in the class. In the computational basis, σ is a natural linear extension of S (i.e., the sum of the SIPs of multiple sample vectors is equivalent to the SIP of the sum of the vectors, as far as successful classification is concerned).
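These counts are easy to state in code; the sketch below (our own helper names, toy data) computes S11, S00, S01 and S10 for binary vectors and checks that the classwise SIP of equation (19) equals the sum over training vectors, as well as the relation H(B, D) = S01 + S10:

```python
import numpy as np

def match_counts(b, d):
    """Return (S11, S00, S01, S10) for a binary test vector b and training vector d."""
    b, d = np.asarray(b), np.asarray(d)
    s11 = int(np.sum((d == 1) & (b == 1)))
    s00 = int(np.sum((d == 0) & (b == 0)))
    s01 = int(np.sum((d == 0) & (b == 1)))
    s10 = int(np.sum((d == 1) & (b == 0)))
    return s11, s00, s01, s10

def sip(b, d):
    """S(B, D) = matches minus mismatches for a single training vector."""
    s11, s00, s01, s10 = match_counts(b, d)
    return s11 + s00 - s01 - s10

rng = np.random.default_rng(1)
training = rng.integers(0, 2, size=(5, 12))   # W = 5 training vectors, F = 12 regions (toy)
test = rng.integers(0, 2, size=12)

sigma = sum(sip(test, d) for d in training)   # classwise SIP, eq. (19)
print(sigma)

# Hamming distance relation: H(B, D) = S01 + S10 for every training vector.
print(all(sum(x != y for x, y in zip(test, d)) == sum(match_counts(test, d)[2:])
          for d in training))                 # True
```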
Let us see how this condition of SIP linearity is achieved and utilized in the computational basis, with state normalization, for the same sample vectors given above. We will use different notation for some of the state vectors in this section compared to previous sections, for ease of proof. In moving from the feature basis to the computational basis, we will use corresponding Greek notation for the state vectors where necessary. We recall that in the SIP framework, for the state vector of any training or test sample, the coefficient of the fth computational-basis vector can only be 1 or −1 (pertaining to the case where there is a CNV present in the region and the case where there is not, respectively). So we start by defining the training vector (we will use |d⟩ to denote the training vector and |w⟩ to denote the class vector in this section to make the distinction):

$$|d\rangle = \frac{1}{\eta_d}\left(\sum_{f=1\,|\,\alpha_f^d=1}^{F} \alpha_f^d\,|f\rangle + \sum_{f=1\,|\,\alpha_f^d=-1}^{F} \alpha_f^d\,|f\rangle\right) \qquad (20)$$

where the |f⟩ are computational-basis vectors as before, α_f^d is the unnormalized fth component of |d⟩ in the computational basis, η_d is the overall normalization constant and F again denotes the total number of feature/computational dimensions. The two summations in (20) serve to pick out only the terms where α_f^d = 1 and α_f^d = −1, respectively. Now the class vector is |w⟩ = (1/η_w) Σ_{f=1}^{F} (Σ_{d=1}^{W} α_f^d) |f⟩, where η_w is the overall normalization constant and W is again the total number of training vectors in the class. And finally, the test vector can be written as |t⟩ = (1/η_t)(Σ_{f=1|β_f=1}^{F} β_f |f⟩ + Σ_{f=1|β_f=−1}^{F} β_f |f⟩), in exact functional and notational analogy with |d⟩ in (20). Given that the class vector is just a normalized sum over its training vectors, and that in any SIP in the computational basis each matching computational vector component adds +1 to the result and each mismatching component adds −1, ⟨t|d⟩ is equal to the total number of regional CNV matches minus the total number of mismatches between the test vector |t⟩ and training vector |d⟩, divided by the product of their respective normalization constants:

$$\langle t|d\rangle = \frac{1}{\eta_d \eta_t}\big(S_{11}(B, D) + S_{00}(B, D) - S_{01}(B, D) - S_{10}(B, D)\big) = \frac{1}{\eta_d \eta_t}\,S(B, D) \qquad (21, 22)$$

Now, since the class vector |w⟩ is simply a linear sum of all its training vectors and the dot product (state overlap) is a linear operation:

$$\langle t|w\rangle = \frac{1}{\eta_w \eta_t}\sum_{D=1}^{W} S(B, D) \qquad (23)$$
$$= \frac{1}{\eta_w \eta_t}\,\sigma(B, C) \qquad (24)$$

Note that the difference in normalization constant between the single training vector and the class vector is retained. As W can be arbitrary, (23) proves the implementation of SIP linearity in the computational basis (the attentive reader will protest here, as η_w will change with the addition of training vectors; this will not matter for classification if we keep the number of training vectors similar for both classes, as seen next). We see that ⟨t|w⟩ is of the form κσ(B, C), for some positive constant κ = 1/(η_w η_t), and hence is a monotonic function of σ(B, C). The fact that κ is a constant is seen trivially for the case where there is only one training vector in each class, as η_w = √n for both classes in the SIP framework. For the case of multiple training vectors per class, the form of the data determines the constancy of κ. The statistical nature of our neuronal AD/neurotypical contextual data is such that the total number of CNVs in each sample is similar across both the disease and normal classes. In fact, the data reveal that the total number of CNVs per genomic region has the same overall statistical distribution in each class [17]. This gives rise to the normalization constant (which is a function of these regional coefficients) being the same for both classes in the computational basis.
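Equations (21)–(22) can be checked numerically with the SIP encoding introduced earlier (a small self-contained sketch with illustrative vectors):

```python
import numpy as np

def sip_state(bits):
    """SIP encoding: +1 for CNV regions, -1 otherwise, normalized (eta = sqrt(n))."""
    amps = np.where(np.asarray(bits) == 1, 1.0, -1.0)
    return amps / np.linalg.norm(amps)

b = [1, 0, 1, 1, 0, 0, 1, 0]          # test sample (toy)
d = [1, 1, 0, 1, 0, 0, 0, 0]          # training sample (toy)

matches    = sum(x == y for x, y in zip(b, d))
mismatches = len(b) - matches
S = matches - mismatches              # S(B, D)

lhs = np.dot(sip_state(b), sip_state(d))       # <t|d> in the computational basis
rhs = S / (np.sqrt(len(b)) * np.sqrt(len(d)))  # S(B, D) / (eta_d * eta_t)
print(lhs, rhs, np.isclose(lhs, rhs))          # equal, per eqs. (21)-(22)
```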
Classification of separable data is still possible, due either to a salient region where all training samples of a class show atypical activity or to different regional arrangements giving rise to the same overall histogram of CNVs per region. As one general example, different floors of large multistory buildings may have different occupancies for different buildings, but the overall distribution of floor occupancy could be very similar. At any rate, only the normalization constant needs to be similar for both classes. Any data satisfying this constraint are amenable to SIP classification. This implies that η_w, and thus κ, are very similar for both classes if the same (or a similar) statistically significant number of training vectors is chosen for both, which is natural in our context. When ⟨t|w⟩ is used as a classification metric, the higher of two SIPs will remain the higher one irrespective of the value of κ. This means that our classifier is valid independent of the value of the normalization constants. Thus, we have shown that our quantum machine successfully executes SIP-based classification and preserves linearity. There are ways to address the normalization constant issue, which we expect to present in future work. As an example, one can encode a simplistic "scaling factor" (the ratio of the raw normalization constant of one class vector to the other's) using the ratio of the coefficients of the ground and excited states of a qubit, which would be multiplied and encoded into the inner product at run time. Of course, one can multiply this factor classically after the inner product computation, but that would not be a satisfying solution. Without this, if the normalization constants for the classes are not similar, correct classification is still possible in a weaker sense. The normalization constant, being the square root of the sum of squares of the basis-vector coefficients, is a slowly increasing function of any one coefficient. From the form of the inner product, one can see that if one genomic region in one training class had a particular prevalence of CNVs, classification of a similar test sample into that class would not suffer. In the contextual data, for example, a test sample typically has only 1 CNV in the genome and is highly inclined towards being classified into the class displaying particularly more CNVs in the very same region where that CNV lies. Finally, we will now prove the equivalence of SIP to the Hamming distance (actually its negative, as mentioned) as a classification metric. In the language of SIP, (16) can be rewritten by definition as H(B, D) = S01(B, D) + S10(B, D). We will simply define the Hamming distance χ between a test vector and a class vector to be the sum of the Hamming distances between the test vector and each training vector in the class, i.e. χ(B, C) = Σ_{D=1}^{W} H(B, D). As an aside here, we can immediately see how the Hamming distance is not a linear function of the sample vectors, as χ is not a linear extension of H. As a trivial example, the two Hamming distances between the 1-dimensional training vectors (1), (0) and the test vector (1) add to 1, whereas the Hamming distance between the test vector and the vector sum of the training vectors is 0.
Coming back to the SIP–Hamming distance equivalence, −χ(B, C) can be rewritten as:

$$-\chi(B, C) = -\sum_{D=1}^{W}\big(S_{01}(B, D) + S_{10}(B, D)\big) \qquad (25)$$
$$= -\sum_{D=1}^{W}\big(n - S_{11}(B, D) - S_{00}(B, D)\big) \qquad (26)$$
$$= \left(\sum_{D=1}^{W} S_{11}(B, D) + S_{00}(B, D)\right) - Wn \qquad (27)$$

Now, (19) can be rewritten as:

$$\sigma(B, C) = \sum_{D=1}^{W}\Big(S_{11}(B, D) + S_{00}(B, D) - \big(n - S_{00}(B, D) - S_{11}(B, D)\big)\Big) \qquad (28)$$
$$= 2\left(\sum_{D=1}^{W} S_{11}(B, D) + S_{00}(B, D)\right) - Wn \qquad (29)$$

We see that (27) and (29) are very similar. In each equation, if we fix the number of training vectors per class, the term Wn can be removed. Now, we see that Σ_{D=1}^{W} [S11(B, D) + S00(B, D)] and 2 Σ_{D=1}^{W} [S11(B, D) + S00(B, D)] only differ by a factor of 2 and hence are classificationally equivalent. Thus we have proven that SIP and the Hamming distance are classificationally equivalent measures. There is one main limitation of SIP to mention here. As measurement statistics are always represented by the square of the coefficients/probability amplitudes, ⟨t|w⟩ in (24) will be measured indirectly via |⟨t|w⟩|². Thus, the missing information will be whether σ(B, C) in (18) is positive or negative, i.e., whether the total number of prefactor matches is higher or lower than the number of mismatches. Again, this is no issue for the contextual data, since each sample has fewer than 2 CNVs across all genomic regions [17], ensuring that any test sample will have many more matches (mostly "0−0" matches) than mismatches with each training class. For data where individual samples have decidedly fewer 1s than 0s, or vice versa, the sign of σ will be known a priori and this ambiguity will not be an issue. For clarity, we will summarize the cases where SIP-based classification in our circuits will not succeed:

1. If the raw normalization constants of the two class vectors in feature space are not similar
2. If the data differ by a significant overall scaling factor (see above or see Efficient Data-encoding Techniques)
3. If it is not a priori known whether the number of prefactor matches will be greater or less than the number of prefactor mismatches

AIP

AIP was defined as the total number of "1−1" matches between the test vector and all training vectors composing the class vector for a given class. Note that σ11(B, C) in (18) is exactly the AIP, A(B, C), for test B and class C. As for SIP, we will again begin by showing how AIP is a bona fide classification metric and prove its linearity. In the AIP framework, we recall that in the state vector of any training or test sample, the coefficient of the fth computational-basis vector can only be 1 or 0 (pertaining to the case where there is a CNV present in the region and the case where there is not, respectively). Following the same line of proof as for SIP and retaining the above feature-basis definitions (and overall notation), we will now redefine the state vectors in the computational basis. The training vector in the AIP framework reads:

$$|d\rangle = \frac{1}{\eta_d}\left(\sum_{f=1\,|\,\alpha_f^d=1}^{F} \alpha_f^d\,|f\rangle + \sum_{f=1\,|\,\alpha_f^d=0}^{F} \alpha_f^d\,|f\rangle\right) = \frac{1}{\eta_d}\sum_{f=1\,|\,\alpha_f^d=1}^{F} \alpha_f^d\,|f\rangle \qquad (30, 31)$$

The test vector can similarly be written as |t⟩ = (1/η_t) Σ_{f=1|β_f=1}^{F} β_f |f⟩. Now, in analogy with SIP,

$$\langle t|d\rangle = \frac{1}{\eta_d \eta_t}\,S_{11}(B, D) \qquad (32)$$

and

$$\langle t|w\rangle = \frac{1}{\eta_w \eta_t}\sum_{D=1}^{W} S_{11}(B, D) \qquad (33)$$
$$= \frac{1}{\eta_w \eta_t}\,\sigma_{11}(B, C) \qquad (34)$$

As W can be arbitrary, (33) proves the implementation of AIP linearity in the computational basis. Following exactly the same argument as for SIP at this point, for the same kind of data, we conclude that the above inner product leads to successful classification.
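The relations (27) and (29) can be confirmed with a few lines of Python (random toy data; W and n as in the text):

```python
import numpy as np

rng = np.random.default_rng(2)
W, n = 6, 20                                   # training vectors per class, feature dimensions
training = rng.integers(0, 2, size=(W, n))
test = rng.integers(0, 2, size=n)

matches = sum(int(np.sum(d == test)) for d in training)    # sum_D [S11 + S00]
chi     = sum(int(np.sum(d != test)) for d in training)    # sum_D [S01 + S10], classwise Hamming distance
sigma   = matches - chi                                     # classwise SIP, eq. (19)

print(-chi  == matches - W * n)        # eq. (27): True
print(sigma == 2 * matches - W * n)    # eq. (29): True
```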
Thus, we have shown that our quantum AIP machine classifies successfully and preserves linearity. We will now show the classificational equivalence of AIP to the negative Hamming distance, with the caveat that this equivalence is not crucial, and is even undesirable, for classification problems where "1−1" matches are more significant than other kinds of prefactor matches. In such cases, like the schedule-matching problem shown in the previous section, AIP is the natural measure of choice. Proceeding with the demonstration of equivalence, we know that the AIP between test sample B and the class vector of C is σ11(B, C) = ∑_{D=1}^{W} S11(B, D). We would like to show, as above, that this is classificationally equivalent to ∑_{D=1}^{W} [ S11(B, D) + S00(B, D) ] = σ11(B, C) + σ00(B, C). Clearly, this equivalence is valid when σ00, the total number of "0−0" matches between the test and training vectors of the class in question, is either fixed for both classes or is a monotonically increasing function of σ11 in general. How these two conditions hold true will vary from case to case, so we will simply sketch a few plausible scenarios here. The latter condition is true, for example, when the difference between the number of CNV and non-CNV regions in any given sample is similar to that in other samples (assuming again that the same number of training vectors is chosen for each class). It is also true in the case where CNVs are sparsely distributed for all samples. This is certainly the situation for the contextual data, where any sample statistically has only a single CNV across the whole genome. For test and training samples with sparsely distributed CNVs, an increase in σ11 is accompanied by an equal increase in σ00 with high likelihood. To see why, one can visualize a test sample vector with a CNV not "in line" (see Fig. 1) with the regionally coinciding CNVs of a class's training vectors; as soon as it is placed "in line" by swapping it with a non-CNV in another region of the sample, both the "1−1" and "0−0" matches will very likely increase simultaneously, as long as the likelihood of finding a CNV in the samples is low. Thus, we have shown that our AIP quantum machine is a valid classifier, that it is equivalent to the Hamming distance as a measure under certain conditions, and we have summarized specific cases satisfying those conditions. We again enumerate the cases where AIP classification in our circuits will not succeed:
1. If the raw normalization constants of the two class vectors in feature space are not similar
2. If the data differs by a significant overall scaling factor (see Efficient Data-encoding Techniques)
3. If the total number of "0−0" prefactor matches is neither fixed for the different classes nor a monotonically increasing function of the "1−1" matches
Efficient Data-encoding Techniques
One universal challenge for encoding feature-space data using the computational basis is that one can fix only the relative values of the vector components, as the state vector must be normalized in the computational basis. For example, the unnormalized vectors A = (1, 1), B = (2, 2), C = (3, 3) in two-dimensional feature space will all be encoded as S = (1/√2)(1, 1) in the computational basis and will not be distinguishable. As mentioned above, there are ways, such as encoding a scaling factor, that future work will address.
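The indistinguishability of the overall scale is easy to see numerically; the short sketch below (illustrative only, not part of the paper's encoding procedure) normalizes the three example vectors and shows that they collapse onto the same encoded state:

```python
import numpy as np

vectors = {"A": np.array([1.0, 1.0]),
           "B": np.array([2.0, 2.0]),
           "C": np.array([3.0, 3.0])}

# After normalization all three become (1/sqrt(2), 1/sqrt(2)) and are therefore
# indistinguishable once encoded as amplitudes in the computational basis.
for name, v in vectors.items():
    print(name, v / np.linalg.norm(v))
```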
Fortunately, the kind of data we are dealing with does not present this issue, again because the raw normalization constants for the two classes are very similar if a similar number of training vectors is used. This normalization ambiguity is currently a major limitation of our metrics when used for general classification problems without a scaling factor in place. Thus, the metrics are most applicable to data sets whose overall scaling across the feature dimensions is similar enough that the inner-product values for the different classes are not relatively affected.
Brute Force Approach
It is not in general trivial to encode arbitrary data values from feature space into n-qubit systems, starting from the universal ground state, via a series of available unitary gate operations. A non-trivial combination of ancilla-like entangling qubits and controlled rotations is typically required even for the case of two qubits (see "Initial State Preparation" in Fig. 3). Here, we present a recipe for encoding arbitrary forms of data using unentangled qubit states, up to a possible quantifiable error, given the normalization constraints. Typically, in our context, an n-qubit sample vector is encoded by the (binomial-series-like) state:

|ψ⟩ = (an|0⟩ + bn|1⟩) ⋯ (a1|0⟩ + b1|1⟩)    (35)

where the ai's and bi's (i = 1, 2, ..., n) are data-dependent coefficients with subscripts in decreasing order of place value. Even though this representation contains no qubit entanglement, it has the potential to encode sample vectors otherwise represented by entangled states in the computational basis, up to some known error. As a reminder, the combined coefficient of the term |0⟩⊗n represents the prefactor (CNV-value) of the first physical (genomic) region, that of the term |0⟩⊗(n−1)|1⟩ represents the CNV-value of the second region, and so on. The recipe is as follows:
• If the sample vector can be represented by unentangled qubits in the computational basis, it is simply written in the form of (35) and all the coefficients ai, bi in the solution |ψ⟩ are readily identified.
• If not, the sample vector is first normalized in the feature basis using its overall normalization/scaling factor N. Expanding the terms in (35) then leads to 2^n non-linear equations in the coefficients ai and bi, of the form cn cn−1 ⋯ c1 = A/N, etc., where the ci's (i = 1, 2, ..., n) are placeholders for either ai or bi, and A refers to the relevant regional prefactor/CNV-value.
• Next, the non-linear equations are numerically solved for the ai and bi in the closest possible form satisfying the constraint that each i-th qubit state-term in |ψ⟩ in (35) is individually normalized (made clear below), which automatically leads to the solution |ψ⟩ being normalized overall. Error-minimization techniques can be used to solve these equations for the ai's and bi's under the state-normalization constraint, possibly with some error relative to the actual data.
• Finally, each i-th qubit state in (35) is rotated by Ry(θ) to achieve these minimum-error values of ai and bi.
The last bullet reveals why the individual qubit states need to be normalized: so that they are achievable by a unitary gate operation on these qubits. Thus, modulo the scaling factor, this recipe can be used to encode a relatively large feature-data space up to some known possible error.
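A rough numerical sketch of this recipe is given below (illustrative only; the optimizer choice, function names and the 2-qubit example data are assumptions, not the authors' implementation). It searches for per-qubit angles whose unentangled product state best approximates a normalized feature vector; the corresponding Ry rotation angle is twice each fitted angle in the usual convention.

```python
import numpy as np
from scipy.optimize import minimize

def product_state(thetas):
    """Tensor product of single-qubit states (cos t)|0> + (sin t)|1>, with the
    first angle playing the role of the most significant qubit, as in Eq. (35)."""
    state = np.array([1.0])
    for t in thetas:
        state = np.kron(state, np.array([np.cos(t), np.sin(t)]))
    return state

def encode(feature_vector, n_qubits):
    """Fit per-qubit angles so the unentangled product state approximates the
    normalized feature vector; returns the angles and the residual error."""
    target = np.asarray(feature_vector, float)
    target = target / np.linalg.norm(target)        # remove the overall scaling factor
    cost = lambda th: np.sum((product_state(th) - target) ** 2)
    res = minimize(cost, x0=np.full(n_qubits, np.pi / 4), method="Nelder-Mead")
    return res.x, res.fun

# A 2-qubit (4-region) example with per-region CNV prefactors (2, 1, 2, 1),
# which happens to be exactly representable by a product state.
angles, err = encode([2, 1, 2, 1], n_qubits=2)
print(np.round(product_state(angles), 3), "residual:", round(err, 6))
```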
There will necessarily be cases where a small error will not be achievable (e.g. cases where the natural representation of the sample vector in the computational basis is entangled and where our recipe leads to state coefficients that cannot be normalized without introducing a large error in the solution), rendering this recipe ineffective for such scenarios.
"Binomial Series" Approach
This is an optimization that can be utilized in both the AIP and SIP frameworks if the data has certain simplifying CNV patterns amenable to this approach. In fact, it was employed in both Example Problems 1 and 2 of the 14-qubit circuit. We first note that, in the AIP framework, the non-zero coefficients in (35) decide which regional blocks are encoded as having CNVs and which as not. Therefore, the values of these coefficients can be chosen to encode specific themes or patterns in the data. We will first motivate this in the context of AIP. Suppose, for example, that the data has all zero CNV entries in the first half (block) of all genomic regions (i.e. in 32 of 64 regions, etc.). We immediately see that setting an = 0 achieves this condition. If the data has all zero entries in the second half of the first- and second-half blocks of genomic regions, setting bn−1 = 0 achieves this. Setting bn = bn−1 = 0 allows only states beginning with the term |00...⟩ to be present, nullifying all CNV-values except those of the first half of the first half of all genomic regions, and so on. Similarly, if it is desired that the CNV-values of the first half of all regions be exactly twice those of the second half, one would set an = 2 and all the other coefficients to 1 (prior to normalization), and so forth. Many such desired patterns can be creatively effected in data encoding from this general framework. For example, this approach was extended to the SIP framework of Example Problem 2 of the 14-qubit circuit by finding simple rotations to encode the zero coefficients as "−1." This "binomial series" approach combines the idea of using ordered bit strings/block states to enumerate physical regions with the use of juxtaposed, unentangled 1-qubit states, yielding a series representation for all unentangled n-qubit states in the computational basis. This series representation is then minimalistically utilized for optimal encoding.
Optimization Techniques
Swap Unlike-valued Bits Only in Inner Product Evaluation
When calculating the inner product of two state vectors, each composed of multiple qubits, one need not assess the contribution of corresponding qubits when their contribution is unity. This typically happens when corresponding qubits are like-valued. For example, if |A⟩ = (a1|0⟩ + a2|1⟩)|0⟩|1⟩ and |B⟩ = (b1|0⟩ + b2|1⟩)|0⟩|1⟩, then

⟨A|B⟩ = (a1 b1 ⟨0|0⟩ + a1 b2 ⟨0|1⟩ + a2 b1 ⟨1|0⟩ + a2 b2 ⟨1|1⟩) ⟨0|0⟩ ⟨1|1⟩ = a1 b1 + a2 b2

Clearly, the second and third qubit values of A and B do not enter the actual evaluation of the inner product, as they are correspondingly identical and contribute a factor of 1 to the result. Of course, if a1 = b1 and a2 = b2, then the inner product is simply unity (for normalized states). For the kind of cases presented in Results, an efficient way to assess qubit equality is to apply a CNOT gate between two corresponding qubits, followed by a NOT gate on the target (second) qubit, whose final value serves as a control for the swap-test operation. A prior copy of the second qubit is made via a CNOT gate having this qubit as control and a qubit prepared in |0⟩ as the target, which serves as the copy destination. In other words, the overlap between corresponding qubits is calculated only when they differ: after the CNOT, the target qubit is 1 exactly when the two qubits differed to begin with, and its final (inverted) value conditions the swap-test operation. The value of the second qubit is copied before the operation in order to preserve it. This is in line with the future vision for quantum computing mentioned in the previous section. There will no doubt be better and more efficient ways to determine qubit-state equality and to embed these optimizations within the quantum circuitry in the long term; we have provided a sketch here. We are currently unable to implement sophisticated circuits anyway, owing to architectural and other constraints, and in this work have given a proof of concept for future implementations of the inner-product classifier that will employ much more developed quantum hardware and circuitry. One could just as well insert swap gates for like-valued qubits (computing their trivial overlaps) in addition to the unlike-valued case, in order to present "complete" circuits, but given the current state of the field we did not think it vital to the theme of this work.
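A minimal Qiskit-style sketch of this equality/difference flag is shown below (illustrative only, assuming the standard QuantumCircuit interface; qubit indices are arbitrary and the swap-test ancilla and measurement are omitted):

```python
from qiskit import QuantumCircuit

# q0, q1: a pair of corresponding qubits from the two state vectors being compared;
# q2: ancilla initialized to |0>, used as the copy destination for q1.
qc = QuantumCircuit(3)

qc.cx(1, 2)   # preserve q1 by copying its (computational-basis) value onto q2
qc.cx(0, 1)   # q1 now holds q0 XOR q1: it is 1 exactly when the pair differs
qc.x(1)       # invert the flag, as in the description above

# In the authors' scheme, this flag on q1 then conditions the swap-test step for
# this qubit pair; the rest of the swap test is omitted in this sketch.
print(qc.draw())
```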
Zero-coefficient Exclusion
This is a data-reformulation technique for AIP that saves dimensions in feature space and hence qubits in computational space. Suppose that some of the physical regions in the test vector have a prefactor of 0, thus ensuring that these dimensions have no contribution to the inner product. The feature vectors can now be mapped onto a new basis so as to eliminate the non-contributing dimensions from the computational framework altogether. For example, say the original test vector resides in 4-dimensional feature space as x = (1, 0, 1, 0), requiring 2 qubits to encode. Clearly, we only need to use dimensions 1 and 3 to calculate the AIP. Thus, we eliminate dimensions 2 and 4 from consideration and reformulate our basis such that the dimensional labels 1, 3 → 1′, 2′, where the primed dimensions refer to the new basis. The test vector in the new basis now reads x′ = (1, 1), and is encoded in the usual way as (1/√2)(1, 1), requiring only 1 qubit. The class vectors are also mapped to the new basis and the AIP is calculated in the usual manner.
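A compact classical illustration of this remapping (illustrative only; the class-vector counts are made up) reproduces the worked example:

```python
import numpy as np

test = np.array([1, 0, 1, 0])              # original 4-dimensional test vector
keep = np.flatnonzero(test)                # contributing dimensions 1 and 3 (0-indexed: 0, 2)

# class_vec holds, per region, the number of training vectors of the class with a
# CNV there; its dot product with the binary test vector is the (unnormalized) AIP.
class_vec = np.array([3, 1, 2, 0])

# Reformulated basis: drop the non-contributing dimensions from both vectors.
test_new = test[keep]                      # -> [1, 1], needs only 1 qubit
class_new = class_vec[keep]

print(int(test @ class_vec), int(test_new @ class_new))   # both give the same AIP
```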
Data availability
The data that support the findings of this study are available from the corresponding author on reasonable request.
References
1. Wiebe, N., Braun, D. & Lloyd, S. Quantum algorithm for data fitting. Phys. Rev. Lett. 109, 050505, DOI: 10.1103/PhysRevLett.109.050505 (2012).
2. Amin, M. H., Andriyash, E., Rolfe, J., Kulchytskyy, B. & Melko, R. Quantum Boltzmann machine. Phys. Rev. X 8, 021050, DOI: 10.1103/PhysRevX.8.021050 (2018).
3. Kieferová, M. & Wiebe, N. Tomography and generative training with quantum Boltzmann machines. Phys. Rev. A 96, 062327, DOI: 10.1103/PhysRevA.96.062327 (2017).
4. Lloyd, S., Mohseni, M. & Rebentrost, P. Quantum principal component analysis. Nat. Phys. 10, 631 (2014).
5. Spagnolo, N. et al. General rules for bosonic bunching in multimode interferometers. Phys. Rev. Lett. 111, 130503, DOI: 10.1103/PhysRevLett.111.130503 (2013).
6. Biamonte, J. et al. Quantum machine learning. Nature 549, 195 (2017).
7. Low, G. H., Yoder, T. J. & Chuang, I. L. Quantum inference on Bayesian networks. Phys. Rev. A 89, 062315, DOI: 10.1103/PhysRevA.89.062315 (2014).
8. Wiebe, N. & Granade, C. Can small quantum systems learn? (2015). arXiv:1512.03145.
9. Kapoor, A., Wiebe, N. & Svore, K. Quantum perceptron models. In Advances in Neural Information Processing Systems, 3999–4007 (2016).
10. Wiebe, N., Kapoor, A. & Svore, K. M. Quantum deep learning (2014). arXiv:1412.3489.
11. Dunjko, V., Taylor, J. M. & Briegel, H. J. Quantum-enhanced machine learning. Phys. Rev. Lett. 117, 130501, DOI: 10.1103/PhysRevLett.117.130501 (2016).
12. Rebentrost, P., Mohseni, M. & Lloyd, S. Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 130503, DOI: 10.1103/PhysRevLett.113.130503 (2014).
13. Schuld, M., Fingerhuth, M. & Petruccione, F. Implementing a distance-based classifier with a quantum interference circuit. EPL (Europhysics Letters) 119, 60002, DOI: 10.1209/0295-5075/119/60002 (2017).
14. Havlíček, V. et al. Supervised learning with quantum-enhanced feature spaces. Nature 567, 209–212, DOI: 10.1038/s41586-019-0980-2 (2019).
15. Schuld, M. & Killoran, N. Quantum machine learning in feature Hilbert spaces. Phys. Rev. Lett. 122, 040504, DOI: 10.1103/PhysRevLett.122.040504 (2019).
16. Schuld, M., Sinayskiy, I. & Petruccione, F. Quantum computing for pattern classification. In Pham, D.-N. & Park, S.-B. (eds.) PRICAI 2014: Trends in Artificial Intelligence, 208–220 (Springer International Publishing, Cham, 2014).
17. van den Bos, H. et al. Single-cell whole genome sequencing reveals no evidence for common aneuploidy in normal and Alzheimer's disease neurons. Genome Biol. 17, 116, DOI: 10.1186/s13059-016-0976-2 (2016).
18. McConnell, M. J. et al. Mosaic copy number variation in human neurons. Science 342, 632–637, DOI: 10.1126/science.1243472 (2013).
19. McConnell, M. J. et al. Intersection of diverse neuronal genomes and neuropsychiatric disease: The brain somatic mosaicism network. Science 356, DOI: 10.1126/science.aal1641 (2017).
20. Chronister, W. D. et al. Neurons with complex karyotypes are rare in aged human neocortex. Cell Rep. 26, 825–835.e7, DOI: 10.1016/j.celrep.2018.12.107 (2019).
21. Trapnell, C. & Salzberg, S. L. How to map billions of short reads onto genomes. Nat. Biotechnol. 27, 455–457, DOI: 10.1038/nbt0509-455 (2009).
22. Iafrate, A. J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36, 949–951, DOI: 10.1038/ng1416 (2004).
23. Buhrman, H., Cleve, R., Watrous, J. & de Wolf, R. Quantum fingerprinting. Phys. Rev. Lett. 87, 167902, DOI: 10.1103/PhysRevLett.87.167902 (2001).
24. Smolin, J. A. & DiVincenzo, D. P. Five two-bit quantum gates are sufficient to implement the quantum Fredkin gate. Phys. Rev. A 53, 2855–2856, DOI: 10.1103/PhysRevA.53.2855 (1996).
25. Sisodia, M., Shukla, A. & Pathak, A. Experimental realization of nondestructive discrimination of Bell states using a five-qubit quantum computer. Phys. Lett. A 381, 3860–3874, DOI: 10.1016/j.physleta.2017.09.050 (2017).
Acknowledgements
We would like to thank Dr. Thomas Lehner and Dr. Geetha Senthil at the National Institute of Mental Health for stimulating our interest in this study, for organizing a series of discussions that brought quantum computing, neuroscience, genomics and computational biology experts together to discuss the potential of quantum computing in solving computational challenges in neuroscience, and for supporting this work. K.K., A.R., M.M. and S.B. were supported by NIH grant 3U01MH106882-04S1. M.M. and A.R. were also supported by NIH grants 5U01MH106882-05 and P30CA044579 (to the UVA Cancer Center), respectively.
Author contributions statement
K.K. designed the classification metrics and quantum circuits and executed the study. S.B. conceived the study. K.K. and S.B. conceived of the inner product as a metric and performed the experiments. M.M. and A.R. set the biological context and motivated the biological problem.
Competing interests
The authors declare that there are no competing interests.
Supplement
Useful Techniques and Observations
The following are generally useful observations and techniques that the authors came upon but did not apply directly to our inner-product circuits.
Quantum Compression Technique
One can encode an arbitrarily large number m using merely 2 qubits as follows. Prepare a state |ψ⟩ = a|0⟩ + b|1⟩ such that a/b = k, and another state |φ⟩ = c|0⟩ + d|1⟩ such that c/d = l, where l ≤ 1 and m = l·2^k. Thus m is a fraction, given by l, of the largest value that a binary string of k bits can hold in the usual base-2 representation. Of course, k could in principle be used directly to store the desired large value, but k and l are both used here for better precision. The success of quantum compression necessarily depends on the precision achievable in state preparation (i.e. in the rotation angles). Classically, there is no way around storing the precise number itself in some form, whereas here we can effectively compress it into one precisely prepared qubit. The measurement would constitute the decoding/decompression.
Calculating Hamming Distance with XOR-based Schemes
We used the feature basis to encode our example data, but we point out here that the seemingly most natural way to calculate the Hamming distance on a quantum computer is to use bit-wise CNOT (or XOR) gates applied between bit strings encoded directly as a series of |0⟩ and |1⟩ states in the computational basis. For example, the Hamming distance between "001" and "111" is captured by the bit string "110" (whose weight, 2, is the distance), which is of course the output of |001⟩ ⊕ |111⟩. Measuring the coefficients of all 3-qubit basis states after this CNOT operation would reveal |110⟩ as the final result. However, this way of calculating the Hamming distance is highly inefficient in general, as each bit occupies a qubit and each training vector has to be encoded separately.
Inner-product Decision Plane for Multiple Classes
For multiple training classes, the decision boundary for the inner-product classifiers is not a simple plane or hyperplane as it is for two classes. To construct the decision space, one would draw a bisector plane for each pair of classes, dividing the feature space into preferred subregions by class, and pick the class that is universally preferred in the region where the test vector lies. For example, on a plane, one could have 3 class vectors for classes 1, 2 and 3, yielding a total of 3 bisecting planes or lines. These lines divide the feature space into "R-1 > R-2", "R-2 > R-3", etc. subregions, for a total of six such overlapping subregions (see Supplementary Figure 3). Here, R-1 > R-2 means that class 1 is preferred over class 2 in that subregion. Each point in the feature space is part of exactly two overlapping subregions where a particular class (1, 2 or 3) is preferred over the other two. The subregion in which the test vector lies therefore determines its classification without ambiguity.
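As a small numerical illustration (not from the paper; the 2-d class vectors are made up), the class with the maximum inner product is, by construction, the class preferred in every pairwise comparison, i.e. the universally preferred class of the subregion containing the test point:

```python
import numpy as np

# Illustrative 2-d class vectors (classes 1, 2, 3) and a test point.
classes = {1: np.array([2.0, 0.5]),
           2: np.array([1.0, 2.0]),
           3: np.array([0.2, 1.5])}
test = np.array([1.2, 1.0])

# Each pairwise decision boundary separates the region where one class's inner
# product exceeds the other's; taking the argmax reproduces the subregion rule.
scores = {c: float(test @ v) for c, v in classes.items()}
print(scores, "-> class", max(scores, key=scores.get))
```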
Supplementary Figures
Supplementary Figure 1. 5-qubit Example Problem 1 Simulation Circuit.
Supplementary Figure 2. 14-qubit Example Problem 1 Circuit on IBMQX16.
Supplementary Figure 3. Preferred decision subregions for > 2 classes. The class vectors are shown in red, blue and green and indicated in roman numerals, whereas the grey dotted lines are the respective bisectors of each pair of class vectors. The dotted circles label the preferred subregions.