In this project I will compare the accuracy of a classifier algorithm running on a quantum computer versus a classical computer. At the end, we will compare the results obtained by the quantum approach and by the classical one.

To keep everything in Python and easy to code, I will use the Qiskit package.


GitHub Repository


Description

As mentioned before, we are going to compare two machines performing a classification task. The algorithm we will put to the test is the SVC (Support Vector Classifier), an implementation of an SVM (Support Vector Machine) for classification tasks.

SVC (Support Vector Classifier): technically, an SVC finds a hyperplane in an N-dimensional space that distinctly separates the data points into their classes.

This is the technical definition, but a simplified version would be:

“SVC is like drawing the best possible line to separate two groups of things. It tries to make this line so that it keeps the groups as far apart as possible, which helps it guess correctly which group new things belong to.”
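To make that simplified version concrete, here is a tiny toy sketch (the data and variable names are made up purely for illustration): a handful of 2D points in two groups, and a linear SVC learning the line that separates them.

import numpy as np
from sklearn.svm import SVC

# Two small, clearly separated groups of 2D points (toy data)
X_toy = np.array([[1, 1], [2, 1], [1, 2],   # group 0
                  [5, 5], [6, 5], [5, 6]])  # group 1
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear')
clf.fit(X_toy, y_toy)

# The separating "line" is described by the learned coefficients and intercept
print(clf.coef_, clf.intercept_)
# Guess which group some new points belong to
print(clf.predict([[2, 2], [6, 6]]))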

Installation

Before diving into the code, let's install the packages we need, including the Qiskit machine-learning and Aer simulator add-ons plus Plotly for the visualization.

pip install -qU qiskit qiskit-machine-learning qiskit-aer numpy pandas scikit-learn plotly

Data

To perform the classification, we are going to use some data that we need to load into the model. The data for this project is an example dataset included in sklearn's datasets module, the iris data. This dataset is already cleaned and prepared, so there is no need to pre-process it. For an example of how to perform data pre-processing for a project, I recommend reading two of my previous articles: Smartwatch data analysis and Breast cancer survival rate.

Before doing the classification, let’s take a look at the data.

Iris data

This dataset can be loaded easily in Python just by importing the sklearn package.

from sklearn import datasets

iris = datasets.load_iris()

This dataset contains labels, called feature_names, that represent what has been measured, in this case:

  • sepal length (cm)
  • sepal width (cm)
  • petal length (cm)
  • petal width (cm)

The other part of the dataset contains, for each sample, the measurements of these features. The idea is to use them to identify groups that follow a pattern and then build a classifier for potential new data.

For this grouping, the dataset also contains the corresponding flower species:

  • setosa
  • versicolor
  • virginica
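A quick check in Python confirms these names and the size of the dataset:

print(iris.feature_names)  # the four measured features
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)     # (150, 4): 150 flowers, 4 measurements each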

Let’s visualize the data we are working with using plotly express.

First, we load the data into a DataFrame, assign the features and the target to variables, and map the numeric labels to the species names.

import pandas as pd
import plotly.express as px

X, y = iris.data, iris.target
df = pd.DataFrame(X, columns=iris.feature_names)
df['species'] = pd.Series(y).map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})  # numeric labels -> species names

With that, we just have to use Plotly Express to visualize the data. In this case, I will use a scatter-matrix grid, adjust some values on the graphs to get a clearer view of the axes, and use a white template for the look:

fig = px.scatter_matrix(
    df,
    dimensions=iris.feature_names,
    color='species',
    title='Pairplot of Iris Dataset',
    template='plotly_white'
)


fig.update_layout(
    width=800,
    height=800,
    font=dict(size=10),
    title=dict(x=0.5),
)


fig.update_xaxes(tickangle=45)
fig.update_yaxes(tickangle=0)


fig.show()

After visualizing the data, we can proceed to create the model. We will start with the classical computer and then move to the quantum version.


Classical Computer

The implementation of the algorithm on a classical computer is pretty straightforward using libraries such as scikit-learn. The steps to follow are pretty simple.

Step 1: Import the libraries

First, we will import all the libraries we might need.

from sklearn import datasets # This one was imported before
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

Step 2: Split the data

In the previous step, to do the visualization, we already separated the data into the features (X) and the target (y). Now we split them into training and test sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
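With test_size=0.3, 70% of the samples go to training and 30% are held back for testing; an optional shape check makes this explicit:

print(X_train.shape, X_test.shape)  # (105, 4) (45, 4): 105 training and 45 test samples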

Step 3: Standardize the features

In this step, we use StandardScaler to standardize the features. Standardizing is a really useful technique that rescales variables to have zero mean and unit variance. For more information about why, I recommend this article.

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
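As a small optional sanity check, after scaling, each training feature should have a mean close to zero and a standard deviation close to one:

print(X_train.mean(axis=0).round(3))  # roughly [0. 0. 0. 0.]
print(X_train.std(axis=0).round(3))   # roughly [1. 1. 1. 1.]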

Step 4: Initialize and train the SVM

Now we just have to initialize the SVM and proceed with the training.

svm_clf = SVC(kernel='linear', random_state=42)

svm_clf.fit(X_train, y_train)
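Once fitted, the model exposes a few attributes that hint at what it learned; for instance, how many support vectors it kept for each class (optional check):

print(svm_clf.n_support_)              # number of support vectors per class
print(svm_clf.support_vectors_.shape)  # all support vectors and their 4 features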

Step 5: Start predicting

Now we can use the X values we held out for testing to check the accuracy of our model.

y_pred = svm_clf.predict(X_test)

Now let’s check the accuracy:

accuracy = accuracy_score(y_test, y_pred)

print(f"Classical SVM Accuracy: {accuracy}")

The result we get is 0.98, which is pretty good.
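If you want more detail than the single accuracy number, scikit-learn can also break the result down per species; a small optional check:

from sklearn.metrics import classification_report, confusion_matrix

# Per-class precision/recall and the confusion matrix for the classical SVC
print(classification_report(y_test, y_pred, target_names=iris.target_names))
print(confusion_matrix(y_test, y_pred))

Let's see now how the quantum computer performs.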


Quantum Computer

For the implementation on the quantum computer, I'm going to use the Python library called Qiskit. This library allows us to perform multiple operations, create quantum gates and circuits, and even run our code on a real quantum computer.

Disclaimer: The code in this project is based on a specific version of Qiskit, version 0.46. Since this code was written, Qiskit has released version 1.0, and some parts of this code might not work in the latest version.

For this quantum version, I will follow a pattern similar to the classical one; this makes it easier to compare them.

Step 1: Import the libraries

First, let's import the libraries we need. In this case, I will do the full import, but keep in mind that some of these libraries were already imported when creating the classical version of this project.

# Quantum libraries
from qiskit import Aer
from qiskit.utils import algorithm_globals, QuantumInstance
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.algorithms import QSVC
from qiskit_machine_learning.kernels import QuantumKernel

# Previously Installed
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

Step 2: Split the data

This step is the same one we did before; if you are running the full code in one go, you can skip it.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Step 3: Standardize the features

Same as in the classical version, we will use StandardScaler to prepare the data for the model.

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Step 4: Initialize the feature map

This is the first “quantum” step; combined with the next one, it represents the extra work we are going to do for this model. Before implementing it, let's check what it means:

A feature map is a transformation that takes your input data and maps it into a higher-dimensional space. This is often done to make the data more separable for algorithms like Support Vector Machines (SVMs). In classical machine learning, this could involve polynomial or Gaussian transformations of the input features.

In quantum computing, a feature map typically involves encoding classical data into a quantum state using a quantum circuit. For example, the ZZFeatureMap in Qiskit encodes the data into a quantum state by using the feature values as rotation angles in a series of quantum gates.

Let’s see what this will look like in Python:

feature_map = ZZFeatureMap(feature_dimension=4, reps=2)
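If you are curious about what this produces, the feature map is just a parameterized quantum circuit, and you can print it; decompose() expands it into basic gates, and the x[i] parameters are where our four features will be encoded:

print(feature_map.decompose().draw())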

Step 5: Initialize the quantum kernel

In machine learning, a kernel is a function that computes the similarity (or inner product) between pairs of data points in the transformed feature space. The kernel trick allows algorithms to operate in high-dimensional spaces without explicitly computing the transformation.

Quantum kernels can capture intricate relationships in the data due to the complex nature of quantum states and operations. This can potentially provide a more powerful similarity measure than classical kernels, leading to better performance in certain tasks.

To summarize both concepts from steps 4 and 5 in simpler words:

  • Feature Map: Transforms data into a high-dimensional space.
  • Quantum Kernel: Measures similarity between data points in this high-dimensional quantum space.

Having all this information, let’s write it in Python:

quantum_kernel = QuantumKernel(
    feature_map=feature_map,
    quantum_instance=QuantumInstance(Aer.get_backend('qasm_simulator'), shots=1024)
)

It's important to notice two critical parts in that piece of code. The first one is the backend, in this case the qasm_simulator, which means that we are not going to connect to a real quantum computer: instead, our PC is going to simulate one.

The second important concept is shots, which is the number of times each circuit is executed and measured. In real-world scenarios, this number is critical in areas such as error mitigation.
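If you want to see what this kernel actually computes, you can evaluate it on a few training samples (optional, and a bit slow on a simulator; the variable name below is just illustrative). Each entry estimates the similarity between two encoded data points, so the diagonal should be close to 1:

small_kernel_matrix = quantum_kernel.evaluate(x_vec=X_train[:3])
print(small_kernel_matrix)  # 3x3 symmetric matrix of pairwise similarities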

Step 6: Initialize and train the QSVC

It's time to initialize and fit the model:

qsvc = QSVC(quantum_kernel=quantum_kernel)

qsvc.fit(X_train, y_train)

Step 7: Start predicting

Now that we have reached the final step, it's time to make predictions.

y_pred = qsvc.predict(X_test)

Now let’s check the accuracy:

accuracy = accuracy_score(y_test, y_pred)

print(f"Quantum SVM Accuracy: {accuracy}")

The result we get is 0.44, which is lower than the classical computer's result. Let's see why this might be happening.

Accuracy Discrepancies

We have seen that the quantum result is not as good as its classical counterpart, but there are reasons behind this. Let's look at some of the most common reasons a classical computer can perform better:

  • Classical Preprocessing: Classical preprocessing steps like feature scaling or normalization can significantly impact the performance of both classical and quantum models. The quantum model might be more sensitive to these steps.

  • Algorithm Implementation: Implementing quantum algorithms effectively requires deep understanding and careful tuning. The quantum feature maps and kernels used in QSVM may not be optimized for the specific dataset, leading to suboptimal performance.

  • Algorithm Complexity: Classical algorithms, especially well-established ones like SVM, have been optimized and fine-tuned over decades. Quantum algorithms are relatively new and might not yet be as refined.

Any of these explanations might be part of the reason why our quantum model performed worse than the classical one. Solutions such as Parameter Optimization or Hybrid Models can significantly improve the results; a small sketch of the first idea follows.
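As a rough idea of what Parameter Optimization could look like here, one simple and purely illustrative sketch is to try a few ZZFeatureMap configurations, for example different numbers of repetitions and entanglement patterns, and keep whichever scores best. The values tried below are arbitrary examples, and in a proper setup you would score on a separate validation split rather than the test set:

# Illustrative sketch only: try a few feature-map configurations and keep the best
best_acc, best_cfg = 0, None
for reps in [1, 2, 3]:
    for entanglement in ['linear', 'full']:
        fm = ZZFeatureMap(feature_dimension=4, reps=reps, entanglement=entanglement)
        qk = QuantumKernel(
            feature_map=fm,
            quantum_instance=QuantumInstance(Aer.get_backend('qasm_simulator'), shots=1024)
        )
        model = QSVC(quantum_kernel=qk)
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        if acc > best_acc:
            best_acc, best_cfg = acc, (reps, entanglement)

print(f"Best accuracy: {best_acc} with reps/entanglement: {best_cfg}")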

Thanks for reading!