Qristal SDK

Pat Scott
Quantum Brilliance Technology Blog
Jun 8, 2023

A few weeks back at ISC Hamburg 2023 we announced that our full-stack, open source quantum SDK Qristal was exiting beta and getting ready to hit the shelves as version 1.0. 🎉 Here I’m going to give you a crash course in what makes Qristal SDK tick.

Qristal SDK in a nutshell…

Frontends, frontends, frontends.

Qristal is specifically set up for developing hybrid quantum-classical programs. You write a classical main program that runs on CPUs and offloads different calculations in the form of compiled “kernels” to both quantum and classical graphics accelerators. The main program inspects the results, does some more calculations with them, then either offloads some new calculations to the accelerators, or returns the results to the user.

Qristal provides quite a lot of choice in terms of programming languages for both the classical and the quantum parts of your programs. The classical parts can be written in either Python or C++. Qristal itself is implemented in C++ and CUDA — so it’s fast — but there are fully documented APIs and tons of examples available in both languages. If you’re interested in Qristal bindings in another language, just let us know. Learning new programming languages is fun, so as a dev team we’re always happy to have an excuse to go in this direction… 😉

The quantum parts of the program can be constructed in a few different ways. One is to just write them out explicitly as quantum circuit kernels. For example, in OpenQASM:

__qpu__ void MY_QUANTUM_CIRCUIT(qreg q)
{
  OPENQASM 2.0;
  include "qelib1.inc";
  creg c[4];
  h q[0];
  cx q[0],q[1];
  cx q[1],q[2];
  cx q[2],q[3];
  measure q[0] -> c[0];
  measure q[1] -> c[1];
  measure q[2] -> c[2];
  measure q[3] -> c[3];
}

Or in CUDA Quantum:

auto operator()() __qpu__
{
  cudaq::qreg<4> q;
  h(q[0]);
  for (int i = 0; i < 3; i++) x<cudaq::ctrl>(q[i], q[i + 1]);
  mz(q);
}

These examples each prepare a 4-qubit, fully entangled GHZ state. They declare a register of 4 qubits, apply a Hadamard gate to the first qubit to put it into an equal superposition of |0⟩ and |1⟩, and then apply a chain of controlled NOT gates, each one entangling the next qubit with those before it. Qristal accepts kernels written not only in OpenQASM and CUDA Quantum, but also in Rigetti’s Quil, or in XACC’s XASM.
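To see why this gate sequence gives a GHZ state, here is a minimal state-vector sketch in plain Python. It does not use Qristal at all; the helper names (apply_h, apply_cnot, ghz_probabilities) are made up for illustration, and qubit k is assumed to be bit k of the basis-state index.

```python
import math

def apply_h(state, k):
    """Apply a Hadamard gate to qubit k (qubit k is bit k of the basis index)."""
    r = 1 / math.sqrt(2)
    out = list(state)
    for i in range(len(state)):
        if not (i >> k) & 1:          # i has qubit k = 0; j is its partner with qubit k = 1
            j = i | (1 << k)
            out[i] = r * (state[i] + state[j])
            out[j] = r * (state[i] - state[j])
    return out

def apply_cnot(state, control, target):
    """Flip the target qubit in every basis state where the control qubit is 1."""
    out = list(state)
    for i in range(len(state)):
        if (i >> control) & 1:
            out[i ^ (1 << target)] = state[i]
    return out

def ghz_probabilities(n):
    """Measurement probabilities after H on qubit 0 and a chain of n-1 CNOTs."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0                    # start in |00...0>
    state = apply_h(state, 0)
    for i in range(n - 1):
        state = apply_cnot(state, i, i + 1)
    return {format(i, f"0{n}b"): abs(a) ** 2
            for i, a in enumerate(state) if abs(a) ** 2 > 1e-12}

probs = ghz_probabilities(4)
print(probs)
```

Only the all-zeros and all-ones bitstrings survive, each with probability one half, which is the signature of a GHZ state.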

An alternative way to write the quantum parts of your program is to use Qristal’s own built-in CircuitBuilder. This allows you to construct a quantum circuit entirely from Python or C++, without having to drop down into kernel language at all. Using the CircuitBuilder in Python, the 4-qubit GHZ circuit would look like

import qb.core
circ = qb.core.Circuit()
circ.h(0)
for i in range(3): circ.cnot(i, i + 1)
circ.measure_all()
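As a rough picture of how a builder like this can turn method calls into kernel text, here is a toy sketch in plain Python. ToyCircuit is a hypothetical stand-in written for this post, not Qristal's CircuitBuilder, and it emits standalone OpenQASM 2.0 with its own qreg declaration rather than the __qpu__ kernel wrapper shown earlier.

```python
class ToyCircuit:
    """Hypothetical stand-in for a circuit builder; not Qristal's CircuitBuilder."""

    def __init__(self):
        self.ops = []            # recorded gates as (name, qubit indices)
        self.measured = False

    def h(self, q):
        self.ops.append(("h", (q,)))

    def cnot(self, control, target):
        self.ops.append(("cx", (control, target)))

    def measure_all(self):
        self.measured = True

    def to_openqasm(self):
        """Emit the recorded gates as OpenQASM 2.0 source text."""
        n = 1 + max((q for _, qubits in self.ops for q in qubits), default=-1)
        lines = ["OPENQASM 2.0;", 'include "qelib1.inc";',
                 f"qreg q[{n}];", f"creg c[{n}];"]
        for name, qubits in self.ops:
            lines.append(f"{name} " + ",".join(f"q[{q}]" for q in qubits) + ";")
        if self.measured:
            lines += [f"measure q[{q}] -> c[{q}];" for q in range(n)]
        return "\n".join(lines)

circ = ToyCircuit()
circ.h(0)
for i in range(3):
    circ.cnot(i, i + 1)
circ.measure_all()
qasm = circ.to_openqasm()
print(qasm)
```

The point of the design is that the gate list is the single source of truth: emitting a different kernel language would only need a different to_... method over the same list.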

Compilation pathways: choose your own (IR) adventure

Once you’ve written a program with classical and quantum parts, you need to convert it into a set of low-level commands that can be executed directly on two (or three) different types of hardware:

  • a quantum processor;
  • a classical CPU; and (optionally)
  • a classical GPU.

To get there, Qristal goes through what is known as an Intermediate Representation (IR). This follows the LLVM compiler infrastructure, where code can be written in a variety of different languages, and different frontends are available for ‘lowering’ those to a common IR language. Optimisation and associated transformations get done on the IR by middleware, before hardware-specific backends compile the IR into assembly or binary machine code that can run on hardware (or emulated hardware).

Qristal supports two different compilation pathways, each of which passes through a different IR. The first is the CUDA Quantum pathway, where programs featuring CUDA Quantum kernels get compiled into Quantum Intermediate Representation (QIR). A series of transformations get done on the QIR, before a CUDA Quantum backend converts the IR into executable code. The other is the XACC pathway, where programs written in OpenQASM, Quil or XASM get converted to XACC IR, before passing through various IR transformations and getting turned into executable commands by different XACC backends.

There are also ways to cross between the two pathways. For example, Qristal allows you to write out circuits in OpenQASM/Quil/XASM, compile them to XACC IR, translate the XACC IR into QIR, and then send it to CUDA Quantum backends. This is relatively straightforward because XACC IR is, roughly speaking, a subset of QIR. Similarly, although at present it emits only OpenQASM/XASM, the CircuitBuilder will soon be able to construct circuits automatically in whatever frontend kernel language you like. The same quantum simulator or hardware can also be set up to act as both a CUDA Quantum backend and a XACC backend. This basically gives Qristal the ability to compile code written in any combination of frontend languages into commands that can execute on any simulator or hardware platform.

Up to this point I’ve been deliberately vague about what sorts of IR transformations get done. This is because each transformation is worth a whole blog post of its own — but they generally fall into three categories:

  • circuit optimisation (minimising the number of hardware ops needed to get a result)
  • circuit placement (which qubits in the circuit should get mapped to which qubits of the hardware, taking into account their actual physical connectivity and potentially different performance characteristics)
  • transpilation (converting the gates in a circuit into those that actually exist natively on the target hardware)
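To give a flavour of the first category, here is a toy circuit optimisation pass in plain Python. It is only a sketch of the general idea and not one of Qristal's actual passes: the circuit is assumed to be a simple list of (gate name, qubits) tuples, and the pass removes pairs of identical self-inverse gates only when they sit next to each other in that list.

```python
# Gates that undo themselves when applied twice to the same qubits.
SELF_INVERSE = {"h", "x", "y", "z", "cx"}

def cancel_adjacent_inverses(circuit):
    """Remove back-to-back pairs of identical self-inverse gates.

    Using a stack means cancellations cascade: once an inner pair is
    removed, the gates on either side of it become neighbours and can
    cancel as well.
    """
    out = []
    for gate in circuit:
        if out and out[-1] == gate and gate[0] in SELF_INVERSE:
            out.pop()            # the two gates multiply to the identity
        else:
            out.append(gate)
    return out

circuit = [
    ("h", (0,)), ("h", (0,)),        # H H = identity
    ("cx", (0, 1)),
    ("x", (1,)), ("x", (1,)),        # X X = identity
    ("cx", (0, 1)),                  # now adjacent to the first cx
    ("h", (0,)),
]
optimised = cancel_adjacent_inverses(circuit)
print(optimised)  # [('h', (0,))]
```

Seven gates collapse to one. A real optimiser would also look past gates acting on unrelated qubits and apply many more rewrite rules, but the principle of shrinking the circuit without changing what it computes is the same.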

Backends for the Qristal SDK

We’re already offering users of Qristal a slew of different backends, from industry staples to cutting-edge tensor network simulations and hardware interfaces:

  • Quantum++ (qpp). Industry staple multi-threaded state vector-based simulator. Both a QB/XACC interface and CUDA Quantum native interface are available.
  • TNQVM + ExaTN. MPI-enabled tensor network simulator for XACC.
  • Qiskit AER. Industry staple GPU-enabled simulator suite with the possibility to include noise.
  • Remote GPU workstation daemon. Set up a Qristal server on a GPU workstation, then compile circuits with Qristal on multiple remote terminals, and send them to the server for execution using an AER backend.
  • AWS Braket. Offload programs written and compiled in Qristal to the AWS Braket system for cloud execution on a range of different simulator and hardware backends.
  • cuStateVec. GPU-enabled state-vector simulator drawing on the NVIDIA cuquantum custatevec library. Both a QB native interface (with noise) and CUDA Quantum native interface are available.
  • MPS. GPU-enabled tensor network simulator with noise, based on the matrix product state method and drawing on the NVIDIA cuquantum cutensornet library.
  • Quantum Brilliance diamond hardware. A fully featured offload and control interface to Quantum Brilliance’s room-temperature, rack mounted quantum computers. Tried and tested running hybrid applications on the Pawsey Setonix supercomputer, ranked 17th in the global Top500 supercomputers, and 4th in the Green500.

Even more are on the way in future releases as well… 😉

Getting started with the Qristal SDK

At this point, it’s probably helpful to look at a couple of basic example programs written with Qristal. First, let’s check out the quantum equivalent of ‘Hello world’, a Bell state circuit written in Python that ships with Qristal:

# To learn how to use Qristal, let's run a simple Python example to create a Bell state.

# Import the Qristal core:
import qb.core

# Create a quantum computing session using Qristal:
my_sim = qb.core.session()

# Choose some default session parameters:
my_sim.qb12()

# Set the number of shots to run through the circuit:
my_sim.sn = 1024

# Set the number of qubits:
my_sim.qn = 2

# Choose the simulator backend:
my_sim.acc = "qpp"

# Create the |Φ⁺⟩ component of the Bell state using Hadamard and CNOT gates:
my_sim.instring = '''
__qpu__ void MY_QUANTUM_CIRCUIT(qreg q)
{
OPENQASM 2.0;
include "qelib1.inc";
creg c[2];
h q[0];
cx q[0], q[1];
measure q[0] -> c[0];
measure q[1] -> c[1];
}
'''

# Run the circuit:
my_sim.run()

# Print the cumulative results in each of the classical registers:
print("Bell state |Φ⁺>: ")
print(my_sim.out_raw[0][0])

Here we start by importing the Qristal core and creating a quantum computing session with some default parameters. We then indicate that we want to run a circuit 1024 times, that the circuit should consist of just 2 qubits, and that we want to use the qpp simulator as our backend of choice. We then specify a quantum kernel, written in OpenQASM in this example, run it, and print out the results of the 1024 repetitions.

The kernel starts with two qubits each in the |0⟩ state, applies a Hadamard gate to turn one into an equal superposition of |0⟩ and |1⟩, and then applies a controlled NOT gate to produce the fully entangled state

|Φ⁺⟩ = (|00⟩ + |11⟩)/√2.

It then measures each of the qubits and spits out the result to a classical bit register.

Running the code gives an output that looks like:

Bell state |Φ⁺>:
{
"00": 510,
"11": 514
}
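The counts are close to, but not exactly, 512 each because every shot is an independent measurement that yields 00 or 11 with probability one half. The sketch below mimics that sampling step with Python's standard random module. It is an illustration of the statistics only, not of how the qpp backend works internally, and sample_bell is a made-up helper name.

```python
import random
from collections import Counter

def sample_bell(shots, seed=None):
    """Draw measurement outcomes for the Bell state (|00> + |11>)/sqrt(2)."""
    rng = random.Random(seed)
    outcomes = rng.choices(["00", "11"], weights=[0.5, 0.5], k=shots)
    return Counter(outcomes)

counts = sample_bell(1024, seed=1)
print(dict(counts))
```

With 1024 shots the standard deviation of each count is √(1024 × 0.25) = 16, so a split like 510/514 is entirely typical.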

Contrast this with the following example. This one is instead written in C++, uses a CUDA Quantum kernel, and computes a 20-qubit GHZ state (a maximally entangled state like the Bell state, but with more than 2 qubits):

// Copyright (c) Quantum Brilliance Pty Ltd
#include "qb/core/session.hpp"
#include <string>
#include <iostream>
#include <cudaq.h>

// Define a quantum kernel with CUDAQ at compile time.
template<std::size_t N>
struct ghz {
  auto operator()() __qpu__ {
    cudaq::qreg<N> q;
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main()
{
  // And we're off!
  std::cout << "Executing C++ demo..." << std::endl;

  // Make a Qristal session
  auto my_sim = qb::session(false);

  // Number of qubits we want to run
  constexpr int NB_QUBITS = 20;

  // Add CUDAQ ghz kernel to the current session
  my_sim.set_cudaq_kernel(ghz<NB_QUBITS>{});

  // Set up sensible default parameters
  my_sim.qb12();

  // Choose how many 'shots' to run through the circuit
  my_sim.set_sn(20000);

  std::cout << "About to run quantum program..." << std::endl;
  my_sim.run();

  // Print the cumulative results
  std::cout << "Results:" << std::endl << my_sim.get_out_raws()[0][0] << std::endl;
}

Here we define the kernel in CUDA Quantum as a C++ template, with the template parameter N designating the number of qubits. We then write the kernel with a for loop so that it defines a GHZ state for an arbitrary number of qubits, by applying a Hadamard gate to the first qubit, and then connecting all the qubits using N-1 controlled NOT gates.

Similar to the Python example above, we create a quantum session, set up some defaults, hand the kernel over to the session (in this case specifying in the process that we want a 20-qubit version of the kernel), choose the number of shots to run through, and hit 'Go!'

We then print the results, giving an output that looks like:

Executing C++ demo...
About to run quantum program...
Results:
{
"00000000000000000000": 10021,
"11111111111111111111": 9979
}

A more advanced and even more interesting example is the one at the end of this post, which blends kernel code written in both XASM and CUDA Quantum together with C++ into a single hybrid program.

You want to run a quantum simulation where? On what??

Qristal is specifically designed to allow simulation of hybrid quantum computing in embedded and edge applications, and in on-node quantum supercomputing applications, where whole banks of quantum accelerators work together in parallel. To do this, Qristal must run on an extraordinary range of different hardware configurations, from the world’s biggest supercomputers, to public and private cloud systems, standalone GPU powerhouses like the NVIDIA DGX, desktops, laptops and even smartphones and Raspberry Pis. Classical devices can be mixed and matched with actual quantum hardware, and with other classical devices emulating quantum hardware.

This is facilitated by a wide range of parallelisation options, from multithreading with OpenMP and POSIX threads to interprocess communication via MPI and extensive GPU offloading. Adding to this is a GUI with a cloud interface, the ability to send circuits to the AWS Braket cloud for execution, built-in integration with orchestration frameworks like NextFlow, a server mode that allows for remote execution of circuits on a private GPU workstation, support for Linux, OSX and Windows on x86 or ARM architecture, and Docker containers with the code and all dependencies pre-built and installed.

Here are some apps we prepared earlier…

Qristal ships with a pre-built library of applications and circuit snippets that you can either just use directly, or incorporate into your own main programs.

  • A basic quantum arithmetic library for adding, subtracting, multiplying, dividing, maximising, minimising and exponentiating quantum states (including across superpositions)
  • Grover’s algorithm, Quantum Fourier Transform, Quantum Phase Estimation (QPE) and Quantum Amplitude Estimation (QAE)
  • QAOA (Quantum Approximate Optimisation Algorithm) and QLS (Quantum Local Search), two hybrid optimisation algorithms.
  • The Quantum Decoder is a speech-to-text application employing a quantum beam search algorithm.
  • The Variational Quantum Eigensolver (VQE) uses the variational method along with a quantum solution ansatz to solve problems encoded as quantum Hamiltonians.
  • Quantum Machine Learning (QML). Qristal includes the ability to specify circuits with free parameters, which it uses to create trainable machine learning algorithms. The QML application includes an interface to PyTorch for training and network construction.

A cute example is the following application of VQE in C++. Here we compute the ground state energy of the deuteron, using a Hamiltonian written in CUDA Quantum and an ansatz written in XASM:

// Copyright (c) 2022 Quantum Brilliance Pty Ltd
#include "qb/core/cudaq/ir_converter.hpp"
#include "cudaq/algorithm.h"
#include "cudaq/optimizers.h"
#include "cudaq/spin_op.h"
#include "xacc.hpp"
#include "xacc_service.hpp"
#include <iostream>

int main() {
  // And we're off!
  std::cout << "Executing C++ demo: Solving Deuteron's ground state energy ..."
            << std::endl;
  xacc::Initialize();

  // Define the one-parameter ansatz in XASM
  xacc::qasm(R"(
.compiler xasm
.circuit deuteron_ansatz
.parameters theta
.qbit q
X(q[0]);
Ry(q[1], theta);
CNOT(q[1],q[0]);
)");

  std::cout << "Compiled ansatz with Qristal..." << std::endl;
  auto ansatz = xacc::getCompiled("deuteron_ansatz");
  std::cout << "QB IR:\n" << ansatz->toString() << "\n";

  // Convert the ansatz from XACC IR to Quake IR
  qb::cudaq_ir_converter converter(ansatz);
  std::cout << "Converted ansatz to CUDAQ (Quake IR) ..." << std::endl;
  auto &cudaq_builder = converter.get_cudaq_builder();
  std::cout << "CUDAQ QUAKE: \n" << cudaq_builder.to_quake();

  // Define the deuteron Hamiltonian as a sum of Pauli terms
  cudaq::spin_op h = 5.907 - 2.1433 * cudaq::spin::x(0) * cudaq::spin::x(1) -
                     2.1433 * cudaq::spin::y(0) * cudaq::spin::y(1) +
                     .21829 * cudaq::spin::z(0) - 6.125 * cudaq::spin::z(1);
  std::cout << "Constructed Deuteron Hamiltonian as CUDAQ spin_op: \n";
  h.dump();

  // Run VQE with the builder
  cudaq::optimizers::cobyla c_opt;
  std::cout << "Running VQE with Cobyla optimizer! \n";
  auto [opt_val, opt_params] =
      cudaq::vqe(cudaq_builder, h, c_opt, /*n_params*/ 1);

  std::cout << "Ground state energy (expected -1.74886): " << opt_val << "\n";
}

In this example we define a one-parameter ansatz for the deuteron’s wavefunction in XASM, compile it to XACC IR and print the result:

Executing C++ demo: Solving Deuteron's ground state energy ...
Compiled ansatz with Qristal...
QB IR:
X q0
Ry(theta) q1
CNOT q1,q0

We then take that ansatz in XACC IR, and convert it to QUAKE IR (a precursor to QIR):

Converted ansatz to CUDAQ (Quake IR) ...
CUDAQ QUAKE:
module {
func.func @__nvqpp__mlirgen____nvqppBuilderKernel_367535629127(%arg0: !cc.stdvec<f64>) {
%c1_i32 = arith.constant 1 : i32
%c0_i32 = arith.constant 0 : i32
%0 = quake.alloca : !quake.qvec<2>
%1 = quake.qextract %0[%c0_i32] : !quake.qvec<2>[i32] -> !quake.qref
quake.x (%1)
%2 = cc.stdvec_data %arg0 : (!cc.stdvec<f64>) -> !llvm.ptr<f64>
%3 = llvm.load %2 : !llvm.ptr<f64>
%4 = quake.qextract %0[%c1_i32] : !quake.qvec<2>[i32] -> !quake.qref
quake.ry |%3 : f64|(%4)
quake.x [%4 : !quake.qref] (%1)
return
}
}

Next we define a Hamiltonian in CUDA Quantum:

Constructed Deuteron Hamiltonian as CUDAQ spin_op: 
(5.907,0) I0I1 + (-2.1433,-0) X0X1 + (-2.1433,-0) Y0Y1 + (0.21829,0) Z0I1 + (-6.125,-0) I0Z1

Finally, we pass the Hamiltonian and the ansatz in QUAKE form to the CUDA Quantum implementation of VQE. The VQE algorithm iterates classically through different values of the ansatz’s free parameter θ — in this case using the Cobyla optimizer — and re-evaluates the energy (quantumly) each time, until it converges to a minimum. This is the VQE estimate of the ground state energy of the deuteron.

<H> = -0.436290
<H> = 1.620400
<H> = 10.193600
<H> = 4.452701
<H> = -1.593847
<H> = -1.609467
<H> = -1.181321
<H> = -1.715820
<H> = -1.748760
<H> = -1.707972
<H> = -1.737574
<H> = -1.747437
<H> = -1.747692
<H> = -1.748862
<H> = -1.748675
<H> = -1.748804
<H> = -1.748863
<H> = -1.748857
<H> = -1.748865
<H> = -1.748865
<H> = -1.748864
<H> = -1.748864
Ground state energy (expected -1.74886): -1.74886
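If you want to check that number without any quantum SDK installed, the whole calculation fits in a short plain-Python sketch. This is an independent illustration, not what cudaq::vqe does internally: it simulates the two-qubit ansatz exactly with a state vector, and it swaps the Cobyla optimizer for a coarse grid scan followed by a golden-section search. All helper names are made up, and qubit k is assumed to be bit k of the basis-state index.

```python
import math

def apply_1q(state, gate, k):
    """Apply a 2x2 gate to qubit k (qubit k is bit k of the basis index)."""
    out = list(state)
    for i in range(len(state)):
        if not (i >> k) & 1:
            j = i | (1 << k)
            out[i] = gate[0][0] * state[i] + gate[0][1] * state[j]
            out[j] = gate[1][0] * state[i] + gate[1][1] * state[j]
    return out

def apply_cnot(state, control, target):
    """Flip the target qubit wherever the control qubit is 1."""
    out = list(state)
    for i in range(len(state)):
        if (i >> control) & 1:
            out[i ^ (1 << target)] = state[i]
    return out

def expectation(state, pauli):
    """<psi|P|psi> for a Pauli string given as {qubit: 'X' | 'Y' | 'Z'}."""
    total = 0j
    for i, amp in enumerate(state):
        j, phase = i, 1 + 0j
        for k, p in pauli.items():
            bit = (i >> k) & 1
            if p in "XY":
                j ^= 1 << k                        # X and Y flip the qubit
            if p == "Y":
                phase *= -1j if bit else 1j        # Y|0> = i|1>, Y|1> = -i|0>
            if p == "Z":
                phase *= -1 if bit else 1          # Z|1> = -|1>
        total += state[j].conjugate() * phase * amp
    return total.real

# Deuteron Hamiltonian from the C++ listing, as (coefficient, Pauli string) terms.
HAMILTONIAN = [(5.907, {}), (-2.1433, {0: "X", 1: "X"}), (-2.1433, {0: "Y", 1: "Y"}),
               (0.21829, {0: "Z"}), (-6.125, {1: "Z"})]

def energy(theta):
    """Energy of the ansatz X(q0); Ry(q1, theta); CNOT(q1, q0)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    state = [1 + 0j, 0j, 0j, 0j]
    state = apply_1q(state, [[0, 1], [1, 0]], 0)
    state = apply_1q(state, [[c, -s], [s, c]], 1)
    state = apply_cnot(state, 1, 0)
    return sum(coeff * expectation(state, p) for coeff, p in HAMILTONIAN)

def minimise(f, lo, hi, grid=64, tol=1e-8):
    """Coarse grid scan, then golden-section search around the best grid point."""
    step = (hi - lo) / grid
    best = min((lo + n * step for n in range(grid + 1)), key=f)
    a, b = best - step, best + step
    inv_phi = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    theta = (a + b) / 2
    return theta, f(theta)

theta_opt, e_min = minimise(energy, -math.pi, math.pi)
print(f"theta = {theta_opt:.4f}, ground state energy = {e_min:.5f}")
```

For this ansatz the energy works out to 5.907 − 4.2866 sin θ − 6.34329 cos θ, so the minimum is 5.907 − √(4.2866² + 6.34329²), in agreement with the expected value quoted in the program output.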

Next steps

Hopefully this has given you a bit of a taste of what it’s like to use the new Qristal SDK. We’ve released it as an open source project, distributed under the Apache 2.0 license, and you can get it now from GitHub, either as source code or pre-built and ready to go in a Docker container. If this post has whetted your appetite, download it and try out some of the examples and exercises. We’d love to hear how you get on!

Pat Scott is the Software Lead for Quantum Brilliance’s Australian team. He was previously a permanent faculty member in Physics at the University of Queensland and Imperial College London, and the founding Spokesperson of GAMBIT.
