Zero-Knowledge Machine Learning (ZKML): Projects Exploring the Space
Over the previous year, there have been significant developments in zero-knowledge technology, and in 2023, we are experiencing a remarkable increase in its adoption across the blockchain sector.
In parallel, the deployment of machine learning (ML) is becoming more intricate. Numerous enterprises are now opting for ML-as-a-service providers (Amazon, Google, Microsoft, among others) to implement complicated, proprietary ML models. With the proliferation of these services, they become progressively more challenging to audit and understand, posing a vital question: how can consumers of these services trust the validity of the predictions provided?
ZKML offers a solution by enabling the validation of private data using public models or verifying private models with public data.
Zero-knowledge (ZK) proofs are a cryptographic mechanism where the prover can demonstrate to the verifier that a given statement is true, without revealing any supplementary information except that the statement is true. The field of ZK proofs has made significant strides on various fronts, from research to protocol implementations and real-world applications.
ZK proofs rest on two primary “primitives,” or building blocks. The first is the ability to create proofs of computational integrity for a given computation, where the proof is substantially easier to verify than the computation is to execute; this property is referred to as “succinctness.” The second is the option to conceal specific parts of the computation while still guaranteeing that it was carried out correctly, known as “zero-knowledge.”
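To make the “zero-knowledge” primitive concrete, here is a minimal sketch of a classic Schnorr proof of knowledge, made non-interactive with a hash-based challenge. It is a toy illustration, not one of the proof systems used for ZKML: it shows only the zero-knowledge side (the verifier learns that the prover knows a secret exponent, but not the exponent itself), it does not demonstrate succinctness, and the group parameters are deliberately tiny and insecure. All names and parameters below are assumptions chosen for illustration.

```python
import hashlib
import secrets

# Toy group parameters (far too small for real use): P = 2*Q + 1,
# and G generates the subgroup of prime order Q, since 2**11 % 23 == 1.
P = 23
Q = 11
G = 2

def challenge(public_key, commitment):
    """Derive the challenge by hashing the public values (Fiat-Shamir heuristic)."""
    data = f"{G}|{P}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def prove(secret):
    """Prove knowledge of `secret` such that public_key == G**secret mod P."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(Q)          # fresh randomness hides the secret
    commitment = pow(G, nonce, P)
    c = challenge(public_key, commitment)
    response = (nonce + c * secret) % Q
    return public_key, (commitment, response)

def verify(public_key, proof):
    """Check G**response == commitment * public_key**c mod P, without seeing the secret."""
    commitment, response = proof
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P

public_key, proof = prove(secret=7)
is_valid = verify(public_key, proof)
```

The verifier only ever sees `public_key`, `commitment`, and `response`; the random `nonce` masks the secret inside `response`. Proof systems such as SNARKs and STARKs generalize this idea from one algebraic relation to arbitrary computations and add succinctness on top.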
If you want to learn more about ZK, I recommend you attend this ZKP MOOC.
Trends in ZK
The State of ZK report, a quarterly publication that examines key developments in the zero-knowledge ecosystem, highlights the trends that have generated the most interest within the ZK community:
The leading use case for ZK is privacy. The zero-knowledge primitive of ZK proofs allows for concealing specific parts of the computation being validated. This capability is particularly advantageous for creating applications that uphold users’ privacy and safeguard their personal data while producing cryptographic attestations. Several noteworthy initiatives in this regard include: Semaphore, MACI, Penumbra or Aztec Network.
ZK for scaling ranks second. Distributed systems, such as public blockchains, have restricted computational capabilities because every participating node (computer) must itself run the computations in each block to validate them. However, by utilizing ZK proofs, we can perform these computations off-chain, generate a ZK proof, and then verify this proof on-chain, achieving scalability while maintaining security and decentralization. Exemplary projects include: Starknet, Scroll, Polygon Zero, Polygon Miden, Polygon zkEVM or zkSync.
Identity also draws attention, indicating a growing interest in using ZK technology for identity management. This includes developing proof-of-personhood protocols to create cryptographic attestations. Some notable initiatives in this area include: WorldID, Sismo, Clique or Axiom.
When asked about the most exciting new use-cases, the community points to ZKML as the most appealing one (besides interoperability or zkBridges). The remainder of this article will focus on ZKML as a way of effectively verifying that computations have been executed correctly, which has far-reaching implications beyond just blockchains.
Why is ZKML drawing attention lately?
Creating zero-knowledge proofs requires significant computational resources, often much more than the original computation. As a result, there are certain computations that are impractical to prove with zero-knowledge proofs because of the time required to generate them. However, recent progress in cryptography, hardware, and distributed systems has made it possible to generate zero-knowledge proofs for increasingly intensive computations. These advances have enabled the development of protocols that can use proofs of intensive computations, expanding the range of applications for which zero-knowledge proofs can be used. A recent study by the Modulus Labs team titled “The Cost of Intelligence” evaluates various existing ZK proof systems against a wide range of models of varying sizes.
As AI technology continues to advance, it becomes more challenging to distinguish AI-generated content from human-generated content. However, zero-knowledge cryptography may hold the potential to solve this problem by enabling us to determine whether a particular piece of content was produced by applying a specific model to a given input, without revealing any additional information about the model or the input. In the case of large language models such as GPT-4, the creation of a zero-knowledge circuit representation could provide a means for verifying their outputs.
The zero-knowledge property inherent in these proofs would enable us to conceal any sensitive parts of the input or the model if necessary. An illustrative example would be the use of a machine learning model on personal data, where a user could obtain the outcome of the model inference on their data without disclosing their input to any external entity (such as in the healthcare sector).
ZKML is still a nascent technology and many use cases have yet to be explored. However, below are some of the most obvious use cases as also highlighted by Worldcoin’s article and Elena Burger’s post.
Computational integrity (validity)
Validity proofs such as SNARKs and STARKs can demonstrate that a computation has been executed correctly. Applied to machine learning, this means verifying ML model inference, that is, proving that a model generated a particular output from a specific input. Because it is easy to prove and verify that an output is the result of a particular model and input combination, machine learning models can be deployed on specialized hardware off-chain while the ZK proofs are conveniently verified on-chain.
When discussing ZKML, the focus is typically on generating zero-knowledge proofs of the inference step of the ML model, rather than on proving the training process (including the validity of the data used to train the model). Training is already a highly computationally intensive process on its own.
Aside from validity proofs, zero-knowledge cryptography can also be used to preserve privacy in machine learning applications. One example would be to prove that a model has a certain level of accuracy on test data without revealing the weights used. An example of another use case is privacy-preserving inference, where private patient data can be used for medical diagnostics and the sensitive inference, such as a cancer test result, can be sent to the patient without revealing their data to any third party.
ML as a Service transparency (authenticity)
In cases where companies offer access to ML models through their APIs, it can be difficult for users to know whether the provider is actually offering the model that they claim to be providing, since the API is essentially a black box. Validity proofs that are associated with an ML model API would be valuable in providing transparency to the user, as they could verify which model they are utilizing.
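One way to picture how a user could check which model they are using is through a commitment to the model's weights. The sketch below is a simplified assumption, not a description of any particular provider or library: the provider publishes a SHA-256 digest of its weights once, and each API response carries the digest of the model that was supposedly used. The function names and the weight encoding are hypothetical. The hash comparison alone says nothing about the computation; in a real ZKML setting it would be paired with a validity proof (not implemented here) showing that the returned output was actually computed with the committed weights.

```python
import hashlib
import struct

def commit_to_weights(weights):
    """Hash a flat list of weights into a hex digest.

    This is binding (changing any weight changes the digest) but has no
    blinding factor, so it is not a formally hiding commitment.
    """
    h = hashlib.sha256()
    h.update(struct.pack("<Q", len(weights)))   # include the length in the digest
    for w in weights:
        h.update(struct.pack("<f", w))          # 32-bit little-endian float encoding
    return h.hexdigest()

def matches_published_model(published_commitment, response_commitment):
    """Check that a response refers to the same model the provider published."""
    return published_commitment == response_commitment

# Hypothetical example: the provider publishes one commitment up front.
published = commit_to_weights([0.5, -1.25, 2.0])
honest_response = commit_to_weights([0.5, -1.25, 2.0])
swapped_response = commit_to_weights([0.5, -1.25, 2.5])  # a silently changed model
```

Here `matches_published_model(published, honest_response)` holds, while the swapped model produces a different digest and is rejected.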
Decentralized inference or training
Performing machine learning inference or training in a decentralized way, while allowing people to submit data to a public model, requires either deploying an existing model on-chain or building a new network. In either case, zero-knowledge proofs can be used to compress the model, so that participants check a short proof rather than rerunning the full computation.
To incorporate attestations from external verified parties, such as a digital platform or hardware that can produce a digital signature, into a smart contract running on-chain, one can verify the signature using a zero-knowledge proof and use it as an input in a program. This method can be applied to any digitally attested information, providing a means of verifying authenticity and provenance from a trusted source. Endpoints that generate digital signatures can be verified and used in this way.
Projects exploring ZKML
As advancements in cryptography, hardware, and distributed systems continue to make zero-knowledge proofs feasible for increasingly intensive computations, an increasing number of projects are exploring the use of ZKML. The illustration below provides a non-exhaustive overview of current projects, though it should be noted that there may be some overlap between categories, and that this presentation is simplified for clarity. Additionally, there are numerous open source codebases available for building ZKML applications, indicating a growing interest and excitement in the community.
As ZK technology continues to advance, it is becoming increasingly feasible to prove larger machine learning models on less powerful machines in a shorter period of time. This is due to improvements in specialized hardware, proof system architecture, and more efficient ZK protocol implementations. As a result of these advancements, new ZKML applications and use cases are expected to emerge.
While the big picture use case of ZKML in Web3 is to enable on-chain organizations to run machine learning models, the fast-paced evolution of ZKML offers potential solutions to intricate problems in multiple fields. In my view, the following use cases could arise in this context:
⁃ Decentralized finance (DeFi): using ZKML to validate yield-maximizing strategies or rebalancing of pools for customers. One example of this is RockyBot.
⁃ Gaming: using ZKML to validate betting mechanisms or AI-enhanced players. An example of this is Leela vs the World.
⁃ Identity: using ZKML to perform AI analysis on user biometric information while ensuring custody of the data. An example of this is WorldID.
⁃ Healthcare: using ZKML for disease prediction by running machine learning models over sensitive medical data while preserving privacy.
Although ZKML shows great potential, the field is still in its early stages of development. One challenge is that accuracy and fidelity may be compromised when a model is converted into a circuit. A related limitation is that the parameters and activations of many machine learning models are stored as 32-bit floating-point numbers for precision, whereas zero-knowledge proof systems operate on arithmetic circuits over finite fields and struggle to represent floating-point values without significant overhead.
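The precision issue can be illustrated with a small sketch. It assumes a common workaround, fixed-point quantization, in which each real-valued weight or input is scaled, rounded to an integer, and then treated as an element of a prime field. The modulus and the number of fractional bits below are arbitrary choices for illustration, not the parameters of any specific proof system.

```python
FIELD_MODULUS = 2**61 - 1  # illustrative prime; real proof systems use their own fields
SCALE_BITS = 16            # assumed fixed-point precision: 16 fractional bits

def to_field(x):
    """Encode a real number as a fixed-point field element (negatives wrap around)."""
    return round(x * (1 << SCALE_BITS)) % FIELD_MODULUS

def from_field(v, scale_bits=SCALE_BITS):
    """Decode a field element, treating the upper half of the field as negative."""
    if v > FIELD_MODULUS // 2:
        v -= FIELD_MODULUS
    return v / (1 << scale_bits)

def field_dot(weights, inputs):
    """Dot product computed entirely with field arithmetic, as a circuit would."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc = (acc + to_field(w) * to_field(x)) % FIELD_MODULUS
    return acc  # each product carries two scale factors, i.e. 2 * SCALE_BITS

weights = [0.1, -0.7, 1.3]
inputs = [2.0, 0.5, -1.0]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = from_field(field_dot(weights, inputs), 2 * SCALE_BITS)
error = abs(exact - approx)
```

The value of `error` is small but nonzero: rounding each weight to 16 fractional bits slightly changes the result of even a three-term dot product. Across the many layers of a real model such errors can accumulate, which is one reason accuracy may degrade when a model is converted into a circuit.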
Currently, the field of ZKML is still catching up as zero-knowledge proofs continue to be optimized to handle increasingly complex machine learning models.