Proposal: ML + web3 model deployment survey

Name of Project:
ML + web3 model deployment survey

Proposal in one sentence:
Conduct a hands-on survey of ML model deployment options for web3 use cases

Description of the project and what problem it is solving:

The intersection of ML and web3 is very important, but currently very green (more thoughts on that here). One of the core capabilities that will be needed is model deployment and inference.

The goal of this project is to evaluate a few of the more popular decentralized compute networks for their readiness to support ML model deployment. This will involve going through a live deployment of a CNN trained on MNIST, consuming that model from a client, and assessing each solution.

More specifically, I will be evaluating:

  • Ocean compute-to-data
  • Bacalhau
  • Golem Network
  • Fetch.ai

For the following properties:

  • economics and monetization
  • inference or job types
  • hardware support
  • model and data formats
  • model management

Grant Deliverables:

  • written report on findings and results
  • code for each deployment experiment

Squad

Devin Conley


I wonder about the privacy and transparency. I guess the model is never accessible to the user of the model? So the data needs to be sent somewhere, and I assume there are privacy guarantees around that. It would also be interesting to know whether the data being fed to the model is close to the training data. Something like an autoencoder trained on the training data could show how well it replicates my input, which would give me an idea of how trustworthy the model is relative to its reported results.
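
For what it's worth, a minimal sketch of that autoencoder idea (assuming PyTorch; the architecture, file name, and threshold are illustrative, not part of the proposal):

```python
import torch
import torch.nn as nn

# Illustrative autoencoder over the same 28x28 MNIST inputs the deployed model expects.
class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(64, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x)).view(-1, 1, 28, 28)

def reconstruction_error(ae: nn.Module, x: torch.Tensor) -> float:
    """Mean squared error between an input batch and its reconstruction.

    If the autoencoder was trained on the classifier's training data, a high
    error suggests the input is far from that distribution, so the reported
    accuracy of the deployed model may not apply to it.
    """
    ae.eval()
    with torch.no_grad():
        return nn.functional.mse_loss(ae(x), x).item()

# Usage sketch: `ae_mnist.pt` and THRESHOLD are hypothetical.
# ae = AutoEncoder()
# ae.load_state_dict(torch.load("ae_mnist.pt"))
# if reconstruction_error(ae, my_batch) > THRESHOLD:
#     print("Input looks out-of-distribution; treat predictions with caution.")
```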

Independent of the comments above, this is a really good project that can benefit many people. Where will the results be published?

Thanks


Privacy is definitely a tricky challenge in web3 (I have thought about this a lot). For redundant decentralized compute networks, we would actually have to assume that the model is open source; otherwise, how would additional nodes be able to support the workload? This obviously poses a problem for model developers who want to monetize their work.

If we shift the trust assumptions a bit, we could potentially run models only in trusted environments, but this hurts the resiliency of the deployment.

One option here for privacy guarantees on the data side would be to train and deploy a homomorphically encrypted model. But this is probably out of scope for this first project :slight_smile:

Decentralized training and validation is another really interesting area of work. I believe Fetch has already implemented some stuff here.

All that said… this is a great question @Mark! I will add a section for evaluating privacy features on each of these networks.

I will be publishing results on Twitter, Medium, and back here in the forum (+ code on GitHub).


Hey folks! I wanted to share some progress on this project.

I have been pushing code to a repo here:

So far I have knocked out:

  • simple CNN model training (a condensed sketch follows this list)
  • Ocean asset deployment (model weights, MNIST dataset, inference algorithm)
  • Ocean compute-to-data job
  • CNN inference algorithm
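
As a point of reference, here is a condensed sketch of what the training step looks like (assuming PyTorch/torchvision; the architecture and file name are illustrative and may differ from what is in the repo):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative CNN for MNIST; the repo's actual architecture may differ.
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train(epochs: int = 1) -> None:
    data = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=64, shuffle=True)
    model = SimpleCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    # Saved weights like these are what get published as an Ocean asset.
    torch.save(model.state_dict(), "cnn_mnist.pt")

if __name__ == "__main__":
    train()
```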

Some example artifacts on the Ocean marketplace:
https://market.oceanprotocol.com/asset/did:op:0c3c2c9099c67a49128379e5781e48fccb6125d335464d07edcceae78e7f729c
https://market.oceanprotocol.com/asset/did:op:9a4c1fe5fcec7d4071b8afdbb17dcc2c05b1ecf8cf512f556ef8f38116a594c3
https://market.oceanprotocol.com/asset/did:op:733ca2bfe560f6a66cc1e5cc2b6b91799db77e5fdbd41d4e1a402cb944bcdf43


It is great to have an update, thanks! I’m very interested to hear more about what you have learned and how “ready” these approaches are.

Sure thing @Mark! More notes to come, and I’m looking to continue the work here.
