Proposal: ML + web3 model deployment survey

Name of Project:
ML + web3 model deployment survey

Proposal in one sentence:
Conduct a hands-on survey of ML model deployment options for web3 use cases

Description of the project and what problem it is solving:

The intersection of ML and web3 is very important, but currently very green (more thoughts on that here). One of the core capabilities that will be needed is model deployment and inference.

The goal of this project is to evaluate a few of the more popular decentralized compute networks for their readiness to support ML model deployment. This will involve going through a live deployment of a CNN trained on MNIST, consuming that model from a client, and assessing each solution.

More specifically, I will be evaluating:

  • Ocean compute-to-data
  • Bacalhau
  • Golem Network
  • Fetch.ai

For the following properties:

  • economics and monetization
  • inference or job types
  • hardware support
  • model and data formats
  • model management

Grant Deliverables:

  • written report on findings and results
  • code for each deployment experiment

Squad

Devin Conley


I wonder about the privacy and transparency. I guess the model is never accessible to the user of the model? So the data needs to be sent somewhere, and I assume there are privacy guarantees around that. It would also be interesting to know whether the data being fed to the model is close to the training data. Something like an autoencoder trained on the training data could show how well it replicates my input, which would give me an idea of how trustworthy the model is relative to its reported results.
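
For what it's worth, a minimal sketch of that autoencoder idea (assuming PyTorch; the architecture, file name, and threshold are illustrative, not part of the proposal):

```python
import torch
import torch.nn as nn

# Illustrative autoencoder over the same 28x28 MNIST inputs the deployed model expects.
class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(64, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x)).view(-1, 1, 28, 28)

def reconstruction_error(ae: nn.Module, x: torch.Tensor) -> float:
    """Mean squared error between an input batch and its reconstruction.

    If the autoencoder was trained on the classifier's training data, a high
    error suggests the input is far from that distribution, so the reported
    accuracy of the deployed model may not apply to it.
    """
    ae.eval()
    with torch.no_grad():
        return nn.functional.mse_loss(ae(x), x).item()

# Usage sketch: `ae_mnist.pt` and THRESHOLD are hypothetical.
# ae = AutoEncoder()
# ae.load_state_dict(torch.load("ae_mnist.pt"))
# if reconstruction_error(ae, my_batch) > THRESHOLD:
#     print("Input looks out-of-distribution; treat predictions with caution.")
```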

Independent of the comments above, this is a really good project that can benefit many people. Where will the results be published?

Thanks


Privacy is definitely a tricky challenge in web3 (I have thought about this a lot). For redundant decentralized compute networks, we would actually have to assume that the model is open source; otherwise, how would additional nodes be able to support the workload? This obviously poses a problem for model developers who want to monetize their work.

If we shift the trust assumptions a bit, we could potentially run models only in trusted environments, but this hurts the resiliency of the deployment.

One option here for privacy guarantees on the data side would be to train and deploy a homomorphically encrypted model. But this is probably out of scope for this first project :slight_smile:

Decentralized training and validation is another really interesting area of work. I believe Fetch has already implemented some stuff here.

All that said… this is a great question @Mark! I will add a section for evaluating privacy features on each of these networks.

I will be publishing results on Twitter, Medium, and back here in the forum (+ code on GitHub).


Hey folks! I wanted to share some progress on this project.

I have been pushing code to a repo here:

So far I have knocked out:

  • simple CNN model training (a condensed sketch follows this list)
  • Ocean asset deployment (model weights, MNIST dataset, inference algorithm)
  • Ocean compute-to-data job
  • CNN inference algorithm
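
As a point of reference, here is a condensed sketch of what the training step looks like (assuming PyTorch/torchvision; the architecture and file name are illustrative and may differ from what is in the repo):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative CNN for MNIST; the repo's actual architecture may differ.
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train(epochs: int = 1) -> None:
    data = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=64, shuffle=True)
    model = SimpleCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    # Saved weights like these are what get published as an Ocean asset.
    torch.save(model.state_dict(), "cnn_mnist.pt")

if __name__ == "__main__":
    train()
```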

Some example artifacts on the Ocean marketplace:
https://market.oceanprotocol.com/asset/did:op:0c3c2c9099c67a49128379e5781e48fccb6125d335464d07edcceae78e7f729c
https://market.oceanprotocol.com/asset/did:op:9a4c1fe5fcec7d4071b8afdbb17dcc2c05b1ecf8cf512f556ef8f38116a594c3
https://market.oceanprotocol.com/asset/did:op:733ca2bfe560f6a66cc1e5cc2b6b91799db77e5fdbd41d4e1a402cb944bcdf43


It is great to have an update, thanks! I’m very interested to hear more about what you have learned and how “ready” these approaches are.

Sure thing @Mark! More notes to come, and I’m looking to continue the work here.
