Building a web3 social-media-based recommendation system curriculum / toolbox

By Dhruv Malik, Amit Malik.

About:

Since the first demonstrations of search indexing and semantic reasoning by Amazon and Google, recommendation systems have come a long way: they have not only made indexing and searching the ontologies of various products, ideas, and services efficient and scalable, but have also ushered in a new era of the Information Age by creating a value chain for supporting fields, primarily information retrieval and AI.

Thus recommendation systems as a whole represent a "Pandora's box" for uncovering the true promise of decentralised data science for web3 dapps. Following the excellent article by Algolia, we can describe this as a vertical composed of:

(1) Preparing data sources
(2) Feature engineering and feature store
(3) Machine learning models
(4) Predictions
(5) User inputs
(6) Results
(7) Evaluation
(8) AI ethics
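
To make these verticals concrete, here is a minimal Python sketch of how the stages could be wired together in a notebook. Every function and the toy data below are placeholders we invented for illustration, not part of any real library or of our final design.

```python
# Hypothetical end-to-end skeleton of the recommendation pipeline verticals.
# All functions and data are illustrative placeholders.

def prepare_data_sources():
    """(1) Pull raw posts/profiles, e.g. from the Lens API or IPFS."""
    return [{"profile": "alice.lens", "bio": "defi researcher"},
            {"profile": "bob.lens", "bio": "nft artist and collector"}]

def build_features(records):
    """(2) Feature engineering: turn raw text into model-ready features."""
    return [{"profile": r["profile"], "tokens": r["bio"].split()} for r in records]

def train_model(features):
    """(3) Fit a (toy) model; in practice this would be a real recommender."""
    vocab = {tok for f in features for tok in f["tokens"]}
    return {"vocab": vocab}

def predict(model, features, user_query):
    """(4)+(5) Score items against a user input by naive token overlap."""
    query_tokens = set(user_query.split())
    return sorted(features,
                  key=lambda f: len(query_tokens & set(f["tokens"])),
                  reverse=True)

def evaluate(ranked):
    """(7) Evaluation hook: report something measurable (here, just the top hit)."""
    return ranked[0]["profile"] if ranked else None

if __name__ == "__main__":
    data = prepare_data_sources()
    feats = build_features(data)
    model = train_model(feats)
    ranked = predict(model, feats, "defi researcher")   # (6) results
    print("top recommendation:", evaluate(ranked))
```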

Each of these domains is vast enough to build a new data economy for web3, but the challenge lies in reducing the complexity for core contributors in the data science space (data scientists, analysts, and web3 developers) who need to learn the different tools and services to quickly build their test applications on top of them.

Thus we propose building a set of mini projects in the space of decentralised social media (Lens Protocol) as a starting point for creating learning resources for developers and researchers in the field.

We will do this by building a recommendation system for textual data across domains (on topics like user profile description matching, fighting bots, and checking that discussions follow community guidelines), then building a demo of the results prediction with real data, and finally explaining the whole process in a Jupyter notebook.
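
As a first taste of the profile-description-matching use case, here is a minimal sketch using scikit-learn's TF-IDF vectoriser and cosine similarity. The profile bios and query are made-up stand-ins for Lens profile metadata; a stronger baseline would use sentence embeddings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up profile bios standing in for Lens profile metadata.
profiles = [
    "solidity developer building defi tooling",
    "digital artist minting generative nfts",
    "data scientist exploring on-chain analytics",
]
query = "looking for defi smart contract developers"

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(profiles + [query])

# Similarity of the query (last row) against every profile bio.
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
for bio, score in sorted(zip(profiles, scores), key=lambda x: -x[1]):
    print(f"{score:.2f}  {bio}")
```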

Aim of the proposal

  1. Our idea is to build training guidelines and content that work as a data science toolbox, similar to the approach used by OG devs such as Patrick Collins' Solidity course and Austin Griffith's Scaffold-eth, combined with the expressivity of data science models as seen on Kaggle.

  2. The second pillar of our grant is adherence to the web3 tech stack for sourcing data (IPFS, subgraphs, Ocean data pools, etc.) to promote its utilisation.

Tech stack and workflow:

  • For web3:

We will build on NPM + TS/JS frameworks (Hardhat + ethers) for hosting the application.

  • Python frameworks:

For data preparation and for developing the models for training and inference on the given data.

  1. Crawling the data by reusing the scripts from lens-examples (see the crawling sketch after this list).

  2. Running data cleaning and basic inference tasks in Jupyter notebooks or with pre-trained model zoo libraries like Hugging Face (see the classification sketch after this list).

  3. Based on the results, transferring the training weights / model understanding to an efficient browser-based ML framework like TensorFlow.js.

  4. Documenting the results and working on basic examples.

  5. Discussing the remaining verticals that are not practical for now but are in active research.
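
For step 1, the actual crawling will reuse the lens-examples scripts, which remain the source of truth. Purely to illustrate the kind of request involved, here is a hedged Python sketch against the public Lens GraphQL API; the endpoint URL and query fields are assumptions taken from the public docs and may need adjusting.

```python
import requests

# Assumed public Lens API endpoint and an illustrative query; the real crawl
# will reuse the scripts in lens-examples.
LENS_API = "https://api.lens.dev"

QUERY = """
query ExplorePublications {
  explorePublications(request: { sortCriteria: LATEST, limit: 10 }) {
    items {
      ... on Post {
        id
        metadata { content }
      }
    }
  }
}
"""

response = requests.post(LENS_API, json={"query": QUERY}, timeout=30)
response.raise_for_status()
items = response.json()["data"]["explorePublications"]["items"]
posts = [item["metadata"]["content"] for item in items if "metadata" in item]
print(f"fetched {len(posts)} posts")
```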
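
For step 2, this is a minimal sketch of the kind of off-the-shelf inference we have in mind: the Hugging Face transformers pipeline doing zero-shot classification of a post against a few guideline labels. The model choice, the example post, and the labels are illustrative, not our final setup.

```python
from transformers import pipeline

# Zero-shot classification with an off-the-shelf NLI model from the Hugging Face hub.
# The labels below are made-up stand-ins for real community-guideline categories.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

post = "Send 0.1 ETH to this address and get 1 ETH back, guaranteed!"
labels = ["scam or phishing", "spam", "on-topic discussion"]

result = classifier(post, candidate_labels=labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{score:.2f}  {label}")
```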

Optional
Ideally, we will try to deliver one use case that gives better-than-benchmark results on one of the prominent web3 social media / search engine platforms (e.g. Lenster or a Mask Network plugin integration), showing the results in a testnet environment.

Team members intro:

  1. Dhruv Malik: web3 developer specialising in smart contract protocol development and system design, also passionate about following trends in MLOps and continual learning models (GANs, reinforcement learning, etc.). Experience with TS/npm and web3 frameworks like web3.js, ethers, and Foundry.

  2. Amit Malik: data scientist with 2+ years of experience in the space, eager to enter web3.

Amount: $1000

Timeline (1 month)

Scope:

  1. Writing a whitepaper describing the use cases for which we will implement the recommendation system, along with the tech stack - 1 week
  2. Building the first use case, with a report (generally a Jupyter notebook for the results prediction) - 1.5-2 weeks
  3. Based on feedback, implementing another use case - 1-1.5 weeks

Challenges

Given that this is our first grant work in data science, we will need feedback from data scientists and other devs to validate the actual results of our models and tooling, while also ensuring foremost that we adhere to the timelines and provide a great resource for the community.

Thanks for reading the proposal; @amit and I are available on this thread to answer your queries.
