Name of Project: Client-side (offline-first) conversational (multilingual) Machine Translation
Proposal in one sentence: Enable a conversation mode with (multilingual) machine translation models that runs fully offline, client-side.
Description of the project and what problem it is solving:
Client-side, offline-first ML has really picked up pace of late, for several reasons: more and more users are realising the importance of privacy, and running models locally offers better security and better reliability/availability. While solutions exist for deploying smaller models client-side, such as TensorFlow.js and ONNX Runtime Web, they are still limited when it comes to running truly deep and large ML models on the client. I would like to target one popular use case in this space: offline-first Machine Translation (MT). MT has wide applicability in our daily workflows, be it while browsing or when we travel to countries that use a language of communication different from our native one, to name a few.
There have been notable recent SoTA advancements in sequence-to-sequence modelling for MT, such as mBART, but it remains to be seen whether they can be used offline in a user-friendly way, as a genuine alternative to privacy-intrusive services such as Google Translate. With this grant, I plan to explore exactly that. A super cool feature I'm planning is stitching a robust deep-learning speech recognition system (such as OpenAI Whisper) together with a translation model for a hands-free translation experience with a good UX. Just say the sentence, and poof, you have the translation - super useful when you want to converse with people in another country!
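The hands-free flow described above is just a composition of two stages: speech recognition produces source-language text, and a translation model maps it into the target language. A minimal sketch of that composition follows; the function names and toy stand-in models are illustrative only (in practice the stages would be something like Whisper for ASR and mBART for MT, selected during the survey phase):

```python
# Sketch of the proposed hands-free pipeline: an ASR stage feeds its
# transcript into an MT stage. The stages are passed in as plain callables
# so the same composition works regardless of which models are chosen.

def conversational_translate(audio, transcribe, translate):
    """Speech in the source language -> text in the target language."""
    source_text = transcribe(audio)   # ASR stage (e.g. a Whisper-class model)
    return translate(source_text)     # MT stage (e.g. an mBART-class model)

# Toy stand-ins so the composition can be exercised without model weights.
def fake_asr(audio):
    return "where is the train station"

def fake_mt(text):
    lookup = {"where is the train station": "wo ist der Bahnhof"}
    return lookup[text]

print(conversational_translate(b"<audio bytes>", fake_asr, fake_mt))
```

Keeping the two stages behind plain callables also means the tool can swap in a smaller or distilled model later without touching the conversation logic.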
**Grant Deliverables:**
- A survey of the breadth of models that can be used for client-side MT and speech recognition, under the constraints of latency and memory.
- A tool to achieve support for offline-first conversational (multilingual) MT.
Brief description of how I will approach the project:
First, I will survey the best possible models that can be leveraged given the constraints of offline MT. During this phase I will evaluate whether any existing models can be used out of the box, or whether some form of model compression/distillation is needed. This will tentatively take about a couple of weeks. After that, for the remainder of the grant, I will move on to building the tool with the best possible UX within those constraints. Here too I will evaluate which libraries best suit offline-first MT, along with the framework for building the tool I have in mind (as proposed above).
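The survey phase's screening criteria (latency and memory) can be captured in a small harness like the one below. This is a sketch under assumed budgets; the candidate model, its size, and the threshold values are placeholders I chose for illustration, not measured results:

```python
# Minimal harness for the survey phase: time a model's translate() call and
# record its on-disk size, so candidates can be compared against a
# client-side latency/memory budget. The candidate below is a placeholder.
import time

def profile_candidate(name, translate_fn, sample_text, size_mb):
    start = time.perf_counter()
    translate_fn(sample_text)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {"model": name, "latency_ms": latency_ms, "size_mb": size_mb}

def within_budget(profile, max_latency_ms=500.0, max_size_mb=300.0):
    """Keep only models that fit a hypothetical client-side budget."""
    return (profile["latency_ms"] <= max_latency_ms
            and profile["size_mb"] <= max_size_mb)

# Placeholder standing in for a distilled MT model.
dummy_model = lambda text: text[::-1]
p = profile_candidate("distilled-mt (placeholder)", dummy_model,
                      "hello world", size_mb=120.0)
print(p["model"], within_budget(p))
```

Running each candidate through such a filter would make the out-of-the-box vs. compress/distill decision concrete rather than ad hoc.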
- Discord: restandvest#9326