OneShotNLP (NER): Perform NLP tasks in multiple languages using only one example


Proposal: Perform an NLP task (NER) given just one example in one language, and carry it out in multiple other languages.

Description: We have all seen how LLMs are changing the NLP space. The objective of this project is to replicate their capabilities on individual tasks, or at least to train a model in one language that is usable in many others.

The idea is to use a shared embedding space. Translation models that do not use English as an intermediary tend to project similar sentences from different languages close together in the embedding space, or at least with the same distribution (if not trained with normalization). We can use this property to perform the desired task. Sentence-level multilingual embedding models could also do the trick.

For NER, zero-shot transfer to other languages is hard because tokenization is not uniform across languages. But assuming the overall embedding of an entity stays the same even when the number of tokens differs, I think we can do this by working with spans instead of a seq2seq formulation.
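A minimal sketch of the span idea, under loud assumptions: `TOY_VECTORS` is a hand-made, hypothetical stand-in for a shared multilingual embedding space (a real system would use a multilingual encoder), and `embed_span`, `candidate_spans`, and `best_span` are illustrative names, not part of any library. One labeled English span serves as the prototype, and every candidate span in a sentence from another language is scored against it by cosine similarity.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy vectors standing in for a shared multilingual embedding space.
# Dimensions loosely mean: person-like, location-like, other.
TOY_VECTORS: Dict[str, Vector] = {
    "Barack": [1.0, 0.0, 0.0], "Obama": [1.0, 0.0, 0.0],
    "visited": [0.0, 0.0, 1.0], "Paris": [0.0, 1.0, 0.0],
    "Angela": [0.8, 0.0, 0.3], "Merkel": [0.8, 0.3, 0.0],
    "visitó": [0.0, 0.0, 1.0], "Madrid": [0.0, 1.0, 0.0],
}


def embed_span(tokens: Sequence[str]) -> Vector:
    """Mean-pool toy token vectors; unknown tokens map to the zero vector."""
    vectors = [TOY_VECTORS.get(t, [0.0, 0.0, 0.0]) for t in tokens]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_spans(n_tokens: int, max_len: int = 3) -> List[Tuple[int, int]]:
    """All (start, end) pairs, end exclusive, covering at most max_len tokens."""
    return [
        (i, j)
        for i in range(n_tokens)
        for j in range(i + 1, min(i + max_len, n_tokens) + 1)
    ]


def best_span(
    tokens: List[str],
    prototype: Vector,
    embed: Callable[[Sequence[str]], Vector],
    max_len: int = 3,
) -> Tuple[List[str], float]:
    """Return the candidate span most similar to the prototype, with its score."""
    scored = [
        (cosine(embed(tokens[i:j]), prototype), (i, j))
        for i, j in candidate_spans(len(tokens), max_len)
    ]
    score, (i, j) = max(scored, key=lambda item: item[0])
    return tokens[i:j], score


# One English example of a PERSON entity is the only supervision.
prototype = embed_span(["Barack", "Obama"])
target = ["Angela", "Merkel", "visitó", "Madrid"]
span, score = best_span(target, prototype, embed_span)
print(span)  # ['Angela', 'Merkel']
```

Because the comparison happens on the pooled span vector rather than on individual tokens, the number of tokens in the example and in the target does not have to match. In practice a similarity threshold would also be needed so that sentences containing no matching entity return nothing.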

[previous grant results in Grant1-pics – Google Drive, but my current project is unrelated to this]

Grant Deliverables:

  • One-shot NER: give a single example in one language and perform the task in multiple languages.
  • A model that is trained in one language and works on others (more robust than the previous one).


Discord Handle: Jaswanth#7375
Twitter: @NLPforEDU