Blurr - an open source library for fast.ai developers who want to train Hugging Face transformers

Name of Project:

blurr

Proposal in one sentence:

blurr is an integration library that provides fast.ai developers with everything they need to train, evaluate, and deploy Hugging Face transformer models. It currently has 227 stars on GitHub and is cited in numerous academic publications.

Description of the project and what problem is it solving:

Hugging Face transformers and fast.ai have emerged as key libraries in developing and deploying deep learning models. However, fast.ai provides nothing natively that allows developers to train the transformer models made available by Hugging Face. blurr is the bridge between the two.
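To make the gap concrete, below is a minimal sketch of the glue code a developer would otherwise write by hand to train a Hugging Face model with fast.ai's Learner. This is not blurr's API; the model name and the two-example dataset are purely illustrative.

```python
# A rough sketch of the manual plumbing blurr replaces -- NOT blurr's API.
# The model name and toy two-example dataset are purely illustrative.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset
from fastai.data.core import DataLoaders
from fastai.learner import Learner
from fastai.losses import CrossEntropyLossFlat
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class HFWrapper(nn.Module):
    "Unwrap the Hugging Face output object into raw logits for fast.ai."
    def __init__(self, hf_model):
        super().__init__()
        self.hf_model = hf_model

    def forward(self, input_ids):
        return self.hf_model(input_ids=input_ids).logits

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
hf_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Tokenize a toy dataset and pack it into something fast.ai can load.
enc = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
ds = TensorDataset(enc["input_ids"], torch.tensor([1, 0]))
dls = DataLoaders.from_dsets(ds, ds, bs=2)

learn = Learner(dls, HFWrapper(hf_model), loss_func=CrossEntropyLossFlat())
learn.fit(1, lr=2e-5)
```

blurr's goal is to hide this plumbing behind the DataBlock and Learner idioms fast.ai users already know.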

In particular, blurr is an open source library designed to feel familiar to fast.ai developers who are just starting out with the course and/or the fastbook. It is designed specifically for those individuals who then want to train Hugging Face models for the core NLP tasks supported by the transformers library.

blurr currently supports the core NLP tasks of sequence classification, token classification, extractive question answering, language modeling (MLM and Causal LM), summarization, and translation. In addition to ensuring that new transformer architectures are supported by blurr, I also want to extend the library to support vision, audio, and perhaps RL tasks as well.

Grant Deliverables:

  • Deliverable 1: Update the library to support the latest versions of Hugging Face and fast.ai, including updates to the library documentation and website. Core NLP tasks for currently supported architectures will be fully tested. Will include notebooks that can be run in Colab or locally as part of the documentation.

  • Deliverable 2: Update the library to support all transformer architectures currently available in Hugging Face. Include an API that makes it easy to publish models/tokenizers to the Hugging Face Hub (see the sketch after this list).

  • Deliverable 3: Include initial support for vision transformers. Will include notebooks that can be run in Colab or locally as part of the documentation.

  • Deliverable 4: Include initial support for audio-focused transformers. Will include notebooks that can be run in Colab or locally as part of the documentation.
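For reference on the Hub publishing piece of Deliverable 2, the transformers library already ships `push_to_hub` helpers; the sketch below uses those directly. blurr's own API may end up looking different, and the repo id is hypothetical.

```python
# Minimal sketch of Hub publishing with stock transformers helpers --
# blurr's planned API may differ, and the repo id below is hypothetical.
# Assumes you have authenticated first, e.g. via `huggingface-cli login`.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

model.push_to_hub("ohmeow/blurr-demo")      # hypothetical repo id
tokenizer.push_to_hub("ohmeow/blurr-demo")
```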

Squad:

Wayde Gilliam (ohmeow, owner) is a full-stack web/ML developer currently working at UC San Diego and providing educational/consulting services via his company, ohmeow.com. He is an active contributor and member of the fast.ai community, author of the ohmeow-blurr library, and co-lead for the Weights & Biases sponsored Hugging Face and fast.ai study group. You can find him on Twitter as @waydegilliam and on GitHub as @ohmeow.


Excellent proposal! I would also suggest (as either a deliverable or something else) adding Colab and/or Jupyter notebooks as part of the documentation for an easier getting-started experience.

Big fan of both projects, really like the initiative!


Thanks for the props!

Added your suggestions as they’re pretty easy to set up given the library is built using nbdev.