Docs for TRLX: the neXt generation Reinforcement Learning library for Transformer-based architectures

mistobaan · January 10, 2023, 5:03am

Name of Project: trlx v0.5 Documentation and Examples

Proposal in one sentence: Improve the accessibility of trlx v0.5, a Python library for fine-tuning language models using reinforcement learning, by adding more documentation and practical examples. GitHub Repo

Description of the project and what problem it is solving: trlx is a valuable tool for organizations using representation learning and reinforcement learning to study human preferences at scale. However, it can be difficult for new users to understand how to use the library and apply it to their projects without clear documentation and examples (see trlx @ readthedocs). This project aims to improve the accessibility of trlx by adding more documentation and practical examples, making it easier for a broader range of practitioners and engineers to learn how to use the library and benefit from its capabilities.

trlx is one of the many open-source efforts of CarperAI.

This aligns with the mission of the Algovera Foundation to support the development of decentralized AI products and provide resources for AI teams.

Grant Deliverables

Improved documentation for trlx v0.5, including installation instructions, usage examples, and explanations of key concepts and functions
At least 3 new practical examples showcasing the use of trlx in real-world scenarios
Updated README and code comments that better reflect the updated documentation
Creation of a new tutorial or video walkthrough of trlx for new users to follow along with

Overall, the goal of these deliverables is to make trlx more accessible and easier to use for a wider range of practitioners and engineers. By providing clear documentation and practical examples, users will be able to better understand the capabilities and limitations of trlx and how to apply it to their own projects.

Squad Lead: Fabrizio Milo: I have a Master's in Computer Science and have contributed to many open-source projects in the AI field, from the early days of TensorFlow to the latest gpt-neox codebase (see user Mistobaan on GitHub). I am passionate about AI and all the amazing things it is enabling. I am all in on accelerating this process.

Twitter handle: @fabmilo
Discord handle: mistobaan#2737

Mark · January 11, 2023, 1:33pm

This is a great project. I wonder if it makes sense to reach out to the BLOOM project? Do you know if they have anything similar?

silentspring30 · January 11, 2023, 11:49pm

Hi Fabrizio, great presentation yesterday. Thank you for all the open-source work you do!! Pretty impressive. I’d like to learn more about trlx.

I was wondering if the deliverables might be too ambitious for the small grant. Perhaps break it down into two months and two grants? Also are you planning to write it all yourself and produce the videos or trying to find someone from the community?

A good approach is to write and use a tutorial template yourself and then pass it on to someone who can do finetuning.

mistobaan · January 12, 2023, 8:18pm

Thank you! If you want to learn more about trlx, then you are in luck! The outcome of this proposal will help you a lot.

I was wondering if the deliverables might be too ambitious for the small grant. Perhaps break it down into two months and two grants? Also are you planning to write it all yourself and produce the videos or trying to find someone from the community?

For the documentation part, I will use OpenAI Codex to quickly create the bare-bones documentation, which the trlx maintainers and I will review.
Writing good examples is definitely the most laborious part, as you want to have a few at various levels of difficulty. I like the template idea, so that would probably be level 0 and would include the video on how to set up a basic workspace with trlx, how to install it, and how to run a toy model on a toy dataset (I am thinking of Karpathy's nanoGPT on a toy dataset, TBD, probably a command-line completion dataset) 8)

This is my first grant request to this community, so I tried to have a strong value proposition, but having another month of grant funding would definitely help and motivate. There is tons of very exciting work to be done in the next weeks!

Let me know if you want me to break this proposal down into two parts. Any advice to make it successful is welcome.

Thanks

mistobaan · January 15, 2023, 7:10am

As far as I know, they (the BLOOM project) only have the language model. Someone could implement a way to use it on top of trlx. They should be compatible with some glue code. It needs more investigation if someone feels the itch.

Mark · January 15, 2023, 7:45am

The repo summary is "A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)". Is TRLX also intended to be used for pre-training/fine-tuning (without RLHF)? There is a project called TRL, but it is not clear to me whether TRLX is derived from it.

mistobaan · January 19, 2023, 3:51pm

Yes, you would just have to configure a no-op for the RLHF part. It should be supported nicely anyway, since it is the exact same pipeline without the final additional RLHF step.
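
To make the "no-op" idea concrete, here is a toy sketch in plain Python. It is only an illustration of the pattern, not trlx code: every name in it (`run_pipeline`, `supervised_finetune`, `rlhf_step`, `no_op`) is hypothetical, and the "model" is just a list recording which stages ran.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: a list recording which stages were applied.
Model = List[str]
Stage = Callable[[Model], Model]


def supervised_finetune(model: Model) -> Model:
    # Placeholder for ordinary fine-tuning on a dataset.
    return model + ["supervised fine-tuning"]


def rlhf_step(model: Model) -> Model:
    # Placeholder for the final RLHF step.
    return model + ["RLHF"]


def no_op(model: Model) -> Model:
    # Leaves the model unchanged, which effectively skips the stage.
    return model


def run_pipeline(stages: List[Stage]) -> Model:
    # Apply each stage in order, starting from an empty "model".
    model: Model = []
    for stage in stages:
        model = stage(model)
    return model


if __name__ == "__main__":
    print(run_pipeline([supervised_finetune, rlhf_step]))
    print(run_pipeline([supervised_finetune, no_op]))
```

Swapping `rlhf_step` for `no_op` leaves the rest of the pipeline untouched, which is the sense in which plain fine-tuning is "the same pipeline" minus the last step. How this is actually configured in trlx would need to be checked against its documentation.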