Docs for TRLX: the neXt generation Reinforcement Learning library for Transformer-based architectures

Name of Project: trlx v0.5 Documentation and Examples

Proposal in one sentence: Improve the accessibility of trlx v0.5, a Python library for fine-tuning language models using reinforcement learning, by adding more documentation and practical examples. GitHub Repo

Description of the project and what problem it is solving: trlx is a valuable tool for organizations using representation learning and reinforcement learning to study human preferences at scale. However, it can be difficult for new users to understand how to use the library and apply it to their projects without clear documentation and examples (see trlx @ readthedocs). This project aims to improve the accessibility of trlx by adding more documentation and practical examples, making it easier for a broader range of practitioners and engineers to learn how to use the library and benefit from its capabilities.

trlx is one of the many open-source efforts of CarperAI.

This aligns with the mission of the Algovera Foundation to support the development of decentralized AI products and provide resources for AI teams.

Grant Deliverables

  • Improved documentation for trlx v0.5, including installation instructions, usage examples, and explanations of key concepts and functions
  • At least 3 new practical examples showcasing the use of trlx in real-world scenarios
  • Updated README and code comments that better reflect the new documentation
  • Creation of a new tutorial or video walkthrough of trlx for new users to follow along with

Overall, the goal of these deliverables is to make trlx more accessible and easier to use for a wider range of practitioners and engineers. With clear documentation and practical examples, users will be able to better understand the capabilities and limitations of trlx and how to apply it to their own projects.

Squad Lead: Fabrizio Milo: I have a Master's in Computer Science and I have been helping many open-source projects in the AI field, from the early days of TensorFlow to the latest GPT-NeoX codebase (see user Mistobaan on GitHub). I am passionate about AI and all the amazing things it is enabling. I am all in on accelerating this process.

  • Twitter handle: @fabmilo
  • Discord handle: mistobaan#2737

This is a great project. I wonder if it makes sense to reach out to the BLOOM project? Do you know if they have anything similar?

Hi Fabrizio, great presentation yesterday. Thank you for all the open-source work you do!! Pretty impressive. I’d like to learn more about trlx.

I was wondering if the deliverables might be too ambitious for the small grant. Perhaps break it down into two months and two grants? Also are you planning to write it all yourself and produce the videos or trying to find someone from the community?

A good approach is to write and use a tutorial template yourself and then pass it on to someone who can do finetuning.

Thank you! If you want to learn more about trlx, then you are in luck! The outcome of this proposal will help you a lot :smiley:

I was wondering if the deliverables might be too ambitious for the small grant. Perhaps break it down into two months and two grants? Also are you planning to write it all yourself and produce the videos or trying to find someone from the community?

For the documentation part, I will use OpenAI Codex to quickly create the bare-bones documentation, which the trlx maintainers and I will review.
Writing good examples is definitely the most laborious part, as you want to have a few at various levels of difficulty. I like the template idea, so that would probably be level 0. It will include the video on how to set up a basic workspace with trlx, how to install it, and how to run a toy model on a toy dataset. I am thinking of Karpathy's nanoGPT on a toy dataset, TBD, probably a command-line completion dataset 8)

This is my first grant request to this community, so I tried to have a strong value proposition, but having another month of grant funding would definitely help and motivate. There is tons of very exciting work to be done in the next weeks!

Let me know if you want me to break this proposal down into two parts. Any advice to make it successful is welcome.

Thanks

As far as I know, they only have the language model. Someone could implement a way to use it on top of trlx; the two should be compatible with some glue code. It needs more investigation if someone feels the itch.

The repo summary is "A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)". Is TRLX also intended to be used for pre-training/fine-tuning (without RLHF)? There is a project called TRL, but it is not clear to me whether TRLX is derived from it.

Yes, you would just have to configure a no-op for the RLHF part. It should be supported nicely anyway, since it is the exact same pipeline without the final additional RLHF step.
