August 7, 2024

Coactive Learning for Large Language Models

Coactive learning leverages implicit user feedback to efficiently personalize and improve large language models (LLMs).

Introduction

Large language models (LLMs) have become tremendously popular tools, especially for writing tasks. These applications generate text that end users can either accept or modify, improving efficiency in tasks such as email composition, customer service responses, and report generation. However, to truly meet user needs, these models must be personalized and contextually adapted. Enter coactive learning, a novel approach for training LLMs using implicit feedback, introduced by Aaron David Tucker, Kiante Brantley, Adam Cahall, and Thorsten Joachims from Cornell University. Read the full study here.

The Coactive Learning Model

Coactive learning leverages the edits users make to the text generated by LLMs. Instead of relying on gold-standard responses for supervised training, coactive learning only accepts that the user’s edited text is an improvement over the original. This model harnesses implicit feedback, which is naturally abundant in many applications that allow user edits to the model outputs, to personalize LLMs effectively.
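To make the core idea concrete, here is a minimal sketch of the classic coactive-learning update in a linear utility model: the weights are nudged toward the features of the user's edited text and away from the model's original output. The feature vectors and learning rate here are illustrative toys, not the representation used in the paper.

```python
import numpy as np

def coactive_update(w, phi_original, phi_edited, lr=1.0):
    """One coactive-style update: shift the utility weights toward
    the features of the user's (improved) edit and away from the
    features of the model's original output."""
    return w + lr * (phi_edited - phi_original)

# Toy 3-dimensional feature vectors (e.g., counts of style markers).
w = np.zeros(3)
phi_original = np.array([1.0, 0.0, 2.0])  # features of the generated text
phi_edited = np.array([1.0, 1.0, 1.0])    # features after the user's edit

w = coactive_update(w, phi_original, phi_edited)
# After the update, the edited text scores at least as high as the original.
assert w @ phi_edited >= w @ phi_original
```

The key point is that no absolute "gold" label is needed: the update only uses the relative claim that the edited text is better than what the model produced.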

The researchers have developed CoRLL (Coactive Reinforcement Learning from User Feedback), the first coactive learning algorithm for LLMs. Unlike conventional reinforcement learning from human feedback (RLHF), which demands explicit pairwise preference labels, CoRLL uses the implicit improvements users make as feedback. This significantly reduces the need for additional labeling efforts, making the learning process more efficient and user-centric.
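The contrast with RLHF can be illustrated with a small sketch of how logged user interactions could be converted into RLHF-style preference pairs without any extra labeling: whenever the final text differs from the generated text, the edit itself supplies the preference. The function name and log format below are hypothetical, not the authors' CoRLL implementation.

```python
def edits_to_preference_pairs(interaction_log):
    """Turn logged (prompt, generated, final) interactions into
    RLHF-style preference pairs, treating any user-edited text as
    preferred over the model's original output."""
    pairs = []
    for prompt, generated, final in interaction_log:
        if final != generated:  # the user edited the output: implicit preference
            pairs.append({"prompt": prompt,
                          "chosen": final,
                          "rejected": generated})
    return pairs

log = [
    ("Write a greeting", "Dear Sir,", "Hi team,"),          # edited -> pair
    ("Summarize", "A short summary.", "A short summary."),  # accepted as-is
]
pairs = edits_to_preference_pairs(log)
# Only the edited interaction yields a preference pair.
assert len(pairs) == 1
```

Pairs in this shape could then feed any preference-based trainer, which is why edit logs are such a cheap substitute for explicit pairwise labels.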

Empirical Evidence and Applications

The study demonstrates that CoRLL performs competitively with traditional reinforcement learning from human feedback (RLHF) techniques, even in scenarios with noisy or weak feedback. Experiments across several benchmarks, including IMDB Positive Sentiment, TL;DR summarization, and Helpful and Harmless Assistant tasks, show that CoRLL learns effectively from implicit feedback.

The Role of Implicit Feedback

Implicit feedback is crucial in the coactive learning framework. Every time a user edits a text generated by an LLM, they provide a valuable signal about their preferences. CoRLL interprets these edits as preference feedback, allowing the model to learn and adapt to individual user styles and requirements. This approach not only improves the personalization of LLMs but also ensures that the models remain aligned with user expectations without the need for extensive explicit labeling.

Harnessing Implicit Feedback for Enhanced Personalization

A key finding of the study is that human edit data can be a valuable source of feedback that does not incur the additional labeling effort of dueling (explicit pairwise preference) feedback.

Nebuly is a platform built to streamline the workflow of capturing implicit feedback and integrating it into your LLM-powered product, in line with the study's findings and the principles of coactive learning. You can gather valuable feedback from user interactions without additional manual labeling effort. This feedback is then used to fine-tune LLMs, ensuring they are personalized to meet specific user needs.

The ability to capture and leverage implicit feedback is highly effective for developers and organizations aiming to enhance their AI systems' responsiveness and accuracy. Nebuly automates the feedback collection process and offers a way to integrate that feedback back into your products, enabling faster learning rates and better model performance.

Conclusion

Coactive learning represents a significant opportunity to advance the training and personalization of large language models with an efficient and user-centric approach.

With the right tooling, these benefits are readily attainable, providing an intuitive method to harness implicit feedback and drive continuous improvement in LLM capabilities.

If you’d like to learn more about Nebuly, please book a meeting with us HERE.
