11:15 AM PDT
2:15 PM EDT

Teaching large language models to zip their lips

Andrew Carr

Senior applied research scientist
Gretel AI

Despite rapid advances, language models can still leak sensitive information memorized during training. To address this, researchers have explored differential privacy (DP), which offers a mathematical privacy guarantee but is difficult to implement in practice. Reinforcement Learning from Privacy Feedback (RLPF) is a novel approach that replaces human feedback with privacy measures and uses reinforcement learning to enhance a model’s multi-task capabilities while reducing data leakage. This research opens the door to applying RLPF to other problems and paves the way for more secure AI systems.


