In the wake of surprisingly rapid progress in large language models (LLMs) like GPT-4, some experts have predicted that AI systems will be able to outperform human professionals at virtually all tasks within decades. Other experts are skeptical — they argue that LLMs’ capabilities have been overstated, and expect the technology to make a modest impact before running up against fundamental limitations.
To help build scientific understanding in this area, Open Philanthropy is looking to fund projects that shed light on the capabilities and impacts of systems built from LLMs.
We are doing this through two separate requests for proposals (RFPs) — one on benchmarking LLM agents, and the other on studying and forecasting the impacts of LLM systems.
Anyone is eligible to apply, including those working in academia, nonprofits, or independently; we are also open to making restricted grants to projects housed within for-profit companies. We will evaluate applications on a rolling basis. See below for more details.
Benchmarking LLM agents
Through this RFP, we aim to fund benchmarks that measure how close LLM agents can get to performing consequential real-world tasks.
LLM agents are very new, and their impact has been limited so far, but well-functioning agents could have much more wide-ranging applications than LLM chatbots like GPT-4 or Claude. By the same token, they could pose more extensive risks than chatbots, because they execute plans rather than merely creating them.
We hope to understand these potential outcomes by funding benchmarks that will reliably indicate whether and when LLM agents will be able to impact the world on a very large scale — for example, by replacing or outperforming humans in professions which account for a large share of the labor market.
See this page for the application link and more details on the RFP.
We also hosted a webinar to answer questions about this RFP on November 29, 2023; the recording is here and the slides are here.
Studying and forecasting the real-world impacts of LLM systems
Through this RFP, we aim to fund a broad array of research projects (aside from benchmarks for LLM agents) that might shed light on what real-world impacts LLM systems could have over the next few years.
Examples of ideas that could make for a strong proposal:
- Conducting randomized controlled trials to measure the extent to which access to LLM products can increase human productivity on real-world tasks.
- Polling members of the public about whether and how much they use LLM products, what tasks they use them for, and how useful they find them to be.
- Eliciting expert forecasts about what LLM systems are likely to be able to do in the near future and what risks they might pose.
See this page for the application link and more details on the RFP, including many additional examples of proposals that might interest us.