Open Philanthropy recommended a gift of $622,167 to the University of Oxford to support the development of a benchmark to test whether large language models can replicate results from computational science and engineering papers published on arXiv.org. The project will be led by Professor Jakob Foerster.
This gift was funded via a request for proposals for projects benchmarking LLM agents on consequential real-world tasks. The gift falls within our focus area of potential risks from advanced artificial intelligence.