Open Philanthropy recommended a grant of $1,000,000 over two years to Harvard University to support research led by Martin Wattenberg and Fernanda Viégas on artificial intelligence interpretability, controllability, and safety. Their research will focus on the extent to which large language models have developed internal models of the user and of themselves as distinct agents.
This grant falls within our focus area of potential risks from advanced artificial intelligence.