Recent years have seen rapid advances in artificial intelligence, from large language models demonstrating human-level performance on various benchmarks to AI systems mastering complex games and aiding scientific discovery. These developments have led many experts to predict that transformative AI — systems that could match or surpass human intelligence and dramatically alter the course of human civilization — may arrive earlier than previously thought. While such technology could bring tremendous benefits, it also introduces unprecedented risks that warrant serious consideration.
As progress in AI has accelerated, we’ve seen a corresponding spike in interest in Open Philanthropy’s work on AI. Our work often focuses on mitigating potential risks from AI development, though we also recognize that fields such as health care, economic development, and scientific research stand to be transformed positively by AI. While this compilation doesn’t cover these benefits in detail, they form an important part of the broader context in which we approach AI development.
Below, we’ve curated some of the most notable writings and interviews on AI by current and former Open Philanthropy staff from the last decade. They are not intended to represent Open Philanthropy’s “institutional view” on AI. No such view exists — our program staff, as well as our grantees, often disagree on many aspects of these debates.
While some of our older writings (pre-2023) don’t reflect the latest developments in AI, we’ve included them because some of their core insights remain relevant and illustrate the evolution of our thinking. We’ve explicitly noted where we’ve made significant updates to older posts, but generally encourage readers to prioritize our more recent work for the most up-to-date analysis.
1. Risks from transformative AI
Open Philanthropy works in areas where we think our funding will do the most to help others. These materials explain why we have long supported work to reduce risks from advanced AI and help society prepare for major advances. Note that the first two posts were published in 2015-2016, around the launch of our grantmaking work on AI.
1.1 Cause Report: Potential Risks from Advanced Artificial Intelligence (2015)
This shallow investigation into AI risk marked our first brief look at the area as we decided whether to prioritize it. The “Our views…” post from 2016 supersedes this report, but this is a great place to start if you’re interested in our earliest work on the topic.
1.2 Some Background on Our Views Regarding Advanced Artificial Intelligence (Holden Karnofsky, 2016)
We started funding work on risks from transformative AI in part because we thought AI had the potential to be transformative. In this 2016 piece, Holden Karnofsky — who co-founded and led Open Philanthropy for many years before leaving in 2024 — offers a definition for “transformative AI,” and presents arguments for a >10% chance that it will be developed by 2036.
1.3 Existential Risk from Power-Seeking AI (Joe Carlsmith, 2023)
Should we expect highly intelligent AI systems to be built? If so, could such systems seek power over humans in ways that could profoundly destabilize society? Senior Research Analyst Joe Carlsmith explores these questions in his essay, and also examines what he views as the core argument for believing that AI systems could indeed pose a threat to humanity. To quote Joe: “Intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire.”
You can also read a longer version of this report, or watch a shorter video presentation.
1.4 Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover (Ajeya Cotra, 2022)
How likely is a takeover by misaligned AI? Senior Program Officer Ajeya Cotra argues that it's more likely than not if we assume companies race ahead to train the most powerful systems they can, using what she considers the most straightforward vision for developing transformative AI.
1.5 Cold Takes on AI (Holden Karnofsky, 2021-2023)
Outside his work at Open Philanthropy, Holden published a large body of work about risks from transformative AI on his blog, Cold Takes. There are more posts than we have space to summarize here, but below are a few of our favorites.
1.5.1 “Most important century” series (2021)
In his “Most important century” series, Holden argues that the 21st century could be the most important ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological progress (and lead to a deeply unfamiliar future). Here is a detailed summary of the series.
1.5.2 How we could stumble into AI catastrophe (2023)
In this piece, Holden lays out a handful of hypothetical scenarios to illustrate how quickly developing transformative AI could result in global catastrophe. He also notes a few key ways humanity could develop more robust safeguards.
1.5.3 High-level hopes for AI alignment (2022)
Here, Holden discusses some of his high-level hopes for how we can avoid the central risks of misaligned AI. He discusses three key possibilities for navigating alignment:
- “Digital neuroscience”: Perhaps we’ll be able to read the “digital brains” of AI systems, so that we can know what they’re “aiming” to do directly (rather than having to infer it from their behavior).
- Limited AI: Perhaps we can make AI systems safe by limiting their capabilities in various ways.
- AI checks and balances: Perhaps we’ll be able to employ some AI systems to critique, supervise, and even rewrite others.
1.5.4 How might we align transformative AI if it’s developed very soon? (2022)
Imagine that an AI company thinks it can develop transformative AI within a year. Under this basic premise, Holden gives his understanding of the available strategies for aligning transformative AI if the company is right, and explains why these strategies might or might not work.
2. AI timelines
When we say that AI could be transformative, one natural response is: by when? Do we mean in ten years, or one hundred? These materials attempt to address this question, focusing specifically on forecasting when systems capable of performing almost any intellectual task at "human level" or better might be developed.
2.1 Forecasting transformative AI with biological anchors (Ajeya Cotra, 2020)
When could AI systems automate away the hardest and most impactful tasks humans do today? In this report, Ajeya estimates how much compute might be required to train such a model, and how three factors — better algorithms, cheaper compute, and available funding — might affect when leading AI companies (or governments) can afford to train such systems.
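The toy sketch below is not drawn from the report, and every parameter value in it is made up for illustration; it simply shows how the three factors interact in a framework of this shape: each year, the effective compute a leading lab can afford grows with spending, hardware price-performance, and algorithmic efficiency, and training becomes possible once that product crosses an assumed compute requirement.

```python
# Toy illustration of a biological-anchors-style affordability calculation.
# All parameter values below are made up for illustration; they are NOT
# the estimates from Ajeya Cotra's report.

REQUIRED_FLOP = 1e34          # assumed training compute requirement (illustrative)

spend = 1e8                   # training budget in dollars, starting in 2020 (illustrative)
flop_per_dollar = 1e17        # hardware price-performance in 2020 (illustrative)
algo_multiplier = 1.0         # algorithmic-efficiency gains relative to 2020

SPEND_GROWTH = 1.2            # budgets grow 20% per year (illustrative)
HARDWARE_GROWTH = 1.3         # FLOP per dollar grows 30% per year (illustrative)
ALGO_GROWTH = 1.25            # algorithms get 25% more efficient per year (illustrative)

year = 2020
while year < 2100:
    effective_flop = spend * flop_per_dollar * algo_multiplier
    if effective_flop >= REQUIRED_FLOP:
        print(f"Training becomes affordable around {year}")
        break
    spend *= SPEND_GROWTH
    flop_per_dollar *= HARDWARE_GROWTH
    algo_multiplier *= ALGO_GROWTH
    year += 1
```

Changing any of the assumed growth rates or the assumed requirement shifts the crossover year substantially, which is why the report spends most of its length justifying those inputs.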
2.2 What a Compute-Centric Framework Says About Takeoff Speeds (Tom Davidson, 2023)
“In the next few decades,” writes Senior Research Fellow Tom Davidson, “we may develop AI that can automate ~all cognitive tasks and dramatically transform the world.” Tom attempts to answer how long the process might take. He centers his inquiry around a specific question: Once AI can readily automate 20% of cognitive tasks, how much longer until it can automate 100%?
2.3 Could Advanced AI Drive Explosive Economic Growth? (Tom Davidson, 2021)
Since 1900, the economies of the world’s wealthiest countries have grown at about 2% annually. Yet Tom finds it plausible that by 2100, this figure could be ten times larger, given the possible development of AI systems “capable enough to replace human workers in most jobs.” In this report, Tom shares how he arrived at this conclusion, explains Open Philanthropy’s interest in AI timelines, and responds to potential objections. Appendix H links to multiple reviews of the report by economists, including back-and-forth discussions between Tom and some of the skeptical reviewers.
Note the “How to read this report” section at the beginning, which offers multiple options based on your level of technical background.
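As a quick, illustrative sense check of what a tenfold increase in the growth rate means (this arithmetic is not taken from the report itself): a constant annual growth rate r implies a doubling time of ln(2)/ln(1+r), so 2% growth doubles the economy roughly every 35 years, while 20% growth doubles it roughly every 4 years.

```python
import math

def doubling_time(annual_growth_rate: float) -> float:
    """Years for output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

print(f"{doubling_time(0.02):.1f} years at 2% annual growth")   # ~35.0
print(f"{doubling_time(0.20):.1f} years at 20% annual growth")  # ~3.8
```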
2.4 Semi-Informative Priors over AI Timelines (Tom Davidson, 2021)
What can historical examples of technological development (e.g., some STEM technologies) tell us about how AI might develop? Rather than focus on technical questions about AI, Tom uses this “outside view” approach to better understand AI’s possible trajectory. He also considers how different R&D variables — such as growing inputs of researcher time and compute — might speed up progress.
To read the full report (as opposed to the blog post linked above), see here.
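For a rough flavor of this kind of outside-view reasoning (a deliberately naive illustration, not the report's actual model, which treats first-trial probabilities and the choice of "trial" much more carefully): Laplace's rule of succession assigns probability 1/(n+2) to success on the next trial after n failures, so the implied forecast depends heavily on whether a "trial" is a calendar year, a researcher-year, or a unit of compute.

```python
def laplace_next_trial_probability(failures: int) -> float:
    """Laplace's rule of succession: probability of success on the next
    trial after observing `failures` consecutive failures and no successes."""
    return 1 / (failures + 2)

# Naively treating every calendar year since the 1956 Dartmouth workshop as
# one failed "trial" at building advanced AI -- a crude trial definition that
# the report scrutinizes in far more depth.
years_of_effort = 2021 - 1956
p = laplace_next_trial_probability(years_of_effort)
print(f"P(success in the next year) ~ {p:.3f}")  # roughly 0.015, i.e. ~1.5% per year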
2.5 How Much Computational Power Does It Take to Match the Human Brain? (Joe Carlsmith, 2020)
AI systems and human brains are hard to compare, but even rough analogies are helpful if we want to predict when AI systems might reach “human level” for many tasks. Joe takes up this difficult task in his report, using four different methods to estimate the level of computational power needed to match a human brain’s capabilities.
This blog post helpfully summarizes the report’s key points.
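For a feel for how estimates like these are constructed, here is a back-of-the-envelope calculation in the spirit of the report's "mechanistic method," using deliberately wide, illustrative parameter ranges rather than Joe's actual figures: multiply an assumed synapse count by an assumed rate of spikes through each synapse and by the FLOPs assumed to model each synaptic event.

```python
# Back-of-the-envelope sketch in the spirit of a mechanistic brain-compute
# estimate. All ranges below are illustrative assumptions, not the report's numbers.

synapse_count = (1e14, 1e15)     # assumed number of synapses in a human brain
spike_rate_hz = (0.1, 1.0)       # assumed average spikes per second through each synapse
flops_per_event = (1, 100)       # assumed FLOPs to model one spike through a synapse

low = synapse_count[0] * spike_rate_hz[0] * flops_per_event[0]
high = synapse_count[1] * spike_rate_hz[1] * flops_per_event[1]

print(f"Rough range: {low:.0e} to {high:.0e} FLOP/s")  # ~1e13 to ~1e17 FLOP/s
```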
3. Further reading and resources
3.1 More on AI timelines
- In 2022, Ajeya returned to her forecasting report and detailed how some of her predictions had changed since publication.
- Holden published a summary of Ajeya’s report, as well as commentary on its strengths and weaknesses.
- Holden discussed Tom’s semi-informative priors report in a larger Cold Takes piece about the burden of proof in forecasting transformative AI.
- This post links to Ajeya discussing her report in podcasts and conference talks, along with other summaries of the report and commentary from AI researchers outside of Open Philanthropy.
3.2 What is it to solve the alignment problem? (Joe Carlsmith, 2024)
The “alignment problem” — the challenge of ensuring that AI agents act in ways intended by the developer — is one of the field’s most fundamental issues. Suppose we’re able to “solve” alignment: what would that look like? Joe proposes four conditions:
1. A dangerous form of AI takeover has been avoided.
2. Superintelligent (and dangerous) AI agents have been built.
3. Society has access to the main benefits of superintelligent AI.
4. Humanity is able to elicit a significant portion of those benefits from some of the superintelligent AI agents described in (2).
3.3 Planned Obsolescence (Ajeya Cotra and Kelsey Piper, 2023)
Ajeya co-runs a blog with Vox journalist Kelsey Piper, covering a variety of concepts and questions around AI risk. Most of these posts are aimed at a general audience and can be understood without much context on AI. One highlighted post: “Aligned” shouldn’t be a synonym for “good”. Ajeya says, “Perfect alignment just means that AI systems won’t want to deliberately disregard their designers’ intent; it’s not enough to ensure AI is good for the world.”
3.4 Scheming AIs: Will AIs fake alignment during training in order to get power? (Joe Carlsmith, 2023)
Will advanced AIs fake alignment during training to gain power later? In his 2023 report, Joe concludes that “scheming,” as he calls it, is a “disturbingly plausible outcome.” In addition to assessing the overall likelihood of scheming, Joe discusses avenues for further research.
For more on scheming AIs, check out Joe on the Hear This Idea podcast or watch a presentation he gave about the report.
3.5 Information security careers for GCR reduction (Luke Muehlhauser and Claire Zabel, 2019)
As AI technology continues to improve, leading AI companies will likely be targets of increasingly sophisticated and well-resourced cyberattacks that seek to steal intellectual property. In fact, this is already happening.
In this 2019 post, Muehlhauser and Senior Program Officer Claire Zabel briefly make the case for why information security skills could be useful in reducing global catastrophic risks (GCRs), including risks from advanced AI. They speculate that GCR-focused infosec jobs may become common by the end of the decade, and that a lack of qualified candidates may become a serious limiting factor for certain research areas. As of 2024, the post's key points still apply: Muehlhauser and Zabel continue to think that GCR-motivated infosec talent is undersupplied.
3.6 Let’s use AI to harden human defenses against AI manipulation (Tom Davidson, 2023)
The capacity of large language models to act deceptively is a significant problem. In this post, Tom suggests a somewhat surprising line of defense: deliberately training models to act deceptively. If we can learn more about how they deceive, we'll be better able to detect and defend against those techniques moving forward.
3.7 Otherness and control in the age of AGI (Joe Carlsmith, 2024)
How should agents with different values relate to one another? What are the ethics of seeking and sharing power? In his “Otherness and control in the age of AGI” series, Joe explores these questions in the context of increasingly powerful AI systems, arguing that they are important to the discourse about existential risk from misaligned AI.
3.8 Podcasts
- Ajeya recently appeared on podcasts from the New York Times, Freakonomics, and 80,000 Hours to discuss existential risk from AI, drawing on her work on AI takeover scenarios.
- Holden has discussed his views on risks from transformative AI on the 80,000 Hours (2021, 2023) and Dwarkesh Patel (2023) podcasts. As a current Visiting Scholar at Carnegie California, he continues to write about these issues.
- Tom appeared on the 80,000 Hours podcast (2023) to discuss his compute-centric framework and other relevant work. He also appeared on the InFi podcast (2024) to discuss the likely progression of AI.