In February 2018, Open Philanthropy announced new openings for “generalist” Research Analyst (RA) roles, and we have since hired 5 applicants from that hiring round. This was one of our top priorities for 2018.
In this post I summarize our process and some lessons learned. I am hoping that in addition to our general audience, this post might be useful to others looking to hire a similar talent profile, and to potential future generalist RA applicants to Open Philanthropy.
Process overview
Sourcing: For this round, we experimented with many different outreach methods. In addition to posting the job ad to our website, we promoted our RA roles via an episode of the 80,000 Hours podcast, social media, personal outreach to promising individuals, emails to university departments, emails to a few special mailing lists, and more. As a result, many applicants reported hearing about our roles through multiple sources.
Application process: Our default process included a résumé screen, 3 remote work tests, an interview (usually remote), and finally a 3-month in-person trial. We tried to make that time investment feasible for as many applicants as possible by compensating people for the time spent on our remote work tests, and by offering a salary, health insurance, housing and moving assistance (for those not already local), and other benefits to those invited to our trial period, including 3 months of severance for those not made an offer after the trial period. We recognized that, despite these efforts, our process was still very demanding, but ultimately we felt it was best for both us and candidates: an application process that simulates many aspects of the eventual role allows both sides to make informed decisions about working together. In the end, fewer than 1% of applicants explicitly chose to withdraw at some point during the process, which might mean we missed out on fewer qualified candidates (as a result of our unusually demanding process) than we had initially feared. We’re grateful that so many candidates took the time to move through such a demanding application process.
Trial period: Our 3-month trial program, for 12 people, began in September 2018. The program was a mix of work projects, seminars, educational group activities (e.g. a customized forecasting workshop from Good Judgment Inc.), a few recreational group activities (e.g. at House of Air), and guest speakers. The work projects aimed to simulate RA work. Nearly every trialist did one large project evaluating the likely impact of our Farm Animal Welfare program, plus one or more additional projects customized to their most plausible specialization(s) within the broad category of RA-like work. We ultimately hired one applicant without a trial period (due to their outstanding performance on our work tests), and hired 4 more after the trial period.
Matchmaking: Given how many strong applicants did not ultimately receive a job offer, we offered to help connect some of the strongest applicants to other job openings we know about through our network. This work is ongoing, and we probably won’t have a sense of how many hires it leads to for another several months.
Lessons learned
(Not everything below is a “consensus view” within the RA hiring committee, but we do agree on the italicized high-level points.)
The pool of available talent is strong, but our current ability to assess and deploy generalist RAs is weak: In my view, perhaps the biggest overall lesson was that the pool of available and interested talent is quite strong (more than a hundred applicants had very strong résumés and seemed quite aligned with our mission), but our current ability to immediately assess and deploy this base of available talent is weak, for the reasons described below. In short, we learned that (a) our work needs to be better defined, and we need better training materials, before we can take on most of even the very strongest applicants, and (b) it is difficult and time-consuming for us to identify which applicants can contribute right away given the current state of our work. Because of this, we think that working on (a) and (b) over the next few years is one of the best things we can do to improve our long-run generalist RA hiring capacity. I suspect that multiple candidates we were not able to take on in our most recent round have the potential to excel in roles at Open Philanthropy in the future.
Difficulty of assessing applicants’ fit given the nature of our generalist RA work: Despite our time-intensive application process and trial period, in most cases we didn’t feel we had a good read on which candidates would be a good fit for our generalist RA roles until roughly 2 months into the in-person trial. We’ve learned many things we can improve about our screening process and our trial periods, but even for most late-stage candidates, I doubt we can get a good read on their immediate fit for this work much more quickly than 4-6 weeks into a trial (at this time). In part this is due to the difficulty of designing good work tests (see below), but it is also due to the nature of the work we have been hiring generalist RAs to do, much of which is work we haven’t done before, and thus isn’t well-defined and can’t be explained by reference to past examples. For example, we want some RAs to help us evaluate the past performance of our grantmaking portfolios, and investigate key premises of our current approach to cause prioritization, and we are still in the process of discovering what those functions should look like at Open Philanthropy.
Difficulty of designing good work tests: We put substantial work into designing a variety of “work tests” for both the application process and the trial program, but we think there was a lot of variance in how informative and fair these tests were for evaluating different applicants. In the application process, we used a combination of (a) tests adapted from GiveWell’s application process, and (b) a new test aiming to simulate the process of evaluating a potential grant. We ended up feeling that the new work test was somewhat diagnostic of an applicant’s fit for our RA role, but also may have unduly rewarded past familiarity with particular ideas. (We knew in advance this would be a problem, but didn’t have time to develop and test a more diagnostic “stage 3” work test.) We felt the tests borrowed from GiveWell were fairer and more reliably informative (GiveWell had developed and refined them over the course of years, and we had clear expectations for what constituted good performance), but less directly relevant to our work. We faced similar tradeoffs in designing work tests for the trial period, and felt that some of our work tests ended up being frustrating to work on and low-information, despite our attempts to make them maximally concrete and relevant. We think it could take several more “testing the tests” cycles to arrive at work tests we’re happy with, and we have now contracted a small number of people to help us with this.
Difficulty of trialing people at scale: Our trial program was designed to give a large number of applicants a chance to demonstrate immediate strong fit for our generalist RA work. To allow for so many trialists in 2018, we decided to run many trials simultaneously (12, in this case) with a large amount of standardized work, including one large work project that nearly all trialists worked on at the same time, and many whole-group discussions about specific topics. Unfortunately, there were several aspects of the trial program that didn’t go as well as we’d hoped. For example, (1) we learned from trialists that the program was more stressful than we’d anticipated, and (2) the lack of training materials and work product examples (see below) was a bigger problem than we’d expected, leading to a lack of clarity about how we planned to evaluate each trialist and project. For these reasons and others, we likely won’t be running another trial program at this scale anytime soon. In the near term, we will probably instead trial a much smaller number of RA applicants per year, with each trial period more customized to the individual trialist.
Need for guides, training materials, and work product examples: During the trial period it became clear to me that trialists would benefit from more guides (e.g. about expectations, management approach, etc.) and training materials (e.g. on how we construct impact estimates, reasoning transparency, etc.) than we had time to develop prior to (or during) the trial program. Another challenge was that because much of the work we were hiring for is work we haven’t done before but expect to do more of in the future (see above), we didn’t have completed past examples to show trialists, which made it difficult for them to get a sense of what a finished piece of work should look like. We have begun to develop such training materials and finished work product examples, but completing them could require years of work.
Sources of our best-performing applicants: In this round, our best-performing applicants came overwhelmingly from our pre-existing networks, and personal outreach often seems to have made a difference in whether someone chose to apply. Of the 58 applicants who performed best on our work tests, 90% came to our job ad via multiple sources in our pre-existing networks, 0% came from proactive “cold” outreach, and 26% “plausibly” (≥25% chance, in my view) would not have applied if not for proactive encouragement from Open Philanthropy. Of the 17 trial invitees, at least 10 came to the job ad via multiple sources in our pre-existing networks, and 7 “plausibly” would not have applied if not for proactive encouragement from Open Philanthropy. Of our 5 eventual hires, at least 2 came to our job ad via multiple sources in our pre-existing networks, and 3 “plausibly” would not have applied if not for proactive encouragement from Open Philanthropy.
Diversity: One sourcing failure is that our applicant pool was not as diverse as we would have liked. We tried especially hard to reach out to potential candidates from underrepresented backgrounds (e.g. via Jopwell and individual outreach), and we worked to minimize potential bias in our process (e.g. via blind grading of work tests), but even so, it was difficult to maintain a balanced pipeline given the initial applicant pool. Improving diversity and inclusion at Open Philanthropy continues to be a priority for a variety of reasons; e.g. we don’t want to miss out on strong contributors because they expect they’d feel out of place at Open Philanthropy. In the future we plan to experiment with additional changes to our recruiting process, including sourcing, that might help us improve on this front.
International applicants: Roughly 40% of all applicants, and 30% of the 58 applicants who performed best on our work tests, indicated they would need a visa to work in the United States. In 2018 it was fairly difficult for us to get work visas for promising applicants, but since then we have worked to improve our options for hiring international applicants, and have made substantial progress on this front.
What can someone do now to become a stronger fit for future Open Philanthropy generalist RA openings?
Below are some personal ideas for those who might be interested in a generalist RA-like role at Open Philanthropy in the future. They might or might not be worth the required time investment, and they are not necessarily endorsed by everyone on the RA hiring committee. In no particular order:
- Practice writing explanations of your thinking on complicated topics, and aim for strong clarity, succinctness, and reasoning transparency. Post these explanations to the web if you can, so that you can get feedback from readers.
- Familiarize yourself with basic concepts in cost-benefit analysis and effective altruism.
- Learn how to critically evaluate published evidence. E.g. see these GiveWell blog posts: Our principles for assessing evidence, How we evaluate a study, and Surveying the research on a topic.
- Be familiar with the basics of descriptive and inferential statistics, Bayesian reasoning, and microeconomics (supply and demand, comparative advantage, elasticity, marginal thinking, value of information, etc.).
- Practice making Fermi estimates.
- Do calibration training, both with trivia questions (e.g. see our app) and ideally also with real-world questions such as those featured on public prediction tournaments like PredictIt or Good Judgment Open.
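On the calibration point in particular, here is a minimal illustrative sketch (not a tool Open Philanthropy uses, and with made-up numbers) of one way you might score a log of your own probabilistic predictions in Python: the Brier score summarizes overall accuracy, and a simple bucketed comparison of stated probabilities against observed frequencies shows whether your “70%” predictions really come true about 70% of the time.

```python
import numpy as np

# Hypothetical forecast log (made-up numbers, purely for illustration):
# each entry pairs a stated probability with whether the event actually happened.
forecasts = np.array([0.9, 0.8, 0.7, 0.6, 0.95, 0.3, 0.55, 0.85, 0.65, 0.75])
outcomes = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

# Brier score: mean squared error between stated probabilities and outcomes.
# Lower is better; 0 would mean perfect confidence and perfect accuracy.
brier = np.mean((forecasts - outcomes) ** 2)
print(f"Brier score: {brier:.3f}")

# Simple calibration check: within each confidence bucket, compare the average
# stated probability to the observed frequency of the event.
bucket_edges = [0.5, 0.7, 0.9]
bucket_ids = np.digitize(forecasts, bins=bucket_edges)
for b in np.unique(bucket_ids):
    mask = bucket_ids == b
    print(
        f"bucket {b}: mean forecast {forecasts[mask].mean():.2f}, "
        f"observed frequency {outcomes[mask].mean():.2f} (n={mask.sum()})"
    )
```

Tracking something like this over many predictions gives a rough sense of whether your stated probabilities mean what you intend them to mean.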
We are very grateful for all of the time people put into applying, and are very excited about our new hires. We’re reflecting on these lessons publicly because we think we learned a lot and hope that doing so will help us and others hiring in a similar space do a better job with the process in the future.