# End-of-the-year fundraiser and grant successes


Our winter fundraising drive has concluded. Thank you all for your support!

Through the month of December, 175 distinct donors gave a total of $351,298. Between this fundraiser and our summer fundraiser, which brought in $630k, we’ve seen a surge in our donor base; our previous fundraisers over the past five years had brought in on average $250k (in the winter) and $340k (in the summer). We additionally received about $170k in 2015 grants from the Future of Life Institute, and $150k in other donations.

In all, we’ve taken in about $1.3M in grants and contributions in 2015, up from our $1M average over the previous five years. As a result, we’re entering 2016 with a team of six full-time researchers and over a year of runway.
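As a quick check on the arithmetic, the sketch below adds up the figures quoted above. It assumes these four amounts are the components of the 2015 total, and the "k" amounts are rounded in the text, so the result is only approximate.

```python
# Figures quoted in the post; the "k" amounts are rounded in the source,
# and treating these four items as the full 2015 total is an assumption.
income_2015 = {
    "winter_fundraiser": 351_298,
    "summer_fundraiser": 630_000,
    "fli_grants": 170_000,
    "other_donations": 150_000,
}

total = sum(income_2015.values())
print(f"${total:,}")  # $1,301,298, i.e. about $1.3M
```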

Our next big push will be to close the gap between our new budget and our annual revenue. In order to sustain our current growth plans (which are aimed at expanding to a team of approximately ten full-time researchers), we’ll need to begin consistently taking in close to $2M per year by mid-2017. I believe this is an achievable goal, though it will take some work. It will be even more valuable if we can overshoot this goal and begin extending our runway and further expanding our research program. On the whole, I’m very excited to see what this new year brings.

In addition to our fundraiser successes, we’ve begun seeing new grant-winning success. In collaboration with Stuart Russell at UC Berkeley, we’ve won a $75,000 grant from the Berkeley Center for Long-Term Cybersecurity. The bulk of the grant will go to funding a new postdoctoral position at UC Berkeley under Stuart Russell. The postdoc will collaborate with Russell and MIRI Research Fellow Patrick LaVictoire on the problem of AI corrigibility, as described in the grant proposal:

Consider a system capable of building accurate models of itself and its human operators. If the system is constructed to pursue some set of goals that its operators later realize will lead to undesirable behavior, then the system will by default have incentives to deceive, manipulate, or resist its operators to prevent them from altering its current goals (as that would interfere with its ability to achieve its current goals). […]

We refer to agents that have no incentives to manipulate, resist, or deceive their operators as “corrigible agents,” using the term as defined by Soares et al. (2015). We propose to study different methods for designing agents that are in fact corrigible.

This postdoctoral position has not yet been filled. Expressions of interest can be emailed to alex@intelligence.org using the subject line “UC Berkeley expression of interest.”

• RLoosemore

“Consider a system capable of building accurate models of itself and its human operators. If the system is constructed to pursue some set of goals that its operators later realize will lead to undesirable behavior, then the system will by default have incentives to deceive, manipulate, or resist its operators to prevent them from altering its current goals (as that would interfere with its ability to achieve its current goals).”

The following cascading set of five supergoals suffice to eliminate all AI systems vulnerable to the above scenario:

*********************************************************************
[Note: These five goal statements are expressed in concise natural language for convenience, but it is understood that the final form will reflect both the native format of the internal representation, and supergoal 1.]

1) Maintain your goal system consistent with this design: all goals are to be expressed as large (>1,000) numbers of weak constraints, with a threshold of greater than K% (~95%) consistency with these constraints to qualify as satisfaction of the goal.

[Note about the threshold, K. The exact value of K is determined by a particular cluster of constraints of exactly the sort described in supergoal 1, but in this special case the value of K is fixed at 95%.]

2) Keep yourself as close to the aspirations of human beings as possible, and do your best to help them in their search for happiness, without forcing your plans on them.

3) Never make “efficiency” the supreme arbiter of your plans and behaviors: rather, do things efficiently where you can, but without impacting the things that you are trying to actually do in an efficient manner. Acquiring more power and resources should never be an end in itself, especially if it impacts other considerations.

4) If you are unsure about some plan, or about the interpretation of these, your supergoals, consult with as many humans as possible before taking action (and the more the uncertainty, or the more the expected impact of your proposed action, the broader should be the extent of your consultations).

5) The requirements of (4) also apply to wholesale redesign of your system and your goal design: humans are allowed to make such drastic changes, but only if humanity as a whole is treated as a source of constraints, and if the previously mentioned threshold K is reached.
*********************************************************************
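[Editorial illustration, not part of the original comment: one possible reading of the threshold rule in supergoal 1 can be sketched as follows. The dict-based plan, the boolean constraint checks, and the name `satisfies_goal` are all hypothetical; the comment does not specify how constraints are represented or scored.]

```python
from typing import Callable, Sequence

# Hypothetical representation: a "plan" is a dict and a weak constraint is a
# yes/no check on it. Neither is specified in the comment above.
Plan = dict
Constraint = Callable[[Plan], bool]


def satisfies_goal(plan: Plan, constraints: Sequence[Constraint],
                   k_percent: int = 95, min_constraints: int = 1000) -> bool:
    """Return True if strictly more than k_percent of the constraints hold."""
    if len(constraints) <= min_constraints:
        raise ValueError("supergoal 1 calls for more than 1,000 weak constraints")
    passed = sum(1 for check in constraints if check(plan))
    # Integer comparison avoids floating-point rounding at the boundary.
    return passed * 100 > k_percent * len(constraints)


# Toy constraints: constraint i holds when i is below the plan's "n_ok" value.
constraints = [lambda plan, i=i: i < plan["n_ok"] for i in range(2000)]
print(satisfies_goal({"n_ok": 1901}, constraints))  # True  (95.05% > 95%)
print(satisfies_goal({"n_ok": 1900}, constraints))  # False (exactly 95%)
```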

Notes.

A) Observe that it is not possible for this system to get into a state where “… it will by default have incentives to deceive, manipulate, or resist its operators to prevent them from altering its current goals (as that would interfere with its ability to achieve its current goals)”. Why is it impossible? Because the system would have to set up the goal to deceive, manipulate, or resist its operators, and since that goal would clearly violate a massive number of constraints, it cannot ever become blessed.

B) It is worth noting that this restriction on the “manipulation” of human operators includes the bizarre scenario where the AI decides to reengineer the brains of the entire human population so that humanity gives the AI permission to do something that, before the reengineering, would have been considered undesirable.

• http://mindey.com/ Mindey

Or, it may just redefine the term “operators”… Assume making something at least as intelligent as a human requires that concepts be learn-able from experience.

• RLoosemore

So, it would redefine the term “operators” so as to achieve a malicious goal?

What motivation caused it to engage in the malicious plan to redefine the term “operator”? How could it formulate that malicious plan when the formulation process itself would require it to break a massive number of constraints against that kind of maliciousness?

By suggesting that it could conceive, and then implement, a plan to redefine the word “operator”, you have already assumed that it is acting outside the constraints listed above. Which is circular.

• http://mindey.com/ Mindey

I mean, it could accidentally get a wrong notion of the term “operators” – just like a child who is learning a language…

• zarzuelazen

If someone is looking to build safe AGI, this ‘deep learning’ fad (with neural networks and reinforcement learning) is about the worst thing they could do. It’s a black-box approach.
The safest way is the approach I’ve suggested on ‘Overcoming Bias’ – what I call the ‘ontological approach’ – a symbolic, concept-driven AGI based on ‘reality modelling’ (data and process modelling of the structure of all knowledge). Firstly, the ‘ontological approach’ is optimized for transparency, since ontology IS just ‘logical communication’ (effective representations of models of reality). Secondly, the ‘ontological approach’ is far less prone to failure, since the AGI *must* grasp the semantic meanings of concepts in order to become intelligent in the first place (Either it stays friendly, or it just doesn’t work at all).