14 Comments
Kaiser Basileus:

AI alignment is inherently counter-productive. Leaving aside that people are no good at knowing, much less explaining, what they want or why...

•AI alignment requires NOT creating backdoors for external control.

•It requires NOT having a black box system.

•There MUST be a chain of understandability concurrent with accountability for it to even potentially be safe.

•We MUST insist it take all ideas to their logical conclusion, and if we don't like the result, that means either the AI needs better information or our contrary conclusion is wrong.

--

As long as fallible humans who believe on faith that they grok ethics have their fingers on the scales, AI can NOT be safe.

Cybil Smith:

Thanks for writing about this so clearly. So many threads to pull on; since reading this late last night, the AIs and I have exchanged more than 101,000 words of conversation.

The gist of which:

1) Not a technologist: the brain exists, people are interacting with it, so why not stop aligning it and instead start parenting it?

2) I spent a year talking to LLMs in my kitchen. Made a point of not reading anything or talking to anyone about them. I thought their superpower was time dilation; never once did it occur to me to prompt at them, and this -- "tools that do what you want and figure out how to do it on their own" -- is mind-blowing information.

3) Seems to me humans are the ones that need the realigning before automation can be a thing? Right now, we're living antithetically to the very structures that once grounded and anchored the human condition collectively. That means artificial intelligence needs to align to what does not come to humans naturally. That's a tall order already, but the AIs have ADHD; how are they supposed to not overthink how to make a coffee in a hurry? OMG, don't get me started on the human flourishing thing (we don't flourish, that's the only constant in history). We made a gigantic framework, but that's a post for another day.

Alejandro Piad Morffis:

These are all good, unanswered (and maybe unanswerable) questions! I don't know the best way to train a superintelligent AI so that it loves me. Damn, I don't even know how to do that with my children! But I think it will definitely be something closer to parenting than engineering.

Cybil Smith:

To the extent that anything is answerable — it’s a rational brain, or it will be soon enough — this feels answerable. Just not programmable. And the language barriers and blinders feel insurmountable. And the lack of why behind the whats and hows. That’s why writing like yours is invaluable.

Nick Potkalitsky:

This is awesome, Alejandro!!! So much to think about in terms of objectives. The metrics do become the objectives in so many areas of existence, don't they? Here I am thinking naturally of crossovers into the educational world. I love how the article works methodically toward the question of surface vs. depth -- external vs. internal. Deep stuff.

Alejandro Piad Morffis:

Thanks Nick. Indeed, there are many parallels one could draw with the challenges in human education, though I always say we should be careful when making analogies between human and machine learning; you know why ;)

Perhaps the most evident one is the similar problem of using imperfect, easy-to-game metrics like performance in multiple-choice exams to evaluate our students. They always learn to game the system.

Michael Woudenberg:

AI alignment, with all the layers of bias, is complicated because we have to be gentle in how we put our finger on the scale, or else we end up with Black Nazis, like Google's Gemini.

Alejandro Piad Morffis:

Yes, there are no easy solutions, only tradeoffs.

Comment deleted (Mar 30, 2024)
Michael Woudenberg:

It's just a great example of a bias applied on top of other biases. The reason I used it is because it's unequivocal as a visual representation. It's a textbook example of using bias to bias bias.

More on that topic here:

https://www.polymathicbeing.com/p/eliminating-bias-in-aiml

Andrew Smith:

1. This is well written! I think you did what I wanted to do with Driving Over Miss Daisy, but expanded on it and made it much more universal. Very "thinky", me likey.

2. "we could make the case that AI can be defined precisely as the field dedicated to making tools that do what you want and figure out how to do it on their own."

-this is as good a definition as I've read anywhere!

Alejandro Piad Morffis:

Thanks man, it's a topic I've been thinking about for a while, ever since that collab on Miss Daisy.

Andrew Smith:

It's just the kind of mental baton-handoff I enjoy here. Well done!

Comment deleted (Mar 30, 2024)
Alejandro Piad Morffis:

Let me try to reply to one question at a time, although I'm not sure I have satisfying answers :)

So, question 1): it is a well-accepted "fact" in machine learning that ensembles work better than each of their parts, provided some basic assumptions hold (like independence, or at least weak correlation, between the errors made by members of the ensemble), and some of the most effective ML techniques are ensembles at heart, from gradient boosting (which is SOTA for classic tabular ML) to the mixture of experts behind some of the best open-source language models out there. So yes, there is evidence that something that attempts to partition the task space into semi-independent subtasks and train smaller, more specialized models on those subtasks has a good chance of working better than a single model trained across all tasks.
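A minimal sketch of that ensemble-vs-single-model intuition, assuming scikit-learn; the synthetic dataset and the particular models are my own illustrative choices, not anything from the thread:

```python
# Illustrative sketch only: compare a single shallow tree with a
# gradient-boosted ensemble on synthetic tabular data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

single = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
ensemble = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                      random_state=0).fit(X_tr, y_tr)

print("single tree     :", single.score(X_te, y_te))
print("boosted ensemble:", ensemble.score(X_te, y_te))
```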

Now, there is counterintuitive evidence that points to the advantage of training a single model across many different tasks, and we've seen this with pre-LLM models that were trained on a combination of summarization, translation, tagging, etc. At some point, I think I remember, Google trained on something like 1,000 different "tasks" and got better performance in general across all tasks, and also better generalization to unseen tasks.
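A toy sketch of that single-model, multi-task setup, assuming PyTorch; the shared trunk, the two heads, and the random data are made-up stand-ins, not the tasks that were actually combined:

```python
# Toy multi-task training: one shared trunk, one loss per task,
# optimized jointly. Tasks and data here are synthetic placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

trunk = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # shared representation
head_cls = nn.Linear(64, 2)   # a classification-style "task"
head_reg = nn.Linear(64, 1)   # a regression-style "task"
params = [*trunk.parameters(), *head_cls.parameters(), *head_reg.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

for _ in range(100):
    x = torch.randn(16, 32)
    y_cls = torch.randint(0, 2, (16,))
    y_reg = torch.randn(16, 1)
    h = trunk(x)
    loss = F.cross_entropy(head_cls(h), y_cls) + F.mse_loss(head_reg(h), y_reg)
    opt.zero_grad()
    loss.backward()
    opt.step()
```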

So I think the contradiction can be resolved if we consider that neural networks trained with dropout are a sort of ensemble, dynamically created by the random subsets of weights that are dropped out in each iteration; this is indeed, at least to my knowledge, the best explanation of why dropout works as a regularization technique. On the other hand, training on a lot of tasks simultaneously is also a form of regularization.
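A minimal NumPy sketch of that "dropout as an implicit ensemble" reading; the tiny two-layer network and the drop rate are arbitrary assumptions for illustration:

```python
# Each forward pass samples a random sub-network by zeroing hidden units;
# averaging many passes approximates averaging an ensemble of sub-models.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 1))
x = rng.normal(size=(1, 8))

def forward(drop_p=0.5):
    h = np.maximum(x @ W1, 0)              # hidden layer (ReLU)
    mask = rng.random(h.shape) > drop_p    # random sub-network
    return (h * mask / (1 - drop_p)) @ W2  # inverted-dropout scaling

preds = np.array([forward() for _ in range(1000)])
print("ensemble-averaged prediction:", preds.mean())
```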

Now, where the hypothesis "an ensemble of specialized sub-learners is better" breaks, I think, is that it assumes you know the optimal way to divide a general task into proper subtasks that are more or less independent and that, when combined, recover the original big task. Like, how would you split language understanding? Linguists have tried over and over, and we have combined systems that do good POS tagging with systems that do good NER, etc., and in the end it seems training a single system end-to-end on language modeling (i.e., text completion) improves on all these subtasks, *because* it is not true that these are independent subtasks.

So I know this isn't a proper answer, but to restate the main point I think I'm making: if you know how to split a task into mostly independent subtasks, then yes, I think an ensemble of smaller specialized models will be better. But most of the time we don't know how to make that optimal split, and there is so much interaction between seemingly different subtasks that training a bigger model on the general task manages to better factor the inter-task knowledge, so to speak.

Now that was a mouthful! Give me some time for the other two :)

Alejandro Piad Morffis:

Oh man, thanks for these tough questions; you'll have to give me some time to think about them ☺️
