Is it possible for a robot mind... to be ChatGPT itself?
A few thoughts on ChatGPT and robotics:
A while ago, during my internship in Greece, before the ChatGPT era broke into our lives, I was fascinated - like many of us - by the connection between language and mind. I wondered whether this relationship could ever be modeled with our current tools. How could a computer possibly capture the nuances of language, I thought, especially since language forms our inner dialogue? In a sense, capturing those nuances would mean opening a window into the mind itself.
Two years later, ChatGPT came out. To be honest, I dismissed it before even trying it. I had interacted with older GPT versions in the past and felt they were too stochastic - 'just' completing random text with a bit of structure. Fortunately, people much smarter than me understood that the completion mechanism was an indication of something much bigger. Some sharp-minded researchers at OpenAI saw that the 'bit' could become 'a lot': the bit of structure that forward propagation alone could achieve might grow into a lot through training by back-propagation.
If you can achieve that 'bit' without training, won't you be able to achieve at least that much with training?! In recent weeks, I have been wondering whether Ilya had that thought while training his model on that few-shot CoT prompt... back in 2019...
These days I'm wondering: is it possible for a robot mind... to be ChatGPT itself?
This thought recently led me to explore robotics, where I discovered fascinating work by Sergey Levine that connects these dots.
His research shows how GPT-style models could actually serve as a robot's 'mind' when carefully trained on diverse data collected in laboratories worldwide. His approach uses Chain-of-Thought (CoT) reasoning that combines both images and text, where some tokens represent specific robot actions.
The model remains auto-regressive, processing both visual and textual inputs in a shared language space: images are embedded and then transformed through a learned projection matrix to align with the text representations. While this method requires careful data preparation, unlike base language-model pretraining, it opens up exciting possibilities for more generalized robotic intelligence. There are even methods being developed to convert robotic data into auto-regressive sequences before training - similar to how language models learned from the internet - which could allow much deeper, larger-scale training.
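To make the idea concrete, here is a minimal NumPy sketch of the two mechanisms described above: projecting image features through a learned matrix into the text-embedding space so the decoder sees one unified token sequence, and reserving part of the vocabulary for discrete action tokens. All dimensions, token ids, and action names here are hypothetical placeholders for illustration, not values from any specific model of Levine's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- illustrative only.
d_img, d_model = 512, 768       # vision-encoder dim vs. language-model dim
n_patches, n_text = 4, 6        # image patch embeddings and text tokens

# The learned projection matrix that aligns image features with text space.
# In a real model this is trained; here it is random for demonstration.
W_proj = rng.normal(scale=0.02, size=(d_img, d_model))

# Stand-ins for a vision encoder's patch features and text token embeddings.
img_feats = rng.normal(size=(n_patches, d_img))
text_embeds = rng.normal(size=(n_text, d_model))

# Project image features into the shared language space and prepend them,
# so the auto-regressive decoder consumes one flat sequence of "tokens".
img_tokens = img_feats @ W_proj                       # (n_patches, d_model)
sequence = np.concatenate([img_tokens, text_embeds])  # (n_patches + n_text, d_model)

# A slice of the vocabulary reserved for discrete robot actions
# (e.g. binned motion deltas). The model emits these like any other token.
ACTION_TOKENS = {
    32000: "gripper_open",
    32001: "gripper_close",
    32002: "move_dx_plus",
    32003: "move_dx_minus",
}

def decode(token_id: int) -> str:
    """Map a sampled token id either to a robot action or to ordinary text."""
    return ACTION_TOKENS.get(token_id, f"<text:{token_id}>")

print(sequence.shape)   # (10, 768): one unified image+text sequence
print(decode(32001))    # gripper_close
print(decode(17))       # <text:17>
```

The point of the sketch is that nothing about the decoder changes: once images are projected into the same space and actions are just extra vocabulary entries, the robot policy is the familiar next-token prediction loop.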