Smart robotics: when AI powers autonomous task execution

Monday, May 6, 2024

Stefanos Peros

Software engineer

Robotics and AI grew out of different disciplines, yet the two are closely intertwined. Advancements in AI services are revolutionizing the interface between humans and technology, mirroring the transformative journey of the Internet of Things (IoT), which has enabled hardware devices to become more interconnected and intelligent. We recently embarked on an exciting journey to develop Budd-E (Figure 1), our AI-powered, remote-controlled robot. In this blog post, we dive deeper into how we enable users to command the robot to perform tasks using written commands (unstructured text).

Figure 1: Meet Budd-E, our AI service robot.

Overview

At its core, Budd-E is built around a Raspberry Pi 3B+, a palm-sized, low-cost computer that interfaces with the rest of the hardware: four wheel motors, two servo motors, a buzzer, a camera, ultrasonic sensors, a speaker, and LEDs. We began with this kit as our foundation and expanded upon it to develop Budd-E's AI capabilities.

Out of the box, the application consists of a client, which runs on any mobile device or computer, and a TCP server, which runs on the Pi. Both are written in Python. The client provides a UI that lets users control the robot manually through various buttons, much like the transmitter of a small remote control car. Under the hood, each button corresponds to an encoded command that is sent to the TCP server on the Pi for processing.
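To make the button-to-command idea concrete, here is a minimal sketch of how a client might encode a button press and send it over TCP. The wire format (a command name and integer arguments joined by `#` and terminated by a newline), the command names, and the motor values are all assumptions made for illustration; the post does not document the kit's actual encoding.

```python
import socket

# Hypothetical wire format: the kit's real encoding is not documented in this post.
FIELD_SEP = "#"
LINE_END = "\n"


def encode_command(name: str, *args: int) -> bytes:
    """Encode a command name and its integer arguments as one text line."""
    fields = [name, *(str(a) for a in args)]
    return (FIELD_SEP.join(fields) + LINE_END).encode("utf-8")


def send_command(sock: socket.socket, name: str, *args: int) -> None:
    """Send one encoded command over an already connected TCP socket."""
    sock.sendall(encode_command(name, *args))


# Each UI button is bound to one predefined command (names and values are illustrative).
BUTTONS = {
    "forward": ("MOTOR", 1500, 1500, 1500, 1500),
    "stop": ("MOTOR", 0, 0, 0, 0),
    "buzzer_on": ("BUZZER", 1),
}


def press(sock: socket.socket, button: str) -> None:
    """Look up the command bound to a button and send it to the robot."""
    name, *args = BUTTONS[button]
    send_command(sock, name, *args)
```

With a real robot, the socket would come from something like `socket.create_connection((host, port))`, where `host` and `port` are the address of the TCP server on the Pi (not given in this post).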

Execute arbitrary tasks

We started off with the following idea: express tasks using unstructured input text, then parse them using a Large Language Model (LLM) into a finite set of well-defined instructions. As such, we described possible (COMMAND, DURATION) pairs in our system prompt, as shown in Figure 2:

Figure 2: System prompt to map unstructured text to robot commands.

For the underlying LLM, we used OpenAI’s GPT-4 together with LangChain, an open-source framework for building LLM-powered applications, to build the prompt and process the output. Note the ‘reverse’ calibration of the robot that happens in this prompt: by specifying a duration for each command, the prompt encodes how long Budd-E needs to cover a given distance. For example, it takes Budd-E two seconds to move forward one meter (for instance, because of its relatively small wheels and motors), whereas a different robot could cover the same distance in less or more time.
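The calibration and the output processing can be illustrated with a short sketch. It assumes the model replies with text such as `(FORWARD, 4), (LEFT, 0.5)`; that reply format and the command vocabulary below are hypothetical, since the actual ones are defined in the system prompt of Figure 2. Only the two-seconds-per-meter figure comes from the text above. The sketch deliberately uses plain Python rather than LangChain's own output parsing.

```python
import re
from typing import List, Tuple

# From the post: Budd-E needs about two seconds to drive one meter forward.
SECONDS_PER_METER = 2.0

# Hypothetical command vocabulary; the real one lives in the system prompt.
ALLOWED_COMMANDS = {"FORWARD", "BACKWARD", "LEFT", "RIGHT", "STOP"}

# Matches pairs written as "(COMMAND, seconds)", e.g. "(FORWARD, 2.5)".
PAIR_PATTERN = re.compile(r"\(\s*([A-Z_]+)\s*,\s*([0-9]+(?:\.[0-9]+)?)\s*\)")


def duration_for_distance(meters: float) -> float:
    """Convert a distance into the drive time this particular robot needs."""
    return meters * SECONDS_PER_METER


def parse_plan(llm_reply: str) -> List[Tuple[str, float]]:
    """Extract (COMMAND, DURATION) pairs from the model's reply, in order."""
    plan = []
    for command, seconds in PAIR_PATTERN.findall(llm_reply):
        if command not in ALLOWED_COMMANDS:
            raise ValueError(f"unknown command: {command}")
        plan.append((command, float(seconds)))
    return plan
```

Rejecting commands outside the known vocabulary keeps a malformed or overly creative model reply from ever reaching the robot. Recalibrating for a different robot would only require changing `SECONDS_PER_METER` (or the corresponding durations in the prompt).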

Figure 3: Overview of user flow from input to robot action.

With our approach, a user simply enters a request in natural language, which is translated into actions for the robot. This is quite impressive given the much higher overhead of traditional approaches, where users must define the available actions in advance in a Planning Domain Definition Language (PDDL) and a planner then uses graph-based search algorithms to find the optimal sequence of actions.

After breaking down arbitrary tasks into a structured set of commands, the next step was to map each of these commands to the corresponding TCP packet that instructs the robot. We did so by identifying the exact instructions that are sent whenever the user clicks one of the interface buttons. It is important to note that our solution completely shields the robot from additional complexity, since no changes to its source code are required: the robot is unaware of how the instructions were generated. Finally, we extended the UI with a text input field in which users describe their tasks. The complete flow from user input to robot action is shown in Figure 3.
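The mapping step could look roughly like the sketch below, which replays the packet for each structured command, waits for the requested duration, and then sends a stop packet. The packet bytes are placeholders rather than the kit's real encoding, and the explicit stop after each step is an assumption about how the robot behaves. The `send` argument is any callable that delivers bytes to the robot, for example the `sendall` method of a connected TCP socket.

```python
import time
from typing import Callable, Iterable, Tuple

# Hypothetical packets. In practice these would be the exact bytes observed
# when the matching button is clicked in the original UI.
PACKETS = {
    "FORWARD": b"MOTOR#1500#1500#1500#1500\n",
    "BACKWARD": b"MOTOR#-1500#-1500#-1500#-1500\n",
    "LEFT": b"MOTOR#-1500#-1500#1500#1500\n",
    "RIGHT": b"MOTOR#1500#1500#-1500#-1500\n",
    "STOP": b"MOTOR#0#0#0#0\n",
}


def execute_plan(
    plan: Iterable[Tuple[str, float]],
    send: Callable[[bytes], object],
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Send each command's packet, let it run for its duration, then stop."""
    for command, seconds in plan:
        send(PACKETS[command])   # same bytes the UI button would have sent
        sleep(seconds)           # let the robot act for the requested time
        send(PACKETS["STOP"])    # assumed: the robot keeps moving until told to stop
```

Because the server only ever receives packets it already understands, it cannot tell whether they came from a button click or from an LLM-generated plan, which is why the robot's code needs no changes.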

Closing statement

Just as IoT brought the power of internet connectivity to a myriad of devices, enabling them to communicate and perform more complex tasks, the integration of natural language processing capabilities into hardware devices represents a parallel leap forward. Through our journey with Budd-E, we have showcased a small yet powerful proof of concept that demonstrates the vast potential of AI to make hardware devices more intuitive and accessible to humans. Stay tuned for our upcoming blog posts, where we'll delve into the other exciting AI capabilities that we've embedded within this robot.