Most “AI + robotics” posts are still simulations and slides.
This one isn’t.
This is the story of how we wired up a Raspberry Pi CM-5, a Raspberry Pi Pico W (running MicroPython), a small OLED display, a green LED, a camera, and an RTX 5090 into a system where:
- an AI can write and deploy code to a real microcontroller,
- run it, log the result, and iterate like an engineer,
- and use vision and sensors to ground that code in the real world.
It’s “hello world”, but instead of printing a string, the world actually blinks back.
Architecture in Short
- Pico W connected via USB to a CM-5 / Pi 5.
- MicroPython runs on the Pico.
- CM-5 runs:
  - mpremote for direct serial access,
  - a small Flask API (pico_api.py) exposing code execution, file upload/delete, reset, and run logs.
- Any agent or AI can:
- upload new code,
- execute it,
- read stdout/stderr and classify errors,
- store each run as JSON for later reasoning.
The system forms a programmable bridge between intelligence and hardware.
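To make that loop concrete, here is a minimal agent-side sketch of the upload, execute, and review cycle against the API described below. The hostname, port, and the shape of the /pico/runs response are assumptions for illustration only.

```python
# Hypothetical agent-side cycle against the Pico API described in this post.
# Hostname/port and the /pico/runs response shape are assumptions.
import requests

API = "http://cm5.local:8000"  # placeholder address of the CM-5

SNIPPET = (
    "import machine\n"
    "led = machine.Pin(16, machine.Pin.OUT)\n"
    "led.toggle()\n"
    "print('toggled')\n"
)

# 1. Run a snippet on the Pico
run = requests.post(f"{API}/pico/runs/exec_snippet",
                    json={"code": SNIPPET, "timeout": 3.0}).json()

# 2. Read stdout/stderr and classify the result
if run.get("status") == "success":
    print("ok:", run.get("stdout"))
else:
    print("failed:", run.get("stderr"))

# 3. Pull the logged runs so the agent can reason over past attempts
history = requests.get(f"{API}/pico/runs").json()
print("runs logged so far:", len(history))
```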
Hardware Setup
- Raspberry Pi Pico W (or Pico 2 W) connected via USB.
- OLED SSD1306 I²C display:
- SCL → GP15, SDA → GP14
- Green LED → GP16 (with resistor).
- CM-5 / Pi 5 runs Python 3 and Flask.
```bash
sudo apt install python3-venv
python3 -m venv .venv
source .venv/bin/activate
pip install flask mpremote
```
Verify connection:
```bash
mpremote connect auto
>>>  # MicroPython REPL should appear
```
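With the REPL up, a quick I²C bus scan is a cheap sanity check that the OLED is wired correctly (pins as listed above); an SSD1306 normally answers at address 0x3C:

```python
# Paste into the MicroPython REPL: scan the I2C bus used by the OLED
from machine import Pin, SoftI2C

i2c = SoftI2C(scl=Pin(15), sda=Pin(14))
print(i2c.scan())  # an SSD1306 typically shows up as [60] (0x3C)
```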
The Pico API (pico_api.py)
A lightweight Flask server wraps mpremote and gives the AI a safe, auditable way to manipulate the board.
Key Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /pico/runs/exec_snippet | Run a MicroPython snippet |
| POST | /pico/runs/exec_file | Execute an existing file |
| PUT | /pico/files | Upload / overwrite a file |
| GET | /pico/files | List files |
| GET | /pico/files/content | Read a file |
| DELETE | /pico/files | Remove a file |
| POST | /pico/reset | Soft / hard reset |
| GET | /pico/info | Filesystem info |
| GET | /pico/runs | List logged runs |
Every run is logged as a JSON file in ./pico_runs, including code, stdout, stderr, status and duration.
The AI can then analyse those logs to learn what worked, what failed, and why.
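For orientation, here is a heavily trimmed sketch of what the exec_snippet endpoint can look like: wrap mpremote in a subprocess call, capture the output, and write each run to ./pico_runs. This is not the real pico_api.py, which adds error classification and the remaining endpoints.

```python
# Minimal sketch (not the full pico_api.py): run a snippet via mpremote,
# capture stdout/stderr, and persist every run as JSON under ./pico_runs.
import json, subprocess, time, uuid
from pathlib import Path
from flask import Flask, request, jsonify

app = Flask(__name__)
RUNS = Path("./pico_runs")
RUNS.mkdir(exist_ok=True)

@app.post("/pico/runs/exec_snippet")
def exec_snippet():
    body = request.get_json(force=True)
    code = body["code"]
    timeout = float(body.get("timeout", 5.0))
    start = time.monotonic()
    try:
        proc = subprocess.run(
            ["mpremote", "connect", "auto", "exec", code],
            capture_output=True, text=True, timeout=timeout)
        status = "success" if proc.returncode == 0 else "error"
        stdout, stderr = proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        status, stdout, stderr = "timeout", "", "execution exceeded timeout"
    run = {
        "ok": status == "success",
        "status": status,
        "code": code,
        "stdout": stdout,
        "stderr": stderr,
        "duration_ms": round((time.monotonic() - start) * 1000, 1),
    }
    (RUNS / f"{uuid.uuid4().hex}.json").write_text(json.dumps(run, indent=2))
    return jsonify(run)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```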
Hello, LED
```bash
curl -X POST http://cm5.ipaddress:8000/pico/runs/exec_snippet \
  -H "Content-Type: application/json" \
  -d '{
    "code": "import machine, time\nled = machine.Pin(16, machine.Pin.OUT)\nled.value(1)\nprint(\"LED ON\")\ntime.sleep(1)\nled.value(0)\nprint(\"LED OFF\")",
    "timeout": 3.0
  }'
```
Result:
```json
{
  "ok": true,
  "status": "success",
  "stdout": "LED ON\nLED OFF\n",
  "duration_ms": 1012.5
}
```
A real LED just blinked because of a JSON request.
And that tiny event is where machine learning meets machine doing.
Deploying Code as Files
PUT /pico/files uploads a file:
```bash
curl -X PUT http://cm5.ipaddress:8000/pico/files \
  -H "Content-Type: application/json" \
  -d '{
    "path": ":experiments/led_test1.py",
    "content": "from machine import Pin\nimport time\nled = Pin(16, Pin.OUT)\nled.value(1)\ntime.sleep(1)\nled.value(0)\n"
  }'
```
Check on device:
```bash
mpremote connect auto fs ls experiments
```
And execute via:
```python
exec(open("experiments/led_test1.py").read())
```
All remotely managed, fully logged.
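Afterwards, the run history is one GET away (the response shape mirrors the per-run JSON shown earlier):

```bash
curl -s http://cm5.ipaddress:8000/pico/runs | python3 -m json.tool
```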
Demo: LED + OLED “SOS”
```bash
curl -X POST http://cm5.ipaddress:8000/pico/runs/exec_snippet \
  -H "Content-Type: application/json" \
  -d '{
    "code": "from machine import Pin, SoftI2C\nimport ssd1306, time\n\nled = Pin(16, Pin.OUT)\ni2c = SoftI2C(scl=Pin(15), sda=Pin(14))\noled = ssd1306.SSD1306_I2C(128, 64, i2c)\n\n# Morse helpers\ndef dot(): led(1); time.sleep(0.2); led(0); time.sleep(0.2)\ndef dash(): led(1); time.sleep(0.6); led(0); time.sleep(0.2)\n\ndef sos():\n    for ch in ['S','O','S']:\n        (dot,dash)[ch=='O'](); (dot,dash)[ch=='O'](); (dot,dash)[ch=='O'](); time.sleep(0.4)\n\noled.fill(0); oled.text('SOS',45,10); oled.show()\nsos()\noled.fill(0); oled.text('DONE',40,30); oled.show()",
    "timeout": 10.0
  }'
```
Continuous Behavior via main.py
Upload a main.py with a breathing LED and progress bar:
```bash
curl -X PUT http://cm5.ipaddress:8000/pico/files \
  -H "Content-Type: application/json" \
  -d '{"path":":main.py","content":"<...code omitted for brevity...>"}'

curl -X POST http://cm5.ipaddress:8000/pico/reset \
  -H "Content-Type: application/json" -d ''
```
Now the Pico boots autonomously, displaying a “breathing” LED and progress bar — a visual heartbeat.
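The real main.py is omitted above; purely as an illustration of the idea, a minimal version of it (PWM "breathing" on GP16 plus a sweeping OLED progress bar) might look roughly like this:

```python
# Illustrative main.py: "breathing" LED on GP16 plus a simple OLED progress bar.
# Not the exact file from the post; the pin numbers follow the wiring above.
from machine import Pin, PWM, SoftI2C
import ssd1306, time, math

led = PWM(Pin(16))
led.freq(1000)
i2c = SoftI2C(scl=Pin(15), sda=Pin(14))
oled = ssd1306.SSD1306_I2C(128, 64, i2c)

t = 0
while True:
    # Sine-shaped duty cycle gives the "breathing" effect
    level = int((math.sin(t) + 1) / 2 * 65535)
    led.duty_u16(level)

    # Progress bar sweeps across the display once per breath cycle
    width = int((t % (2 * math.pi)) / (2 * math.pi) * 128)
    oled.fill(0)
    oled.text("heartbeat", 25, 10)
    oled.fill_rect(0, 40, width, 10, 1)
    oled.show()

    t += 0.1
    time.sleep(0.02)
```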
Fast vs Deep Perception
Once the physical control layer works, we add sight and meaning.
1. Edge Perception — CM-5 + YOLO
The CM-5 continuously processes its Picam feed using a lightweight YOLO model.
It looks for triggers like a cat, a human, movement, or something new in the room.
Only when an event is interesting enough does it escalate.
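A sketch of that edge filter is below; the model file, the camera API usage, and the escalation endpoint are all illustrative assumptions, not the exact code running on the CM-5.

```python
# Sketch of the edge filter on the CM-5: run a small YOLO model on each frame
# and only escalate frames that contain something interesting.
# Model name, camera setup, and the GPU-box endpoint are assumptions.
import cv2
import requests
from picamera2 import Picamera2
from ultralytics import YOLO

INTERESTING = {"person", "cat", "dog"}
GPU_BOX = "http://gpu-box.local:1234/analyze"  # hypothetical endpoint

model = YOLO("yolov8n.pt")  # lightweight edge model
cam = Picamera2()
cam.start()

while True:
    frame = cam.capture_array()
    results = model(frame, verbose=False)[0]
    labels = {model.names[int(box.cls)] for box in results.boxes}

    if labels & INTERESTING:
        # Promote the frame: hand it to the GPU box for deep scene parsing
        ok, jpeg = cv2.imencode(".jpg", frame)
        if ok:
            requests.post(GPU_BOX, data=jpeg.tobytes(),
                          headers={"Content-Type": "image/jpeg"})
```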
2. Deep Scene Understanding — RTX 5090 + LLM Studio
When a frame is promoted, it’s sent to a GPU box running LLM Studio.
There, a large model performs full scene parsing:
- Who’s in the room, what are they doing?
- What objects, text, or interactions exist?
- What’s changed since the last observation?
The output becomes structured world data — not just pixels, but facts:
```json
{
  "time": "2025-11-07T19:32:00Z",
  "entities": [
    {"type": "person", "count": 2, "actions": ["typing", "talking"]},
    {"type": "cat", "count": 1, "action": "sleeping"}
  ],
  "objects": ["desk", "keyboard", "monitor", "coffee_cup"],
  "summary": "Two people working at a desk with a cat nearby."
}
```
This world model lives as a JSON state the AI can query, reason over, and extend.
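One simple way to maintain that state is a small merge function that folds each new observation into a JSON file on disk; the file name, layout, and history length here are illustrative choices, not the post's actual schema.

```python
# Keep a rolling world-model file that each GPU analysis is merged into.
# File layout and history length are illustrative assumptions.
import json, time
from pathlib import Path

WORLD = Path("world_model.json")

def update_world_model(observation: dict) -> dict:
    """Merge one structured observation (like the JSON above) into the state."""
    state = json.loads(WORLD.read_text()) if WORLD.exists() else {"history": []}
    state["last_seen"] = observation.get(
        "time", time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))
    state["entities"] = observation.get("entities", state.get("entities", []))
    state["objects"] = sorted(
        set(state.get("objects", [])) | set(observation.get("objects", [])))
    state["history"] = (state["history"] + [observation.get("summary", "")])[-50:]
    WORLD.write_text(json.dumps(state, indent=2))
    return state
```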
The CM-5 thus acts as a bouncer, and the RTX 5090 as the scribe and philosopher.
Ephemeral Micro-Apps on the Pico
In traditional robotics you flash one giant firmware with all libraries baked in.
In this design, the Pico becomes a throw-away runtime.
The AGX Orin (the planner) can command the CM-5:
“Deploy this 3 KB script to the Pico, connect to that Wi-Fi,
read those sensors, send data back, then self-wipe.”
Each mission uses a new, minimal piece of code:
- No heavy motor or display libraries unless required.
- Lower memory footprint.
- Easier reasoning for the AI (“I only need code for this goal”).
- Higher security (no permanent broad-capability firmware).
It’s like micro-containers for embedded devices — single-purpose, temporary, and fully auditable.
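Using only the endpoints from the table above, one mission cycle reduces to deploy, run, wipe. The hostname and script content are placeholders, and the DELETE payload is assumed to mirror the PUT format.

```bash
# 1. Deploy a single-purpose mission script
curl -X PUT http://cm5.ipaddress:8000/pico/files \
  -H "Content-Type: application/json" \
  -d '{"path":":mission.py","content":"<mission-specific MicroPython>"}'

# 2. Run it and let the API log stdout/stderr for later analysis
curl -X POST http://cm5.ipaddress:8000/pico/runs/exec_snippet \
  -H "Content-Type: application/json" \
  -d '{"code":"exec(open(\"mission.py\").read())","timeout":30.0}'

# 3. Self-wipe: remove the script once the data is back
curl -X DELETE http://cm5.ipaddress:8000/pico/files \
  -H "Content-Type: application/json" \
  -d '{"path":":mission.py"}'
```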
Autonomous Coordination
The full system ties together like this:
| Role | Hardware | Responsibility |
|---|---|---|
| Vision / Perception | CM-5 + Picam | Detect motion, cats, people; decide when to escalate |
| Deep Analysis | RTX 5090 LLM Studio | Parse images in detail, update world model |
| Planning / Orchestration | AGX Orin | Decide where to go, which sensors to use, generate new Pico code |
| Execution / Interface | Pico W nodes | Run short-lived code snippets for local sensing or actuation |
Even if the CM-5 is 100 km away, the Orin can drive it over VPN/Wi-Fi,
and through it, deploy fresh code to local Pico nodes — dynamically exploring new networks or environments.
Over time the AI learns:
- which pins map to which hardware,
- which networks are reachable,
- what each sensor measures,
- and how to optimise its own code for speed, noise or accuracy.
It’s a self-organising embedded ecosystem.
Why This Matters
This isn’t just about blinking LEDs; it’s about building a living, programmable feedback loop between thought and world.
- The AI writes code (agency).
- The hardware executes it (embodiment).
- Sensors and cameras respond (perception).
- Logs and analysis refine the model (learning).
And the entire chain is open, inspectable, and under your control.
Every new node, every run, every image adds to its understanding of reality until you no longer have a static device, but a small, distributed intelligence
that can see, act, reflect, and adapt.
Next up: Part II – World Ingestion and Semantic Mapping,
where we’ll show how the AI merges vision, sensor streams and logs into one evolving world model.
I know, you want more code and to see it in action. If I have time, I will clean up the GitHub repo. But as soon as something works, I change everything and add something new. Everything is always a work in progress…