Gemma 4 on Raspberry Pi 5: A Surprisingly Usable Local AI Setup

In this video, I push a Raspberry Pi 5 to its limits by running Google’s Gemma 4 language model on it — no cloud, no powerful GPU, just a tiny single-board computer with 8 GB of RAM.

——–
CONNECT WITH ME
📱 Twitter / X: https://twitter.com/Zero_to_MVP
👨‍💻 Linkedin: https://www.linkedin.com/company/zero-to-mvp/
——–

⏱ Timestamps:
00:00 — Intro
01:13 – Installing LM Studio CLI
01:44 – Setting up SSD storage for models
02:08 – Gemma 4 E2B model overview
03:09 – Loading the model and API server
04:15 – Enabling local network access via socat
05:47 – Connecting the model to the Zed editor
06:55 – Coding performance test (Python)
08:01 – Creative task test (Web app ideas)
08:46 – Final results and conclusion

🔗 Related:
▶ Running Gemma 4 on MacBook and Desktop — https://youtu.be/T6AvsQVSL74?si=g1TxYDx2UPSCkd_N

📌 Tools used in this video:
– Raspberry Pi 5 (8 GB)
– LM Studio CLI
– Google Gemma 4 E2B
– tmux
– htop
– socat
– Zed Editor

👍 If you found this helpful, please like and subscribe for more dev tools and programming content!

#RaspberryPi #RaspberryPi5 #Gemma4 #LocalAI #LLM #Gemma #AI

View 31 Comments

31 Comments

@CryptoNewsTV on April 10, 2026 9:17 am

how long can it sustain usage like that? I would imagine this is a fast way to "cook" your pie…
@carlt.8266 on April 10, 2026 12:30 pm

Is LM Studio CLI not capable of showing tokens per second for an output? The GUI version is capable of it.
@agentmith on April 10, 2026 12:55 pm

There's a new AI hat that came out a few months ago that has dedicated compute for this, but only certain architecture models load and they're primarily for image recognition. Still worth looking into, just keep in mind it doesn't work for everything.
@Masivcas on April 10, 2026 1:56 pm

What a fantastic way of explaining the chain of thoughts that led you to get it working. As a non linux expert, this was really satisfying to watch from a learning point of view. And of course, what an interesting content you shared, congrats.
@RainDustCode on April 10, 2026 5:51 pm

well done young padawan
@charlesdean03 on April 11, 2026 12:22 am

BRO you know ollama has free (with limitation of token) cloud model with full 31B para!
@omsingharjit on April 11, 2026 2:14 am

Try usb based TPU if Possible
@TehGM on April 11, 2026 7:54 am

Wonder how much better it'd run if you used GGUF version. I'm no expert, but I'm guessing you could get better speeds even on a Pi. Could be useful for LLM that's used by some other tools in home network.
@eplusplus on April 11, 2026 9:19 am

very nice, I'll try this out!
@Markov39 on April 11, 2026 11:04 am

these generation speeds are not usable for coding. Maybe for a full autonomous background agent
@andresgardiol8111 on April 11, 2026 2:14 pm

Great video!
@kevriver1011 on April 11, 2026 4:30 pm

calm and informative narration, I love it! p.s. Did you check tool call works on your rpi? 👀
@DanielHa-n5j on April 12, 2026 12:17 am

와우
@FroggyTWrite on April 12, 2026 1:49 am

any chance you can test out a SBC with an NPU like ones with the rk-3588? it would be cool to see how much more performance you could get leveraging the NPU
@garic4 on April 12, 2026 5:12 am

Amazing video, and I will explain why: you made a video that answered directly the Title, no clickbait rage sht. You actually showed the config from the beginning, and got it up aand running. You showed options, and showed what YOU use. You provided the RELEVANT information about this, actual real inference speed, capabilities and performance. Tje video was short, straight to the point, no bunch of ai slop description nonsense to make it past 10min.
I'd suggest examples of real useful application tests for this setup as this is extremely interesting and validade the purpose of a Rasberry Pi for local inference – on the top of my head, I would be looking for using gemma-e2b for voice assistant for basic control of local stack – like maybe replace an Alexa or Google for home automation, Media player, etc. Or maybe local applicaiton for quick OCR, or agent-continuos workflow like organizing scan-folder , renaming files, organizing into folders according to reasoning.

That being said, please accept my LIKE, SUBSCRIBE, and COMMENT, for the Gods of the Youtube Algorithm.
You will get to 1M subscribers in no time if you follow this script! I will watch your channel for sure, as I'm unsubscribing those tiktok-wannabe-famous-youtubers puking low-effort-AI-slop into my feed.

Thank you, thank you very very much – post amazon affiliate link of your gear in the description – I was looking for a good rasberry pi 5 kit to do exactly what you did here, and it would be my pleasure to use your links and help your channel.

Cheers
@Davimejor on April 12, 2026 10:42 am

this is pretty interesting… I would be awesome if you show how to create a pi cluster to run IA the cheapest way possible (if that's possible…)
@petacardi on April 12, 2026 12:32 pm

Seeing your setup and how you use it was very satisfactory and relaxing.
@noext7001 on April 12, 2026 12:38 pm

so a 4b model is only tking 1.5gb ram on a rpi ?
@lukyawaluddin2349 on April 12, 2026 1:17 pm

awesome terminal themes, is it oh-my-zsh themes ? let me know
@hmonyqian on April 12, 2026 1:24 pm

Can the device's GPU and NPU already serve as a source of computing power? Can the utilization rate of these two modules be seen?
@edge-41 on April 12, 2026 4:00 pm

Running Gemma 4 on a Raspberry Pi 5 over the local network is a great example of practical edge AI infrastructure. Exposing the model via OpenAI-compatible API and accessing it from other devices is exactly the pattern that scales to IoT and embedded systems. For production edge deployments, the next challenge is secure OTA model updates and monitoring inference quality across a fleet of constrained devices like this.
@cackc8611 on April 12, 2026 8:07 pm

Beggining the age of local ai
@Sabotage_Labs on April 13, 2026 4:58 am

I just started running the Raspi 5 with AI Hat+ 2 and the Hailo NPU with the Ollama LLMs and OpenUI.
So far, pretty cool! Only been playing just a bit but, it works and is working pretty well. Dunno if I can say it's faster than I expected since it's the first self hosted environment I've setup for LLMs. I'm kinda being forced to learn AI as…. It's being hyped as making huge changes to the niche market I'm in …in IT. CRM, WFM, IP recording call center management software. I support large Fortune 500 companies as a systems integrator and support engineer.
I'm thinking about standing up an Openclaw server on a Raspi 4 8gb I have on the shelf. A bit curious about Openclaw. Especially…the security side of it, which is why I want to self host it.
It's a brave new world. I've been in the IT profession long enough to have seem more than a few new and whizzy things that will "revolutionize" the industry. It rarely ever actually happens. But, like the Zen Master said…."we'll see…." 😉😀
@syrus3k on April 13, 2026 8:23 am

There's a huge benefit to not being held hostage by the big guys – LLMs (especially agentic ones) are incredibly important to work today.
@vxDamianVR on April 13, 2026 10:58 am

I was going to test this, so this was perfect. Thank you.
@MediaControl-z8p on April 13, 2026 11:07 am

Socat is not needed. The right argument was –bind and not –host
@mathewmcfool on April 13, 2026 11:39 am

Gemma4 plus TurboQuant
@CONYTB-s9k on April 13, 2026 1:02 pm

The Raspberry Pi 5 is too expensive.
It is no longer appealing.
@atomictraveller on April 13, 2026 3:28 pm

why do people keep doing this without giving it a beachball body, teeth and webbed feet?
@JuanPabloMolinaMatute on April 13, 2026 8:38 pm

excellent video, the explanation and the walkthrough are very easy to understand!, Thanks for this video!
@ben_imaging on April 14, 2026 7:44 pm

Great video! It was clear and explained well enough for someone who does a little bit of vibe coding to understand. Thanks!

You must be logged in to post a comment.