I spent most of my time looking at the Python docs (because I rarely use the language) and at API docs.
Yes, I understood a little bit more about how those agents work and how the local execution environment works internally, but that was relatively trivial stuff.
Yes, this is a fun little exercise, but it’s only minimally about AI and AI agents; it’s 99% about writing glue code. I was a bit disappointed by that, but I still had fun and learned something.
I can think of two ideas to improve and expand the course:

- For people who need to understand the security implications of LLMs:
  - show an example where an LLM might screw things up badly if the environment were real
  - build security measures to prevent the worst things from happening, while making clear that agents are inherently insecure if they can do significant things, especially if they have access to the web
- For tinkerers and self-hosters:
  - apply the course to a local LLM / VM / VPS so people can play with their own things if they want to
Yes, as a third step, absolutely! I’m a bit confused by that question though, because from a security perspective the next step would be to limit bad behavior right at the execution level, when the LLM requests “rm -rf /” or tries to read the password file, etc., and only after that I’d look at sandboxing as an additional measure. Am I coming at this from the wrong angle? I mean, sandboxing is not a panacea either, so anything bad that can happen should be contained as early as possible.
You can vote on these ideas here btw: The Software Pro's Best Kept Secret. Haven’t added sandboxing yet but will do if we get more people mentioning it!
Happy to see someone else was uncomfortable about this phase. What I did for my Bash tool was to only allow non-recursive rm and ls and only on the project directory.
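For anyone wanting to try the same restriction, a minimal sketch of such a guard might look like this. The function name and project path are made up for illustration; a real Bash tool would need to handle more cases:

```python
import shlex
from pathlib import Path

PROJECT_DIR = Path("/app/project").resolve()  # assumed project root

def is_allowed(command: str) -> bool:
    """Hypothetical guard: permit only non-recursive rm/ls inside the project dir."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unparseable command, e.g. unbalanced quotes
    if not argv or argv[0] not in ("rm", "ls"):
        return False
    flags = [a for a in argv[1:] if a.startswith("-")]
    paths = [a for a in argv[1:] if not a.startswith("-")]
    # reject recursive flags outright
    if any("r" in f.lower() or f == "--recursive" for f in flags):
        return False
    # every path must resolve to somewhere inside the project directory
    for p in paths:
        if not (PROJECT_DIR / p).resolve().is_relative_to(PROJECT_DIR):
            return False
    return True

print(is_allowed("rm notes.txt"))   # True
print(is_allowed("rm -rf /"))       # False
print(is_allowed("ls ../secrets"))  # False
```

Resolving each path before checking it catches the `../` and absolute-path escapes that a plain string prefix check would miss.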
# uv-managed Python base image
FROM ghcr.io/astral-sh/uv:python3.14-trixie
WORKDIR /app
COPY . .
# install only the locked, non-dev dependencies
RUN uv sync --no-dev --frozen --no-cache
# install Tailscale so the container can join the tailnet
RUN curl -fsSL https://tailscale.com/install.sh | sh
RUN chmod +x /app/docker/*.sh
ENTRYPOINT ["/app/docker/entrypoint.sh"]
Note the need to pass /dev/net/tun to the Docker container in order to use Tailscale. Additionally, systemctl didn’t work inside the container, so my entrypoint.sh starts tailscaled as a background task instead.
entrypoint.sh
#!/usr/bin/env bash
set -e
# Start the tailscaled daemon in the background
/usr/sbin/tailscaled &>/dev/null &
# Keep the container alive with an interactive shell
bash
start_tailscale.sh (run inside the container)
Had to do this because my dad’s fancy DGX Spark server is over at his place on a separate LAN:
#!/usr/bin/env bash
tailscale up --authkey="$(cat /app/secret/ts_auth_key)" --hostname llm-client
As an easier alternative to sandboxing or setting up a permissions system, the challenge could add implementing Interactive Mode before introducing tools, so that all tool calls can be confirmed first. The coder can then gradually remove the need for asking permission for everything as a proper permissions system gets built.
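A confirmation gate like that can be tiny. Here's a sketch with an injectable prompt function, so the same gate can later be auto-approved per tool as the permissions system grows (the names are illustrative, not from the course):

```python
from typing import Callable

def confirm_tool_call(name: str, arguments: str,
                      ask: Callable[[str], str] = input) -> bool:
    # Hypothetical Interactive Mode gate: every tool call must be approved
    # before it runs. `ask` defaults to input() but is injectable so the
    # prompt can be tested, or replaced with a per-tool policy later.
    answer = ask(f"Agent wants to run {name}({arguments}) — allow? [y/N] ")
    return answer.strip().lower() in ("y", "yes")

# demonstration with an auto-approving stub instead of a real prompt
print(confirm_tool_call("read_file", '{"path": "main.py"}', ask=lambda _: "y"))
```

Defaulting to "deny" on anything other than an explicit yes keeps an accidental Enter keypress from approving a destructive call.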
More feedback: the challenge was more a painful exercise in Python types than anything else (I ran mypy on my code to make sure I got it right). My main.py contains monstrosities such as:
from openai.types.chat.chat_completion_message_tool_call import (
    ChatCompletionMessageToolCall,
)
from openai.types.chat.chat_completion_message_tool_call_param import (
    ChatCompletionMessageToolCallParam,
)
Yes, these are two separate classes, and if you poke around the code you’ll understand why (the latter is a simple TypedDict and the former is a Pydantic class used for validation).
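The split mirrors a common API pattern: payloads you send are typed as plain TypedDicts (checked only statically), while responses you receive are parsed into real classes with runtime validation, and you convert back when echoing a tool call into the message history. A stdlib-only illustration of that duality (these names are simplified stand-ins, not the actual openai types):

```python
from dataclasses import dataclass
from typing import TypedDict

class ToolCallParam(TypedDict):
    # outbound shape: a plain dict, checked only by the type checker
    id: str
    name: str
    arguments: str

@dataclass
class ToolCall:
    # inbound shape: a real class, constructed from the parsed response
    id: str
    name: str
    arguments: str

    def to_param(self) -> ToolCallParam:
        # convert back when appending the call to the message history
        return ToolCallParam(id=self.id, name=self.name,
                             arguments=self.arguments)

call = ToolCall(id="call_1", name="read_file", arguments='{"path": "main.py"}')
print(call.to_param()["name"])  # read_file
```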