Feedback about the course

I feel a bit conflicted about this course.

I spent most of my time looking at the Python docs (because I rarely use Python) and API docs.
Yes, I understood a little bit more about how those agents work and how the local execution environment works internally, but that was relatively trivial stuff.
Yes, this is a fun little exercise, but it’s only minimally about AI and AI agents; it’s 99% about writing glue code. I was a bit disappointed by that, but I still had fun and learned something.

I could think of two ideas to improve and expand the course:

  • For people who need to understand security implications of LLMs
    • show an example where an LLM might screw things up badly if the environment was real
    • build security measures to prevent the worst things from happening, while making clear that agents are inherently insecure if they can do significant things, especially if they have access to the web
  • For tinkerers and self hosters
    • apply the course to a local LLM / VM / VPS so people can play with their own things if they want to

Yep, these are just the base stages :slight_smile: We add “extensions” over time to cover more ground in the challenges (you can see tens of these on shell/redis, for example).

That said, this challenge is about writing code. It’s not the most efficient way to learn how to use something like Claude code.

Would a “sandboxing” extension be something that you’re interested in? (Since you mentioned security implications)


Yes, as a third step, absolutely! I’m a bit confused by that question, though, because from a security perspective the next step would be to limit bad behavior right at the execution level, e.g. when the LLM requests “rm -rf /” or tries to read the password file. Only after that would I look at sandboxing as an additional measure. Am I coming at this from the wrong angle? Sandboxing is not a panacea either, so anything bad that can happen should be contained as early as possible.

Ah, yep, that’s on the list too (Claude Code’s solution for this is “Command Permissions”).

You can vote on these ideas here btw: The Software Pro’s Best Kept Secret. Haven’t added sandboxing yet, but will do if we get more people mentioning it!


Awesome, thanks!

Happy to see someone else was uncomfortable about this phase. What I did for my Bash tool was to only allow non-recursive rm and ls, and only within the project directory.
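A guard like that can be sketched in a few lines of Python. Everything here is a hypothetical stand-in (the `PROJECT_DIR` path, the allow-list, the function name), not a reference to the actual challenge code:

```python
import shlex
from pathlib import Path

PROJECT_DIR = Path("/app/project")   # assumed project root
ALLOWED = {"ls", "rm", "cat"}        # assumed command allow-list

def is_permitted(command: str) -> bool:
    """Reject commands outside the allow-list, recursive/force flags,
    and any path that escapes the project directory."""
    try:
        parts = shlex.split(command)
    except ValueError:
        return False
    if not parts or parts[0] not in ALLOWED:
        return False
    for arg in parts[1:]:
        if arg.startswith("-"):
            # No recursive or force flags (blocks `rm -rf`, `ls -R`, ...)
            if any(c in arg for c in "rRf"):
                return False
        else:
            # Absolute paths replace PROJECT_DIR when joined, so
            # `/etc/passwd` and `../x` both fail the containment check.
            target = (PROJECT_DIR / arg).resolve()
            if not target.is_relative_to(PROJECT_DIR):
                return False
    return True
```

This is only a first line of defense, of course; an LLM-generated command can be hostile in ways a flag filter won’t catch, which is why the sandboxing discussion above still matters.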


I stuffed everything into a Docker container.

Dockerfile (for a Python-based deployment)

FROM ghcr.io/astral-sh/uv:python3.14-trixie
WORKDIR /app
COPY . .

RUN uv sync --no-dev --frozen --no-cache
RUN curl -fsSL https://tailscale.com/install.sh | sh
RUN chmod +x /app/docker/*.sh

ENTRYPOINT ["/app/docker/entrypoint.sh"]

Build using
docker build -t yinchi/llm-client .

run.sh

#!/usr/bin/env bash

docker run -it --rm --hostname llm-client --cap-add=NET_ADMIN \
    --device /dev/net/tun -v "$(pwd)/secret:/app/secret" yinchi/llm-client

Note the need to pass /dev/net/tun to the Docker container in order to use Tailscale. Additionally, systemctl didn’t work inside the container, so my entrypoint.sh starts tailscaled as a background task instead.

entrypoint.sh

#!/usr/bin/env bash
set -e

# Start the tailscaled daemon in the background
/usr/sbin/tailscaled &>/dev/null &

# Drop into an interactive shell so tailscale can be brought up manually
bash

start_tailscale.sh (run inside the container)

Had to do this because my dad’s fancy DGX Spark server is over at his place on a separate LAN:

#!/usr/bin/env bash
tailscale up --authkey="$(cat /app/secret/ts_auth_key)" --hostname llm-client

As an easier alternative to sandboxing or setting up a permissions system, the challenge could introduce Interactive Mode before introducing tools, so that every tool call can be confirmed first. The coder can then gradually remove the need to ask permission for everything as a proper permissions system gets built.
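That confirm-everything-first pattern fits in a single wrapper. This is a minimal sketch with made-up names (`run_tool_with_confirmation`, `execute`, `confirm` are all assumptions); making the confirmation step a callable is what lets a real permissions system replace it later without touching the call sites:

```python
from typing import Callable

def run_tool_with_confirmation(
    name: str,
    args: dict,
    execute: Callable[[str, dict], str],
    confirm: Callable[[str], bool] = lambda p: input(f"{p} [y/N] ").lower() == "y",
) -> str:
    """Ask before every tool call. Swap `confirm` for an allow-list
    check once a proper permissions system exists."""
    prompt = f"Run tool {name!r} with {args!r}?"
    if not confirm(prompt):
        # The denial goes back to the model as the tool result
        return f"Tool call {name!r} denied by user."
    return execute(name, args)
```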


More feedback: the challenge was more an exercise in Python typing pain than anything else (I used mypy on my code to make sure I got it right). My main.py contains monstrosities such as:

from openai.types.chat.chat_completion_message_tool_call import (
    ChatCompletionMessageToolCall
)
from openai.types.chat.chat_completion_message_tool_call_param import (
    ChatCompletionMessageToolCallParam
)

Yes, these are two separate classes, and if you poke around the code you’ll understand why: the latter is a simple TypedDict and the former is a Pydantic class for validation.
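The split can be illustrated in miniature with stdlib-only stand-ins (`ToolCall` and `ToolCallParam` here are hypothetical simplifications, not the actual openai classes, and a plain dataclass stands in for Pydantic):

```python
from dataclasses import dataclass
from typing import TypedDict

# The *Param shape is "data you send": a plain TypedDict, no runtime checks.
class ToolCallParam(TypedDict):
    id: str
    name: str
    arguments: str

# The response shape is "data you receive": validated at construction time.
@dataclass
class ToolCall:
    id: str
    name: str
    arguments: str

    def __post_init__(self) -> None:
        if not self.id:
            raise ValueError("tool call id must be non-empty")

    def to_param(self) -> ToolCallParam:
        # Echoing a received tool call back into the request message
        # history means converting one shape into the other.
        return {"id": self.id, "name": self.name, "arguments": self.arguments}
```

That conversion step is exactly where the two near-identical classes collide in practice, which is why both imports end up in main.py.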
