Ideas for `ETag` extension for `HTTP Server` challenge

Hi! I have recently been implementing support for the ETag header in my own HTTP server, and I thought it might also be a nice fit for the Codecrafters platform. I am posting an initial draft of how I think the extension could be structured, and I would love to receive your feedback on it.

To begin with, I would like the ETag & Caching extension to be split into three challenges:

  1. ETag Header Generation: Easy

  2. Handling If-None-Match header: Medium

  3. Supporting Weak ETags: Easy

The detailed descriptions of the challenges (as they might appear on the platform) can be something like this:


Challenge 1: ETag Header Generation

  • Marketing Description: In this stage, you’ll add support for ETags to your HTTP server, allowing clients to recognize when files change and enabling efficient caching.

  • Difficulty: Easy

Welcome to the ETag Caching Extension! In this extension, you’ll learn about ETags and how they are used for efficient HTTP caching.

In this stage, you’ll add support for sending an ETag header in your responses to GET /files/<filename> requests.

ETag Header

The ETag header is a string that represents a particular version of the resource. When a client requests a resource, the server includes the ETag header in the response. The client can then use this ETag value in subsequent requests to check if the resource has changed.

In this stage, your server will need to calculate the ETag for the file contents and include it in the response headers. The ETag should be calculated using the MD5 hash of the file contents and should be quoted. For example, if the file contents are hello world, the ETag would be "5eb63bbbe01eeed093cb22bb8f5acdc3".
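As a rough illustration of the rule above, here is a minimal Python sketch of the ETag calculation. The function name `make_etag` is hypothetical and not part of the challenge; the only things taken from the stage description are the MD5 hash and the surrounding double quotes.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Return a strong ETag: the MD5 hex digest of the content, in double quotes.

    `make_etag` is an illustrative name, not something the tester looks for.
    """
    digest = hashlib.md5(content).hexdigest()
    return f'"{digest}"'


print(make_etag(b"hello world"))  # "5eb63bbbe01eeed093cb22bb8f5acdc3"
```

Hashing the raw bytes (rather than a decoded string) avoids any dependence on text encoding.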

Tests

The tester will execute your program like this:


$ ./your_server.sh --directory <directory>

The tester will also create a file called hello in the directory with contents hello world.

It will then send a GET /files/hello request:


$ curl -i http://localhost:4221/files/hello

Your server must respond with a 200 OK response. The response should have the ETag header set to the MD5 hash of the file contents, and should be quoted. The response body should contain the file contents, and other previously required headers like Content-Type and Content-Length should be present as well.

Here’s the expected response:


HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 11
ETag: "5eb63bbbe01eeed093cb22bb8f5acdc3"

hello world
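For reference, one possible way to assemble such a response by hand is sketched below in Python. The helper name `build_file_response` is an assumption made for this example; a real solution would plug the same header into whatever response-building code it already has.

```python
import hashlib


def build_file_response(content: bytes) -> bytes:
    """Build a raw 200 OK response for a file, including the ETag header.

    Hypothetical helper for illustration only.
    """
    etag = '"' + hashlib.md5(content).hexdigest() + '"'
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {len(content)}\r\n"
        f"ETag: {etag}\r\n"
        "\r\n"  # empty line separates the headers from the body
    )
    return head.encode("ascii") + content


print(build_file_response(b"hello world").decode("ascii"))
```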


Challenge 2: Handling If-None-Match header

  • Marketing Description: In this stage, you’ll implement full ETag caching, letting clients avoid unnecessary downloads when files haven’t changed!

  • Difficulty: Medium

In this stage, you’ll complete the ETag mechanism by handling the If-None-Match request header.

How If-None-Match Works

Once the client has the ETag for a resource, it can use the If-None-Match header in subsequent requests to check if the resource has changed. The server will compare the ETag in the If-None-Match header with the current ETag for the resource.

If the ETags match, the server will respond with a 304 Not Modified response, indicating that the resource has not changed and the client can use its cached version. If the ETags do not match, the server will respond with a 200 OK response, including the new ETag and the updated resource.

In this stage, your server will need to handle the If-None-Match header in the request and respond accordingly.
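The decision described above can be sketched in a few lines of Python. This is only an outline under simplifying assumptions: the function name `respond` is made up for the example, and the header is assumed to carry a single ETag that is compared as an exact string.

```python
import hashlib
from typing import Optional


def respond(content: bytes, if_none_match: Optional[str]) -> bytes:
    """Return 304 if the client's ETag matches the current one, else 200 with the body.

    Hypothetical helper; assumes If-None-Match holds exactly one ETag.
    """
    # Always compute the ETag from the current file contents first.
    etag = '"' + hashlib.md5(content).hexdigest() + '"'

    if if_none_match is not None and if_none_match.strip() == etag:
        # Cache hit: no body, but the current ETag is still sent.
        return f"HTTP/1.1 304 Not Modified\r\nETag: {etag}\r\n\r\n".encode("ascii")

    # Cache miss (or no conditional header): send the full resource.
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {len(content)}\r\n"
        f"ETag: {etag}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + content
```

Calling `respond(b"hello world", '"5eb63bbbe01eeed093cb22bb8f5acdc3"')` yields the 304 case, while passing the same header after the file has changed yields a 200 with the new contents.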

Tests

The tester will run your server like this:


$ ./your_server.sh --directory <directory>

It will also create a file called hello in the directory with contents hello world.

Test 1: Cache Hit

The tester will send a GET /files/hello request with the If-None-Match header set to the correct ETag for the file (to simulate a cache hit):


$ curl -i http://localhost:4221/files/hello --header "If-None-Match: \"5eb63bbbe01eeed093cb22bb8f5acdc3\""

Your server must respond with a 304 Not Modified response:


HTTP/1.1 304 Not Modified
ETag: "5eb63bbbe01eeed093cb22bb8f5acdc3"

The body of the response should be empty, so no Content-Type or Content-Length headers should be present. The ETag header must be present in the response.

Test 2: Cache Miss

The tester will update the contents of the file hello to hello universe (to create a new version of the resource) and then send a GET /files/hello request with the If-None-Match header set to the old, now outdated ETag (to simulate a cache miss):


$ curl -i http://localhost:4221/files/hello --header "If-None-Match: \"5eb63bbbe01eeed093cb22bb8f5acdc3\""

This time, your server must respond with a 200 OK response, and include both the correct ETag and the file contents in the response body:


HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 14
ETag: "342a4db326c2b213840c7d4967cb183e"

hello universe

Notes

  • Always calculate the current ETag before comparing it with the If-None-Match header.

  • Header names (If-None-Match) are case-insensitive.

  • Always include the current ETag, even in 304 responses.
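To illustrate the note about case-insensitive header names, here is a small Python sketch that normalizes header names to lowercase while parsing. The function name `parse_headers` and the sample request are assumptions for this example, not part of the challenge.

```python
from typing import Dict


def parse_headers(raw_request: bytes) -> Dict[str, str]:
    """Parse request headers into a dict keyed by lowercased header name.

    Hypothetical helper; ignores repeated headers and line folding.
    """
    head, _, _ = raw_request.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers: Dict[str, str] = {}
    for line in lines[1:]:  # lines[0] is the request line, e.g. "GET /files/hello HTTP/1.1"
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


request = (
    b"GET /files/hello HTTP/1.1\r\n"
    b"Host: localhost:4221\r\n"
    b'if-NONE-match: "5eb63bbbe01eeed093cb22bb8f5acdc3"\r\n'
    b"\r\n"
)
print(parse_headers(request).get("if-none-match"))
```

Looking the header up by its lowercase name then works regardless of how the client capitalized it.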


Challenge 3: Weak ETags

  • Marketing Description: Learn about weak ETags and how they are used to provide weaker guarantees about the resource version.

  • Difficulty: Easy

In this stage, you will learn about weak ETags and how they provide weaker caching guarantees.

Weak ETags

In HTTP, ETags can be strong or weak:

  • Strong ETags: Indicate that two versions of a resource are byte-for-byte identical (default behavior).

  • Weak ETags: Indicate that two versions are semantically equivalent, even if they differ in minor details (like embedded timestamps).

Weak ETags are prefixed with W/ like:


ETag: W/"etag-value"

They are useful when strong ETags (which require identical content) are impractical to generate efficiently, or for large resources that have not undergone significant revisions between requests.

In this stage, you will add support for returning Weak ETags for large files.

For the purpose of this challenge, we will assume that:

  • All files larger than 1 KB (1024 bytes) are “large” files, and thus the computation of a strong ETag for them is wasteful.

  • The weak ETag of a large file is computed as the MD5 hash of the first 1024 bytes of the file content, quoted and prefixed with W/.
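The two assumptions above could be combined into a single ETag function along these lines. This is a sketch in Python; the names `make_etag` and `LARGE_FILE_THRESHOLD` are made up for the example.

```python
import hashlib

# Assumption from this stage: files larger than 1024 bytes count as "large".
LARGE_FILE_THRESHOLD = 1024


def make_etag(content: bytes) -> str:
    """Return a weak ETag for large files and a strong ETag otherwise.

    Hypothetical helper for illustration only.
    """
    if len(content) > LARGE_FILE_THRESHOLD:
        # Weak ETag: hash only the first 1024 bytes and add the W/ prefix.
        digest = hashlib.md5(content[:LARGE_FILE_THRESHOLD]).hexdigest()
        return f'W/"{digest}"'
    # Strong ETag: hash the entire content.
    return f'"{hashlib.md5(content).hexdigest()}"'


print(make_etag(b"hello world"))  # strong ETag
print(make_etag(b"hi" * 1024))    # weak ETag, starts with W/
```

Note that two large files sharing the same first 1024 bytes would receive the same weak ETag under this scheme.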

Tests

The tester will run your server like this:


$ ./your_server.sh --directory <directory>

It will also create a file called hello in the directory whose contents are the word hi repeated 1024 times (the file size would be 2 KB and the content hihihi...).

The tester will then send a GET /files/hello:


$ curl -i http://localhost:4221/files/hello

Your server must respond with a 200 OK response and return the weak ETag for the file along with the file content:


HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 2048
ETag: W/"7231c8b7ea44acd0562363f248982abc"

hihihi...

It will then also run the cache-hit and cache-miss tests (as in the last challenge) to ensure the If-None-Match functionality has no regressions.
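Once weak ETags are in play, the If-None-Match check needs to tolerate the W/ prefix. The HTTP specification has If-None-Match use a weak comparison, where two ETags match if their quoted values are equal regardless of the prefix, and it also allows a comma-separated list of ETags or `*`. A possible Python sketch follows; the function names are hypothetical, and splitting on commas assumes the ETag values themselves contain no commas (true for MD5 hex digests).

```python
def opaque_tag(etag: str) -> str:
    """Strip surrounding whitespace and an optional W/ prefix from an ETag."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def if_none_match_hits(header_value: str, current_etag: str) -> bool:
    """Return True if the If-None-Match value matches the current ETag (weak comparison).

    Hypothetical helper; assumes no commas inside the quoted ETag values.
    """
    if header_value.strip() == "*":
        return True  # "*" matches any existing resource
    current = opaque_tag(current_etag)
    return any(opaque_tag(candidate) == current for candidate in header_value.split(","))


print(if_none_match_hits('W/"abc"', 'W/"abc"'))   # True
print(if_none_match_hits('"old", "abc"', '"abc"'))  # True
print(if_none_match_hits('"old"', '"abc"'))         # False
```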


Update: I have been reading more about the different HTTP headers surrounding caching, and I stumbled upon the If-Match header, which makes use of ETag values to prevent the lost update problem, and the complementary If-None-Match header. I think they would be a great addition to this extension, where users can get a feel for how modern servers use ETags for cache management in more practical settings.

If this sounds like a good idea, I can rework the plan above to incorporate these headers as well! Let me know how that sounds.

Thanks so much for this @EshaanAgg! I think the weak etags part can be skipped, for a few reasons:

  • Weak etags aren’t really used much in practice
  • When weak etags are used, the conditions are usually more strict (things like timestamps / formatting differences) and not as loose as hashing the prefix of a file
  • A user’s implementation must pass all stages at once, so they wouldn’t be able to pass stage 1 (Etag header generation) and stage 3 (Weak ETags) at once unless they implemented conditional logic based on the contents of the file.

The other two stages look great! Was thinking if there’s a way to split these further but I couldn’t come up with anything.

We aren’t working on HTTP extensions right now, but will keep this bookmarked for when we get to it!


Ahh, makes sense! I do agree that the idea for the weak ETags was a bit contrived, but it was the simplest thing I could think of. Thanks for the review :slight_smile: