Receiving invalid message ids

I’m stuck on stage #nd2.

To implement downloading the mentioned piece, I am basically creating a struct called Peer that handles all the communication with a remote peer. On its instantiation, it performs the handshake, waits for a bitfield message, sends an interested message, and then waits for an unchoke message.

After all this process is complete, it then makes requests for all the blocks of the pieces, and then registers a global handler that reads messages from the peer connection, and if it’s a piece message, then handles it appropriately.

I am puzzled by the invalid message codes that I am receiving. I am accounting for empty keep alive messages as well, but still no luck.

Here are my logs:

remote: [tester::#ND2] Running tests for Stage #ND2 (Download a piece)
remote: [tester::#ND2] Running ./your_program.sh download_piece -o /tmp/torrents3482308911/piece-9 /tmp/torrents3482308911/itsworking.gif.torrent 9
remote: [your_program] [165.232.38.164:51548] Initializing handshake
remote: [your_program] [165.232.38.164:51548] Handshaking completed
remote: [your_program] [165.232.38.164:51548] Recieved bitfield message
remote: [your_program] [165.232.38.164:51548] Sent interested message
remote: [your_program] [165.232.38.164:51548] Recieved unchoke message
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 2
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 0
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 1
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 6
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 3
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 4
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 5
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 8
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 7
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 9
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 10
remote: [your_program] [165.232.38.164:51548] sent request message for piece_idx = 9, block_idx = 11
remote: [your_program] [165.232.38.164:51548] recieved piece message for piece_idx = 9, begin_idx = 2, block_len = 16384
remote: [your_program] [165.232.38.164:51548] recieved message with ID = 1, expected PIECE_MESSAGE_ID = 7
remote: [your_program] [165.232.38.164:51548] recieved message with ID = 204, expected PIECE_MESSAGE_ID = 7
remote: [tester::#ND2] timed out, test exceeded 10 seconds
remote: [tester::#ND2] Test failed
remote: 

My best guess would be that the way I am using the RecieveMessage is incorrect, but I don’t see anything wrong with it out of the box:

// RecieveMessage reads a message from the peer.
// It first reads the 4-byte length prefix, then reads the message of that length.
func (p *Peer) RecieveMessage() ([]byte, error) {
	lengthPrefix := make([]byte, 4)
	_, err := p.conn.Read(lengthPrefix)
	if err != nil {
		return nil, fmt.Errorf("error reading message length: %v", err)
	}
	length := binary.BigEndian.Uint32(lengthPrefix)

	// Keep-alive message, so recursively call itself
	if length == 0 {
		return p.RecieveMessage()
	}

	message := make([]byte, length)
	_, err = p.conn.Read(message)
	if err != nil {
		return nil, fmt.Errorf("error reading message body: %v", err)
	}

	return message, nil
}

Here is all of my code, anyway, if it helps. Would appreciate any pointers or gotcha’s that I might not be handling properly.

Hey @EshaanAgg, I added a couple of logs here:

While the first piece message looks valid, the subsequent ones are showing extremely large (namely wrong) message lengths:

This suggests a potential concurrency issue.

As a next step, I’d recommend holding off on sending all requests in goroutines for now:

Try running the requests sequentially and verify if the message lengths remain consistent.

Once stable, you can reintroduce concurrency with proper safeguards.

Even I thought that might be the case, but even after removing all goroutines, and just keeping one separate thread to manage the message from the peer, I am still running into the same issue.

I had updated the logging to:


// It first reads the 4-byte length prefix, then reads the message of that length.
func (p *Peer) RecieveMessage() ([]byte, error) {
	lengthPrefix := make([]byte, 4)
	_, err := p.conn.Read(lengthPrefix)
	if err != nil {
		return nil, fmt.Errorf("error reading message length: %w", err)
	}
	length := binary.BigEndian.Uint32(lengthPrefix)
	p.Log("Recieved message length: %d (%q)", length, lengthPrefix)

	// Keep-alive message, so recursively call itself
	if length == 0 {
		return p.RecieveMessage()
	}

	message := make([]byte, length)
	n, err := p.conn.Read(message)
	if err != nil {
		return nil, fmt.Errorf("error reading message body: %w", err)
	}
	if n != int(length) {
		return nil, fmt.Errorf("expected to read %d bytes, but got %d", length, n)
	}
	p.Log("Recieved message bytes: %q", message[:min(n, 20)])

	return message, nil
}

Now the logs show something of a more interesting error:

remote: [tester::#ND2] Running tests for Stage #ND2 (Download a piece)
remote: [tester::#ND2] Running ./your_program.sh download_piece -o /tmp/torrents2558714396/piece-0 /tmp/torrents2558714396/codercat.gif.torrent 0
remote: [your_program] [165.232.38.164:51589] Initializing handshake
remote: [your_program] [165.232.38.164:51589] Handshaking completed
remote: [your_program] [165.232.38.164:51589] Recieved message length: 3 ("\x00\x00\x00\x03")
remote: [your_program] [165.232.38.164:51589] Recieved message bytes: "\x05\xff\xf0"
remote: [your_program] [165.232.38.164:51589] Recieved bitfield message
remote: [your_program] [165.232.38.164:51589] Sent interested message
remote: [your_program] [165.232.38.164:51589] Recieved message length: 1 ("\x00\x00\x00\x01")
remote: [your_program] [165.232.38.164:51589] Recieved message bytes: "\x01"
remote: [your_program] [165.232.38.164:51589] Recieved unchoke message
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 0
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 1
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 2
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 3
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 4
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 5
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 6
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 7
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 8
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 9
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 10
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 11
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 12
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 13
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 14
remote: [your_program] [165.232.38.164:51589] sent request message for piece_idx = 0, block_idx = 15
remote: [your_program] [165.232.38.164:51589] Recieved message length: 16393 ("\x00\x00@\t")
remote: [your_program] [165.232.38.164:51589] error receiving message: expected to read 16393 bytes, but got 13676
remote: [your_program] [165.232.38.164:51589] Recieved message length: 1642310103 ("a\xe3\xa9\xd7")
remote: [your_program] [165.232.38.164:51589] error receiving message: expected to read 1642310103 bytes, but got 27356
remote: [your_program] [165.232.38.164:51589] Recieved message length: 904298258 ("5\xe6\x7f\x12")
remote: [your_program] [165.232.38.164:51589] error receiving message: expected to read 904298258 bytes, but got 221308
remote: [tester::#ND2] timed out, test exceeded 10 seconds
remote: [tester::#ND2] Test failed
remote: 
remote: View our article on debugging test failures: https://codecrafters.io/debug
remote: 

As you can see in the first peer response, the expected length of the payload is 16393, but the peer is only responding with 13676 bytes. What can be the reason for the same? Is it possible that the peer is buffering data, and I would need to change the p.RecieveMessage to block until it reads the complete expected n bytes? (I tried to see a couple of code examples, but none of them seems to be explicitly doing this)

This indeed was the error. Using io.ReadFull to read until the buffer is full solved the problem! Thanks for all the debugging help :slight_smile:

1 Like