Base64.DecodeFromUtf8() has incorrect behavior when a final quantum is split by whitespace and isFinalBlock=false #123311

@jstedfast

Description

When implementing a streaming base64 decoder, it's not always easy (or even possible) to know if the current buffer will contain the final block of data to be decoded.

This means that until the next stream.Read() call returns 0 (signifying end-of-stream), most streaming base64 decoders would end up calling Base64.DecodeFromUtf8() with the isFinalBlock parameter set to false.

The expectation is that it would be possible to call Base64.DecodeFromUtf8() again with whatever remains of the input buffer along with isFinalBlock:true and get the correct results.

However, the current implementation of Base64.DecodeFromUtf8() does not make this possible in every case: it fails when the final quantum is split by whitespace.
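
For context, the streaming pattern described above can be sketched roughly as follows. This is a minimal illustration, not code from an actual decoder: the class name StreamingBase64Sketch, the 4096-byte buffer size, and the error handling are all assumptions made for the example. It passes isFinalBlock:false until Read() returns 0, carries unconsumed bytes over to the next pass, and only then makes the final call with isFinalBlock:true.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.IO;

// Hypothetical helper for illustration only; not part of any library.
static class StreamingBase64Sketch
{
    public static void Decode(Stream input, Stream output)
    {
        byte[] inBuf = new byte[4096]; // buffer size is an arbitrary assumption
        // Large enough for anything a full input buffer can decode to.
        byte[] outBuf = new byte[Base64.GetMaxDecodedFromUtf8Length(inBuf.Length)];
        int buffered = 0; // undecoded bytes carried over from the previous pass

        while (true)
        {
            if (buffered == inBuf.Length)
                throw new InvalidOperationException("No progress: input buffer is full of unconsumed data.");

            int read = input.Read(inBuf, buffered, inBuf.Length - buffered);
            bool isFinalBlock = read == 0; // only known once the stream is exhausted
            buffered += read;

            OperationStatus status = Base64.DecodeFromUtf8(
                inBuf.AsSpan(0, buffered), outBuf,
                out int bytesConsumed, out int bytesWritten, isFinalBlock);

            if (status == OperationStatus.InvalidData)
                throw new FormatException("Invalid base64 input.");

            output.Write(outBuf, 0, bytesWritten);

            // Move whatever was not consumed to the front of the buffer
            // so the next pass (or the final call) can see it again.
            inBuf.AsSpan(bytesConsumed, buffered - bytesConsumed).CopyTo(inBuf);
            buffered -= bytesConsumed;

            if (isFinalBlock)
                break;
        }
    }
}
```

A loop like this relies on bytesConsumed never covering part of a quantum that has not been decoded yet, which is exactly the assumption that breaks in the whitespace-split case reported here.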

Reproduction Steps

ReadOnlySpan<byte> base64Data;
var output = new byte[10];
int bytesConsumed;
int bytesWritten;

// this works correctly - it will consume 4 bytes and write 3 bytes
base64Data = "AAAA"u8;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, output, out bytesConsumed, out bytesWritten, isFinalBlock: false);
Console.WriteLine ($"bytesConsumed: {bytesConsumed}; bytesWritten: {bytesWritten}");

// this works correctly - it will consume 0 bytes and write 0 bytes (which correctly allows a second iteration which could pass isFinalBlock:true)
base64Data = "AAA="u8;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, output, out bytesConsumed, out bytesWritten, isFinalBlock: false);
Console.WriteLine ($"bytesConsumed: {bytesConsumed}; bytesWritten: {bytesWritten}");

// this has incorrect behavior - it will consume 2 bytes and write 0 bytes (which makes it impossible to recover with another call where isFinalBlock:true)
base64Data = "AA\r\nA="u8;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, output, out bytesConsumed, out bytesWritten, isFinalBlock: false);
Console.WriteLine ($"bytesConsumed: {bytesConsumed}; bytesWritten: {bytesWritten}");

// this has incorrect behavior - it will consume 2 bytes and write 0 bytes (which makes it impossible to recover with another call where isFinalBlock:true)
base64Data = "AA\r\nA=\r\n"u8;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, output, out bytesConsumed, out bytesWritten, isFinalBlock: false);
Console.WriteLine ($"bytesConsumed: {bytesConsumed}; bytesWritten: {bytesWritten}");

Expected behavior

The expected behavior in the "AA\r\nA=" and "AA\r\nA=\r\n" cases is that bytesConsumed would be set to 0 because it should NOT be consuming partial quantums.

Any reasonable implementation making use of Base64.DecodeFromUtf8() will do this:

base64Data = "AA\r\nA="u8;
Span<byte> remainingOutput = output;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, remainingOutput, out bytesConsumed, out bytesWritten, isFinalBlock: false);

// update buffer state (byte[] has no Slice method, so the remaining output is tracked as a Span<byte>)
base64Data = base64Data.Slice(bytesConsumed);
remainingOutput = remainingOutput.Slice(bytesWritten);

// call again with isFinalBlock:true
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, remainingOutput, out bytesConsumed, out bytesWritten, isFinalBlock: true);

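This doesn't work the way things are currently implemented: the first call has already consumed "AA", so the second call only sees "\r\nA=" and the final quantum can no longer be decoded.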

Actual behavior

The actual behavior is that Base64.DecodeFromUtf8() incorrectly consumes partial quantums when said quantum is a "final quantum" (i.e. it contains '=') and is split with whitespace.

Regression?

I'm not sure if this is a regression or not. I've only tested on net8.0 and net10.0, with the same behavior as far as I can tell.

Known Workarounds

I suppose that it would be possible to "Trim" the end of the buffer and check if the last byte is an '=' to decide if isFinalBlock should be true or false?
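
A rough sketch of that check is below. It only explores the idea and is not a verified fix: the helper name Base64PaddingProbe.EndsWithPadding is hypothetical, and the set of bytes treated as whitespace (space, tab, CR, LF) is an assumption. It also cannot detect a final quantum that carries no '=' padding.

```csharp
using System;

// Hypothetical helper for illustration only; not part of System.Buffers.Text.
static class Base64PaddingProbe
{
    // Returns true when the last non-whitespace byte in the buffer is '=',
    // which suggests that the buffer ends with a padded final quantum.
    public static bool EndsWithPadding(ReadOnlySpan<byte> buffer)
    {
        int i = buffer.Length - 1;

        // "Trim" trailing whitespace by walking backwards over it.
        while (i >= 0 && IsWhitespace(buffer[i]))
            i--;

        return i >= 0 && buffer[i] == (byte)'=';
    }

    // Assumed whitespace set: space, tab, carriage return, line feed.
    private static bool IsWhitespace(byte b) =>
        b == (byte)' ' || b == (byte)'\t' || b == (byte)'\r' || b == (byte)'\n';
}
```

The result could then be passed along as the isFinalBlock argument, e.g. isFinalBlock: Base64PaddingProbe.EndsWithPadding(base64Data), instead of always passing false until end-of-stream.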

Configuration

I've tried both net8.0 and net10.0 with the same results.

I'm using a Microsoft Surface Laptop 7 Intel edition.

Processor: Intel(R) Core(TM) Ultra 7 268V (2.20 GHz)
Installed RAM: 32.0 GB (31.7 GB usable)
System type: 64-bit operating system, x64-based processor

Other information

No response
