Description
When implementing a streaming base64 decoder, it's not always easy (or even possible) to know if the current buffer will contain the final block of data to be decoded.
This means that until the next stream.Read() call returns 0 (signifying end-of-stream), most streaming base64 decoders would end up calling Base64.DecodeFromUtf8() with the isFinalBlock parameter set to false.
The expectation is that it would be possible to call Base64.DecodeFromUtf8() again with whatever remains of the input buffer along with isFinalBlock:true and get the correct results.
However, with the current implementation of Base64.DecodeFromUtf8() this does not work in every case: it fails when the final quantum is split by whitespace.
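To make the pattern described above concrete, here is a minimal sketch of the kind of streaming loop in question. The names `StreamingBase64`, `DecodeAll`, and `bufferSize` are hypothetical and exist only for illustration; the only library calls assumed are `Stream.Read`, `Base64.DecodeFromUtf8`, and `Base64.GetMaxDecodedFromUtf8Length`. It passes isFinalBlock:false until Read() returns 0, then makes one last call with isFinalBlock:true on whatever input was left unconsumed.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.IO;

static class StreamingBase64
{
    // Hypothetical helper: decodes an entire base64 stream in chunks.
    public static byte[] DecodeAll(Stream input, int bufferSize = 4096)
    {
        using var result = new MemoryStream();
        byte[] inBuf = new byte[bufferSize];
        byte[] outBuf = new byte[Base64.GetMaxDecodedFromUtf8Length(bufferSize)];
        int buffered = 0; // bytes carried over from the previous iteration

        while (true)
        {
            // A full buffer would make Read() return 0 and look like end-of-stream.
            if (buffered == inBuf.Length)
                throw new InvalidOperationException("Input buffer is full and no progress was made.");

            int read = input.Read(inBuf, buffered, inBuf.Length - buffered);
            bool isFinalBlock = read == 0; // only known once the stream is exhausted
            buffered += read;

            OperationStatus status = Base64.DecodeFromUtf8(
                inBuf.AsSpan(0, buffered), outBuf,
                out int consumed, out int written, isFinalBlock);

            if (status == OperationStatus.InvalidData)
                throw new FormatException("Invalid base64 input.");

            result.Write(outBuf, 0, written);

            // Move the unconsumed tail to the front so the next call sees it again.
            inBuf.AsSpan(consumed, buffered - consumed).CopyTo(inBuf);
            buffered -= consumed;

            if (isFinalBlock)
                return result.ToArray();
        }
    }
}
```

A loop like this relies on the non-final call leaving an incomplete quantum entirely unconsumed, so that the final call can see it whole.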
Reproduction Steps
ReadOnlySpan<byte> base64Data;
var output = new byte[10];
int bytesConsumed;
int bytesWritten;
// this works correctly - it will consume 4 bytes and write 3 bytes
base64Data = "AAAA"u8;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, output, out bytesConsumed, out bytesWritten, isFinalBlock: false);
Console.WriteLine ($"bytesConsumed: {bytesConsumed}; bytesWritten: {bytesWritten}");
// this works correctly - it will consume 0 bytes and write 0 bytes (which correctly allows a second iteration which could pass isFinalBlock:true)
base64Data = "AAA="u8;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, output, out bytesConsumed, out bytesWritten, isFinalBlock: false);
Console.WriteLine ($"bytesConsumed: {bytesConsumed}; bytesWritten: {bytesWritten}");
// this has incorrect behavior - it will consume 2 bytes and write 0 bytes (which makes it impossible to recover with another call where isFinalBlock:true)
base64Data = "AA\r\nA="u8;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, output, out bytesConsumed, out bytesWritten, isFinalBlock: false);
Console.WriteLine ($"bytesConsumed: {bytesConsumed}; bytesWritten: {bytesWritten}");
// this has incorrect behavior - it will consume 2 bytes and write 0 bytes (which makes it impossible to recover with another call where isFinalBlock:true)
base64Data = "AA\r\nA=\r\n"u8;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, output, out bytesConsumed, out bytesWritten, isFinalBlock: false);
Console.WriteLine ($"bytesConsumed: {bytesConsumed}; bytesWritten: {bytesWritten}");
Expected behavior
The expected behavior in the "AA\r\nA=" and "AA\r\nA=\r\n" cases is that bytesConsumed would be set to 0, because it should NOT be consuming partial quantums.
Any reasonable implementation making use of Base64.DecodeFromUtf8() will do this:
base64Data = "AA\r\nA="u8;
// byte[] has no Slice(), so track the unwritten part of the output as a span
Span<byte> remaining = output;
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, remaining, out bytesConsumed, out bytesWritten, isFinalBlock: false);
// update buffer state
base64Data = base64Data.Slice(bytesConsumed);
remaining = remaining.Slice(bytesWritten);
// call again with isFinalBlock:true
System.Buffers.Text.Base64.DecodeFromUtf8 (base64Data, remaining, out bytesConsumed, out bytesWritten, isFinalBlock: true);
With the current implementation this doesn't work, because the first call has already consumed the leading "AA" of the final quantum.
Actual behavior
The actual behavior is that Base64.DecodeFromUtf8() incorrectly consumes partial quantums when said quantum is a "final quantum" (i.e. it contains '=') and is split with whitespace.
Regression?
I'm not sure if this is a regression or not. I've only tested on net8.0 and net10.0, and as far as I can tell the behavior is the same on both.
Known Workarounds
I suppose that it would be possible to "Trim" the end of the buffer and check if the last byte is an '=' to decide if isFinalBlock should be true or false?
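A rough sketch of that idea, assuming the only whitespace to skip is space, tab, CR, and LF. `Base64StreamHelper` and `EndsWithPadding` are hypothetical names, not part of any library:

```csharp
using System;

static class Base64StreamHelper
{
    // Hypothetical helper: true when the buffer, ignoring trailing
    // whitespace, ends with the base64 padding character '='.
    public static bool EndsWithPadding(ReadOnlySpan<byte> buffer)
    {
        int end = buffer.Length;

        // Skip trailing whitespace (assumed set: space, tab, CR, LF).
        while (end > 0 && IsWhitespace(buffer[end - 1]))
            end--;

        return end > 0 && buffer[end - 1] == (byte)'=';
    }

    private static bool IsWhitespace(byte b) =>
        b == (byte)' ' || b == (byte)'\t' || b == (byte)'\r' || b == (byte)'\n';
}
```

The result could then be passed as the isFinalBlock argument, e.g. `isFinalBlock: Base64StreamHelper.EndsWithPadding(base64Data)`. This is only a heuristic: it does not help when the buffer boundary falls before the '=' (for example a buffer ending in "AA\r\nA" with the '=' arriving in the next read), and unpadded input still needs end-of-stream detection.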
Configuration
I've tried both net8.0 and net10.0 with the same results.
I'm using a Microsoft Surface Laptop 7 Intel edition.
| Key | Value |
|---|---|
| Processor | Intel(R) Core(TM) Ultra 7 268V (2.20 GHz) |
| Installed RAM | 32.0 GB (31.7 GB usable) |
| System type | 64-bit operating system, x64-based processor |
Other information
No response