ReadStream Concept Design
Overview
This document describes the design of the ReadStream concept: the fundamental partial-read primitive in the concept hierarchy. It explains why read_some is the correct building block, how composed algorithms build on top of it, and the relationship to ReadSource.
Definition
template<typename T>
concept ReadStream =
    requires(T& stream, mutable_buffer_archetype buffers)
    {
        { stream.read_some(buffers) } -> IoAwaitable;
        requires awaitable_decomposes_to<
            decltype(stream.read_some(buffers)),
            std::error_code, std::size_t>;
    };
A ReadStream provides a single operation:
read_some(buffers) — Partial Read
Reads one or more bytes from the stream into the buffer sequence. The result decomposes to (error_code ec, std::size_t n), where n is the number of bytes read.
Semantics
- On success: !ec, n >= 1, and n <= buffer_size(buffers).
- On EOF: ec == cond::eof, n == 0.
- On error: ec, n == 0.
- If buffer_empty(buffers): the operation completes immediately with !ec, n == 0.
The caller must not assume the buffer is filled. read_some may return fewer bytes than the buffer can hold. This is the defining property of a partial-read primitive.
Once read_some returns an error (including EOF), the caller must not call read_some again. The stream is done. Not all implementations can reproduce a prior error on subsequent calls, so the behavior after an error is undefined.
Each buffer in the sequence is filled completely before data is placed in the next buffer.
Concept Hierarchy
ReadStream is the base of the read-side hierarchy:
ReadStream { read_some }
         |
         v
ReadSource { read_some, read }
ReadSource refines ReadStream. Every ReadSource is a ReadStream. Algorithms constrained on ReadStream accept both raw streams and sources. The ReadSource concept adds a complete-read primitive on top of the partial-read primitive.
This mirrors the write side:
WriteStream { write_some }
         |
         v
WriteSink { write_some, write, write_eof(buffers), write_eof() }
Composed Algorithms
Three composed algorithms build on read_some:
read(stream, buffers) — Fill a Buffer Sequence
auto read(ReadStream auto& stream,
          MutableBufferSequence auto const& buffers)
    -> io_task<std::size_t>;
Loops read_some until the entire buffer sequence is filled or an error (including EOF) occurs. On success, n == buffer_size(buffers).
template<ReadStream Stream>
task<> read_header(Stream& stream)
{
    char header[16];
    auto [ec, n] = co_await read(
        stream, mutable_buffer(header));
    if(ec == cond::eof)
        co_return; // clean shutdown
    if(ec)
        co_return;
    // header contains exactly 16 bytes
}
read(stream, dynamic_buffer) — Read Until EOF
auto read(ReadStream auto& stream,
          DynamicBufferParam auto&& buffers,
          std::size_t initial_amount = 2048)
    -> io_task<std::size_t>;
Reads from the stream into a dynamic buffer until EOF is reached. The buffer grows with a 1.5x factor when filled. On success (EOF), ec is clear and n is the total bytes read.
template<ReadStream Stream>
task<std::string> slurp(Stream& stream)
{
    std::string body;
    auto [ec, n] = co_await read(
        stream, string_dynamic_buffer(&body));
    if(ec)
        co_return {};
    co_return body;
}
read_until(stream, dynamic_buffer, match) — Delimited Read
Reads from the stream into a dynamic buffer until a delimiter or match condition is found. Used for line-oriented protocols and message framing.
template<ReadStream Stream>
task<> read_line(Stream& stream)
{
    std::string line;
    auto [ec, n] = co_await read_until(
        stream, string_dynamic_buffer(&line), "\r\n");
    if(ec)
        co_return;
    // line contains data up to and including "\r\n"
}
Use Cases
Incremental Processing with read_some
When processing data as it arrives without waiting for a full buffer, read_some is the right choice. This is common for real-time data or when the processing can handle partial input.
template<ReadStream Stream>
task<> echo(Stream& stream, WriteStream auto& dest)
{
    char buf[4096];
    for(;;)
    {
        auto [ec, n] = co_await stream.read_some(
            mutable_buffer(buf));
        if(ec == cond::eof)
            co_return;
        if(ec)
            co_return;
        // Forward whatever we received immediately.
        // write_some is a partial write, so loop until
        // all n bytes have been written.
        std::size_t written = 0;
        while(written < n)
        {
            auto [wec, nw] = co_await dest.write_some(
                const_buffer(buf + written, n - written));
            if(wec)
                co_return;
            written += nw;
        }
    }
}
Relaying from ReadStream to WriteStream
When relaying data from a reader to a writer, read_some feeds write_some directly. This is the fundamental streaming pattern.
template<ReadStream Src, WriteStream Dest>
task<> relay(Src& src, Dest& dest)
{
    char storage[65536];
    circular_dynamic_buffer cb(storage, sizeof(storage));
    for(;;)
    {
        // Read into free space
        auto mb = cb.prepare(cb.capacity());
        auto [rec, nr] = co_await src.read_some(mb);
        cb.commit(nr);
        if(rec && rec != cond::eof)
            co_return;
        // Drain to destination
        while(cb.size() > 0)
        {
            auto [wec, nw] = co_await dest.write_some(
                cb.data());
            if(wec)
                co_return;
            cb.consume(nw);
        }
        if(rec == cond::eof)
            co_return;
    }
}
Because ReadSource refines ReadStream, this relay function also accepts ReadSource types. An HTTP body source or a decompressor can be relayed to a WriteStream using the same function.
Relationship to the Write Side
| Read Side | Write Side |
|---|---|
| ReadStream { read_some } | WriteStream { write_some } |
| ReadSource { read_some, read } | WriteSink { write_some, write, write_eof(buffers), write_eof() } |
| read_some — partial read | write_some — partial write |
| read — complete read | write — complete write |
| read_until — delimited read | No write-side equivalent |
| EOF is discovered by the reader | EOF is signaled by the writer via write_eof |
Design Foundations: Why Errors Exclude Data
The read_some contract requires that n is 0 whenever ec is set. Data and errors are mutually exclusive outcomes. This is the most consequential design decision in the ReadStream concept, with implications for every consumer of read_some in the library. The rule follows Asio’s established AsyncReadStream contract, is reinforced by the behavior of POSIX and Windows I/O system calls, and produces cleaner consumer code. This section explains the design and its consequences.
Reconstructing Kohlhoff’s Reasoning
Christopher Kohlhoff’s Asio library defines an AsyncReadStream concept with the identical requirement: on error, bytes_transferred is 0. No design rationale document accompanies this rule. The reasoning presented here was reconstructed from three sources:
- The Asio source code. The function non_blocking_recv1 in socket_ops.ipp explicitly sets bytes_transferred = 0 on every error path. The function complete_iocp_recv maps Windows IOCP errors to portable error codes, relying on the operating system's guarantee that failed completions report zero bytes. These are deliberate choices, not accidental pass-through of OS behavior.
- A documentation note Kohlhoff left. Titled "Why EOF is an error," it gives two reasons: composed operations need EOF-as-error to report contract violations, and EOF-as-error disambiguates the end of a stream from a successful zero-byte read. The note is terse but the implications are deep.
- Analysis of the underlying system calls. POSIX recv() and Windows WSARecv() both enforce a binary outcome per call: data or error, never both. This is not because the C++ abstraction copied the OS, but because both levels face the same fundamental constraint.
The following sections examine each of these points and their consequences.
Alignment with Asio
Asio’s AsyncReadStream concept has enforced the same rule for over two decades: on error, bytes_transferred is 0. This is a deliberate design choice, not an accident. The Asio source code explicitly zeroes bytes_transferred on every error path, and the underlying system calls (POSIX recv(), Windows IOCP) enforce binary outcomes at the OS level. The read_some contract follows this established practice.
The Empty-Buffer Rule
Every ReadStream must support the following:
- read_some(empty_buffer) completes immediately with {success, 0}.
This is a no-op. The caller passed no buffer space, so no I/O is attempted. The operation does not inspect the stream’s internal state because that would require a probe capability — a way to ask "is there data? is the stream at EOF?" — without actually reading. Not every source supports probing. A TCP socket does not know that its peer has closed until it calls recv() and gets 0 back. A pipe does not know it is broken until a read fails. The empty-buffer rule is therefore unconditional: return {success, 0} regardless of the stream’s state.
This rule is a natural consequence of the contract, not a proof of it. When no I/O is attempted, no state is discovered and no error is reported.
Why EOF Is an Error
Kohlhoff’s documentation note gives two reasons for making EOF an error code rather than a success:
Composed operations need EOF-as-error to report contract violations. The composed read(stream, buffer(buf, 100)) promises to fill exactly 100 bytes. If the stream ends after 50, the operation did not fulfill its contract. Reporting {success, 50} would be misleading — it suggests the operation completed normally. Reporting {eof, 50} tells the caller both what happened (50 bytes landed in the buffer) and why the operation stopped (the stream ended). EOF-as-error is the mechanism by which composed operations explain early termination.
EOF-as-error disambiguates the empty-buffer no-op from the end of a stream. Without EOF-as-error, both read_some(empty_buffer) on a live stream and read_some(non_empty_buffer) on an exhausted stream would produce {success, 0}. The caller could not distinguish "I passed no buffer" from "the stream is done." Making EOF an error code separates these two cases cleanly.
These two reasons reinforce each other. Composed operations need EOF to be an error code so they can report early termination. The empty-buffer rule needs EOF to be an error code so {success, 0} is unambiguously a no-op. Together with the rule that errors exclude data, read_some results form a clean trichotomy: success with data, or an error (including EOF) without data.
The Write-Side Asymmetry
On the write side, WriteSink provides write_eof(buffers) to atomically combine the final data with the EOF signal. A natural question follows: if the write side fuses data with EOF, why does the read side forbid it?
The answer is that the two sides of the I/O boundary have different roles. The writer decides when to signal EOF. The reader discovers it. This asymmetry has three consequences:
write_eof exists for correctness, not convenience. Protocol framings require the final data and the EOF marker to be emitted together so the peer observes a complete message. HTTP chunked encoding needs the terminal 0\r\n\r\n coalesced with the final data chunk. A TLS session needs the close-notify alert coalesced with the final application data. A compressor needs Z_FINISH applied to the final input. These are correctness requirements, not optimizations. On the read side, whether the last bytes arrive with EOF or on a separate call does not change what the reader observes. The data and the order are identical either way.
write_eof is a separate function the caller explicitly invokes. write_some never signals EOF. The writer opts into data-plus-EOF by calling a different function. The call site reads write_eof(data) and the intent is unambiguous. If read_some could return data with EOF, every call to read_some would sometimes be a data-only operation and sometimes a data-plus-EOF operation. The stream decides which mode the caller gets, at runtime. Every call site must handle both possibilities. The burden falls on every consumer in the codebase, not on a single call site that opted into the combined behavior.
A hypothetical read_eof makes no sense. On the write side, write_eof exists because the producer signals the end of data. On the read side, the consumer does not tell the stream to end — it discovers that the stream has ended. EOF flows from producer to consumer, not the reverse. There is no action the reader can take to "read the EOF." The reader discovers EOF as a side effect of attempting to read.
A Clean Trichotomy
With the current contract, every read_some result falls into exactly one of three mutually exclusive cases:
- Success: !ec, n >= 1 — data arrived, process it.
- EOF: ec == cond::eof, n == 0 — stream ended, no data.
- Error: ec, n == 0 — failure, no data.
Data is present if and only if the operation succeeded. This invariant — data implies success — eliminates an entire category of reasoning from every read loop. The common pattern is:
auto [ec, n] = co_await stream.read_some(buf);
if(ec)
    break; // EOF or error -- no data to handle
process(buf, n); // only reached on success, n >= 1
If read_some could return n > 0 with EOF, the loop becomes:
auto [ec, n] = co_await stream.read_some(buf);
if(n > 0)
    process(buf, n); // must handle data even on EOF
if(ec)
    break;
Every consumer pays this tax: an extra branch to handle data accompanying EOF. The branch is easy to forget. Forgetting it silently drops the final bytes of the stream — a bug that only manifests when the source delivers EOF with its last data rather than on a separate call. A TCP socket receiving data in one packet and FIN in another will not trigger the bug. A memory source that knows its remaining length will. The non-determinism makes the bug difficult to reproduce and diagnose.
The clean trichotomy eliminates this class of bugs entirely.
Conforming Sources
Every concrete ReadStream implementation naturally separates its last data delivery from its EOF signal:
- TCP sockets: read_some maps to a single recv() or WSARecv() call, returning whatever the kernel has buffered. The kernel delivers bytes on one call and returns 0 on the next. The separation is inherent in the POSIX and Windows APIs.
- TLS streams: read_some decrypts and returns one TLS record's worth of application data. The close-notify alert arrives as a separate record.
- HTTP content-length body: the source delivers bytes up to the content-length limit. Once the limit is reached, the next read_some returns EOF.
- HTTP chunked body: the unchunker delivers decoded data from chunks. The terminal 0\r\n\r\n is parsed on a separate pass that returns EOF.
- Compression (inflate): the decompressor delivers output bytes. When Z_STREAM_END is detected, the next read returns EOF.
- Memory source: returns min(requested, remaining) bytes. When remaining reaches 0, the next call returns EOF.
- QUIC streams: read_some returns data from received QUIC frames. Stream FIN is delivered as EOF on a subsequent call.
- Buffered read streams: read_some returns data from an internal buffer, refilling from the underlying stream when empty. EOF propagates from the underlying stream.
- Test mock streams: read_some returns configurable data and error sequences for testing.
No source is forced into an unnatural pattern. The read_some call that discovers EOF is the natural result of attempting to read from an exhausted stream — not a separate probing step. Once the caller receives EOF, it stops reading.
Composed Operations and Partial Results
The composed read algorithm (and ReadSource::read) does report n > 0 on EOF, because it accumulates data across multiple internal read_some calls. When the underlying stream signals EOF mid-accumulation, discarding the bytes already gathered would be wrong. The caller needs n to know how much valid data landed in the buffer.
The design separates concerns cleanly: the single-shot primitive (read_some) delivers unambiguous results with a clean trichotomy. Composed operations that accumulate state (read) report what they accumulated, including partial results on EOF. Callers who need partial-on-EOF semantics get them through the composed layer, while the primitive layer remains clean.
Evidence from the Asio Implementation
The Asio source code confirms this design at every level.
On POSIX platforms, non_blocking_recv1 in socket_ops.ipp calls recv() and branches on the result. If recv() returns a positive value, the bytes are reported as a successful transfer. If recv() returns 0 on a stream socket, EOF is reported. If recv() returns -1, the function explicitly sets bytes_transferred = 0 before returning the error. The POSIX recv() system call itself enforces binary outcomes: it returns N > 0 on success, 0 on EOF, or -1 on error. A single call never returns both data and an error.
On Windows, complete_iocp_recv processes the results from GetQueuedCompletionStatus. It maps ERROR_NETNAME_DELETED to connection_reset and ERROR_PORT_UNREACHABLE to connection_refused. Windows IOCP similarly reports zero bytes_transferred on failed completions. The operating system enforces the same binary outcome per I/O completion.
The one edge case is POSIX signal interruption (EINTR). If a signal arrives after recv() has already copied some bytes, the kernel returns the partial byte count as success rather than -1/EINTR. Asio handles this transparently by retrying on EINTR, so the caller never observes it. Even the kernel does not combine data with an error — it chooses to report the partial data as success.
Convergent Design with POSIX
POSIX recv() independently enforces the same rule: N > 0 on success, -1 on error, 0 on EOF. The kernel never returns "here are your last 5 bytes, and also EOF." It delivers the available bytes on one call and returns 0 on the next. This is not because the C++ abstraction copied POSIX semantics. It is because the kernel faces the same fundamental constraint: state is discovered through the act of I/O. The alignment between read_some and recv() is convergent design, not leaky abstraction.
Summary
ReadStream provides read_some as the single partial-read primitive. This is deliberately minimal:
- Algorithms that need to fill a buffer completely use the read composed algorithm.
- Algorithms that need delimited reads use read_until.
- Algorithms that need to process data as it arrives use read_some directly.
- ReadSource refines ReadStream by adding read for complete-read semantics.
The contract that errors exclude data follows Asio’s established AsyncReadStream contract, aligns with POSIX and Windows system call semantics, and produces a clean trichotomy that makes every read loop safe by construction.