Type-Erasing Awaitables
Overview
The any_* wrappers type-erase stream and source concepts so that algorithms can operate on heterogeneous concrete types through a uniform interface. Each wrapper preallocates storage for the type-erased awaitable at construction time, achieving zero steady-state allocation.
Two vtable layouts are used depending on how many operations the wrapper exposes.
Single-Operation: Flat Vtable
When a wrapper exposes exactly one async operation (e.g. any_read_stream with read_some, or any_write_stream with write_some), all function pointers live in a single flat vtable:
// Flat vtable -- 64 bytes, one cache line
struct vtable
{
    void (*construct_awaitable)(...);                        // 8
    bool (*await_ready)(void*);                              // 8
    std::coroutine_handle<> (*await_suspend)(void*, ...);    // 8
    io_result<size_t> (*await_resume)(void*);                // 8
    void (*destroy_awaitable)(void*);                        // 8
    size_t awaitable_size;                                   // 8
    size_t awaitable_align;                                  // 8
    void (*destroy)(void*);                                  // 8
};
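As a rough illustration of how such a table could be populated, the sketch below builds a flat vtable for one concrete type and allocates the awaitable storage once, at wrapper construction. It is a simplified model, not the library's implementation: `handle` stands in for `std::coroutine_handle<>`, `result` stands in for `io_result<size_t>`, and `toy_stream`, `toy_awaitable`, `vtable_for`, `erased_reader`, and `demo_read` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

using handle = void*;        // stand-in for std::coroutine_handle<> (assumption)
using result = std::size_t;  // stand-in for io_result<size_t> (assumption)

// Hypothetical concrete stream with a fixed amount of readable data.
struct toy_stream { std::size_t available = 7; };

// Hypothetical awaitable that always completes immediately.
struct toy_awaitable {
    toy_stream* s;
    std::size_t want;
    bool await_ready() const { return true; }
    handle await_suspend(handle h) { return h; }
    result await_resume() { return want < s->available ? want : s->available; }
};

// Same eight slots as the flat vtable above, with concrete signatures.
struct vtable {
    void   (*construct_awaitable)(void* stream, void* storage, std::size_t want);
    bool   (*await_ready)(void*);
    handle (*await_suspend)(void*, handle);
    result (*await_resume)(void*);
    void   (*destroy_awaitable)(void*);
    std::size_t awaitable_size;
    std::size_t awaitable_align;
    void   (*destroy)(void*);
};

// One table per (stream, awaitable) pair; captureless lambdas decay to
// plain function pointers that cast back to the concrete types.
template<class S, class A>
vtable const* vtable_for() {
    static const vtable vt = {
        [](void* s, void* p, std::size_t n) { ::new (p) A{static_cast<S*>(s), n}; },
        [](void* p) -> bool { return static_cast<A*>(p)->await_ready(); },
        [](void* p, handle h) -> handle { return static_cast<A*>(p)->await_suspend(h); },
        [](void* p) -> result { return static_cast<A*>(p)->await_resume(); },
        [](void* p) { static_cast<A*>(p)->~A(); },
        sizeof(A),
        alignof(A),
        [](void* s) { delete static_cast<S*>(s); },
    };
    return &vt;
}

class erased_reader {
    void* stream_;
    vtable const* vt_;
    void* storage_;  // allocated once, reused by every operation
public:
    erased_reader(void* stream, vtable const* vt)
        : stream_(stream), vt_(vt), storage_(::operator new(vt->awaitable_size)) {
        // Plain operator new only guarantees fundamental alignment.
        assert(vt->awaitable_align <= alignof(std::max_align_t));
    }
    erased_reader(erased_reader const&) = delete;
    erased_reader& operator=(erased_reader const&) = delete;
    ~erased_reader() { ::operator delete(storage_); vt_->destroy(stream_); }

    // Immediate-completion path only: construct, poll, resume, destroy.
    result read_now(std::size_t want) {
        vt_->construct_awaitable(stream_, storage_, want);
        bool ready = vt_->await_ready(storage_);
        assert(ready);
        (void)ready;
        result r = vt_->await_resume(storage_);
        vt_->destroy_awaitable(storage_);
        return r;
    }
};

// Builds a wrapper around a fresh toy_stream and performs one read.
inline result demo_read(std::size_t want) {
    erased_reader r(new toy_stream(), vtable_for<toy_stream, toy_awaitable>());
    return r.read_now(want);
}
```

Because the storage is sized from `awaitable_size` when the wrapper is built, repeated calls to `read_now` perform no further allocation in this model.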
The inner awaitable can be constructed in either await_ready or await_suspend, depending on whether the outer awaitable has a short-circuit path.
Construct in await_ready (any_read_stream)
When there is no outer short-circuit, constructing in await_ready lets immediate completions skip await_suspend entirely:
bool await_ready() {
    vt_->construct_awaitable(stream_, storage_, buffers_);
    awaitable_active_ = true;
    return vt_->await_ready(storage_); // true → no suspend
}
std::coroutine_handle<> await_suspend(std::coroutine_handle<> h, io_env const* env) {
    return vt_->await_suspend(storage_, h, env);
}
io_result<size_t> await_resume() {
    auto r = vt_->await_resume(storage_);
    vt_->destroy_awaitable(storage_);
    awaitable_active_ = false;
    return r;
}
Construct in await_suspend (any_write_stream)
When the outer awaitable has a short-circuit (empty buffers), construction is deferred to await_suspend so the inner awaitable is never created on the fast path:
bool await_ready() const noexcept {
    return buffers_.empty(); // short-circuit, no construct
}
std::coroutine_handle<> await_suspend(std::coroutine_handle<> h, io_env const* env) {
    vt_->construct_awaitable(stream_, storage_, buffers_);
    awaitable_active_ = true;
    if(vt_->await_ready(storage_))
        return h; // immediate → resume caller
    return vt_->await_suspend(storage_, h, env);
}
io_result<size_t> await_resume() {
    if(!awaitable_active_)
        return {{}, 0}; // short-circuited
    auto r = vt_->await_resume(storage_);
    vt_->destroy_awaitable(storage_);
    awaitable_active_ = false;
    return r;
}
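To make the effect of deferred construction concrete, the following simplified model counts how often the inner awaitable is constructed. It does not use real coroutines: `drive` imitates the order in which `co_await` calls the three hooks, `await_suspend` returns a `bool` instead of a handle, and a `std::string` plays the role of the buffer sequence. `counting_inner`, `write_op`, `drive`, and `constructions_for` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical inner awaitable that records every construction.
struct counting_inner {
    static int& constructed() { static int n = 0; return n; }
    std::size_t len;
    explicit counting_inner(std::size_t n) : len(n) { ++constructed(); }
    bool await_ready() const { return true; }  // always completes immediately
    std::size_t await_resume() const { return len; }
};

// Outer awaitable modelled on the deferred-construction variant above.
class write_op {
    std::string const& buffers_;
    alignas(counting_inner) unsigned char storage_[sizeof(counting_inner)];
    counting_inner* inner_ = nullptr;  // non-null plays the role of awaitable_active_
public:
    explicit write_op(std::string const& b) : buffers_(b) {}

    bool await_ready() const noexcept { return buffers_.empty(); }

    // Returns true when the caller should resume immediately
    // (stands in for "return h" in the coroutine version).
    bool await_suspend() {
        inner_ = ::new (storage_) counting_inner(buffers_.size());
        return inner_->await_ready();
    }

    std::size_t await_resume() {
        if (!inner_)
            return 0;  // short-circuited: inner awaitable never existed
        std::size_t r = inner_->await_resume();
        inner_->~counting_inner();
        inner_ = nullptr;
        return r;
    }
};

// Calls the hooks in the same order co_await would.
inline std::size_t drive(write_op& op) {
    if (!op.await_ready())
        op.await_suspend();
    return op.await_resume();
}

// Number of inner awaitables constructed while writing `data`.
inline int constructions_for(std::string const& data) {
    int before = counting_inner::constructed();
    write_op op(data);
    drive(op);
    return counting_inner::constructed() - before;
}
```

In this model an empty buffer yields zero constructions, while any non-empty buffer yields exactly one.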
Both variants touch the same two cache lines on the hot path: the wrapper object and the flat vtable.
Multi-Operation: Split Vtable with awaitable_ops
When a wrapper exposes multiple operations that produce different awaitable types (e.g. any_read_source with read_some and read, or any_write_sink with write_some, write, write_eof(buffers), and write_eof()), a split layout is required. Each construct call returns a pointer to a static constexpr awaitable_ops matching the awaitable it created.
// Per-awaitable dispatch -- 32 bytes
struct awaitable_ops
{
    bool (*await_ready)(void*);
    std::coroutine_handle<> (*await_suspend)(void*, ...);
    io_result<size_t> (*await_resume)(void*);
    void (*destroy)(void*);
};
// Vtable -- 32 bytes
struct vtable
{
    awaitable_ops const* (*construct_awaitable)(...);
    size_t awaitable_size;
    size_t awaitable_align;
    void (*destroy)(void*);
};
The inner awaitable is constructed in await_suspend. Outer await_ready handles short-circuits (e.g. empty buffers) before the inner type is ever created:
bool await_ready() const noexcept {
    return buffers_.empty(); // short-circuit
}
std::coroutine_handle<> await_suspend(std::coroutine_handle<> h, io_env const* env) {
    active_ops_ = vt_->construct_awaitable(stream_, storage_, buffers_);
    if(active_ops_->await_ready(storage_))
        return h; // immediate → resume caller
    return active_ops_->await_suspend(storage_, h, env);
}
io_result<size_t> await_resume() {
    if(!active_ops_)
        return {{}, 0}; // short-circuited
    auto r = active_ops_->await_resume(storage_);
    active_ops_->destroy(storage_);
    active_ops_ = nullptr;
    return r;
}
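The split dispatch can be sketched in isolation as follows. This is a reduced model under stated assumptions: `await_suspend` is omitted, results are plain `std::size_t`, the vtable carries one construct pointer per operation, and `read_some_aw`, `read_aw`, `ops_for`, `toy_vtable`, and `run` are hypothetical names. The point it tries to show is that each construct function returns the `awaitable_ops` table matching the type it just created, so later calls dispatch correctly without the wrapper knowing that type.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Per-awaitable dispatch table (await_suspend omitted for brevity).
struct awaitable_ops {
    bool        (*await_ready)(void*);
    std::size_t (*await_resume)(void*);
    void        (*destroy)(void*);
};

// Two hypothetical awaitable types with different behaviour.
struct read_some_aw {  // partial read: transfers at most 4 bytes
    std::size_t n;
    bool await_ready() const { return true; }
    std::size_t await_resume() const { return n < 4 ? n : 4; }
};
struct read_aw {       // full read: transfers everything requested
    std::size_t n;
    bool await_ready() const { return true; }
    std::size_t await_resume() const { return n; }
};

// One static ops table per awaitable type.
template<class A>
awaitable_ops const* ops_for() {
    static const awaitable_ops ops = {
        [](void* p) -> bool { return static_cast<A*>(p)->await_ready(); },
        [](void* p) -> std::size_t { return static_cast<A*>(p)->await_resume(); },
        [](void* p) { static_cast<A*>(p)->~A(); },
    };
    return &ops;
}

constexpr std::size_t max_of(std::size_t a, std::size_t b) { return a > b ? a : b; }

// Wrapper-level vtable: one construct entry per operation, plus the
// storage requirements of the largest awaitable.
struct vtable {
    awaitable_ops const* (*construct_read_some)(void* storage, std::size_t n);
    awaitable_ops const* (*construct_read)(void* storage, std::size_t n);
    std::size_t awaitable_size;
    std::size_t awaitable_align;
};

inline vtable const* toy_vtable() {
    static const vtable vt = {
        [](void* p, std::size_t n) -> awaitable_ops const* {
            ::new (p) read_some_aw{n};
            return ops_for<read_some_aw>();
        },
        [](void* p, std::size_t n) -> awaitable_ops const* {
            ::new (p) read_aw{n};
            return ops_for<read_aw>();
        },
        max_of(sizeof(read_some_aw), sizeof(read_aw)),
        max_of(alignof(read_some_aw), alignof(read_aw)),
    };
    return &vt;
}

// Runs one operation on the immediate-completion path.
inline std::size_t run(awaitable_ops const* (*construct)(void*, std::size_t),
                       std::size_t n) {
    alignas(std::max_align_t) unsigned char storage[64];
    awaitable_ops const* active_ops = construct(storage, n);
    bool ready = active_ops->await_ready(storage);
    assert(ready);
    (void)ready;
    std::size_t r = active_ops->await_resume(storage);
    active_ops->destroy(storage);
    return r;
}
```

Calling `run(toy_vtable()->construct_read_some, 10)` and `run(toy_vtable()->construct_read, 10)` goes through the same storage and the same calling code, yet each reaches its own `await_resume`.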
Cache Line Analysis
Immediate completion path — inner await_ready returns true:
Flat (any_read_stream, any_write_stream): 2 cache lines
LINE 1 object stream_, vt_, cached_awaitable_, ...
LINE 2 vtable construct → await_ready → await_resume → destroy
(contiguous, sequential access, prefetch-friendly)
Split (any_read_source, any_write_sink): 3 cache lines
LINE 1 object source_, vt_, cached_awaitable_, active_ops_, ...
LINE 2 vtable construct_awaitable
LINE 3 awaitable_ops await_ready → await_suspend → await_resume → destroy
(separate .rodata address, defeats spatial prefetch)
The flat layout keeps all per-awaitable function pointers adjacent to construct_awaitable in a single 64-byte structure. The split layout places vtable and awaitable_ops at unrelated addresses in .rodata, so the hot path touches one additional cache line and may incur one additional cache miss.
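The line counts above can be checked with a little address arithmetic. The sketch below assumes a 64-byte cache line (this varies by CPU) and uses hypothetical names (`lines_spanned`, `flat_vtable`, `flat_lines`). It also illustrates a caveat: a 64-byte table occupies a single line only if it starts on a line boundary, which this example forces with `alignas`. Whether the real vtables are aligned this way is not stated in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t cache_line = 64;  // assumed line size; varies by CPU

// Number of cache lines covered by `size` bytes starting at address `addr`.
constexpr std::size_t lines_spanned(std::uintptr_t addr, std::size_t size) {
    return size == 0
        ? 0
        : (addr + size - 1) / cache_line - addr / cache_line + 1;
}

// Stand-in for the flat vtable: eight pointer-sized slots, forced onto
// a cache-line boundary so the whole table shares one line.
struct alignas(cache_line) flat_vtable {
    void* slots[8];
};

// Lines actually covered by one statically allocated flat table.
inline std::size_t flat_lines() {
    static const flat_vtable table = {};
    return lines_spanned(reinterpret_cast<std::uintptr_t>(&table), sizeof(table));
}
```

With these definitions, a 64-byte table at offset 0 covers one line, the same table at offset 32 straddles two, and a 32-byte table such as `awaitable_ops` fits in one line at offset 64 but straddles two at offset 48.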
When to Use Which
| Flat vtable | Split vtable |
|---|---|
| Wrapper has exactly one async operation | Wrapper has multiple async operations |
Why the Flat Layout Cannot Scale
With multiple operations, each construct call produces a different concrete awaitable type. The per-awaitable function pointers (await_ready, await_suspend, await_resume, destroy) must match the type that was constructed. The split layout solves this by returning the correct awaitable_ops const* from each construct call. The flat layout would instead have to duplicate all four function pointers in the vtable for every operation, which is workable for two operations but unwieldy for four.
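A back-of-the-envelope size model makes the scaling argument concrete. It assumes 8-byte slots and 64-byte cache lines, and it assumes that a hypothetical multi-operation flat table would give each operation its own construct pointer plus the four per-awaitable pointers, with the size, alignment, and destroy fields stored once. The function names are illustrative only.

```cpp
#include <cstddef>

constexpr std::size_t slot = 8;         // assumed pointer / size_t width in bytes
constexpr std::size_t cache_line = 64;  // assumed cache line size in bytes

// Hypothetical flat table for n operations:
// per operation: construct + await_ready + await_suspend + await_resume + destroy_awaitable
// shared:        awaitable_size + awaitable_align + destroy
constexpr std::size_t flat_vtable_bytes(std::size_t n) {
    return (5 * n + 3) * slot;
}

// Split layout: the vtable holds one construct pointer per operation
// plus the three shared fields; each awaitable_ops table is 4 slots.
constexpr std::size_t split_vtable_bytes(std::size_t n) {
    return (n + 3) * slot;
}
constexpr std::size_t awaitable_ops_bytes() {
    return 4 * slot;
}

// Cache lines needed to hold `bytes`, assuming line-aligned placement.
constexpr std::size_t lines_for(std::size_t bytes) {
    return (bytes + cache_line - 1) / cache_line;
}

// The single-operation cases reproduce the sizes quoted earlier.
static_assert(flat_vtable_bytes(1) == 64, "flat, one operation");
static_assert(split_vtable_bytes(1) == 32, "split vtable, one operation");
static_assert(awaitable_ops_bytes() == 32, "per-awaitable dispatch table");
```

Under this model a flat table grows to 104 bytes for two operations and 184 bytes (three cache lines) for four, whereas the split vtable for four operations stays at 56 bytes and each `awaitable_ops` table remains 32 bytes.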