Asterisk 23 added a new channel driver, chan_websocket, that lets the
dialplan dial an outbound WebSocket as if it were a SIP endpoint. I
tried it as soon as the release candidate was out, against my own
andrius/asterisk:23-rc images, to feed the audio of a live phone call
straight into a Pipecat
pipeline. It took three attempts.
Why chan_websocket
Before chan_websocket, the established way to push the audio of an
Asterisk call to an external service was ARI External Media: open an
ARI WebSocket, instruct Asterisk to send the media of a channel to a
UDP port, then receive RTP on that port. It works but it spreads the
state across two connections (ARI for control, RTP for audio) and the
service has to deal with RTP packetisation.
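For contrast, the old flow reduces to one REST call plus a UDP listener. A rough Python sketch (the ARI channels/externalMedia endpoint is real; host, credentials, Stasis app name, and listener address here are illustrative):
import requests

# Ask Asterisk to stream the channel's media to our UDP/RTP listener;
# control stays on the ARI WebSocket, audio arrives as RTP.
requests.post(
    "http://asterisk:8088/ari/channels/externalMedia",
    auth=("ari-user", "ari-pass"),
    params={
        "app": "my-stasis-app",
        "external_host": "10.0.0.5:4000",
        "format": "slin16",
    },
)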
chan_websocket collapses that into a single channel. The dialplan
references a named WebSocket endpoint; Asterisk opens an outbound
WebSocket per call; control messages and audio flow over the same
connection. From the dialplan side it looks like a regular Dial():
exten => 1001,1,NoOp(Pipecat echo test)
same => n,Answer()
same => n,Wait(1)
same => n,Playback(hello-world)
same => n,Dial(Websocket/pipecat-2/c(slin16),60)
same => n,Hangup()
The c(slin16) argument tells Asterisk to negotiate signed-linear
16 kHz audio with the WebSocket peer.
Building Asterisk 23 With chan_websocket
chan_websocket is not enabled in the default menuselect.makeopts of
the 23.x release tarballs, so a stock build does not include it. I
needed a custom build. Since I keep
andrius/asterisk
images for every supported release, I added chan_websocket and
res_http_websocket to the menuselect step in my image’s build flow,
along with the res_ari family:
RUN cd asterisk-23.0.0-rc2 && \
    ./configure \
        --with-pjproject-bundled \
        --with-jansson-bundled && \
    make menuselect.makeopts && \
    menuselect/menuselect \
        --enable chan_websocket \
        --enable res_http_websocket \
        --enable res_ari \
        menuselect.makeopts && \
    make && make install
The integration container then derives from that image:
FROM andrius/asterisk:23-rc
USER root
COPY config/ /etc/asterisk/
RUN chown -R asterisk:asterisk /etc/asterisk
USER asterisk
modules.conf loads the WebSocket modules explicitly, so a later flip
of the autoload setting cannot silently disable them:
[modules]
autoload = yes
load = res_http_websocket.so
load = chan_websocket.so
The Asterisk Side
http.conf enables the HTTP server and turns on WebSockets, which is
what res_http_websocket rides on top of:
[general]
enabled = yes
bindaddr = 0.0.0.0
bindport = 8088
[websockets]
enabled = yes
websocket_client.conf declares the named outbound endpoints. I set up
two during testing, one TLS and one plain, because the TLS path forced
me to deal with self-signed certificates earlier than I wanted to:
[pipecat-1]
type = websocket_client
uri = wss://ws-pipecat-echo:7860/asterisk-ws
protocols = media
connection_type = per_call_config
connection_timeout = 1000
reconnect_interval = 1000
reconnect_attempts = 5
tls_enabled = yes
verify_server_cert = no
[pipecat-2]
type = websocket_client
uri = ws://ws-pipecat-echo:7861/asterisk-ws
protocols = media
connection_type = per_call_config
connection_timeout = 1000
tls_enabled = no
connection_type = per_call_config is the important line: a fresh
WebSocket is opened for every call rather than one connection being
shared across calls. This matches Pipecat’s per-session model.
The Protocol Surprise
The first thing the Pipecat side has to deal with is that
chan_websocket does not send protobuf-wrapped frames. It uses a
mixed-mode protocol: TEXT WebSocket frames for control, BINARY
WebSocket frames for raw PCM audio. The first message Asterisk sends
on a new connection is a TEXT frame that looks like:
MEDIA_START connection_id:abc channel:PJSIP/x-y format:ulaw optimal_frame_size:160
Asterisk also sends GET_STATUS periodically and expects a
STATUS ... text reply, and it sends a HANGUP text frame on call
teardown. Everything in between is audio: raw PCM matching the codec
negotiated in the c(...) argument of the dialplan Dial().
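Before writing any Pipecat code it is worth watching this traffic directly. A minimal probe (a sketch, assuming the third-party websockets package; the port matches pipecat-2 above):
import asyncio
import websockets

async def handler(ws):
    async for message in ws:
        if isinstance(message, str):
            # TEXT frame: MEDIA_START, GET_STATUS, HANGUP, ...
            print("CTRL:", message)
            if message == "GET_STATUS":
                await ws.send("STATUS queue_length:0")
        else:
            # BINARY frame: raw PCM in the negotiated format
            print(f"AUDIO: {len(message)} bytes")

async def main():
    async with websockets.serve(handler, "0.0.0.0", 7861):
        await asyncio.Future()  # serve until cancelled

asyncio.run(main())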
This is where my first attempt hit a wall.
Attempt 1: Pipecat’s Default WebsocketServerTransport
Pipecat’s WebsocketServerTransport defaults to
ProtobufFrameSerializer. Connect Asterisk to it, and the first thing
the serializer sees is the MEDIA_START text message, which it tries,
and fails, to parse as a serialised Frame. The transport closes the
connection, Asterisk reconnects up to reconnect_attempts times, and
the loop repeats until one side gives up.
The fix is a serializer that does not try to be clever about the incoming bytes.
Attempt 2: A RawPCMSerializer
The second attempt was a custom serializer that ignores text frames, treats binary frames as raw PCM, and produces raw PCM bytes on the output side. This part is short:
from pipecat.frames.frames import (
    Frame, InputAudioRawFrame, OutputAudioRawFrame,
)
from pipecat.serializers.base_serializer import (
    FrameSerializer, FrameSerializerType,
)


class RawPCMSerializer(FrameSerializer):
    def __init__(
        self,
        sample_rate: int = 16000,
        num_channels: int = 1,
    ):
        self._sample_rate = sample_rate
        self._num_channels = num_channels

    @property
    def type(self) -> FrameSerializerType:
        return FrameSerializerType.BINARY

    async def serialize(
        self, frame: Frame
    ) -> bytes | None:
        if isinstance(frame, OutputAudioRawFrame):
            return frame.audio
        return None

    async def deserialize(
        self, data: bytes
    ) -> Frame | None:
        if len(data) == 0:
            return None
        return InputAudioRawFrame(
            audio=data,
            sample_rate=self._sample_rate,
            num_channels=self._num_channels,
        )
This solved the audio path. The text control frames were still a problem; the Pipecat WebSocket transport does not surface them to user code in a usable form.
Attempt 3: FastAPI Bridge + Pipecat
The working version moves the WebSocket accept to a small FastAPI
endpoint that handles the chan_websocket TEXT/BINARY split itself,
and hands the live socket to Pipecat through
FastAPIWebsocketTransport once the audio path is open.
The TEXT handler is the part that earns its keep:
async def handle_text_message(
    websocket: WebSocket, message: str
):
    if message.startswith("MEDIA_START"):
        parts = {}
        for part in message.split()[1:]:
            if ':' in part:
                key, value = part.split(':', 1)
                parts[key] = value
        logger.info(f"Media session: {parts}")
    elif message == "GET_STATUS":
        status = (
            "STATUS queue_length:0 xon_level:800 "
            "xoff_level:900 queue_full:false "
            "bulk_media:false media_paused:false"
        )
        await websocket.send_text(status)
    elif message == "HANGUP":
        await websocket.close()
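For context, roughly how the endpoint wires things together. This is a simplified sketch: the real handler keeps servicing control frames during the call, and run_pipecat_pipeline is a placeholder for the pipeline setup shown next, not a Pipecat API:
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/asterisk-ws")
async def asterisk_ws(websocket: WebSocket):
    await websocket.accept()
    # chan_websocket speaks first: consume the MEDIA_START text frame
    # before handing the socket to the Pipecat transport.
    message = await websocket.receive()
    if message.get("text") is not None:
        await handle_text_message(websocket, message["text"])
    await run_pipecat_pipeline(websocket)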
Once the chan_websocket text protocol is satisfied, the binary side is fed straight into Pipecat through the custom serializer above. The Pipecat pipeline is then trivial:
ws_transport = FastAPIWebsocketTransport(
    websocket=websocket,
    params=FastAPIWebsocketParams(
        serializer=RawPCMSerializer(),
        audio_in_enabled=True,
        audio_out_enabled=True,
        audio_in_sample_rate=16000,
        audio_out_sample_rate=16000,
        add_wav_header=False,
        vad_analyzer=None,
        session_timeout=60 * 3,
    ),
)

pipeline = Pipeline([
    ws_transport.input(),
    AudioEchoProcessor(),
    ws_transport.output(),
])

await PipelineRunner().run(
    PipelineTask(pipeline)
)
The Echo Processor
An echo pipeline is the minimum useful thing to confirm the audio path
works end-to-end. It also surfaces a small Pipecat detail that cost me
some time on the first day. Pipecat distinguishes
InputAudioRawFrame (audio coming from the transport) from
OutputAudioRawFrame (audio going out). Pushing an InputAudioRawFrame
back into the pipeline does not cause the transport to emit it; the
transport’s output() only writes OutputAudioRawFrame. The echo has
to convert types:
class AudioEchoProcessor(FrameProcessor):
    async def process_frame(
        self, frame: Frame, direction: FrameDirection
    ):
        await super().process_frame(frame, direction)
        if isinstance(frame, InputAudioRawFrame):
            echo = OutputAudioRawFrame(
                audio=frame.audio,
                sample_rate=frame.sample_rate,
                num_channels=frame.num_channels,
            )
            await self.push_frame(echo, direction)
        else:
            await self.push_frame(frame, direction)
With this in place, dialing extension 1001 from a softphone routes through the dialplan, opens an outbound WebSocket from Asterisk to the FastAPI endpoint, sets up the Pipecat pipeline, and bounces the audio back. My voice came back a few hundred milliseconds delayed, which is the round trip through Docker networking plus the pipeline.
Findings
Things that surprised me:
- chan_websocket is not in the default menuselect makeopts of the 23.x release tarballs. A custom build, or an image like andrius/asterisk that already enables it, is required.
- The protocol mixes WebSocket TEXT and BINARY. Most existing WebSocket transports in Python AI frameworks assume one or the other, not both, so a thin protocol bridge is usually needed.
- connection_type = per_call_config opens a new WebSocket per call. Plan capacity accordingly.
- slin16 (signed-linear 16 kHz) works end-to-end. ulaw (G.711) also works but means the Pipecat side has to resample if the pipeline expects 16 kHz (see the sketch after this list).
- Pipecat strictly distinguishes InputAudioRawFrame from OutputAudioRawFrame. An echo processor must convert types. I lost half an hour to silence on the first build before noticing.
- TLS works (tls_enabled = yes, wss:// URI) but with a self-signed cert you need verify_server_cert = no. Production deployments need a real certificate the Asterisk container trusts.
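If the pipeline expects 16 kHz and Asterisk negotiates ulaw, the decode-and-resample step is small. A sketch using the stdlib audioop module (deprecated since Python 3.11 and removed in 3.13; newer code would use numpy, scipy, or similar):
import audioop

def ulaw_to_slin16(data: bytes, state=None):
    # G.711 ulaw -> 16-bit signed linear PCM, still 8 kHz
    pcm_8k = audioop.ulaw2lin(data, 2)
    # 8 kHz -> 16 kHz; state carries filter history across chunks
    pcm_16k, state = audioop.ratecv(pcm_8k, 2, 1, 8000, 16000, state)
    return pcm_16k, state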
What’s Next
This is the foundation. Replacing AudioEchoProcessor with the usual
STT → LLM → TTS chain turns it into an actual voice agent on the
phone, but that has its own complications (interruption handling,
turn detection, cold-start latency on the LLM side) and will land in
a separate post.
The Pipecat repository is at
github.com/pipecat-ai/pipecat.
The Asterisk 23 release notes describe chan_websocket and its
websocket_client.conf options. My Asterisk Docker images are at
andrius/asterisk.