← Blog

Streaming Asterisk Audio Into Pipecat With chan_websocket

Asterisk 23 added a new channel driver, chan_websocket, that lets the dialplan dial an outbound WebSocket as if it were a SIP endpoint. I tried it as soon as the release candidate was out, against my own andrius/asterisk:23-rc images, to feed the audio of a live phone call straight into a Pipecat pipeline. It took three attempts.

Why chan_websocket

Before chan_websocket, the established way to push the audio of an Asterisk call to an external service was ARI External Media: open an ARI WebSocket, instruct Asterisk to send the media of a channel to a UDP port, then receive RTP on that port. It works but it spreads the state across two connections (ARI for control, RTP for audio) and the service has to deal with RTP packetisation.

chan_websocket collapses that into a single channel. The dialplan references a named WebSocket endpoint; Asterisk opens an outbound WebSocket per call; control messages and audio flow over the same connection. From the dialplan side it looks like a regular Dial():

exten => 1001,1,NoOp(Pipecat echo test)
   same => n,Answer()
   same => n,Wait(1)
   same => n,Playback(hello-world)
   same => n,Dial(Websocket/pipecat-2/c(slin16),60)
   same => n,Hangup()

The c(slin16) argument tells Asterisk to negotiate signed-linear 16 kHz audio with the WebSocket peer.
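For sizing buffers it helps to know how many bytes of audio each frame carries per codec. A quick back-of-the-envelope sketch (the 20 ms packetisation here is my assumption; chan_websocket reports the actual value in its optimal_frame_size field):

```python
def bytes_per_frame(
    sample_rate: int, bytes_per_sample: int, ptime_ms: int = 20
) -> int:
    """Payload size of one audio frame at the given packetisation time."""
    return sample_rate * bytes_per_sample * ptime_ms // 1000

# slin16: 16 kHz, 16-bit signed linear PCM
print(bytes_per_frame(16000, 2))  # 640 bytes per 20 ms frame
# ulaw: 8 kHz, 8-bit companded G.711
print(bytes_per_frame(8000, 1))   # 160 bytes per 20 ms frame
```

At 50 frames per second, slin16 is four times the bandwidth of ulaw, which matters once many calls share one container's network path.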

Building Asterisk 23 With chan_websocket

chan_websocket is not enabled in the default menuselect.makeopts of the 23.x release tarballs, so a stock build does not include it. I needed a custom build. Since I keep andrius/asterisk images for every supported release, I added chan_websocket and res_http_websocket to the menuselect step in my image’s build flow, along with the res_ari family:

RUN cd asterisk-23.0.0-rc2 && \
    ./configure \
      --with-pjproject-bundled \
      --with-jansson-bundled && \
    make menuselect.makeopts && \
    menuselect/menuselect \
      --enable chan_websocket \
      --enable res_http_websocket \
      --enable res_ari \
      menuselect.makeopts && \
    make && make install

The integration container then derives from that image:

FROM andrius/asterisk:23-rc

USER root
COPY config/ /etc/asterisk/
RUN chown -R asterisk:asterisk /etc/asterisk
USER asterisk

modules.conf loads the WebSocket modules explicitly, so that a later change to autoload cannot silently turn them off:

[modules]
autoload = yes
load = res_http_websocket.so
load = chan_websocket.so

The Asterisk Side

http.conf enables the HTTP server and turns on WebSockets, which is what res_http_websocket rides on top of:

[general]
enabled = yes
bindaddr = 0.0.0.0
bindport = 8088

[websockets]
enabled = yes

websocket_client.conf declares the named outbound endpoints. I set up two during testing, one TLS and one plain, because the TLS path forced me to deal with self-signed certificates earlier than I wanted to:

[pipecat-1]
type = websocket_client
uri = wss://ws-pipecat-echo:7860/asterisk-ws
protocols = media
connection_type = per_call_config
connection_timeout = 1000
reconnect_interval = 1000
reconnect_attempts = 5
tls_enabled = yes
verify_server_cert = no

[pipecat-2]
type = websocket_client
uri = ws://ws-pipecat-echo:7861/asterisk-ws
protocols = media
connection_type = per_call_config
connection_timeout = 1000
tls_enabled = no

connection_type = per_call_config is the important line: a fresh WebSocket is opened for every call rather than one connection being shared across calls. This matches Pipecat’s per-session model.

The Protocol Surprise

The first thing the Pipecat side has to deal with is that chan_websocket does not send protobuf-wrapped frames. It uses a mixed-mode protocol: TEXT WebSocket frames for control, BINARY WebSocket frames for raw PCM audio. The first message Asterisk sends on a new connection is a TEXT frame that looks like:

MEDIA_START connection_id:abc channel:PJSIP/x-y format:ulaw optimal_frame_size:160

It will also send GET_STATUS periodically, expecting a STATUS ... text response, and a HANGUP text frame on call teardown. Everything in between is audio: raw PCM in the codec negotiated through the c(...) argument of the dialplan Dial().

This is where my first attempt hit a wall.

Attempt 1: Pipecat’s Default WebsocketServerTransport

Pipecat’s WebsocketServerTransport defaults to ProtobufFrameSerializer. Connect Asterisk to it, and the first thing the serializer sees is the MEDIA_START text message, which it tries, and fails, to parse as a serialised Frame. The transport closes the connection, Asterisk reconnects per reconnect_attempts, and the loop runs until either side gives up.

The fix is a serializer that does not try to be clever about the incoming bytes.

Attempt 2: A RawPCMSerializer

The second attempt was a custom serializer that ignores text frames, treats binary frames as raw PCM, and produces raw PCM bytes on the output side. This part is short:

from pipecat.frames.frames import (
    Frame, InputAudioRawFrame, OutputAudioRawFrame,
)
from pipecat.serializers.base_serializer import (
    FrameSerializer, FrameSerializerType,
)


class RawPCMSerializer(FrameSerializer):
    def __init__(
        self,
        sample_rate: int = 16000,
        num_channels: int = 1,
    ):
        self._sample_rate = sample_rate
        self._num_channels = num_channels

    @property
    def type(self) -> FrameSerializerType:
        return FrameSerializerType.BINARY

    async def serialize(
        self, frame: Frame
    ) -> bytes | None:
        if isinstance(frame, OutputAudioRawFrame):
            return frame.audio
        return None

    async def deserialize(
        self, data: bytes
    ) -> Frame | None:
        if len(data) == 0:
            return None
        return InputAudioRawFrame(
            audio=data,
            sample_rate=self._sample_rate,
            num_channels=self._num_channels,
        )

This solved the audio path. The text control frames were still a problem; the Pipecat WebSocket transport does not surface them to user code in a usable form.

Attempt 3: FastAPI Bridge + Pipecat

The working version moves the WebSocket accept to a small FastAPI endpoint that handles the chan_websocket TEXT/BINARY split itself, and hands the live socket to Pipecat through FastAPIWebsocketTransport once the audio path is open.

The TEXT handler is the part that earns its keep:

import logging

from fastapi import WebSocket

logger = logging.getLogger(__name__)


async def handle_text_message(
    websocket: WebSocket, message: str
):
    if message.startswith("MEDIA_START"):
        parts = {}
        for part in message.split()[1:]:
            if ':' in part:
                key, value = part.split(':', 1)
                parts[key] = value
        logger.info(f"Media session: {parts}")
    elif message == "GET_STATUS":
        status = (
            "STATUS queue_length:0 xon_level:800 "
            "xoff_level:900 queue_full:false "
            "bulk_media:false media_paused:false"
        )
        await websocket.send_text(status)
    elif message == "HANGUP":
        await websocket.close()

Once the chan_websocket text protocol is satisfied, the binary side is fed straight into Pipecat through the custom serializer above. The Pipecat pipeline is then trivial:

ws_transport = FastAPIWebsocketTransport(
    websocket=websocket,
    params=FastAPIWebsocketParams(
        serializer=RawPCMSerializer(),
        audio_in_enabled=True,
        audio_out_enabled=True,
        audio_in_sample_rate=16000,
        audio_out_sample_rate=16000,
        add_wav_header=False,
        vad_analyzer=None,
        session_timeout=60 * 3,
    ),
)

pipeline = Pipeline([
    ws_transport.input(),
    AudioEchoProcessor(),
    ws_transport.output(),
])

await PipelineRunner().run(
    PipelineTask(pipeline)
)

The Echo Processor

An echo pipeline is the minimum useful thing to confirm the audio path works end-to-end. It also surfaces a small Pipecat detail that cost me some time on the first day. Pipecat distinguishes InputAudioRawFrame (audio coming from the transport) from OutputAudioRawFrame (audio going out). Pushing an InputAudioRawFrame back into the pipeline does not cause the transport to emit it; the transport’s output() only writes OutputAudioRawFrame. The echo has to convert types:

from pipecat.frames.frames import (
    Frame, InputAudioRawFrame, OutputAudioRawFrame,
)
from pipecat.processors.frame_processor import (
    FrameDirection, FrameProcessor,
)


class AudioEchoProcessor(FrameProcessor):
    async def process_frame(
        self, frame: Frame, direction: FrameDirection
    ):
        await super().process_frame(frame, direction)
        if isinstance(frame, InputAudioRawFrame):
            echo = OutputAudioRawFrame(
                audio=frame.audio,
                sample_rate=frame.sample_rate,
                num_channels=frame.num_channels,
            )
            await self.push_frame(echo, direction)
        else:
            await self.push_frame(frame, direction)

With this in place, dialing extension 1001 from a softphone routes through the dialplan, opens an outbound WebSocket from Asterisk to the FastAPI endpoint, sets up the Pipecat pipeline, and bounces the audio back. My voice came back a few hundred milliseconds delayed, which is the round trip through Docker networking plus the pipeline.

Findings

Things that surprised me:

  • chan_websocket is not in the default menuselect makeopts of the 23.x release tarballs. A custom build, or an image like andrius/asterisk that already enables it, is required.
  • The protocol mixes WebSocket TEXT and BINARY. Most existing WebSocket transports in Python AI frameworks assume one or the other, not both, so a thin protocol bridge is usually needed.
  • connection_type = per_call_config opens a new WebSocket per call. Plan capacity accordingly.
  • slin16 (signed-linear 16 kHz) works end-to-end. ulaw (G.711) also works but means the Pipecat side has to resample if the pipeline expects 16 kHz.
  • Pipecat strictly distinguishes InputAudioRawFrame from OutputAudioRawFrame. An echo processor must convert types. I lost half an hour to silence on the first build before noticing.
  • TLS works (tls_enabled = yes, wss:// URI) but with a self-signed cert you need verify_server_cert = no. Production deployments need a real certificate the Asterisk container trusts.

What’s Next

This is the foundation. Replacing AudioEchoProcessor with the usual STT → LLM → TTS chain turns it into an actual voice agent on the phone, but that has its own complications (interruption handling, turn detection, cold-start latency on the LLM side) and will land in a separate post.

The Pipecat repository is at github.com/pipecat-ai/pipecat. The Asterisk 23 release notes describe chan_websocket and its websocket_client.conf options. My Asterisk Docker images are at andrius/asterisk.