MCP and Voice

Static tools (`listTools()`) are resolved once at agent initialization and don't change. Dynamic toolsets (`listToolsets()`) are resolved per request with user-specific credentials.

Paid Media, Branding
bySamuelca6399670 words

What is MCP and Voice?

What this skill does

This skill enables integration with the Model Context Protocol (MCP) to connect external tool servers and manage voice capabilities for agents. It supports both static, single-user CLI tools initialized once per agent and dynamic, multi-user SaaS toolsets resolved per request with user-specific credentials. The voice functionality includes text-to-speech, speech-to-text, and realtime speech-to-speech streaming, leveraging providers like OpenAI, ElevenLabs, and Google Cloud.

Who it's for

This skill is designed for paid media specialists and branding strategists who want to incorporate voice interactivity into marketing automation workflows or client-facing applications. Agency strategists managing multi-user SaaS accounts can use it to dynamically resolve tools per user while maintaining secure credential handling. Growth leads experimenting with conversational voice interfaces for customer engagement will find the realtime speech capabilities particularly useful.

Key workflows

Practitioners first set up MCP clients to connect with external tool servers, configuring either local CLI transports or remote SSE endpoints with appropriate credentials. They then decide whether to use static tool lists resolved once at agent startup for single-user scenarios or dynamic toolsets fetched per request for multi-tenant environments. For voice integration, marketers install relevant voice provider packages, configure environment variables for authentication, and attach voice providers to agents for TTS, STT, or realtime voice streaming. CompositeVoice allows mixing different providers for transcription and synthesis to optimize quality or cost.
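The idea of mixing providers can be illustrated with a minimal sketch. It does not use the real `CompositeVoice` class, whose signature is not shown here. Instead, `SpeechToText`, `TextToSpeech`, `CompositeVoiceSketch`, and the two stand-in providers are hypothetical names, and audio is modeled as a plain array of numbers so the example runs without any provider SDK.

```typescript
// Hypothetical interfaces for illustration only. Real provider SDKs typically
// work with audio streams and async calls; plain arrays keep this sketch short.
interface SpeechToText {
  listen(audio: number[]): string;
}

interface TextToSpeech {
  speak(text: string): number[];
}

// Routes transcription to one provider and synthesis to another.
class CompositeVoiceSketch implements SpeechToText, TextToSpeech {
  private readonly input: SpeechToText;
  private readonly output: TextToSpeech;

  constructor(input: SpeechToText, output: TextToSpeech) {
    this.input = input;
    this.output = output;
  }

  listen(audio: number[]): string {
    return this.input.listen(audio);
  }

  speak(text: string): number[] {
    return this.output.speak(text);
  }
}

// Records which stand-in provider handled each call.
const calls: string[] = [];

// Stand-in "provider A": turns character codes back into text, not real audio.
const transcriberA: SpeechToText = {
  listen: (audio) => {
    calls.push("A.listen");
    return String.fromCharCode(...audio);
  },
};

// Stand-in "provider B": turns text into character codes, not real audio.
const synthesizerB: TextToSpeech = {
  speak: (text) => {
    calls.push("B.speak");
    return text.split("").map((ch) => ch.charCodeAt(0));
  },
};

const voice = new CompositeVoiceSketch(transcriberA, synthesizerB);
const audioOut = voice.speak("hello");
const heard = voice.listen(audioOut);
```

Because the agent only sees one voice object, either provider could be swapped for a cheaper or higher-quality one without touching the rest of the workflow.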

Common questions

Can I switch tools dynamically for different users? Yes, by using `listToolsets()` with per-request credentials rather than static `listTools()` at initialization.

Which voice providers are supported? Providers include OpenAI, ElevenLabs, Google Cloud, Azure, Deepgram, and others, each requiring specific environment variables.

How does realtime speech-to-speech differ from simple TTS or STT? Realtime streaming enables live transcription and audio playback simultaneously for interactive voice applications, adding complexity but enabling conversational flows.
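The difference between resolving tools once and resolving them per request can be sketched as follows. This is an illustration of the pattern only and does not call a real MCP client: `connectToolServer`, `StaticAgent`, `DynamicAgent`, and the `whoami` tool are hypothetical names, and the tool server is replaced by an in-memory stub.

```typescript
// Hypothetical types; a real MCP client would reach tool servers over a transport.
type Tool = { name: string; run: () => string };
type UserCredentials = { userId: string; apiToken: string };

// Stand-in for connecting to a tool server with one user's credentials.
// The returned tools are bound to that user (apiToken is unused by this stub).
function connectToolServer(creds: UserCredentials): Tool[] {
  return [{ name: "whoami", run: () => creds.userId }];
}

function callTool(tools: Tool[], name: string): string {
  const tool = tools.filter((t) => t.name === name)[0];
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.run();
}

// Static pattern: tools are resolved once at construction and reused for
// every request, so every caller acts with the same credentials.
class StaticAgent {
  private readonly tools: Tool[];

  constructor(creds: UserCredentials) {
    this.tools = connectToolServer(creds);
  }

  handle(toolName: string): string {
    return callTool(this.tools, toolName);
  }
}

// Dynamic pattern: tools are resolved on each request with the caller's
// own credentials, which suits multi-user environments.
class DynamicAgent {
  handle(creds: UserCredentials, toolName: string): string {
    return callTool(connectToolServer(creds), toolName);
  }
}

const alice: UserCredentials = { userId: "alice", apiToken: "token-a" };
const bob: UserCredentials = { userId: "bob", apiToken: "token-b" };

const staticAgent = new StaticAgent(alice);
const dynamicAgent = new DynamicAgent();

const staticResult = staticAgent.handle("whoami");
const dynamicResults = [
  dynamicAgent.handle(alice, "whoami"),
  dynamicAgent.handle(bob, "whoami"),
];
```

In the static case every request runs as the user supplied at startup, while the dynamic agent returns a different result for each caller. The trade-off is an extra resolution step on every request.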

How to use in Metaflow

Attach the MCP and Voice skill to your agent task to enable tool resolution and voice capabilities based on your configuration. You can expect static tools to be available immediately at agent startup, while dynamic toolsets are fetched per request with user credentials. Voice features will allow your agent to speak, listen, or stream audio in real time, depending on the providers you configure. For detailed setup, we recommend reviewing the environment variables and transport options before deployment.
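A pre-deployment review of environment variables and transport options could look something like the sketch below. Everything in it is an assumption made for illustration: the `TransportConfig` shape, the `preflight` function, and the variable names in `ASSUMED_VOICE_ENV` are not taken from any provider's documentation and should be checked against it before use.

```typescript
// Hypothetical config shapes for a local CLI transport and a remote SSE endpoint.
type TransportConfig =
  | { kind: "stdio"; command: string; args: string[] }
  | { kind: "sse"; url: string };

type Env = { [name: string]: string | undefined };

// Assumed variable names; confirm each against the provider's own documentation.
const ASSUMED_VOICE_ENV: { [provider: string]: string } = {
  openai: "OPENAI_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
  deepgram: "DEEPGRAM_API_KEY",
};

// Returns a list of human-readable problems; an empty list means the
// configuration passed these basic checks.
function preflight(
  transport: TransportConfig,
  providers: string[],
  env: Env,
): string[] {
  const problems: string[] = [];

  if (transport.kind === "stdio") {
    if (transport.command.trim() === "") {
      problems.push("stdio transport needs a command");
    }
  } else if (!/^https?:\/\//.test(transport.url)) {
    problems.push(`SSE endpoint must be an http(s) URL: ${transport.url}`);
  }

  for (const provider of providers) {
    const variable = ASSUMED_VOICE_ENV[provider];
    if (variable === undefined) {
      problems.push(`no known env variable for provider: ${provider}`);
    } else if (!env[variable]) {
      problems.push(`missing env variable: ${variable}`);
    }
  }

  return problems;
}

// Example: a malformed endpoint and one missing key (placeholder values).
const issues = preflight(
  { kind: "sse", url: "ftp://tools.example.com/sse" },
  ["openai", "elevenlabs"],
  { OPENAI_API_KEY: "sk-placeholder" },
);
```

Passing the environment in as a plain object keeps the check easy to test. In a Node.js script the third argument would typically be `process.env`.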

For broader context, see our roundup of Claude marketing skills, and read connect Claude Desktop to Google Ads with MCP for related setup guidance.

Related skills