Client
Our client, a SaaS founder, runs one of the top U.S. companies in clinical trial research and patient recruitment services. We have been working with the client for the past 5 years as a custom recruitment CRM software development partner, providing clinical trial web development services, among others.
In 2026, we developed and implemented a HIPAA-compliant AI voice agent for this Client.
Challenges
State of the Clinical Trial Patient Recruitment Market
The U.S. clinical trials market is estimated to reach $73+ billion by 2033, growing at a CAGR of 6%-7%. A segment of this market is clinical trial recruitment software.
The business model for recruitment platforms is based on the margin between the sponsor payment for a qualified patient and the operational costs. These costs include advertising, manual screening, handoff failures (patient is found but never makes it into the clinical trial), and replacing participant dropouts.
Sponsors (biopharma & medtech organizations funding the trials) contract recruitment platforms to enroll enough qualified candidates on time and at an affordable cost.
Recruitment platforms are companies that find, screen, and pre-qualify candidates. These platforms need to call leads within minutes, handle more inquiries than screeners can manage, and operate outside business hours without hiring more staff. These challenges are what make these platforms explore AI voice agents.
AI voice agents can provide a return on investment for this segment when call volumes are high and the rate of ineligible applicants is high. AI pre-screens potential candidates and discards those who don't meet eligibility criteria before human review.
Our Client's Specific Challenge
Their clinical trial screening platform receives thousands of applications from people wanting to participate in studies. All forms were being processed manually by human operators. An operator had to call the applicant, ask around 20 qualification questions based on a specific script, and determine eligibility. Processing a single applicant took several days on average, mostly due to missed calls and the 30-minute duration of the actual screening call.
They wanted to cut processing time from several days to a few hours so they could process far more applicants, close trial enrollment faster (it currently takes months), and avoid hiring additional staff. Faster delivery of pre-scrubbed, clean data also makes the platform more competitive with pharmaceutical clients.
Solution
Why not ready-to-use voice AI agent solutions
There are a lot of players in the voice AI agent market: Leaping AI, Bolka AI, Delfa, TeleWizard, Alleviate Health, Roseline AI, Artera, Gridspace Grace, Grove AI, Bond Health, Power, Phases, Awaz.ai, DigiQT.

Ready-to-use HIPAA-compliant voice AI agents for clinical research are virtual medical receptionists that can call patients, explain the study, conduct pre-screening, track eligibility, and record responses in the connected CRM, EHR, or CTMS, all without human staff.
The system runs 24/7, follows a set script, asks protocol questions, responds to participant questions in about 2 seconds, fills out questionnaires, and performs automatic participation checks. Once a candidate qualifies, the agent can route them directly to a clinical center. An AI voice agent is particularly well suited for large-scale studies, including depression and neurological studies.
AI voice agent recruitment may reach 75+% completed surveys, deliver 3x+ more qualified leads, improve accurate identification of eligible patients by up to 50%, increase study enrollments by 10+%, and improve sample diversity.
With AI agents, screening time can drop by 30+%, overall recruitment accelerates by over 60%, and each recruiter saves up to 20 hours per week. Staff labor costs fall by around 90%, manual workload by 40%, telephone operation costs by 70%, and overall costs by 60%.
Over one year, AI agents can process 10 million requests and conduct more than 500,000 interactions across dozens of Phase II to III studies, engaging 70,000 patients. Recruitment directors report that voice AI agents can accelerate screening and randomization while freeing up thousands of man-hours.
However, our client could not use the ready-to-use solutions on that market because they needed their own white-label voice AI agent. That is why custom voice AI agent development services were the primary solution for them.
Custom AI Voice Agent Development
Our enterprise client (a recruitment platform handling pre-screening calls in the medical domain) needed to implement an AI voice agent, but the primary technical constraint was that the solution had to be strictly HIPAA and GDPR compliant.
The client had a vision for an AI voice agent but lacked the technical expertise to select the right backend components, specifically a Large Language Model (LLM) that could handle Protected Health Information (PHI) legally and securely.
Process
HIPAA-Compliant LLM Platform Consulting
Our AI consulting team evaluated five major LLM providers based on cost and HIPAA compliance readiness.
Anthropic Claude API
Only compliant on expensive, sales-assisted Enterprise plans.
Claude Opus 4.5 costs $5 per million input tokens and $25 per million output tokens. The Claude Sonnet 4.5 price remains $3 per million input tokens and $15 per million output tokens. Regarding HIPAA compliance, a Business Associate Agreement (BAA) is only available to customers on HIPAA-ready services such as the Enterprise plan or the first-party API (self-serve plans are not covered). The HIPAA-ready Enterprise plan requires a sales-assisted agreement and proper configuration.
Hathr.ai
A specialized vendor offering out-of-the-box HIPAA compliance with a signed BAA (Business Associate Agreement) at a flat monthly rate.
Hathr.ai offers a Single Subscription Webapp plan at $45/month that is HIPAA-compliant and includes a signed BAA, unlimited queries, unlimited file uploads/downloads, and NIST 800-171 + HIPAA protections. A custom Enterprise plan with EHR/EMR integration, SCIM/SAML/SSO, and custom workflow automation is available on request.
Google Vertex AI
Requires significant manual configuration (IAM roles, encryption) and a custom signed BAA to be compliant.
Google's developer pricing lists Gemini 3 Pro preview at $2 per million input tokens and $12 per million output tokens for prompts up to 200k tokens. Gemini 3 Flash costs $0.50 per million input tokens and $3 per million output tokens for text/image/video input. Audio input is priced at $1 per million tokens. Organizations must sign a Google Cloud BAA and configure IAM roles, encryption, and region restrictions to use Vertex AI with protected health information; HIPAA compliance is not automatic.
Microsoft Azure OpenAI Service
Medical text is eligible, but voice/image modalities are not automatically covered, requiring heavy configuration.
GPT-4o on Azure costs $2.50 per million input tokens and $10 per million output tokens. Azure OpenAI is considered HIPAA-eligible for text-based models when customers sign Microsoft's BAA (part of the Data Protection Addendum) and configure encryption, private virtual networks and role-based access control. However, image and voice modalities (e.g., DALL·E or speech/real-time audio features) are not automatically covered.
AWS Bedrock
Similar to Google and Microsoft, AWS requires a signed BAA and manual configuration; compliance is not automatic out of the box.
Claude Sonnet 4.5 on AWS Bedrock costs $3 per million input tokens and $15 per million output tokens, matching the direct Anthropic pricing. AWS services, including Bedrock, become HIPAA-eligible only after a customer signs the AWS BAA and configures their account properly - compliance is not automatic. Provisioned throughput (dedicated model units) and custom fine-tuning are billed separately at hourly rates and require commitments depending on the model and term.
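To compare the per-token prices above on a per-call basis, a rough estimate can be sketched as follows. The prices are the ones quoted in this comparison. The model labels used as dictionary keys and the token counts per call are illustrative assumptions, not measurements from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price in USD per one million tokens, as quoted in the comparison above."""
    input_per_million: float
    output_per_million: float


# Prices from the provider comparison above; the keys are informal labels.
PRICES = {
    "claude-opus-4.5": TokenPrice(5.00, 25.00),
    "claude-sonnet-4.5": TokenPrice(3.00, 15.00),
    "gemini-3-pro-preview": TokenPrice(2.00, 12.00),
    "gemini-3-flash": TokenPrice(0.50, 3.00),
    "gpt-4o-azure": TokenPrice(2.50, 10.00),
}


def call_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated LLM cost of one call in USD."""
    price = PRICES[model]
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


# Assumed usage for one screening call: 40,000 input and 4,000 output tokens.
for name in PRICES:
    print(f"{name}: ${call_cost_usd(name, 40_000, 4_000):.4f}")
```

Token cost is only one part of the picture: the flat-rate and enterprise plans described above are priced per subscription, so they cannot be compared with this formula alone.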
The Decision
The client rejected the "Big Tech" solutions (Microsoft, AWS, Google) because they didn't want to pay for bloated subscription models with unnecessary features and because they preferred negotiating a custom, direct solution with a smaller, specialized vendor.
They chose a specialized, medical-grade, HIPAA-compliant LLM to act as the "brain", integrated with Twilio for the IP telephony (the "mouth and ears" of the voice agent).
AI Voice Agent Development
Features of AI-Powered Voice Agent for Healthcare/Clinical Trials
Below are the basic rules, behaviors, and system integrations needed to build an AI voice assistant whose only job is to call potential patients (candidates) and check if they are eligible to take part in a clinical study or medical research.
To build a reliable voice agent, these rules must be split three ways. Some go into the system prompt for the conversational LLM you select. Others are built into the code wrapper for telephony and application logic (using platforms like Twilio) that orchestrates the call. The rest are built as tools (function calling) that the LLM can trigger to take actions when it reaches conclusions.
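One way to picture the tools category is a tool definition that the LLM can call once it reaches a conclusion, plus a check in the wrapper code before the call is executed. The sketch below uses a generic JSON Schema style description. The tool name `record_screening_result`, its fields, and the `validate_tool_call` helper are hypothetical and are not taken from the project or from any specific vendor API.

```python
# Hypothetical tool description in a generic JSON Schema style.
RECORD_RESULT_TOOL = {
    "name": "record_screening_result",
    "description": "Store the final screening outcome for a referral in the CRM.",
    "parameters": {
        "type": "object",
        "properties": {
            "referral_id": {"type": "string"},
            "status": {
                "type": "string",
                "enum": ["Ready", "Excluded", "Not Processed"],
            },
        },
        "required": ["referral_id", "status"],
    },
}


def validate_tool_call(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems found in the arguments; empty means valid."""
    schema = tool["parameters"]
    problems = [
        f"missing field: {name}"
        for name in schema["required"]
        if name not in arguments
    ]
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown field: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems
```

Validating arguments in the wrapper, rather than trusting the model output, keeps a malformed tool call from writing an unexpected status into a referral record.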
Call initiation & identification
- Outbound calls. The AI voice agent must place an outbound call within 30 seconds of receiving the contact request.
- Initial greeting and identity check. The AI voice agent begins the call with a greeting, introduces itself, and confirms the patient's first and last name. It starts screening only after the patient confirms their identity.
- Holds and unavailability. If the patient asks to hold, the AI voice agent must wait up to 10 minutes and disconnect if they remain unavailable.
- Wrong person or wrong number. The AI voice agent must end the call politely if the person says they aren't the patient or it's the wrong number.
- Referral unavailable. If the referral is unavailable, the AI voice agent must end the call attempt and log in the CRM that the referral was unavailable.
Conversation style, personality and topic control
- Calm, polite and human-like. The AI voice agent must use simple language, maintain a steady pace, listen attentively, and avoid talking over the candidate's speech.
- Conversation control. The AI voice agent must respond to interruptions politely, pause if the candidate speaks, allow them to finish, and then return to the scripted flow.
- Topic control. The AI voice agent must stick strictly to approved materials (script, pre-screener, ICF, study Q&A, research Q&A, ClinicalTrials.gov), decline to answer questions outside these sources, and bring the discussion back to screening.
Call duration, pauses and timing
- Call Limits. Calls have a strict two-hour maximum. The AI voice agent must begin warning the candidate 15 minutes before time is up, then remind them every 5 minutes until the call ends.
- Pausing. If the candidate needs a moment, the AI voice agent must stop talking. The AI voice agent must check in every 30 seconds but hang up if there is no response for 10 minutes.
- Conversation Pace. The AI voice agent pauses for about 2 seconds to avoid interruptions and preserve the natural flow of the conversation.
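The timing rules above are deterministic, so they fit naturally in the wrapper code rather than in the prompt. A minimal sketch, assuming the wrapper polls `silence_action` once per second while the candidate is silent; the function and constant names are hypothetical:

```python
MAX_CALL_SECONDS = 2 * 60 * 60        # strict two-hour limit
FIRST_WARNING_BEFORE_END = 15 * 60    # first warning 15 minutes before the limit
WARNING_INTERVAL = 5 * 60             # then a reminder every 5 minutes
CHECK_IN_INTERVAL = 30                # check in every 30 seconds of silence
SILENCE_HANGUP = 10 * 60              # hang up after 10 minutes without a response


def warning_times() -> list[int]:
    """Seconds from call start at which the agent warns about the time limit."""
    start = MAX_CALL_SECONDS - FIRST_WARNING_BEFORE_END
    return list(range(start, MAX_CALL_SECONDS, WARNING_INTERVAL))


def silence_action(silent_seconds: int) -> str:
    """Decide what to do after a given number of whole seconds of silence."""
    if silent_seconds >= SILENCE_HANGUP:
        return "hang_up"
    if silent_seconds > 0 and silent_seconds % CHECK_IN_INTERVAL == 0:
        return "check_in"
    return "wait"


# Warnings fall at 105, 110, and 115 minutes into the call.
print([seconds // 60 for seconds in warning_times()])
```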
Screening process & script rules
- Follow the script exactly. The voice agent must follow the study screening script exactly as written, in the same order. It shouldn't rephrase questions, skip them, or change their sequence.
- Decide eligibility after all questions. The voice agent must wait until the candidate answers every question. It decides eligibility after receiving the answer to the final question, then tells the candidate whether they qualify.
- Update call results automatically. After the call, the voice agent updates each record as Excluded if the candidate doesn't qualify, Ready if they do, or Not Processed if the call ends before all questions are answered, for instance when responses are too unclear to process. It also creates or updates referral data in the CRM and marks it as an AI-processed referral.
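The three record statuses above can be derived from the per-question outcomes. The sketch below is an illustration under assumptions: the `call_result` function and the shape of `answers` are hypothetical, and each answer is assumed to have already been reduced to pass, fail, or no usable answer.

```python
from enum import Enum
from typing import Optional


class CallResult(Enum):
    READY = "Ready"
    EXCLUDED = "Excluded"
    NOT_PROCESSED = "Not Processed"


def call_result(answers: dict[str, Optional[bool]], question_ids: list[str]) -> CallResult:
    """Map per-question outcomes to a record status.

    answers maps a question id to True (meets the criterion), False (does not),
    or None / missing (no usable answer was captured).
    """
    if any(answers.get(q) is None for q in question_ids):
        # The call ended before every question had a usable answer.
        return CallResult.NOT_PROCESSED
    if all(answers[q] for q in question_ids):
        return CallResult.READY
    return CallResult.EXCLUDED
```

Checking for missing answers first means a partially completed call is never reported as Excluded, which matches the rule that eligibility is decided only after the final question.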
Managing unclear answers, abuse and off-topic calls
- Clarifications. If a candidate gives no answer, a contradictory answer, or an inaudible response, the AI voice agent must ask them to answer the question again up to 4 times and then transfer the call to a human.
- Abuse or off-topic conversations. If a candidate is abusive or talks about things unrelated to the questions, the AI voice agent must politely steer them back to the call's topic up to 10 times and then terminate the call.
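Both limits above are simple counters that the wrapper code can enforce independently of the LLM. In the sketch below, the class and method names are hypothetical, and resetting the clarification counter for each question is an assumption (the rules do not say whether the limit of 4 is per question or per call).

```python
from dataclasses import dataclass

MAX_CLARIFICATIONS = 4   # repeat requests before handing off to a human
MAX_REDIRECTS = 10       # polite redirections per call before ending it


@dataclass
class GuardrailCounters:
    clarifications: int = 0
    redirects: int = 0

    def on_unclear_answer(self) -> str:
        """Called when an answer is missing, contradictory, or inaudible."""
        if self.clarifications >= MAX_CLARIFICATIONS:
            return "transfer_to_human"
        self.clarifications += 1
        return "ask_again"

    def on_clear_answer(self) -> None:
        """Reset the clarification counter once a usable answer arrives."""
        self.clarifications = 0

    def on_off_topic(self) -> str:
        """Called when the candidate is abusive or strays from the screening."""
        if self.redirects >= MAX_REDIRECTS:
            return "end_call"
        self.redirects += 1
        return "redirect"
```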
Secondary Screening & Time Restrictions
- Prerequisites. The AI agent must check that the referral answered all identity and pre-screener questions before it starts secondary screening.
- No Late Screenings. If fewer than 15 minutes remain, the AI voice agent won't start a secondary screening. It explains that there is not enough time and schedules a callback from a staff member.
- Pushback in the Final 15 Minutes. If a referral pushes to begin the screening during the last 15-minute window, the AI agent must tell the referral that there isn't enough time to get through the process and then arrange for a company representative to reach out to the referral.
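The gating rules for secondary screening reduce to a single check before the agent moves on. A minimal sketch; the function name and the returned action labels are hypothetical:

```python
MIN_SECONDS_FOR_SECONDARY = 15 * 60  # no secondary screening in the last 15 minutes


def can_start_secondary(
    identity_confirmed: bool,
    prescreener_complete: bool,
    seconds_left: int,
) -> tuple[bool, str]:
    """Return (allowed, next_action) for a request to start secondary screening."""
    if not (identity_confirmed and prescreener_complete):
        return False, "finish_prerequisites"
    if seconds_left < MIN_SECONDS_FOR_SECONDARY:
        # Applies even if the referral insists: explain and hand off to staff.
        return False, "schedule_staff_callback"
    return True, "start_secondary_screening"
```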
Voice-mail & inbound calls
- No inbound calls. The AI voice agent only makes outbound calls and does not answer incoming calls.
- Send calls to voicemail. If someone calls the number, the AI voice agent sends the call to voicemail.
- Link to records. The AI voice agent must attach each voicemail to the correct referral record.
- Voicemail labels. Every voicemail from the AI voice agent must have a label "AI voice agent voicemails" so it's easy to tell them apart from messages left by site staff, coordinators, or other agents.
- Shared storage. All voicemails must be stored in one place.
System Integration & Call Results
- Update records automatically. When a call ends and the voice AI agent marks the candidate as qualified or not, the system must update the referral's data in the right record.
- Tag in the CRM every AI-screened referral. The AI voice agent must label each referral it processes, so staff can see straight away that the AI (not a person) screened the candidate.
- Don't promise a study place. The AI voice agent can only assess whether a candidate meets the study criteria and can't guarantee they'll get a spot or promise them payment.
- Keep medical advice off every call. The agent mustn't answer health questions or offer medical guidance. When a candidate raises a health concern, the AI voice agent must put them through to the clinical team.
Final Technical Realization of a Custom AI Voice Agent
This is a high-level architectural overview and business logic description (technical specification) of an AI voice agent designed to automate outbound calls and conduct candidate screening. We built this system for the initial screening of patients who have submitted an application to participate in a medical study.
Our engineering team solved specific technical challenges (like voice latency) and product requirements (verification, edge-case handling) to meet the client's business goals.
Call Scenario & Logic
Outbound calls are initiated (via Twilio) immediately after an application is received. The AI verifies the candidate's identity and asks around 20 questions, ensuring it doesn't duplicate information already provided in the web form.
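Avoiding duplicate questions can be as simple as filtering the script against the submitted form before the call starts. The sketch below assumes each script question has an `id` that matches a web form field; the sample questions and field names are hypothetical.

```python
def remaining_questions(script: list[dict], form_answers: dict[str, str]) -> list[dict]:
    """Keep the script order but drop questions the web form already answered."""
    return [q for q in script if not form_answers.get(q["id"], "").strip()]


# Hypothetical script and form submission, for illustration only.
SCRIPT = [
    {"id": "age", "text": "How old are you?"},
    {"id": "smoker", "text": "Do you currently smoke?"},
    {"id": "diagnosis", "text": "Have you been diagnosed with the condition under study?"},
]
FORM = {"age": "42", "smoker": ""}  # an empty field counts as unanswered

print([q["id"] for q in remaining_questions(SCRIPT, FORM)])
```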
Question Processing Architecture
We implemented a hybrid approach to optimize speed. Simple "yes/no" questions are processed locally with low latency, while complex questions (requiring calculations like BMI) are routed to an LLM. The early disqualification mechanism immediately ends the call if a candidate fails to meet the trial criteria.
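The hybrid routing and early disqualification described above can be sketched as follows. This is an illustration under assumptions, not the production code: the keyword sets, the question fields (`kind`, `qualifying_answer`), and the injected `get_answer` and `ask_llm` callables are all hypothetical stand-ins for the speech pipeline and the LLM client.

```python
from typing import Callable, Optional

YES = {"yes", "yeah", "yep", "correct"}
NO = {"no", "nope", "nah"}


def parse_yes_no(utterance: str) -> Optional[bool]:
    """Cheap local parser: True/False for a clear yes/no, None otherwise."""
    words = set(utterance.lower().replace(",", " ").replace(".", " ").split())
    if words & YES and not words & NO:
        return True
    if words & NO and not words & YES:
        return False
    return None


def evaluate_answer(question: dict, utterance: str,
                    ask_llm: Callable[[dict, str], bool]) -> bool:
    """Return True if the answer meets the criterion for this question."""
    if question["kind"] == "yes_no":
        parsed = parse_yes_no(utterance)
        if parsed is not None:
            return parsed == question["qualifying_answer"]
    # Complex or unclear answers take the slower LLM path.
    return ask_llm(question, utterance)


def run_screening(questions: list[dict],
                  get_answer: Callable[[dict], str],
                  ask_llm: Callable[[dict, str], bool]) -> tuple[bool, int]:
    """Return (eligible, questions_asked); stop at the first failed criterion."""
    for asked, question in enumerate(questions, start=1):
        if not evaluate_answer(question, get_answer(question), ask_llm):
            return False, asked  # early disqualification
    return True, len(questions)
```

Keeping the local parser conservative (it returns `None` for anything ambiguous) means unclear answers fall through to the LLM instead of being misread as a yes or a no.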
We also had to securely store intermediate answers outside of Twilio. IP telephony doesn't allow you to store state, so we used separate services for logs.
Context & Memory Management
Our developers opted out of complex Retrieval-Augmented Generation (RAG) systems. Since there is only one reference document for the agent, FAQs are hardcoded into the prompt context, making the system simpler and more reliable.
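With a single reference document, the FAQ can be inlined when the system prompt is assembled, with no retrieval step at runtime. A minimal sketch; the `build_system_prompt` helper and the FAQ entries are placeholders for illustration, and real entries would come from the approved study document.

```python
# Placeholder FAQ entries for illustration only.
FAQ = [
    ("How long does the screening call take?", "About 30 minutes."),
    ("Does qualifying guarantee a place in the study?", "No. The call only checks eligibility."),
]


def build_system_prompt(base_rules: str, faq: list[tuple[str, str]]) -> str:
    """Inline every FAQ entry into the prompt instead of retrieving it at runtime."""
    lines = [base_rules.strip(), "", "Approved FAQ (answer only from this list):"]
    for number, (question, answer) in enumerate(faq, start=1):
        lines.append(f"{number}. Q: {question}")
        lines.append(f"   A: {answer}")
    return "\n".join(lines)


print(build_system_prompt("Follow the screening script exactly.", FAQ))
```

The trade-off is prompt length: this works while the FAQ stays small enough to fit comfortably in the context window, which is the situation described above.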
User Experience & Timings
We implemented strict latency requirements (a maximum delay of 2 seconds between responses) to ensure a natural conversation flow. Our AI developers also programmed algorithms to handle such cases as the user swearing, asking to pause ("hold on a minute"), mumbling, or suddenly switching speaking languages.
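A latency budget like this can be enforced in the wrapper with a timeout around reply generation. The sketch below shows one possible approach, not necessarily the one used in the project: falling back to a short filler phrase when the budget is exceeded is an assumption, and `generate` is a hypothetical coroutine function standing in for the LLM and speech pipeline.

```python
import asyncio
from typing import Awaitable, Callable

RESPONSE_BUDGET_SECONDS = 2.0   # maximum delay between responses
FILLER = "One moment, please."  # assumed fallback phrase


async def reply_within_budget(
    generate: Callable[[], Awaitable[str]],
    budget: float = RESPONSE_BUDGET_SECONDS,
) -> str:
    """Await the reply; return a filler phrase if the budget is exceeded."""
    try:
        # wait_for cancels the pending generation when the timeout expires.
        return await asyncio.wait_for(generate(), timeout=budget)
    except asyncio.TimeoutError:
        return FILLER
```

A production system would likely keep the slow generation running and speak its result after the filler; this sketch cancels it to stay short.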
Infrastructure & Integration
The gathered data is automatically logged into the candidate's profile, stored in an AWS-based CRM, and then exported back to the end client.
The success of the AI voice agent depended on expertise in building scalable server backends and reliable integrations with third-party APIs. The Belitsoft team, which has been operating in this market for over 20 years, provided this expertise.
Results

The system now takes hours, not days, to finish a review
- HIPAA-compliant AI voice agents for healthcare lead qualification can make hundreds of calls simultaneously
- The whole review process now takes a few hours instead of several days
- Pharmaceutical companies screen their first batch of applicants much faster than before
- Each call with a candidate runs to about 30 minutes
The client cuts costs with automated calls
- The client doesn't have to hire, train, and keep a large team of call operators
- They pay a one-time setup fee and then cover only the calls the voice agent makes
The solution relies on proven tools and meets strict healthcare rules
- The development team built it on Twilio, a HIPAA-compliant LLM, and AWS
- The architecture meets strict healthcare data privacy and security rules
Pharmaceutical companies get high-quality data
- AI voice agents run every screening call the same way to ensure every required data point is asked for and captured. The result is more standardized and reliable data.
Our Clients' Feedback
We have been working for over 10 years and they have become our long-term technology partner. Any software development, programming, or design needs we have had, Belitsoft company has always been able to handle this for us.
Founder from ZensAI (Microsoft)/ formerly Elearningforce