FinDev Blog

AI and the Art of Listening to Customers

How FINCA is using locally trained AI to understand customer experiences more accurately

FINCA, like most financial institutions, strives to listen to customers through surveys and day-to-day interactions. But this listening is only useful if we hear our customers correctly and understand what they mean to say. Experience has shown us that this cannot be assumed.

The challenge became clear when FINCA began analyzing customer feedback at scale using artificial intelligence (AI). Last year, we built our Customer Voice (CV) platform to process recorded interviews, which are rich in emotion and context that are difficult to capture through manual review. Assembling the AI workflow and prompting it for analysis were relatively straightforward from a technical standpoint. But we quickly discovered how easily meaning gets lost when machines process human speech, unless they are purposefully trained on the local dialect, context and emotions behind it.

A moment of mishearing

We learned this the hard way while piloting the CV platform in Uganda. We started by applying mainstream AI transcription tools to call-center audio recordings in Luganda and other local languages. Although the outputs looked plausible, as AI outputs often do, human review uncovered error after error. Some statements were incorrectly transcribed; others were attributed to the wrong people. In other cases, real Luganda words were strung together, creating phrases that didn’t match the rest of the conversation.

These errors were systematic distortions from a model not adequately trained on local languages, and they often reversed the speaker’s intent. In Luganda and other Bantu languages, even a single misplaced syllable can negate a verb, change the subject, or shift meaning entirely. 

For example, in one recording, a customer saying Sinnaba kufuna (I have not yet received my loan) was transcribed as Nnaba kufuna (I have already received it). Dropping the prefix si removed the negation and reversed the complaint. In another recording, a woman spoke about her working capital, saying capital yange (my capital), which was transcribed as capital yaabwe (their capital), erasing her ownership and financial independence from the record. And when she described repaying on time, sasuura, the AI transcript produced labeera, an unrelated word (meaning library) that completely buried her behavior as a responsible borrower.

These distortions were not edge cases or random flukes. Across the recordings we reviewed, off-the-shelf AI tools produced transcripts that looked like usable data but regularly obscured the most important customer insights. Local languages like Luganda remain vastly underrepresented in mainstream AI models. The errors we observed were a direct result of this exclusion.
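
One common way to put a number on this kind of human review is word error rate (WER): the word-level edit distance between a human-checked transcript and the AI transcript, divided by the number of words in the human version. The sketch below is illustrative only and is not FINCA's actual tooling; the function name `word_error_rate` is hypothetical, and the sample pairs are simply the phrases quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative pairs: (human-reviewed phrase, AI transcript).
pairs = [
    ("Sinnaba kufuna", "Nnaba kufuna"),
    ("capital yange", "capital yaabwe"),
]
for human, machine in pairs:
    # Each pair has one wrong word out of two, so each scores 0.5.
    print(f"{human!r} vs {machine!r}: WER = {word_error_rate(human, machine):.2f}")
```

A single wrong word in a long transcript barely moves WER, yet it can reverse the meaning. A metric like this can help flag weak transcripts, but it complements review by native speakers rather than replacing it.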

Accurate feedback spurs action

Switching to a locally trained AI speech library (Sunbird AI) was a game-changer. With accurate, culturally nuanced transcripts, we learned that while customers trusted FINCA, they were experiencing real frustration over loan delays, confusion about fees, and anxiety caused by inconsistent information across branches. These were concrete, recurring complaints that staff could immediately recognize and start fixing. 

Within weeks, changes were underway. Clearer fee explanations were introduced. Teams started calling customers to update them on their loan status. Managers delivered training to branches where service quality was most uneven. The accuracy of our transcriptions and our ability to deploy them at scale across a large sample of recordings strengthened confidence in the findings.

Listening as a research method

Accurate audio transcripts are only the beginning. Working with more audio data, we saw that open-ended conversations carry layers of meaning that no structured survey can fully capture. Customers naturally reveal what matters most to them through spontaneous verbal cues: returning to the same issue, lingering on a painful experience, or shifting the tone from frustration to warmth when describing a positive one. Across an entire conversation, these expressive patterns reveal the issues that weigh most heavily, bringing the data to life. 

We started using the Customer Voice platform to dig deeper. The AI tool analyzes expressive and emotional patterns within FINCA’s rich trove of recorded client conversations to unearth insights that are helping us evaluate real impact and develop new products.  

In Côte d’Ivoire, for example, we analyzed recorded interviews with cocoa farmers who had received formal land documents through Meridia, a FINCA Ventures investee. Here, emotional patterns became the backbone of our impact analysis. When farmers described life before documentation, their expressions reflected fear and vigilance; after receiving documentation, their tone shifted to relief, pride and calm.

In Tanzania and Uganda, we analyzed recordings of focus group discussions and key informant interviews as part of a youth livelihoods study run by FINCA’s Poverty Eradication Lab. Rather than producing a conventional codebook and tallying theme frequencies across pages of transcripts, we instructed an AI agent within CV to generate personas. These profiles captured the subtleties of what people were really saying behind the words spoken, helping us better understand their stresses and aspirations. 

The picture that emerged went beyond financial barriers and revealed the same gap surfacing everywhere: people with skills and a desire to work, but no tools or resources to get started. These insights helped us validate the personas and product ideas developed by our human-centered design experts. As a result, we are now launching two youth-oriented banking products that remove traditional collateral and credit history requirements, making them more accessible to prospective young clients.

What this means for practitioners

Technology enables us to collect increasing volumes of data. But more data does not automatically lead to better understanding. Our objective isn’t to use more AI but rather to listen more accurately and purposefully.

That starts in a local context, with people expressing themselves in culturally natural ways, and with locally trained models that can reflect their reality. It also requires listening beyond structured questions – paying attention to what customers emphasize, revisit or express emotionally. These powerful signals get lost when flattened into survey data.

Ultimately, listening only has value if it leads to action. When we truly hear customers, we can focus on what matters most to them. If inclusive finance is to live up to its social promise, we need to ensure no customer’s voice is lost in translation — or in transcription. 
