Gaana Mangwao

Pakistani music faces significant challenges in reaching a wider audience and preserving its linguistic nuances. This project addresses these critical issues by aiming to increase the accessibility of Pakistani music for both local and international audiences. A core objective is to preserve the linguistic culture of Urdu by ensuring an accurate and authentic representation of the language. The primary outcome was the design and development of a user-friendly tool that supports multilingual, multiscript captions for songs, with a particular focus on Urdu and English as they are the official languages of Pakistan. This solution seeks to increase language equality in technology.

Client
Purdue University and Hamnawa
Type
UX Engineer
Role
Research, design, end-to-end development, and testing.
Timeline
Sep 2024 - Apr 2025
Team Members
Solo

Overview

Problem

Urdu faces a significant technological barrier: it’s largely unrecognized by essential platforms like transcription software and music distribution services. A fundamental challenge lies in its right-to-left orientation, which clashes with the left-to-right design prevalent in Western technology. This mismatch creates a high learning curve for typing in Urdu, hindering its digital presence. 

Furthermore, Nastaliq, the traditional and preferred Urdu script, presents display difficulties due to its non-horizontal baseline and other rendering complexities. 

Consequently, users often resort to Roman Urdu – typing Urdu words with English letters. This informal practice, however, lacks standardization and contributes to the erosion of the language and its script.

The issue is compounded in Pakistani music, where songs now frequently blend Urdu and English, forcing technology to handle text in opposing directions. 

This leads to a pervasive lack of online Urdu song lyrics, diminishing accessibility and listener engagement, as visual lyrics are crucial for retention. Musicians do not necessarily have the complete lyrics after production changes, nor do they have the time to transcribe them, especially given the fear of writing incorrect lyrics in their mother tongue.

Figure 1: Study Overview

Solution

The solution is a website that enables users to upload music files for rapid, highly accurate transcription into Roman Urdu and English. It offers seamless conversion of Roman Urdu transcripts into Urdu script and back to Roman Urdu as needed. Comprehensive editing tools, including guidance on Urdu and transliteration keyboards, empower users to ensure linguistic precision.

For more details about the study you can continue scrolling.

Secondary Research

Figure 3: Secondary Research Overview

Challenges in Urdu Digitalization

  • Prevalence of Roman Urdu: Lack of digital infrastructure for native Urdu has led to the widespread use of Roman Urdu (Urdu written with Latin characters) for search and accessibility, despite its standardization issues due to varied spellings.
  • Nastaliq script complexity: The native Urdu script poses digital integration challenges due to its right-to-left orientation, context-sensitive letter shapes, and downward-sloping nature.
  • Lyrics and Song Popularity: The fluency and visual presentation of lyrics can significantly impact a song's popularity.
  • Limited transcription software support: Urdu is largely unavailable in most transcription software, and when it is, performance is often poor.
  • Roman Urdu in media & auto-captions: Lyric databases and videos frequently use Roman Urdu, and auto-generated captions often default to Hindi script, which many Pakistanis cannot read.
  • Usability of Urdu keyboards: Typing in Urdu usually has a steep learning curve, as these keyboards are often QWERTY-based and have limited autocorrection support.

Difficulties in Automatic Lyrics Transcription

  • Complexities of Music Transcription: Automatic lyric transcription is a nascent field. Transcribing voices in music is particularly challenging due to low signal-to-noise ratios from layered production and artistic choices that can reduce vocal intelligibility compared to normal speech.
  • Bilingual Code-switching in Music: Musicians, especially Gen Z artists, often code-switch between English and Urdu, complicating transcription due to the fundamental differences between right-to-left and left-to-right scripts.
  • LLM Potential & Language Bias: Large Language Models (LLMs) show promise for automatic lyric transcription due to their extensive training data. However, these models are typically optimized for Western languages.
  • Direct Access Limitations: LLMs cannot always use direct URLs for content because platform business models, API access policies, and copyright agreements govern direct access. As a result, transcription runs into roadblocks when links from streaming platforms (YouTube, Spotify, Apple Music, etc.) are used.
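
To make the code-switching difficulty above concrete, here is a minimal Python sketch, not the project's actual implementation, that splits a lyric line into right-to-left and left-to-right runs using the Unicode bidirectional class of each character. The function names (`word_direction`, `split_runs`) and the sample lyric are hypothetical, and Python is assumed purely for illustration.

```python
import unicodedata
from typing import List, Optional, Tuple


def word_direction(word: str) -> Optional[str]:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in word:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # right-to-left letters (Arabic script is "AL")
            return "rtl"
        if bidi_class == "L":  # left-to-right letters such as Latin
            return "ltr"
    return None  # digits or punctuation only: no strong direction


def split_runs(line: str) -> List[Tuple[str, str]]:
    """Group consecutive words that share a direction into (direction, text) runs."""
    runs: List[Tuple[str, str]] = []
    for word in line.split():
        direction = word_direction(word)
        if direction is None:
            # Neutral words inherit the previous run's direction (assumed default: ltr).
            direction = runs[-1][0] if runs else "ltr"
        if runs and runs[-1][0] == direction:
            runs[-1] = (direction, runs[-1][1] + " " + word)
        else:
            runs.append((direction, word))
    return runs


if __name__ == "__main__":
    # Hypothetical bilingual lyric line mixing Urdu script and English.
    for direction, text in split_runs("تم ہو my baby جان"):
        print(direction, text)
```

Each run could then be rendered with its own direction and font, for example Nastaliq for the `rtl` runs. A production renderer would rely on the full Unicode Bidirectional Algorithm rather than this word-level approximation.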

Primary Research Methodology

Figure 4: Primary Research Overview

The study comprised three stages:

  1. Problem Discovery: Identifying the problem space and associated pain points.
  2. Iterating Multi-script Transcription Tool: Designing, developing, and testing the transcription tool and engineering the LLM prompt.
  3. Guideline Creation: Mixed-methods study to develop multi-script captioning guidelines.

Discovery

Figure 5: Problem discovery with musicians and technologists

Songwriting Habits of Musicians

Mobile-First Approach: Musicians primarily use their phones for songwriting, leveraging the Notes app for storing, sharing, brainstorming, and archiving lyrics and ideas. Phones are preferred over laptops due to their accessibility and mobility, especially during creative flow states.

Time Constraints for Lyric Posting: Musicians recognize the value of posting lyrics, seeing it as crucial for audience engagement and song memorability—even likening it to "treating your songs as poetry." However, they often lack the time to write lyrics due to managing all aspects of their work independently.

Challenges with Urdu Lyrics

Reliance on Roman Urdu: Musicians often default to typing lyrics in Roman Urdu due to the difficulty of learning Urdu keyboard layouts and a lack of confidence in writing in the native script. This stems from a common background in English-medium education, where Urdu ceased to be a mandatory subject after a certain point.

Concerns About Urdu Accuracy: Musicians fear Urdu grammar and spelling mistakes more than English ones. Unlike in English, errors in their mother tongue can make them feel incompetent, highlighting a need for strong Urdu language support.

Consumption vs. Production Mismatch: Despite difficulties in writing in Urdu script, many musicians, particularly rappers, draw inspiration from written Urdu poetry. This creates a disconnect between their consumption and production of songs.

Script Preferences: Musicians generally couldn't name the Nastaliq typeface, but they could easily identify it and showed a strong preference for it over the Naskh script, which is “robotic and boxed in”. This preference stems from their familiarity with Nastaliq and its aesthetically pleasing appearance.

Dynamic Nature of Lyrics & Collaboration

Fluid Lyrics: Musicians don't always have complete lyrics written down, as content can evolve during production or collaboration. Rappers, for instance, are often possessive about their lyrics and may freestyle, meaning final lyrics might not be fully documented even after production.

Structural Labeling Needs: While musicians may not always include song structure labels (like chorus, verse) when writing in their Notes app, these labels are required when uploading lyrics to platforms such as Musixmatch.

Perspectives on LLM Transcription

Openness to LLMs: Musicians are generally unconcerned about Large Language Models (LLMs) training on their songs for transcription purposes. They are open to LLM assistance in transcription, believing it can save them time.

Human Intervention Required: Because of their prior experience with Urdu technology, they don't expect lyric transcriptions to be completely or even mostly accurate, and they assume humans will need to fix the lyrics.

Urdu Technology Tools

LLM Capabilities for Urdu Transcription: Usama Bin Shafqat, an AI Engineer at Google, demonstrated Gemini's ability to handle less-documented languages in Pakistan and manage the musical complexity of multivocal group performances. Gemini achieved near-complete accuracy in transcribing both Urdu and Roman Urdu.

Prompt Engineering: However, as LLMs are likely to resort to the Roman Hindi form of transliteration, developing an Urdu-specific solution would require stricter prompt engineering.

Urdu Transliteration Keyboard: Sheikh Ahmed, an Urdu language specialist at Musixmatch, showed the importance of a Roman Urdu transliteration keyboard for transcribing songs. This tool is especially valuable for songs blending Urdu and English, as it enables him to type in Roman Urdu and instantly generate the correct Urdu script, all while seamlessly including English words without constantly switching between different keyboard layouts.

Urdu Keyboard Insights: Zeerak Ahmed, founder of the Urdu keyboard Matnsaz, shared insights from his research on technology adoption and pain points when typing in Urdu. He also provided guidance on project direction based on his experience designing and developing Matnsaz.

Figure 6: Miro process mapping and song artifact analysis with songwriters

Song Transcription Tool

Ideation

Figure 7: Proposed Features for Song Transcription Tool

Multi-script Transcripts: Lyrics to be provided in both Roman Urdu and Urdu (Nastaliq) to reduce friction in transcribing in both scripts. 

Error Correction: Offer users alternative word suggestions in both Roman Urdu and Urdu, along with access to an Urdu dictionary, to help them write confidently and reduce the fear of making mistakes.

Alternative Input Methods: Provide speech-to-text technology for dictating lyrics and an Urdu transliteration keyboard to lower the barrier to accessing Urdu text technology.

Song Structure Labels: Enable users to select and apply predefined song structure labels (like chorus, verse) to quickly add finishing touches to their song transcripts.

Timestamped Transcripts: Integrate timestamp markers into transcripts, allowing musicians to convert their song transcripts into lyric videos faster within video editing software.
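
As an illustration of what a timestamped transcript could look like once exported, here is a minimal Python sketch that writes lyric lines as SubRip (.srt) captions, a plain-text subtitle format that many video editors can import. The choice of SRT, the function names, and the sample Roman Urdu lines and timings are assumptions for illustration; the tool's actual export format is not specified here.

```python
from typing import List, Tuple


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(lines: List[Tuple[float, float, str]]) -> str:
    """Turn (start_seconds, end_seconds, lyric_text) tuples into an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(lines, start=1):
        timing = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical timestamped lyric lines (placeholder text and timings).
    transcript = [
        (12.0, 15.5, "Dil ki baat"),
        (15.5, 19.25, "Sun lo zara"),
    ]
    print(to_srt(transcript))
```

Because SRT stores text as UTF-8, the same function would work unchanged for Urdu-script lines; only the rendering font and direction are left to the video editor.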

Audio-Playback Controls: Give users more control over transcription accuracy by providing song playback speed controls. This aligns with the mental models of other transcription software and would also benefit fans transcribing songs.

Mid-fi Design and Development

Figure 8: Working MVP Web Prototype 

An MVP was developed to test Gemini’s real-time Roman Urdu song transcription capabilities and observe how users made corrections. Participants were also provided with a transliteration keyboard, which was a first-time experience for all of them. The songs were transcribed in Roman Urdu by default as it is the mental model for digital Urdu song lyrics.

Figure 9: Two Mobile Display Options

Figma prototypes were created to test the necessity of all ideated features and to provide users with mobile-first layouts, aligning with their habit of songwriting on phones. To evaluate different display options for Roman Urdu and Urdu lyrics, participants were given two distinct prototypes: one where both lyric types coexisted on the same page, and another where users could toggle between the two.

Usability Testing Findings

Figure 10: Remote testing of MVP with a participant

Usability testing was conducted with both musicians and music enthusiasts, as they are the most likely users to transcribe song lyrics. Remote testing was necessary for participants located in Pakistan, given the researcher's location in the US. Designs were updated in iterations.

Key Observations & User Feedback:

  • Transcription Accuracy: Participants were pleasantly surprised by the accuracy of the transcriptions. One user enthusiastically stated, “99% of the work has been done.” All participants strongly demanded the tool be launched as soon as possible.
  • Desktop Preference: The desktop view was preferred by participants, primarily because it offered a side-by-side comparison of different lyric types and more screen real estate for text editing, which felt cramped on mobile. Additionally, using a desktop streamlined the process of uploading lyrics to various platforms.
  • Correction Workflow: Participants demonstrated varied correction workflows. Some would first think of and correct the Urdu lyrics, then address the Roman Urdu version, while others started with Roman Urdu. This highlights a need for the ability to convert Urdu lyrics back into Roman Urdu after edits.
  • Enhanced English Transliteration for Urdu: Musicians specifically requested additional accent marks for English words converted into Urdu. This is because Pakistanis are unaccustomed to reading English transliterated into Urdu, and these marks would significantly improve comprehension.
  • Importance of Loading & Error States: Participants experienced confusion during the few seconds it took for songs to transcribe in Roman Urdu and convert to Urdu, or when errors occurred. Implementing clear loading and error states is crucial to provide feedback and prevent user uncertainty.
  • Urdu Keyboard Necessity: The ability to use an Urdu keyboard was identified as essential for participants to easily make minor edits and add accent marks.

Hi-Fi Development

Figure 11: GIFs of high fidelity website

A final version of the website was created after incorporating user feedback. The Home page now also provides clear guidance for installing Urdu and transliteration keyboards, tailored to the user’s specific platform.

Figure 12: Song Transcription Page

Key Feature Updates and Design Improvements

  • Refined Playback Controls: Rewind and fast-forward options were adjusted to 5 and 10 seconds, respectively, to enhance modularity and precision, addressing musician feedback that previous intervals were too large for accurate navigation.
  • Bidirectional Lyric Conversion: Users can now seamlessly convert Roman Urdu lyrics to Urdu and vice versa even after making edits, improving flexibility in their workflow.
  • Enhanced LLM Prompting: The Large Language Model (LLM) prompt was updated to align with user preferences when re-transcribing Roman Urdu and English into Urdu, ensuring more accurate and desired output.
  • New Song Upload Functionality: The "Choose New File" button was renamed to "Transcribe New Song" and relocated closer to other CTAs. This ensures users can easily find it without navigating back to the homepage.
  • Accessibility & Readability: Foreground and background colors were tested using a contrast checker to ensure accessibility. Additionally, the font size for Urdu lyrics was increased relative to Roman Urdu, improving overall readability.
  • Branding: Consistent branding was incorporated to enhance the website's aesthetic appeal and foster user trust through a positive halo effect.
  • Desktop-first Responsive Design: While users preferred a desktop-first design, a mobile version was also created to accommodate musicians who may lack laptop access, especially when touring.
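
The contrast check mentioned under Accessibility & Readability can be reproduced with the WCAG 2.x contrast-ratio formula. The Python sketch below is illustrative only: the colors are hypothetical, since the site's actual palette is not listed here, and 4.5:1 is the WCAG AA minimum for normal-size text (3:1 for large text).

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """Check the WCAG AA threshold: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    # Hypothetical palette: dark gray lyrics on a white page.
    print(round(contrast_ratio("#333333", "#FFFFFF"), 2))
    print(passes_aa("#333333", "#FFFFFF"))
```

Since the Urdu lyrics are set larger than the Roman Urdu lyrics, a check like this could apply the `large_text` threshold to one script and the stricter normal-text threshold to the other.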

Visual Captioning Design

This part of the study aims to establish Urdu song captioning guidelines by testing multilingual, multi-script captions with both Pakistani and Indian audiences. Indian listeners were included because they form a large part of the Pakistani music fan base: Urdu and Hindi have a large phonetic overlap, but their scripts are completely different. As a result, Roman Urdu/Hindi becomes common ground.

Given that musicians indicated short-form content as a primary driver for music discovery, it was crucial to design for this experience, especially considering the limited digital space on mobile screens.

Figure 13: Illustrating three lyric variations: solely Urdu, Urdu with Roman Urdu, and Urdu with English

Figure 14: Visual design evaluation overview

A mixed-methods study was conducted with Pakistani and Indian music enthusiasts to evaluate the visual design. This study also tested assumptions that musicians had provided in the primary research.

Visual and Emotional Impact of Urdu Script

  • Aesthetic and Cultural Resonance: Pakistanis described the Urdu script as adding a "classic touch" and expressed that seeing song lyrics in Urdu "feels like a homecoming."

Benefits of Dual-Script Display

  • Enhanced User Preference: Displaying both Urdu and Roman Urdu lyrics together was generally not distracting. This allowed Pakistanis to switch between scripts based on their personal preference, improving accessibility and engagement.
  • Indian Audience Acceptance: Contrary to concerns, Indian participants enjoyed seeing Urdu alongside Romanized script. They found it helped identify linguistic origin and added to the song's "vibe," rather than alienating them.

Optimizing Script Presentation

  • Roman Urdu for Bilingual Songs: Pakistanis are unaccustomed to reading English when transliterated into Urdu script. Reading both Urdu and English scripts together creates too much cognitive load, making Roman Urdu the preferred choice for bilingual songs.
  • Limiting Script Count: Participants found that displaying three scripts/languages (Roman, Urdu, and Hindi) was overwhelming. Instead, two video formats are recommended: one with Hindi and Romanized lyrics, and another with Urdu and Romanized lyrics. This approach ensures neither Pakistani nor Indian audiences are alienated.

Factors Influencing Script Preference

  • Vocabulary Complexity: Roman Urdu aided comprehension when the lyrics contained complex vocabulary.
  • Lyric Segmentation: Urdu captions required more segmentation for complex lyrics to maintain readability.
  • Song Tempo: Urdu script was preferred for slower songs.
  • Genre Influence: For songs perceived as classical, Urdu script was preferred over Roman Urdu. But genre had a smaller influence relative to other factors. Musicians believed that audience members would prefer Roman Urdu script unless their project was perceived as classical.

Conclusion

Limitations & Future Work

The study faced challenges in recruiting Urdu-medium artists and audience members, primarily because the researcher was located in the US during the study period. As a result, the findings are biased towards English-medium participants and may not fully represent the general population.

Due to time constraints, the number of participants for testing was limited. However, there are plans to expand participant recruitment after the product launch. Future development also includes extending the transcription tool to support other regional languages, such as Punjabi, Sindhi, and Pashto.

Finally, the multi-script captioning guidelines require more comprehensive development before they are disseminated. The goal is to share them through the Pakistani music magazine Hamnawa, which boasts a substantial readership of musicians.

Reflection

Gaana Mangwao was possible because of the foundational work done by others before me and all of my prior experiences. It may seem obvious, but this project would have turned out very differently without the supervision of Dr. Rua Williams at Purdue; the advice of Zeerak Ahmed, founder of Matnsaz and Hamnawa; Sheikh Ahmed's work at Musixmatch; Usama Bin Shafqat's experiments with LLMs; and all the musicians who were willing to give me their time despite a 10-hour time difference. As it turns out, AI can be used in a meaningful way. However, my biggest takeaway from this project is the importance of building a community.
