VR Language Learning

A VR application to assist with verbal language learning using AI Virtual Assistant

Role

VR Prototyper, Developer

Overview

VR Language Learning with AI Virtual Assistant is a VR application that assists verbal language learning by allowing users to have real-time conversations with an AI Virtual Assistant in their desired language.

This project was done as part of my Master’s coursework – Advances in XR.

Team

Jamie Dinh

Daiyi Peng

Mengying Li

Timeline

Apr – May 2023

Tools

Unity

Oculus Quest

Figma

Background

Previous studies have shown that, compared to traditional methods such as memorizing vocabulary and reading textbooks, the interactive experience and highly immersive environment of VR can boost the efficiency of language learning.

Several methods of using VR for language learning already exist: contextual learning, which places users in an environment related to the vocabulary; guided tours, which lead users through a scene while introducing vocabulary; and scripted conversation, where a virtual human talks to the user and presents several options to choose from. However, the study content is predefined and limited to selectable options, so users cannot learn anything beyond that scope. Furthermore, after a few rounds of practice users can simply memorize the questions and answers without really thinking about what they are saying.

The concept

Based on this research, we decided to develop a VR application that assists with verbal language learning by applying the latest generative AI language models (with a particular focus on ChatGPT by OpenAI). The application allows users to have real-time verbal conversations with an AI character in a predefined virtual context.

Techniques

Visual

  • Menu UI and application flow designed using Figma
  • Main scene set up with prefabs from the Unity Asset Store
  • Character animations from mixamo.com

Real-time conversation understanding

  • Speech recognition with the Meta Voice SDK
  • GPT-3.5-turbo responses via the OpenAI API (see the sketch below)
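
As a rough idea of the GPT integration, the request might look something like the sketch below: a plain UnityWebRequest POST to the Chat Completions endpoint, with a hand-built JSON body kept minimal for brevity. The class and field names are illustrative, not the project’s actual code.

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class GptClient : MonoBehaviour
{
    // Illustrative field; in practice the key should come from a secure config, not source code.
    [SerializeField] private string apiKey;
    private const string Endpoint = "https://api.openai.com/v1/chat/completions";

    // Sends the recognized user speech to the Chat Completions API and hands the raw JSON back.
    public IEnumerator Ask(string userText, System.Action<string> onResponseJson)
    {
        // Minimal hand-built JSON; a real implementation would use a JSON serializer.
        string body = "{\"model\":\"gpt-3.5-turbo\",\"messages\":[{\"role\":\"user\",\"content\":\""
                      + userText.Replace("\"", "\\\"") + "\"}]}";

        using (var request = new UnityWebRequest(Endpoint, "POST"))
        {
            request.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            request.SetRequestHeader("Authorization", "Bearer " + apiKey);

            yield return request.SendWebRequest();

            if (request.result == UnityWebRequest.Result.Success)
                onResponseJson(request.downloadHandler.text);
            else
                Debug.LogError("GPT request failed: " + request.error);
        }
    }
}
```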

Text-to-Speech API requests with multiple languages

  • Text-to-Speech in the Meta Voice SDK, based on Wit.ai
  • The answer from ChatGPT is spoken aloud while the text is shown in real time, sentence by sentence (see the sketch below)
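
The sentence-by-sentence playback could be sketched roughly as follows, assuming a hypothetical `SpeakAndShow` helper that stands in for the Voice SDK TTS call and the transcription UI update.

```csharp
using System.Text.RegularExpressions;
using UnityEngine;

public class SentenceStreamer : MonoBehaviour
{
    // Splits the GPT answer at sentence boundaries so each piece can be
    // shown in the transcription box and sent to TTS one at a time.
    public void Play(string gptAnswer)
    {
        // Naive split after ., !, ? and their CJK equivalents; real text may
        // need more careful, language-specific handling.
        string[] sentences = Regex.Split(gptAnswer, @"(?<=[.!?。！？])\s*");

        foreach (string sentence in sentences)
        {
            if (string.IsNullOrWhiteSpace(sentence)) continue;
            SpeakAndShow(sentence.Trim());   // hypothetical helper: queue TTS + update UI text
        }
    }

    private void SpeakAndShow(string sentence)
    {
        Debug.Log("TTS: " + sentence);       // placeholder for the Voice SDK call
    }
}
```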

Application flow

Step #1

Choose character

Step #2

Choose language

Step #3

Settings

Step #4

Main scene in play

Features showcase

Menu settings

  • Users can select different characters and languages, as well as enable/disable transcription
  • For transcription, the user can also use the X button on the left controller as a shortcut to toggle it in the Main scene (see the sketch below)

(video coming soon)
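
The controller shortcut mentioned above could look roughly like this with OVRInput, where `Button.Three` should map to the X button on the left Touch controller; the transcription-box fields are illustrative.

```csharp
using UnityEngine;

public class TranscriptionShortcut : MonoBehaviour
{
    [SerializeField] private GameObject userTranscriptionBox;
    [SerializeField] private GameObject assistantTranscriptionBox;

    private void Update()
    {
        // OVRInput.Button.Three corresponds to the X button on the left Touch controller.
        if (OVRInput.GetDown(OVRInput.Button.Three))
        {
            bool show = !userTranscriptionBox.activeSelf;
            userTranscriptionBox.SetActive(show);
            assistantTranscriptionBox.SetActive(show);
        }
    }
}
```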

Movement in the scene

When the user moves around the scene, the character and the two transcription boxes follow the camera’s movement and rotation.

We smoothed the position change so the user does not feel dizzy, and we added a small latency so that if the user quickly turns left or right, the character does not snap around instantly.
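
A minimal sketch of this smoothed, slightly delayed follow behavior using linear interpolation; the offset and speed values here are placeholders, not the project’s tuned numbers.

```csharp
using UnityEngine;

public class FollowCamera : MonoBehaviour
{
    [SerializeField] private Transform cameraRig;     // the VR camera to follow
    [SerializeField] private Vector3 offset = new Vector3(0f, -0.2f, 1.5f);
    [SerializeField] private float followSpeed = 2f;  // lower = more latency / smoother

    private void LateUpdate()
    {
        // Target position in front of the camera; Lerp gives both smoothing and
        // a slight lag so quick head turns don't make the character snap around.
        Vector3 target = cameraRig.position + cameraRig.rotation * offset;
        transform.position = Vector3.Lerp(transform.position, target, followSpeed * Time.deltaTime);

        // Keep the character facing the user, also smoothed.
        Quaternion lookAtUser = Quaternion.LookRotation(cameraRig.position - transform.position, Vector3.up);
        transform.rotation = Quaternion.Slerp(transform.rotation, lookAtUser, followSpeed * Time.deltaTime);
    }
}
```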

Real-time conversation

The conversation is currently available in English, Chinese, French, and Spanish (see demo video).

In the conversation, we added several features to make the language-learning process more realistic and flexible.

  • Speech-to-Text: Transcribes the user’s speech to text and sends it to the GPT API (currently GPT-3.5-turbo). The user can see the real-time transcription in a UI box.
  • Text-to-Speech (TTS): we use the same Meta Voice SDK to synthesize the answer from GPT as audio.
  • Animation sync: The character enters a talking animation with body gestures while the audio is playing, which makes the conversation feel more realistic (see the sketch below).
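
The animation sync can be reduced to a very small script: while the TTS audio is playing, keep the character’s Animator in a talking state. The `IsTalking` parameter name and the audio source wiring are assumptions.

```csharp
using UnityEngine;

public class TalkingAnimationSync : MonoBehaviour
{
    [SerializeField] private AudioSource ttsAudioSource;  // assumed: where the Voice SDK output plays
    [SerializeField] private Animator characterAnimator;

    private static readonly int IsTalking = Animator.StringToHash("IsTalking"); // assumed parameter name

    private void Update()
    {
        // Drive the talking/gesture animation purely off whether TTS audio is playing.
        characterAnimator.SetBool(IsTalking, ttsAudioSource.isPlaying);
    }
}
```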

Emotion-based animation

We added an emotion analysis feature so that GPT responds with an emotion keyword (“Happy”, “Sad”, or “Calm”). The character then plays a matching animation: “Laughing” for “Happy”, “Shaking Head” for “Sad”, and “Thinking” for “Calm” (see demo video).

Note: the emotion keyword does not appear in the transcription or in the audio.
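
One way to implement this is to have GPT prefix its reply with an emotion tag, strip the tag before transcription and TTS, and trigger the matching animation. The `[Happy]`-style tag format and the trigger names below are assumptions, not necessarily what the project uses.

```csharp
using UnityEngine;

public class EmotionParser : MonoBehaviour
{
    [SerializeField] private Animator characterAnimator;

    // Extracts a leading emotion tag such as "[Happy] ..." from the GPT reply,
    // triggers the matching animation, and returns the reply with the tag removed
    // so it never reaches the transcription box or the TTS request.
    public string HandleReply(string reply)
    {
        string emotion = "Calm";
        if (reply.StartsWith("["))
        {
            int end = reply.IndexOf(']');
            if (end > 0)
            {
                emotion = reply.Substring(1, end - 1).Trim();
                reply = reply.Substring(end + 1).TrimStart();
            }
        }

        switch (emotion)
        {
            case "Happy": characterAnimator.SetTrigger("Laughing"); break;     // assumed trigger names
            case "Sad":   characterAnimator.SetTrigger("ShakingHead"); break;
            default:      characterAnimator.SetTrigger("Thinking"); break;
        }
        return reply;
    }
}
```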

Save user history

The app saves a cache of the user’s settings so that the settings from the last session are restored even after the user exits and restarts the app.
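
A minimal sketch of persisting these settings between sessions with Unity’s PlayerPrefs; the key names and cached fields are illustrative.

```csharp
using UnityEngine;

public static class SettingsCache
{
    // Illustrative keys; the project's actual cached fields may differ.
    public static void Save(int characterIndex, string language, bool transcriptionOn)
    {
        PlayerPrefs.SetInt("characterIndex", characterIndex);
        PlayerPrefs.SetString("language", language);
        PlayerPrefs.SetInt("transcriptionOn", transcriptionOn ? 1 : 0);
        PlayerPrefs.Save();  // flush to disk so the values survive an app restart
    }

    public static (int character, string language, bool transcription) Load()
    {
        return (
            PlayerPrefs.GetInt("characterIndex", 0),
            PlayerPrefs.GetString("language", "English"),
            PlayerPrefs.GetInt("transcriptionOn", 1) == 1
        );
    }
}
```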

Team member contribution

Jamie Dinh

  • Interactive menu UI and application flow
  • Main scene UI settings and locomotion: joystick-controlled steering
  • Character animations

Mengying Li

  • Text-to-Speech feature and syncing of speech responses with character animations
  • Sentence parsing for GPT responses
  • Character positioning relative to the camera and camera-following movement

Daiyi Peng

  • Speech-to-Text feature, GPT API request and fine-tuning
  • Multi-language model adaptation & multi-language interface build
  • Menu functions implementation, e.g., enable/disable transcription, toggle characters, etc.

Future works

  • More learning contexts based on the user’s prompts
  • More languages integrated
  • Virtual assistants change facial expressions/body language based on simple analysis of real-time conversations
  • UI improvements: dynamic dialog size based on response length, a menu pop-up in the main scene, and colliders on objects in the scene
  • Reduce locomotion sickness

© 2023 by Jamie Dinh