VR Language Learning
A VR application that assists verbal language learning using an AI Virtual Assistant
Role
VR Prototyper, Developer
Overview
VR Language Learning with AI Virtual Assistant is a VR application that assists verbal language learning by letting users hold real-time conversations with an AI Virtual Assistant in their desired language.
This project was completed as part of my Master’s coursework – Advances in XR.
Team
Daiyi Peng
Mengying Li
Timeline
Apr – May 2023
Tools
Unity
Oculus Quest
Figma
Background
Previous studies have shown that, compared with traditional approaches such as memorizing vocabulary and reading textbooks, the interactive and highly immersive environment of VR can boost the efficiency of language learning.
Several methods have been used to bring VR into language learning: placing users in a context related to the vocabulary; tours, where the user is guided through an environment while vocabulary is introduced; and conversation, where a virtual human talks to the user and presents a set of options to choose from. However, the study content is predefined and limited to selectable options, so users cannot learn anything beyond that scope. Furthermore, after a few practice rounds users can simply memorize the questions and answers without really thinking about what they are saying.
The concept
Based on this research, we decided to develop a VR application that assists verbal language learning by applying the latest generative AI language models (with a particular focus on ChatGPT by OpenAI). The application allows users to hold real-time verbal conversations with an AI character in a predefined virtual context.
Techniques
Visual
- Menu UI and application flow designed using Figma
- Main scene environment built with prefabs from the Unity Asset Store
- Character animations from mixamo.com
Real-time conversation understanding
- Speech recognition with Meta Voice SDK
- GPT-3.5-turbo integrated via the OpenAI API
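As a rough illustration of how this request can be wired up from Unity, the sketch below posts the recognized speech to OpenAI’s chat completions endpoint in a coroutine. The class name, field names, and callback are illustrative placeholders, not the project’s actual code.

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class ChatGPTClient : MonoBehaviour
{
    // Placeholder: in practice the key should not live in a serialized field.
    [SerializeField] private string apiKey = "OPENAI_API_KEY";

    private const string Endpoint = "https://api.openai.com/v1/chat/completions";

    // Sends the recognized user speech to GPT-3.5-turbo and hands the raw JSON
    // reply to a callback for parsing.
    public IEnumerator SendMessageToGPT(string userText, System.Action<string> onReply)
    {
        // Naive escaping for brevity; a JSON library would be safer.
        string body = "{\"model\":\"gpt-3.5-turbo\",\"messages\":[{\"role\":\"user\",\"content\":\""
                      + userText.Replace("\"", "\\\"") + "\"}]}";

        using (var request = new UnityWebRequest(Endpoint, "POST"))
        {
            request.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            request.SetRequestHeader("Authorization", "Bearer " + apiKey);

            yield return request.SendWebRequest();

            if (request.result == UnityWebRequest.Result.Success)
                onReply(request.downloadHandler.text); // JSON containing choices[0].message.content
            else
                Debug.LogError("GPT request failed: " + request.error);
        }
    }
}
```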
Text-to-Speech API requests with multiple languages
- Text-to-Speech via the Meta Voice SDK, which is built on Wit.ai
- The user hears ChatGPT’s answer while the matching text is shown in real time, sentence by sentence
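A minimal sketch of the sentence-by-sentence flow, assuming the reply is split on sentence-ending punctuation and handed to TTS one piece at a time; `SpeakSentence` is a placeholder for the Meta Voice SDK call, not its real API.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using UnityEngine;
using UnityEngine.UI;

public class SentenceStreamer : MonoBehaviour
{
    [SerializeField] private Text transcriptBox;   // UI box showing the current sentence

    // Placeholder hook for the Meta Voice SDK TTS invocation used in the project.
    public System.Action<string> SpeakSentence;

    private readonly Queue<string> pending = new Queue<string>();

    // Splits a full GPT reply into sentences (Latin and CJK punctuation)
    // so they can be shown and spoken one by one.
    public void EnqueueReply(string reply)
    {
        foreach (Match m in Regex.Matches(reply, @"[^.!?。！？]+[.!?。！？]?"))
        {
            string sentence = m.Value.Trim();
            if (sentence.Length > 0) pending.Enqueue(sentence);
        }
    }

    // Called whenever the previous sentence has finished playing.
    public void PlayNext()
    {
        if (pending.Count == 0) return;
        string sentence = pending.Dequeue();
        transcriptBox.text = sentence;   // show the sentence in real time
        SpeakSentence?.Invoke(sentence); // send it to TTS
    }
}
```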
Application flow
Step #1
Choose character
Step #2
Choose language
Step #3
Settings
Features showcase
Menu settings
- Users can select different characters and languages, as well as enable/disable transcription
- For transcription, the user can also use the X button on the left controller as a toggle shortcut in the Main scene (see the sketch below)
(video coming soon)
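A minimal sketch of the X-button shortcut, assuming the Oculus Integration’s OVRInput API (Button.Three maps to X on the left Touch controller); `transcriptionUI` is a stand-in for the project’s transcript objects.

```csharp
using UnityEngine;

public class TranscriptionToggle : MonoBehaviour
{
    [SerializeField] private GameObject transcriptionUI; // the transcript boxes in the Main scene

    void Update()
    {
        // OVRInput.Button.Three is the X button on the left Touch controller.
        if (OVRInput.GetDown(OVRInput.Button.Three))
        {
            transcriptionUI.SetActive(!transcriptionUI.activeSelf);
        }
    }
}
```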
Movement in the scene
When the user moves around the scene, the character and the two transcription boxes follow the camera’s position and rotation.
We smoothed the position changes so the user does not feel dizzy, and added a small delay so that if the user quickly turns left or right, the character does not snap around instantly.
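The sketch below shows one way to get this behavior: interpolating toward a camera-relative target a little each frame both smooths the motion and introduces the intended lag. The field names and values are illustrative, not the project’s exact tuning.

```csharp
using UnityEngine;

public class FollowCamera : MonoBehaviour
{
    [SerializeField] private Transform cameraRig;                            // the VR camera
    [SerializeField] private Vector3 offset = new Vector3(0f, 0f, 1.5f);     // distance in front of the user
    [SerializeField] private float followSpeed = 2f;                         // lower = smoother and more delayed

    void LateUpdate()
    {
        // Target position: a fixed offset in front of where the camera is looking.
        Vector3 targetPos = cameraRig.position + cameraRig.rotation * offset;

        // Face the user, keeping the object upright.
        Vector3 toUser = cameraRig.position - transform.position;
        toUser.y = 0f;
        Quaternion targetRot = toUser.sqrMagnitude > 0.001f
            ? Quaternion.LookRotation(toUser)
            : transform.rotation;

        // Moving only part of the way each frame smooths the motion and adds the
        // slight lag that keeps quick head turns from snapping the character around.
        float t = followSpeed * Time.deltaTime;
        transform.position = Vector3.Lerp(transform.position, targetPos, t);
        transform.rotation = Quaternion.Slerp(transform.rotation, targetRot, t);
    }
}
```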
Real-time conversation
The conversation is currently available in English, Chinese, French, and Spanish (see demo video).
In the conversation, we added several features to make the language-learning process more realistic and flexible.
- Speech-to-Text: transcribes the user’s speech to text and sends it to the GPT API (currently GPT-3.5). The user can see the real-time transcription in a UI box.
- Text-to-Speech (TTS): we use the same Meta Voice SDK to convert GPT’s answer into audio.
- Animation sync: the character enters a talking animation with body gestures while the audio is playing, which makes the conversation feel more realistic.
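As a simple illustration of the animation sync, the character’s talking state can be driven directly by whether the TTS AudioSource is playing; the `IsTalking` parameter name is assumed here, not taken from the project.

```csharp
using UnityEngine;

public class TalkAnimationSync : MonoBehaviour
{
    [SerializeField] private Animator animator;       // character animator with an "IsTalking" bool parameter
    [SerializeField] private AudioSource voiceSource; // AudioSource playing the TTS audio

    void Update()
    {
        // Enter/leave the talking + gesture animation based on audio playback state.
        animator.SetBool("IsTalking", voiceSource.isPlaying);
    }
}
```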
Emotion-based animation
We added an emotion analysis feature so that GPT responds with an emotion keyword (“Happy”, “Sad”, or “Calm”). The character then plays an animation matching that emotion: “Laughing” for “Happy”, “Shaking Head” for “Sad”, and “Thinking” for “Calm” (see demo video).
Note: the emotion keyword does not appear in the transcription or in the audio.
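A sketch of how the keyword could be stripped and mapped to animations, assuming GPT is prompted to prefix its reply with a bracketed tag such as `[Happy]`; the tag format and animator trigger names are assumptions for illustration.

```csharp
using UnityEngine;

public static class EmotionParser
{
    // Assumed reply format: an emotion tag such as "[Happy]" at the start of GPT's answer.
    // Returns the text with the tag removed (so it never reaches the transcript or TTS)
    // and fires the matching animation trigger.
    public static string StripAndApply(string reply, Animator animator)
    {
        string emotion = "Calm";
        if (reply.StartsWith("["))
        {
            int end = reply.IndexOf(']');
            if (end > 0)
            {
                emotion = reply.Substring(1, end - 1).Trim();
                reply = reply.Substring(end + 1).TrimStart();
            }
        }

        switch (emotion)
        {
            case "Happy": animator.SetTrigger("Laughing");    break;
            case "Sad":   animator.SetTrigger("ShakingHead"); break;
            default:      animator.SetTrigger("Thinking");    break;
        }
        return reply;
    }
}
```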
Save user history
The app saves the user’s settings cache so that their most recent system settings are restored even after they exit and restart the app.
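A minimal sketch of such a cache using Unity’s PlayerPrefs; the key names and default values here are illustrative.

```csharp
using UnityEngine;

public static class SettingsCache
{
    // Persists the last-used menu settings so they survive quitting and restarting the app.
    public static void Save(int characterIndex, string language, bool transcription)
    {
        PlayerPrefs.SetInt("characterIndex", characterIndex);
        PlayerPrefs.SetString("language", language);
        PlayerPrefs.SetInt("transcription", transcription ? 1 : 0);
        PlayerPrefs.Save();
    }

    // Read back with sensible defaults on first launch.
    public static int CharacterIndex => PlayerPrefs.GetInt("characterIndex", 0);
    public static string Language => PlayerPrefs.GetString("language", "English");
    public static bool Transcription => PlayerPrefs.GetInt("transcription", 1) == 1;
}
```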
Team member contribution
Jamie Dinh
- Interactive menu UI and application flow
- Main scene UI settings and locomotion (joystick steering)
- Character animations
Mengying Li
- Text-to-Speech feature and Speech response sync with character animations
- Sentence parsing of GPT responses
- Character positioning relative to the camera and camera-follow behavior
Daiyi Peng
- Speech-to-Text feature, GPT API request and fine-tuning
- Multi-language model adaptation & multi-language interface build
- Menu functions implementation, e.g., enable/disable transcription, toggle characters, etc.
Future works
- More learning contexts based on users’ prompts
- Integration of more languages
- Virtual assistants changing facial expressions and body language based on simple analysis of real-time conversations
- UI improvements: dynamic dialog size based on response length, a menu pop-up in the main scene, and colliders for objects in the scene
- Reduce locomotion sickness