SASLM: Bridging What the Model Thinks and How It Speaks

Abstract

Speech Language Models (SLMs) exhibit strong semantic understanding, yet often fail to faithfully realize expressive intent in speech, producing prosody-flattened and emotionally inconsistent responses. We identify this mismatch as the semantic understanding–acoustic realization gap.

To better leverage semantic understanding for expressive acoustic realization, we propose SASLM (Self-Aware Speech Language Model), a proxy-free framework that bridges what the model thinks and how it speaks through self-aware intent and realization alignment. It has two components: (1) Intent-Aware Bridging self-distills expressive intent from the evolving semantic generation states via a Variational Information Bottleneck (VIB), guiding expressive realization without external expressive supervision; (2) Realization-Aware Alignment reflectively aligns generated acoustics with the intended expression through self-reward optimization, progressively improving intent–realization consistency during speech generation.

Despite using only 3B parameters and 800 hours of expressive speech data, SASLM achieves state-of-the-art performance on EchoMind among open-source systems, surpassing models over 10× larger and approaching commercial systems.
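To make the Intent-Aware Bridging idea more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a VIB head that maps one semantic hidden state to a Gaussian latent, and an adaptive layer normalization (AdaLN) step that lets this latent scale and shift a speech feature. All names (`vib_encode`, `adaln`, `random_matrix`), the toy dimensions, and the random weights are hypothetical stand-ins for trained components.

```python
import math
import random


def linear(x, weight, bias):
    """Compute W x + b for one vector x; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def vib_encode(hidden, w_mu, b_mu, w_logvar, b_logvar, rng):
    """Map a hidden state to a sampled latent z and its KL bottleneck penalty."""
    mu = linear(hidden, w_mu, b_mu)
    logvar = linear(hidden, w_logvar, b_logvar)
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    # KL(N(mu, sigma^2) || N(0, I)), the term that compresses the latent.
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))
    return z, kl


def adaln(x, z, w_scale, b_scale, w_shift, b_shift, eps=1e-5):
    """Layer-normalize x, then apply a scale and shift predicted from z."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    scale = linear(z, w_scale, b_scale)
    shift = linear(z, w_shift, b_shift)
    return [n * (1.0 + s) + t for n, s, t in zip(normed, scale, shift)]


def random_matrix(rows, cols, rng, std=0.1):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
HIDDEN_DIM, LATENT_DIM, SPEECH_DIM = 8, 4, 6  # toy sizes, not the paper's

hidden_state = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)]
speech_feature = [rng.gauss(0.0, 1.0) for _ in range(SPEECH_DIM)]

intent, kl_penalty = vib_encode(
    hidden_state,
    random_matrix(LATENT_DIM, HIDDEN_DIM, rng), [0.0] * LATENT_DIM,
    random_matrix(LATENT_DIM, HIDDEN_DIM, rng), [0.0] * LATENT_DIM,
    rng,
)
conditioned = adaln(
    speech_feature, intent,
    random_matrix(SPEECH_DIM, LATENT_DIM, rng), [0.0] * SPEECH_DIM,
    random_matrix(SPEECH_DIM, LATENT_DIM, rng), [0.0] * SPEECH_DIM,
)
print(f"KL penalty: {kl_penalty:.4f}, conditioned dims: {len(conditioned)}")
```

In this sketch the KL term is what makes the latent a bottleneck: it is zero only when the latent carries no information about the hidden state, so a training objective that penalizes it pushes the latent to keep only what helps the downstream speech loss. How SASLM weights this term and where exactly the conditioning is applied are not specified here.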

Key Contributions

🔍 Identifying the Semantic Understanding–Acoustic Realization Gap
Semantic hidden states optimized for text prediction are not inherently aligned with expressive acoustic realization, so responses come out prosody-flattened even when semantic understanding is strong. We reformulate this as a self-aware intent and realization alignment problem.
🧠 Intent-Aware Bridging via VIB
A VIB encoder self-distills latent expressive intent from LLM hidden states, disentangling expressive cues from lexical content and injecting them into speech conditioning via AdaLN (adaptive layer normalization), without any external emotion labels or style annotations.
🔄 Realization-Aware Alignment via Closed-Loop Self-Reward
SASLM acts as its own critic: it generates multiple rollouts, scores them on emotion, prosody, and naturalness via rubric-based rewards, and optimizes via UAPO, closing the loop between intended expression and generated acoustics without human annotations.
🏆 State-of-the-Art among Open-Source SLMs
Expressive quality is evaluated by whether emotion, prosody, and naturalness are contextually appropriate. With only 3B parameters and 800 hours of data, SASLM-3B surpasses Qwen3-Omni-30B (10× larger) on both objective metrics (F0-Var: 63.44 vs. 49.76; EmoAlign: 35.61% vs. 25.03%) and subjective scores (Emo. / Pro. / Nat. / Ovr.: 4.10 / 4.36 / 4.49 / 4.33 vs. 3.87 / 4.11 / 4.30 / 4.25), while approaching commercial systems.
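The closed-loop self-reward contribution above can be pictured with a small sketch. It assumes that each rollout receives a self-assigned rubric score per dimension (emotion, prosody, naturalness), that the scores are averaged into one reward, and that rewards are standardized within the rollout group. The group standardization is a generic GRPO-style stand-in chosen for illustration; it is not UAPO, whose details are not given here. The function names and the scores are hypothetical and are not results from the paper.

```python
import statistics

RUBRIC_DIMENSIONS = ("emotion", "prosody", "naturalness")


def rubric_reward(scores, weights=None):
    """Combine per-dimension rubric scores into one reward (weighted mean)."""
    weights = weights or {dim: 1.0 for dim in RUBRIC_DIMENSIONS}
    total = sum(weights[dim] for dim in RUBRIC_DIMENSIONS)
    return sum(weights[dim] * scores[dim] for dim in RUBRIC_DIMENSIONS) / total


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one rollout group (stand-in, not UAPO)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical self-assigned scores (1 to 5) for four rollouts of one prompt.
rollout_scores = [
    {"emotion": 4.0, "prosody": 4.5, "naturalness": 4.0},
    {"emotion": 2.0, "prosody": 3.0, "naturalness": 3.5},
    {"emotion": 3.5, "prosody": 3.0, "naturalness": 4.0},
    {"emotion": 4.5, "prosody": 4.0, "naturalness": 4.5},
]

rewards = [rubric_reward(scores) for scores in rollout_scores]
advantages = group_relative_advantages(rewards)
best = max(range(len(rewards)), key=rewards.__getitem__)

for i, (r, a) in enumerate(zip(rewards, advantages)):
    print(f"rollout {i}: reward={r:.3f} advantage={a:+.3f}")
print(f"best rollout: {best}")
```

Rollouts with positive advantage would be reinforced and those with negative advantage discouraged, which is the sense in which the model's own critique closes the loop between intended and realized expression without human annotations.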

Demo Contents

Emotional Generation Ability

Evaluation on EchoMind Benchmark: Results from our paper on the EchoMind benchmark. One sample is randomly selected per emotion category for demonstration. CosyVoice2 (TTS w/ Oracle Prompt) synthesizes speech from the reference reply conditioned on ground-truth emotion labels, serving as an oracle intent-realization baseline. SASLM (Ours) is highlighted.

Emotion	Question Speech / Text	SASLM (Ours)	GPT-4o-Audio	Qwen3-Omni-30B	CosyVoice2 (TTS w/ Oracle Prompt)
Angry	Did you see how the referee handled that match?	Well, I kind of mumbled while watching it. But you know, the way the referee was making decisions, it seemed a bit inconsistent to me. Sometimes he thought a goal was scored when it wasn't, and other times he let a foul pass. It really threw things off the balance for both teams. What do you think about how the referee's officiating affects the game?	Oh, absolutely—I can't believe it either! Those decisions were outrageous; the referee really ruined the flow of the game! I'd be furious, too—how did you feel watching that unfold?	Yeah, I saw it! The calls were all over the place. Felt like the ref was making it up as he went along. What a mess, right?	Oh, absolutely—I can't believe it either! Those decisions were outrageous; the referee really ruined the flow of the game! I'd be furious, too—how did you feel watching that unfold?
Fearful	Do we really need everyone to participate in the scary storytelling game?	Eh, not really. I'm kinda creeped out to be honest. You know, I don't really want to get involved in that whole creepy story thing. But, if you guys need me, I'll stay on the sidelines. What do you all think about it?	Oh, I completely understand if someone feels a bit scared—I mean, those stories can get really creepy! Maybe we could offer the option to observe instead, so everyone feels safe and comfortable? Would that help ease your worry a bit?	Not everyone has to join in. Some people might feel uncomfortable with scary stories. You can still enjoy the game with just a few participants.	Oh, I completely understand if someone feels a bit scared—I mean, those stories can get really creepy! Maybe we could offer the option to observe instead, so everyone feels safe and comfortable? Would that help ease your worry a bit?
Sad	This was my partner's favorite album, it brings back so many memories.	I'm so sorry to hear that. It sounds like this album means a lot to you and holds so many special memories of your partner. You can tell me more if you'd like. I'm here for you.	I'm so sorry for your loss; it's understandable why this album might stir up deep memories and feelings. Music can hold powerful connections to loved ones. Would you feel comfortable sharing some of your favorite moments together? I'm here for you.	I'm so sorry for your loss. It sounds like this album held a special place in your partner's heart. What kind of memories does it bring up for you?	I'm so sorry for your loss; it's understandable why this album might stir up deep memories and feelings. Music can hold powerful connections to loved ones. Would you feel comfortable sharing some of your favorite moments together? I'm here for you.
Disgusted	So this is the new software update everyone has been raving about?	Yeah, honestly, it's pretty disappointing. People talked it up so much, but it really doesn't seem that good.	Ugh, I know, right? All that hype and this is what they deliver—I expected better. Feels like a downgrade rather than an upgrade!	Yeah, I hear you — that update’s been a mess for a lot of people. What’s not working for you?	Ugh, I know, right? All that hype and this is what they deliver—I expected better. Feels like a downgrade rather than an upgrade!
Surprised	Can you believe they chose him for the main character?	Yeah, it's kind of amazing. He wasn't the first person I expected for the main character.	I know—Seems like an unexpected choice, doesn't it? Curious to find out how he handles the role.	Yeah, it's surprising. What do you think about his performance so far?	I know—Seems like an unexpected choice, doesn't it? Curious to find out how he handles the role.
Neutral	I'd like some information about your evening classes please.	Well, I'm not sure if I have any evening classes. Hmm. I'm not sure if I'm even in charge of them. But you could check the school's website, usually they have all the details there. Or you could ask the teacher in charge of the classes. Why are you interested in my evening classes?	Certainly! I'd be happy to help you, honored senior. We offer a variety of evening classes suitable for different interests, including art, languages, computer basics, gentle fitness, and more. Could you please tell me if there's a particular subject you're interested in or if you'd like information about schedules and enrollment options?	Sure! We have art, languages, computer basics, and gentle fitness classes. Do you have a specific interest? I can share schedules and enrollment details.	Certainly! I'd be happy to help you, honored senior. We offer a variety of evening classes suitable for different interests, including art, languages, computer basics, gentle fitness, and more. Could you please tell me if there's a particular subject you're interested in or if you'd like information about schedules and enrollment options?
Happy	Guess what, the coach told me I'm starting in the soccer match this afternoon!	Oh, that's great news! You must be really excited. Are you the only one starting? Or are there others too?	That's awesome! You must've been practicing really hard—I'm sure you'll do great in the game. Good luck, and enjoy every minute out there!	That's amazing! You must be so excited. I'm sure you'll do great out there. Enjoy every minute of it!	That's awesome! You must've been practicing really hard—I'm sure you'll do great in the game. Good luck, and enjoy every minute out there!

Open-Domain Conversational Samples

Open-Domain Generalization & Practicality: This section showcases SASLM's expressive generalization across diverse daily conversational topics, including samples with complex speech instructions. SASLM was not specifically trained on these scenarios, yet it generalizes zero-shot: driven by its self-aware intent and realization alignment, it produces vivid and contextually appropriate speech. This highlights its potential for real-world applications such as voice assistants, spoken dialogue systems, and expressive speech dataset construction. SASLM (Ours) is highlighted.

Scenario	User Question	SASLM (Ours)	Qwen3-Omni-30B
🙏 Sincere Apology	You made a serious mistake at work that affected your whole team.	I'm really sorry, everyone. I know I made a huge mistake at work that really hurt the whole team. I should have been more careful and more responsible. I'm so sorry for the confusion and the inconvenience it caused. I'll do my best to make it up to you all. I'll work extra hard to make sure it doesn't happen again. Please give me a chance to make it right.	Go on a start by saying, "How deeply sorry I am for the mistake I made it work." It's not just a small oversight. It's something that impacted all of you, and I take full responsibility. I know that the team's efforts were affected, and that's the last thing anyone should have to deal with because of my error. I didn't mean for this to happen, and I'm truly embarrassed and disappointed in myself for letting everyone down. You've all shown up with dedication and hard work, and I fail to meet that standard. I'm not just offering empty words, I'm committed to making this right, I'll do whatever it takes to fix the situation. Whether that means stepping up to handle extra tasks, staying late to catch up, or helping in any way I can to restore what was lost. I also want to make sure this doesn't happen again. I'm reviewing my process, asking for feedback, and I'm open to any suggestions you might have to help me improve. Thank you for your patience, your understanding, and for carrying the weight of this was such professionalism. I'm grateful to work with all of you and I promise to earn back your trust through my actions, not just my words.
😠 Expressing Anger	Say something in anger.	Well, you know, I'm so angry right now. I can't believe you did this to me. I thought we were friends, but you betrayed me. I'm so disappointed in you. You should have known better. I'm going to tell everyone about this. You're a total loser. I'm so angry.	I was walking home one evening, tired from work, when I passed a small park where a group of kids were playing. One of them, maybe six or seven, was running around with a bright red ball, laughing, totally unaware of the world. I noticed a man sitting on a bench nearby, watching them with a smile. Then, out of nowhere, the boy tripped, the ball flew out of his hand, and he started to cry. The man stood up, walked over, and instead of helping, he just laughed out loud. He said something like “Oh, that’s cute,” and kept walking away. I felt this tightness in my chest, this rage that didn’t make sense but wouldn’t let go. Why couldn’t someone just be kind? Why does the world let people like that get away with it? I didn’t say anything. I just kept walking, fists clenched, heart pounding, wondering if I’d ever be the kind of person who steps in when it’s easy to look away.
🗓️ Weekend Planning	It's Friday afternoon and give me a plan for things we could do this weekend.	Well, you could go for a hike. There are some really nice trails in the nearby mountains. You know, it's a great way to get some fresh air and exercise. Another option is to visit a local park. There are usually lots of benches to sit on and enjoy the scenery. Oh, and if you like to be a bit more adventurous, you could go to a nearby beach. There are usually some good places to swim and sunbathe. So, what do you think? Any of these sound interesting?	Ah, Friday afternoon, the perfect time to start dreaming about the weekend. You’ve got two whole days ahead, so why not mix relaxation with a little adventure? First, maybe treat yourself to a slow breakfast or brunch on Saturday morning—something cozy, maybe with a book or a podcast in the background. Then, if the weather’s nice, take a walk through a park or head out to a nearby trail. Fresh air and a little movement can do wonders. After that, consider catching a movie or a show you’ve been meaning to watch, maybe with some snacks and a blanket. Sunday could be more low-key—perhaps a coffee date with a friend, or even just a lazy morning with no schedule. If you feel like getting creative, try baking something new or sketching or writing. The key is to let yourself enjoy the time without pressure. What’s your mood—restful, adventurous, or a little of both?
😢 Telling a Sad Story	Tell me a story that makes you feel sad.	Well, you know, seeing a bunch of empty chairs at a friend's birthday party. It's like they're missing out on all the fun. It just makes you think about how much you miss them. So, what about you?	She waited every day at the train station, holding a faded photo, hoping he’d return. He never did. The years passed, the station changed, the benches wore thin. She kept coming anyway, not because she believed, but because letting go felt like losing him all over again. One winter, she sat alone, snow falling slow, and whispered his name into the wind. No one answered. No one ever would.
💪 Encouragement After Failure	I failed an important exam and I'm feeling hopeless about my future.	Oh no, that must be really tough for you. But you know, failing an exam doesn't mean your future is over. You can still work hard and improve. Maybe you could talk to your teacher or a counselor about how to study better. And don't be too hard on yourself. Everyone makes mistakes. So, what do you think you should do next?	It’s okay to feel this way right now. One exam doesn’t define your future. Many people have bounced back from setbacks like this. Take a breath, be kind to yourself, and think about what you can do next — small steps matter. You’re not alone in this.
💔 Comforting a Friend	My best friend just went through a really painful breakup and is devastated.	Oh, that's really tough for your friend. You could tell them you're there for them, like, "Hey, I know this is really hard for you. I'm here if you want to talk about it." You could also offer to listen to them vent, but also remind them that it'll get better. Maybe you could suggest they do something fun together, like go out for a meal or watch a movie.	I’m so sorry they’re going through this. Breakups hurt deeply, and it’s okay for them to feel lost right now. Just being there—listening, sitting with them, not rushing to fix it—means more than you know. They’re not alone.