
What kind of server #49

Open

towfiqi opened this issue May 9, 2019 · 8 comments

Comments

@towfiqi

towfiqi commented May 9, 2019

Hi,

What kind of server configuration would it need to handle 10 concurrent speech recognition decodes? How many cores and how much RAM? Not for training, only for decoding.

Thanks

@realill
Contributor

realill commented May 9, 2019

A few years ago, when I tested this on a typical EC2 instance, one core could handle one concurrent speech recognition stream in real time. So 10 cores for 10 concurrent streams could be your target.

But this is model-dependent: there could be slower or faster models.
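To make the arithmetic behind this estimate explicit, here is a back-of-envelope sizing sketch; the real-time factor of ~1.0 per core and the 20% headroom are assumptions, and both vary by model and hardware:

```python
# Back-of-envelope sizing from the estimate above. Assumes a real-time
# factor (RTF) of ~1.0 per core, i.e. one core decodes one stream in
# real time; actual RTF is model-dependent.
import math

def cores_needed(streams: int, rtf_per_core: float = 1.0,
                 headroom: float = 0.2) -> int:
    """Cores required to keep `streams` concurrent decodes real-time."""
    return math.ceil(streams * rtf_per_core * (1.0 + headroom))

print(cores_needed(10))                    # 12 cores with 20% headroom
print(cores_needed(10, rtf_per_core=0.5))  # a 2x-faster model needs only 6
```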

@towfiqi
Author

towfiqi commented May 9, 2019

Thanks a lot. Can an NVIDIA GPU with CUDA cores handle more concurrent processes?

Interesting update

@realill
Contributor

realill commented May 9, 2019

As far as I know, the GPU is used exclusively for training.

@towfiqi
Author

towfiqi commented May 9, 2019

That was my understanding too. But I am confused about how they used the GPU for inference and achieved a 9.2x performance result:

Performance

"One experiment with clean data achieved speech-to-text inferencing 3,524x faster than real-time processing using an NVIDIA Tesla V100."

@realill
Contributor

realill commented May 9, 2019

You should address this question to the Kaldi project developers. This server was developed a while ago and is not really in sync with the latest Kaldi developments.

@towfiqi
Author

towfiqi commented May 10, 2019

Thank you. According to the post mentioned below, one CPU (worker) can usually serve only one connection at a time. But if the server shares the same decoding graph across connections, it can serve 10 concurrent connections per CPU.

Read the last comment of this thread

Do you have any idea whether this asr-server shares the decoding graph, or whether it would be possible to implement that in it?

Thanks
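For reference, this is a minimal sketch of the shared-decoding-graph pattern the linked thread describes: the large, read-only graph is loaded once per process, and each connection keeps only its own lightweight decoder state. The `Graph` and `Decoder` classes are hypothetical stand-ins, not this server's (or Kaldi's) actual API:

```python
# Sketch of the "shared decoding graph" pattern from the linked thread:
# the big, read-only graph is loaded once per process, while each
# connection gets only its own lightweight decoder state. Graph and
# Decoder are hypothetical stand-ins, not this server's (or Kaldi's) API.
import threading

class Graph:
    """Stand-in for a large, immutable decoding graph (e.g. HCLG.fst)."""
    def __init__(self, path: str):
        self.path = path  # a real server would load/mmap the graph here

class Decoder:
    """Stand-in for per-connection decoder state; cheap to create."""
    def __init__(self, graph: Graph):
        self.graph = graph
        self.n_chunks = 0
    def accept_waveform(self, chunk: bytes) -> None:
        self.n_chunks += 1  # a real decoder would advance decoding here

SHARED_GRAPH = Graph("HCLG.fst")  # loaded once, shared read-only

def handle_connection(audio_chunks) -> None:
    decoder = Decoder(SHARED_GRAPH)  # per-connection state only
    for chunk in audio_chunks:
        decoder.accept_waveform(chunk)
    print(f"decoded {decoder.n_chunks} chunks against {decoder.graph.path}")

# Ten concurrent connections share one graph in memory.
threads = [threading.Thread(target=handle_connection, args=([b"a", b"b"],))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The payoff is memory rather than CPU: ten connections reference one copy of the graph instead of loading ten.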

@realill
Contributor

realill commented May 10, 2019

The ASR server keeps one model in memory and serves it. It works as an HTTP-based wrapper around the Kaldi decoder. You can use it as a codebase for your solution, but you will likely want to modify it to work with more modern versions of the Kaldi decoders.
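As an illustration of that architecture, here is a minimal sketch of an HTTP wrapper that loads one model at startup and serves it to all requests. It uses only the Python standard library; the `decode` function is a hypothetical placeholder, not this server's actual code or a real Kaldi call:

```python
# Minimal sketch of the architecture described above: load the model once
# at startup, then serve it behind HTTP. Standard library only; `decode`
# is a hypothetical placeholder, not a real Kaldi API.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MODEL = {"name": "demo-model"}  # loaded once and kept in memory

def decode(model: dict, audio: bytes) -> str:
    # Placeholder: a real server would run the Kaldi decoder here.
    return f"decoded {len(audio)} bytes with {model['name']}"

class ASRHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(decode(MODEL, audio).encode())

if __name__ == "__main__":
    # Threaded server: concurrent requests all share the one in-memory model.
    ThreadingHTTPServer(("", 8080), ASRHandler).serve_forever()
```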

@towfiqi
Author

towfiqi commented May 11, 2019

Thank you for all your help :)
