
Patents

 
Method and an apparatus for processing an audio signal
Tue, 24 Mar 2015 08:00:00 EDT
An apparatus for processing an audio signal, and a method thereof, are disclosed. The present invention includes receiving, by an audio processing apparatus, an audio signal including first data of a first block encoded with a rectangular coding scheme and second data of a second block encoded with a non-rectangular coding scheme; receiving a compensation signal corresponding to the second block; estimating a prediction of an aliasing part using the first data; and obtaining a reconstructed signal for the second block based on the second data, the compensation signal, and the prediction of the aliasing part.
Coding and decoding a transient frame
Tue, 24 Mar 2015 08:00:00 EDT
An electronic device for coding a transient frame is described. The electronic device includes a processor and executable instructions stored in memory that is in electronic communication with the processor. The electronic device obtains a current transient frame and a residual signal based on that frame. It then determines a set of peak locations based on the residual signal and, based on at least the set of peak locations, determines whether to use a first coding mode or a second coding mode for coding the current transient frame. The electronic device synthesizes an excitation based on whichever coding mode is determined.
Visualizing, navigating and interacting with audio content
Tue, 24 Mar 2015 08:00:00 EDT
Methods and arrangements for visually representing audio content in a voice application. A display is connected to a voice application, and an image is displayed on the display, the image comprising a main portion and at least one subsidiary portion, the main portion representing a contextual entity of the audio content and the at least one subsidiary portion representing at least one participatory entity of the audio content. The at least one subsidiary portion is displayed without text, and the image is changed responsive to changes in audio content in the voice application.
Voice recognition device
Tue, 24 Mar 2015 08:00:00 EDT
A voice recognition device includes a voice recognition dictionary in which a word which is recognized as a result of voice recognition on an inputted voice is registered, a reply voice data storage unit for storing recorded voice data about words registered in the voice recognition dictionary, a dialog control unit for, when a word registered in the voice recognition dictionary is recognized, acquiring recorded voice data corresponding to the word from the reply voice data storage unit, a reproduction noise reduction unit for carrying out a process of reducing noise included in the recorded voice data, an amplitude adjusting unit for adjusting an amplitude of the recorded voice data in which the noise has been reduced to a predetermined amplitude level, and a voice reproduction unit for reproducing a voice from the amplitude-adjusted recorded voice data.
Parsimonious protection of sensitive data in enterprise dialog systems
Tue, 24 Mar 2015 08:00:00 EDT
In one embodiment, a method comprises classifying a representation of audio data of a dialog turn in a dialog system to a classification. The method may further comprise taking a security action on the classified representation of the audio data of the dialog turn as a function of the classification. The security action can be suppressing the representation of the audio data, encrypting the representation of the audio data, releasing the representation of the audio data, partially suppressing the representation of the audio data, partially encrypting the representation of the audio data, partially releasing the representation of the audio data, or a command.
Script compliance using speech recognition
Tue, 24 Mar 2015 08:00:00 EDT
A system and method for evaluating the compliance of an agent reading a script to a client comprise conducting a voice interaction between the agent and the client, wherein the agent follows a script, and dividing data representing a portion of the voice interaction into a plurality of panels after it is spoken by the agent, wherein the panels correspond to respective sections of the script, the dividing is based upon timestamps of the panels, and the panels correspond to a single offer of a good or service.
Text to speech synthesis for texts with foreign language inclusions
Tue, 24 Mar 2015 08:00:00 EDT
A speech output is generated from a text input written in a first language and containing inclusions in a second language. Words in the first (native) language are pronounced with a native pronunciation, and words in the second (foreign) language are pronounced with a proficient foreign pronunciation. Language-dependent phoneme symbols generated for words of the second language are replaced with language-dependent phoneme symbols of the first language, where said replacing includes the steps of: assigning to each language-dependent phoneme symbol of the second language a language-independent target phoneme symbol; mapping each language-independent target phoneme symbol to a language-independent substitute phoneme symbol assignable to a language-dependent substitute phoneme symbol of the first language; and substituting the language-dependent phoneme symbols of the second language with the language-dependent substitute phoneme symbols of the first language.
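The two-step substitution described above reduces to a pair of lookup tables: second-language phoneme to language-independent target, then target to first-language substitute. The phoneme names and mappings in this sketch are invented for illustration and are not the patent's actual inventory:

```python
# Illustrative tables only: e.g. a French uvular "R" maps to a
# language-independent target "r", which maps to an English alveolar "r".
L2_TO_INDEPENDENT = {"R_fr": "r", "y_fr": "u"}
INDEPENDENT_TO_L1 = {"r": "r_en", "u": "uw_en"}

def substitute_phonemes(l2_phonemes):
    """Replace second-language phonemes with first-language substitutes
    via language-independent target symbols."""
    return [INDEPENDENT_TO_L1[L2_TO_INDEPENDENT[p]] for p in l2_phonemes]
```

Routing every substitution through a language-independent symbol set means each new language only needs mappings to and from that set, not to every other language pair.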
Tool and framework for creating consistent normalization maps and grammars
Tue, 24 Mar 2015 08:00:00 EDT
A runtime framework and authoring tool are provided for enabling linguistic experts to author text normalization maps and grammar libraries without requiring a high level of technical or programming skill. Authors define or select terminals, map the terminals, and define rules for the mapping. The tool enables authors to validate their work by executing the map in the same way the recognition engine does, ensuring consistent results from authoring to user operations. The runtime is used by the speech engines and by the tools to provide consistent normalization for supported scenarios.
Providing text to speech from digital content on an electronic device
Tue, 24 Mar 2015 08:00:00 EDT
A method for providing text to speech from digital content in an electronic device is described. Digital content including a plurality of words, together with a pronunciation database, is received. Pronunciation instructions are determined for each word using the digital content, and audio or speech is played for each word using the pronunciation instructions. As a result, the method provides text to speech on the electronic device based on the digital content.
Recognition confidence measuring by lexical distance between candidates
Tue, 24 Mar 2015 08:00:00 EDT
A recognition confidence measurement method, medium, and system are provided which can more accurately determine whether an input speech signal is an in-vocabulary word, by extracting an optimum number of candidates that match a phoneme string extracted from the input speech signal and estimating a lexical distance between the extracted candidates. A recognition confidence measurement method includes: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string against phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is an in-vocabulary word, based on the lexical distance.
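The abstract does not specify the lexical distance measure; a common choice for comparing phoneme strings is Levenshtein (edit) distance, sketched here over phoneme sequences. The intuition is that lexically close candidates indicate a confusable, low-confidence recognition:

```python
def lexical_distance(a, b):
    """Levenshtein distance between two phoneme sequences, computed
    row by row with O(len(b)) memory."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

# Two candidates differing in a single vowel phoneme: distance 1,
# suggesting the hypotheses are confusable.
d = lexical_distance(["k", "ae", "t"], ["k", "aa", "t"])
```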
System and method for handling repeat queries due to wrong ASR output by modifying an acoustic, a language and a semantic model
Tue, 24 Mar 2015 08:00:00 EDT
Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models.
Method of active learning for automatic speech recognition
Tue, 24 Mar 2015 08:00:00 EDT
State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor-intensive and time-consuming. The present invention is an iterative method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples and then selecting the most informative ones with respect to a given cost function for a human to label. The method comprises automatically estimating a confidence score for each word of the utterance and exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data. An utterance confidence score is computed based on these word confidence scores; then the utterances are selectively sampled to be transcribed using the utterance confidence scores.
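A minimal sketch of the selective-sampling step described above, assuming the utterance confidence is simply the mean of the per-word confidence scores. The actual cost function and the lattice-based word scores from the recognizer are not reproduced here; all names are illustrative:

```python
def utterance_confidence(word_scores):
    """Utterance-level confidence as the mean of word confidences
    (one simple aggregation; the method's cost function may differ)."""
    return sum(word_scores) / len(word_scores)

def select_for_transcription(utterances, threshold=0.5):
    """Pick the least-confident utterances for a human to transcribe."""
    return [text for text, scores in utterances
            if utterance_confidence(scores) < threshold]

batch = [("call home", [0.9, 0.95]),
         ("recognize speech", [0.3, 0.4])]
picked = select_for_transcription(batch)
```

Only the low-confidence utterance is selected, which is the point of active learning: transcription effort is spent where the recognizer is least certain.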
System and method for generating personal vocabulary from network data
Tue, 24 Mar 2015 08:00:00 EDT
A method is provided in one example and includes receiving data propagating in a network environment, and identifying selected words within the data based on a whitelist. The whitelist includes a plurality of designated words to be tagged. The method further includes assigning a weight to the selected words based on at least one characteristic associated with the data, and associating the selected words to an individual. A resultant composite is generated for the selected words that are tagged. In more specific embodiments, the resultant composite is partitioned amongst a plurality of individuals associated with the data propagating in the network environment. A social graph can be generated that identifies a relationship between a selected individual and the plurality of individuals based on a plurality of words exchanged between the selected individual and the plurality of individuals.
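The tagging-and-weighting step might look like the following sketch, where the whitelist, the source characteristics, and the weights are all illustrative assumptions rather than the patent's actual scheme:

```python
def tag_and_weight(words, whitelist, source_weights):
    """Tag whitelisted words and accumulate a weighted composite per word,
    weighting each occurrence by a characteristic of the data it came from."""
    composite = {}
    for word, source in words:
        if word in whitelist:
            composite[word] = composite.get(word, 0.0) + source_weights.get(source, 1.0)
    return composite

# Words observed for one individual, each paired with its data source.
observed = [("python", "email"), ("golf", "chat"), ("python", "chat")]
profile = tag_and_weight(observed, {"python"}, {"email": 2.0, "chat": 1.0})
```

Here "golf" is dropped (not whitelisted), while "python" accumulates weight from both sources into the individual's vocabulary composite.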
Non-scorable response filters for speech scoring systems
Tue, 24 Mar 2015 08:00:00 EDT
A method for scoring non-native speech includes receiving a speech sample spoken by a non-native speaker and performing automatic speech recognition and metric extraction on the speech sample to generate a transcript of the speech sample and a speech metric associated with the speech sample. The method further includes determining whether the speech sample is scorable or non-scorable based upon the transcript and speech metric, where the determination is based on an audio quality of the speech sample, an amount of speech of the speech sample, a degree to which the speech sample is off-topic, whether the speech sample includes speech from an incorrect language, or whether the speech sample includes plagiarized material. When the sample is determined to be non-scorable, an indication of non-scorability is associated with the speech sample. When the sample is determined to be scorable, the sample is provided to a scoring model for scoring.
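The scorability decision can be pictured as a chain of per-criterion checks mirroring the list in the abstract. The thresholds and parameter names below are illustrative, not the system's calibrated values:

```python
def is_scorable(audio_quality, speech_seconds, off_topic_score,
                wrong_language, plagiarized):
    """Return True only if the sample passes every filter
    (illustrative thresholds)."""
    if audio_quality < 0.5:      # audio too degraded to score
        return False
    if speech_seconds < 5.0:     # not enough speech in the sample
        return False
    if off_topic_score > 0.8:    # response does not address the prompt
        return False
    if wrong_language or plagiarized:
        return False
    return True
```

A sample failing any single filter is flagged non-scorable; otherwise it proceeds to the scoring model.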
Method of analysing an audio signal
Tue, 24 Mar 2015 08:00:00 EDT
A method of analyzing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analyzing the audio signal, based on the determined property of the first output function.
Techniques to normalize names efficiently for name-based speech recognition grammars
Tue, 24 Mar 2015 08:00:00 EDT
Techniques to normalize names for name-based speech recognition grammars are described. Some embodiments are particularly directed to techniques to normalize names for name-based speech recognition grammars more efficiently by caching, and on a per-culture basis. A technique may comprise receiving a name for normalization, during name processing for a name-based speech grammar generating process. A normalization cache may be examined to determine if the name is already in the cache in a normalized form. When the name is not already in the cache, the name may be normalized and added to the cache. When the name is in the cache, the normalization result may be retrieved and passed to the next processing step. Other embodiments are described and claimed.
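Caching on a per-culture basis reduces to memoizing the normalizer keyed by (culture, name): normalize once, reuse thereafter. The placeholder normalization rule here is an assumption; a real normalizer applies culture-specific rules:

```python
class NameNormalizer:
    def __init__(self):
        self._cache = {}   # (culture, name) -> normalized form
        self.misses = 0    # uncached normalizations actually performed

    def normalize(self, name, culture):
        key = (culture, name)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._normalize_uncached(name, culture)
        return self._cache[key]

    def _normalize_uncached(self, name, culture):
        # Placeholder rule: lowercase and keep only letters, digits, spaces.
        return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ")

normalizer = NameNormalizer()
first = normalizer.normalize("O'Brien", "en-US")
second = normalizer.normalize("O'Brien", "en-US")  # served from the cache
```

The second lookup performs no normalization work, which is the efficiency claim: across a large directory, repeated names cost one normalization each per culture.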
Automatic calibration of command-detection thresholds
Tue, 24 Mar 2015 08:00:00 EDT
When a voice-activated device or application is first started, the signal levels corresponding to spoken commands are initially unknown, making it difficult to set detection thresholds. The inventive method provides an initial command-detection threshold based on the noise level alone. The first command is detected using this initial threshold. Then the threshold is revised according to the first command sound, and a second command is detected using the revised threshold. After detecting each command, the detection threshold is further refined according to the current noise and command sounds. Methods are also disclosed for optimizing the thresholds, adjusting parameters according to sound, and detecting voiced and unvoiced sounds separately. These capabilities enable many emerging voice-activated products and applications.
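The calibration loop above (a noise-only initial threshold, then refinement after each detected command) can be sketched as follows. The margin, the smoothing factor, and the midpoint rule are assumptions for illustration, not the patented formulas:

```python
class ThresholdCalibrator:
    def __init__(self, noise_level, margin=2.0):
        self.noise = noise_level
        self.command = None                    # no command heard yet
        self.threshold = noise_level * margin  # initial: noise alone

    def on_command(self, command_level, alpha=0.5):
        """Refine the threshold after each detected command: track a
        smoothed command level and place the threshold midway between
        the noise level and the command level."""
        if self.command is None:
            self.command = command_level
        else:
            self.command = alpha * command_level + (1 - alpha) * self.command
        self.threshold = (self.noise + self.command) / 2.0

cal = ThresholdCalibrator(noise_level=0.1)
initial = cal.threshold   # set from noise alone, before any command
cal.on_command(1.0)       # first command revises the threshold
```

Each subsequent command nudges the smoothed command level, so the threshold keeps adapting to the current noise and command sounds.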
Information presentation device associated with sound source separation
Tue, 24 Mar 2015 08:00:00 EDT
An information presentation device includes an audio signal input unit configured to input an audio signal, an image signal input unit configured to input an image signal, an image display unit configured to display an image indicated by the image signal, a sound source localization unit configured to estimate direction information for each sound source based on the audio signal, a sound source separation unit configured to separate the audio signal into sound-source-classified audio signals for each sound source, an operation input unit configured to receive an operation input and generate coordinate designation information indicating a part of a region of the image, and a sound source selection unit configured to select a sound-source-classified audio signal of a sound source associated with a coordinate which is included in a region indicated by the coordinate designation information, and which corresponds to the direction information.
Method and system for sharing portable voice profiles
Tue, 24 Mar 2015 08:00:00 EDT
An embodiment of the present invention provides a speech recognition engine that utilizes portable voice profiles for converting recorded speech to text. Each portable voice profile includes speaker-dependent data, and is configured to be accessible to a plurality of speech recognition engines through a common interface. A voice profile manager receives the portable voice profiles from other users who have agreed to share their voice profiles. The speech recognition engine includes speaker identification logic to dynamically select a particular portable voice profile, in real-time, from a group of portable voice profiles. The speaker-dependent data included with the portable voice profile enhances the accuracy with which speech recognition engines recognize spoken words in recorded speech from a speaker associated with a portable voice profile.
Front-end difference coding for distributed speech recognition
Tue, 24 Mar 2015 08:00:00 EDT
In automated speech recognition (ASR), multiple devices may be employed to perform the ASR in a distributed environment. To reduce bandwidth use when transmitting between devices, ASR information is compressed prior to transmission. To counteract the fidelity loss that may accompany such compression, two versions of an audio signal are processed by an acoustic front end (AFE): one version is unaltered, and one is compressed and decompressed prior to AFE processing. The two versions are compared, and the comparison data is sent to a recipient for further ASR processing. The recipient uses the comparison data and a received version of the compressed audio signal to recreate the post-AFE processing results for the received audio signal. The result is improved ASR accuracy and decreased bandwidth usage between distributed ASR devices.
Method, apparatus, and medium for bandwidth extension encoding and decoding
Tue, 24 Mar 2015 08:00:00 EDT
Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.
Noise-robust speech coding mode classification
Tue, 24 Mar 2015 08:00:00 EDT
A method of noise-robust speech classification is disclosed. Classification parameters are input to a speech classifier from external components. Internal classification parameters are generated in the speech classifier from at least one of the input parameters. A Normalized Auto-correlation Coefficient Function threshold is set. A parameter analyzer is selected according to a signal environment. A speech mode classification is determined based on a noise estimate of multiple frames of input speech.
Method and device for sound activity detection and sound signal classification
Tue, 24 Mar 2015 08:00:00 EDT
A device and method for estimating a tonal stability of a sound signal include: calculating a current residual spectrum of the sound signal; detecting peaks in the current residual spectrum; calculating a correlation map between the current residual spectrum and a previous residual spectrum for each detected peak; and calculating a long-term correlation map based on the calculated correlation map, the long-term correlation map being indicative of a tonal stability in the sound signal.
Apparatus and method for encoding and decoding of integrated speech and audio utilizing a band expander to output the audio or speech to a frequency domain encoder or an LPC encoder
Tue, 24 Mar 2015 08:00:00 EDT
Provided are an apparatus and a method for integrally encoding and decoding a speech signal and an audio signal. The encoding apparatus may include: an input signal analyzer to analyze a characteristic of an input signal; a first conversion encoder to convert the input signal to a frequency domain signal and encode it when the input signal is an audio characteristic signal; a Linear Predictive Coding (LPC) encoder to perform LPC encoding of the input signal when the input signal is a speech characteristic signal; a frequency band expander for expanding a frequency band of the input signal, whose output is transmitted to either the first conversion encoder or the LPC encoder based on the input characteristic; and a bitstream generator to generate a bitstream using an output signal of the first conversion encoder and an output signal of the LPC encoder.
Telephony service interaction management
Tue, 24 Mar 2015 08:00:00 EDT
A method for managing an interaction of a calling party to a communication partner is provided. The method includes automatically determining if the communication partner expects DTMF input. The method also includes translating speech input to one or more DTMF tones and communicating the one or more DTMF tones to the communication partner, if the communication partner expects DTMF input.
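The speech-to-DTMF translation step can be illustrated with the standard ITU-T DTMF row/column frequency pairs; the word-to-key vocabulary is an assumed fragment of what a recognizer would output:

```python
# Standard DTMF keypad: row frequency (Hz) x column frequency (Hz).
DTMF_FREQS = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}
# Illustrative mapping from recognized words to keypad symbols.
WORD_TO_KEY = {"zero": "0", "one": "1", "two": "2", "three": "3",
               "four": "4", "five": "5", "six": "6", "seven": "7",
               "eight": "8", "nine": "9", "star": "*", "pound": "#"}

def speech_to_dtmf(words):
    """Translate recognized digit words into DTMF tone frequency pairs,
    ready to be synthesized and sent to the communication partner."""
    return [DTMF_FREQS[WORD_TO_KEY[w]] for w in words]
```

Saying "one zero" would thus be rendered as the two dual-tone pairs for keys 1 and 0, letting a caller drive a DTMF-only menu by voice.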
Computer-based construction of arbitrarily complex formal grammar expressions
Tue, 24 Mar 2015 08:00:00 EDT
A method, system and computer program product for building an expression, including utilizing any formal grammar of a context-free language, displaying an expression on a computer display via a graphical user interface, replacing at least one non-terminal display object within the displayed expression with any of at least one non-terminal display object and at least one terminal display object, and repeating the replacing step a plurality of times for a plurality of non-terminal display objects until no non-terminal display objects remain in the displayed expression, wherein the non-terminal display objects correspond to non-terminal elements within the grammar, and wherein the terminal display objects correspond to terminal elements within the grammar.
Techniques for pruning phrase tables for statistical machine translation
Tue, 24 Mar 2015 08:00:00 EDT
A computer-implemented technique includes receiving, at a server including one or more processors, a phrase table for statistical machine translation, the phrase table including a plurality of phrase pairs corresponding to one or more pairs of languages. The technique includes determining, at the server, a redundant set of phrase pairs from the plurality of phrase pairs and calculating first and second probabilities for each specific phrase pair of the redundant set. The second probability can be based on third probabilities for sub-phrases of each specific phrase pair. The technique includes determining, at the server, one or more selected phrase pairs based on whether a corresponding second probability for a specific phrase pair is within a probability threshold from its corresponding first probability. The technique also includes removing, at the server, the one or more selected phrase pairs from the phrase table to obtain a modified phrase table.
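A sketch of the pruning criterion above: a phrase pair whose directly estimated probability is within a threshold of the probability composed from its sub-phrases is treated as redundant and removed, since the sub-phrases can reconstruct it. The table layout and values are assumptions:

```python
def prune_phrase_table(table, threshold=0.1):
    """Keep only phrase pairs whose direct probability differs from the
    sub-phrase-composed probability by more than the threshold."""
    kept = {}
    for pair, (p_direct, p_composed) in table.items():
        if abs(p_direct - p_composed) > threshold:
            kept[pair] = (p_direct, p_composed)  # not redundant: keep it
    return kept

# (source, target) -> (direct probability, probability via sub-phrases)
table = {
    ("chat", "cat"): (0.80, 0.78),                   # reproducible: prune
    ("le chat noir", "the black cat"): (0.90, 0.40), # idiosyncratic: keep
}
pruned = prune_phrase_table(table)
```

The compositional pair is dropped with little loss, while the pair that sub-phrases cannot reproduce survives, shrinking the table the decoder must hold.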
Systems and methods for multi-user multi-lingual communications
Tue, 24 Mar 2015 08:00:00 EDT
Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments enable multi-lingual communications through different modes of communication including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments implement communication systems and methods that translate text between two or more languages. Users of the systems and methods may be incentivized to submit corrections for inaccurate or erroneous translations, and may receive a reward for these submissions. Systems and methods for assessing the accuracy of translations are described.
Machine translation into a target language by interactively and automatically formalizing non-formal source language into formal source language
Tue, 24 Mar 2015 08:00:00 EDT
A machine translation method and system comprise the steps of (a) formalizing a non-formal source language in an interactive or automatic way and (b) transforming the formal source language into a formal or non-formal target language in an automatic way. This eliminates the language barrier between person and person and between person and computer: a user can translate his/her non-formal native language correctly, and without lexical ambiguity, into any non-formal foreign language of which he/she knows nothing; a user and a computer can exchange information in his/her non-formal native language correctly and without lexical ambiguity. It can be used in network terminal equipment, Internet knowledge bases, knowledge-reasoning search engines, expert systems, and automatic programming. Because formalization of a source language is the common foundation for transformation into various target languages, the method is especially suitable for multilingual machine translation.
Resolving out-of-vocabulary words during machine translation
Tue, 24 Mar 2015 08:00:00 EDT
Some implementations provide techniques and arrangements to perform automated translation from a source language to a target language. For example, an out-of-vocabulary word may be identified and a morphological analysis may be performed to determine whether the out-of-vocabulary word reduces to at least one stem. If the out-of-vocabulary word reduces to a stem, the stem may be translated. The translated stem may be inflected if the out-of-vocabulary word is inflected. If the out-of-vocabulary word has any affixes, the affixes may be translated. In some cases, the translated affixes may be reordered before being combined with the inflected and translated stem. If the out-of-vocabulary word is misspelled, the spelling of the out-of-vocabulary word may be corrected before performing the morphological analysis. If the out-of-vocabulary word is a colloquial form of a formal word, the out-of-vocabulary word may be replaced with the formal word before performing the morphological analysis.
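The stem/suffix pipeline above can be sketched with toy lexicons. The entries below are invented for illustration; a real system would use a morphological analyzer and full translation models rather than these dictionaries:

```python
# Invented English -> French fragments for illustration only.
STEMS = {"walk": "marcher"}        # translatable stems
SUFFIXES = {"ed": " (passé)"}      # suffix -> translated marker

def translate_oov(word):
    """Reduce an out-of-vocabulary word to a known stem, translate the
    stem, and attach the translated suffix; fall back to a bare stem
    lookup when no suffix matches."""
    for suffix, translated_suffix in SUFFIXES.items():
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if stem in STEMS:
                return STEMS[stem] + translated_suffix
    return STEMS.get(word)
```

So "walked" is split into the stem "walk" plus the suffix "ed", each part is translated, and the pieces are recombined; a word with no known stem would return None and fall through to other OOV handling.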
