Patents

 
Analysis filterbank, synthesis filterbank, encoder, decoder, mixer and conferencing system
Tue, 16 Dec 2014 08:00:00 EST
An embodiment of an analysis filterbank for filtering a plurality of time domain input frames, wherein an input frame comprises a number of ordered input samples, comprises a windower configured to generate a plurality of windowed frames, wherein a windowed frame comprises a plurality of windowed samples, wherein the windower is configured to process the plurality of input frames in an overlapping manner using a sample advance value, wherein the sample advance value is less than the number of ordered input samples of an input frame divided by two, and a time/frequency converter configured to provide an output frame comprising a number of output values, wherein an output frame is a spectral representation of a windowed frame.
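As a rough illustration of the windowing described above, here is a minimal Python sketch (not the patented implementation) of an analysis filterbank whose hop, i.e. sample advance value, is smaller than half the frame length; the window choice and frame sizes are assumptions.
```python
# Minimal sketch: window overlapping time-domain frames with a hop smaller
# than half the frame length, then convert each windowed frame to a spectrum.
import numpy as np

def analysis_filterbank(signal, frame_len=256, hop=64):
    """Return one spectral output frame per windowed input frame.

    hop < frame_len // 2 mirrors the abstract's condition that the sample
    advance value is less than the number of ordered input samples over two.
    """
    assert hop < frame_len // 2
    window = np.hanning(frame_len)            # illustrative window choice
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        windowed = signal[start:start + frame_len] * window
        frames.append(np.fft.rfft(windowed))  # time/frequency conversion
    return np.array(frames)

# Example: one second of a 440 Hz tone sampled at 8 kHz.
t = np.arange(8000) / 8000.0
spectra = analysis_filterbank(np.sin(2 * np.pi * 440 * t))
print(spectra.shape)  # (number of output frames, frame_len // 2 + 1)
```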
Multiple web-based content category searching in mobile search application
Tue, 16 Dec 2014 08:00:00 EST
In embodiments of the present invention improved capabilities are described for multiple web-based content category searching for web content on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility; and transmitting the text results and a plurality of formatting rules specifying how search text may be used to form a query for a search capability on the mobile communication facility, wherein each formatting rule is associated with a category of content to be searched.
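The per-category formatting idea can be sketched as follows; the categories and rules below are invented purely for illustration and are not the patent's actual rule format.
```python
# Illustrative sketch only: apply one formatting rule per content category to
# speech-to-text results in order to build category-specific search queries.
FORMATTING_RULES = {
    "web":   lambda text: text.strip(),
    "local": lambda text: f"{text.strip()} near me",
    "video": lambda text: f"{text.strip()} video",
}

def build_queries(recognized_text, rules=FORMATTING_RULES):
    """Return one formatted query per content category."""
    return {category: rule(recognized_text) for category, rule in rules.items()}

print(build_queries("pizza places downtown"))
```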
System and method for an N-best list interface
Tue, 16 Dec 2014 08:00:00 EST
Disclosed herein are systems, methods, and computer-readable storage media for providing an N-best list interface. A system practicing the method receives a search query formatted according to a standard language for containing and annotating interpretations of user input, the search query being based on a natural language spoken query from a user and retrieves an N-best list of recognition results based on the search query. The system then transmits the N-best list of recognition results to a user device, receives multimodal disambiguation input from the user, the input indicating an entry in the N-best list, and transmits to the user device additional information associated with the selected entry. The additional information can be a map indicating an address for the selected entry. The standard language can be XML-based Extensible MultiModal Annotation (EMMA) markup language from W3C.
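For illustration, here is a hedged sketch of pulling an N-best list out of a simplified, EMMA-like XML fragment; the markup is abbreviated and the use of the emma:confidence attribute on plain-text interpretations is an assumption, not a full EMMA implementation.
```python
# Hedged sketch: extract and rank an N-best list from an EMMA-like document.
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"  # EMMA 1.0 namespace

SAMPLE = f"""
<emma:emma xmlns:emma="{EMMA_NS}" version="1.0">
  <emma:one-of id="nbest">
    <emma:interpretation id="r1" emma:confidence="0.82">gas stations near me</emma:interpretation>
    <emma:interpretation id="r2" emma:confidence="0.11">gas station museum</emma:interpretation>
  </emma:one-of>
</emma:emma>
"""

def n_best(doc):
    """Return (text, confidence) pairs sorted best-first."""
    root = ET.fromstring(doc)
    results = []
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        conf = float(interp.get(f"{{{EMMA_NS}}}confidence", "0"))
        results.append((interp.text, conf))
    return sorted(results, key=lambda item: item[1], reverse=True)

for text, conf in n_best(SAMPLE):
    print(f"{conf:.2f}  {text}")
```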
System and method for an iterative disambiguation interface
Tue, 16 Dec 2014 08:00:00 EST
Disclosed herein are systems, methods, and computer-readable storage media for an iterative disambiguation interface. A system practicing the method receives a search query formatted according to a standard XML markup language for containing and annotating interpretations of user input, the search query being based on a natural language spoken query from a user and retrieves search results based on the search query. The system transmits the search results to a user device and iteratively receives multimodal input from the user to change search attributes and transmits updated search results to the user device based on the changed search attributes. The search results can include a link to additional information, such as a video presentation, related to the search results. The standard XML markup language can be Extensible MultiModal Annotation (EMMA) markup language from W3C. The system can generate an iteration transaction history for each multimodal input and updated search result.
Database query translation system
Tue, 16 Dec 2014 08:00:00 EST
In a method for translation of a medical database query from a first language into a second language, a query to be translated is received from a user of the medical database. A respective translation for the query from each of several translation engines is obtained, and a respective ranking score for each of the obtained translations is determined. The determined ranking scores are then utilized to select a translation from the several obtained translations. The selected translation is then provided to the user and/or is used to search the medical database to obtain search results for the query, and the obtained search results are then provided to the user.
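The selection step might look roughly like the sketch below; the toy "engines" and the stand-in ranking score are invented for the example and are not the patented ranking method.
```python
# Minimal sketch: get a candidate from each translation engine, score each
# candidate, and return the highest-scoring translation.
def score(candidate, query):
    # Stand-in ranking score: prefer candidates that preserve token count.
    return 1.0 / (1 + abs(len(candidate.split()) - len(query.split())))

def translate(query, engines):
    candidates = [engine(query) for engine in engines]
    return max(candidates, key=lambda c: score(c, query))

# Toy "engines" that pretend to translate an English medical query into German.
engines = [
    lambda q: "Symptome einer Lungenentzündung",
    lambda q: "Lungenentzündung Symptome Liste",
]
print(translate("pneumonia symptoms", engines))
```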
Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using an optimized hash table
Tue, 16 Dec 2014 08:00:00 EST
An audio decoder includes an arithmetic decoder for providing decoded spectral values on the basis of an arithmetically encoded representation thereof, and a frequency-domain-to-time-domain converter for providing a time-domain audio representation. The arithmetic decoder selects a mapping rule describing a mapping of a code value onto a symbol code representing a spectral value, or a most significant bit-plane thereof, in a decoded form, in dependence on a context state described by a numeric current context value. The arithmetic decoder determines the numeric current context value in dependence on a plurality of previously decoded spectral values. It evaluates a hash table, entries of which define both significant state values amongst the numeric context values and boundaries of intervals of numeric context values, in order to select the mapping rule, wherein the hash table ari_hash_m is defined as given in FIGS. 22(1), 22(2), 22(3) and 22(4).
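The interval-lookup idea can be sketched as follows; the boundary values and rule indices below are invented, and the real ari_hash_m contents are those defined in the cited figures.
```python
# Hedged sketch: a sorted table of interval boundaries over numeric context
# values, searched by bisection to pick a mapping-rule (codebook) index.
import bisect

# (interval upper boundary, mapping rule index) pairs, sorted by boundary.
BOUNDARIES = [(100, 0), (1000, 1), (50000, 2), (float("inf"), 3)]
UPPER = [boundary for boundary, _ in BOUNDARIES]

def select_mapping_rule(context_value):
    """Map a numeric current context value to a mapping-rule index."""
    idx = bisect.bisect_left(UPPER, context_value)
    return BOUNDARIES[idx][1]

print(select_mapping_rule(42))      # -> 0
print(select_mapping_rule(123456))  # -> 3
```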
Apparatus and method for providing messages in a social network
Tue, 16 Dec 2014 08:00:00 EST
A system that incorporates teachings of the present disclosure may include, for example, a server including a controller to receive audio signals and content identification information from a media processor, generate text representing a voice message based on the audio signals, determine an identity of media content based on the content identification information, generate an enhanced message having text and additional content where the additional content is obtained by the controller based on the identity of the media content, and transmit the enhanced message to the media processor for presentation on the display device, where the enhanced message is accessible by one or more communication devices that are associated with a social network and remote from the media processor. Other embodiments are disclosed.
System and method of providing an automated data-collection in spoken dialog systems
Tue, 16 Dec 2014 08:00:00 EST
The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system.
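The acceptance/rejection routing described above can be sketched in a few lines; the threshold values and action labels are placeholders, not the system's actual settings.
```python
# Minimal sketch of the decision logic: keep well-classified utterances for
# training, re-prompt on unclear ones, and hand off the rest to a human.
ACCEPT_THRESHOLD = 0.80
REJECT_THRESHOLD = 0.30

def route_utterance(classification_confidence):
    """Decide what to do with a classified user utterance."""
    if classification_confidence >= ACCEPT_THRESHOLD:
        return "log for training"        # understood well enough to keep
    if classification_confidence >= REJECT_THRESHOLD:
        return "re-prompt the user"      # unclear: ask again
    return "transfer to a human"         # likely task-specific / out of scope

for conf in (0.9, 0.5, 0.1):
    print(conf, "->", route_utterance(conf))
```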
Web browser implementation of interactive voice response instructions
Tue, 16 Dec 2014 08:00:00 EST
Web browser implementable instructions are generated from interactive voice instructions that are not natively interpreted by web browsers. Generating web browser implementable instructions in this manner allows for faster and cheaper deployment of voice, video, and/or data services by allowing legacy services based on interactive voice instructions to function seamlessly within an all data network.
Internal and external speech recognition use with a mobile communication facility
Tue, 16 Dec 2014 08:00:00 EST
In embodiments of the present invention improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criterion.
Method and apparatus for generating synthetic speech with contrastive stress
Tue, 16 Dec 2014 08:00:00 EST
Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Systems and methods for dynamically improving user intelligibility of synthesized speech in a work environment
Tue, 16 Dec 2014 08:00:00 EST
Method and apparatus that dynamically adjusts operational parameters of a text-to-speech engine in a speech-based system. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.
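A minimal sketch of environment-driven parameter adjustment is shown below, assuming an ambient noise measurement in dB; the parameter names and the mapping are invented, not the product's actual tuning.
```python
# Illustrative sketch: modify adjustable TTS parameters in response to an
# environmental condition to keep synthesized speech intelligible.
def tts_parameters(ambient_noise_db):
    """Return adjusted TTS settings for the current environment."""
    params = {"volume": 0.7, "speaking_rate": 1.0, "pitch_shift": 0.0}
    if ambient_noise_db > 80:          # very loud work environment
        params.update(volume=1.0, speaking_rate=0.85, pitch_shift=2.0)
    elif ambient_noise_db > 65:        # moderately noisy
        params.update(volume=0.9, speaking_rate=0.95)
    return params

print(tts_parameters(85))
```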
Analyzing and processing a verbal expression containing multiple goals
Tue, 16 Dec 2014 08:00:00 EST
A method for parsing a verbal expression received from a user to determine whether or not the expression contains a multiple-goal command is described. Specifically, known techniques are applied to extract terms from the verbal expression. The extracted terms are assigned to categories. If two or more terms are found in the parsed verbal expression that are in associated categories and that do not overlap one another temporally, then the confidence levels of these terms are compared. If the confidence levels are similar, then the terms may be parallel entries in the verbal expression and may represent multiple goals. If a multiple-goal command is found, then the command is either presented to the user for review and possible editing or is executed. If the parsed multiple-goal command is presented to the user for review, then the presentation can be made via any appropriate interface including voice and text interfaces.
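The comparison step might be sketched as follows, assuming extracted terms carry a category, a time span, and a confidence; the category associations and the similarity margin are invented for the example.
```python
# Hedged sketch: two terms in associated categories that do not overlap in
# time and have similar confidence are treated as parallel goals.
from dataclasses import dataclass

@dataclass
class Term:
    text: str
    category: str
    start: float   # seconds
    end: float
    confidence: float

ASSOCIATED = {("genre", "genre"), ("artist", "artist")}  # assumed associations
MARGIN = 0.15                                            # assumed similarity margin

def overlaps(a, b):
    return a.start < b.end and b.start < a.end

def multiple_goals(terms):
    goals = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if ((a.category, b.category) in ASSOCIATED
                    and not overlaps(a, b)
                    and abs(a.confidence - b.confidence) <= MARGIN):
                goals.append((a.text, b.text))
    return goals

utterance = [Term("play jazz", "genre", 0.0, 0.8, 0.91),
             Term("and blues", "genre", 0.9, 1.6, 0.87)]
print(multiple_goals(utterance))   # -> [('play jazz', 'and blues')]
```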
System and method for advanced turn-taking for interactive spoken dialog systems
Tue, 16 Dec 2014 08:00:00 EST
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which, if found, allows the system to communicate partial speech recognition results, is that the most recent word found in the partial results is statistically likely to be the termination of the utterance, also known as a terminal node. A second condition is the determination that all search paths within a speech lattice converge to a common node, also known as a pinch node, before branching out again. Upon finding either condition, the system can communicate the partial speech recognition results. Stability and correctness probabilities can also determine which partial results are communicated.
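The pinch-node condition can be sketched on a toy lattice as follows; real decoders operate on much richer lattice structures with scores attached to arcs.
```python
# Minimal sketch: if every active hypothesis path passes through a common
# node, the partial results up to that node can be communicated early.
def latest_pinch_node(paths):
    """Return the latest lattice node common to all partial paths, or None."""
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    if not common:
        return None
    # Order shared candidates by their position along the first path.
    return max(common, key=paths[0].index)

# Each path is the sequence of lattice node ids visited by one hypothesis.
paths = [
    [0, 1, 3, 5, 7],
    [0, 2, 3, 6, 8],
    [0, 2, 3, 5, 9],
]
print(latest_pinch_node(paths))   # -> 3: all hypotheses converge at node 3
```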
Remote control audio link
Tue, 16 Dec 2014 08:00:00 EST
One embodiment may take the form of a voice control system. The system may include a first apparatus with a processing unit configured to execute a voice recognition module and one or more executable commands, and a receiver coupled to the processing unit and configured to receive a first audio file from a remote control device. The first audio file may include at least one voice command. The first apparatus may further include a communication component coupled to the processing unit and configured to receive programming content, and one or more storage media storing the voice recognition module. The voice recognition module may be configured to convert voice commands into text.
Speech recognition with hierarchical networks
Tue, 16 Dec 2014 08:00:00 EST
Provided are systems and methods for using hierarchical networks for recognition, such as speech recognition. Conventional automatic recognition systems may not be both efficient and flexible. Recognition systems are disclosed that may achieve efficiency and flexibility by employing hierarchical networks, prefix consolidation of networks, and future consolidation of networks. The disclosed networks may be associated with a network model and the associated network model may be modified during recognition to achieve greater flexibility.
Predicting a sales success probability score from a distance vector between speech of a customer and speech of an organization representative
Tue, 16 Dec 2014 08:00:00 EST
A computerized method for sales optimization including receiving at a computer server a digital representation of a portion of an interaction between a customer and an organization representative, the portion of an interaction comprises a speech signal of the customer and a speech signal of the organization representative; analyzing the speech signal of the organization representative; analyzing the speech signal of the customer; determining a distance vector between the speech signal of the organization representative and the speech signal of the customer; and predicting a sale success probability score for the captured speech signal portion.
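Very roughly, the distance-vector-to-score idea might look like the sketch below; the speech statistics, weights, and bias are placeholders rather than the patented features or model.
```python
# Hedged sketch: compare simple per-speaker speech statistics, form a distance
# vector, and map it to a success probability with a logistic function.
import math

def feature_vector(samples):
    """Toy speech features: mean level and a crude rate-of-change measure."""
    mean = sum(samples) / len(samples)
    change = sum(abs(b - a) for a, b in zip(samples, samples[1:])) / (len(samples) - 1)
    return [mean, change]

def distance_vector(rep, customer):
    return [abs(r - c) for r, c in zip(feature_vector(rep), feature_vector(customer))]

def success_probability(dist, weights=(-3.0, -1.5), bias=2.0):
    z = bias + sum(w * d for w, d in zip(weights, dist))
    return 1.0 / (1.0 + math.exp(-z))

rep      = [0.40, 0.50, 0.45, 0.50, 0.48]
customer = [0.10, 0.20, 0.15, 0.10, 0.12]
print(round(success_probability(distance_vector(rep, customer)), 3))
```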
Methods and apparatus for conducting internet protocol telephony communication
Tue, 16 Dec 2014 08:00:00 EST
IP telephony communications are conducted by sending both data produced by a CODEC that represents received spoken audio input, and a textual representation of the spoken audio input. A receiving device utilizes the textual representation of the spoken audio input to help recreate the spoken audio input when a portion of the CODEC data is missing. The textual representation can be generated by a speech-to-text function. Alternatively, the textual representation can be a notation of extracted phonemes.
System and method for unsupervised and active learning for automatic speech recognition
Tue, 16 Dec 2014 08:00:00 EST
A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.
Wind noise reduction
Tue, 16 Dec 2014 08:00:00 EST
By monitoring the wind noise in a location in which a cellular telephone is operating and by applying noise reduction and/or cancellation protocols at the appropriate time via analog and/or digital signal processing, it is possible to significantly reduce wind noise entering into a communication system.
Method and apparatus for processing audio signal in a mobile communication terminal
Tue, 16 Dec 2014 08:00:00 EST
A method and an apparatus for processing an audio signal in a mobile terminal, in which an audio signal received from a counterpart mobile terminal is classified into a voice signal and a noise signal according to their respective energies. A frequency of the classified voice signal and an energy of the classified noise signal are controlled according to predetermined criteria, and then the controlled voice signal and the controlled noise signal are coupled and output to a speaker.
Method and apparatus for encoding/decoding speech signal
Tue, 16 Dec 2014 08:00:00 EST
An apparatus and method for encoding/decoding a speech signal, which determines a variable bit rate based on reserved bits obtained from a target bit rate, is provided. The variable bit rate is determined based on a source feature of the speech signal, and the reserved bits are obtained based on the target bit rate. The apparatus for encoding the speech signal may include a linear predictive (LP) analysis/quantization unit to determine an immittance spectral frequencies (ISF) index, a closed loop pitch search unit, a fixed codebook search unit, a gain vector quantization (VQ) unit to determine a gain VQ index, and a bit rate control unit to control at least two indexes of the ISF index, the pitch index, the code index, and the gain VQ index to be encoded at variable bit rates based on a source feature of the speech signal and the reserved bits.
Efficient parsing with structured prediction cascades
Tue, 16 Dec 2014 08:00:00 EST
A dependency parsing method can include determining an index set of possible head-modifier dependencies for a sentence. The index set can include inner arcs and outer arcs, inner arcs representing possible dependency between words in the sentence separated by a distance less than or equal to a threshold and outer arcs representing possible dependency between words in the sentence separated by a distance greater than the threshold. The index set can be pruned to include: (i) each specific inner arc when a likelihood that the specific inner arc is appropriate is greater than a first threshold, and (ii) the outer arcs when a likelihood that there exists any possible outer arc that is appropriate is greater than the first threshold. The method can include further pruning the pruned index set based on a second parsing algorithm, and determining a most-likely parse for the sentence from the pruned index set.
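The two pruning rules can be sketched as follows; the distance limit, threshold, and arc probabilities are invented, and a real parser would score arcs with a trained model rather than the per-head lookup used here.
```python
# Minimal sketch: keep an inner arc when its own probability clears the
# threshold; keep a head's outer arcs when the probability that some valid
# outer arc exists for that head clears the same threshold.
DISTANCE_LIMIT = 3      # arcs spanning <= this many words count as "inner"
THRESHOLD = 0.2

def prune(arcs, outer_arc_any_prob):
    """arcs: list of (head_index, modifier_index, probability) tuples."""
    kept = []
    for head, mod, prob in arcs:
        inner = abs(head - mod) <= DISTANCE_LIMIT
        if inner and prob > THRESHOLD:
            kept.append((head, mod))
        elif not inner and outer_arc_any_prob.get(head, 0.0) > THRESHOLD:
            kept.append((head, mod))
    return kept

arcs = [(2, 1, 0.6), (2, 3, 0.1), (2, 9, 0.05), (5, 9, 0.3)]
print(prune(arcs, outer_arc_any_prob={2: 0.4, 5: 0.1}))  # -> [(2, 1), (2, 9)]
```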
Automatic context sensitive language correction and enhancement using an internet corpus
Tue, 16 Dec 2014 08:00:00 EST
A computer-assisted language correction system including spelling correction functionality, misused word correction functionality, grammar correction functionality and vocabulary enhancement functionality utilizing contextual feature-sequence functionality employing an internet corpus.
Speech and language translation of an utterance
Tue, 16 Dec 2014 08:00:00 EST
According to example configurations, a speech-processing system parses an uttered sentence into segments. The speech-processing system translates each of the segments in the uttered sentence into candidate textual expressions (i.e., phrases of one or more words) in a first language. The uttered sentence can include multiple phrases or candidate textual expressions. Additionally, the speech-processing system translates each of the candidate textual expressions into candidate textual phrases in a second language. Based at least in part on a product of confidence values associated with the candidate textual expressions in the first language and confidence values associated with the candidate textual phrases in the second language, the speech-processing system produces a confidence metric for each of the candidate textual phrases in the second language. The confidence metric can indicate the degree to which the candidate textual phrase in the second language is an accurate translation of a respective segment in the utterance.
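The confidence-product scoring can be sketched in a few lines; the candidate phrases and scores below are invented for the example.
```python
# Hedged sketch: combine a recognition confidence with a translation
# confidence by taking their product, then rank the target-language phrases.
def ranked_translations(recognition_candidates, translation_candidates):
    """Return (target phrase, combined confidence) pairs sorted best-first."""
    scored = []
    for source, rec_conf in recognition_candidates:
        for target, trans_conf in translation_candidates.get(source, []):
            scored.append((target, rec_conf * trans_conf))
    return sorted(scored, key=lambda item: item[1], reverse=True)

recognition = [("where is the station", 0.9), ("wear is the station", 0.2)]
translation = {
    "where is the station": [("¿dónde está la estación?", 0.95)],
    "wear is the station":  [("¿el desgaste es la estación?", 0.40)],
}
for phrase, score in ranked_translations(recognition, translation):
    print(f"{score:.2f}  {phrase}")
```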
Dynamic video caption translation player
Tue, 16 Dec 2014 08:00:00 EST
A caption translation system is described herein that provides a way to reach a greater world-wide audience when displaying video content by providing dynamically translated captions based on the language the user has selected for their browser. The system provides machine-translated captions to accompany the video content by determining the language the user has selected for their browser or a manual language selection of the user. The system uses the language value to invoke an automated translation application-programming interface that returns translated caption text in the selected language. The system can use one or more well-known caption formats to store the translated captions, so that video playing applications that know how to consume captions can automatically display the translated captions. The video playing application plays back the video file and displays captions in the user's language.
Text prediction
Tue, 16 Dec 2014 08:00:00 EST
One or more techniques and/or systems are provided for suggesting a word and/or phrase to a user based at least upon a prefix of one or more characters that the user has inputted. Words in a database are respectively assigned a unique identifier. Generally, the unique identifiers are assigned sequentially and contiguously, beginning with a first word alphabetically and ending with a last word alphabetically. When a user inputted prefix is received, a range of unique identifiers corresponding to words respectively having a prefix that matches the user inputted prefix are identified. Typically, the range of unique identifiers corresponds to substantially all of the words that begin with the given prefix and does not correspond to words that do not begin with the given prefix. The unique identifiers may then be compared to a probability database to identify which words have a higher probability of being selected by the user.
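Because the identifiers are assigned sequentially in alphabetical order, the words sharing a prefix occupy one contiguous range that bisection can find without scanning the whole vocabulary, as in this small sketch (the word list is invented).
```python
# Minimal sketch: the ids are simply the positions of the alphabetically
# sorted words, so a prefix maps to one contiguous id range.
import bisect

WORDS = sorted(["than", "thank", "thanks", "that", "the", "their", "them", "toe"])

def id_range(prefix):
    """Return the (lo, hi) id range of words starting with prefix (hi exclusive)."""
    lo = bisect.bisect_left(WORDS, prefix)
    hi = bisect.bisect_left(WORDS, prefix + "\uffff")  # just past the last match
    return lo, hi

lo, hi = id_range("tha")
print((lo, hi), WORDS[lo:hi])  # contiguous id range and its candidate words
```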
System and method for processing a voicemail
Tue, 16 Dec 2014 08:00:00 EST
Described is a system and method for processing a voice mail. The method comprises receiving a voice mail, converting the voice mail into a text message using a predefined speech-to-text conversion algorithm and transmitting the text message to a wireless computing device.
Sound source recording apparatus and method adaptable to operating environment
Tue, 16 Dec 2014 08:00:00 EST
Disclosed herein is a sound source recording apparatus and method adaptable to an operating environment, which can record a target sound source at a predetermined level without being affected by characteristics of the sound source or ambient noise. A target sound source is separated from a sound source signal received through an array of microphones and a recording sound pressure level and a gain are estimated using a reference sound pressure level and a reference distance for the target sound source, thereby controlling or adjusting the gain of the microphones.
Method and system for providing an audio representation of a name
Tue, 16 Dec 2014 08:00:00 EST
A system and method for providing an audio representation of a name includes providing a list of a plurality of users of a network and respective presence information regarding each of the plurality of users; receiving a request from an endpoint to receive an audio representation of a name of a particular user of the plurality of users; and providing the audio representation to the endpoint. Moreover, the audio representation of the name at least generally approximates a pronunciation of the name as pronounced by the particular user.
