Patents

 
System and method for linking streams of multimedia data to reference material for display
Tue, 19 Aug 2014 08:00:00 EDT
A system for indexing displayed elements that is useful for accessing and understanding new or difficult materials, in which a user highlights unknown words or characters or other displayed elements encountered while viewing displayed materials. In a language learning application, the system displays the meaning of a word in context; and the user may include the word in a personal vocabulary to build a database of words and phrases. In a Japanese language application, one or more Japanese language books are read on an electronic display. Readings (‘yomi’) for all words are readily viewable for any selected word or phrase, as well as an English reference to the selected word or phrase. Extensive notes are provided for difficult phrases and words not normally found in a dictionary. A unique indexing scheme allows word-by-word access to any of several external multi-media references.
System and method for internationalization encoding
Tue, 19 Aug 2014 08:00:00 EDT
A system and computer-implemented method for transforming source code in an original natively encoded format to a locale neutral format, wherein data types and functions in the original format are estimated for compliance with the locale neutral format and an estimate is made of the amount of code conversion necessary to comply with the locale neutral format. In addition, image files referenced by the source code are analyzed and embedded text is extracted to enable translation during the localization process.
Error concealment for sub-band coded audio signals
Tue, 19 Aug 2014 08:00:00 EDT
A decoder and method of decoding a sub-band coded digital audio signal. The decoder comprises: an input, for receiving sub-band coefficients for a plurality of sub-bands of the audio signal; an error detection unit, adapted to analyze the content of a sequence of coefficients in one of the sub-bands, to derive for each coefficient an indication of whether the coefficient has been corrupted by an error of a predefined type; an error masking unit, adapted to generate from the sequence a modified sequence of coefficients for the sub-band, wherein errors of the predefined type are attenuated; a coefficient combination unit, adapted to combine the received coefficients and the modified coefficients, in dependence upon the indication of error; and a signal reconstruction unit, adapted to reconstruct the audio signal using the combined coefficients.
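As an illustration of the sub-band concealment flow described above, the sketch below flags implausible coefficients, attenuates them, and combines the received and modified sequences according to the error indication. The detector (a median-based outlier test) and the attenuation rule, as well as the names `conceal_subband`, `outlier_factor` and `attenuation`, are assumptions for illustration only; the abstract does not specify them.
```python
import numpy as np

def conceal_subband(coeffs, outlier_factor=4.0, attenuation=0.1):
    """Illustrative concealment for one sub-band's coefficient sequence.

    Hypothetical detector: a coefficient is flagged as corrupted when its
    magnitude exceeds `outlier_factor` times the median magnitude of the
    sequence. Flagged coefficients are attenuated; all others are kept.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    median_mag = np.median(np.abs(coeffs)) + 1e-12
    corrupted = np.abs(coeffs) > outlier_factor * median_mag   # error indication
    modified = coeffs * attenuation                            # errors attenuated
    combined = np.where(corrupted, modified, coeffs)           # combine per flag
    return combined, corrupted

# Usage: a burst error corrupts one coefficient in an otherwise smooth band.
band = np.array([0.2, 0.25, 0.22, 9.0, 0.21, 0.24])
clean, flags = conceal_subband(band)
print(flags)   # [False False False  True False False]
print(clean)   # the corrupted sample is scaled down to 0.9
```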
Coding/decoding of digital audio signals
Tue, 19 Aug 2014 08:00:00 EDT
A method of hierarchical coding of a digital audio frequency input signal into several frequency sub-bands, including a core coding of the input signal according to a first throughput and at least one enhancement coding, at a higher throughput, of a residual signal. The core coding uses a binary allocation according to an energy criterion. The method includes, for the enhancement coding: calculating a frequency-based masking threshold for at least part of the frequency bands processed by the enhancement coding; determining a perceptual importance per frequency sub-band as a function of the masking threshold and as a function of the number of bits allocated for the core coding; binary allocation of bits in the frequency sub-bands processed by the enhancement coding, as a function of the perceptual importance determined; and coding the residual signal according to the bit allocation. Also provided are a decoding method, a coder and a decoder.
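A minimal sketch of how a per-band importance could drive the enhancement-layer bit allocation follows. The abstract only says importance depends on the masking threshold and on the core allocation, so the specific formula (energy-to-mask ratio in dB minus a credit for core bits) and the proportional split are assumptions of this sketch.
```python
import numpy as np

def allocate_enhancement_bits(band_energies, masking_thresholds, core_bits, total_bits):
    """Share an enhancement-layer bit budget across frequency sub-bands.

    Importance is modelled as the band's energy-to-mask ratio in dB minus
    the bits already spent by the core coder, clipped at zero; the bit
    budget is then distributed in proportion to importance. Any remainder
    from rounding is simply left unallocated in this sketch.
    """
    e = np.asarray(band_energies, dtype=float)
    m = np.asarray(masking_thresholds, dtype=float)
    smr_db = 10.0 * np.log10(np.maximum(e, 1e-12) / np.maximum(m, 1e-12))
    importance = np.maximum(smr_db - np.asarray(core_bits, dtype=float), 0.0)
    if importance.sum() == 0:
        return np.zeros_like(importance, dtype=int)
    return np.floor(total_bits * importance / importance.sum()).astype(int)
```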
Detection and use of acoustic signal quality indicators
Tue, 19 Aug 2014 08:00:00 EDT
A computer-driven device assists a user in self-regulating speech control of the device. The device processes an input signal representing human speech to compute acoustic signal quality indicators indicating conditions likely to be problematic to speech recognition, and advises the user of those conditions.
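Two simple examples of acoustic signal quality indicators are sketched below: a clipping ratio and an input-level check, each mapped to an advisory message for the user. The thresholds, indicator names, and messages are illustrative assumptions; the patent does not enumerate specific indicators here.
```python
import numpy as np

def signal_quality_indicators(samples, clip_level=0.99, low_level_rms=0.01):
    """Compute two illustrative quality indicators for a speech buffer.

    `samples` is a float array scaled to [-1, 1]. The thresholds are
    assumed values chosen only to make the example concrete.
    """
    samples = np.asarray(samples, dtype=float)
    clipping_ratio = np.mean(np.abs(samples) >= clip_level)   # fraction of clipped samples
    rms = np.sqrt(np.mean(samples ** 2))                      # overall input level

    advice = []
    if clipping_ratio > 0.01:
        advice.append("Input is clipping; please speak further from the microphone.")
    if rms < low_level_rms:
        advice.append("Input level is very low; please speak louder or move closer.")
    return {"clipping_ratio": float(clipping_ratio), "rms": float(rms), "advice": advice}
```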
Use of multiple speech recognition software instances
Tue, 19 Aug 2014 08:00:00 EDT
A wireless communication device is disclosed that accepts recorded audio data from an end-user. The audio data can be in the form of a command requesting user action. Likewise, the audio data can be converted into a text file. The audio data is reduced to a digital file in a format that is supported by the device hardware, such as a .wav, .mp3, .vnf file, or the like. The digital file is sent via secured or unsecured wireless communication to one or more server computers for further processing. In accordance with an important aspect of the invention, the system evaluates the confidence level of the speech recognition process. If the confidence level is high, the system automatically builds the application command or creates the text file for transmission to the communication device. Alternatively, if the confidence of the speech recognition is low, the recorded audio data file is routed to a human transcriber employed by the telecommunications service, who manually reviews the digital voice file and builds the application command or text file. Once the application command is created, it is transmitted to the communication device. As a result of the present invention, speech recognition in the context of communications devices has been shown to be accurate over 90% of the time.
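The confidence-based routing step can be summarized in a few lines. In the sketch below, `recognize`, `human_transcribe`, and `send_to_device` are placeholder callables standing in for the server-side services, and the threshold value is an assumption; the patent does not give one.
```python
CONFIDENCE_THRESHOLD = 0.85   # illustrative value; not specified in the patent

def process_utterance(audio_file, recognize, human_transcribe, send_to_device):
    """Route an utterance by recognition confidence.

    `recognize` returns (text, confidence) for the uploaded audio file;
    `human_transcribe` asks a transcriber for the text; `send_to_device`
    delivers the resulting command or text file back to the handset.
    """
    text, confidence = recognize(audio_file)
    if confidence < CONFIDENCE_THRESHOLD:
        text = human_transcribe(audio_file)   # low confidence: fall back to a person
    send_to_device(text)
    return text
```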
Coding, modification and synthesis of speech segments
Tue, 19 Aug 2014 08:00:00 EDT
The invention relates to a method for speech signal analysis, modification and synthesis comprising: a phase for locating analysis windows by means of an iterative process that determines the phase of the first sinusoidal component and compares the phase value of said component with a predetermined value; a phase for selecting the analysis frames corresponding to an allophone and readjusting the duration and the fundamental frequency according to certain thresholds; and a phase for generating synthetic speech from synthesis frames, taking the information of the closest analysis frame as the spectral information of each synthesis frame and taking as many synthesis frames as the synthetic signal has periods. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants synchronously with the fundamental period.
Dialogue system and a method for executing a fully mixed initiative dialogue (FMID) interaction between a human and a machine
Tue, 19 Aug 2014 08:00:00 EDT
A method for executing a fully mixed initiative dialogue (FMID) interaction between a human and a machine, a dialogue system for a FMID interaction between a human and a machine and a computer readable data storage medium having stored thereon computer code for instructing a computer processor to execute a method for executing a FMID interaction between a human and a machine are provided. The method includes retrieving a predefined grammar setting out parameters for the interaction; receiving a voice input; analyzing the grammar to dynamically derive one or more semantic combinations based on the parameters; obtaining semantic content by performing voice recognition on the voice input; and assigning the semantic content as fulfilling the one or more semantic combinations.
Semi-supervised source separation using non-negative techniques
Tue, 19 Aug 2014 08:00:00 EDT
Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.
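A minimal sketch of semi-supervised non-negative matrix factorization in this spirit follows: the speech dictionary is trained offline and held fixed, while a noise dictionary and all activations are estimated on the mixture itself ("on-the-fly"). The Euclidean cost, multiplicative updates, and the Wiener-style mask are common choices, not details taken from the patent.
```python
import numpy as np

def semi_supervised_nmf(V, W_speech, n_noise=8, n_iter=100, eps=1e-9):
    """Separate speech from noise in a magnitude spectrogram V (freq x time).

    W_speech is a speech dictionary learned from training data and kept
    fixed; the noise dictionary W_noise is learned from the mixture.
    """
    F, T = V.shape
    Ks = W_speech.shape[1]
    rng = np.random.default_rng(0)
    W_noise = rng.random((F, n_noise)) + eps
    H = rng.random((Ks + n_noise, T)) + eps

    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])
        H *= (W.T @ V) / (W.T @ W @ H + eps)          # update all activations
        WH = W @ H
        Hn = H[Ks:]
        W_noise *= (V @ Hn.T) / (WH @ Hn.T + eps)     # update only the noise basis

    W = np.hstack([W_speech, W_noise])
    WH = W @ H + eps
    speech_mask = (W_speech @ H[:Ks]) / WH            # Wiener-style soft mask
    return speech_mask * V, (1.0 - speech_mask) * V   # speech estimate, noise estimate
```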
System and method for combining speech recognition outputs from a plurality of domain-specific speech recognizers via machine learning
Tue, 19 Aug 2014 08:00:00 EDT
Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination.
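The selection-and-combination step can be illustrated with a simple confidence-weighted vote over the domain-specific recognizers' hypotheses. The scoring scheme below is a stand-in for the machine-learned combiner described in the patent; the threshold and tie-breaking rule are assumptions.
```python
from collections import defaultdict

def combine_hypotheses(outputs, min_confidence=0.5):
    """Combine (hypothesis_text, confidence) pairs from several recognizers.

    Candidates below `min_confidence` are discarded; identical hypotheses
    pool their confidence and the best-supported text wins. If nothing
    passes the threshold, the single most confident hypothesis is returned.
    """
    votes = defaultdict(float)
    for text, confidence in outputs:
        if confidence >= min_confidence:
            votes[text.strip().lower()] += confidence
    if not votes:
        return max(outputs, key=lambda o: o[1])[0]
    return max(votes, key=votes.get)

# Usage
outputs = [("call mom", 0.91), ("call tom", 0.55), ("call mom", 0.62), ("kohl mom", 0.31)]
print(combine_hypotheses(outputs))   # "call mom"
```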
Segment-based speaker verification using dynamically generated phrases
Tue, 19 Aug 2014 08:00:00 EDT
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.
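As an illustration of obtaining a candidate phrase that includes at least some of the identified subwords, the sketch below filters candidate phrases by how many enrolled subwords they contain and picks one at random. Representing subwords as lowercase substrings and the `min_hits` qualification rule are simplifications of this sketch, not details from the patent.
```python
import random

def choose_verification_phrase(candidate_phrases, enrolled_subwords, min_hits=2):
    """Pick a dynamically generated verification phrase.

    A candidate qualifies when it contains at least `min_hits` of the
    subwords for which the speaker has stored acoustic data.
    """
    def subwords_in(phrase):
        return {s for s in enrolled_subwords if s in phrase.lower()}

    qualifying = [p for p in candidate_phrases if len(subwords_in(p)) >= min_hits]
    return random.choice(qualifying) if qualifying else None

# Usage
phrases = ["open sesame street", "purple monkey dishwasher", "seventy seven seals"]
print(choose_verification_phrase(phrases, {"sev", "en", "se"}))
```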
Dynamic pass phrase security system (DPSS)
Tue, 19 Aug 2014 08:00:00 EDT
There is disclosed an n-dimensional biometric security system as well as a method of identifying and validating a user through the use of automated random one-time passphrase generation. The use of tailored templates to generate one-time pass phrase text as well as the use of update subscriptions of templates ensures a high level of security. A verification session preferably uses short, text-independent one-time pass phrases and secure audio tokens with master audio generated from an internal text-to-speech security processor. An automated enrollment process may be implemented in an ongoing and seamless fashion with a user's interactions with the system. Various calibration and tuning techniques are also disclosed.
Speaker recognition in a multi-speaker environment and comparison of several voice prints to many
Tue, 19 Aug 2014 08:00:00 EDT
One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
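The one-to-many comparison step might look like the sketch below, where voice prints are represented as fixed-length embedding vectors and scored with cosine similarity. The feature representation, the similarity measure, and the threshold are assumptions for illustration; the patent does not prescribe them.
```python
import numpy as np

def find_matches(customer_print, known_prints, threshold=0.8):
    """Compare one customer voice print against many known voice prints.

    `known_prints` maps an identity to its stored embedding vector.
    Returns (identity, score) pairs above the threshold, best first.
    """
    c = customer_print / (np.linalg.norm(customer_print) + 1e-12)
    matches = []
    for identity, vec in known_prints.items():
        v = vec / (np.linalg.norm(vec) + 1e-12)
        score = float(np.dot(c, v))
        if score >= threshold:
            matches.append((identity, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)
```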
Signal processing apparatus capable of learning a voice command which is unsuccessfully recognized and method of recognizing a voice command thereof
Tue, 19 Aug 2014 08:00:00 EDT
Provided are an apparatus and method for recognizing voice commands, the apparatus including: a voice command recognition unit which recognizes an input voice command; a voice command recognition learning unit which learns a recognition-targeted voice command; and a controller which controls the voice command recognition unit to recognize the recognition-targeted voice command from an input voice command, controls the voice command recognition learning unit to learn the input voice command if the voice command recognition is unsuccessful, and performs a particular operation corresponding to the recognized voice command if the voice command recognition is successful.
Speech recognition repair using contextual information
Tue, 19 Aug 2014 08:00:00 EDT
A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
System and method for adapting automatic speech recognition pronunciation by acoustic model restructuring
Tue, 19 Aug 2014 08:00:00 EDT
Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method includes recognizing, via a processor, additional speech from the target speaker using the custom speech model.
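The "weighted sum of acoustic models" idea can be made concrete with a toy likelihood computation: the restructured model for a dictionary phoneme scores a feature vector as a weighted mixture of the plausible phonemes' models. Using single diagonal Gaussians as stand-ins for full acoustic models, and leaving the weight estimation out, are simplifications of this sketch.
```python
import numpy as np

def gaussian_loglike(x, mean, var):
    """Diagonal-covariance Gaussian log-likelihood of a feature vector."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def custom_phoneme_loglike(x, plausible_models, weights):
    """Score a frame under the restructured model of one dictionary phoneme.

    `plausible_models` is a list of (mean, var) pairs for the phonemes seen
    in the new speaker's lattice; `weights` would be derived from how often
    each plausible phoneme was observed for this dictionary phoneme.
    """
    likes = np.array([np.exp(gaussian_loglike(x, m, v)) for m, v in plausible_models])
    return float(np.log(np.dot(weights, likes) + 1e-300))
```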
Method of and system for improving accuracy in a speech recognition system
Tue, 19 Aug 2014 08:00:00 EDT
A method for transcribing an audio response includes: A. constructing an application including a plurality of queries and a set of expected responses for each query, the set including a plurality of expected responses to each query in a textual form; B. posing each of the queries to a respondent with a querying device; C. receiving an audio response to each query from the respondent; D. performing a speech recognition function on each audio response with an automatic speech recognition device to transcribe each audio response to a textual response to each query; E. recording each audio response with a recording device; and F. comparing, with the automatic speech recognition device, each textual response to the set of expected responses for each corresponding query to determine if each textual response corresponds to any of the expected responses in the set of expected responses for the corresponding query.
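Step F, comparing each textual response against the set of expected responses, can be illustrated with a small normalization-and-match routine. Exact matching after normalization is a simplification of this sketch; a deployed system would likely allow partial or fuzzy matches.
```python
import re

def normalize(text):
    """Lowercase and strip punctuation so comparisons ignore formatting."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).split()

def matches_expected(transcribed, expected_responses):
    """Return the expected response matched by a transcribed answer, if any."""
    t = normalize(transcribed)
    for expected in expected_responses:
        if normalize(expected) == t:
            return expected
    return None

# Usage
print(matches_expected("Yes, I do.", ["yes i do", "no i do not"]))  # "yes i do"
```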
Voice activity detector, voice activity detection program, and parameter adjusting method
Tue, 19 Aug 2014 08:00:00 EDT
Judgment result deriving means 74 makes a judgment between active voice and non-active voice for every unit time of a time series of voice data in which the number of active voice segments and the number of non-active voice segments are already known (the numbers of labeled active voice and labeled non-active voice segments). It then shapes the active voice and non-active voice segments resulting from the judgment by comparing the length of each segment during which the voice data is consecutively judged to be active voice, or consecutively judged to be non-active voice, with a duration threshold. Segments number calculating means 75 calculates the number of active voice segments and the number of non-active voice segments. Duration threshold updating means 76 updates the duration threshold so that the difference between the calculated number of active voice segments and the number of labeled active voice segments, or the difference between the calculated number of non-active voice segments and the number of labeled non-active voice segments, decreases.
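One way to picture the duration-threshold adaptation is the sketch below: segments are counted after shaping with the current threshold, and the threshold is nudged so the count moves toward the labeled count. The single-step update rule is an assumption; the description only requires that the count difference decrease.
```python
def count_segments(frames, is_active, min_duration):
    """Count active (or non-active) runs at least `min_duration` frames long."""
    count, run = 0, 0
    for f in frames:
        if f == is_active:
            run += 1
        else:
            if run >= min_duration:
                count += 1
            run = 0
    if run >= min_duration:
        count += 1
    return count

def update_duration_threshold(frames, labeled_active, threshold, step=1):
    """One illustrative update of the duration threshold.

    `frames` holds the per-unit-time active/non-active judgments for the
    labeled data. Too many detected active segments -> raise the threshold
    (short segments get merged away); too few -> lower it.
    """
    counted = count_segments(frames, True, threshold)
    if counted > labeled_active:
        return threshold + step
    if counted < labeled_active:
        return max(1, threshold - step)
    return threshold
```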
System, method and program for speech processing
Tue, 19 Aug 2014 08:00:00 EDT
The present invention relates to a system, method and program for speech recognition. In an embodiment of the invention, a method for processing a speech signal consists of receiving a power spectrum of a speech signal and generating a log power spectrum signal of the power spectrum. The method further consists of performing a discrete cosine transformation on the log power spectrum signal and cutting off the cepstrum upper and lower terms of the discrete cosine transformed signal. The method further consists of performing an inverse discrete cosine transformation on the signal from which the cepstrum upper and lower terms are cut off, converting the inverse discrete cosine transformed signal so as to bring the signal back to a power spectrum domain, and filtering the power spectrum of the speech signal by using, as a filter, the signal which is brought back to the power spectrum domain.
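The processing chain just described maps directly onto a short routine: log power spectrum, DCT, lifter out the lowest and highest cepstral terms, inverse DCT, exponentiate back to the power-spectrum domain, then apply the result as a filter. The particular cut-off indices and the plain-multiplication filtering in the sketch are illustrative choices, not values from the description.
```python
import numpy as np
from scipy.fftpack import dct, idct

def cepstral_smooth_filter(power_spectrum, low_cut=2, high_cut=30, eps=1e-10):
    """Build a smoothing filter from a power spectrum via the cepstral domain."""
    log_spec = np.log(power_spectrum + eps)            # log power spectrum
    cepstrum = dct(log_spec, norm='ortho')             # discrete cosine transform
    liftered = np.zeros_like(cepstrum)
    liftered[low_cut:high_cut] = cepstrum[low_cut:high_cut]   # cut upper and lower terms
    smoothed_log = idct(liftered, norm='ortho')        # inverse DCT
    smoothed_spec = np.exp(smoothed_log)               # back to the power-spectrum domain
    # Filtering is shown as a plain multiplication here; how the filter is
    # normalized and applied is application-specific.
    return smoothed_spec, smoothed_spec * power_spectrum
```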
Character-based automated shot summarization
Tue, 19 Aug 2014 08:00:00 EDT
Methods, devices, systems and tools are presented that allow the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms. High-content media files are shortened in a manner that preserves important details, by splitting the files into segments, rating the segments, and reassembling preferred segments into a final abridged piece. Summarization of media can be customized by user selection of criteria, and opens new possibilities for delivering entertainment, news, and information in the form of dense, information-rich content that can be viewed by means of broadcast or cable distribution, “on-demand” distribution, internet and cell phone digital video streaming, or can be downloaded onto an iPod™ and other portable video playback devices.
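The split-rate-reassemble idea can be sketched as a simple budgeted selection: keep the highest-rated segments until a target fraction of the original duration is reached, then restore chronological order. The rating itself (e.g., character presence or dialogue density) is outside the scope of this sketch, and the target fraction is an assumed parameter.
```python
def summarize(segments, scores, target_fraction=0.25):
    """Abridge a presentation by keeping its best-rated segments.

    `segments` are (start, end, payload) tuples in original order and
    `scores` are their ratings.
    """
    total = sum(end - start for start, end, _ in segments)
    budget = target_fraction * total
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    kept, used = [], 0.0
    for i in ranked:
        start, end, _ = segments[i]
        if used + (end - start) <= budget:
            kept.append(i)
            used += end - start
    return [segments[i] for i in sorted(kept)]   # reassemble in chronological order
```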
Environment recognition of audio input
Tue, 19 Aug 2014 08:00:00 EDT
The present disclosure introduces a new technique for environmental recognition of audio input using feature selection. In one embodiment, audio data may be identified using feature selection. A plurality of audio descriptors may be ranked by calculating a Fisher's discriminant ratio for each audio descriptor. Next, a configurable number of highest ranking audio descriptors based on the Fisher's discriminant ratio of each audio descriptor are selected to obtain a selected feature set. The selected feature set is then applied to audio data. Other embodiments are also described.
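The ranking-by-Fisher's-discriminant-ratio step can be written out directly. For two classes the ratio is (m1 - m2)^2 / (v1 + v2); summing the pairwise ratios for more than two environment classes is one common convention, which the disclosure does not fix, so treat it as an assumption of this sketch.
```python
import numpy as np
from itertools import combinations

def fisher_discriminant_ratio(feature_values, labels):
    """FDR of one audio descriptor over all pairs of environment classes."""
    classes = sorted(set(labels))
    stats = {c: (feature_values[labels == c].mean(),
                 feature_values[labels == c].var() + 1e-12) for c in classes}
    return sum((stats[a][0] - stats[b][0]) ** 2 / (stats[a][1] + stats[b][1])
               for a, b in combinations(classes, 2))

def select_features(X, labels, n_keep):
    """Rank descriptors (columns of X) by FDR and keep the top `n_keep`."""
    labels = np.asarray(labels)
    ratios = np.array([fisher_discriminant_ratio(X[:, j], labels)
                       for j in range(X.shape[1])])
    return np.argsort(ratios)[::-1][:n_keep]   # indices of the selected feature set
```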
Methods and apparatus for suppressing ambient noise using multiple audio signals
Tue, 19 Aug 2014 08:00:00 EDT
A method for suppressing ambient noise using multiple audio signals may include providing at least two audio signals captured by at least two electro-acoustic transducers. The at least two audio signals may include desired audio and ambient noise. The method may also include performing beamforming on the at least two audio signals in order to obtain a desired audio reference signal that is separate from a noise reference signal.
Apparatus and method for modifying an input audio signal
Tue, 19 Aug 2014 08:00:00 EDT
An apparatus for modifying an input audio signal has an excitation determiner, a storage device and a signal modifier. The excitation determiner determines a value of an excitation parameter for a subband of a plurality of subbands of the input audio signal based on the energy content of the subband. The storage device stores a lookup table containing a plurality of spectral weighting factors, each associated with a predefined value of the excitation parameter and a subband of the plurality of subbands. The storage device provides the spectral weighting factor corresponding to the determined value of the excitation parameter and to the subband for which that value was determined. The signal modifier then modifies the content of that subband, based on the provided spectral weighting factor, to provide a modified subband.
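A table-driven weighting of this kind might look like the sketch below. Using the band's mean energy quantized to the nearest predefined level as the excitation parameter, and the `weight_table[level][band]` layout, are assumptions of this sketch rather than details from the abstract.
```python
import numpy as np

def modify_subbands(subband_signals, weight_table, levels):
    """Apply table-driven spectral weighting per sub-band.

    `subband_signals` is a list of sample arrays, one per sub-band;
    `levels` are the predefined excitation-parameter values the table is
    indexed by; `weight_table[level][band]` is the stored weighting factor.
    """
    modified = []
    for band, samples in enumerate(subband_signals):
        samples = np.asarray(samples, dtype=float)
        energy = float(np.mean(samples ** 2))
        level = min(levels, key=lambda lv: abs(lv - energy))   # nearest predefined value
        modified.append(samples * weight_table[level][band])
    return modified
```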
Method, apparatus and system for linear prediction coding analysis
Tue, 19 Aug 2014 08:00:00 EDT
The present invention relates to communication technologies and discloses a method, an apparatus and a system for Linear Prediction Coding (LPC) analysis to improve LPC prediction performance and simplify analysis operation. The method includes: obtaining signal feature information of at least one sample point of input signals; comparing and analyzing the signal feature information to obtain an analysis result; selecting a window function according to the analysis result to perform adaptive windowing for the input signals and obtain windowed signals; and processing the windowed signals to obtain an LPC coefficient for linear prediction. The embodiments of the present invention are applicable to LPC.
Speech decoding and encoding apparatus for lost frame concealment using predetermined number of waveform samples peripheral to the lost frame
Tue, 19 Aug 2014 08:00:00 EDT
An audio decoding device is provided that is capable of keeping down the amount of information required for a lost frame concealment process while maintaining encoding efficiency. A decoded sound source generator generates the lost frame's CELP decoded sound source signal. A pitch pulse information decoder CELP-decodes pitch pulse position information and pitch pulse amplitude information. A pitch pulse waveform learner learns a pitch pulse learning waveform from a past frame, in advance of the lost frame. A convolution adjuster amplitude-adjusts the pitch pulse learning waveform according to the pitch pulse amplitude information, considering a predetermined number of waveform samples peripheral to a peak position of the lost frame's CELP decoded excitation signal, and convolves the amplitude-adjusted pitch pulse waveform onto the time axis according to the pitch pulse position information. A sound source signal corrector adds the pitch pulse waveform convolved onto the time axis to, or substitutes it for, the lost frame's decoded sound source signal.
Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream
Tue, 19 Aug 2014 08:00:00 EDT
An apparatus for decoding data segments representing a time-domain data stream, a data segment being encoded in the time domain or in the frequency domain, a data segment being encoded in the frequency domain having successive blocks of data representing successive and overlapping blocks of time-domain data samples. The apparatus includes a time-domain decoder for decoding a data segment being encoded in the time domain and a processor for processing the data segment being encoded in the frequency domain and output data of the time-domain decoder to obtain overlapping time-domain data blocks. The apparatus further includes an overlap/add-combiner for combining the overlapping time-domain data blocks to obtain a decoded data segment of the time-domain data stream.
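The overlap/add-combiner step at the end of the pipeline can be illustrated in a few lines. The sketch assumes equal-length blocks and a fixed hop; perfect reconstruction additionally requires analysis/synthesis windows satisfying the usual overlap-add constraint, which is not checked here.
```python
import numpy as np

def overlap_add(blocks, hop):
    """Combine overlapping time-domain blocks by overlap-add.

    `blocks` is a list of equal-length (windowed) blocks and `hop` is the
    advance between successive blocks in samples.
    """
    block_len = len(blocks[0])
    out = np.zeros(hop * (len(blocks) - 1) + block_len)
    for i, block in enumerate(blocks):
        start = i * hop
        out[start:start + block_len] += block   # accumulate the overlapping regions
    return out
```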
Method and system for downloading additional search results into electronic dictionaries
Tue, 19 Aug 2014 08:00:00 EDT
In one embodiment, the invention provides a method for a system to provide information based on a query, the method comprising: performing a first search of at least one first source for information responsive to the query; providing a result of the search to a user; searching documents using at least a part of the result of the search; providing the user with at least one example of usage of the result of the search obtained from the searching of stored documents; based on user input, performing a second search of at least one second source for information responsive to the query; and providing a result of said second search to the user.
Multi-language relevance-based indexing and search
Tue, 19 Aug 2014 08:00:00 EDT
Indexing and querying in multiple languages is accomplished using an ordered chain of filters and/or other such components. When receiving information to be indexed or for a query, the information can be tokenized and typed based at least in part on the language of each token. The character types can be adjusted if appropriate for the languages, and the tokens can be further segmented using a dictionary for the respective language types. Once appropriate tokens are determined, relevant synonyms in each appropriate language can be determined and typed accordingly. If necessary, the case of the tokens and synonyms can be adjusted, and the tokens can be further segmented based on punctuation. The terms and synonyms then can be used as part of the index or as part of the search query to include other terms or phrases based on relevance to the original information.
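A stripped-down analysis chain in this spirit is sketched below; the same chain could feed both the indexer and the query parser. The `language_of`, `segmenters`, and `synonyms` components are caller-supplied placeholders assumed for this sketch, not components named in the patent.
```python
def analyze(text, language_of, segmenters, synonyms):
    """Minimal multi-language token chain: tokenize, type, segment, expand.

    `language_of(token)` guesses a token's language, `segmenters[lang]`
    splits a token further using a dictionary for that language, and
    `synonyms[lang]` maps a token to related terms in that language.
    """
    terms = []
    for raw in text.split():
        lang = language_of(raw)
        for token in segmenters.get(lang, lambda t: [t])(raw):
            token = token.lower()
            terms.append((token, lang))
            for syn in synonyms.get(lang, {}).get(token, []):
                terms.append((syn.lower(), lang))
    return terms
```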
Techniques for inserting diacritical marks to text input via a user device
Tue, 19 Aug 2014 08:00:00 EDT
A computer-implemented method for assisting a user to input Vietnamese text to a user device lacking a subset of characters in a Vietnamese alphabet includes receiving a character input by a user, determining three words previously input by the user, the three words having already had diacritical marks inserted, transmitting the three words and the character to a server via a network, receiving first and second information corresponding to the character from the server via the network, the first and second information generated at the server based on a context of the three words, the context determined at the server using a language model, the first and second information indicating whether the character requires a diacritical mark and a specific diacritical mark, respectively, generating a modified character comprising a character in the Vietnamese alphabet based on the character and the first and second information, and displaying the modified character.
Linguistically-adapted structural query annotation
Tue, 19 Aug 2014 08:00:00 EDT
A system and method for natural language processing of queries are provided. A lexicon includes text elements that are recognized as being a proper noun when capitalized. A natural language query includes a sequence of text elements including words. The query is processed. The processing includes a preprocessing step, in which part of speech features are assigned to the text elements in the query. This includes identifying, from a lexicon, a text element in the query which starts with a lowercase letter and assigning recapitalization information to the text element in the query, based on the lexicon. This information includes a part of speech feature of the capitalized form of the text element. Then parts of speech for the text elements in the query are disambiguated, which includes applying rules for recapitalizing text elements based on the recapitalization information.
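The recapitalization-annotation step in the preprocessing stage can be pictured with the sketch below: a lowercase token whose capitalized form appears in the lexicon carries the capitalized form's part of speech so that later disambiguation rules can decide whether to recapitalize it. The lexicon format and field names are assumptions of this sketch.
```python
def assign_recapitalization(tokens, lexicon):
    """Attach recapitalization information to lowercase query tokens.

    `lexicon` maps a capitalized form (e.g. "Paris") to the part of speech
    it has when capitalized (e.g. "PROPER_NOUN").
    """
    annotated = []
    for token in tokens:
        info = None
        if token and token[0].islower():
            capitalized = token[0].upper() + token[1:]
            if capitalized in lexicon:
                info = {"form": capitalized, "pos": lexicon[capitalized]}
        annotated.append({"token": token, "recap": info})
    return annotated

# Usage: in "who is the mayor of paris", the token "paris" receives
# recapitalization information marking "Paris" as a proper noun.
lex = {"Paris": "PROPER_NOUN"}
print(assign_recapitalization("who is the mayor of paris".split(), lex))
```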
