Patents

 
Adaptive controller for a configurable audio coding system
Tue, 26 Aug 2014 08:00:00 EDT
An adaptive controller for a configurable audio coding system comprises a fuzzy logic controller modified to use reinforcement learning, creating an intelligent control system. With no knowledge of the external system into which it is placed, the audio coding system, under the control of the adaptive controller, is capable of adapting its coding configuration to achieve user-set performance goals.
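
A minimal sketch of the control loop this describes, under loose assumptions: the discrete bitrate actions, the reward shaping, and the toy quality model below are illustrative stand-ins, not the patented fuzzy/reinforcement-learning design.

```python
import random

class AdaptiveBitrateController:
    """Toy RL-style controller nudging a coder toward a user-set quality goal."""

    def __init__(self, bitrates, target_quality, epsilon=0.1, lr=0.2):
        self.bitrates = bitrates              # candidate coding configurations
        self.target = target_quality          # user-set performance goal
        self.epsilon = epsilon                # exploration rate
        self.lr = lr                          # learning rate
        self.q = {b: 0.0 for b in bitrates}   # action-value estimates

    def choose(self):
        if random.random() < self.epsilon:    # explore occasionally
            return random.choice(self.bitrates)
        return max(self.q, key=self.q.get)    # otherwise exploit the best known

    def update(self, bitrate, measured_quality):
        # Reward grows as measured quality approaches the goal.
        reward = -abs(measured_quality - self.target)
        self.q[bitrate] += self.lr * (reward - self.q[bitrate])

ctrl = AdaptiveBitrateController([16, 32, 64, 128], target_quality=4.0)
for _ in range(200):
    b = ctrl.choose()
    quality = min(5.0, 1.0 + b / 40.0)        # stand-in for the unknown external system
    ctrl.update(b, quality)
print(max(ctrl.q, key=ctrl.q.get))            # settles on the bitrate closest to the goal
```
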
Apparatus, system, and method for natural language processing
Tue, 26 Aug 2014 08:00:00 EDT
Various embodiments are described for searching and retrieving documents based on a natural language input. A computer-implemented natural language processor electronically receives a natural language input phrase from an interface device and attributes a concept to the phrase. It searches a database to identify one or more documents associated with the attributed concept to be included in a response to the natural language input phrase. The natural language processor maintains the attributed concepts during an interactive session and resolves ambiguous input patterns in the natural language input phrase. The natural language processor includes a processor, a memory and/or storage component, and an input/output device.
Thematic clustering
Tue, 26 Aug 2014 08:00:00 EDT
A data set is clustered into one or more initial clusters using a first term space. Initial themes for the initial clusters are determined. The first term space is reduced to create a reduced term space. At least a portion of the data set is reclustered into one or more baby clusters using the reduced term space. One or more singletons are reassigned to form one or more renovated clusters from the baby clusters. A renovated theme is determined for at least some of the renovated clusters. One or more of the renovated clusters and their respective themes are provided as output.
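
A rough sketch of the cluster / reduce-term-space / recluster / reassign flow, assuming scikit-learn with TF-IDF vectors and k-means; the parameters and theme extraction are illustrative choices, not the patented method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def top_terms(centroid, terms, k=3):
    return [terms[i] for i in np.argsort(centroid)[::-1][:k]]

docs = ["audio codec bitrate", "speech codec quality", "text search query",
        "document search index", "speech recognition model", "audio quality test"]

# 1) Initial clustering in the full term space; derive initial themes.
vec = TfidfVectorizer()
X = vec.fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
terms = vec.get_feature_names_out()
themes = [top_terms(c, terms) for c in km.cluster_centers_]

# 2) Reduce the term space to the theme terms and recluster ("baby clusters").
keep = sorted({t for theme in themes for t in theme})
X2 = TfidfVectorizer(vocabulary=keep).fit_transform(docs)
km2 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X2)

# 3) Reassign singletons to the nearest other centroid to form renovated clusters.
labels = km2.labels_.copy()
counts = np.bincount(labels)
for i, lab in enumerate(labels):
    if counts[lab] == 1:
        dists = ((km2.cluster_centers_ - X2[i].toarray()) ** 2).sum(axis=1)
        dists[lab] = np.inf
        labels[i] = int(np.argmin(dists))

print(labels.tolist(), themes)
```
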
Audio encoding device, method, and program which controls the number of time groups in a frame using three successive time group energies
Tue, 26 Aug 2014 08:00:00 EDT
[PROBLEMS] To provide a high-quality audio signal encoding technique by controlling the number of time/frequency groups in a frame. [MEANS FOR SOLVING PROBLEMS] An audio encoding device includes: a time group boundary candidate position extraction unit (101) for analyzing a sub-band signal (2001) obtained by frequency-converting an input signal and calculating candidate positions of time group boundaries by comparing the change in energy of three successive time groups; a time group quantity generation unit (103) for outputting a maximum value of the time group quantity; a time group selection unit (102) for generating a time group quantity not greater than the maximum by using the candidate positions; and a frequency group generation unit (104) for generating a frequency group by using the generated time group information. The device generates time/frequency groups that accurately reflect changes in the input signal while controlling the number of time/frequency groups in the frame.
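
The boundary-candidate step might look roughly like the following sketch; the threshold ratio and selection rule are assumed values chosen only to illustrate comparing the energies of three successive time groups.

```python
def boundary_candidates(energies, ratio=2.0):
    """energies: per-time-group energy of the sub-band signal."""
    candidates = []
    for t in range(1, len(energies) - 1):
        prev, cur, nxt = energies[t - 1], energies[t], energies[t + 1]
        # A sharp rise (attack) or fall relative to neighbours marks a candidate.
        if cur > ratio * prev or nxt > ratio * cur or cur * ratio < prev:
            candidates.append(t)
    return candidates

def select_boundaries(candidates, energies, max_groups):
    # Keep at most max_groups - 1 boundaries, preferring the strongest changes.
    strongest = sorted(candidates,
                       key=lambda t: abs(energies[t] - energies[t - 1]),
                       reverse=True)
    return sorted(strongest[: max_groups - 1])

e = [1.0, 1.1, 0.9, 6.0, 6.2, 1.0, 0.8, 0.9]
cands = boundary_candidates(e)
print(cands, select_boundaries(cands, e, max_groups=3))
```
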
Network/peer assisted speech coding
Tue, 26 Aug 2014 08:00:00 EDT
A communications network is used to transfer user attribute information about participants in a communication session to their respective communication terminals for storage and use thereon to configure a speech codec to operate in a speaker-dependent manner, thereby improving speech coding efficiency. In a network-assisted model, the user attribute information is stored on the communications network and selectively transmitted to the communication terminals while in a peer-assisted model, the user attribute information is derived by and transferred between communication terminals.
Voice recognition device
Tue, 26 Aug 2014 08:00:00 EDT
A voice recognition device includes a voice input unit (11) for inputting the voice of an uttered button name and converting it into an electric signal, a voice recognition processing unit (12) for performing a voice recognition process on the sound signal sent to it, as the electric signal, from the voice input unit, a button candidate detecting unit (13) for detecting, as a button candidate, a button having a button name which partially matches the voice recognition result acquired by the voice recognition processing unit, a display control unit (15) for producing, when a plurality of button candidates are detected by the button candidate detecting unit, a screen showing a state in which at least one of the button candidates is selected, and a display unit (16) for displaying the screen produced by the display control unit.
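
A minimal sketch of the partial-match and pre-selection behaviour; the button names and matching rule are hypothetical examples, not the patented units.

```python
def find_button_candidates(recognition_result, button_names):
    result = recognition_result.lower()
    # Partial match in either direction counts as a candidate.
    return [name for name in button_names
            if name.lower() in result or result in name.lower()]

buttons = ["Navigation", "Audio Settings", "Audio Balance", "Phone"]
candidates = find_button_candidates("audio", buttons)
if len(candidates) > 1:
    # Emulate the display control unit: show all candidates, pre-select one.
    screen = [("[x] " if i == 0 else "[ ] ") + name
              for i, name in enumerate(candidates)]
    print("\n".join(screen))
elif candidates:
    print("Activating:", candidates[0])
```
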
Method and system for packetised content streaming optimisation
Tue, 26 Aug 2014 08:00:00 EDT
A method of determining the speech content of a packet carrying speech-encoded data that is missing from a speech segment communicated in a packetized data stream using at least one VoIP link between a server platform and a client platform, the method comprising, at the client platform: receiving a plurality of packets carrying speech-encoded data forming said packetized data stream; processing each received packet to determine a unique message segment identifier associated with a speech segment of the received packet; processing each received packet to determine if it contains another unique message segment identifier associated with a previously received packet carrying encoded speech data; determining if the unique message segment identifier for the received packet exists in storage means provided on the client platform, and if not, storing the received packet in association with its unique message segment identifier; processing each received packet to determine a sequence identifier; and checking if the sequence identifier is contiguous in sequence with a previously received packet stored locally on said client platform, and if not, determining the speech content of one or more missing packets in the sequence sent by the server platform to the client platform by retrieving a packet from said storage means having the same unique message segment identifier as the missing packet.
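
A hedged sketch of the client-side bookkeeping: packets carry a message-segment identifier plus a sequence identifier, a gap in the sequence flags missing speech content, and content is recovered from a stored packet with the same segment identifier. The class and field names are assumptions for illustration.

```python
from collections import defaultdict

class SegmentStore:
    def __init__(self):
        self.segments = defaultdict(dict)   # segment_id -> {sequence: payload}

    def receive(self, segment_id, seq, payload):
        missing = []
        seqs = self.segments[segment_id]
        if seqs:
            last = max(seqs)
            # Non-contiguous sequence: every skipped number is a lost packet.
            missing = [(segment_id, s) for s in range(last + 1, seq)]
        seqs[seq] = payload
        return missing

    def recover(self, segment_id, seq):
        # Retrieve previously stored content carrying the same segment identifier,
        # e.g. a repeated prompt already received earlier in the call.
        return self.segments.get(segment_id, {}).get(seq)

store = SegmentStore()
store.receive("greeting", 0, b"hel")
print(store.receive("greeting", 2, b"lo"))   # reports ('greeting', 1) as missing
print(store.recover("greeting", 0))          # b'hel'
```
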
Accelerometer-based control of wearable audio-reporting watches
Tue, 26 Aug 2014 08:00:00 EDT
Accelerometer-based detection for controlling audio-reporting watches, resulting in button-free operation. A wristwatch can use an accelerometer to detect the orientation and/or movement of a user's wrist and subsequently activate audio time reporting, without requiring the user to find and push a small button. For example, a talking wristwatch can use this method to automatically report the time whenever a user moves or orients his or her wrist to a natural position for listening. A position such as one in close proximity to the ear can additionally facilitate private listening without disturbing others. Furthermore, the wristwatch can report time using personalized audio time components that the user has previously recorded, so that reporting is in a custom voice or language. In such applications, accelerometer-based control of audio-reporting watches offers significant advantages over conventional means of control, particularly in terms of ease of use and durability.
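
Illustrative only: trigger audio time reporting when the accelerometer reading is close to an assumed "listening" pose. The axis convention, thresholds, and the speak_time() stand-in are hypothetical.

```python
import math
import time

LISTEN_POSE = (0.0, 0.7, 0.7)      # assumed gravity vector when the wrist is near the ear
TOLERANCE = 0.4                    # how close the pose must be to trigger

def near_listening_pose(accel, pose=LISTEN_POSE, tol=TOLERANCE):
    dist = math.sqrt(sum((a - p) ** 2 for a, p in zip(accel, pose)))
    return dist < tol

def speak_time():
    t = time.localtime()
    print(f"It is {t.tm_hour:02d}:{t.tm_min:02d}")   # stand-in for recorded audio playback

state = {"armed": True}

def on_accelerometer_sample(accel):
    if near_listening_pose(accel):
        if state["armed"]:            # report once per wrist raise, not continuously
            speak_time()
            state["armed"] = False
    else:
        state["armed"] = True

for sample in [(0.0, 0.0, 1.0), (0.1, 0.6, 0.8), (0.0, 0.0, 1.0)]:
    on_accelerometer_sample(sample)
```
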
Methods and system for grammar fitness evaluation as speech recognition error predictor
Tue, 26 Aug 2014 08:00:00 EDT
A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during a computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement.
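
A rough sketch of the idea: align two statements word by word, treat the unaligned spans as potential confusion zones, and score how confusable their words are. Character-level similarity stands in here for a real phonetic-pronunciation comparison.

```python
import difflib

def confusion_zones(stmt_a, stmt_b):
    a, b = stmt_a.split(), stmt_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    zones = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":                       # span outside the alignment regions
            zones.append((a[i1:i2], b[j1:j2]))
    return zones

def confusion_probability(words_a, words_b):
    # Proxy for acoustic confusability: best-matching pairing across the two spans.
    scores = [difflib.SequenceMatcher(a=x, b=y).ratio()
              for x in words_a for y in words_b]
    return max(scores, default=0.0)

s1 = "call my office phone"
s2 = "call my offense line"
for za, zb in confusion_zones(s1, s2):
    print(za, zb, round(confusion_probability(za, zb), 2))
```
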
Text input method which assigns initial phoneme, medial phoneme, and final phoneme to shape of watch
Tue, 26 Aug 2014 08:00:00 EDT
The present invention relates to a text display method and a text input method. The text display method creates a syllable by combining one or more consonants and vowels based on a combination rule of initial, medial and final phonemes, wherein the initial phonemes are assigned with respective consonants based upon the length and direction of the hour hand, the medial phonemes are assigned with respective vowels based upon the length and direction of the minute hand, and the final phonemes are shaped by the length of the second hand and assigned with consonants similar to the initial phonemes but with a shorter length. As such, not only can people quickly learn how to read and write, they can also input text easily.
Method and apparatus for performing voice activity detection
Tue, 26 Aug 2014 08:00:00 EDT
This application relates to a voice activity detection (VAD) apparatus configured to provide a voice activity detection decision for an input audio signal. The VAD apparatus includes a state detector and a voice activity calculator. The state detector is configured to determine, based on the input audio signal, a current working state of the VAD apparatus among at least two different working states. Each of the at least two different working states is associated with a corresponding working state parameter decision set which includes at least one voice activity decision parameter. The voice activity calculator is configured to calculate a voice activity detection parameter value for the at least one voice activity decision parameter of the working state parameter decision set associated with the current working state, and to provide the voice activity detection decision by comparing the calculated voice activity detection parameter value with a threshold.
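
A minimal sketch of the two-stage structure, with an assumed parameterization (two working states, a smoothed-energy decision parameter, per-state thresholds) rather than the patented decision sets.

```python
import numpy as np

STATES = {
    # working state -> its decision-parameter smoothing factor and threshold
    "quiet": {"alpha": 0.9, "threshold": 0.02},
    "noisy": {"alpha": 0.7, "threshold": 0.10},
}

def detect_state(noise_floor):
    # Assumed rule: a raised noise floor switches the detector to its noisy state.
    return "noisy" if noise_floor > 0.01 else "quiet"

def vad_decisions(frames):
    noise_floor, smoothed, decisions = 1e-4, 0.0, []
    for frame in frames:
        energy = float(np.mean(frame ** 2))
        params = STATES[detect_state(noise_floor)]
        # Decision parameter: smoothed frame energy above the tracked noise floor.
        smoothed = params["alpha"] * smoothed + (1 - params["alpha"]) * energy
        active = smoothed - noise_floor > params["threshold"]
        decisions.append(active)
        if not active:                         # update the noise floor in silence
            noise_floor = 0.95 * noise_floor + 0.05 * energy
    return decisions

rng = np.random.default_rng(0)
silence = [0.01 * rng.standard_normal(160) for _ in range(5)]
speech = [0.5 * rng.standard_normal(160) for _ in range(5)]
print(vad_decisions(silence + speech))         # False for silence, True for speech
```
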
Speaker verification in a health monitoring system
Tue, 26 Aug 2014 08:00:00 EDT
A method for verifying that a person is registered to use a telemedical device includes identifying an unprompted trigger phrase in words spoken by a person and received by the telemedical device. The telemedical device prompts the person to state a name of a registered user and optionally prompts the person to state health tips for the person. The telemedical device verifies that the person is the registered user using utterance data generated from the unprompted trigger phrase, name of the registered user, and health tips.
Methods and apparatus for generating, updating and distributing speech recognition models
Tue, 26 Aug 2014 08:00:00 EDT
Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability. Voice dialing, telephone control and/or other services are provided by the speech processing facility in response to speech recognition results.
Unsupervised and active learning in automatic speech recognition for call classification
Tue, 26 Aug 2014 08:00:00 EDT
Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Ones of the automatically transcribed data as well as ones having a corresponding manual transcription are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model is trained for call classification from the mined audio data to produce a language model.
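
The selection step could be sketched with a confidence-based criterion (an assumption; the patent's intelligent selection may differ): automatically transcribe the unlabeled utterances, then send the least-confident ones for manual transcription.

```python
def select_for_transcription(utterances, recognizer, budget):
    """utterances: raw audio items; recognizer returns (hypothesis, confidence)."""
    scored = [(recognizer(u)[1], u) for u in utterances]
    scored.sort(key=lambda pair: pair[0])        # lowest confidence first
    return [u for _, u in scored[:budget]]

# Toy recognizer: the confidence is packaged with each item for demonstration.
data = [("utt1", 0.92), ("utt2", 0.35), ("utt3", 0.61), ("utt4", 0.80)]
recognizer = lambda item: ("<hyp>", item[1])
print(select_for_transcription(data, recognizer, budget=2))   # utt2 and utt3
```
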
Large vocabulary binary speech recognition
Tue, 26 Aug 2014 08:00:00 EDT
This invention describes methods for implementing human speech recognition. The methods described here use sub-events, which are sounds between spaces (typically a fully spoken word), that are then compared with a library of sub-events. Each sub-event is packaged with its own speech recognition function as an individual unit. This invention illustrates how this model can be used as a large-vocabulary speech recognition system.
Speech processing apparatus and speech processing method
Tue, 26 Aug 2014 08:00:00 EDT
A signal portion is extracted from an input signal per frame having a specific duration, generating a per-frame input signal. The per-frame input signal in the time domain is converted into the frequency domain, generating a spectral pattern. Peak spectra are detected in the spectral pattern. Among the peak spectra, a harmonic spectrum is determined that has a harmonic structure showing a relationship between a fundamental pitch and its harmonic overtones.
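
A rough numerical sketch with assumed parameters: frame and window the signal, take its spectrum, pick peak spectra, and keep the peaks that line up with integer multiples of a fundamental, i.e. that exhibit a harmonic structure.

```python
import numpy as np

def harmonic_peaks(signal, sr, frame_len=1024, tol_hz=15.0):
    frame = signal[:frame_len] * np.hanning(frame_len)
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    # Peak spectra: local maxima well above the average spectral level.
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] > spectrum[i - 1] and spectrum[i] > spectrum[i + 1]
             and spectrum[i] > 5 * spectrum.mean()]
    if not peaks:
        return []
    f0 = freqs[peaks[0]]                    # assume the lowest strong peak is the pitch
    return [freqs[i] for i in peaks
            if abs(freqs[i] / f0 - round(freqs[i] / f0)) * f0 < tol_hz]

sr = 16000
t = np.arange(2048) / sr
sig = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t) \
      + 0.3 * np.sin(2 * np.pi * 1050 * t)      # last tone is not harmonic
print([round(f, 1) for f in harmonic_peaks(sig, sr)])   # keeps ~200 and ~400 Hz only
```
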
Sound processing apparatus, sound processing method and program
Tue, 26 Aug 2014 08:00:00 EDT
A sound processing apparatus is provided. The apparatus includes an input correction unit that corrects a difference between characteristics of a first input sound input from a first input apparatus and characteristics of a second input sound input from a second input apparatus. The apparatus further includes a sound separation unit that separates the first input sound corrected by the input correction unit and the second input sound into a plurality of sounds. The apparatus further includes a sound type estimation unit that estimates sound types of the plurality of sounds. The apparatus further includes a mixing ratio calculation unit that calculates a mixing ratio of each sound in accordance with the estimated sound type. The apparatus further includes a sound mixing unit that mixes the plurality of sounds separated by the sound separation unit in the mixing ratio calculated by the mixing ratio calculation unit.
Global speech user interface
Tue, 26 Aug 2014 08:00:00 EDT
A global speech user interface (GSUI) comprises an input system to receive a user's spoken command, a feedback system along with a set of feedback overlays to give the user information on the progress of his spoken requests, a set of visual cues on the television screen to help the user understand what he can say, a help system, and a model for navigation among applications. The interface is extensible to make it easy to add new applications.
Character-based automated text summarization
Tue, 26 Aug 2014 08:00:00 EDT
Methods, devices, systems and tools are presented that allow the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms. High-content media files are shortened in a manner that preserves important details, by splitting the files into segments, rating the segments, and reassembling preferred segments into a final abridged piece. Summarization of media can be customized by user selection of criteria, and opens new possibilities for delivering entertainment, news, and information in the form of dense, information-rich content that can be viewed by means of broadcast or cable distribution, “on-demand” distribution, internet and cell phone digital video streaming, or can be downloaded onto an iPod™ and other portable video playback devices.
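
A simplified sketch of the split / rate / reassemble flow for text; the term-frequency salience score is an assumption standing in for whatever rating the patent actually applies to segments.

```python
import re
from collections import Counter

def summarize(text, max_segments=2):
    segments = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(seg):
        toks = re.findall(r"\w+", seg.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    ranked = sorted(range(len(segments)), key=lambda i: score(segments[i]),
                    reverse=True)[:max_segments]
    # Reassemble the preferred segments in their original order.
    return " ".join(segments[i] for i in sorted(ranked))

text = ("The probe entered orbit on Tuesday. Engineers cheered in the control "
        "room. The orbit insertion burn lasted nine minutes. Data from the "
        "probe will arrive over the next weeks.")
print(summarize(text))
```
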
Real-time data pattern analysis system and method of operation thereof
Tue, 26 Aug 2014 08:00:00 EDT
A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units.
Dialogue speech recognition system, dialogue speech recognition method, and recording medium for storing dialogue speech recognition program
Tue, 26 Aug 2014 08:00:00 EDT
Disclosed is a dialogue speech recognition system that can expand the scope of applications by employing a universal dialogue structure as the condition for speech recognition of dialogue speech between persons. An acoustic likelihood computation means (701) provides a likelihood that a speech signal input from a given phoneme sequence will occur. A linguistic likelihood computation means (702) provides a likelihood that a given word sequence will occur. A maximum likelihood candidate search means (703) uses the likelihoods provided by the acoustic likelihood computation means and the linguistic likelihood computation means to provide a word sequence with the maximum likelihood of occurring from a speech signal. Further, the linguistic likelihood computation means (702) provides different linguistic likelihoods when the speaker who generated the acoustic signal input to the speech recognition means does and does not have the turn to speak.
Off-axis audio suppression in an automobile cabin
Tue, 26 Aug 2014 08:00:00 EDT
The suppression of off-axis audio in an audio environment is provided. Off-axis audio may be considered audio that does not originate from a region of interest. The off-axis audio is suppressed by comparing a phase difference between signals from two microphones to a target slope of the phase difference between signals originating from the region of interest. The target slope can be adapted to allow the region of interest to move with the location of a human speaker such as a driver.
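
A hedged sketch of the phase-difference test: for each frequency bin, compare the inter-microphone phase against the phase a source in the region of interest would produce (the target slope over frequency) and attenuate bins that deviate. Geometry, tolerance, and attenuation are assumed values.

```python
import numpy as np

def suppress_off_axis(mic1, mic2, sr, mic_distance=0.1, target_angle_deg=0.0,
                      tol=0.8, c=343.0):
    n = len(mic1)
    f = np.fft.rfftfreq(n, 1.0 / sr)
    X1, X2 = np.fft.rfft(mic1), np.fft.rfft(mic2)
    # Expected phase difference for a source at the target angle ("target slope").
    delay = mic_distance * np.sin(np.radians(target_angle_deg)) / c
    expected = 2 * np.pi * f * delay
    measured = np.angle(X1 * np.conj(X2))
    deviation = np.abs(np.angle(np.exp(1j * (measured - expected))))
    gain = np.where(deviation < tol, 1.0, 0.1)       # keep on-axis, duck off-axis
    return np.fft.irfft(X1 * gain, n)

sr, n = 16000, 1024
t = np.arange(n) / sr
common = np.sin(2 * np.pi * 300 * t)                  # source in the region of interest
mic1 = common + np.sin(2 * np.pi * 1200 * t + 1.2)    # off-axis tone has a phase offset
mic2 = common + np.sin(2 * np.pi * 1200 * t)
out_spec = np.abs(np.fft.rfft(suppress_off_axis(mic1, mic2, sr)))
# Bin 19 ~ 300 Hz, bin 77 ~ 1200 Hz at this frame length: on-axis kept, off-axis ducked.
print(out_spec[19] > 5 * out_spec[77])
```
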
Method of indicating presence of transient noise in a call and apparatus thereof
Tue, 26 Aug 2014 08:00:00 EDT
A method and an apparatus for indicating presence of a transient noise in a call are provided. The method comprises the steps of determining activity at an endpoint of the call by monitoring presence of a signal input from the endpoint into the call and monitoring presence of a potential source of transient noise at the endpoint. Further, based on the activity determination and the monitoring of the presence of a potential source of transient noise, a signal representative of the presence of a transient noise in the call is sent. The present invention is advantageous in that it enables improvement of the quality of the call.
Method and system for determining a perceived quality of an audio system
Tue, 26 Aug 2014 08:00:00 EDT
The invention relates to a method for determining a quality indicator representing the perceived quality of an output signal of an audio system with respect to a reference signal. The reference signal and the output signal are processed and compared. The processing includes dividing the reference signal and the output signal into mutually corresponding time frames, scaling the intensity of the reference signal towards a fixed intensity level, and then performing measurements on time frames within the scaled reference signal to determine reference signal time frame characteristics. Further, the loudness of the output signal is scaled towards a fixed loudness level in the perceptual loudness domain. Finally, the loudness of the reference signal is scaled from a loudness level corresponding to the output-signal-related intensity level towards a loudness level related to that of the scaled output signal in the perceptual loudness domain.
Dual-band speech encoding
Tue, 26 Aug 2014 08:00:00 EDT
This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream
Tue, 26 Aug 2014 08:00:00 EDT
An apparatus for decoding data segments representing a time-domain data stream, a data segment being encoded in the time domain or in the frequency domain, a data segment being encoded in the frequency domain having successive blocks of data representing successive and overlapping blocks of time-domain data samples. The apparatus includes a time-domain decoder for decoding a data segment being encoded in the time domain and a processor for processing the data segment being encoded in the frequency domain and output data of the time-domain decoder to obtain overlapping time-domain data blocks. The apparatus further includes an overlap/add-combiner for combining the overlapping time-domain data blocks to obtain a decoded data segment of the time-domain data stream.
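
The overlap/add-combiner step can be sketched as follows, assuming 50%-overlapping blocks and a squared-sine window whose overlapping halves sum to one (an illustrative choice, not necessarily the codec's actual window).

```python
import numpy as np

def overlap_add(blocks, hop):
    """blocks: equal-length time-domain blocks, each overlapping the next by len - hop."""
    n = len(blocks[0])
    out = np.zeros(hop * (len(blocks) - 1) + n)
    for k, block in enumerate(blocks):
        out[k * hop : k * hop + n] += block   # sum the overlapping halves
    return out

# Demo: split a ramp into 50%-overlapping, windowed blocks and rebuild it.
n, hop = 8, 4
signal = np.arange(32, dtype=float)
window = np.sin(np.pi * (np.arange(n) + 0.5) / n) ** 2   # overlapping halves sum to 1
blocks = [window * signal[i : i + n] for i in range(0, len(signal) - n + 1, hop)]
rebuilt = overlap_add(blocks, hop)
print(np.allclose(rebuilt[hop:-hop], signal[hop : len(rebuilt) - hop]))   # True away from edges
```
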
Method and system for using natural language techniques to process inputs
Tue, 26 Aug 2014 08:00:00 EDT
Systems and methods are provided for utilizing natural language to process queries. The method for analyzing a linguistic input may include receiving the linguistic input, the linguistic input including at least one word, accessing prestored language data for a language corresponding to the linguistic input, converting the linguistic input into one or more text-possibility representations based on the prestored language data, determining a meaning of each text possibility based on the prestored language data, generating at least one semantic structure corresponding to the determined meaning, and determining an action to perform based on the generated at least one semantic structure. The prestored language data may be converted from multiple formats into one or more formats that can be algorithmically processed by a computational device.
Environment sensitive predictive text entry
Tue, 26 Aug 2014 08:00:00 EDT
Environmental factors may be used in a predictive text system provided in a device. The device may receive one or more characters entered by a user and determine, based on the one or more characters, words that are predicted to be the words being entered by the user of the device, where the words are determined using grammar-based predictive techniques. The device may determine confidence scores corresponding to the words and refine the scores based on environmental data, i.e., data that describes the environment associated with the user of the device. The device may then select, based on the refined scores, a subset of the words and output that subset.
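
As a toy sketch of the refinement step: grammar-based confidence scores are re-weighted by environmental data, here a location category with a made-up boost table (an assumption, not the patented model).

```python
def refine_predictions(candidates, environment):
    """candidates: {word: grammar-based confidence score}."""
    boosts = {
        "restaurant": {"reservation": 1.5, "menu": 1.4},
        "airport":    {"flight": 1.6, "gate": 1.5, "boarding": 1.4},
    }
    table = boosts.get(environment.get("location_type", ""), {})
    refined = {w: s * table.get(w, 1.0) for w, s in candidates.items()}
    return sorted(refined, key=refined.get, reverse=True)

# With the user at an airport, "flight" overtakes the grammatically likelier "flies".
print(refine_predictions({"flies": 0.5, "flight": 0.4, "flint": 0.2},
                         {"location_type": "airport"}))
```
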
System and method of extracting clauses for spoken language understanding
Tue, 26 Aug 2014 08:00:00 EDT
A clausifier and method of extracting clauses for spoken language understanding are disclosed. The method relates to generating a set of clauses from speech utterance text and comprises inserting at least one boundary tag in speech utterance text related to sentence boundaries, inserting at least one edit tag indicating a portion of the speech utterance text to remove, and inserting at least one conjunction tag within the speech utterance text. The result is a set of clauses that may be identified within the speech utterance text according to the inserted at least one boundary tag, at least one edit tag and at least one conjunction tag. The disclosed clausifier comprises a sentence boundary classifier, an edit detector classifier, and a conjunction detector classifier. The clausifier may comprise a single classifier or a plurality of classifiers to perform the steps of identifying sentence boundaries, editing text, and identifying conjunctions within the text.
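
A hedged sketch of the tagging pipeline: insert boundary, edit, and conjunction tags into the utterance text, then split it into clauses. The rule-based taggers below stand in for the trained classifiers described above.

```python
import re

FILLERS = {"uh", "um", "er"}
CONJUNCTIONS = {"and", "but", "so"}

def tag_utterance(text):
    tagged = []
    for tok in text.lower().split():
        if tok in FILLERS:
            tagged.append(f"<edit>{tok}</edit>")       # material to remove
        elif tok in CONJUNCTIONS:
            tagged.append(f"<conj>{tok}</conj>")       # clause connector
        else:
            tagged.append(tok)
    return " ".join(tagged) + " <sb>"                  # sentence boundary tag

def clauses(tagged):
    cleaned = re.sub(r"<edit>.*?</edit>", "", tagged)
    parts = re.split(r"<conj>\w+</conj>|<sb>", cleaned)
    return [p.strip() for p in parts if p.strip()]

t = tag_utterance("uh i need to change my flight and um book a hotel")
print(clauses(t))    # ['i need to change my flight', 'book a hotel']
```
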
Apparatus and method for constructing verbal phrase translation pattern using bilingual parallel corpus
Tue, 26 Aug 2014 08:00:00 EDT
An apparatus and a method for constructing a verb phrase translation pattern using a bilingual parallel corpus. The apparatus and method recognize a predicate and an argument using a syntax analysis result and a word alignment result of a source sentence from a bilingual parallel corpus, extract translation pattern candidates and their occurrence frequencies using the recognized predicate and argument, and then generate a basic verb phrase translation pattern by verifying the translation pattern candidates. The generated basic verb phrase translation pattern is generalized to produce a general verb phrase pattern that can be applied to various language pairs, minimizing errors in the verb phrase translation pattern and determining an appropriate generalization level using the co-occurrence frequency of the predicate and argument of the verb phrase translation pattern and the translation probability of the predicate.
