Patents

Caption and/or metadata synchronization for replay of previously or simultaneously recorded live programs
Tue, 22 Apr 2014 08:00:00 EDT
A synchronization process between captioning data and/or corresponding metatags and the associated media file parses the media file, correlates the caption information and/or metatags with segments of the media file, and provides a capability for textual search and selection of particular segments. A time-synchronized version of the captions is created that is synchronized to the moment that the speech is uttered in the recorded media. The caption data is leveraged to enable search engines to index not merely the title of a video, but the entirety of what was said during the video as well as any associated metatags relating to contents of the video. Further, because the entire media file is indexed, a search can request a particular scene or occurrence within the event recorded by the media file, and the exact moment within the media relevant to the search can be accessed and played for the requester.
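
As a rough sketch of the indexing idea (not the patented implementation), the Python below maps each captioned word to the playback offsets where it is uttered; the caption pairs and the build_index name are invented for illustration.

```python
from collections import defaultdict

def build_index(captions):
    """Map each spoken word to the playback offsets (seconds) where it occurs."""
    index = defaultdict(list)
    for start, text in captions:
        for word in text.lower().split():
            index[word].append(start)
    return index

# Invented caption data: (start_seconds, caption_text)
captions = [(12.0, "welcome to the show"),
            (95.5, "tonight we discuss the weather")]
index = build_index(captions)
print(index["weather"])  # [95.5] -> jump playback straight to this moment
```
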
Detection of a user's visual impairment based on user inputs or device settings, and presentation of a website-related data for sighted or visually-impaired users based on those inputs or settings
Tue, 22 Apr 2014 08:00:00 EDT
A data processing device is connected to a web data providing unit that provides web data via a network. The web data includes at least one of first web data and second web data corresponding to the first web data. A screen reader can be installed on the data processing device. A displaying unit displays the web data. The screen reader voices the first web data displayed on the displaying unit but fails to voice the second web data. An acquiring unit acquires the web data from the web data providing unit. A determining unit determines, based on visually-impaired information indicating that the user is visually impaired, whether or not the acquiring unit should acquire the first web data from the web data providing unit even if the user instructs the acquiring unit to acquire the second web data.
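
A minimal sketch of the acquisition decision under stated assumptions: a boolean visually-impaired setting and two hypothetical fetchers standing in for the first and second web data; the patent's actual determination logic may differ.

```python
def acquire_web_data(visually_impaired, fetch_first, fetch_second):
    """Prefer the screen-reader-voiceable first web data for impaired users,
    even when the request nominally targets the second web data."""
    if visually_impaired:
        return fetch_first()    # variant the screen reader can voice
    return fetch_second()       # richer variant the reader cannot voice

# Hypothetical fetchers for demonstration only
print(acquire_web_data(True,
                       fetch_first=lambda: "<p>plain text page</p>",
                       fetch_second=lambda: "<canvas>visual page</canvas>"))
```
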
Customized speech generation
Tue, 22 Apr 2014 08:00:00 EDT
Various approaches enable automatic communication generation based on patterned behavior in a particular context. For example, a computing device can monitor a user's behavior to determine patterns of communication in certain situations. In response to detecting multiple occurrences of such a situation, the computing device can prompt the user to perform an action corresponding to the pattern of behavior. In some embodiments, a set of speech models corresponding to a type of contact is generated. The speech models include language consistent with patterns of speech between a user and that type of contact. Based on the context and the contact, a message using language consistent with past communications between the user and the contact is generated from a speech model associated with the type of contact.
Systems and methods for searching using queries written in a different character-set and/or language from the target pages
Tue, 22 Apr 2014 08:00:00 EDT
Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by examining the use of terms in aligned text. Probabilities can be associated with each possible translation. Refinements can be made to these probabilities by examining user interactions with the search results.
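
The toy sketch below shows probability-weighted query expansion across character sets; the translation table is fabricated, where the patent would derive such probabilities from aligned text and refine them from user interactions with the search results.

```python
# Hypothetical probabilities; in practice learned from aligned text and
# refined by observing which results users click.
translations = {"кошка": [("cat", 0.9), ("kitty", 0.1)]}  # Cyrillic query term

def expand_query(term):
    """Return candidate target-language terms with translation probabilities."""
    return translations.get(term, [(term, 1.0)])

for word, prob in expand_query("кошка"):
    print(f"search '{word}' weighted by {prob}")
```
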
Mining phrases for association with a user
Tue, 22 Apr 2014 08:00:00 EDT
Techniques for generating and providing phrases are described herein. These techniques may include analyzing one or more sources to generate a first corpus of phrases, each of the phrases for use as an identifier and/or for association with a user for executing a transaction. Once a first corpus of phrases has been generated, these phrases may be filtered to define a second corpus of phrases. Phrases of this second corpus may then be suggested to one or more users. In some instances, the phrases suggested to a particular user are personalized to the user based on information previously known about the user or based on information provided by the user.
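
A toy sketch of the two-corpus pipeline under assumed data: mine candidate phrases, filter them into a second corpus, then rank suggestions by overlap with what is known about the user. All data and names here are invented for illustration.

```python
def extract_bigrams(text):
    """Naive phrase candidates: adjacent word pairs."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}

def mine_phrases(sources):
    """First corpus: all candidate phrases found in the sources."""
    return {p for text in sources for p in extract_bigrams(text)}

def suggest(first_corpus, banned, user_interests):
    """Filter to a second corpus, then personalize the ordering."""
    second_corpus = {p for p in first_corpus if p not in banned}
    return sorted(second_corpus,
                  key=lambda p: -sum(w in user_interests for w in p.split()))

corpus = mine_phrases(["red rock canyon", "quiet river bend"])
print(suggest(corpus, banned={"red rock"}, user_interests={"river"})[:3])
```
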
Generating and suggesting phrases
Tue, 22 Apr 2014 08:00:00 EDT
Techniques for generating and providing phrases are described herein. These techniques may include analyzing one or more sources to generate a first corpus of phrases, each of the phrases for use as an identifier and/or for association with a user for executing a transaction. Once a first corpus of phrases has been generated, these phrases may be filtered to define a second corpus of phrases. Phrases of this second corpus may then be suggested to one or more users. In some instances, the phrases suggested to a particular user are personalized to the user based on information previously known about the user or based on information provided by the user.
Low-complexity spectral analysis/synthesis using selectable time resolution
Tue, 22 Apr 2014 08:00:00 EDT
The signal processing is based on the concept of using a time-domain aliased frame as a basis for time segmentation and spectral analysis, performing segmentation in time based on the time-domain aliased frame and performing spectral analysis based on the resulting time segments. The time resolution of the overall “segmented” time-to-frequency transform can thus be changed by simply adapting the time segmentation to obtain a suitable number of time segments based on which spectral analysis is applied. The overall set of spectral coefficients, obtained for all the segments, provides a selectable time-frequency tiling of the original signal frame.
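
A minimal illustration of the selectable tiling, assuming a plain FFT as the per-segment spectral analysis: splitting one frame into more segments buys time resolution at the cost of frequency resolution.

```python
import numpy as np

def segmented_spectrum(frame, num_segments):
    """Split a frame into equal time segments and analyze each spectrally."""
    segments = np.split(frame, num_segments)        # time segmentation
    return [np.fft.rfft(seg) for seg in segments]   # per-segment analysis

frame = np.random.randn(1024)
coarse = segmented_spectrum(frame, 1)   # best frequency resolution
fine = segmented_spectrum(frame, 8)     # best time resolution
print(len(coarse[0]), len(fine[0]))     # 513 vs 65 bins per segment
```
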
Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a detection of a group of previously-decoded spectral values
Tue, 22 Apr 2014 08:00:00 EDT
An audio decoder for providing decoded audio information includes an arithmetic decoder for providing a plurality of decoded spectral values on the basis of an arithmetically-encoded representation of the spectral values, and a frequency-domain-to-time-domain converter for providing a time-domain audio representation using the decoded spectral values. The arithmetic decoder is configured to select a mapping rule describing a mapping of a code value onto a symbol code in dependence on a context state. The arithmetic decoder is configured to determine or modify the current context state in dependence on a plurality of previously-decoded spectral values. The arithmetic decoder is configured to detect a group of a plurality of previously-decoded spectral values which fulfill, individually or taken together, a predetermined condition regarding their magnitudes, and to determine the current context state in dependence on a result of the detection. An audio encoder uses similar principles.
Method and a decoder for attenuation of signal regions reconstructed with low accuracy
Tue, 22 Apr 2014 08:00:00 EDT
Embodiments of the present invention improve conventional attenuation schemes by replacing constant attenuation with an adaptive attenuation scheme that allows more aggressive attenuation without introducing an audible change in the signal's frequency characteristics.
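
The abstract gives no formula, so the sketch below merely contrasts constant attenuation with an adaptive, accuracy-driven gain; the accuracy estimate, floor, and clipping are assumptions, not the patented scheme.

```python
import numpy as np

def adaptive_attenuate(signal, accuracy, floor=0.1):
    """Per-sample gain in [floor, 1] that tracks estimated reconstruction
    accuracy: attenuate harder where accuracy is low."""
    gain = floor + (1.0 - floor) * np.clip(accuracy, 0.0, 1.0)
    return signal * gain

x = np.random.randn(8)
acc = np.linspace(1.0, 0.0, 8)          # accuracy degrades over the region
print(adaptive_attenuate(x, acc))
```
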
Audio decoding apparatus and audio decoding method performing weighted addition on signals
Tue, 22 Apr 2014 08:00:00 EDT
An audio decoding apparatus and method are provided. The audio decoding apparatus includes a spectrum converting part configured to divide a first frequency spectrum in each channel of a first audio signal having a first number of channels, in a time direction or in a frequency direction, to calculate a first signal sequence having the same time resolution and the same frequency resolution in all the channels of the first audio signal, and a down-mixing part configured to perform weighted addition on the signals at the same time and within the same frequency band included in the first signal sequence in all the channels, to calculate a second signal sequence having a second number of channels different from the first number.
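
A compact sketch of the down-mix step: once all channels share one time-frequency grid, the weighted addition is a sum over the channel axis. The 6-to-2 weight matrix and random tiles below are illustrative only.

```python
import numpy as np

tiles = np.random.randn(6, 32, 24)     # (channels, time, frequency) grid
weights = np.random.rand(2, 6)         # (out_channels, in_channels) down-mix

# Weighted addition of co-located time-frequency tiles across channels
downmix = np.einsum("oc,ctf->otf", weights, tiles)
print(downmix.shape)                   # (2, 32, 24): fewer output channels
```
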
Arbitrary shaping of temporal noise envelope without side-information utilizing unchanged quantization
Tue, 22 Apr 2014 08:00:00 EDT
In a first aspect, arbitrary shaping of the temporal envelope of noise is provided in spectral domain coding systems without the need of side-information. In the encoding, a filtered measure of quantization error is applied as a feedback signal to the frequency-domain representation of a discrete time-domain signal prior to quantization, so that the filtering parameters of said filtering affect the shaping of quantization noise in the time domain of the quantized frequency-domain representation with unchanged quantization of the discrete time-domain signal when it is inversely transformed from the frequency domain back to the time domain in decoding. This may be accomplished with respect to each of a plurality of frequency bins or groups of bins. In another aspect, frequency-domain noise-feedback quantizing in digital audio encoding is provided.
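
A minimal one-tap sketch of the noise-feedback idea: the filtered previous quantization error is added to each coefficient before an unchanged scalar quantizer, so the feedback coefficient shapes where the quantization noise lands. The filter order and parameter values are assumptions.

```python
def noise_feedback_quantize(coeffs, step=0.5, beta=0.7):
    """Quantize coefficients while feeding filtered error back in."""
    out, prev_err = [], 0.0
    for c in coeffs:
        shaped = c + beta * prev_err        # add filtered previous error
        q = round(shaped / step) * step     # unchanged scalar quantizer
        prev_err = shaped - q               # error to feed forward
        out.append(q)
    return out

print(noise_feedback_quantize([0.9, 0.2, -0.4, 1.3]))
```
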
Waveform compressing apparatus, waveform decompressing apparatus, and method of producing compressed data
Tue, 22 Apr 2014 08:00:00 EDT
In a waveform compressing apparatus, a trial mode selecting portion selects the trial mode having the highest compression rate from a plurality of candidate modes which have not been selected before as the trial mode for generating a residue code, the selected trial mode comprising a scalar quantization mode or a vector quantization mode. A waveform data compressing portion compresses a given data amount of original waveform data according to the selected trial mode so as to generate the residue code, the data amount being determined in correspondence with the selected trial mode. A waveform data restoring portion generates restored waveform data from the compressed data using the generated residue code. A determining portion measures an evaluation value of a quantization error contained in the restored waveform data relative to the original waveform data, and determines whether the evaluation value is equal to or smaller than a predetermined allowable value. A mode change instructing portion outputs a mode change instruction for instructing the trial mode selecting portion to select another trial mode when the evaluation value is greater than the predetermined allowable value.
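
A toy version of the trial loop, with invented modes and a max-absolute-error measure standing in for the patent's evaluation value: candidates are tried in order of compression rate until the restored waveform meets the allowance.

```python
def scalar_mode(name, step, rate):
    """Invented candidate mode: a simple scalar quantizer."""
    return {"name": name, "rate": rate,
            "encode": lambda s: [round(x / step) for x in s],
            "decode": lambda c: [q * step for q in c]}

def compress(samples, modes, allowable):
    # Try highest-compression modes first; fall back on a quality failure.
    for mode in sorted(modes, key=lambda m: -m["rate"]):
        code = mode["encode"](samples)
        restored = mode["decode"](code)
        err = max(abs(a - b) for a, b in zip(samples, restored))
        if err <= allowable:
            return mode["name"], code
    raise ValueError("no trial mode met the allowable error")

modes = [scalar_mode("coarse", 0.5, rate=4), scalar_mode("fine", 0.1, rate=2)]
print(compress([0.12, -0.33, 0.97], modes, allowable=0.06))  # ('fine', ...)
```
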
Voice application finding and user invoking applications related to a single entity
Tue, 22 Apr 2014 08:00:00 EDT
A computing device is configured to initiate actions in response to speech input that includes a name or other indication of an entity, in a first spoken utterance, followed by an action, in a second spoken utterance. The computing device receives the first spoken utterance, identifies an entity based on the first spoken utterance, and indicates a plurality of available actions based on the identified entity. The computing device then receives the second spoken utterance and identifies a selection of at least one of the available actions based on the second spoken utterance. The computing device then initiates the at least one selected action.
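
A sketch of the two-utterance flow with an invented entity/action registry: the first utterance names the entity, the available actions are announced, and the second utterance selects one.

```python
# Hypothetical registry of entities and the actions available for each
ACTIONS = {"pizza place": ["call", "directions", "order"],
           "alice": ["call", "text"]}

def handle(first_utterance, second_utterance):
    entity = first_utterance.lower()
    available = ACTIONS.get(entity, [])
    print(f"Available for {entity}: {', '.join(available)}")
    # Pick the first available action mentioned in the second utterance
    choice = next((a for a in available if a in second_utterance.lower()), None)
    if choice:
        print(f"Initiating '{choice}' for {entity}")

handle("Pizza Place", "get me directions")
```
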
Computerized information and display apparatus
Tue, 22 Apr 2014 08:00:00 EDT
Apparatus useful for obtaining and displaying information. In one embodiment, the apparatus includes a network interface, display device, and speech recognition apparatus configured to receive user speech input and enable performance of various tasks via a remote entity, such as obtaining desired information relating to maps or directions, or any number of other topics. The downloaded data may also, in one variant, be displayed with contextually related advertising or other content.
Intent deduction based on previous user interactions with voice assistant
Tue, 22 Apr 2014 08:00:00 EDT
Methods, systems, and computer-readable storage media related to operating an intelligent digital assistant are disclosed. A text string is obtained from a speech input received from a user. Information is derived from a communication event that occurred at an electronic device prior to receipt of the speech input. The text string is interpreted to derive a plurality of candidate interpretations of user intent. One of the candidate interpretations is selected based on the information relating to the communication event.
Electronic equipment and television receiver utilizing multimodal multifunction voice commands
Tue, 22 Apr 2014 08:00:00 EDT
Disclosed is an electronic equipment including: a command information storage section to store pieces of command information for controlling the electronic equipment, each associated with a plurality of pieces of control information pertaining respectively to operation control in a plurality of operation states; a speech information obtainment section to obtain speech information; a command information extraction section to perform speech recognition on the obtained speech information so as to extract the corresponding command information stored in the command information storage section; a judgment section to judge the operation state of the electronic equipment when the command information is extracted; and a control section to extract one of the plurality of pieces of control information associated with the extracted command information from the command information storage section based on the judged operation state, so as to control the television receiver based on the extracted control information.
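
The core mapping can be pictured as a small table keyed by command and operation state; the states and actions below are illustrative, not from the patent.

```python
# One spoken command maps to different control actions per operation state
COMMANDS = {
    "up": {"watching_tv": "channel_up", "in_menu": "cursor_up"},
    "ok": {"watching_tv": "show_info", "in_menu": "select_item"},
}

def dispatch(recognized_word, operation_state):
    """Look up the control action for a command in the judged state."""
    return COMMANDS.get(recognized_word, {}).get(operation_state)

print(dispatch("up", "watching_tv"))  # -> channel_up
print(dispatch("up", "in_menu"))      # -> cursor_up
```
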
Method and system for sharing speech processing resources over a communication network
Tue, 22 Apr 2014 08:00:00 EDT
A method and system (40) for sharing speech processing resources (54) over a communication network (21), handling multiple client types (100, 101, etc.) and multiple media protocol types. The system can include a router (400) coupled to the communication network, a speech response system (500) coupled to the router, and a server (600) coupled to the speech response system and the router. The server can include at least one processor programmed to determine a media protocol and a client type of a client used for speech communication with the server, adapt at least one among encoding or decoding for the speech communication based on the media protocol and the client type, and dynamically and adaptively configure the speech processing resources based on the media protocol and the client type.
Establishing a multimodal personality for a multimodal application
Tue, 22 Apr 2014 08:00:00 EDT
Methods, apparatus, and computer program products are described for establishing a multimodal personality for a multimodal application that include selecting, by the multimodal application, matching vocal and visual demeanors and incorporating, by the multimodal application, the matching vocal and visual demeanors as a multimodal personality into the multimodal application.
Periodic ambient waveform analysis for enhanced social functions
Tue, 22 Apr 2014 08:00:00 EDT
Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.
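
A toy sketch of the server-side comparison, assuming bit-string fingerprints and Hamming-style similarity (the abstract does not specify the fingerprinting): a match lets the server push content and borrow the matching user's location.

```python
def similarity(fp_a, fp_b):
    """Fraction of matching bits between two equal-length fingerprints."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)

# Invented store: fingerprint plus the uploader's known location
stored = {"user_42": ("1011001110", (40.7, -74.0))}

def match(client_fp, threshold=0.8):
    for user, (fp, loc) in stored.items():
        if similarity(client_fp, fp) >= threshold:
            return user, loc  # the match's location can refine the client's
    return None

print(match("1011001010"))  # ('user_42', (40.7, -74.0))
```
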
System for dynamic management of customer direction during live interaction
Tue, 22 Apr 2014 08:00:00 EDT
A system for customer interaction includes a telephony-enabled device for receiving voice calls from customers, a voice recognition engine connected to the telephony-enabled device for monitoring the voice channel, and an application server connected to the voice recognition engine for receiving notification when specific keywords, phrases, or tones are detected. The system is characterized in that the application server selects scripts for presentation to the customer based at least in part on the notifications received from the voice recognition engine.
Speech signal restoration device and speech signal restoration method
Tue, 22 Apr 2014 08:00:00 EDT
A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
Audio signal transforming by utilizing a computational cost function
Tue, 22 Apr 2014 08:00:00 EDT
A sequence of time domain digital audio samples representing sound (e.g., a sound generated by a human voice or a musical instrument) is received. The time domain digital audio samples are processed to derive a corresponding sequence of audio pulses in the time domain. Each of the audio pulses is associated with a characteristic frequency. Frequency domain information is derived about each of at least some of the audio pulses. The sound represented by the time domain digital audio samples is transformed by processing the audio pulses using the frequency domain information. The transformation utilizes overlapping windows, and a computational cost function is determined which depends on the product of the number of pitch periods and the inverse of the minimum fundamental frequency within the window.
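
Reading each audio pulse as one pitch period, the stated cost measure for a window might be sketched as below; this interpretation and the sample values are assumptions.

```python
def window_cost(pulse_f0s):
    """pulse_f0s: characteristic frequency (Hz) of each pulse in the window."""
    num_periods = len(pulse_f0s)    # one pulse per pitch period (assumed)
    min_f0 = min(pulse_f0s)         # lowest fundamental within the window
    return num_periods * (1.0 / min_f0)

print(window_cost([210.0, 205.0, 190.0]))  # 3 periods, min f0 190 Hz
```
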
Synchronise an audio cursor and a text cursor during editing
Tue, 22 Apr 2014 08:00:00 EDT
A speech recognition device (1) processes speech data (SD) of a dictation and thus establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during the acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that the link information (LI) links to the speech data (SD) just played back, the currently marked word featuring the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) now make it possible to synchronize the text cursor (TC) with the audio cursor (AC), or the audio cursor (AC) with the text cursor (TC), by positioning the respective cursor at a predetermined position relative to the other, so that the positioning of the respective cursor (AC, TC) is simplified considerably.
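
A minimal sketch of the link-information lookup that lets either cursor be derived from the other; the word/offset pairs are invented.

```python
import bisect

# Link information: (audio offset in seconds, recognized word)
links = [(0.0, "please"), (0.42, "send"), (0.81, "the"), (1.05, "report")]
times = [t for t, _ in links]

def audio_to_text(audio_cursor_s):
    """Index of the word being played back at the audio cursor."""
    return bisect.bisect_right(times, audio_cursor_s) - 1

def text_to_audio(word_index):
    """Audio offset where the word at the text cursor begins."""
    return times[word_index]

print(audio_to_text(0.9))   # -> 2 ("the")
print(text_to_audio(3))     # -> 1.05 s
```
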
Content and advertising service using one server for the content, sending it to another for advertisement and text-to-speech synthesis before presenting to user
Tue, 22 Apr 2014 08:00:00 EDT
Methods and systems for providing a network-accessible text-to-speech synthesis service are provided. The service accepts content as input. After extracting textual content from the input content, the service transforms the content into a format suitable for high-quality speech synthesis. Additionally, the service produces audible advertisements, which are combined with the synthesized speech. The audible advertisements themselves can be generated from textual advertisement content.
Controllable prosody re-estimation system and method and computer program product thereof
Tue, 22 Apr 2014 08:00:00 EDT
In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module, and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates the predicted or estimated prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
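
One simple assumed realization of the re-estimation stage is a linear remap of the predicted pitch contour around its mean, driven by controllable scale and shift parameters; the abstract does not specify the actual mapping.

```python
def re_estimate(pitch_contour, scale=1.0, shift=0.0):
    """Remap predicted prosody via controllable parameters before synthesis."""
    mean = sum(pitch_contour) / len(pitch_contour)
    return [mean + scale * (f - mean) + shift for f in pitch_contour]

predicted = [180.0, 195.0, 170.0, 205.0]           # Hz, invented contour
print(re_estimate(predicted, scale=1.4, shift=-10.0))  # livelier, lower voice
```
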
Voice recognition terminal
Tue, 22 Apr 2014 08:00:00 EDT
A voice recognition terminal executes a local voice recognition process and utilizes an external center voice recognition process. The terminal includes: a voice message synthesizing element for synthesizing at least one of a voice message to be output from a speaker according to the external center voice recognition process and a voice message to be output from the speaker according to the local voice recognition process, so that the characteristics of the voice messages output according to the two processes can be distinguished from each other; and a voice output element for outputting a synthesized voice message from the speaker.
Applying a structured language model to information extraction
Tue, 22 Apr 2014 08:00:00 EDT
One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
Indexing digitized speech with words represented in the digitized speech
Tue, 22 Apr 2014 08:00:00 EDT
Indexing digitized speech with words represented in the digitized speech, with a multimodal digital audio editor operating on a multimodal device supporting modes of user interaction, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an ASR engine. The indexing includes providing, by the multimodal digital audio editor to the ASR engine, digitized speech for recognition; receiving in the multimodal digital audio editor from the ASR engine recognized user speech including a recognized word, together with information indicating where, in the digitized speech, representation of the recognized word begins; and inserting, by the multimodal digital audio editor, the recognized word, in association with that information, into a speech recognition grammar, the speech recognition grammar voice-enabling user interface commands of the multimodal digital audio editor.
System and method for selecting audio contents by using speech recognition
Tue, 22 Apr 2014 08:00:00 EDT
A system and method for selecting audio contents by using speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period together with the speech input, and a recognizing module performing speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching the speech input.
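
As a stand-in for the recognizer-based matching (which the abstract does not detail), the toy sketch below picks the buffered phrase whose transcript best matches the user's utterance, using difflib similarity.

```python
import difflib

# Transcripts of recently buffered audio contents (invented)
buffered = ["traffic on the ring road", "sunny spells this afternoon"]

def select_phrase(user_text):
    """Return the buffered phrase most similar to the spoken input."""
    return max(buffered,
               key=lambda p: difflib.SequenceMatcher(None, p, user_text).ratio())

print(select_phrase("sunny afternoon"))
```
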
Methods and apparatus for formant-based voice synthesis
Tue, 22 Apr 2014 08:00:00 EDT
In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
