IISc Logo    Title

etd AT Indian Institute of Science >
Division of Electrical Sciences >
Electrical Engineering (ee) >

Please use this identifier to cite or link to this item: http://hdl.handle.net/2005/592

Title: Music And Speech Analysis Using The 'Bach' Scale Filter-Bank
Authors: Ananthakrishnan, G
Advisors: Ramakrishnan, A G
Keywords: Speech Analysis
Speech Processing
Filter Bank
Musical Transcription
Speech Recognition
Speech - Signal Processing
Audio Signals
Music - Pitch Tracking Algorithms
Music Signals - Analysis
Time Frequency Analysis
Bach Scale
Automated Speech Segmentation
Submitted Date: Apr-2007
Series/Report no.: G21044
Abstract: The aim of this thesis is to define a perceptual scale for the ‘Time-Frequency’ analysis of music signals. The equal tempered ‘Bach ’ scale is a suitable scale, since it covers most of the genres of music and the error is equally distributed for each semi-tone. However, it may be necessary to allow a tolerance of around 50 cents or half the interval of the Bach scale, so that the interval can accommodate other common intonation schemes. The thesis covers the formulation of the Bach scale filter-bank as a time-varying model. It makes a comparative study with other commonly used perceptual scales. Two applications for the Bach scale filter-bank are also proposed, namely automated segmentation of speech signals and transcription of singing voice for query-by-humming applications. Even though this filter-bank is suggested with a motivation from music, it could also be applied to speech. A method for automatically segmenting continuous speech into phonetic units is proposed. The results, obtained from the proposed method, show around 82% accuracy for the English and 85% accuracy for the Hindi databases. This is an improvement of around 2 -3% when the performance is compared with other popular methods in the literature. Interestingly, the Bach scale filters perform better than the filters designed for other common perceptual scales, such as Mel and Bark scales. ‘Musical transcription’ refers to the process of converting a musical rendering or performance into a set of symbols or notations. A query in a ‘query-by-humming system’ can be made in several ways, some of which are singing with words, or with arbitrary syllables, or whistling. Two algorithms are suggested to annotate a query. The algorithms are designed to be fairly robust for these various forms of queries. The first algorithm is a frequency selection based method. It works on the basis of selecting the most likely frequency components at any given time instant. The second algorithm works on the basis of finding time-connected contours of high energy in the ‘Time-Frequency’ plane of the input signal. The time domain algorithm works better in terms of instantaneous pitch estimates. It results in an error of around 10 -15%, while the frequency domain method results in an error of around 12 -20%. A song rendered by two different people will have quite a few different properties. Their absolute pitches, rates of rendering, timbres based on voice quality and inaccuracies, may be different. The thesis discusses a method to quantify the distance between two different renderings of musical pieces. The distance function has been evaluated by attempting a search for a particular song from a database of a size of 315, made up of songs sung by both male and female singers and whistled queries. Around 90 % of the time, the correct song is found among the top five best choices picked. Thus, the Bach scale has been proposed as a suitable scale for representing the perception of music. It has been explored in two applications, namely automated segmentation of speech and transcription of singing voices. Using the transcription obtained, a measure of the distance between renderings of musical pieces has also been suggested.
URI: http://etd.iisc.ernet.in/handle/2005/592
Appears in Collections:Electrical Engineering (ee)

Files in This Item:

File Description SizeFormat
G21044.pdf1.6 MBAdobe PDFView/Open

Items in etd@IISc are protected by copyright, with all rights reserved, unless otherwise indicated.


etd@IISc is a joint service of SERC & IISc Library ||
|| Powered by DSpace || Compliant to OAI-PMH V 2.0 and ETD-MS V 1.01