Acoustical Society of America

Forensic Acoustics Subcommittee

Special Session - 162nd ASA Meeting, San Diego, California, 31 October – 4 November 2011

last update: 10 November 2011


Forensic acoustics – On the leading edge of the tidal wave of change about to hit forensic science in the US(?)

Introduction:

Abstract submission:

Registration:



Program:

Thursday 3 November 2011
Pacific Salon 4/5

Invited Presentations

Contributed Presentations

  • Nasal spectra for forensic voice comparison
    (4aSCa5) 10:45 am – 11:00 am

    • Ewald Enzinger1,2, Cuiling Zhang1,3
      1Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales
      2Acoustics Research Institute, Austrian Academy of Sciences
      3Department of Forensic Science & Technology, China Criminal Police University

      • For features to be effective in forensic voice comparison, they must have relatively low within-speaker variability and relatively high between-speaker variability. An understudied source of features which potentially meets these criteria is the acoustic spectrum of nasals. Nasal spectra contain poles and zeros dependent upon nasal cavities. The latter are complex static structures which vary from person to person. Theoretically, nasal spectra may therefore have low within-speaker and high between-speaker variability. This study evaluates different methods for extracting spectral features (e.g., pole-zero models, all-pole models, and cepstra) and using them as part of a likelihood-ratio forensic-voice-comparison system. The validity and reliability of each system are empirically evaluated using /m/ and /n/ tokens extracted from a database of voice recordings of 60 female speakers of Chinese.

        Data collection was funded by an International Association of Forensic Phonetics and Acoustics Research Grant. Data analysis was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Laboratory (ARL). All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI, or the U.S. Government. Presentation supported by the Australian Research Council, Australian Federal Police, New South Wales Police, Queensland Police, National Institute of Forensic Science, Australasian Speech Science and Technology Association, and the Guardia Civil through Linkage Project LP100200142. Unless otherwise explicitly attributed, the opinions expressed are those of the authors and do not necessarily represent the policies or opinions of any of the above mentioned organizations.


  • Intra- and inter-speaker variability in duration and spectral properties of English /s/
    (4aSCa6) 11:00 am – 11:15 am

    • Colleen Kavanagh
      Department of Language and Linguistic Science, University of York

      • This study investigates the speaker-specificity of acoustic characteristics of the English fricative /s/ and contributes background population statistics for use in forensic speaker comparison work. The intra- and inter-speaker variability in duration and spectral properties of /s/ was investigated in data from 30 young adult male speakers of Cambridge and Leeds English. Read speech was used in the present study to allow for direct comparison across speakers. Segment duration was normalized for speaking rate. Spectra were filtered at 3 kHz in order to explore speaker discrimination performance at settings mimicking the bandpass filter effect of telephone transmission. Additional filters were applied at 8, 16, and 22.05 kHz to investigate discrimination with data from various frequency ranges. Spectral measures were calculated from a 40-ms wide window centred at the midpoint of each token. Although mean values display relatively little inter-speaker variation, the individuals at the extreme high and low ends of the distributions may be the best discriminated, particularly those at the extremes on more than one parameter. Discriminant analyses were conducted to determine the most speaker-specific predictors; relative performance was compared across the four filter conditions. The discriminatory ability of these parameters will also be presented using a likelihood ratio framework.


  • Collecting population statistics: The discriminant power of clicks
    (4aSCa7) 11:15 am – 11:30 am

    • Erica Gold
      Department of Language and Linguistic Science, University of York

      • This research gathers population statistics on clicks for use in likelihood ratios (LRs). As reported in Gold and French (2011), clicks have been analyzed by 57% of experts in forensic speaker comparison cases and 18% of experts find them to be useful speaker discriminants. Eight minutes of speech from 100 male speakers of Southern Standard British English were analyzed from the DyVis Database, using categorical annotations of clicks (Wright 2007). The distribution of click use in subjects is highly skewed with a large majority not clicking. However, the distribution of clicks is highly variable with non-clickers ranging from 25–44% of the population depending on the length of the speech sample. The same 100 speakers were also analyzed for click use when speaking with two additional interlocutors. Again the results are highly variable, which suggests the intra- and inter-speaker instability of clicks, the lack of overall robustness, and the accommodation of clicks in speech. This study serves as a beginning point in incorporating previously unreported population statistics into LRs, and specifically examining the potential of including higher order and paralinguistic features in a Bayesian framework.

        Research funded by the European Community's Seventh Framework Program (FP7/2007-2013) under grant agreement 238803.



  • Question Period
    11:30 am – 11:55 am


  • Lunch
    11:55 am – 1:30 pm


  • Chair’s Introduction
    1:30 pm – 1:35 pm


  • Selection of speech/voice vectors in forensic voice identification
    (4pSCa1) 1:35 pm – 1:50 pm

    • James Harnsberger1, Harry Hollien2
      1Department of Linguistics, University of Florida
      2Institute for Advanced Study of the Communication Processes, University of Florida

      • The case for the use of speech/voice vectors in speaker identification was made by Hollien and Harnsberger (2010). Those vectors found most robust in capturing speaker-specific characteristics were voice quality, vowel quality, speaking fundamental frequency, and temporal features. In this study, the speech cues for each of the four vectors will be compared to each other with respect to their predictive power. In addition, different vector algorithms and/or processing approaches for each will be contrasted in terms of their effects on identification robustness. One example is conversion of the vowel formant frequencies and bandwidth measurements to geometric scaling (semitones). Finally, a second dataset of 18 male voices obtained from evidence recordings, paired with exemplars recorded from a speaker pool, was used to test a modified form of this speech/voice vector approach. The data from these subjects will be compared to those from the 1993 study (substantial improvement) and those from the 2010 experiment (confirmation).


  • When to punt on speaker comparison?
    (4pSCa2) 1:50 pm – 2:05 pm

    • Reva Schwartz1, Joseph P. Campbell2, Wade Shen2
      1United States Secret Service
      2MIT Lincoln Laboratory

      • In forensic speaker comparison, it is crucial to decide when completion of the examination may not be possible (punt). We explore the factors that make speaker comparison decisions difficult or impossible. These factors may include: duration, noise, speaking style, language/dialect, mental state, number of speakers, type and quality of recording, and deception. The analyst needs criteria to decide to reject case work. We present analysis of some of these factors and their impact on automatic speaker recognition systems. We propose a methodology for setting objective thresholds by which comparison examples can be rejected. This methodology could be used by forensic analysts to decide whether or not to proceed with speaker comparisons involving these factors.

        This work is sponsored by the Department of Defense under Air Force contract FA8721-05-C0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.


  • Defining the default defense hypothesis in likelihood-ratio forensic voice comparison
    (4pSCa3) 2:05 pm – 2:20 pm

    • Felipe Ochoa, Geoffrey Stewart Morrison
      Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales

      • In forensic DNA comparison the person submitting samples for evaluation does not know what properties the samples will have when they are analyzed at the laboratory; samples are submitted as a matter of routine. In contrast, in forensic voice comparison the decision to submit samples for evaluation is based on prior screening: Typically a police officer, a lay person with respect to forensic voice comparison, has listened to the questioned-speaker recording and the known-speaker recording and decided that they sound sufficiently similar that they could be the same speaker and merit evaluation by a forensic scientist. If they do not sound sufficiently similar they are not submitted for evaluation. Unless the defense proposes a more restrictive hypothesis, the forensic scientist should therefore adopt the following as the default defense hypothesis and select a background database accordingly: The known speaker is not the same person as the questioned speaker but is one member of a population of speakers who, to a lay person, sound sufficiently similar to the voice on the questioned-voice recording that they would submit recordings of these speakers for forensic comparison with the questioned-voice recording. Examples of how this theory might be applied are discussed.

        Research supported by the Australian Research Council, the Australian Federal Police, New South Wales Police, Queensland Police, the National Institute of Forensic Science, the Australasian Speech Science and Technology Association, and the Guardia Civil via Linkage Project LP100200142. Unless otherwise explicitly attributed, the opinions expressed herein are those of the authors and do not necessarily represent the policies or opinions of any of the above mentioned organizations.


  • Break
    2:20 pm – 2:35 pm


  • Human error rates for speaker recognition
    (4pSCa4) 2:35 pm – 2:50 pm

    • Wade Shen1, Joseph P. Campbell1, Reva Schwartz2
      1MIT Lincoln Laboratory
      2United States Secret Service

      • It is commonly assumed that speaker identification by human listeners is an innate skill under certain conditions. As such, human listening tests have served as the benchmark for automatic recognition systems. In recent evaluations comparing human and machine performance on a speaker comparison task, error rates of naïve human listeners far exceed those of machines [special session on Human Assisted Speaker Recognition, IEEE ICASSP, Prague, 2011]. In this presentation, we quantify the performance of naïve listeners in a variety of challenging channel conditions and we compare these results against automatic systems and trained human listeners. The results of these experiments impact the admissibility of both forensic voice analysis and courtroom testimony by human listeners.

        This work is sponsored by the Department of Defense under Air Force contract FA8721-05-C0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.


  • Presentation withdrawn (presenter unable to attend)
    Investigating the acoustic and phonetic correlates of deceptive speech
    (4pSCa5) 2:50 pm – 3:05 pm

    • Christin Kirchhübel
      Audio Laboratory, Department of Electronics, University of York

      • The following study describes an initial investigation into the acoustic and phonetic correlates of deceptive speech using auditory and acoustic analysis. Due to the lack of extant data suitable for acoustic analysis, a laboratory-based experiment was designed which employed a mock-theft paradigm in conjunction with a ‘security interview’ to elicit truthful and deceptive speech as well as control data from a total of 10 male native British English speakers. Using Praat, the control, truthful and deceptive speech samples were analyzed on a range of speech parameters including f0 mean and variability, intensity, vowel formant frequencies and Speaking/Articulation Rate. Preliminary analysis suggests that truth-tellers and liars cannot be differentiated based on these speech parameters. Not only was there a lack of significant changes for the majority of parameters investigated but also, if change was present, it failed to reveal consistencies within and between speakers. The remarkable amount of inter- and intra-speaker variability underlines the fact that deceptive behavior is individualized and very multifaceted. As well as providing a basis for future research programs, the present study should encourage researchers and practitioners to evaluate critically what is (im)possible using auditory and machine based analyses with respect to detecting deception from speech.


  • Progress toward a forensic voice data format standard
    (4pSCa6) 3:05 pm – 3:20 pm

    • James L. Wayman1, Joseph P. Campbell2, Pedro Torres-Carrasquillo2, Peter T. Higgins3, Alvin Martin4, Hirotaka Nakasone5, Craig Greenberg4, Mark Przybocki4
      1Office of Graduate Studies and Research, San José State University
      2Human Language Technology, MIT Lincoln Laboratory
      3Higgins and Associates, International
      4Information Access Division, National Institute of Standards and Technology
      5Digital Evidence Section, Operational Technology Division, Federal Bureau of Investigation

      • The de facto international standard for the forensic exchange of data for biometric recognition is ANSI/NIST ITL-1/2, “Data Format for the Interchange of Fingerprint, Facial & Other Biometric Information”. This format is used by law enforcement, intelligence, military, and homeland security organizations throughout the world to exchange fingerprint, face, scar/mark/tattoo, iris, and palmprint data. To date, however, there is no provision within the standard for the exchange of audio data for the purpose of forensic speaker recognition. During the recent 5-year update process for ANSI/NIST ITL-1/2, a consensus decision was made to advance a voice data format type under the name “Type 11 record”. Creating such an exchange format type, however, is far from straightforward – the problem being not the encoding of the audio data, for which many accepted standards exist, but rather reaching a consensus on the metadata needed to support the varied mission requirements across the stakeholder communities. In this talk, we’ll discuss the progress that has been made to date, the questions that remain, and the requirements for additional input from the broader stakeholder communities.


  • Question Period
    3:20 pm – 3:45 pm


  • Break
    3:45 pm – 4:00 pm


Organizational meeting

  • Meeting for members of the ASA Forensic Acoustics Subcommittee
    4:00 pm – 4:45 pm


Organizer: