Talking heads

From Hackerspace Adelaide
Talking Heads
Our Machinery of Data project 2014
Purpose: To present open data in a unique and interesting format
Status: Complete

Machinery of Data Competition (part of Unleashed/GovHack)

The Original Idea

  • "to have a row of polystyrene heads (attired, decorated) with speakers inside them and each one has a different voice and stream of data."

Extension of Idea

  • heads can move side to side; a motion sensor so they don't start talking until something moves in front of them; because the name of the competition is "Machinery" of Data, I like the idea of the heads being machinery-themed (Kylie).

Changes to Idea

  • no movement of heads
  • no motion sensor

Jobs

Kylie

  • black polystyrene heads
  • collected electronic, mechanical, plastic and metal components
  • craft gear
  • crafty skills

Hackerspace

  • Raspberry Pi or PC

Pix

  • data
  • hardware

Robyn

  • data
  • hardware

Damien

  • software
  • hardware

Hardware

Sound in Linux

ALSA is the interface to the sound hardware. Unless the sound device does hardware mixing - not so common these days - ALSA can't do any mixing itself. PulseAudio can; it interfaces with ALSA to play sound.

We need mixing because, with ALSA, I don't think more than one process can access a given sound card at a time. We need this to play two different voices out of different speakers.

(Image: Talking heads sound architecture.png)

ALSA

With any luck ALSA doesn't need configuring; /proc/asound contains some symlinks naming each card. Each card's name can be passed to PulseAudio's module-alsa-sink using the device_id parameter.
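As a sketch, those symlinks can be read and turned into a module-alsa-sink line. Assumptions: /proc/asound only exists where ALSA is loaded, and the card name "PCH" in the comment is hypothetical - names vary by machine.

```python
import os

def alsa_cards(proc='/proc/asound'):
    """Map the card-name symlinks under /proc/asound (e.g. 'PCH' -> 'card0')."""
    if not os.path.isdir(proc):
        return {}
    return {name: os.readlink(os.path.join(proc, name))
            for name in os.listdir(proc)
            if os.path.islink(os.path.join(proc, name))}

def alsa_sink_line(card):
    """PulseAudio config line naming this card via the device_id parameter."""
    return 'load-module module-alsa-sink device_id=%s' % card

# e.g. for a card named "PCH" (hypothetical):
# alsa_sink_line('PCH') -> 'load-module module-alsa-sink device_id=PCH'
```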

PulseAudio

I don't think we need this stuff - see the next section - but here's how I started.

Stop PulseAudio resurrecting itself:

echo autospawn = no > $HOME/.config/pulse/client.conf

If you lose all sound, try deleting this file.

We'll probably need to stop PulseAudio trying to configure itself by not loading module-udev-detect.

Now configure a FIFO to send its output to a single speaker. Add this to a file (heads.pa maybe). This one accepts data at 8kHz in μ-law format:

load-module module-pipe-source source_name=head1 file=/tmp/head1 rate=8000 format=ulaw
load-module module-loopback source=head1 channel_map=left

Run PulseAudio:

pulseaudio --log-level=warn --file=heads.pa

Now you can dump data to /tmp/head1 and it comes out of the left speaker.
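To test the FIFO without a speech synthesiser, suitable data can be generated directly. A minimal sketch: the G.711 μ-law encoder below is hand-rolled (Python's deprecated audioop module provided an equivalent lin2ulaw), and the 8 kHz rate matches the rate=8000 in the module-pipe-source line above.

```python
import math

RATE = 8000  # must match rate=8000 in the module-pipe-source line

def lin2ulaw(sample):
    """Encode one signed 16-bit sample as a G.711 mu-law byte."""
    BIAS, CLIP = 0x84, 32635
    sign = 0x80 if sample < 0 else 0
    sample = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not sample & mask:
        exponent -= 1
        mask >>= 1
    mantissa = (sample >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

def tone(freq, seconds):
    """A sine tone of the given frequency, as mu-law bytes at 8 kHz."""
    n = int(RATE * seconds)
    return bytes(lin2ulaw(int(30000 * math.sin(2 * math.pi * freq * i / RATE)))
                 for i in range(n))

# e.g. one second of 440 Hz out of the left speaker:
# with open('/tmp/head1', 'wb') as f:
#     f.write(tone(440, 1))
```

Note that opening the FIFO for writing blocks until PulseAudio has it open for reading.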

When you're done, pulseaudio --start is supposed to get your normal sound back, but it's not working for me - the volume control breaks! Maybe try start-pulseaudio-x11.

paplay

Playing a sound with paplay: first find your sound card's sink:

$ pactl list short sinks
0	alsa_output.pci-0000_00_1b.0.analog-stereo	module-alsa-card.c

then

paplay --device=alsa_output.pci-0000_00_1b.0.analog-stereo --channel-map=left myfile.wav

will play the sound on the left channel.
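That per-speaker routing can be wrapped in a small helper. A sketch: the sink name is the example one from the pactl output above and will differ per machine.

```python
import subprocess

# Example sink from `pactl list short sinks`; substitute your own.
SINK = 'alsa_output.pci-0000_00_1b.0.analog-stereo'

def paplay_cmd(path, channel, sink=SINK):
    """Build the paplay command playing `path` on one channel ('left'/'right')."""
    return ['paplay', '--device=' + sink, '--channel-map=' + channel, path]

def play(path, channel, sink=SINK):
    """Run paplay and wait for the sound to finish."""
    subprocess.run(paplay_cmd(path, channel, sink), check=True)

# play('myfile.wav', 'left')
```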

Nerding up

For extra marks: It might be possible to use DMA to produce several channels of sound on a Raspberry Pi. PiBlaster might be a good start.

Text to speech

Festival

Festival by default seems to write its output to a .wav file, then invoke some command to play it. This surprises me a bit because it means the sound can't start playing before it's finished generating. We can use paplay -d to play it to a specific stream.

Combining festival with paplay:

festival> (Parameter.set 'Audio_Method 'Audio_Command)
festival> (Parameter.set 'Audio_Command "paplay --device=alsa_output.pci-0000_00_1b.0.analog-stereo --raw --rate=$SR --format=s16le --channels=1 --channel-map=left $FILE")
festival> (SayText "Foo")
#<Utterance 0x7fd39d3ada10>                                                     

It's possible to have Festival start playing the sound before it's finished generating it.

Interfacing

Client/server mode
$ festival --server
server    Sat Jul  5 14:06:19 2014 : Festival server started on port 1314
client(1) Sat Jul  5 14:06:34 2014 : accepted from localhost

then

$ nc localhost 1314
(SayText "Foo")
LP
#<Utterance 0x7fea512413d0>
ft_StUfF_keyOK

The bit starting "LP" is sent after the sound has played. I don't know what should be used to detect errors, completion, etc.; I haven't found any documentation about it.
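A minimal client along those lines can be sketched as follows. Assumptions: it reads until the ft_StUfF_key marker seen in the transcript above and takes the two characters after it as the status, which is "OK" there; presumably something else appears on error, but given the lack of documentation that's a guess.

```python
import socket

MARKER = b'ft_StUfF_key'  # end-of-reply marker, as in the transcript above

def reply_status(buf):
    """Status characters after the marker (e.g. 'OK'); None if no marker yet."""
    if MARKER not in buf:
        return None
    return buf.split(MARKER, 1)[1][:2].decode('ascii', 'replace')

def say(text, host='localhost', port=1314):
    """Send SayText and block until Festival signals completion.

    Note: `text` must not contain double quotes; no escaping is done here.
    """
    with socket.create_connection((host, port)) as s:
        s.sendall(('(SayText "%s")\n' % text).encode())
        buf = b''
        while reply_status(buf) is None:
            chunk = s.recv(4096)
            if not chunk:  # server closed without a marker
                break
            buf += chunk
        return reply_status(buf)

# say("Foo")  # needs `festival --server` running on port 1314
```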

Libraries

  • Python, in client/server mode; possibly not working (see _say_server)

Voices

The basic Festival voice is very artificial, and the ones that are "freely" available aren't much better. Festival's demo page has some really nice ones, but these aren't available. At this point we might have to give speech synthesis a miss.

We might be able to pre-render the text using a non-free option like Ivona.

gespeaker and mbrola

In Ubuntu:

$ sudo aptitude install gespeaker mbrola-en1 mbrola-us1 mbrola-us2
$ sudo ln -s /usr/lib/x86_64-linux-gnu/espeak-data/ /usr/share

(The symlink is due to a bug).

Now you can use gespeaker with the mbrola voices.

I've created some samples.

Others

Ahead-of-time text to speech

VoiceRSS's "English (Great Britain)" voice isn't too bad; there appear to be no usage restrictions on generated audio.

Data

Our main source of data is Trove newspapers, and from that we're looking at pulling out:

  • letters to the editor
  • advertisements
  • news stories

We have a spreadsheet of the text we want to use.

Heads

Work in Progress

Machinery of Data exhibition

(Images: Taking heads exhibit.jpg, Taking heads description.jpg)

  • Our video entry [1]
  • All the video entries for Machinery of Data competition [2]
  • We won a Printrbot 3D printer from Bilby
  • The Renew Adelaide Creative Space prize
  • An honourable mention for Best Artistic Use of Open Data