Archive for November, 2011

Treatment of repeating content

Monday, November 28th, 2011

In media monitoring systems and environments, we often have to identify and COUNT the occurrences of some playback event.  The most common example is when you have to monitor all occurrences of the same commercial audio spot.

Multiple parties are interested in tracking audio spots:

  • broadcasters
  • advertisers (clients)
  • agencies (clients representative)
  • government regulators

Let me briefly cover what each of them needs to know about playback of commercial audio spots.


Broadcasters

They need proof that they played something at a certain time, so they can show it to the client and issue invoices for services provided.

Advertisers and agencies

They both need proof that something was played – their own commercials, at certain times, and the correct number of times.  However, they might also need to look into other brands so they can track their competitors.

Government regulators

They usually want to know if legal or regulatory requirements on broadcast media are met.  Such requirements are, for example, to have no more than 2 minutes of advertisements per hour, or to have commercial blocks clearly separated from the rest of the program by special markers called ‘jingles’ or ‘breaks’.

Let’s get back to…

The problem

The usual workflow for the above is to feed the matching technology with the samples that you want to track, and the technology will give you the locations in the timeline where those samples occur.  That is one thing that PlayKontrol can do for you.  But what if you don’t have the samples, and still want to discover them?

The rescue

The traditional way would be to go through the known parts of the program, mark them, clip them out, and have the audio spotter search for all of their occurrences.  With that method, and with a lot of clipping, you’ll get some accuracy, but some clips will escape your attention because they aren’t where you expect them – for example, a commercial outside of its commercial block.

The other way is to do it with PlayKontrol SelfMatching technology.  It works like this: a whole day of archive is given to PlayKontrol, and the result of the process is a list that contains ALL of the matches for the given day.

So every repeating audio clip, no matter how small, will be listed there.  From that point on, your analyst’s only tasks would be to:

  • browse through the clips,
  • listen to them,
  • maybe fine-clip them,
  • tag them and
  • put them into the repository.
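The self-matching idea can be illustrated in a few lines of Python: hash every fixed-size window of the day's samples and keep the positions where the same content occurs more than once.  This is only a toy illustration of the concept, not PlayKontrol's actual algorithm – real audio fingerprinting uses robust perceptual hashes, not exact sample equality.

```python
from collections import defaultdict

def self_match(samples, window=4, hop=2):
    """Return {window_contents: [positions]} for windows occurring more than once."""
    seen = defaultdict(list)
    for pos in range(0, len(samples) - window + 1, hop):
        key = tuple(samples[pos:pos + window])  # stand-in for a robust audio hash
        seen[key].append(pos)
    return {k: v for k, v in seen.items() if len(v) > 1}

# A "day" of samples with one clip repeated at positions 0 and 10:
day = [1, 2, 3, 4, 9, 9, 8, 7, 6, 5, 1, 2, 3, 4]
matches = self_match(day)   # {(1, 2, 3, 4): [0, 10]}
```

Every repeated window shows up with all of its positions – exactly the kind of list the analyst would then browse, listen to, and tag.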

I have created a picture containing the results of the process in a ‘visual representation’.  Here it is:

Please note the following:

  • both X and Y axes represent time
  • the grid divides time into one-hour intervals
  • grayed areas are the time intervals 00-06 and 18-24 (say, night time)
  • the size of a point represents the length of the matched clip

Try to figure out the rest for yourself.  Hint: the large blobs are probably repeating songs.

Capturing and archiving of DVB-T signal

Monday, November 28th, 2011

No matter if it’s compliance recording, where you capture and save your own broadcast, or media monitoring, where you’d like to capture multiple signals off the air, you have some interesting choices here.

Let’s explore your options on the subject in detail, whether it’s one-channel or multiple-channel recording.

One channel DVB-T recorder

Recording one channel is simple no matter how you choose to record it.  Let me present the two main options here, so you can see which is most applicable in your situation.

The simplest way of recording would be to have one set top box (STB) for DVB-T, and use it to send a composite signal into the computer via an Osprey 210 card.  It is the most robust solution, but it has some (serious) drawbacks:

  • cheap DVB-T tuners can ‘lock up’ and freeze the picture
  • low quality tuners can also de-sync audio and video over time – and you need 24/7 operation here
  • you’ll need an extra power connector for the set top box
  • STBs produce extra heat

The alternative way of recording is to use a DVB-T card such as the Asus MyCinema-ES3-110, use software such as TubeSink to tune to a frequency and extract the required channel from it (this is called DEMUX-ing), and forward the extracted channel to the VideoPhill Recorder for further processing (recording, streaming, …).

BTW, the TubeSink mentioned above can be used even without VideoPhill Recorder, as it DEMUXes the channels and can forward them to any computer on your network as a UDP Transport Stream that can be played with VLC.  If you want to use it for non-commercial purposes, download it from here.

So in the case of 1-channel DVB-T recording, I would say that it remains uncertain whether to use an external set top box with an Osprey capture card, or to go with a pure software solution and some simple off-the-shelf DVB-T tuner.

But in case of…

Multiple channels DVB-T recording facility

The same options are available for multiple channel recording facilities – but here is the catch.  As you probably know, multiple DVB-T channels are packed and transmitted on one frequency, and that is called multiplexing.  The carrier for the transmitted channels is called a MULTIPLEX (MUX for short).  It often carries 4 channels, and sometimes it can have as many as 16 or more.

The current recommended recorder density (channels per machine) is 4.  One machine equipped with an Osprey 460e will do 4 channels just fine.

So, let’s say that we need 16 channels and they are scattered across 3 MUX-es (we have such a situation here in Zagreb).  Using the conventional method (I would say that having 16 STBs is conventional, as bizarre as it seems) you’ll need:

  • 4 recording servers
  • 4 Osprey 460e cards
  • 16 DVB-T set top boxes
  • PLENTY of mains outlets
  • some kind of distribution to get the signal to all 16 STBs

You can see where I’m going with this, so let me suggest the following: let’s use TubeSink to control 3 tuners in TWO MACHINES, and save on the 4 Ospreys, 2 PCs, and the rest of the unnecessary equipment.

We’ll put 2 tuners into one machine, and one tuner into the second machine.  If the channel-per-MUX distribution is such that each machine gets its 8 channels, fine.  If not, we’ll instruct TubeSink to forward the Transport Stream to the OTHER machine and that machine will perform the recording.  That way, the load will be completely balanced between the two machines, and you’ll have your 16-channel recorder in a nice and compact fashion.
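To make the two-machine balancing concrete, here is a small Python sketch.  The per-MUX channel counts and machine names are hypothetical – the post doesn't give the actual Zagreb breakdown – but the idea is the same: since any demuxed channel can be forwarded over UDP to either recorder, channels can simply be dealt out round-robin until each machine records 8.

```python
# Hypothetical distribution of 16 channels across 3 MUXes.
muxes = {"MUX-A": 6, "MUX-B": 6, "MUX-C": 4}
channels = [f"{mux}/ch{i}" for mux, count in muxes.items() for i in range(count)]

# Any demuxed channel can be forwarded to either recorder, so we can
# deal them out round-robin to balance the load.
machines = {"recorder-1": [], "recorder-2": []}
for i, ch in enumerate(channels):
    target = "recorder-1" if i % 2 == 0 else "recorder-2"
    machines[target].append(ch)

# Each machine now records exactly 8 of the 16 channels.
```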

Even more compact?

Yes, it can go even further.  There are dual DVB-T tuners such as the WinTV-HVR-2200 that can tune to two frequencies at once, and with one of those you could record as many channels as there are in two MUXes on one machine.  Today, even desktop processors such as the i7 can encode 8 channels of video in real time.  So, with a proper CPU (or multiple CPUs in server computers), even 16 channels could be encoded in one compact 2U rack mounted unit.

However… (serious problem)

Using PC based DVB-T cards will only work with free-to-air channels.  If any of your channels are encrypted, the solution described above will NOT work.

SD-SDI compliance recording

Monday, November 28th, 2011

I will try to write something about creating an archive for compliance recording purposes from an SD-SDI source.  This post is in response to repeated inquiries about a system that would create such an archive.

Compliance recording in general

Just to remind us – compliance recording serves only one purpose: to prove or disprove that something went on the air at some time.  That is the first and the last thing this technology is used for.  There is no requirement on the content of the signal – it just has to be good enough for someone to see the basic stuff that’s in there.

This business requirement is countered from the other side by the need to make the archiving system as affordable as possible.  And the cost of the system, if we leave out the software involved (system and application part), is highly dependent on:

  • picture quality
  • number of channels recorded
  • number of possible outputs needed
  • and the number of days of archive that should be kept

Picture (i.e. video) quality is directly proportional to the BITRATE of the recorded video.  More bitrate means more apparent quality – in picture resolution, object motion, and so on.  For example, most recorded videos that you might have on your computer are between 700 and 1000 kbit/s and are encoded with DivX or a similar encoder.  Watching a movie at a 1000 kbit/s bitrate is highly enjoyable, and everything above that is required ONLY for HD or HD-Ready content.

Compliance recording means recording what’s on the air

Many people forget that when you do compliance recording, you have to record what was coming out of your transmitter array, not the feed that you are sending to it.  If your link went down, or your output HF amplifier burned out, your endpoint picture will be NOTHING – and if you are recording only your output, there will be a great discrepancy in your logs.

So my suggestion: drop the idea of recording SD-SDI, and try to record what comes back from the air, in the form of an analog or DVB-T signal.  That is the REAL broadcast that your viewers see.

The conflict of interest

We all want stuff to be as cheap as possible.  In order to build a recording system that is affordable, we have to balance various things, but here I’ll try to explore the differences when SD-SDI signal capture is required.

I will postulate these input parameters for the archive that will serve as the example here:

  • video bitrate: 512 kbit
  • audio bitrate: 64 kbit
  • archive duration (number of days to keep from today): 92 (3 months)
  • number of channels that need recording: 4
  • required outputs: both WMV (Windows Media Video) and h.264 (at the same time)
  • recording resolution: 3/4 of the PAL D1 picture size, so 540×432

To calculate the hard drive space requirements, I’ll use the online bitrate calculator here on this blog.

The table below is taken from the calculator itself.

Video Bitrate | Audio Bitrate | Total Bitrate | Days | Channels | Disk Space
512 kbit/s    | 64 kbit/s     | 2304 kbit/s   | 92   | 4        | 2183.20 GB

So we need about 2200 GB of net drive space.  To provide that kind of space with some fault tolerance, I would recommend two drives such as the WD Caviar Green 2500GB connected in a MIRROR VOLUME (RAID1).  There is no need for an additional system drive, and almost all new motherboards provide an on-board implementation of RAID1 mirroring.
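For the record, the table's disk-space figure can be reproduced with a few lines of Python.  The one assumption is that the calculator uses binary units (1 kbit = 1024 bits, 1 GB = 1024³ bytes), which is what makes the numbers line up:

```python
video_kbit, audio_kbit = 512, 64
channels, days = 4, 92

total_kbit = (video_kbit + audio_kbit) * channels   # 2304 kbit/s across all channels
bytes_per_day = total_kbit * 1024 * 86400 / 8       # kbit -> bits, x seconds/day, bits -> bytes
total_gb = bytes_per_day * days / 1024**3

print(total_kbit, round(total_gb, 2))   # 2304 2183.2
```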

I’m kind of beating around the bush here – because the main point of this article should be to give you a choice of whether to use the original SD-SDI signal or convert it to a composite signal (or use the signal that comes back off the air).

Osprey vs DeckLink

The choice before us is either an Osprey 460e card or a DeckLink Quad card.  The list price for the Osprey is about $1200, and for the DeckLink Quad about $1000.  So the first impression is that you’ll go cheaper with the DeckLink.  But…

So far, the Osprey has proven itself to be an absolute master of video capture.  No system deployed so far has had any problems with the card whatsoever.  You plug it in, install the drivers, and it works.  And with the Osprey 460e, you’ll use only ONE PCI-e slot.  The other slot, if there is one, will be used by the graphics card.

The same thing about the slots applies to the DeckLink as well.  However, there is a “now shipping” image below the DeckLink Quad card name, and it implies something – the card is red-hot new.  Despite the fact that VideoPhill Recorder will work with it, I don’t know how reliable it will be in 4-channel simultaneous recording situations.

My point – it has to prove itself – and if it does, it will be a great addition to the VideoPhill arsenal.


Refactoring, it’s fun – part 2

Tuesday, November 22nd, 2011

In the first part I tried to set the scene and give you some background of my problem. I’ll try to continue now, and create an interesting and at the same time informative story about performing the refactoring in question.

The most curious thing of all is that you don’t have to prepare for refactoring. It can happen no matter what, and the price usually isn’t so high after all. Let me remind you – dependency injection, the main tool of component design, wasn’t used. Yes, the classes do have their responsibilities cleverly defined, and that helps a lot here, because if it weren’t so, the whole deal would have to start a few steps ‘before’.

I’m not an experienced writer, so I don’t know if I will get the point across, but to me this refactoring was like building level upon level of scaffolding, using one level to test the next, and at the same time creating scaffolding that would be used later in a production environment! I guess there is a name for it, there has to be :)

Step 1:

Creating a duplicate of the main working class, and see how to force the rest of the application to use it when needed.

Say that the class name is PCMHash, and it has the following structure (method declarations only):

class PCMHash
{
    public bool Initialize(PCMBufferedReader sourceDataReader);
    public uint GetNextHash();
}

My goal was to create an alternative to this class. I needed the old class as a reference to compare results against.

So I created the class PCMHash_R2.  That was my first decision – to create the same class as before and try to get the same results from it, replacing its guts one step at a time.

Switching between the versions wasn’t easy, so I extracted an interface, derived both classes from it, and created something like:

IPCMHash rph;
if (!_refactoring)
    rph = new PCMHash();
else
    rph = new PCMHash_R2();

At this point I would like to re-state the fact that I am in production all the time and have to decide on my feet, weighing all the implications that would arise.  I am telling you that because everyone can see that some dependency injection could be used here. But apart from having to spend much time threading it through the code, I’m not sure how, and if, it would work anyway.

Why: at testing time, I want both classes to be able to function side by side.  Concretely, if I have:

class FileHasher
{
    // will use an IPCMHash implementation somewhere...
}

I’ll need:

FileHasher usesOriginal = new FileHasher();
FileHasher usesRefactored = new FileHasher();
// compare results
object resultFromOriginal = usesOriginal.GetResult();
object resultFromRefactored = usesRefactored.GetResult();
Assert.AreEqual(resultFromOriginal, resultFromRefactored);

Here, the GetResult() method could create a new IPCMHash several times, and I am not sure whether, for example, nInject would be able to handle such a thing.

I was also toying with the idea of using manually created class factories, or of sending the type of the implementation class in the constructor of the consumer, but those options also faded.

Anyway, there is the original class, and there is the new refactored class, ready to be taken apart and put together again in some different order, preserving the functionality that was there before.
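The pattern itself – two implementations behind one interface, picked by a flag, and compared side by side in tests – can be sketched outside C# as well.  Here it is in Python; the class names mirror the post, but the toy summing "hash" is mine, not the real PCMHash internals:

```python
class PCMHash:
    """The original implementation (toy stand-in)."""
    def get_next_hash(self, data):
        return sum(data) & 0xFFFFFFFF

class PCMHash_R2:
    """The refactored implementation - different guts, identical results."""
    def get_next_hash(self, data):
        h = 0
        for sample in data:
            h += sample
        return h & 0xFFFFFFFF

def make_hasher(refactoring):
    # the hand-rolled switch, standing in for the _refactoring flag above
    return PCMHash_R2() if refactoring else PCMHash()

data = [10, 20, 30]
original = make_hasher(False).get_next_hash(data)
refactored = make_hasher(True).get_next_hash(data)
assert original == refactored   # the whole point of the exercise
```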

The next steps will be added in the next editions of the refactoring talks…

Refactoring, it’s fun – part 1

Tuesday, November 22nd, 2011

This is a story of refactoring code that isn’t prepared for it a single bit.  By prepared, I mean having test cases, dependency injection code, and so on.  I have none of the above in the original code – just code that works.

Let me explain what I have, what it does, and where it should end up.  The purpose of this refactoring session isn’t better performance with the same functionality – it is to have the same algorithm, already proven in various tests, work on different data.

Before (a.k.a. now):

The component creates a PK_HASH from a sound file.  By PK_HASH I mean the “code name for our latest tech that can crunch a whole audio file down to a few bytes, later comparing those bytes to bytes crunched from another file to tell you whether it’s the same sound file”.  PK stands for PlayKontrol – the brand name.

So, there are a few steps to produce the PK_HASH from a sound file:

  • decode and read the file – the input is a file on disk, for example .mp3, .wma, .aac, and the output is PCM samples
  • from any kind of PCM samples (stereo, mono, 8-bit, 16-bit) we produce an array of shorts that must be rewindable
  • hash the data and produce the PK_HASH files

Decode and read the file

The input is a file on disk, which can be any file format that is streamable (we produce the files with StreamSink – see the archive example here).  That is .mp3, .aac, .wma, .ogg, and whatnot.

Currently it’s done using a simple component that uses DirectShow to create a graph for the audio file, renders that graph, and attaches a SampleGrabber filter to fetch the samples.  The component goes from a file on disk to having the whole PCM sample data in memory.  That’s feasible for a 5-minute file (5 x 60 x 4 x 44100 = 50MB), and it can work even for 1-hour files.  However, you can *feel* that this approach IS wrong, especially when told that the rest of the algorithm doesn’t need access to the whole PCM data at once.

Rewindable sample array

The PCM samples are promoted to 16-bit (if needed), and the channels are downmixed to mono.  Again, that is done in memory, so for each PCM sample there are 2 bytes of data present in memory as a result of the operation.
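A minimal Python sketch of that downmix step, under my own assumptions (interleaved 16-bit stereo, mono produced by averaging the two channels – the post doesn't say which downmix formula is actually used):

```python
def downmix_stereo_to_mono(interleaved):
    """Average L and R of interleaved 16-bit stereo samples into mono."""
    return [(interleaved[i] + interleaved[i + 1]) // 2
            for i in range(0, len(interleaved), 2)]

stereo = [100, 200, -50, 50, 0, 0]        # L, R, L, R, L, R
mono = downmix_stereo_to_mono(stereo)     # [150, 0, 0]
```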

Hashing and creating the file

The hashing process needs a moving, overlapping window over the sample array, and since we have everything in memory, that is a piece of cake now.  We take the data, process it, and write it into another byte array.  Since it’s extremely dense now, I won’t cry about memory at this point – but yes, it is written into memory first, and then saved to a file on disk.
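The moving, overlapping window pass can be sketched like this; the window size, hop, and the hash function are placeholders, since the real PK_HASH parameters aren't given here:

```python
def hash_windows(samples, window=4, hop=2):
    """Reduce each overlapping window of `samples` to one hash value."""
    out = []
    for pos in range(0, len(samples) - window + 1, hop):
        chunk = tuple(samples[pos:pos + window])
        out.append(hash(chunk) & 0xFFFF)   # stand-in for the PK hashing step
    return out

hashes = hash_windows(list(range(10)))     # 4 overlapping windows -> 4 hashes
```

With hop smaller than window, consecutive windows share samples – which is exactly why the later streaming version needs a buffer that can be re-read, not a plain forward-only stream.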

So here I have tried to explain how it works so far.  It goes from the encoded audio file to PCM sample data in memory, downmixes that data in memory to one PCM channel, processes the mono PCM samples to obtain the PK_HASH, and then writes it to a file.

So what do we actually need?

If you take a peek at the archive, you’ll find that every folder has audio files, and also a .hash file for every audio file present in the directory.  Please note that not every directory is processed – only 20 of them, because the processing consumes CPU intensely, and I have only a few PCs lying around to scrub the data.  That will improve in the future.  So, for crunching the archive, even a POC (proof-of-concept) is OK, as it serves its needs.  It will go through the archive and leave processed PK_HASHes behind.

A process that runs in parallel waits for the PK_HASH file to be created, reads it, and does matching against the database.  However, the next step should be taken, and it is REALTIME processing.

To be able to process in REALTIME, the architecture goes somewhat like this:

  • StreamSink is attached to the network stream of any kind, and provides PCM sample output
  • PCM sample output is downsampled and buffered
  • hashing process uses buffered mono PCM samples and outputs results into the stream
  • PK_HASH stream is again buffered and results processed with MATCHER process

StreamSink PCM decoding

StreamSink is the application that does internet media stream capture.  However, thanks to a feature request from DigitalSyphon, it can process any media stream and provide PCM samples for it in real time, in the form of a Stream-derived class.  So, that part of the process is covered completely.

Buffering PCM samples

Now, a new component should be created – something that can buffer PCM samples from the Stream and provide floating, overlapping window reads for the hashing process.  With some thinking, I combined the inner workings of the circular buffer stereotype with something that can be used almost directly in the hasher process – by replacing the class implementation only.
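A sketch, under my own assumptions, of a circular buffer with overlapping window reads: the writer appends PCM samples, and the reader asks for a window and then advances by a hop smaller than the window, re-reading the overlap.  (For brevity it doesn't guard against the writer overrunning unread data.)

```python
class CircularPCMBuffer:
    """Fixed-size ring buffer supporting overlapping window reads."""
    def __init__(self, capacity):
        self.buf = [0] * capacity
        self.capacity = capacity
        self.write_pos = 0   # absolute index of the next sample to write
        self.read_pos = 0    # absolute index where the next window starts

    def write(self, samples):
        for s in samples:
            self.buf[self.write_pos % self.capacity] = s
            self.write_pos += 1

    def read_window(self, window, hop):
        if self.write_pos - self.read_pos < window:
            return None      # not enough data buffered yet
        start = self.read_pos
        out = [self.buf[i % self.capacity] for i in range(start, start + window)]
        self.read_pos += hop # hop < window, so consecutive reads overlap
        return out

buf = CircularPCMBuffer(8)
buf.write([1, 2, 3, 4, 5, 6])
w1 = buf.read_window(4, 2)   # [1, 2, 3, 4]
w2 = buf.read_window(4, 2)   # [3, 4, 5, 6] - overlaps w1 by two samples
```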

Processing and creating PK_HASHes

The hasher process was reading the buffered MEMORY quasi-stream.  However, it used a kind-of-simple interface to read the data, so my luck was that the interface could be extracted and the implementation done over buffered stream data.  Also, the output of the class should be rewritten, since it currently doesn’t have any interface-able part to replace.

And so on – the latter should be implemented from scratch, so there is no story about refactoring there.

Refactoring pyramid/tower

I can call it a pyramid or a tower, because after a long time of procrastination (subconsciously processing the task at hand) I was finally able to put my hands on the keyboard and start.  My premise was that everything has to be checked from the ground up, because NOW I have the algorithm that produces the desired results, and since there are many steps involved, an error in a single step could be untraceable if I don’t check every step along the way.

Tools used

I am kind of old fashioned, so this paragraph won’t be very long.  I use Visual Studio 2008, and for writing test code snippets I use nUnit as a launcher, so I don’t have to have some form to run the tests, or a console app.

For dependency injection I tested nInject, and it is great, but it can’t help me in this case, so I’ll do the implementation replacement ‘by hand’.

I’ll finish this post for now, as this is the current state of affairs, and I will keep you updated as the story develops, with new posts and fresh insights…