Posts Tagged ‘playkontrol’

Setting up an automated ad monitoring service for TV

Friday, October 26th, 2012

So you want to set up your own automated advertisement monitoring for some TV channels?  And you probably have an idea how to sell the reports from the whole system?  Let me try to explain one of the possible ways of doing it.

Overview

An advertisement monitoring system isn’t so complicated, but it isn’t simple either.  You’ll need computers, people, and some kind of service to automatically track advertisements once they have been spotted and tagged.

Recording

For starters, you have to be able to record all the TV channels you need.  Depending on the TV system used in your country, you’ll have several options.  From our shop, we can solve recording for analog TV, DVB-T, DVB-S, and IPTV.  In any case, if you can get a composite video signal from your set-top box, you will be able to record it with VideoPhill Recorder.

Storing and archiving

The recorded broadcast should go to some storage, sized according to the number of days you want your broadcast archive to be available.  To calculate how much storage space you will need, you can use this on-line calculator.
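The arithmetic behind such a calculator is simple; here is a hedged sketch in Python (the bitrate, channel count, and retention values are illustrative examples, not VideoPhill defaults):

```python
# Rough storage estimate for a broadcast archive.
# Assumptions (illustrative only):
#   - each channel is stored at a constant bitrate
#   - recording runs 24 hours a day

def archive_storage_gb(channels, days, bitrate_kbps):
    """Return the storage needed for the archive, in gigabytes."""
    seconds = days * 24 * 60 * 60
    total_bits = channels * seconds * bitrate_kbps * 1000
    return total_bits / 8 / 1024**3

# Example: 10 channels, 30-day archive, 2 Mbps per channel
print(round(archive_storage_gb(10, 30, 2000), 1))  # roughly 6 TB
```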

Clipping and tagging

So now we have recordings of the TV broadcast.  The next step is to form a team of people who will find and tag the first occurrence of each advertisement.  The number of people and workstations required for the job depends on many factors:

  • number of channels monitored
  • channel ‘difficulty’ (how easy it is to find commercials on the channel)
  • number of shifts that people will do

In short, you’ll need some way of accessing the archive and clipping portions of it, so that clips of advertisements are extracted and prepared for the automated archive search.

One possible way of doing the job is by using VideoPhill Player application.  To see it in action, please see video below…

Automated search

Almost there…  Now you have your archived broadcast, and you have your clip library.  To find all occurrences of all clips on all your channels, you simply pass the whole archive and the clip library to the PlayKontrol Service and get your results.  Results can be in any format you require, such as text, Excel, PDF, XML, and so on.

Producing reports for your customers

The really final component of the system (apart from selling the reports) is a team of people who will take the raw data that PlayKontrol provides and produce nice reports for your customers.  People in this job should be able to understand the needs of media buyers and planners, and generate reports that are useful for them.

Creating a small (hopefully usable) utility: File Deleter

Wednesday, January 11th, 2012

When you are in media monitoring, you have TONS of files.  For example, look at this:

Multitude of files, StreamSink archive

Another bunch of files, created by PlayKontrol

Every recorder, logger, and input produces a number of files on your system.  Of course, applications such as VideoPhill Recorder or StreamSink have an option for deleting files after they expire (one month, for example), but what if you have some other way of gathering information (media, metadata, whatever) that won’t go away by itself?  I have several such data sources, so I opted to create a MultiPurposeHighlyVersatile FileDeleter Application.  I’ll probably find a better name later; for now let’s call it ‘deleter’ for short.

The Beginning

An application to delete files must be such a triviality that I could surely open Visual Studio and start coding immediately.  Well, not really.  In my head, this app will do every kind of deleting, so let’s not get hasty, and let’s do it by the numbers.

First, a short paragraph of text that describes the vision – the problem we are trying to solve with the app – in a few simple words.  That is the root of our development, and we’ll revisit it several times during the course of development.

Vision:

‘Deleter’ should be able to free the hard drive of stale files (files older than some period) and keep free hard drive space at some pre-determined minimum.

Here, it’s simple enough that I can remember it, and I’ll be able to descend down from it and create the next step.

The Next Step

For me, the next step (in this particular case, at least) is to try to see what ‘features’ the app has.  The only way that works for me is to create a mock of the application UI and write down the things that aren’t visible from the UI itself.  Since this UI won’t do anything but gather the parameters that define the behavior of the app, it will be a simple one, and it will fit nicely on one screen.

For the sketch I’ll use Visual Studio, because I’m most comfortable with it.  If it weren’t my everyday tool, I’d probably use an application such as MockupScreens, which is a completely trivialized app-sketching gadget with powerful analyst features.

The process of defining the UI and writing down requirements took some time: I repeatedly added something to the UI, then to the list below, until I had a much clearer picture of what I’m actually trying to do.

Features:

  • it should have the ability to delete only certain files (defined by a ‘mask’ such as *.mp3)
  • it should be able to delete files by their age
  • it should be flexible in determining the AGE of a file:
    • various dates in file properties: created, modified, accessed
    • by parsing the file name itself
  • it should be able to delete from multiple directories
  • it should be able to either scan a directory flat or dig into subdirectories
  • it should be able to delete files by criteria other than age:
    • files should be deleted if their total size exceeds some defined size
      • in that case, other files should be taken into account, again by mask
    • files should be deleted if minimum free drive space is less than some defined size
    • file size
  • when deleting by criteria other than file age, it should specify which files are first to go
  • it should be able to support multiple parameter sets at one time
  • it should run periodically at predetermined intervals
  • it should be able to load and save profiles
    • profiles should have names
  • it should disappear to the tray when minimized
  • it should have the lowest possible process priority
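To make the core age-based deletion concrete, here is a minimal sketch (in Python for brevity, although the app itself will be built in Visual Studio; the mask and age values are just examples):

```python
import fnmatch
import os
import time

def delete_stale_files(directory, mask="*.mp3", max_age_days=30,
                       recurse=True, dry_run=True):
    """Delete files matching `mask` whose modification time is older
    than `max_age_days`.  With dry_run=True, only report what would go."""
    cutoff = time.time() - max_age_days * 86400
    for root, dirs, files in os.walk(directory):
        for name in fnmatch.filter(files, mask):
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                print("would delete" if dry_run else "deleting", path)
                if not dry_run:
                    os.remove(path)
        if not recurse:
            break  # flat scan: stop after the top-level directory
```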
And here’s the screen:

Mock of the Deleter UI used to define and refine the requirements

As you look at the UI mock, you’ll see some mnemonic tricks that I use to display various options, for example:

  • I filled in the textboxes to provide even more context for the developer (myself wearing another cap, in this case)
  • I added vertical scrollbars even where the text in multi-line textboxes isn’t overflowing, to suggest that there might be more entries
  • for multiple-choice options I deliberately didn’t use a combobox (pull-down menu) – I used radio buttons, again to provide visual clues to the various options without the need to interact with the mock

From Here…

I’ll let it rest for now, and tomorrow I’ll try to nail down the requirements for the app further.  From there, when I get a good feeling that this is something I’m comfortable with, I’ll create an object interface that will contain all the options from the screen above.  While doing that, I’ll probably update the requirements and the UI itself, and maybe even revisit The Mighty Vision above.

BTW, it took me about 2 hours to do both the article and the work.  I excluded my wandering around time, of course :)

How to create a fair and adequate service proposal?

Thursday, January 5th, 2012

Since I’m about to create a media monitoring offer for my first end-user client of this kind, and as this caught me totally unprepared, I’m on a journey of discovery for prices that would be fair to them but adequate for the company.

Media monitoring here is in a very limited context – they only need an advertisement verification service, and that is the only service I can provide at the moment anyway.

For this estimation, I’ll try to take their point of view and provide some added value.  At the end of this post, I’ll summarize the thoughts presented here.

The case

The prospect is a retail store company with stores throughout the country; they advertise in all media nationally, and of course they use radio.

For every radio advertisement they have to pay some amount, defined by the station’s price list, subject to various discounts, and so on.  For the payment they always get an invoice, and from most radio stations they also get a ‘proof-of-playback’ document.

Proof-of-playback is usually generated from the automation software’s playout logs and processed with a system such as SpotKontrol.  These documents are accurate most of the time, but sometimes there are discrepancies, due to operator error or some other intricacy that’s going on.

Everything in the process is done in good faith, but sometimes advertisements aren’t played yet still show up in the proof-of-playback document, and sometimes it’s the other way around.  Each playback is charged for some amount, and if the proof is incorrect, one party or the other loses money.  It isn’t a great situation for either of them.

So the idea would be to provide a service that verifies the document provided by the media outlet, by obtaining real, independent information on the playback of the advertisements.

Calculation

Some math and abstract thinking are required to read this section.  If you’d rather skip it, just jump to the end where the results are shown.

In my estimation process I’ll always try to bound the numbers so they show one extreme side of the possible cost range, and by doing so come up with a cost that is always AT LEAST that amount.  For example, if there are 10 radio stations in question with various advertising costs per second, I’ll use the smallest of them.

My first estimate is that such a retail store will have AT LEAST 2 advertisements a day on AT LEAST 10 national radio channels.  We will also say that they advertise only on workdays, which gives us AT LEAST 20 days per month, or 2 * 10 * 20 = 400 advertisement playbacks.  That is the lowest bound – try to remember that – and we used only 10 national radio stations; most advertisers will go into local advertising as well.

Now, let’s try to estimate how much each advertisement playback will cost.  For that, we’ll use 5 prominent radio stations and look at their price lists.  We will also say that the advertisement in question is AT LEAST 30″ in duration.  Radio stations:

The prices for a 30″ advertisement playback on those stations are: 360, 660, 110, 520, and 400.  I would recommend Radio Istra lift their advertising prices, and because of them I’ll go with the next-lowest price as our estimate here: 360 kn.

From before, we had 400 advertisement playbacks per month; at a rate of 360 kn that amounts to 144,000 kn.  Since we promised we’d use the LOWEST bound, and some would argue that companies like this can obtain various discounts, let’s say that the maximum discount is 50%, which brings our cost down to half: 72,000 kn spent on advertising, each month.  In reality it is a different number, but let’s go with this estimate here.
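The lower-bound arithmetic above, spelled out:

```python
# Lower-bound estimate of monthly radio ad spend (numbers from the text)
spots_per_day = 2
stations = 10
workdays = 20
playbacks = spots_per_day * stations * workdays   # 400 playbacks per month
price_per_spot = 360                              # kn, second-lowest 30" price
gross = playbacks * price_per_spot                # 144,000 kn
net = gross * 0.5                                 # assume a maximal 50% discount
print(playbacks, gross, net)  # 400 144000 72000.0
```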

Now we know what we are insuring.  Let’s try to see what the cost of manually protecting that investment would be.

Let’s suppose that we have in place:

  • equipment to record and store 10 radio stations worth of broadcasting material (StreamSink for example)
  • means of reviewing (audibly) the archive of the broadcast material
  • a person that is trained to do all that.

I have information that such a person would cost about $12 per hour in the USA and about $4 to $6 in cheap-labor countries.  Let’s say that our guy will cost $8, or about 50 kn, per hour.

From my experience, using VideoPhill Player to access the archive, confirming 20 advertisement playbacks would take about one hour if we have a proof-of-playback document, and at least 2 hours if we don’t, since the whole block of advertisements would have to be under scrutiny.  Also, here we assume that our operator is HIGHLY familiar with the scheduling practices of each radio station, and won’t stray too much while searching for the advertisement blocks.

So with all the equipment and trained staff, it seems that the cost of verification for that kind of volume is from 1,000 kn to 2,000 kn per month.  If we use the average number here, and look at the ratio of the analyst’s cost to the investment he protects, we come up with 1:48, or about 2%.
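And the resulting ratio:

```python
# Ratio of manual verification cost to the protected ad spend
monthly_ad_spend = 72000                          # kn, from the estimate above
verification_low, verification_high = 1000, 2000  # kn per month
avg_cost = (verification_low + verification_high) / 2   # 1500 kn
ratio = avg_cost / monthly_ad_spend
print(f"1:{monthly_ad_spend / avg_cost:.0f}, or {ratio:.0%}")  # 1:48, or 2%
```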


Result: we can say that the cost of verifying advertisements is about 2% of the cost of the advertising.

It would be even higher (in percentage) for other media with lower advertising costs, since the operator would have to scan material (at the same cost) that is paid less.

Conclusion

Since I’m not here to promote a service to someone who doesn’t need it, I’ll only try to provide a fair price for someone who recognizes the need for it.  To do that, I’ll apply a 50% discount to my already low estimate, and see if the company can sustain the service at that fee.

That being said, the conclusion is that

to provide a list of played commercials, we’ll charge 1% of the estimated monthly advertisement cost for the channel(s).

We’ll start from there, and see where it takes us. :)

Why LinkedIn is so great!

Saturday, December 31st, 2011

Happy New Year to everyone.  I just want to share a joyous event with you.  I won’t comment on it at all – I’ll just hang the pictures here for you…

 

LinkedIN post on PlayKontrol

Reaction to the post

Another posting, now on StreamSink

And again, interesting reaction...

OK, but WHO is Mr Anant actually?

And then, I got in… (story of data visualization)

Thursday, December 15th, 2011

How to see the data?

If the data is numeric and represents some series, it will mostly be represented with a graph of some sort.  There are hundreds of types of graphs available, and they all have some purpose – otherwise they would not exist.

However, for some special occasions, you have to see different kind of data.

The problem (this particular instance)

Since I am developing an internet media stream CAPTURE and ARCHIVE application (StreamSink), I am also continuously testing it on one of my servers.  I am adding channels, removing them, stopping the server; sometimes something goes wrong and the whole thing freezes or crashes, so the archive I have is rather heterogeneous in quality.

Let me go through the operational view – the mere GUI of the StreamSink, so I can present some problems and solutions so far.

StreamSink

Several things were important to the operator of the software and had to be present on the main (status) screen.  For example:

  • the whole list of channels should be visible
  • channel status should be visible at first glance
  • I am interested in what happened to the system recently
  • I need to know the status of my connection
  • it would be good to know how much disk space is available

I could dwell on it but the main point of this post is something else.

The problem here is that I had to create a PlayKontrol report for demonstration purposes (for them: http://ihg.hr/) that would scan 7 days of the archive (multiple channels, of course) and produce reports (playlists) for 300 songs.

So the problem is: to

find, in the archive that is damaged in various ways, 7 days of continuous archive that spans multiple channels.

The solution (prelude)

Since I am a kind of explorer by nature, I wasn’t inclined to use a solution that would present raw data as an answer; instead I was thinking about seeing the data and determining the period and channels ‘visually’.

StreamSink has an integrated feature called ‘archive report’ that contains data similar to what I need, but with it I would only get limited information.  You can see the report here:

StreamSink Archive Report

The most useful info on the report in this particular situation is the graph on the right side.  Let me explain…

For each day, StreamSink is able to record up to 24 hours of media.  Due to network conditions, it is sometimes less than 24 hours, and I decided to present that number as the percentage of the day that the archive covers.  As you can see from the report, that percentage is shown for the whole archive lifetime, for the last month, the last week, and the last 24 hours.

It is also shown in the form of a graph, where the leftmost part of the graph is the current day, and as we go to the right, we sink into the past, with divider lines every 7 days.  Nice, eh? :)

But, as nice as that report is, I can’t read from it which 7 days and which channels should be scanned – I have to find another way in.

Solution (at last)

For this one, I picked something that I learned from the above mentioned report.  That was:

  • I will have a channel list
  • I will have some sort of calendar
  • I have to see how much of the archive is covered for each day

I also decided to show each day as a cell in a table-style matrix, where rows would be occupied by channels and columns by days.  The time flow is inverted here compared to the report, so left is the past and right is the present.
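A hedged sketch of how such a channel-by-day coverage matrix could be built (the data source here is hypothetical; in the real utility the per-day recorded seconds would come from the StreamSink archive):

```python
# Build a channel x day coverage matrix, oldest day on the left.
# `recorded_seconds[channel][day]` is hypothetical input data.

def coverage_matrix(recorded_seconds, days):
    """Return one row of whole-percent coverage values per channel."""
    matrix = {}
    for channel, per_day in recorded_seconds.items():
        # a missing day means no archive at all -> 0%
        matrix[channel] = [round(100 * per_day.get(day, 0) / 86400)
                           for day in days]
    return matrix

days = ["2011-12-13", "2011-12-14", "2011-12-15"]
data = {"radio-a": {"2011-12-13": 86400, "2011-12-15": 43200}}
print(coverage_matrix(data, days))  # {'radio-a': [100, 0, 50]}
```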

Whole thing looks like this:

Archive Digger

Same thing little zoomed in:

Archive Digger Detail

Note: green is the color for the days that have 90% or more archive covered.

At last, you can see from both pictures that much of the data is revealed at first glance. For example, 0 means that there was no archive at all that day. Numbers below 90 suggest that either there was some problem with the channel that day, or StreamSink was started or stopped in the middle of the day.

I could even color-code that information on the chart – but the utility will be expanded further only if there is demand for it, since I already learned from it what I needed to know.

BTW, I don’t want to brag, but coding that utility took 2-3 hours of thinking and coding, and almost no debugging.  That’s most probably due to the fact that I’ve been doing this stuff over and over for some years :)

Treatment of repeating content

Monday, November 28th, 2011

In media monitoring systems and environments, we often have to identify and COUNT the occurrences of some playback event.  The most common example is when you have to monitor all the occurrences of the same commercial audio spot.

Multiple parties are interested in tracking audio spots:

  • broadcasters
  • advertisers (clients)
  • agencies (clients representative)
  • government regulators

Let me briefly cover what each of them needs to know about the playback of commercial audio spots.

Broadcasters

They need proof that they played something at a certain time, to show to the client so they can issue invoices for services provided.

Advertisers and agencies

They both need proof that something – their own commercials – was played, at certain times and the correct number of times.  However, they might also need to be able to look into other brands so they can track their competitors.

Government regulators

They usually want to know whether regulatory requirements on broadcast media are met.  Such requirements are, for example, to have no more than 2 minutes of advertisements per hour, or to have commercial blocks clearly separated from the rest of the program by special markers called ‘jingles’ or ‘breaks’.

Let’s get back to…

The problem

The usual workflow for the above is to feed the matching technology the samples that you want to track, and the technology will give you the locations in the timeline where those samples occur.  That is one thing that PlayKontrol can do for you.  But what if you don’t have the samples, and still want to discover them?

The rescue

The traditional way would be to go through the known parts of the program, mark them, clip them out, and have the audio spotter search for all occurrences.  With that method, and with a lot of clipping, you’ll get some accuracy, but some clips will escape your attention because they aren’t in their usual place – for example, a commercial aired outside its commercial block.

The other way is to do it with PlayKontrol SelfMatching technology.  It works like this: a whole day of archive is given to PK, and the result of the process is a list that contains ALL of the matches for the given day.

So every repeating audio clip, no matter how small, will be listed there.  From there, your analyst’s only tasks are to:

  • browse through the clips,
  • listen to them,
  • maybe fine-clip them,
  • tag them and
  • put them into the repository.
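As a rough illustration of the self-matching idea, here is a naive sketch over toy hash sequences – this is emphatically not the actual PlayKontrol algorithm, just the shape of the problem:

```python
# Naive self-matching: find every pair of positions in a day's hash
# stream where the same window of hashes repeats. Real audio
# fingerprinting is far more robust; this only shows the output shape.

def self_match(hashes, window=4):
    seen = {}      # window tuple -> first position where it appeared
    matches = []
    for i in range(len(hashes) - window + 1):
        key = tuple(hashes[i:i + window])
        if key in seen and i - seen[key] >= window:  # skip trivial overlaps
            matches.append((seen[key], i))
        else:
            seen.setdefault(key, i)
    return matches

# toy 'day' of hashes: the sequence 1,2,3,4 repeats at positions 0 and 6
day = [1, 2, 3, 4, 9, 9, 1, 2, 3, 4, 7]
print(self_match(day))  # [(0, 6)]
```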

I have created a picture showing the results of the process in a ‘visual representation’.  Here it is:

Please note the following:

  • both the X and Y axes represent time
  • the grid divides time into one-hour intervals
  • the grayed areas are the time intervals 00-06 and 18-24 (night time, say)
  • the size of each point represents the length of the matched clip

Try to figure out the rest for yourself.  Hint: large blobs are possibly repeating songs.

Refactoring, it’s fun – part 2

Tuesday, November 22nd, 2011

In the first part I tried to set the scene and give you some background on my problem. I’ll try to continue now and create an interesting, and at the same time informative, story about performing the refactoring in question.

The most curious thing of all is that you don’t have to prepare for refactoring. It can happen no matter what, and the price usually isn’t so high after all. Let me remind you – the main tool of component design, dependency injection, wasn’t used. Yes, the classes do have their responsibilities cleverly defined, and that helps a lot here, because if it weren’t so, the whole deal would have to start a few steps ‘before’.

I’m not an experienced writer, so I don’t know if I will get the point across, but to me this refactoring was like building level upon level of scaffolding, using one level to test the next, and at the same time creating scaffolding that would be used later in a production environment! I guess there is a name for it; there has to be :)

Step 1:

Creating a duplicate of the main working class, and see how to force the rest of the application to use it when needed.

Say that the class name is PCMHash, and it has the following structure (method declarations only):

class PCMHash
{
    public bool Initialize(PCMBufferedReader sourceDataReader);
    public uint GetNextHash();
}

My goal was to create an alternative class to this one, while keeping the old class around as a reference to compare results against.

So I created the class PCMHash_R2.  That was my first decision – to create the same class as before and try to get the same results from it, replacing its guts one step at a time.

Switching to the replacement version wasn’t easy, so I extracted an interface, derived both classes from it, and created something like:

IPCMHash rph;
if (!_refactoring)
    rph = new PCMHash();
else
    rph = new PCMHash_R2();

At this point I would like to re-state that I am in production all the time, and have to decide on my feet, weighing all the implications that would arise.  I mention this because everyone can see that some dependency injection could be used here.  But apart from having to spend much time wiring it through the code, I’m not sure how, or if, it would work anyway.

Why: at testing time, I want both classes to be able to function side by side.  Concretely, if I have:

class FileHasher
{
    // will use an IPCMHash implementation somewhere...
}

I’ll need:

FileHasher usesOriginal = new FileHasher();
FileHasher usesRefactored = new FileHasher();
usesRefactored.Refactored = true;

// compare results
object resultFromOriginal = usesOriginal.GetResult();
object resultFromRefactored = usesRefactored.GetResult();
Assert.AreEqual(resultFromOriginal, resultFromRefactored);

Here, the GetResult() method could create a new IPCMHash several times, and I am not sure if, for example, Ninject would be able to handle such a thing.

I was also toying with the idea of using manually created class factories, or passing the type of the implementation class in the constructor of the consumer, but those options also faded.

Anyway, there is the original class, and there is the new refactored class, ready to be taken apart and put together again in some different order, preserving the functionality that was there before.

The next steps will follow in the next editions of the refactoring talks…

Refactoring, it’s fun – part 1

Tuesday, November 22nd, 2011

It’s a story of refactoring code that isn’t prepared for it a single bit.  If I say ‘prepared’, I mean having test cases, dependency injection code, and so on.  However, I have none of the above in the original code – just code that works.

Let me explain what I have, what it does, and where it should end up.  The purpose of this refactoring session isn’t to gain better performance while keeping the same functionality – it is to make the same algorithm, already proven in various tests, work on different data.

Before (a.k.a. now):

The component creates a PK_HASH from a sound file.  By PK_HASH I mean the “code name for our latest tech that can crunch a whole audio file down to a few bytes, later comparing those bytes to bytes crunched from another file to tell you whether it’s the same sound file”.  PK stands for PlayKontrol – the brand name.

So, there are a few steps to produce the PK_HASH from a sound file:

  • decode and read the file – the input is a file on disk, for example .mp3, .wma, or .aac, and the output is PCM samples
  • from any kind of PCM samples (stereo, mono, 8-bit, 16-bit), produce an array of shorts that must be rewindable
  • hash the data and produce the PK_HASH files

Decode and read the file

We start from a file on disk, which can be in any streamable file format (we produce the files with StreamSink – see the archive example here: http://access.streamsink.com/archive/).  That is .mp3, .aac, .wma, .ogg, and whatnot.

Currently it’s done using a simple component that uses DirectShow to build a graph for the audio file, renders that graph, and attaches a SampleGrabber filter to fetch the samples.  The component goes from the file on disk to the whole PCM sample data in memory.  That’s feasible for a 5-minute file (5 x 60 x 4 x 44100 = 50 MB).  It can work even for 1-hour files.  However, you can *feel* that this approach IS wrong, especially when told that the rest of the algorithm doesn’t need access to the whole PCM data at once.
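The memory figure above checks out (44.1 kHz, stereo, 16-bit means 4 bytes per sample frame):

```python
# 5-minute file decoded to raw PCM: 44.1 kHz, stereo, 16-bit
seconds = 5 * 60
bytes_per_frame = 2 * 2          # 2 channels x 2 bytes per sample
total = seconds * 44100 * bytes_per_frame
print(total, round(total / 1024**2, 1), "MB")  # 52920000, about 50.5 MB
```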

Rewindable sample array

PCM samples are promoted to 16-bit (if needed), and the channels are downmixed to mono.  Again, this is done in memory, so for each PCM sample there are 2 bytes of data present in memory as a result of the operation.

Hashing and creating the file

The hashing process needs a moving, overlapping window over the sample array, and since we have everything in memory, that is a piece of cake for now.  We take the data, process it, and write it into another byte array.  Since it’s extremely dense now, I won’t cry about memory at this point – but yes, it is written into memory first, and then saved to a file on disk.

So here I have tried to explain how it works so far.  It goes from the encoded audio file to PCM sample data in memory, downmixes that data in memory to one PCM channel, processes the mono PCM samples to obtain the PK_HASH, and then writes it to a file.

So what do we actually need?

If you take a peek at the archive, you’ll find that every folder has audio files and also a .hash file for every audio file present in the directory.  Please note that not every directory is processed – only 20 of them, because processing consumes CPU intensely, and I have only a few PCs lying around to scrub the data.  That will improve in the future.  So, for crunching the archive, even a POC (proof-of-concept) is OK, as it serves its needs.  It will go through the archive and leave processed PK_HASHes behind.

A process that runs in parallel waits for the PK_HASH file to be created, reads it, and does matching against the database.  However, the next step to take is REALTIME processing.

To be able to process in REALTIME, the architecture goes something like this:

  • StreamSink is attached to a network stream of any kind and provides PCM sample output
  • the PCM sample output is downsampled and buffered
  • the hashing process uses the buffered mono PCM samples and outputs its results into a stream
  • the PK_HASH stream is again buffered, and the results are processed by the MATCHER process

StreamSink PCM decoding

StreamSink is the application that does internet media stream capture.  Thanks to a feature request from DigitalSyphon, it can, however, process every media stream and provide PCM samples for it in real-time, in the form of a Stream-derived class.  So, that part of the process is covered completely.

Buffering PCM samples

Now, a new component should be created – something that can buffer PCM samples from the Stream and provide floating, overlapping window reads for the hashing process.  With some thinking, I combined the inner workings of the circular-buffer pattern with something that can be used almost directly in the hasher process – by replacing the class implementation only.
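A hedged sketch of that buffering idea (the class and method names here are mine, not the StreamSink API): a buffer that accepts incoming samples and hands out overlapping windows without keeping the whole stream in memory.

```python
from collections import deque

class OverlappingWindowBuffer:
    """Buffer mono PCM samples and yield overlapping windows for hashing.

    window: samples per hash window; hop: how far each window advances
    (hop <= window, so consecutive windows overlap).  An illustrative
    sketch only, not the actual StreamSink/PlayKontrol implementation.
    """
    def __init__(self, window, hop):
        self.window, self.hop = window, hop
        self.buf = deque()

    def write(self, samples):
        self.buf.extend(samples)

    def read_windows(self):
        # yield windows as long as a full one is buffered, then
        # discard only `hop` samples so the next window overlaps
        while len(self.buf) >= self.window:
            yield list(self.buf)[:self.window]
            for _ in range(self.hop):
                self.buf.popleft()

b = OverlappingWindowBuffer(window=4, hop=2)
b.write([0, 1, 2, 3, 4, 5])
print(list(b.read_windows()))  # [[0, 1, 2, 3], [2, 3, 4, 5]]
```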

Processing and creating PK_HASHes

The hasher process was reading the buffered MEMORY quasi-stream.  However, it used a kind-of-simple interface to read the data, so my luck was that this interface could be extracted and an implementation provided on top of the buffered stream data.  Also, the output of the class has to be rewritten, since it currently doesn’t have any interface-able part to replace.

And so on – the latter will be implemented from scratch, so there is no story about refactoring there.

Refactoring pyramid/tower

I can call it a pyramid or a tower, because after a long time of procrastination (subconsciously processing the task at hand) I was finally able to put my hands on the keyboard and start.  My premise was that everything has to be checked from the ground up, because NOW I have the algorithm that produces the desired results, and since there are many steps involved, an error in a single step could be untraceable if I don’t check every step along the way.

Tools used

I am kind of old-fashioned, so this paragraph won’t be very long.  I use Visual Studio 2008, and for writing test code snippets I use NUnit as a launcher, so I don’t have to have some form or console app just to run tests.

For dependency injection I tested Ninject, and it is great, but in this case it can’t help me, so I’ll do the implementation replacement ‘by hand’.

I’ll finish this post for now, as this is the current state of affairs, and will keep you updated as the story develops, with new posts and fresh insights…