
And then, I got in… (story of data visualization)

Thursday, December 15th, 2011

How to see the data?

If the data is numeric and represents some series, it will mostly be represented by a graph of some sort. There are hundreds of types of graphs available, and they all have some purpose, otherwise they would not exist.

However, on some special occasions you need to look at a different kind of data.

The problem (this particular instance)

Since I am developing an internet media streaming CAPTURE and ARCHIVE application (StreamSink), I am also continuously testing it on one of my servers. I am adding channels, removing them and stopping the server; sometimes something goes wrong and the whole thing freezes or crashes, so the archive I have is rather heterogeneous in quality.

Let me go through the operational view, the plain GUI of StreamSink, so I can present some of the problems and solutions so far.

StreamSink

Several things that were important to the operator of the software had to be present on the main (status) screen. For example:

  • the whole list of channels should be visible
  • channel status should be visible at first glance
  • I want to know what happened to the system recently
  • I need to know the status of my connection
  • it would be good to know how much disk space is available

I could dwell on it, but the main point of this post is something else.

The problem here is that I had to create a PlayKontrol report for demonstration purposes (for them: http://ihg.hr/) that would scan 7 days of the archive (multiple channels, of course) and produce reports (playlists) for 300 songs.

So the problem is to find, in an archive that is damaged in various ways, 7 days of continuous archive that span multiple channels.

The solution (prelude)

Since I am something of an explorer by nature, I wasn’t inclined to use a solution that would present raw data as the answer; instead, I wanted to see the data and determine the period and channels ‘visually’.

StreamSink has an integrated feature called ‘archive report’ that holds data similar to what I need, but it would give me only limited information. You can see the report here:

StreamSink Archive Report

The most useful part of the report in this particular situation is the graph on its right side. Let me explain…

For each day, StreamSink can record up to 24 hours of media. Due to network problems it is sometimes less than 24 hours, so I decided to present that number as the percentage of the day that the archive covers. As you can see from the report, that percentage is shown for the whole archive lifetime, for the last month, for the last week and for the last 24 hours.
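The post doesn’t show how StreamSink computes that figure. As a minimal sketch, assuming each channel’s recordings are known as (start, end) timestamp pairs (a hypothetical representation), the daily coverage could be computed by clipping the segments to the day, merging overlaps so no time is counted twice, and dividing by 24 hours:

```python
from datetime import datetime, timedelta


def day_coverage_percent(segments, day_start):
    """Return the percentage of the 24 hours starting at day_start that is covered.

    segments: (start, end) datetime pairs for one channel (hypothetical format).
    Segments may overlap or cross midnight, so they are clipped to the day and
    merged before their durations are summed.
    """
    day = timedelta(days=1)
    day_end = day_start + day
    # Keep only segments that touch this day, clipped to its boundaries.
    clipped = sorted(
        (max(start, day_start), min(end, day_end))
        for start, end in segments
        if end > day_start and start < day_end
    )
    covered = timedelta(0)
    run_start = run_end = None
    for start, end in clipped:
        if run_end is None or start > run_end:
            # A gap: close the previous merged run and start a new one.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or touching segment: extend the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return 100.0 * (covered / day)


# Example: 00-10 and 09-12 overlap, 20-26 spills into the next day.
midnight = datetime(2011, 12, 1)
segments = [
    (midnight, midnight + timedelta(hours=10)),
    (midnight + timedelta(hours=9), midnight + timedelta(hours=12)),
    (midnight + timedelta(hours=20), midnight + timedelta(hours=26)),
]
print(round(day_coverage_percent(segments, midnight), 1))  # → 66.7 (16 of 24 hours)
```

Merging matters because a restart can leave overlapping segments; summing them naively could report more than 100% for a day.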

It is also shown as a graph, where the leftmost part is the current day; as we go to the right, we sink into the past, with a divider line every 7 days. Nice, eh? :)

But, nice as that report is, I can’t read from it which 7 days and which channels should be scanned. I have to find another way in.

Solution (at last)

For this one, I applied what I had learned from the above-mentioned report. That was:

  • I will have a channel list
  • I will have some sort of calendar
  • I have to see how much of the archive is covered for each day

I also decided to show each day as a cell in a table-style matrix, where rows are channels and columns are days. Time flow is inverted here compared to the archive report, so the left side is the past and the right side is the present.
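The Archive Digger’s source isn’t part of the post. A text-only sketch of the same layout, assuming the coverage percentages are already available as a hypothetical `{channel: {day: percent}}` mapping and the days are listed oldest first, could look like this; a `*` stands in for the color highlight, and the 90 threshold is a parameter. The name `render_matrix` is illustrative:

```python
def render_matrix(coverage, days, threshold=90):
    """Render a channels x days coverage matrix as plain text.

    coverage: hypothetical {channel: {day: percent}} mapping.
    days: day labels ordered oldest first (past on the left, present on the right).
    A day missing from a channel's mapping is shown as 0 (no archive at all).
    Cells at or above `threshold` get a trailing '*' instead of a color.
    """
    name_width = max(len(channel) for channel in coverage)
    lines = []
    for channel in sorted(coverage):
        cells = []
        for day in days:
            percent = int(coverage[channel].get(day, 0))  # truncate, never round up to 90
            mark = "*" if percent >= threshold else " "
            cells.append(f"{percent:3d}{mark}")
        lines.append(f"{channel:<{name_width}} {' '.join(cells)}".rstrip())
    return "\n".join(lines)


days = ["12-08", "12-09", "12-10"]
coverage = {
    "Radio A": {"12-08": 100, "12-09": 87.5, "12-10": 0},
    "Radio B": {"12-08": 95.2, "12-10": 99.9},
}
print(render_matrix(coverage, days))
```

Truncating instead of rounding keeps the marker honest: 89.6% shows as 89 without a `*`, rather than as an unmarked 90.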

The whole thing looks like this:

Archive Digger

The same thing, zoomed in a little:

Archive Digger Detail

Note: green marks the days that have 90% or more of the archive covered.

Finally, you can see from both pictures that much of the data is revealed at first glance. For example, 0 means that there was no archive at all that day. Numbers below 90 suggest either that there was some problem with the channel that day, or that StreamSink was started or stopped in the middle of the day.
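Here the period and channels were picked by eye, but the same question (7 continuous days across multiple channels) could also be answered by a brute-force scan of the matrix. A sketch, assuming coverage is available as a hypothetical `{channel: {day: percent}}` mapping, the days form a gap-free list ordered oldest first, and "covered" means the 90% threshold used for green cells; `find_windows` is an illustrative name:

```python
def find_windows(coverage, days, length=7, threshold=90):
    """Find runs of `length` consecutive days and the channels fully covered in each.

    coverage: hypothetical {channel: {day: percent}} mapping.
    days: gap-free list of day labels, oldest first.
    Returns (first_day, channels) tuples for every window in which at least one
    channel has coverage >= threshold on every day, most channels first.
    """
    windows = []
    for i in range(len(days) - length + 1):
        window = days[i:i + length]
        good = [
            channel for channel in sorted(coverage)
            if all(coverage[channel].get(day, 0) >= threshold for day in window)
        ]
        if good:
            windows.append((window[0], good))
    # Most channels first; sort() is stable, so ties stay in chronological order.
    windows.sort(key=lambda w: len(w[1]), reverse=True)
    return windows
```

The first entry is the window that covers the most channels; looking a few entries further down shows the trade-off between choosing a different week and losing a channel.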

I could even color-code that information on the chart, but I will expand the utility further only if there is demand for it, since it has already told me what I needed to know.

BTW, I don’t want to brag here, but coding that utility took 2-3 hours of thinking and coding, and almost no debugging. That is most probably because I have been doing this kind of stuff over and over again for some years :)

Treatment of repeating content

Monday, November 28th, 2011

In media monitoring systems and environments, we often have to identify and COUNT the occurrences of some playback event. The most common example is when you have to monitor all occurrences of the same commercial audio spot.

Multiple parties are interested in tracking audio spots:

  • broadcasters
  • advertisers (clients)
  • agencies (clients’ representatives)
  • government regulators

Let me briefly cover what they need to know about the playback of commercial audio spots.

Broadcasters

They need proof that they played something at a certain time, to show it to the client and be able to issue invoices for services provided.

Advertisers and agencies

They both need proof that something was played (their own commercials, at certain times, and the correct number of times). However, they might also need to look into other brands so they can track their competitors.

Government regulators

They usually want to know whether broadcast media meet the requirements of regulations or laws. Such requirements are, for example, to have no more than 2 minutes of advertisements per hour, or to have commercial blocks clearly separated from the rest of the program by special markers called ‘jingles’ or ‘breaks’.
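As an illustration of the first kind of check, here is a minimal sketch that assumes the ad spots have already been located as (start, end) offsets in seconds from midnight, and that the limit applies per clock hour (a real regulation might define the hour differently). Both function names are hypothetical:

```python
from collections import defaultdict


def ad_seconds_per_hour(spots):
    """Sum advertising time per clock hour.

    spots: (start, end) pairs in seconds from midnight (hypothetical input,
    e.g. the detected positions of commercial spots). A spot that crosses an
    hour boundary is split, so each hour is charged only for its own share.
    """
    totals = defaultdict(float)
    for start, end in spots:
        t = start
        while t < end:
            hour = int(t // 3600)
            chunk_end = min(end, (hour + 1) * 3600)
            totals[hour] += chunk_end - t
            t = chunk_end
    return dict(totals)


def hours_over_limit(spots, limit_seconds=120):
    """Return the clock hours whose advertising time exceeds the limit."""
    per_hour = ad_seconds_per_hour(spots)
    return sorted(hour for hour, seconds in per_hour.items() if seconds > limit_seconds)


# 100 s in hour 0, 50 + 60 + 20 = 130 s in hour 1, 30 s in hour 2.
spots = [(3500, 3650), (3700, 3760), (3800, 3820), (7300, 7330)]
print(hours_over_limit(spots))  # → [1]
```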

Let’s get back to…

The problem

The usual workflow for the above is to feed the matching technology with the samples you want to track; the technology then gives you the locations of those samples on the timeline. That is one thing that PlayKontrol can do for you. But what if you don’t have the samples and still want to discover them?
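PlayKontrol’s matching technology is not described in this post. Purely to illustrate the "give me the locations of this sample" workflow, here is a brute-force sketch that assumes both the archive and the sample have already been reduced to sequences of 32-bit per-frame fingerprints (how those are computed is out of scope). The function name and the bit-error threshold are hypothetical:

```python
def find_occurrences(archive, sample, max_bit_error_rate=0.35, bits=32):
    """Return the frame offsets at which `sample` occurs in `archive`.

    archive, sample: lists of per-frame fingerprints, each an int of `bits` bits
    (a hypothetical representation; computing fingerprints is out of scope).
    An offset matches when the share of differing bits over the whole sample
    is at most max_bit_error_rate, which tolerates some noise.
    """
    m = len(sample)
    total_bits = m * bits
    hits = []
    for offset in range(len(archive) - m + 1):
        window = archive[offset:offset + m]
        # Hamming distance between the window and the sample, summed over frames.
        diff = sum(bin(a ^ s).count("1") for a, s in zip(window, sample))
        if diff / total_bits <= max_bit_error_rate:
            hits.append(offset)
    return hits


sample = [0xFFFFFFFF, 0x0F0F0F0F, 0xF0F0F0F0]
noisy_copy = [0xFFFFFFFE, 0x0F0F0F0F, 0xF0F0F0F0]  # one bit flipped
archive = [0] * 5 + sample + [0] * 5 + noisy_copy
print(find_occurrences(archive, sample))  # → [5, 13]
```

Scanning every offset is fine for a toy; with a real day of archive and thousands of samples, some form of indexing would be needed instead.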

The rescue

The traditional way would be to go through the known parts of the program, mark them, clip them out and have an audio spotter search for all of their occurrences. With that method, and with a lot of clipping, you’ll get some accuracy, but some clips will escape your attention because they aren’t where you expect them, for example a commercial that airs outside its commercial block.

The other way is to use PlayKontrol SelfMatching technology. A whole day of archive is given to PlayKontrol, and the result of the process is a list that contains ALL of the matches for the given day.

So every repeating audio clip, no matter how small, will be listed. From there, your analyst’s only task would be to:

  • browse through the clips,
  • listen to them,
  • maybe fine-clip them,
  • tag them and
  • put them into the repository.
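How SelfMatching works internally isn’t described here. As a hypothetical illustration of what ‘ALL of the matches for the given day’ means, the sketch below assumes the day has been reduced to a sequence of per-frame fingerprints that repeat exactly whenever the audio repeats (a big simplification, since real fingerprints are noisy), and lists every maximal repeated run as (first position, second position, length). Such triples map naturally onto a plot where both axes are time and point size is clip length. The name `self_matches` is illustrative:

```python
from collections import defaultdict


def self_matches(frames, min_length=3):
    """List every maximal repeated run of frames within one day.

    frames: per-frame fingerprints for the whole day (any hashable values;
    hypothetical input that assumes repeats produce identical fingerprints).
    Returns sorted (first, second, length) tuples such that
    frames[first:first + length] == frames[second:second + length],
    extended as far as possible in both directions.
    """
    positions = defaultdict(list)
    for index, fingerprint in enumerate(frames):
        positions[fingerprint].append(index)

    n = len(frames)
    matches = []
    for occurrences in positions.values():
        for a, first in enumerate(occurrences):
            for second in occurrences[a + 1:]:
                # Skip pairs that merely continue a run that started one frame earlier.
                if first > 0 and frames[first - 1] == frames[second - 1]:
                    continue
                length = 0
                while second + length < n and frames[first + length] == frames[second + length]:
                    length += 1
                if length >= min_length:
                    matches.append((first, second, length))
    return sorted(matches)


# Toy day: the run "ABCD" airs twice (letters stand in for fingerprints).
frames = list("xABCDyABCDzAB")
print(self_matches(frames))  # → [(1, 6, 4)]
```

The `min_length` cut-off plays the role of ‘no matter how small’: lowering it lists ever shorter repeats, such as the trailing "AB" above.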

I have created a picture that shows the results of the process as a ‘visual representation’. Here it is:

Please note the following:

  • both X and Y axes represent time
  • the grid divides time into one-hour intervals
  • grayed areas are the time intervals 00-06 and 18-24 (night time, say)
  • the size of each point represents the length of the matched clip

Try to figure out the rest for yourself.  Hint: large blobs are possibly repeating songs.