Finally – an Osprey Alternative

October 1st, 2014

For years, video capture, at least for media monitoring companies, depended on Osprey capture cards.  They are the best in the field, and once you try one, you don’t look for anything else.  You just pay the price and are satisfied with it.  The card has excellent drivers with tons of options, SimulStream as a (paid) option, …  a real beauty.

However, as we said above, it is pricey.  For an Osprey 460e, you need to hand over about $1200.  That’s $300 per channel.

Now, click here:

http://www.vd-shop.de/simultaneously-capture-d130fpsinput-interface-rcabnc-inter-p-591.html

YES!  6 channels for 320 Euro ($400 USD).  I won’t calculate the per-channel price here, since it is already obvious that Osprey is beaten, at least as far as price is concerned.

In fact, let’s see how much you save by using the new cards on a setup of, say, 24 channels:

Osprey: 6 cards, $7200
VCAE: 4 cards, $1600

So on capture hardware alone, you could save $5600.  Add to that the lower cost of the servers, since you can pack everything into fewer PCs.
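The comparison above can be reproduced with a few lines of Python.  This is only a sketch: the function name is made up for illustration, and the 4 channels per Osprey card are derived from the $1200 / $300-per-channel figures quoted earlier.

```python
import math

def capture_cost(channels_needed, channels_per_card, price_per_card):
    """Return (cards, total_price) needed to cover the requested channels."""
    cards = math.ceil(channels_needed / channels_per_card)
    return cards, cards * price_per_card

# Figures from the article: 4 channels for $1200 vs. 6 channels for $400.
osprey_cards, osprey_total = capture_cost(24, 4, 1200)
vcae_cards, vcae_total = capture_cost(24, 6, 400)

print("Osprey:", osprey_cards, "cards,", osprey_total, "USD")
print("VCAE:  ", vcae_cards, "cards,", vcae_total, "USD")
print("Saved: ", osprey_total - vcae_total, "USD")
```

Rounding up with `math.ceil` matters for channel counts that don’t divide evenly, for example 20 channels still need 4 of the 6-channel cards.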

So, to follow up on the excitement of finding that this card exists, I immediately ordered a sample to try it with our capture software.  It came in a few days, and we went on and installed it…

And the story has to end here, since the card works as it should out of the box, making media monitoring installations even cheaper.  Not only that, the card has an interesting form factor and low power consumption, and should prove ideal in multiple-channel scenarios.

I will update the article with results of 24/7 testing in the real world as soon as we build an installation that runs that way.

Setting up an automated ad monitoring service for TV

October 26th, 2012

So you want to set up your own automated advertisement monitoring for some TV channels?  And you probably have an idea how to sell the reports from the whole system?  Let me try to explain one of the possible ways of doing it.

Overview

An advertisement monitoring system isn’t so complicated, but it isn’t simple either.  You’ll need computers, people, and some kind of service that automatically tracks advertisements once they have been spotted for the first time.

Recording

For starters, you have to be able to record all the TV channels you need.  Depending on the TV system used in your country, you’ll have several options for it.  From our shop, we can solve recording for analog TV, DVB-T, DVB-S, and IPTV.  In any case, if you can get a composite video signal from your set-top box, you will be able to record it with VideoPhill Recorder.

Storing and archiving

Recorded broadcast should go to some storage, depending on the number of days that you want your broadcast archive to be available.  To calculate how much storage space you will need for it, you can use this on-line calculator.

Clipping and tagging

So now we have recordings of the TV broadcast.  The next step is to form a team of people who will find and tag the first occurrence of each advertisement.  The number of people and workstations required for the job depends on many factors:

  • number of channels monitored
  • channel ‘difficulty’ (how easy it is to find commercials on the channel)
  • number of shifts that people will do

In short, you’ll need some way of accessing the archive and clipping the portions of it in order to have clips of advertisements extracted and prepared for automated archive search.

One possible way of doing the job is by using the VideoPhill Player application.  To see it in action, please see the video below…

Automated search

Almost there…  Now you have your archived broadcast, and you have your clip library.  To find all of the occurrences of all clips on all your channels, you’ll simply pass the whole archive and the clip library to the PlayKontrol Service and get your results.  Results can be in any format that you require, such as text, Excel, PDF, XML, and so on.

Producing reports for your customers

The really final component of the system (apart from selling the reports) is a team of people who will use the raw data that PlayKontrol provides and produce nice reports for your customers.  People on this job should be able to understand the needs of media buyers and planners, and generate reports that are useful to them.

Creating a small (hopefully usable) utility: File Deleter

January 11th, 2012

When you are in media monitoring, you have TONS of files.  For example, look at this:

Multitude of files, StreamSink archive

Another bunch of files, created by PlayKontrol

Every recorder and logger and input produces a number of files on your system.  Of course, each application such as VideoPhill Recorder or StreamSink has the option of deleting files after they expire (after one month, for example), but what if you have some other way of gathering information (media, metadata, something) that won’t go away by itself?  I have several such data sources, so I opted to create a MultiPurposeHighlyVersatile FileDeleter Application.  I’ll probably find a better name later; for now let’s call it ‘deleter’ for short.

The Beginning

An application that deletes files must be such a triviality that I can surely open Visual Studio and start coding immediately.  Well, not really.  In my head, that app will do every kind of deleting, so let’s not get hasty, and let’s do it by the numbers.

First, a short paragraph of text that will describe the vision, the problem that we are trying to solve with the app, in a few simple words.  That is the root of our development, and we’ll revisit it several times during the course of the development.

Vision:

‘Deleter’ should be able to free the hard drive of stale files (files older than some period) and keep free hard drive space above some pre-determined minimum.

Here, it’s simple enough that I can remember it, and I’ll be able to descend down from it and create the next step.

The Next Step

For me, the next step (let’s say in this particular case) would be to try and see what ‘features’ the app has.  The only way it works for me is to create a mock of the application UI and write down the things that aren’t visible from the UI itself.  Since this UI won’t do anything but gather the parameters that define the behavior of the app, it will be a simple one, and it will be possible to fit it nicely on one screen.

For the sketch I’ll use Visual Studio, because I’m most comfortable with it.  If it weren’t my everyday tool, I’d probably use an application such as MockupScreens, which is a deliberately simple app-sketching tool with powerful analyst features.

The process of defining the UI and writing down requirements took some time.  I repeatedly added something to the UI, then to the list below, until I had a much clearer picture of what I’m actually trying to do.

Features:

  • it should have the ability to delete only certain files (defined by a ‘mask’ such as *.mp3)
  • it should be able to delete files by their age
  • it should be flexible in determining the AGE of the file:
    • various dates in file properties: created, modified, accessed
    • by parsing the file name of the file
  • it should be able to delete from multiple directories
  • it should be able to either scan a directory flat or dig into subdirectories
  • it should be able to delete files by criteria other than age
    • files should be deleted if their total size exceeds some defined size
      • in that case, other files should be taken into account, again by mask
    • files should be deleted if minimum free drive space is less than some defined size
    • file size
  • when deleting by criteria other than file age, specify which files should be first to go
  • should be able to support multiple parameter sets at one time
  • should run periodically in predetermined intervals
  • should be able to load and save profiles
    • profiles should have names
  • should disappear to tray when minimized
  • should have lowest possible process priority
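To make the first few requirements more concrete, here is a minimal Python sketch of selecting files by mask and age, either flat or recursively.  It is not the Deleter itself: the function name and parameters are hypothetical, it only looks at the modification date, and it leaves the actual deleting to the caller.

```python
import fnmatch
import os
import time

def find_expired(root, mask="*.mp3", max_age_days=30, recurse=True, now=None):
    """Return sorted paths under root that match mask and are older than max_age_days.

    Age is taken from the file's modification time only (an assumption;
    the requirements also mention created/accessed dates and file names).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    expired = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not recurse:
            dirnames[:] = []  # flat scan: stop os.walk from descending
        for name in filenames:
            if fnmatch.fnmatch(name, mask):
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) < cutoff:
                    expired.append(path)
    return sorted(expired)

# Typical use (the directory is a placeholder):
# for path in find_expired(r"D:\archive", "*.mp3", max_age_days=30):
#     os.remove(path)
```

Keeping selection separate from deletion makes it easy to print the list first and check it before anything is actually removed.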

And here’s the screen:

Mock of the Deleter UI used to define and refine the requirements

As you look at the UI mock, you’ll see some mnemonic tricks that I use to display various options, for example:

  • I filled the textboxes to provide even more context to the developer (myself with another cap, in this case)
  • I added vertical scrollbars even if text in multi-line textboxes isn’t overflowing, to suggest that there might be more entries
  • for multiple choice options I deliberately didn’t use combobox (pull down menu) – I used radio button to again provide visual clues to various options without need for interaction with the mock

From Here…

I’ll let it rest for now, and tomorrow I’ll try to see if I can further nail down the requirements for the app.  From there, when I get a good feeling that this is something I’m comfortable with, I’ll create an object interface that will contain all the options from the screen above.  While doing that, I’ll probably update the requirements and the UI itself, maybe even revisit The Mighty Vision above.

BTW, it took me about 2 hours to do both the article and the work.  I excluded my wandering around time, of course :)

How to create a fair and adequate service proposal?

January 5th, 2012

Since I’m about to create a media monitoring offer for my first end-user client of such kind, and as this caught me totally unprepared,  I’m on a journey of discovery for prices that would be fair to them but adequate to the company.

Media monitoring here is in a very limited context – they only need advertisement verification service, and that is the only service I can provide at the moment anyway.

For this estimation, I’ll try to take their point of view, and try to provide some added value.  At the end of this post, I’ll summarize the thoughts presented within.

The case

The prospect is a retail store company that has stores throughout the country, and they are advertising on all media globally, and of course they use radio.

For every radio advertisement they have to pay some amount, defined by the station’s price list, subject to various discounts, and so on.  For the payment they always get an invoice, and for most radio stations they also get a ‘proof-of-playback’ document.

Proof-of-playback is usually generated from the automation software playout logs and processed with a system such as SpotKontrol.  These documents are accurate most of the time, but sometimes there are discrepancies due to operator error or some other intricacy that’s going on.

Everything in the process is done in good faith, but sometimes advertisements aren’t played and are still shown in the proof-of-playback document, and sometimes it’s the other way around.  Each playback is charged for some amount, and if the proof is incorrect, one party or the other is losing money.  It isn’t a great situation for either of them.

So the idea would be to provide a service that could verify the document provided by the media by obtaining real, reference information on the playback of the advertisements.

Calculation

Some math and abstract thinking are required to read this section.  If you do mind reading it, just skip to the end where the results are shown.

In my estimation process I’ll always try to bound the numbers so they show one extreme side of the possible cost range, and by doing so I will come up with a cost that is always AT LEAST that amount.  For example, if there are 10 radio stations in question, and they have various costs of advertisement per second, I’ll use the smallest of them.

The first estimate is that such a retail store will have AT LEAST 2 advertisements a day on AT LEAST 10 global radio channels.  We will also say that they advertise only on workdays, so that gives us AT LEAST 20 days per month, giving us 2 * 10 * 20 = 400 advertisement playbacks.  That is the lowest bound, try to remember that, and we used only 10 global radio stations – most advertisers will go into local advertising as well.

Now, let’s try to estimate how much each advertisement playback will cost.  For that, we’ll use 5 prominent radio stations and look at their price lists.  We will also say that the advertisement in question will be AT LEAST 30″ in duration.  Radio stations:

The prices for a 30″ advertisement playback on those stations are: 360, 660, 110, 520, 400 kn.  I would recommend that Radio Istra raise their advertising prices, and because of them I’ll go with the next lowest price as our estimate here: 360 kn.

From before, we had 400 advertisement playbacks per month, at a rate of 360 kn that amounts to 144.000 kn.  Since we promised we’ll use LOWEST bound, and some would be able to argue that there are various discounts that companies such as this can obtain, let’s say that the maximum amount of discount is 50%, and that will bring our cost down to half, and that is: 72.000 kn spent on advertising, each month.  In reality it is really a different number, but let’s go with this estimate here.
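For readers who prefer to check the arithmetic, the estimate so far fits in a few lines of Python.  The function and variable names are only illustrative; the numbers are the ones used in the text.

```python
def monthly_ad_spend(ads_per_day, stations, days_per_month, price_per_ad, discount):
    """Return (playbacks, list_price_total, discounted_total) for one month."""
    playbacks = ads_per_day * stations * days_per_month
    list_price_total = playbacks * price_per_ad
    return playbacks, list_price_total, list_price_total * (1 - discount)

station_prices_kn = [360, 660, 110, 520, 400]
# Skip the single lowest price (110 kn) and take the next lowest one.
price_kn = sorted(station_prices_kn)[1]

playbacks, gross_kn, net_kn = monthly_ad_spend(2, 10, 20, price_kn, 0.5)
print(playbacks, "playbacks,", gross_kn, "kn list price,", net_kn, "kn after discount")
```

Changing any single assumption (more stations, weekend advertising, a smaller discount) only pushes the total up, which is the point of using lower bounds throughout.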

Now we know what we are insuring.  Let’s try to see what the cost of manually protecting that investment would be.

Let’s suppose that we have in place:

  • equipment to record and store 10 radio stations worth of broadcasting material (StreamSink for example)
  • means of reviewing (audibly) the archive of the broadcast material
  • a person that is trained to do all that.

I have the information that such a person would cost about $12 in the USA and about $4 to $6 in the cheap-labor countries.  Let’s say that our guy will cost $8 = about 50 kn per hour.

From my experience, and by using VideoPhill Player to access the archive, confirming 20 advertisement playbacks would take about one hour if we do have a proof-of-playback document, and at least 2 hours if we don’t, since the whole block of advertisements would have to be scrutinized.  Also, here we assume that our operator is HIGHLY familiar with the scheduling practices of each radio station, and won’t stray too much while searching for the advertisement blocks.

So with all the equipment and trained staff, it seems that the cost of verification for that kind of volume (400 playbacks, so 20 to 40 hours of work) is from 1000 kn to 2000 kn per month.  If we use the average of 1500 kn here, and look at the ratio of the analyst’s cost to the 72.000 kn investment that he protects, we come up with 1:48 or about 2%.


Result: we can say that cost of verification for advertisements is about 2% of the cost of the advertising.
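The 2% figure follows directly from the numbers above; here is a short Python sketch of it.  The names are made up, and the 10 playbacks per hour for the case without a proof-of-playback document is simply the "2 hours for 20 playbacks" figure restated.

```python
def verification_cost(playbacks, checks_per_hour, hourly_rate_kn):
    """Monthly cost of manually verifying the given number of playbacks."""
    return playbacks / checks_per_hour * hourly_rate_kn

monthly_spend_kn = 72000  # discounted advertising spend estimated earlier

with_proof = verification_cost(400, 20, 50)     # 20 playbacks per hour
without_proof = verification_cost(400, 10, 50)  # 20 playbacks per 2 hours
average = (with_proof + without_proof) / 2

print(with_proof, without_proof, average)
print("share of ad spend: {:.1%}".format(average / monthly_spend_kn))
```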

It would be even higher (in percentage) if we were considering other media with a lower cost of advertising, since the operator would have to scan (at the same cost) material that is paid less.

Conclusion

Since I’m not here to promote the service that isn’t needed by someone, I’ll only try to provide fair price for it for someone that recognizes the need for it.  To do that, I’ll go with 50% discount on my already low estimation, and will try to see if the company can sustain that service at that fee.

That being said, the conclusion is that

to provide a list of played commercials we’ll charge 1% of estimated monthly advertisement cost for that channel(s).

We’ll start from there, and see where it takes us. :)

Testing 3rd party stream capture application

January 2nd, 2012

This is a response to a question from one of my prospects, and it can be summarized as:

Why should I buy StreamSink at $10.000
when there is Replay A/V that can do same
thing for $100 (if I buy 2 licences for 
2 computers)?

I can make several objections to the idea of using a consumer product for business purposes, but instead of that, I’ll try to focus on functionality (at least for this posting).

Purchasing and installing

I quickly purchased Replay A/V for $50, and went on to installing it.  Upon installation, it offered to install WinPcap (to provide stream discovery) and some other utility for conversion of the saved material.  I declined.

Entering stations

Once installed, I will try to copy my stream list into it and have it record it continuously.

After some investigation, I found out that there is no way to insert the list of the stations at once, so I’m going to enter them one by one.

OK, I entered Antena Zagreb with its stream URL, and went on to finding the start button for it.  I found it under the context menu for the item on the list (right click, start-recording, …).

I remembered that when I went through the options for a channel, I found that you have to explicitly set the option for splitting the file into segments, so I went on and did that.

I’ll leave it running now and will move on to enter the rest of the stations.

I was about 1/4 of the way down the list when I got to a WMA stream, and was really curious whether it would be accepted, since there is no option anywhere to pick a stream type.  It was, and for now, it seems that it’s captured normally.

While I am entering the data into the software, and it does its file splitting every 5 minutes, the whole GUI freezes and becomes unresponsive for 2-3 seconds.  What I am interested in is whether there will be a gap in the recording of the station that is cut.  BTW, the computer I am doing the analysis on isn’t so weak…

Also, it seems that I entered a stream that doesn’t exist.  The application is persistent in trying to connect to it, but while doing so, it freezes again for a few seconds.  However, it’s nothing to be alarmed about.

I also found out that in order for the app to keep reconnecting persistently, it has to be additionally configured, as that is not the default mode of operation.

OK, so I finally entered all the stations.  It gets rather annoying after a few minutes, because with the 5-minute chunk interval the app has its freezing moments rather frequently, and despite the fact that it doesn’t pose a problem AFTER everything is entered, it really is annoying.  Here is the filled up application:

Testing the recorded stuff

To do that, I will first share the folder with recordings so I would be able to see it from another (this) machine.

As expected, every channel is saved in its appropriate folder:

Now, let’s examine the contents of some folders that are recorded here…

First folder I have is Antena Zagreb, and here it is:

I won’t comment on file naming now, but will tell you what happened when I double-clicked the .m3u file that should have the list of mp3 files that are recorded. Winamp loaded it and CRASHED my machine completely. I don’t say it will crash yours, but my Winamp, when faced with certain media files that it can’t recognize, goes berserk. The problem here lies in the fact that Antena Zagreb has an AACPLUS stream, and it was interpreted erroneously, creating mp3 files that crashed Winamp. Here is one file for you to try, use it at your own risk.

Antena Zagreb Jan 02_05

Media Player crashed as well, but I could END it, with Winamp I had to restart the whole machine.

The last test I want to do in this post is to see if subsequent files are saved so that there is no gap between them.  For that, I have to find an mp3 file that won’t actually break my player.

Found it, and had no luck.  Even with pure mp3 files, Winamp gives up and puts its legs in the air.  Tested the same with Media Player, and it seems that recordings overlap by a few seconds, so that checks out.

Before the conclusion, let’s just take a look at the resource usage of the application:

Conclusion

You might be able to use Replay A/V for your media monitoring purposes, and save a great deal of money.  However, please note that:

  • I didn’t find any option for error reporting (which would enable you to see that a stream has been off-line for an extended time)
  • if all the channels cut their files at the same time, the app would be unresponsive for at least 2*number_of_channels seconds
  • CPU usage is minimal, however I just found out that memory usage rises LINEARLY over time, and that would lead to imminent application death after some time (you do the math)
  • I didn’t use the scheduler to create persistent connections; if I had, on a bad connection with lots of breaks the app would be nearly impossible to use due to freezing upon connection
  • there is no (or I wasn’t able to find it) option for renaming the files so they would use some time-stamped names
  • it doesn’t provide support for VideoPhill Player, which is an archive exploration tool created just for Media Monitors

Additional info…

After several hours (around 6) this is the memory usage, captured using Procexp.

For those that can’t read a memory usage graph, this means that the application has a memory leak, and at this rate, it would exhaust its memory in less than 24 hours, since it is a 32-bit (x86) process.  A quick remedy for that would be to raise the interval for the file cutting, because I suspect that the memory leak occurs at that time.
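The kind of estimate behind that statement is a straight-line extrapolation, sketched below in Python.  The sample readings are hypothetical and are NOT taken from the graph; the 2048 MB limit assumes the default 2 GB of user address space for a 32-bit Windows process.

```python
def hours_until_limit(samples, limit_mb=2048):
    """Extrapolate linear memory growth to the point where limit_mb is reached.

    samples: list of (hours_elapsed, memory_mb) pairs with distinct times.
    Returns the remaining hours after the last sample, or None if memory
    is not growing.
    """
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    rate_mb_per_hour = (m1 - m0) / (t1 - t0)
    if rate_mb_per_hour <= 0:
        return None
    return (limit_mb - m1) / rate_mb_per_hour

# Hypothetical readings for illustration only: 100 MB at start, 700 MB after 6 hours.
example = [(0, 100), (6, 700)]
print(hours_until_limit(example))
```

With real readings from Process Explorer in place of the made-up ones, the same function tells you how long the application can run before it needs a restart.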

Why the LinkedIN is so great!

December 31st, 2011

Happy New Year to everyone.  I just want to share a joyous event with you.  I won’t comment on it at all, but just hang the pictures there for you…

 

LinkedIN post on PlayKontrol

Reaction to the post

Another posting, now on StreamSink

And again, interesting reaction...

OK, but WHO is Mr Anant actually?

Recording multiple FM radio stations (works for AM, too)

December 27th, 2011

As it seems, we in media monitoring want to record everything.  A good part of everything is still in the FM radio spectrum (or AM in some flat-land countries).  And usually, there are plenty of stations on the air that we have to record, at least a dozen at a given location…

Ancient history

Many many years ago, when I was working at FirePlay (great radio automation company and software) we had a task to produce a recorder that would record ONE channel of radio program 24/7.  At that time, encoding MP3 in real time was some kind of science, available only on the most advanced systems of the day (I won’t try to be exact here, but it was something along the lines of a Pentium 133 MHz).

So we built FireSave, first version, that was able to handle 1 channel and record it to the hard drive, encoded in mp3 format.  We even tried to use some obscure GSM codecs to save even more space…

Ancient history, but without dinosaurs

The setup above required a live external tuner to be connected to the Sound Blaster (yeah, really).  We had some multi-channel cards but they were expensive, and using them to make a confidence and/or compliance recording would be a waste of money.

Our needs expanded from one channel to several, say 4.  Since we had some expertise running multiple channels, we quickly added more external tuners, replaced the Sound Blaster with some multi-channel monster (it was Wave4, then Gina24, then other stuff from EchoAudio, such as Layla 3G) and finally upgraded the software so it could handle multiple channels.

It worked.  With 4 external tuners attached to one PC, sometimes more, it looked like an octopus.

Present days (year 2009)

OK, but what if you need and want to record the 150 radio stations that a typical country like Croatia has?  You’ll be able to get some audio cards that have up to 16 audio inputs (even mono sound will be OK), but having that many external tuners could be a problem.  And still, they can’t all be heard in one place, so you’ll have to have multiple recording sites in order to capture everything you need.

Or not?

The simple fact is that every good radio station will have its internet stream so it can be heard on the internet.  And there is a way to capture that stream off the internet and save it to the hard drive just as if you had recorded it.  There are multiple tools on the internet that allow you to capture internet audio streams; you just have to choose one of them, and you’ll be able to record any radio that has a stream.  Before we created StreamSink, I was extensively using StationRipper for my own purposes, and that was the inspiration that was needed to create a very similar tool.  It is similar in the respect that it records internet audio (and video) streams, but one thing is very different: all ‘rippers’ including StationRipper are designed to try to cut the audio stream at song boundaries, creating a library of songs for the user.  On the other side, our task was to create a system to record internet streams in multiple formats in the archive format usable by VideoPhill Player.

It isn’t anything special – just a bunch of files named in some fashion and cut every five minutes, with special care not to lose a single byte of a stream while cutting it.
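To illustrate the idea (this is not StreamSink’s actual code), here is a small Python sketch that writes incoming stream data into five-minute files.  The file naming scheme and the function names are assumptions made for the example, and the cut is made at packet boundaries rather than at audio frame boundaries.

```python
import os
from datetime import datetime, timezone

CHUNK_SECONDS = 300  # five-minute files

def chunk_name(station, ts):
    """Time-stamped file name for the chunk that contains timestamp ts (hypothetical scheme)."""
    start = int(ts) - int(ts) % CHUNK_SECONDS
    stamp = datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{station}_{stamp}.mp3"

def record(station, packets, out_dir):
    """Write (timestamp, data) packets into chunk files.

    Every packet is written to exactly one file, so concatenating the
    files in name order gives back the original byte stream.
    """
    current_name, out = None, None
    try:
        for ts, data in packets:
            name = chunk_name(station, ts)
            if name != current_name:
                if out:
                    out.close()
                out = open(os.path.join(out_dir, name), "ab")
                current_name = name
            out.write(data)
    finally:
        if out:
            out.close()
```

In a real recorder the packets would come from the network connection; here any iterable of `(timestamp, bytes)` pairs will do, which also makes the cutting logic easy to test.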

So with that system, recording 100 radio stations on a single computer is as simple as having a good internet connection.  Of course, every stream will be recorded only as reliably as the server and the internet permit, and there is nothing you can do about it.  When using that method, you must allow yourself to lose some of the archive sometimes, for unforeseen reasons.  Again – better radio stations (the stations for which you will need 100% of the archive) will have better sources, better distribution servers, and thus your archive will be better covered.

Expected Archive Coverage

But what to do when there are NO streams?

Recently (Summer 2011), there was a client that needed to record multiple radio stations as well.  However, after initial investigation we concluded that the radio stations that needed recording were either poorly represented on the internet or not present at all.  So instead of capturing streams, we were aiming to capture the radio signal from FM directly.  All we had was the antenna that was dipped in the airwaves that contained our radio stations (8 of them).

The strategy was as follows: I have a tool that can capture streams in the format that my application (the Player) needs, but we don’t have the streams.  Let’s create them.

Shoutcast internet radio has been on the market for more than a decade.  It is free and has tremendous support, and its software for creating and distributing internet radio streams is as robust as it can be, since it has been field tested in possibly millions of usage scenarios.

As I knew how to encode the stream and how to distribute it (locally) for StreamSink, I just needed to capture the FM signal somehow.  Using 8 external tuners would be funny for the client, and I’d probably lose them, so I did a little digging and found a beauty in the form of a PCI card:

Professional PCI tuner adapter

This little monster (AudioScience ASI8921) is able to capture 8 FM radio channels and give them to the rest of the system through the DirectShow or waveIn API, just what we needed.  The only thing left to do is to connect the antenna to the card, configure the shoutcast encoder/server as needed, turn on StreamSink, and we are recording!

How to reduce hard drive fragmentation

December 17th, 2011

The topic of drive fragmentation might be a little out of fashion these days, but since I spent a great deal of my youth watching PC Tools defragment my drive in a graphically pleasing fashion, I am inclined to think that drive fragmentation (when excessive) can severely reduce both computer performance and hard drive life.

While this might be true for the common day-to-day user, it is particularly true for corporations and enterprises that need their data to be:

  • accessible,
  • quickly accessible,
  • accessible for a long time

In a common computer use scenario, most of the files are there for the computer to read and use, either as software that has to be loaded into memory, or documents that have to be shown to the user.  Writing to the hard drive is an uncommon operation (when you put it against the number of reads) and thus drive fragmentation, however present, is in fact easily ignored.

Continuous stream recording, enter…

In my business (my clients’ businesses, to be exact) the hard drives work the opposite way.  They WRITE all the time, and read only occasionally.  And the problem that will surely lead to fragmentation is that in most situations they need to write MULTIPLE long files continuously.  Let me try to explain, first the why, then the what…

When either running VideoPhill Recorder for recording video, or using StreamSink to record internet media streams, in most cases the user has MULTIPLE channels recorded on one computer.  The files created by that recording are commonly all created at the same time and grow continuously until closed.  Since Windows is, as it is now, an operating system that can’t reserve drive space in advance (maybe it can, but the software doesn’t know how long the files will be), the space for them is allocated as time goes by.  If we have 4 files that are written slowly but concurrently (and grow at the same time), we’ll certainly have the following situation on the hard drive (I’m talking ONLY about the data that is stored here, and am simplifying physical hard drive storage as a continuous slate):

file1_block1
file2_block1
file3_block1
file4_block1
file1_block2
file2_block2
file3_block2
file4_block2
.
.
.
file1_blockN
file2_blockN
file3_blockN
file4_blockN

That means fragmentation.  A file isn’t stored in contiguous blocks, but is scattered evenly across the drive and can’t be read sequentially from it.  You might be lucky and your blocks could be scattered in a way that sectors on the drive will be adjacent and this won’t pose a problem, but what are the chances? :)
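The layout listed above is easy to reproduce with a toy model in Python.  It uses the same simplification as the text (the drive as one continuous slate with blocks handed out in order) and says nothing about how a real file system allocates space; all names are made up for the example.

```python
def simulate_layout(num_files, blocks_per_file):
    """Return the on-disk order of (file, block) pairs when all files grow in turn."""
    disk = []
    for block in range(1, blocks_per_file + 1):
        for file_id in range(1, num_files + 1):
            disk.append((file_id, block))
    return disk

def fragments(disk, file_id):
    """Count the contiguous runs of blocks that belong to file_id."""
    runs, previous_owner = 0, None
    for owner, _ in disk:
        if owner == file_id and previous_owner != file_id:
            runs += 1
        previous_owner = owner
    return runs

disk = simulate_layout(4, 3)
print(disk[:5])            # file1_block1 ... file4_block1, file1_block2
print(fragments(disk, 1))  # one fragment per block when 4 files interleave
```

In this model a file recorded alone ends up in a single run, while each of several concurrently growing files ends up with as many fragments as it has blocks.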

And when file1 gets deleted, what remains on the hard drive?  Blocks filled with nothing, left there for other files to fill them.  New files will try to fill them, and the drive will soon be completely jumbled.  It will all be hidden from you by the OS, but still, the OS will have to deal with it.

And that is the story of 4 channels.  What about the situation when you have 60 channels recorded on one machine (I’m talking about internet stream recording, of course)?  Such an archive can be found here: http://access.streamsink.com/archive/

If you aren’t convinced that this really IS a problem, you can stop reading now.

Rescue #1 – Drive Partitioning

It is feasible in situations where there is a low number of channels that need to be recorded.  If you have 4 channels, you’ll create 4 partitions, and each partition will have nice continuous files written to it.  Done.

However, you can’t have 50 partitions on one drive and get away with it.

Rescue #2 – Queued File Moving

Another solution, for a large number of channels, presents itself in the form of a temporary partition for initial file recording, and then moving the files out to their permanent location later, but ONE FILE at a time, in a queue.

Queued Moving of Files in StreamSink

This is implemented in StreamSink, and it even has the ability to throttle the data rate when moving the files to another drive.  The only problem here is that the temporary hard drive gets worn out, because it still takes the beating from fragmentation.
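As an illustration of the approach (a sketch only, not how StreamSink implements it), queued and throttled moving can be done in Python with one worker thread that takes files from a queue.  The function names, the block size and the rate limit are all assumptions.

```python
import os
import queue
import threading
import time

def throttled_move(src, dst, max_bytes_per_sec, block_size=64 * 1024):
    """Copy src to dst no faster than max_bytes_per_sec, then delete src."""
    start = time.monotonic()
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            fout.write(block)
            copied += len(block)
            # If we are ahead of the allowed rate, wait for time to catch up.
            ahead = copied / max_bytes_per_sec - (time.monotonic() - start)
            if ahead > 0:
                time.sleep(ahead)
    os.remove(src)

def mover(jobs, max_bytes_per_sec):
    """Worker loop: move (src, dst) pairs from the queue ONE file at a time."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        throttled_move(job[0], job[1], max_bytes_per_sec)
        jobs.task_done()

# Typical use (paths are placeholders):
# jobs = queue.Queue()
# threading.Thread(target=mover, args=(jobs, 1_000_000), daemon=True).start()
# jobs.put((r"T:\temp\station1.mp3", r"D:\archive\station1.mp3"))
```

Because only one worker reads from the queue, the permanent drive sees one sequential write at a time, which is what keeps the files there contiguous.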

Rescue #3 – Using RAM Drive on Method #2

While I was writing the article about NAS, a thought flashed across my mind – can we avoid writing to the temporary drive and reduce the load ONCE more?

Yes, we can.  I know that RAM Drives are also out of fashion, but here one will come in handy.  It’s a shame that support for it isn’t included in the system already, so with a little googling I found this: http://www.ltr-data.se/opencode.html/#ImDisk

I installed it on the testing server, re-configured the application to use new temporary folder, and from now on, it runs so smooth I can’t hear it anymore :)

Some technical stuff:

  • in this instance, I am currently recording 62 channels and cumulative rate for it is around 5 megabit/second
  • my files have duration of 5 minutes, which means that recorded chunks are closed and moved to permanent storage every 5 minutes
  • during those 5 minutes, all the files together won’t grow beyond 200 megabytes
  • I created 512 megabyte ram drive, just to be safe
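The sizing above is simple arithmetic, shown here as a small Python sketch (the function name is made up, and 1 megabyte is taken as 10**6 bytes):

```python
def chunk_buffer_mb(total_mbit_per_sec, chunk_seconds):
    """Megabytes written by all channels together during one chunk interval."""
    return total_mbit_per_sec * chunk_seconds / 8

needed_mb = chunk_buffer_mb(5, 300)  # 5 megabit/second for 5 minutes
print(needed_mb, "MB per 5-minute interval")
```

The 512 megabyte RAM drive leaves room for the previous batch of files, which may still be waiting in the move queue while the next batch is already growing.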

Conclusion

Take care of your hard drive, and don’t dismiss old tech such as RAM Drives just yet.

If I were to implement this on an application level, I would have to spend a great deal of time, and for some media types it wouldn’t even be possible – Windows Media for example, writes to disk or to other places if you employ magic…  With the use of a RAM Drive, it was done in a matter of minutes.

Having NAS is great (or is it?)

December 17th, 2011

Over the years I have deliberated a lot over whether a NAS should or shouldn’t be used for the video archives created by a video logger system such as VideoPhill Recorder.  At first I was a firm believer in one approach, then I switched sides completely, and now…  Well, read on, and I’ll take you through it.

Stakes

On one side of the stakes we have actual requirements, and on the other there are considerations.  Actual requirements are sometimes hard to pinpoint at first, but they always come out sooner or later.

So, let me list possible requirements that might be in effect here…

Common low level requirements

Storage for video recording (for logging purposes) needs to have the following abilities:

  • low but constant and sequential write rate – the data rate for 4 channels is as low as 5 Mbit per second (roughly 500 kbytes/sec), but it is CONSTANT and SEQUENTIAL, so the hard drive won’t be stressed by constant seeking
  • high durability over time – what gets written once has to stay there.  It should survive a single drive failure
  • reading isn’t common, but when it happens it has to be sustained, again at a low data rate and with great predictability (it is usually sequential)

From everything above, I’d guess that any data storage expert would read this as RAID 5 and wouldn’t let you build anything else for the video archive storage.

Archive duration scalability

The archive duration is directly proportional to the hard drive space that is available.  To determine how much hard drive space you need for your first installation, you can use the online hard drive size estimator that I created for this blog.

If you plan to extend the duration of your archive one day, you have this requirement, and you have to plan for it.  Having the storage in one place can simplify adding drive space, but it can also completely block it.

Let’s say that you have an archive of 40 channels that spans 92 days (3 months).  And let’s say that you decided to use 1 Mbit video with 128 kbit audio for the archive.  By using the calculator above, you’ll find out that you have 42 terabytes of storage already in place.  Even today, that kind of storage set up in one place is kind of a challenge to build.
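For a rough cross-check without the calculator, the raw bitrate can simply be multiplied out.  This is a sketch under the assumption that only the stated video and audio bitrates count, with no container overhead; the calculator may use assumptions of its own, so the figures don’t have to match exactly.

```python
def archive_bytes(channels, days, video_kbit, audio_kbit):
    """Raw archive size in bytes for constant-bitrate recording around the clock."""
    bits_per_second = (video_kbit + audio_kbit) * 1000
    seconds = days * 24 * 60 * 60
    return channels * seconds * bits_per_second // 8


size = archive_bytes(channels=40, days=92, video_kbit=1000, audio_kbit=128)
print(f"{size / 1e12:.1f} TB (decimal), {size / 2**40:.1f} TiB (binary)")
```

The raw figure comes out at about 44.8 TB in decimal units, or about 40.8 TiB in binary units, so it lands in the same ballpark as the 42 terabytes quoted above.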

If you have already invested in a 42 TB storage system, and had the foresight to plan for an upgrade to, say, double its size, you are in luck.  But say that after a few more months (just 3) your management decides to expand the requirement to 12 months of storage.  Wow.  Now you need 127 TB in total.  If the current system can hold that much drive space, again you are in luck.  But say it can’t.  Your options are:

  • add one more to the chain
  • create a bigger unit, copy everything to it, scrap the current one

I’ll stop my train of thought here, and leave you with just a few things to think about before I go on with the other requirements: who needs a used 82 TB system (if you want to sell it), do you know how much it takes to COPY 82 TB even at extreme network speeds, adding one more will break the ‘all in one place’ requirement, …

Having it all in one place (i.e. for web publishing)

If you need everything in one place for publishing, then this is a solid requirement.  The web server will have the content on its local hard drives, and it will publish it smoothly.

But is that really a requirement?  I must admit that I haven’t seen a web server that properly served files from network locations (even though there are options to do that), but I’m sure that IIS and Windows Server gurus will be able to shut me down and say that this is a normal thing that is done routinely.  So, if we know that each channel recorder has its own directory ANYWAY, what’s the use of having

c:\archive\channel1
c:\archive\channel2
c:\archive\channel3

instead of

\\rec1\channel1
\\rec1\channel2
\\rec2\channel3

Reducing single point of failure

This requirement is very common, and if we treat the NAS as that ‘point’, having a NAS means having a single point of failure.  I understand that there are multiple redundancies that could be built into the system, such as RAID 5, or obscene configurations such as RAID 1+5.  Note: for the latter, it seems that the author of the article has the same opinion as me:

Recommended Uses: Critical applications requiring very high fault tolerance. In my opinion, if you get to the point of needing this much fault tolerance this badly, you should be looking beyond RAID to remote mirroring, clustering or other redundant server setups; RAID 10 provides most of the benefits with better performance and lower cost. Not widely implemented.

In my words: if you need such a system, it’s better to have the recording drives distributed across the recording machines, have RAID 5 there, and additionally have a NAS (or some other form of storage) to DUPLICATE everything.

Bandwidth issues

In a configuration with 4 channels recorded on one machine, and with the above mentioned data rate for video and audio, each machine will produce about 5 megabits of content every second.  Roughly, that is .5 megabyte.  Even a ZIP drive could almost handle that.  However, if you have 10 times that (for 10 recorders) and have central storage for the whole bunch of channels, that is 5 megabytes of data every second, at a constant rate that never stops.
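The numbers above are rounded; a short sketch of the same arithmetic, assuming the 1 Mbit video plus 128 kbit audio per channel used earlier (the function name is just for illustration):

```python
def recorder_rate(channels, video_kbit, audio_kbit):
    """Return (megabits per second, megabytes per second) produced by one recorder."""
    mbit = channels * (video_kbit + audio_kbit) / 1000
    return mbit, mbit / 8


mbit, mbyte = recorder_rate(channels=4, video_kbit=1000, audio_kbit=128)
print(f"one recorder:  {mbit:.2f} Mbit/s = {mbyte:.2f} MB/s")
print(f"ten recorders: {10 * mbyte:.2f} MB/s into the central storage")
```

One recorder comes out at about 4.5 Mbit/s, or a bit over half a megabyte per second, and ten of them push between 5 and 6 megabytes per second at the central storage, around the clock.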

Consider that the central storage in question is a NAS with hard drives configured in RAID 5.  That means that it will have to receive the data, calculate parity for it, move the drive heads, write to the drives, …  It will be a very busy NAS, and with everything else in mind, it won’t get a second of a break.  Add to that the occasional reading of content by the archive diggers, and you’ll soon figure out that the NAS has to take it all by itself.

On the other hand, archive access applications such as VideoPhill Player have nothing against the channels being on different recorder machines.

Conclusion (for bandwidth issues) – having each recorder machine handle both encoding and storage for 4 channels removes the single point of stress for both recording and archive access.

Overall…

Having dumped my intuition into these few paragraphs, I hope that I presented a case that is strong enough against having a NAS for video logger/archive storage.  Again, everything said comes from my experience on the subject, and I’m no storage expert who will talk petabytes, just a simple consultant trying to get my clients the best bang for the buck.  Please, I won’t mind objections to the text, on the contrary…

 

And then, I got in… (story of data visualization)

December 15th, 2011

How to see the data?

If the data is numeric, and it represents some series, it will mostly be represented with a graph of some sort.  There are hundreds of types of graphs available, and they all have some purpose, otherwise they would not exist.

However, for some special occasions, you have to see a different kind of data.

The problem (this particular instance)

Since I am developing an internet media streaming CAPTURE and ARCHIVE application (StreamSink), I am also continuously testing it on one of my servers.  I am adding channels, removing them, stopping the server, sometimes something goes wrong and the whole thing freezes or crashes, so the archive I have is rather heterogeneous in quality.

Let me go through the operational view – the mere GUI of the StreamSink, so I can present some problems and solutions so far.

StreamSink

Several things that are important to the operator of the software had to be present on the main (status) screen.  For example:

  • whole list of channels should be visible
  • channel status should be visible at first glance
  • I am interested what happened to the system recently
  • I need to know the status of my connection
  • it would be good to know how much disk space is available

I could dwell on it but the main point of this post is something else.

The problem here is that I had to create a PlayKontrol report for demonstration purposes (for them: http://ihg.hr/) that would scan 7 days of the archive (multiple channels, of course), and produce the reports (playlists) for 300 songs.

So the problem is: to find, in an archive that is damaged in various ways, 7 days of continuous archive that spans multiple channels.

The solution (prelude)

Since I am kind of an explorer by nature, I wasn’t inclined to use a solution that would present raw data as an answer, but was thinking about seeing the data and determining the period and channels ‘visually’.

StreamSink has an integrated feature called ‘archive report’ that has data similar to what I need, but with it I would only get limited information.  You can see the report here:

StreamSink Archive Report

Most useful info on the report in this particular situation would be the graph on the right side of the report.  Let me explain…

For each day StreamSink is able to record up to 24 hours of media.  Due to network situations, it is sometimes less than 24 hours, and I decided that I would present that number as the percentage of the day that is covered by the archive.  As you can see from the report, that percentage is shown for the whole archive lifetime, for the last month, the last week and the last 24 hours.

It is also shown in the form of a graph, where the leftmost part of the graph is the current day, and as we go to the right, we sink into the past, with divider lines at every 7 days.  Nice, eh? :)
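The coverage number itself is simple to compute.  Here is a minimal sketch; it is not StreamSink’s code, and it assumes that the figure for a longer period is the plain average of the daily percentages, with the list of days ordered newest first, like the graph.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def coverage_percent(recorded_seconds):
    """Percentage of one day covered by recordings, capped at 100."""
    return min(100.0, 100.0 * recorded_seconds / SECONDS_PER_DAY)


def period_coverage(daily_recorded_seconds, days):
    """Average coverage over the most recent `days` days (list is newest first)."""
    recent = daily_recorded_seconds[:days]
    if not recent:
        return 0.0
    return sum(coverage_percent(s) for s in recent) / len(recent)
```

With a list of recorded seconds per day, period_coverage(history, 7) would give the last-week figure and period_coverage(history, len(history)) the figure for the whole archive lifetime.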

But, as nice as that report is, I can’t read from it which 7 days and which channels are to be scanned – I have to find another way in.

Solution (at last)

For this one, I picked something that I learned from the above mentioned report.  That was:

  • I will have a channel list
  • I will have some sort of calendar
  • I have to see how much is covered for the archive for each day

I also decided to show each day as a cell in a table-style matrix, where rows are occupied by channels and columns are days.  Time flow is inverted here compared to the report, so left is the past, and right is the present.

Whole thing looks like this:

Archive Digger

The same thing, zoomed in a little:

Archive Digger Detail

Note: green is the color for the days that have 90% or more archive covered.

At last, you can see from both pictures that much of the data is revealed at first glance. For example, 0 means that there was no archive at all that day. Numbers below 90 suggest that either there was some problem with the channel that day, or StreamSink was started or stopped in the middle of the day.
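I picked the period by eye, but the same matrix could also be searched automatically.  The following is only a sketch of that idea, not the Archive Digger code: the data layout (a dictionary of channels, each holding a percentage per day), the function names and the default 90% threshold and 7-day window are assumptions based on the description above.

```python
def render_matrix(coverage, days):
    """Text version of the matrix: one row per channel, one column per day (oldest first)."""
    lines = []
    for channel in sorted(coverage):
        cells = " ".join(f"{round(coverage[channel].get(d, 0)):3d}" for d in days)
        lines.append(f"{channel:<10} {cells}")
    return "\n".join(lines)


def good_windows(coverage, days, length=7, threshold=90):
    """For every run of `length` consecutive days, list the channels that are
    at or above `threshold` percent on each of those days."""
    results = []
    for i in range(len(days) - length + 1):
        window = days[i:i + length]
        channels = [
            c for c in sorted(coverage)
            if all(coverage[c].get(d, 0) >= threshold for d in window)
        ]
        if channels:
            results.append((window[0], channels))
    return results
```

Each result is a starting day together with the channels that stay ‘green’ for the whole window, so the window with the longest channel list is the best candidate for the scan.  The days list is assumed to be in order and without gaps.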

I could even color-code that information on the chart – but the utility will be expanded further only if there is demand for it, since I already got what I needed to know from it.

BTW, I don’t want to brag here, but it took 2-3 hours of thinking and coding to write that utility, and almost no debugging.  It’s most probably due to the fact that I have been doing that stuff over and over again for some years :)