Refactoring, it’s fun – part 2

November 22nd, 2011

In the first part I tried to set the scene and give you some background on my problem. Now I’ll continue and try to tell an interesting, and at the same time informative, story about performing the refactoring in question.

The most curious thing of all is that you don’t have to prepare for refactoring. It can happen no matter what, and the price usually isn’t so high after all. Let me remind you: dependency injection, the main tool of component design, wasn’t used. Yes, the classes do have their responsibilities cleverly defined, and that helps a lot here, because if they didn’t, the whole deal would have to start a few steps ‘before’.

I’m not an experienced writer, so I don’t know if I will get the point across, but to me refactoring this was like building level upon level of scaffolding, using each level to test the next, while also building the scaffolding so that it could be used later in a production environment! I guess there is a name for it, there has to be :)

Step 1:

Create a duplicate of the main working class, and see how to force the rest of the application to use it when needed.

Say that the class name is PCM_Hash, and that it has the following structure (method declarations only):

class PCM_Hash
{
    public bool Initialize(PCMBufferedReader sourceDataReader);
    public uint GetNextHash();
}

My goal was to create an alternative to this class. I needed to keep the old class around as a reference to compare results against.

So I created the class PCMHash_R2. That was my first decision: to create the same class as before and try to get the same results from it, replacing its guts one step at a time.

Switching to the replacement wasn’t easy, so I extracted an interface, derived both classes from it, and created something like:

IPCM_Hash rph;
if (!_refactoring)
    rph = new PCM_Hash();      // original implementation
else
    rph = new PCMHash_R2();    // refactored copy
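
The switch above can be fleshed out into a minimal, self-contained sketch. Everything except the two method signatures is an assumption made for illustration: the stand-in PCMBufferedReader, the placeholder hash arithmetic and the HasherSwitch helper are hypothetical and are not the real PlayKontrol code.

```csharp
using System;

// Hypothetical stand-in for the real reader: it only wraps a sample array.
public class PCMBufferedReader
{
    public readonly short[] Samples;
    public PCMBufferedReader(short[] samples) { Samples = samples; }
}

// Interface extracted from the original class.
public interface IPCM_Hash
{
    bool Initialize(PCMBufferedReader sourceDataReader);
    uint GetNextHash();
}

// Original class. The body is a placeholder, not the real hashing algorithm.
public class PCM_Hash : IPCM_Hash
{
    private PCMBufferedReader _reader;
    private int _position;

    public bool Initialize(PCMBufferedReader sourceDataReader)
    {
        _reader = sourceDataReader;
        _position = 0;
        return _reader != null;
    }

    public uint GetNextHash()
    {
        short sample = _reader.Samples[_position++];
        return unchecked((uint)(sample * 31 + _position));
    }
}

// Refactored copy: it must keep producing the same values while its
// insides are rearranged one step at a time.
public class PCMHash_R2 : IPCM_Hash
{
    private PCMBufferedReader _reader;
    private int _position;

    public bool Initialize(PCMBufferedReader sourceDataReader)
    {
        _reader = sourceDataReader;
        _position = 0;
        return _reader != null;
    }

    public uint GetNextHash()
    {
        int index = _position;
        _position = index + 1;
        return unchecked((uint)(_reader.Samples[index] * 31 + _position));
    }
}

public static class HasherSwitch
{
    // Replacement "by hand": one flag decides which implementation is built.
    public static IPCM_Hash Create(bool refactoring)
    {
        if (!refactoring)
            return new PCM_Hash();
        return new PCMHash_R2();
    }
}
```

Because both classes sit behind the same interface, the calling code does not change when the flag flips, which is what makes a side-by-side comparison possible.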

At this point I would like to restate that I am in production all the time, and have to decide on my feet, weighing all the implications of each decision. I mention this because anyone can see that some dependency injection should be used here. But, apart from having to spend a lot of time wiring it through the code, I’m not sure how, or if, it would work anyway.

Why: at testing time, I want both classes to be able to function side by side. Concretely, if I have:

class FileHasher
{
    // will use IPCM_Hash implementation somewhere...
}

I’ll need:

FileHasher usesOriginal = new FileHasher();
FileHasher usesRefactored = new FileHasher();
usesRefactored.Refactored = true;
// compare results
object resultFromOriginal = usesOriginal.GetResult();
object resultFromRefactored = usesRefactored.GetResult();
Assert.AreEqual(resultFromOriginal, resultFromRefactored);

Here, the GetResult() method could create a new IPCM_Hash several times, and I am not sure whether, for example, Ninject would be able to handle such a thing.

I also toyed with the idea of using manually created class factories, or passing the type of the implementation class to the constructor of the consumer, but those options also faded.

Anyway, there is the original class, and there is the new refactored class, ready to be taken apart and put together again in some different order, preserving the functionality that was there before.

Next steps will be added in the next editions of the refactoring talks…
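
For comparison, here is a rough sketch of what the constructor-based option could look like. It is not what the post ended up doing. The FileHasher constructor, the shape of GetResult() and the toy hash bodies are all assumptions for illustration; the point is only that a factory delegate lets GetResult() create as many hashers as it needs while two FileHasher instances run side by side.

```csharp
using System;
using System.Collections.Generic;

public interface IPCM_Hash
{
    uint GetNextHash();
}

// Toy implementations: both must yield the same sequence.
public class PCM_Hash : IPCM_Hash
{
    private uint _n;
    public uint GetNextHash() { return ++_n * 7u; }
}

public class PCMHash_R2 : IPCM_Hash
{
    private uint _n;
    public uint GetNextHash() { _n = _n + 1; return _n * 7u; }
}

public class FileHasher
{
    private readonly Func<IPCM_Hash> _createHasher;

    // The consumer receives a factory, so it never names a concrete class.
    public FileHasher(Func<IPCM_Hash> createHasher)
    {
        _createHasher = createHasher;
    }

    // Creates a fresh hasher more than once, as the real method might.
    public List<uint> GetResult()
    {
        var result = new List<uint>();
        for (int pass = 0; pass < 2; pass++)
        {
            IPCM_Hash hasher = _createHasher();
            for (int i = 0; i < 3; i++)
                result.Add(hasher.GetNextHash());
        }
        return result;
    }
}
```

A test would then build `new FileHasher(() => new PCM_Hash())` and `new FileHasher(() => new PCMHash_R2())` and compare the two result lists element by element.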

Refactoring, it’s fun – part 1

November 22nd, 2011

It’s a story of refactoring when the code that should be refactored isn’t prepared for it one bit. By prepared I mean having test cases, dependency injection code, and so on. However, I have none of the above in the original code, just code that works.

Let me explain what I have, what it does, and where it should end up. The purpose of this refactoring session isn’t to get better performance while keeping the same functionality; it is to make the same algorithm, already proven in various tests, work on different data.

Before (a.k.a. now):

The component creates a PK_HASH from a sound file. By PK_HASH I mean “the code name for our latest tech that can crunch a whole audio file down to a few bytes, so that those bytes can later be compared to bytes crunched from another file to tell you whether it’s the same sound file”. PK stands for PlayKontrol, the brand name.

So, there are a few steps to produce the PK_HASH from a sound file:

  • decode and read the file: the input is a file on disk (for example .mp3, .wma, .aac) and the output is PCM samples
  • from any kind of PCM samples (stereo, mono, 8-bit, 16-bit), produce an array of shorts that must be rewindable
  • hash the data and produce the PK_HASH files

Decode and read the file

The input is a file on disk, which can be in any format that is streamable (we produce files with StreamSink – see the archive example here: http://access.streamsink.com/archive/). That is .mp3, .aac, .wma, .ogg, and whatnot.

Currently this is done with a simple component that uses DirectShow to create a graph for the audio file, renders that graph and attaches a SampleGrabber filter to fetch the samples. The component goes from the file on disk to the whole PCM sample data in memory. That’s feasible for a 5-minute file (5 x 60 x 4 x 44100 = about 50MB). It can work even for 1h files. However, you can *feel* that this approach IS wrong, especially when told that the rest of the algorithm doesn’t need access to the whole PCM data at once.
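
The memory estimate is easy to reproduce. The sketch below assumes 16-bit stereo at 44.1 kHz, which is where the factor of 4 bytes per sample frame comes from; the PcmMemory helper is a made-up name for illustration.

```csharp
using System;

public static class PcmMemory
{
    // Bytes needed to hold uncompressed PCM audio entirely in memory.
    public static long BytesNeeded(int seconds, int sampleRate, int channels, int bytesPerSample)
    {
        return (long)seconds * sampleRate * channels * bytesPerSample;
    }
}
```

For a 5-minute file, `PcmMemory.BytesNeeded(5 * 60, 44100, 2, 2)` gives 52,920,000 bytes, and for a 1-hour file `PcmMemory.BytesNeeded(3600, 44100, 2, 2)` gives 635,040,000 bytes, which shows why holding everything in memory stops scaling well.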

Rewindable sample array

PCM samples are promoted to 16 bit (if needed), and channels are downmixed to mono. Again, that is done in memory, so each PCM sample takes 2 bytes of memory as a result of the operation.
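
As a minimal sketch of these two conversions: the code below assumes 8-bit input is unsigned with 128 as the zero level (the usual WAV convention), that multi-channel data is interleaved, and that downmixing is a plain average of the channels. The original component may do any of this differently; PcmConvert is a hypothetical name.

```csharp
using System;

public static class PcmConvert
{
    // Promote unsigned 8-bit samples to signed 16-bit samples.
    public static short[] Promote8To16(byte[] samples)
    {
        var result = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
            result[i] = (short)((samples[i] - 128) << 8);
        return result;
    }

    // Average interleaved channels into one mono channel.
    public static short[] DownmixToMono(short[] interleaved, int channels)
    {
        if (channels <= 0) throw new ArgumentException("channels must be positive");
        int frames = interleaved.Length / channels;
        var mono = new short[frames];
        for (int f = 0; f < frames; f++)
        {
            int sum = 0;
            for (int c = 0; c < channels; c++)
                sum += interleaved[f * channels + c];
            mono[f] = (short)(sum / channels);
        }
        return mono;
    }
}
```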

Hashing and creating the file

The hashing process needs a moving, overlapping window over the sample array, and since we have everything in memory, that is a piece of cake now. We take the data, process it, and write it into another byte array. Since the result is extremely dense, I won’t cry about memory at this point, but yes, it is written into memory first, and then saved to a file on disk.

So that is how it works so far. It goes from the encoded audio file to PCM sample data in memory, downmixes that data in memory to one PCM channel, processes the mono PCM samples to obtain the PK_HASH, and then writes it to a file.
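
With everything in one array, a moving, overlapping window is just a loop over start offsets. The sketch below is illustrative only: the window size and hop (how far the window moves each step) are parameters here because the post does not say what values the real hasher uses, and SampleWindows is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class SampleWindows
{
    // Returns every full window; consecutive windows overlap when hop < windowSize.
    public static List<short[]> Split(short[] samples, int windowSize, int hop)
    {
        if (windowSize <= 0 || hop <= 0)
            throw new ArgumentException("windowSize and hop must be positive");

        var windows = new List<short[]>();
        for (int start = 0; start + windowSize <= samples.Length; start += hop)
        {
            var window = new short[windowSize];
            Array.Copy(samples, start, window, 0, windowSize);
            windows.Add(window);
        }
        return windows;
    }
}
```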

So what do we actually need?

If you take a peek at the archive you’ll find that every folder has audio files, and also a .hash file for every audio file present in the directory. Please note that not every directory is processed, only 20 of them, because processing is CPU-intensive, and I have only a few PCs lying around to scrub the data. That will improve in the future. So, for crunching the archive, even the POC (proof-of-concept) is OK, as it serves its purpose. It will go through the archive and leave processed PK_HASHes behind.

The process that runs in parallel waits for the PK_HASH file to be created, reads it, and does matching against the database. However, the next step should be taken, and that is REALTIME processing.

To be able to process in REALTIME, the architecture goes something like this:

  • StreamSink is attached to the network stream of any kind, and provides PCM sample output
  • PCM sample output is downsampled and buffered
  • hashing process uses buffered mono PCM samples and outputs results into the stream
  • PK_HASH stream is again buffered and results processed with MATCHER process

StreamSink PCM decoding

StreamSink is the application that does internet media stream capture. Thanks to a feature request from DigitalSyphon, however, it can process every media stream and provide PCM samples for it in real time, in the form of a Stream-derived class. So, that part of the process is covered completely.

Buffering PCM samples

Now a new component should be created: something that can buffer PCM samples from the Stream and provide floating, overlapping window reads for the hashing process. With some thinking I combined the inner workings of the Circular Buffer pattern with something that can be used almost directly in the hasher process, by replacing the class implementation only.
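
One way such a component could look is sketched below. This is not the actual implementation: the class name, the fixed capacity, and the choice to throw when the buffer is full are all assumptions. Samples are written into a ring, and each successful read copies one window and then advances by a hop smaller than the window, so consecutive windows overlap.

```csharp
using System;

public class OverlappingWindowBuffer
{
    private readonly short[] _ring;
    private readonly int _windowSize;
    private readonly int _hop;
    private int _start;   // index of the oldest buffered sample
    private int _count;   // number of buffered samples

    public OverlappingWindowBuffer(int capacity, int windowSize, int hop)
    {
        if (hop <= 0 || hop > windowSize || windowSize > capacity)
            throw new ArgumentException("need 0 < hop <= windowSize <= capacity");
        _ring = new short[capacity];
        _windowSize = windowSize;
        _hop = hop;
    }

    public int FreeSpace { get { return _ring.Length - _count; } }

    // Appends samples arriving from the stream.
    public void Write(short[] samples)
    {
        if (samples.Length > FreeSpace)
            throw new InvalidOperationException("buffer full");
        foreach (short s in samples)
        {
            _ring[(_start + _count) % _ring.Length] = s;
            _count++;
        }
    }

    // Copies the next window if enough samples are buffered, then advances
    // by the hop so that the following window overlaps this one.
    public bool TryReadWindow(short[] window)
    {
        if (window.Length != _windowSize)
            throw new ArgumentException("window has the wrong length");
        if (_count < _windowSize)
            return false;
        for (int i = 0; i < _windowSize; i++)
            window[i] = _ring[(_start + i) % _ring.Length];
        _start = (_start + _hop) % _ring.Length;
        _count -= _hop;
        return true;
    }
}
```

The hasher only ever asks for the next window, so it does not need to know whether the samples came from a file held in memory or from a live stream.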

Processing and creating PK_HASHes

The hasher process was reading the buffered MEMORY quasi-stream. However, it used a fairly simple interface to read the data, so I was lucky that the interface could be extracted and reimplemented on top of buffered stream data. The output side of the class will also have to be rewritten, since right now it doesn’t have any interface-able part to replace.

And so on. The rest has to be implemented from scratch, so there is no refactoring story there.

Refactoring pyramid/tower

I can call it a pyramid or a tower, because after a long time of procrastination (subconsciously processing the task at hand) I was able to put my hands on the keyboard and start. My premise was that everything has to be checked from the ground up, because NOW I have an algorithm that produces the desired results, and since there are many steps involved, an error in a single step could be untraceable if I don’t check every step along the way.

Tools used

I am kind of old-fashioned, so this paragraph won’t be very long. I use Visual Studio 2008, and for writing test code snippets I use NUnit as a launcher, so I don’t need a form or a console app to run the tests.

For dependency injection I tested Ninject, and it is great, but in this case it can’t help me, so I’ll do the implementation replacement “by hand”.

I’ll finish this post for now, as this is the current state of affairs, and will keep you updated as the story develops, with new posts and fresh insights…

Hard Disk Drive : Safer Reliable Databank

March 12th, 2009

Which Media???

Hard Disk Drive (HDD)

  • A hard disk drive (HDD), commonly referred to as a hard drive, hard disk, or fixed disk drive, is a non-volatile storage device which stores digitally encoded data on rapidly rotating platters with magnetic surfaces. Because the HDD was built for repeated reading and writing, it should come as no surprise that this format’s biggest advantage is that it can be rewritten millions of times.

Digital Versatile Disc (DVD)

  • DVD, also known as “Digital Versatile Disc” or “Digital Video Disc,” is a popular optical disc storage media format. The wavelength used by standard DVD lasers is 650 nm, and thus the light has a red color. Since DVD-ROM stands for Digital Versatile Disc – Read Only Memory, it should come as no surprise that this format’s biggest disadvantage is that it can’t be written more than once.

How much Storage???

HDD:  Large storage capacity

  • A typical desktop HDD can store between 120 GB and 2 TB of data (based on current US market data)

DVD: Medium storage capacity

  • 4.7 GB (single-sided single layer)
    8.54 GB (single-sided double layer)
    17.08 GB (double-sided double layer)
  • Lower Data Transfer/access speed

We all need speed/fast access???

HDD:  Transfer/Access Speed

  • HDDs rotate at 5,400 to 10,000 rpm and have a media transfer rate of 1 Gbit/s or higher (1 GB = 10^9 B; 1 Gbit/s = 10^9 bit/s). They store and retrieve data much faster than DVD/CD.
  • As of 2008, a typical 7200 rpm desktop hard drive has a sustained “disk-to-buffer” data transfer rate of about 70 megabytes per second.
  • The latest 3.0 Gbit/s SATA can send about 300 megabytes/s from the buffer to the computer, and thus is still comfortably far ahead of today’s DVD-to-buffer transfer rates.
  • Most external hard disk drive cases come with FireWire or USB interfaces. eSATA, standardized in 2004, is a variant of SATA meant for external connectivity, giving external disks full SATA speed (115MB/s).
  • eSATA attracts the enterprise and server market because of its hot-plug (and online) capability and low price.
  • Access speed does not drop as capacity increases.
  • Cheap on a cost-per-megabyte basis compared to other storage media (DVD/CD). A recent market survey shows that 1TB HDDs start at about $60.
  • Hard disks can be replaced and upgraded as necessary. You can have two hard disks in a machine, where one acts as a mirror of the other and keeps a backup copy.

DVD:  Transfer/Access Speed

  • Access speed drops tremendously as the amount of DVD space in use increases.
  • Very costly: the same 1TB of space costs about $200, and moreover it is not available on one disc, so multiple DVDs need to be used.
  • DVD duplication/data copying requires burning a blank DVD, which is a tedious, time-consuming process that depends on the software and the DVD-writer speed, and has compatibility issues when discs are used in different drives.
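
To put “multiple DVDs” into numbers, here is a small sketch that divides a target capacity by the per-disc capacities listed earlier. It assumes decimal gigabytes throughout and ignores file system overhead; DiscCount is a hypothetical helper.

```csharp
using System;

public static class DiscCount
{
    // Number of discs needed to hold totalGb, rounding up to whole discs.
    public static int DiscsNeeded(double totalGb, double gbPerDisc)
    {
        if (gbPerDisc <= 0) throw new ArgumentException("gbPerDisc must be positive");
        return (int)Math.Ceiling(totalGb / gbPerDisc);
    }
}
```

For 1 TB (1000 GB) this comes to 213 single-layer discs at 4.7 GB each, or 118 double-layer discs at 8.54 GB each.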

Longer media life???

HDD: Longevity/Degradation

  • HDDs have a longer shelf life. Hard disks and cartridges last longer because the disks are rigid, compact, and enclosed. Life expectancy is more than 15 years.
  • Because of the casing and compactness, external factors have the least effect on the internal mechanism.

DVD: Longevity/Degradation

  • DVDs and CDs have a shorter shelf life. DVD/CD technology uses a dye that can fade, especially if exposed to UV light. You should expect a life of about 5 years.
  • Environmental forces will degrade the data layer much faster than the polycarbonate substrate layer (the clear plastic that makes up most of the disc).

Where is Web accessibility???

HDD: Online Storage

  • RAID combines two or more physical hard disks into a single logical unit by using either special hardware or software. Its access and transfer rates are the best for network storage.
  • Cost per online gigabyte: $0.55
  • Sharing an HDD online: the simplest way is to install a free FTP server such as FileZilla and give certain people access to your files through an FTP client or web browser.
  • With a portable USB HDD you can carry large amounts of data (gigabytes) and get online anytime. Only a USB port is needed (available on 9 out of 10 machines).

DVD: Offline Storage only

  • Used for offline storage; very sluggish access over a Local Area Network.
  • Sharing across the web (online) is not possible.
  • To carry large amounts of data (gigabytes) around, you need multiple DVDs, plus a DVD-ROM drive on the machine you want to use them on.

More for Less: Data storage density???

HDD: Data Storage

  • Data is stored in a very orderly pattern on each platter. Bits of data are arranged in concentric, circular paths called tracks. Each track is broken up into smaller areas called sectors. Part of the hard drive stores a map of sectors that have already been used up and others that are still free.
  • When the computer wants to store new information, it takes a look at the map to find some free sectors.
  • Typically, up to 100 GB of data can be stored on a single platter.
  • With so much information stored in such a tiny amount of space, a hard drive is a remarkable piece of engineering. That brings benefits.

DVD: Data Storage

  • A DVD is composed of several layers of plastic on a polycarbonate base, totaling about 1.2 millimeters thick. Writing data to the DVD is done by a red laser beam modulated by the serial data stream. When the beam turns “on” and hits the dye layer, a distortion (known as a pit) is made on the surface.
  • Dual Layer recording allows DVD-R and DVD+R discs to store significantly more data, up to 8.54 GB per side, compared with 4.7 GB for single-layer discs.
  • Even so, you would need as many as 300 DVDs to store that much data.

Historical trend HDD Space/Price/Time

By Dec 1999, $20/GB
By July 2001, $5/GB
By Dec 2002, $2/GB
By July 2004, 50¢/GB
By Dec 2005, 20¢/GB
By Dec 2007, 14¢/GB
By Dec 2009, 6¢/GB

Historical trend DVD Space/Price/Time

By Dec 1999, $17/GB
By July 2001, $4/GB
By Dec 2002, $1.6/GB
By July 2004, 42¢/GB
By Dec 2005, 18¢/GB
By Dec 2007, 11¢/GB
By Dec 2009, 4¢/GB

Windows Media Encoder – loss of connection

January 23rd, 2009

When using WME (see the title), a common problem is that the out-of-the-box application doesn’t handle connection problems when using the PUSH method. We addressed that problem in VideoPhill Recorder and made it reconnect automatically every time the connection is lost. So you won’t waste any effort reconnecting to the server manually if your connection is shaky.

So, if you have VideoPhill Recorder, and want to do streaming from the same computer, give us a call.

Bitrate Calculator

January 22nd, 2009

When talking to customers, or prospective customers, questions about hard drive capacity are the most common ones. To that effect, I prepared two tools: one is a PDF file with various bit rates, days, and hard drive spaces, and the other is this little calculator that you can use on-line to see what hard drive you need.

One comment though: manufacturers of hard drives usually use 1000 where there should be 1024. That means that 1G is 1,000,000,000 bytes for the manufacturer, while in reality it is 1,073,741,824 bytes. So, after making a calculation with the tool below, please add about 7.4% to the calculated hard drive space.

Please note: you should enter all values EXCEPT the one marked with the radio button. That value will be calculated when you press the ‘calculate’ button.
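
The correction factor follows directly from the two definitions of a gigabyte. A small sketch (DriveSize is a hypothetical helper name):

```csharp
using System;

public static class DriveSize
{
    private const double BinaryGigabyte = 1073741824.0;   // 1024^3 bytes
    private const double DecimalGigabyte = 1000000000.0;  // 1000^3 bytes

    // Converts a requirement in "real" (binary) gigabytes into the
    // gigabytes printed on the drive label.
    public static double LabelGbFor(double binaryGb)
    {
        return binaryGb * BinaryGigabyte / DecimalGigabyte;
    }

    // The extra percentage to add on top of a binary-gigabyte figure.
    public static double ExtraPercent()
    {
        return (BinaryGigabyte / DecimalGigabyte - 1.0) * 100.0;
    }
}
```

So a requirement of 500 binary gigabytes corresponds to a drive labeled with roughly 537 GB.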

Here is the table: hard-disk-bitrate

The calculator below can be used to calculate all combinations of values. For example, if you have fixed hard drive space and a fixed number of days that you need to record, you can use it to estimate the bitrate you should use. Just click on the radio button to the left of the bitrate field, and fill in all of the remaining fields. On the other hand, if you need to fix your bitrate, and you have disks of fixed capacity, and want to see how many days of recording that will come to, just click on the radio button to the left of the days field, fill in the rest, and click calculate.

When experimenting with various drives/bitrates/days etc. you can make yourself a table of values for comparison. Any time you want, you can press the ‘add’ button and a small table will appear at the bottom of the calculator summarizing your choices.

If you are still not sure how to use it, click here to see a short movie of its usage.
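
The relationship behind such a calculator can be sketched in a few lines. The sketch assumes one continuously recorded channel at a constant bitrate, with kbit/s meaning 1000 bits per second and sizes in bytes; the on-line calculator may use different units or extra inputs, and RecordingMath is a hypothetical name.

```csharp
using System;

public static class RecordingMath
{
    private const double SecondsPerDay = 86400.0;

    // Disk space in bytes for recording at bitrateKbps for the given days.
    public static double SpaceBytes(double bitrateKbps, double days)
    {
        return bitrateKbps * 1000.0 / 8.0 * SecondsPerDay * days;
    }

    // Days of recording that fit into spaceBytes at bitrateKbps.
    public static double DaysFor(double spaceBytes, double bitrateKbps)
    {
        return spaceBytes / (bitrateKbps * 1000.0 / 8.0 * SecondsPerDay);
    }

    // Highest bitrate (kbit/s) that fits the given days into spaceBytes.
    public static double BitrateKbpsFor(double spaceBytes, double days)
    {
        return spaceBytes * 8.0 / 1000.0 / (SecondsPerDay * days);
    }
}
```

For example, 30 days at 1000 kbit/s comes to 324,000,000,000 bytes, and the other two functions turn the same equation around.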


 
 
 

Windows Services to Kill

January 21st, 2009

VideoPhill Recorder is dedicated software, and it should have its own machine. So: bare Windows XP, all the updates, all the right drivers, and that’s it. By default, if you open Services in Computer Management, you’ll see that many of them are started, and that they also have their start-up type set to Automatic. That means they start with Windows, whether or not they are used. They eat memory, and some of them eat CPU and other resources. So I went further and compiled a list of services that VideoPhill users could and should kill, freeing about 400MB of memory while the system is running. Other Windows XP users could do the same, but maybe to a lesser extent.

Here’s the list:

  • Automatic Updates
  • Cryptographic Services
  • Distributed Link Tracking Client
  • Error Reporting Service
  • IPSEC Services
  • Net Logon
  • Print Spooler
  • Protected Storage
  • Remote Registry
  • Secondary Logon
  • Security Center
  • System Restore Service
  • Task Scheduler (if not used)
  • Themes
  • WebClient
  • Windows Firewall/Internet Connection Sharing (ICS)
  • Wireless Zero Configuration (if you don’t use WLAN)

I also plan to make a batch-file script that will set the start-up type of all these services to manual. On this page there is some useful information on how to do it.
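
A possible starting point for that script is sketched below: it generates the `sc config <service> start= demand` lines that such a batch file would contain (`demand` is how `sc` spells the manual start-up type, and the space after `start=` is required). The short service names are an assumption: they are the names these services usually have on Windows XP, but they should be checked against `sc query` on the target machine before use. ServiceScript is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class ServiceScript
{
    // Assumed Windows XP short names for the services in the list above.
    public static readonly string[] ShortNames =
    {
        "wuauserv",         // Automatic Updates
        "CryptSvc",         // Cryptographic Services
        "TrkWks",           // Distributed Link Tracking Client
        "ERSvc",            // Error Reporting Service
        "PolicyAgent",      // IPSEC Services
        "Netlogon",         // Net Logon
        "Spooler",          // Print Spooler
        "ProtectedStorage", // Protected Storage
        "RemoteRegistry",   // Remote Registry
        "seclogon",         // Secondary Logon
        "wscsvc",           // Security Center
        "srservice",        // System Restore Service
        "Schedule",         // Task Scheduler (if not used)
        "Themes",           // Themes
        "WebClient",        // WebClient
        "SharedAccess",     // Windows Firewall/ICS
        "WZCSVC"            // Wireless Zero Configuration (if no WLAN)
    };

    // Builds one batch-file line per service.
    public static List<string> BuildCommands(IEnumerable<string> serviceNames)
    {
        var lines = new List<string>();
        foreach (string name in serviceNames)
            lines.Add("sc config " + name + " start= demand");
        return lines;
    }
}
```

Writing the result of `ServiceScript.BuildCommands(ServiceScript.ShortNames)` to a .bat file gives a script that has to be run with administrator rights. Changing the start-up type does not stop a service that is already running.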

Idea overload

January 16th, 2009

The main thing in the (business) world is to have an idea.  Or maybe you should have The Idea.  That is what people with ideas think.  Mostly they wait for the ideas to spring to life all by themselves.

But what happens when there are many ideas? All of them are like beautiful fruits hanging somewhere in the sky. So you push one of them, give it a nudge, and maybe it’ll ripen and fall.

Enough with the metaphor, and back to the real stuff. I needed an idea about ideas, a so-called meta-idea. I needed a way to nurture my ideas and allow new ones to come. I needed RentACoder.com.

If you know what to do, and how you could do it, but don’t have time on your hands, outsource it. Every accomplished manager will tell you that. Some rich guy said it like this: “You cannot get rich by being smart, you can get rich by employing smart people”.

Some will say that in order to outsource something, you have to have some money. Nobody will work for no money. How to solve that? Build a prototype. Use your power of speech to find a sponsor. Be a conduit and try to add value between two parties. That is what I think I am doing right now.

Some examples: currently my partners on RAC come from Ukraine, Macedonia and the Russian Federation. In recent endeavours, I had great experiences with people from Argentina, France and of course Croatia. So you see, every coder speaks the same language, and it’s easy to get something done there.

VolumGuard feedback

January 15th, 2009

I recently released a prototype of the VolumGuard application to a few trusted colleagues who I know will give me proper feedback on it. VolumGuard is an application used in radio broadcast environments: it monitors the live audio input (vocals from the speaker/talent) and, when the input goes above a certain trigger level, drops the output level on selected wave output devices in the PC.
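
The core decision can be sketched as follows. The real application is written in MFC and surely has more to it (attack and release times, profiles, per-device levels); this sketch only shows the trigger comparison on one block of 16-bit input samples, and the names and the gain convention (1.0 means unchanged) are assumptions.

```csharp
using System;

public static class VolumeGuardSketch
{
    // Peak level of a block of 16-bit samples, scaled to the range 0.0 to 1.0.
    public static double PeakLevel(short[] block)
    {
        int peak = 0;
        foreach (short s in block)
        {
            int magnitude = Math.Abs((int)s);  // widen first: -32768 has no short counterpart
            if (magnitude > peak) peak = magnitude;
        }
        return peak / 32768.0;
    }

    // Output gain to apply: ducked while the input is above the trigger level.
    public static double TargetOutputGain(double inputPeak, double triggerLevel, double duckedGain)
    {
        return inputPeak > triggerLevel ? duckedGain : 1.0;
    }
}
```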

Here is the screenshot, but please, I know it’s ugly, and I’m trying to concentrate on functionality first…

VolumGuard Screenshot


You might have noticed already: the application’s function is trivial, and the user interface looks complicated. I know that is a horrible way to go. It was supposed to be the other way around. So, some of the phone calls raised concerns about it…

One thing I can do here is to remove all of the sliders from the ‘main screen’ of the application and make another main screen that is very simple and just shows that the application is running fine. From that screen, the user could select ‘options’ and be given this beautiful and scary window.

The whole application is made in MFC, which is something I got used to many years ago, and for small apps I find it great. I made the graphing control myself, and it’s rather flexible: it can scroll to the left and draw various real-time data. It also supports various markers, both horizontal and vertical.

What can be done here to improve all this?  Here are some user suggestions:

  • to be able to turn the volume monitoring on and off
  • to be able to switch profiles with the hot-key
  • to be able to switch volume monitoring with the hot-key
  • to run program in the system tray – pulling it out with the hot-key

For that, I’ll need some kind of global hot-key hook. I don’t know how dangerous they are to the system, but since there are many programs that use them, I guess it will be OK for me to use them as well. Here is a good article on keyboard hooking, but then, how will I protect my program from being recognized as a virus?

Medianet visited

January 13th, 2009

In our continuing mission to find new customers and enable them to get rid of their VHS-based video archives, we found the company medianet.hr. They do press clipping and media clipping, and they really need video, audio, and other types of archives. So, we agreed to provide video services for them.

I can say that the guys there were really interested in our work. You can’t imagine how motivating that is. Their main concern is their enormous archive, which goes back several years and is stacked on the shelves of their office.

One recurring issue was brought up at the meeting: our renowned DVD export for VideoPhill player. So, with so many interested parties, it seems that we are obliged to make it.

On to the Internet we go… to find something that will enable us to do so. I can take two roads here: the GPL road and the fully commercial road. One leads to ffmpeg with dvdflick (or parts of it), and the other leads to MainConcept.

Also, something that has troubled me many times is that they pay massive annual fees to the company that provides their software for video clip recognition. Since that feature lies exactly on my path of interest, I guess that I will try to hit that road also. I think that some basic clip recognition could be built, but to fully test it we’ll need a massive archive. So there is another curious task: to build one. My flat/office is already starting to look like a hi-tech warehouse; who knows where it will go from there.

And at the end: it’s now very clear that the stream logger project that I started as a pet project needs to be produced commercially. There really is a strong need for such a product, or for a service that we could provide to them. Again, to test and develop it, massive storage space has to be provided. I found out that Addonics has very interesting gadgets for that purpose. But their shipping is expensive, so I’ll wait until I really need something big from them.

Softing-eu comes to Zagreb

January 9th, 2009

Back in the days when I started cold-calling and searching for future customers and partners, I stumbled across the KapitalNetwork web site, which is also an Internet TV. I was searching for a customer that would need my company’s video logging services, and since a video publishing company was named in the signature of their web site, I proceeded to contact them.

That happened last year. Their director came to Zagreb today, and we had a very interesting meeting.

He found our company to be very interesting, I guess because we are small, flexible, and technology oriented. At the same time, I found that his company is one of the perfect corporate mates that Informacija d.o.o. should have.

We introduced ourselves, we talked, and we found that we really have some major points of connection.