Refactoring, it’s fun – part 2

May 22nd, 2011

In the first part I tried to set the scene and give you some background on my problem. I’ll try to continue now and tell an interesting, and at the same time informative, story about performing the refactoring in question.

Most curious of all is that you don’t have to prepare for refactoring. It can happen no matter what, and the price usually isn’t so high after all. Let me remind you – the main tool of component design, dependency injection, wasn’t used here. Yes, the classes do have their responsibilities cleverly defined, and that helps a lot, because if it weren’t so, the whole deal would have to start a few steps ‘before’.

I’m not an experienced writer, so I don’t know if I will get the point across, but to me this refactoring was like building level upon level of scaffolding, using one level to test the next, and at the same time creating scaffolding that would be used later in a production environment! I guess there is a name for it – there has to be :)

Step 1:

Create a duplicate of the main working class, and see how to force the rest of the application to use it when needed.

Say that the class name is PCMHash, and it has the following structure (method declarations only):

class PCMHash
{
    public bool Initialize(PCMBufferedReader sourceDataReader);
    public uint GetNextHash();
}

My goal was to create an alternative class to this one. I needed the old class kept around as a reference to check the results against.

So I created class PCMHash_R2. That was my first decision – to create the same class as before and try to get the same results from it, replacing its guts one step at a time.

Using the replacement wasn’t easy either, so I extracted an interface, derived both classes from it, and created something like:

IPCM_Hash rph;
if (!_refactoring)
    rph = new PCMHash();
else
    rph = new PCMHash_R2();

At this time I would like to re-state the fact that I am in production all the time and have to decide on my feet, weighing out all the implications that arise from each decision. I am telling you that because everyone can see that some dependency injection framework could be used here. But, apart from having to spend much time installing it throughout the code, I’m not sure how – and if – it would work anyway.

Why: at testing time, I want both classes to be able to function side by side.
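
To make that side-by-side testing concrete, here is a minimal sketch of the arrangement – the interface plus a comparison harness. The member names come from the declarations above; the harness itself, and its hashCount termination parameter, are my own assumptions, not the actual production code.

using System;

public interface IPCM_Hash
{
    bool Initialize(PCMBufferedReader sourceDataReader);
    uint GetNextHash();
}

// PCMHash (original) and PCMHash_R2 (refactored) both implement IPCM_Hash.
// A hypothetical harness that runs both over the same input and screams
// the moment their hash streams diverge:
static void CompareImplementations(PCMBufferedReader readerA, PCMBufferedReader readerB, int hashCount)
{
    IPCM_Hash oldHasher = new PCMHash();
    IPCM_Hash newHasher = new PCMHash_R2();

    if (!oldHasher.Initialize(readerA) || !newHasher.Initialize(readerB))
        throw new InvalidOperationException("initialization failed");

    // hashCount is an assumption – the real end-of-data signal isn't
    // visible in the method declarations above
    for (int i = 0; i < hashCount; i++)
    {
        uint expected = oldHasher.GetNextHash();
        uint actual = newHasher.GetNextHash();
        if (expected != actual)
            throw new InvalidOperationException(
                string.Format("hash #{0} diverged: {1} vs {2}", i, expected, actual));
    }
}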

Refactoring, it’s fun – part 1

May 22nd, 2011

It’s a story of refactoring code that isn’t prepared for it a single bit. By prepared I mean having test cases, dependency injection code, and so on. I have none of the above in the original code – just code that works.

Let me explain what I have, what it does, and where it should end up. The purpose of this refactoring session isn’t better performance with the same functionality – it is to make the same algorithm, already proven in various tests, work on different data.

Before (a.k.a. now):

The component creates a PK_HASH from a sound file. By PK_HASH I mean the “code name for our latest tech that can crunch a whole audio file down to a few bytes, later to compare those bytes to bytes crunched from another file and tell you whether it’s the same sound file”. PK stands for PlayKontrol – the brand name.

So, there are a few steps to produce the PK_HASH from a sound file:

  • decode and read the file – the input is a file on disk, for example .mp3, .wma or .aac, and the output is PCM samples
  • from any kind of PCM samples (stereo, mono, 8-bit, 16-bit) we produce an array of shorts that must be rewindable
  • hash the data and produce the PK_HASH files

Decode and read the file

The input is a file on disk, in any streamable format (we produce the files with StreamSink – see the archive example here: http://access.streamsink.com/archive/). That is .mp3, .aac, .wma, .ogg, and whatnot.

Currently it’s done using a simple component that uses DirectShow to create a graph for the audio file, renders that graph and attaches a SampleGrabber filter to fetch the samples. The component goes from a file on disk to the whole PCM sample data in memory. That’s feasible for a 5-minute file (5 x 60 x 44100 samples x 4 bytes = 50MB), and it can work even for 1-hour files. However, you can *feel* that this approach IS wrong, especially once you know that the rest of the algorithm doesn’t need access to the whole PCM data at once.
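
I won’t reproduce the DirectShow graph code here, but for illustration, this is roughly what streaming decode looks like when you don’t hold the whole file in memory. This sketch uses NAudio’s MediaFoundationReader as a stand-in – it is not the DirectShow component the original code uses:

using System;
using System.IO;
using NAudio.Wave; // assumption: NAudio as a stand-in, not the original component

// Decode a compressed audio file (.mp3, .wma, .aac, ...) to raw PCM,
// streaming roughly one second at a time instead of buffering it all.
static void DecodeToPcm(string inputPath, Stream pcmOutput)
{
    using (var reader = new MediaFoundationReader(inputPath))
    {
        // e.g. 44100 Hz, 2 channels, 16 bit -> 176400 bytes per second
        var chunk = new byte[reader.WaveFormat.AverageBytesPerSecond];
        int read;
        while ((read = reader.Read(chunk, 0, chunk.Length)) > 0)
            pcmOutput.Write(chunk, 0, read);
    }
}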

Rewindable sample array

PCM samples are promoted to 16-bit (if needed), and channels are downmixed to mono. Again, that is done in memory, so for each PCM sample there are 2 bytes of data present in memory as a result of the operation.
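
A sketch of that promotion/downmix step – not the original component, just the idea, assuming interleaved input samples:

using System;

// Promote 8-bit samples to 16-bit and average all channels down to mono.
static short[] ToMono16(byte[] pcm, int channels, int bitsPerSample)
{
    int bytesPerSample = bitsPerSample / 8;
    int frames = pcm.Length / (bytesPerSample * channels);
    var mono = new short[frames];

    for (int f = 0; f < frames; f++)
    {
        int sum = 0;
        for (int c = 0; c < channels; c++)
        {
            int offset = (f * channels + c) * bytesPerSample;
            sum += bitsPerSample == 8
                ? (pcm[offset] - 128) << 8           // 8-bit PCM is unsigned, centered at 128
                : BitConverter.ToInt16(pcm, offset); // 16-bit PCM is signed little-endian
        }
        mono[f] = (short)(sum / channels);           // average the channels
    }
    return mono;
}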

Hashing and creating the file

The hashing process needs a moving, overlapping window over the sample array, and since we have everything in memory, that is a piece of cake now. We take the data, process it, and write it into another byte array. Since the result is extremely dense, I won’t cry about memory at this point, but yes, it is written into memory first, and then saved to a file on disk.
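
The shape of that pass is roughly this – the window and step sizes are made-up numbers, and the FNV-1a checksum is only a stand-in for the actual PK_HASH math:

using System.Collections.Generic;

const int Window = 4096; // samples per window (made-up number)
const int Step = 1024;   // consecutive windows overlap by Window - Step samples

static IEnumerable<uint> HashAll(short[] samples)
{
    for (int start = 0; start + Window <= samples.Length; start += Step)
        yield return HashWindow(samples, start, Window);
}

// Placeholder: FNV-1a over the window, standing in for the real DSP.
static uint HashWindow(short[] s, int start, int length)
{
    uint h = 2166136261;
    for (int i = 0; i < length; i++)
        h = (h ^ (ushort)s[start + i]) * 16777619;
    return h;
}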

So here I tried to explain how it works so far. It goes from the encoded audio file to PCM sample data in memory, downmixes that data in memory to one PCM channel, processes the mono PCM samples to obtain the PK_HASH and then writes it to a file.

So what do we actually need?

If you take a peek at the archive you’ll find that every folder has audio files, and also a .hash file for every audio file present in the directory. Please note that not every directory is processed – only 20 of them, because processing consumes CPU intensely, and I have only a few PCs laying around to scrub the data.

That will improve in the future. So, for crunching the archive, even a POC (proof-of-concept) is OK, as it serves its needs. It will go through the archive and leave processed PK_HASHes behind.

A process that runs in parallel waits for the PK_HASH file to be created, reads it, and does matching against the database. However, the next step should be taken, and it is REALTIME processing.

To be able to process in REALTIME, the architecture goes something like this:

  • StreamSink is attached to the network stream of any kind, and provides PCM sample output
  • PCM sample output is downsampled and buffered
  • hashing process uses buffered mono PCM samples and outputs results into the stream
  • PK_HASH stream is again buffered and results processed with MATCHER process

StreamSink PCM decoding

StreamSink is the application that does internet media stream capture. Thanks to a feature request from DigitalSyphon, it can, however, process every media stream and provide PCM samples for it in real-time, in the form of a Stream-derived class. So, that part of the process is covered completely.

Buffering PCM samples

Now, a new component should be created – something that can buffer PCM samples from the Stream and provide floating, overlapping window reads for the hashing process. With some thinking I combined the inner workings of the circular buffer stereotype with something that can be used almost directly in the hasher process – by replacing the class implementation only.
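
A minimal sketch of what I mean – hypothetical names, and without the back-pressure handling a production version would need:

using System;

// Circular buffer that hands out overlapping windows: each read returns
// windowSize samples, but the read position only advances by hopSize.
public sealed class OverlappingSampleBuffer
{
    private readonly short[] _buffer;
    private readonly int _windowSize;
    private readonly int _hopSize;
    private long _written;   // total samples written so far
    private long _readStart; // absolute position of the next window

    public OverlappingSampleBuffer(int capacity, int windowSize, int hopSize)
    {
        if (capacity < windowSize)
            throw new ArgumentException("capacity must hold at least one window");
        _buffer = new short[capacity];
        _windowSize = windowSize;
        _hopSize = hopSize;
    }

    public void Write(short[] samples, int count)
    {
        // NOTE: a real version must not overwrite unconsumed samples
        for (int i = 0; i < count; i++)
            _buffer[(int)((_written + i) % _buffer.Length)] = samples[i];
        _written += count;
    }

    // False until a full window is buffered.
    public bool TryReadWindow(short[] window)
    {
        if (_written - _readStart < _windowSize)
            return false;
        for (int i = 0; i < _windowSize; i++)
            window[i] = _buffer[(int)((_readStart + i) % _buffer.Length)];
        _readStart += _hopSize; // advance by less than a window: overlap
        return true;
    }
}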

Processing and creating PK_HASHes

The hasher process was reading the buffered MEMORY quasi-stream. However, it used a kind-of simple interface to read the data, so my luck was that the interface could be extracted and the implementation redone over the buffered stream data. Also, the output side of the class has to be rewritten, since it doesn’t yet have any interface-able part to replace.
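
In other words, the seam looks something like this (the name is hypothetical – the real interface is the one that was already there):

// The hasher reads everything through this; the all-in-memory array and
// the new circular buffer then become interchangeable implementations.
public interface ISampleSource
{
    // Fills window with the next overlapping window; false when no more data.
    bool TryReadWindow(short[] window);
}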

And so on – the latter should be implemented from scratch, so there is no story about refactoring there.

Refactoring pyramid/tower

I can call it a pyramid or a tower, because after a long time of procrastination (subconsciously processing the task at hand) I was finally able to put my hands on the keyboard and start. My premise was that everything has to be checked from the ground up, because NOW I have the algorithm that produces the desired results, and since there are many steps involved, an error in a single step could be untraceable if I don’t check every step along the way.

Tools used

I am kind of old fashioned, so this paragraph won’t be very long.

How to reduce hard drive fragmentation

May 17th, 2011

The topic of drive fragmentation might be a little dated these days, but since I spent a great deal of my youth watching PC Tools defragment my drive in a graphically pleasing fashion, I am inclined to think that drive fragmentation (when excessive) can severely reduce both computer performance and hard drive life.

While this might be true for the common day-to-day user, it is particularly true for corporations/enterprises that need their data to be:

  • accessible,
  • quickly accessible,
  • accessible for a long time

In a common computer use scenario, most of the files are there for the computer to read and use, either as software that has to be loaded into memory, or documents that have to be shown to the user. Writing to the hard drive is an uncommon operation (when you put it against the number of reads), and thus drive fragmentation, however present, is in fact easily ignored.

Continuous stream recording, enter…

In my business (my clients’ businesses, to be exact) the hard drives work the opposite way. They WRITE all the time, and read only on occasion. And the problem that will surely lead to fragmentation is that in most situations they need to write MULTIPLE long files continuously. Let me try to explain, first from the aspect of why, then move on to what…

When either running VideoPhill Recorder to record video, or using StreamSink to record internet media streams, in most cases the user has MULTIPLE channels recorded on one computer. The files created by that recording are commonly created at one time (all of them) and grow continuously until closed. Since Windows, as it is now, can’t reserve drive space in advance (maybe it can, but the software doesn’t know how long the files will be), the space for them is allocated as time goes by. If we have 4 files that are written slowly but concurrently (and grow at the same time), we’ll certainly have the following situation on the hard drive (I’m talking ONLY about the data that is stored here, and am simplifying physical hard drive storage as a continuous slate):

file1_block1
file2_block1
file3_block1
file4_block1
file1_block2
file2_block2
file3_block2
file4_block2
.
.
.
file1_blockN
file2_blockN
file3_blockN
file4_blockN

That means fragmentation. A file isn’t in continuous blocks, but is scattered evenly and can’t be read sequentially from the hard drive. You might be lucky and your blocks could be scattered in a way that sectors on the drive end up adjacent, so this won’t pose a problem – but what are the chances? :)

And when file1 gets deleted, what remains on the hard drive? Blocks filled with nothing, left there for other files to fill. New files will try to fill them, and the drive will soon be completely jumbled. It will all be hidden from you by the OS, but still, the OS will have to deal with it.

And that is the story of 4 channels. What about the situation where you have 60 channels recorded on one machine (I’m talking about internet stream recording, of course)? Such an archive can be found here: http://access.streamsink.com/archive/

If you aren’t convinced that this really IS a problem, you can stop reading now.

Rescue #1 – Drive Partitioning

It is feasible in situations where there is a low number of channels to be recorded. If you have 4 channels, you’ll create 4 partitions, and each partition will have nice continuous files written to it. Done.

However, you can’t have 50 partitions on one drive and get away with it.

Rescue #2 – Queued File Moving

Another solution, for a large number of channels, presents itself in the form of a temporary partition for the initial file recording, with the files then moved out to their permanent location later – but ONE FILE at a time, in a queue.

Queued Moving of Files in StreamSink

This is implemented in StreamSink, and it even has the ability to throttle the data rate when moving the files to another drive. The only problem here is the wear on the temporary hard drive, because it gets beaten by fragmentation.
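
The core of such a queued, throttled move is small; a sketch (not StreamSink’s actual code) might look like this:

using System;
using System.IO;
using System.Threading;

// Copy one file in chunks, sleeping so the effective rate stays under the
// cap, then delete the source -- a cross-volume 'move', one file at a time.
static void MoveThrottled(string sourcePath, string destPath, int maxBytesPerSecond)
{
    var chunk = new byte[64 * 1024];
    using (var src = File.OpenRead(sourcePath))
    using (var dst = File.Create(destPath))
    {
        int read;
        while ((read = src.Read(chunk, 0, chunk.Length)) > 0)
        {
            dst.Write(chunk, 0, read);
            Thread.Sleep((int)(1000L * read / maxBytesPerSecond)); // crude throttle
        }
    }
    File.Delete(sourcePath);
}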

Rescue #3 – Using RAM Drive on Method #2

While I was writing the article about NAS, a thought flashed across my mind – can we avoid writing to the temporary drive and reduce the load ONCE more?

Yes, we can. I know that RAM drives are also out of fashion, but here one comes in handy. It’s a shame that support for it isn’t included in the system already, so with a little googling I found this: http://www.ltr-data.se/opencode.html/#ImDisk

I installed it on the testing server, re-configured the application to use the new temporary folder, and from now on, it runs so smooth I can’t hear it anymore :)

Some technical stuff:

  • in this instance, I am currently recording 62 channels, and the cumulative rate for it is around 5 megabit/second
  • my files have a duration of 5 minutes, which means that recorded chunks are closed and moved to permanent storage every 5 minutes
  • during those 5 minutes, all the files together won’t grow over 200 megabytes (5 Mbit/s x 300 s ≈ 187 MB)
  • I created a 512 megabyte RAM drive, just to be safe

Conclusion

Take care of your hard drive, and don’t dismiss old tech such as RAM drives just yet.

If I were to implement this at the application level, I would have to spend a great deal of time, and for some media types it wouldn’t even be possible – Windows Media, for example, writes to disk, or to other places only if you employ magic…

Having NAS is great (or is it?)

May 17th, 2011

Over the years I have had many deliberations over whether a NAS should or shouldn’t be used for the video archives created by a video logger system such as VideoPhill Recorder. At first, I was a firm believer in one methodology, then completely turned to the other side, and now… Well, read on, and I’ll take you through it.

Stakes

On one side of the stake set we have actual requirements, and on the other there are considerations. Actual requirements are sometimes hard to pinpoint at first, but they always come out sooner or later.

So, let me list possible requirements that might be in effect here…

Common low level requirements

Storage for video recording (for logging purposes) needs to have the following abilities:

  • low but constant and sequential write rate – the data rate for 4 channels is as low as 5 mbit per second (around 600 kbytes/sec), but it is CONSTANT and SEQUENTIAL – there won’t be much stress on the hard drive from constant seeking
  • high durability over time – what gets written once has to stay there, and it should survive a single drive failure
  • reading isn’t common, but when done, it has to be sustainable – again at a low data rate and with great predictability (it usually is sequential)

From everything above, I can guess that any data storage expert would read this as RAID 5 and wouldn’t allow you to create anything else for the video archive storage.

Archive duration scalability

The archive duration is directly proportional to the hard drive space available. To determine what kind of hard drive space you need for your first installation, you can use the on-line hard drive size estimator calculator that I created right for this blog.

If you plan to extend the duration of your archive one day, you have this requirement, and you have to plan for it. Having the storage in one place can simplify adding drive space, but can also completely block it.

Let’s say that you have an archive of 40 channels spanning 92 days (3 months), and let’s say that you decided to use 1 mbit video with 128 kbit audio for the archive. By using the calculator above, you’ll find that you need 42 terabytes of storage already in place (a bit over a terabyte per channel). Even at this date, that kind of storage set in one place is kind-of-a challenge to build.

If you have already invested in a 42 TB storage system, and had the foresight to plan for an upgrade to, say, double its size, you are in luck. But, say that after a few more months (just 3) your management decides to expand the requirement to 12 months of storage. Wow. Now you have to have 127 TB total. If the current system will hold that much drive space, again you are in luck – however, say it doesn’t. Your options are:

  • add one more unit to the chain
  • create a bigger unit, copy everything to it, scrap the current one

I’ll stop my train of thought here, and leave you with just a few things to think about before I go on with other requirements: who needs a used 82 TB system (if you want to sell it)? do you know how long it takes to COPY 82 TB even at extreme network speeds? adding one more unit will break the ‘all in one place’ requirement, …

Having it all in one place (e.g. for web publishing)

If you need everything in one place for publishing, then this is a solid requirement. The web server will have the content on its local hard drives, and it will publish it smoothly.

But, is that really a requirement? I must admit that I haven’t seen a web server that properly served files from network locations (despite there being options to do that), but I’m sure that IIS and Windows Server gurus will be able to shut me down and say that this is a normal thing done routinely. So, if we know that each channel recorder has its own directory ANYWAY, what’s the use of having

c:\archive\channel1
c:\archive\channel2
c:\archive\channel3

instead of

\\rec1\channel1
\\rec1\channel2
\\rec2\channel3

Reducing single point of failure

This requirement is very common, and having the NAS as a single ‘point’ means that having the NAS leads us straight to a single point of failure. I understand that there are multiple redundancies that could be installed into the system, such as RAID 5, or obscene configurations such as RAID 1+5. Note: for the latter, it seems that the article’s author has the same opinion as me:

Recommended Uses: Critical applications requiring very high fault tolerance. In my opinion, if you get to the point of needing this much fault tolerance this badly, you should be looking beyond RAID to remote mirroring, clustering or other redundant server setups; RAID 10 provides most of the benefits with better performance and lower cost. Not widely implemented.

In my words: if you need such a system, it’s better to have recording drives distributed on each recording machine, have RAID 5 there, and additionally have a NAS (or some other form of storage) to DUPLICATE everything.

Bandwidth issues

In a configuration with 4 channels recorded on one machine, and with the above-mentioned data rates for video and audio, each machine will produce 5 megabits of content every second. Roughly, that is 0.6 megabytes. Even a ZIP drive could almost handle that. However, if you have 10 times that (for 10 recorders) and have central storage for the whole bunch of channels, that is 6 megabytes of data at a constant rate that never stops.

Consider that the central storage in question is a NAS with hard drives configured in RAID 5. That means that it will have to receive the data, calculate parity for it, move the drive heads, write to the drives… It will be a very busy NAS, and with everything else in mind, it won’t have a second of a break. Add to that occasional reading of content by the archive diggers, and you’ll soon figure out that the NAS will have to take it all by itself.

On the other hand, archive access applications such as VideoPhill Player don’t have anything against having the channels on different recorder machines.

Conclusion (for bandwidth issues): having each recorder machine handle both encoding and storage for its 4 channels removes the single point of stress for both recording and archive access.

Overall…

Having dumped my intuition into these few paragraphs, I hope that I presented a case strong enough against having a NAS for video logger/archive storage. Again, everything said is from my experience on the subject, and I’m no storage expert who will talk petabytes – just a simple consultant trying to get my clients the best bang for the buck.

And then, I got in… (story of data visualization)

May 15th, 2011

How to see the data?

If the data is numeric and represents some series, it will mostly be represented with a graph of some sort. There are hundreds of graph types available, and they all have some purpose, otherwise they would not exist.

However, on some special occasions, you have to see a different kind of data.

The problem (this particular instance)

Since I am developing an internet media streaming CAPTURE and ARCHIVE application (StreamSink), I am also continuously testing it on one of my servers. I am adding channels, removing them, stopping the server; sometimes something goes wrong and the whole thing freezes or crashes, so the archive I have is rather heterogeneous in quality.

Let me go through the operational view – the mere GUI of StreamSink – so I can present some of the problems and solutions so far.

StreamSink

Several things were important to the operator of the software and had to be present on the main (status) screen. For example:

  • whole list of channels should be visible
  • channel status should be visible at first glance
  • I am interested in what happened to the system recently
  • I need to know the status of my connection
  • it would be good to know how much disk space is available

I could dwell on it but the main point of this post is something else.

The problem here is that I had to create a PlayKontrol report for demonstration purposes (for them: http://ihg.hr/), one that would scan 7 days of the archive (multiple channels, of course) and produce the reports (playlists) for 300 songs.

So the problem is: to find, in an archive that is damaged in various ways, 7 days of continuous archive spanning multiple channels.

The solution (prelude)

Since I am an explorer by nature, I wasn’t inclined to use a solution that would present raw data as an answer; I was more into seeing the data and determining the period and channels ‘visually’.

StreamSink has an integrated feature called ‘archive report’ that holds data similar to what I need, but with it I would only get limited information. You can see the report here:

StreamSink Archive Report

The most useful info on the report, in this particular situation, is the graph on the right side of the report. Let me explain…

For each day, StreamSink is able to record up to 24 hours of media. Due to network conditions, it sometimes is less than 24 hours, and I decided to present that number as the percentage of the day that the archive covers. As you can see from the report, that percentage is shown for the whole archive lifetime, for the last month, the last week and the last 24 hours.

It is also shown in the form of a graph, where the leftmost part of the graph is the current day, and as we go to the right, we sink into the past, with divider lines every 7 days. Nice, eh? :)

But, as nice as that report is, I can’t read from it which 7 days and which channels are to be scanned – I have to find another way in.

Solution (at last)

For this one, I picked up something that I learned from the above-mentioned report. That was:

  • I will have a channel list
  • I will have some sort of calendar
  • I have to see how much of the archive is covered for each day

I also decided to show each day as a cell in a table-style matrix, where rows would be occupied by channels and columns by days. The time flow is inverted here compared to the report, so left is the past, and right is the present.
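
Computing the numbers behind those cells is straightforward; here is a sketch, assuming each recorded chunk is known by channel, day and duration (the types and names are hypothetical, not the utility’s actual code):

using System;
using System.Collections.Generic;

// channel -> day -> percent of the day covered by the archive
static Dictionary<string, Dictionary<DateTime, double>> Coverage(
    IEnumerable<Tuple<string, DateTime, TimeSpan>> chunks) // (channel, day, duration)
{
    var result = new Dictionary<string, Dictionary<DateTime, double>>();
    foreach (var chunk in chunks)
    {
        Dictionary<DateTime, double> days;
        if (!result.TryGetValue(chunk.Item1, out days))
            result[chunk.Item1] = days = new Dictionary<DateTime, double>();
        double seconds;
        days.TryGetValue(chunk.Item2, out seconds);
        days[chunk.Item2] = seconds + chunk.Item3.TotalSeconds;
    }
    // convert accumulated seconds per day into a percentage of 24 hours
    foreach (var days in result.Values)
        foreach (var day in new List<DateTime>(days.Keys))
            days[day] = Math.Min(100.0, days[day] / 864.0); // 86400 s = 100 %
    return result;
}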

Whole thing looks like this:

Archive Digger

Same thing little zoomed in:

Archive Digger Detail

Note: green is the color for days that have 90% or more of the archive covered.

At last, you can see from both pictures that much of the data is revealed at first glance. For example, 0 means that there was no archive that day at all. Numbers below 90 suggest that either there was some problem with the channel that day, or StreamSink was started or stopped in the middle of the day.

I could even color-code that information on the chart – but the utility will be expanded further only if there is demand for it, since I already got what I needed to know from it.

BTW, I don’t want to brag here, but that utility took only 2-3 hours of thinking and coding, and almost no debugging.

Windows Media Encoder – loss of connection

May 23rd, 2009

When using WME (look at the title), a common problem is that the out-of-the-box application doesn’t handle connection problems when using the PUSH method. We addressed that problem in VideoPhill Recorder and made it reconnect automatically every time the connection is lost. So you won’t waste any effort reconnecting to the server manually if your connection is shaky.

So, if you have VideoPhill Recorder and want to do streaming from the same computer,

Bitrate Calculator

May 22nd, 2009

In contact with customers, or prospective customers, hard drive capacity questions are the most common ones. To that effect, I prepared two tools: one being a PDF file with various bit rates, days, and hard drive spaces, and the other being this little calculator that you can use on-line to help yourself see what hard drive you need.

One comment though: manufacturers of hard drives usually use 1000 where it should be 1024. That means that 1G is 1.000.000.000 bytes for the manufacturer, while in reality it is 1.073.741.824 bytes. So, after making a calculation with the tool below, please add 7.3% to the hard drive space calculated.

Please note: you should enter all values EXCEPT the one marked with the radio button. That value will be calculated when you press the ‘calculate’ button.

Here is the table: hard-disk-bitrate

The calculator below can be used to calculate all combinations of values. For example, if you have fixed hard drive space and a fixed number of days that you need to record, you can use it to estimate the bitrate you should use – just click on the radio button left of the bitrate field, and fill in all of the remaining fields. On the other hand, if you need to fix your bitrate, and you have disks of fixed capacity, and want to see how many days of recording that comes to, just click on the radio button left of the days field, fill in the rest and click calculate.
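
The arithmetic behind the calculator is simple; for example, solving for the required space (my own sketch, not the calculator’s code):

// Required space for `days` of recording at `bitrateKbps` over `channels`.
// Result is in manufacturer-style GB (1 GB = 10^9 bytes); per the note
// above, pad the drive you buy by about 7.3 %.
static double RequiredSpaceGB(double bitrateKbps, int channels, double days)
{
    double bytesPerSecond = bitrateKbps * 1000.0 / 8 * channels;
    return bytesPerSecond * 86400 * days / 1e9;
}

// e.g. 4 channels of 1 mbit video + 128 kbit audio for 31 days:
// RequiredSpaceGB(1128, 4, 31) -> about 1511 GB, so roughly a 1.6 TB drive.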

When experimenting with various drives/bitrates/days etc., you can make yourself a table of values for comparison.

Windows Services to Kill

May 21st, 2009

VideoPhill Recorder is dedicated software, and it should have its own machine. So: bare Windows XP, all the updates, all the right drivers, and that’s it. By default, if you open Services in Computer Management, you’ll see that many of them are started, and they also have their start-up type set to Automatic – which means they start with Windows, regardless of whether they are used. They eat memory, and some of them eat CPU and other resources. So I went further and compiled a list of services that VideoPhill users could and should kill, freeing about 400MB of memory when the system is running. Other Windows XP users could do the same, but maybe to a lesser extent.

Here’s the list:

  • Automatic Updates
  • Cryptographic Services
  • Distributed Link Tracking Client
  • Error Reporting Service
  • IPSEC Services
  • Net Logon
  • Print Spooler
  • Protected Storage
  • Remote Registry
  • Secondary Logon
  • Security Center
  • System Restore Service
  • Task Scheduler (if not used)
  • Themes
  • WebClient
  • Windows Firewall/Internet Connection Sharing (ICS)
  • Wireless Zero Configuration (if you don’t use WLAN)

I also plan to make a script – a batch file that will set all these services’ start-up type to manual.
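
Until that script exists, a C# take on the same idea is setting each service’s Start value in the registry (2 = Automatic, 3 = Manual, 4 = Disabled). The service key names below are my best guess at the XP-era registry names for a few of the services listed above – verify them on your machine, and run this elevated, at your own risk:

using Microsoft.Win32;

static void SetServicesToManual()
{
    // Registry key names, not display names; guesses for a few from the list.
    string[] services =
    {
        "wuauserv",        // Automatic Updates
        "TrkWks",          // Distributed Link Tracking Client
        "RemoteRegistry",  // Remote Registry
        "Spooler",         // Print Spooler
        "Themes",          // Themes
        "WebClient",       // WebClient
    };

    foreach (string name in services)
    {
        using (var key = Registry.LocalMachine.OpenSubKey(
            @"SYSTEM\CurrentControlSet\Services\" + name, true))
        {
            if (key != null)
                key.SetValue("Start", 3); // 3 = Manual
        }
    }
}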

Idea overload

May 16th, 2009

The main thing in the (business) world is to have an idea. Or maybe you should have The Idea. That is what people with ideas think. Mostly they wait for the ideas to spring to life all by themselves.

But what about when there are many ideas, and all of them are like beautiful fruits hanging somewhere in the sky? So you push one of them, give it a nudge – maybe it’ll ripen and fall.

Enough with the metaphor, and back to the real stuff. I needed an idea about ideas – a so-called meta-idea. I needed a way to nurture my ideas and allow new ones to come. I needed RentACoder.com.

If you know what to do, and how you could do it, but don’t have time on your hands, outsource it. Every accomplished manager will tell you that. Some rich guy said it like this: “You cannot get rich by being smart, you can get rich by employing smart people”.

Some will say that in order to outsource something, you have to have some money. Nobody will work for no money. How to solve that? Build a prototype. Use your power of speech to find a sponsor. Be a conduit and try to add value between two parties. That is what I think I am doing right now.

Some examples: currently my partners on RAC come from Ukraine, Macedonia and the Russian Federation. In recent endeavours, I had great experiences with people from Argentina, France and, of course, Croatia.