Data Science With RapidMiner: August 2011

Monday 15 August 2011

Visualizing discrete wavelet transforms: part II

Here is a process that takes the discrete wavelet transform (it happens to be the Daubechies 4 wavelet in this case rather than the Haar but the results are similar) of some fake data and plots the corresponding results. This is different from the maximum overlap discrete wavelet transform from the previous post.

The result looks like this.

(the z2 attribute is plotted as the colour using log scaling)

The bottom row is the clean signal (with scaling to make it show up), the 2nd row from the bottom is the noisy signal and the third row from the bottom is the result of the discrete wavelet transform. This is only included for completeness since it does not correspond in the original domain to the signal. For an interpretation of this I found the following in the code of the discrete wavelet transform (in file DiscreteWaveletTransformation.java)

In the case of the "normal" DWT the output value series has just one dimension, and these
coefficients are to be interpreted as follows: the first N/2 coefficients are the wavelet
coefficients of scale 1, the following N/4 coefficients of scale 2, the next N/8 of scale
4 etc. (dyadic subsampling of both the time and scale dimension). The last remaining
coefficient is the last scaling coefficient.

So this means the 4th row from the bottom corresponds to the coefficients of scale 1, the 5th row to scale 2 and so on. The coefficients have been replicated as many times as required to match the x scale.

Note that this differs from the view tradionally presented in the literature where the high frequencies are presented at the top. That's an exercise for another day.

The main difference between this and the previous MODWT example is the unpacking of the DWT result. For this I used a Groovy script. This takes 2 example sets as input and copies the correct parts of the first (the DWT result) into the second (the output that will eventually be de-pivoted).

The output example set that is fed into the Groovy script is created using a "Generate Data" operator since I found this to be the easiest way to generate the example set with the right number of rows and columns.

As before, the graphic shows that the algorithm has seen the presence of the low frequency signal at the expected location from x = 5000 and there is perhaps a hint that something has been spotted at x = 100.

Thursday 11 August 2011

Visualizing discrete wavelet transforms

RapidMiner can transform data using wavelet transforms within the value series extension. As part of my endeavour to learn about these I made a process that allows visualisation of the results of a MODWT transform. It's intended to show at a glance what the transformation has done to the data.

Amongst others, it uses the "data to series", "series to data" and "de-pivot" operators and of course the "discrete wavelet transform".

The process creates some fake data consisting of 8192 records. A high frequency square wave is located from position 100 to 600 and a lower frequency wave is located from 5000 to 5500. A significant amount of noise is also added to hide the signal.

If you plot the results of the de-pivot operation and use the block plotter, choose x, y and z2 and set the z2 axis to a log scale, you should see something like this.

The bottom row corresponds to the pure signal (note its amplitude has been scaled to make it show up better), the next row up the noisy signal and all the rows above that correspond to the different output resolutions of the MODWT transform. The top row is the average for all the signals and should be 0 owing to the normalisations performed on the input data. All of this is produced from the MODWT output using the de-pivot operator after a certain amount of joining gymnastics.

The plot shows that the transform has detected a match from the 5000 point for the original signal. The signal from 100 is not so obvious.

The individual outputs from the MODWT operation are also available. Here for example is a plot of the 6th output (i.e. the 8th row in the graphic above).

Compare this with the raw noisy data.

Clearly there is something in the data and the transform is able to isolate this to a certain extent.

My next process will be one to visualise the DWT rather than the MODWT output.

Sunday 7 August 2011

Finding processes in the repository

I have a lot of processes and it's sometimes hard to remember the name of one containing something useful. A metadata view and search capability of the repository would be very useful.

While that is being invented, here's a process that scans through a repository and determines the name and location of the processes it finds. It also counts the number of operators used to give a sense of how big the process is. The process recursively scans files with extension ".rmp" from a starting folder. Various document operators including "extract information" and "cut documents" using Xpath are used to process the files and extract information from them. The "aggregate" operator is used to count operators.

One day I will make it into a report that can be viewed using RapidAnalytics.

Other data is also extracted and this could be used if desired.

The default location is the default Windows location of the samples processes; change this to the folder where your repository is.

Some gymnastics were required to get the loop files operator to work correctly. This seems to present zero sized directories when iterating so I was forced to use the "branch" operator to eliminate these. Pointing the loop files operator to the C:\ drive usually results in an error that I haven't got to the bottom of yet.

Data Science With RapidMiner

Search this blog

Monday 15 August 2011

Visualizing discrete wavelet transforms: part II

Thursday 11 August 2011

Visualizing discrete wavelet transforms

Sunday 7 August 2011

Finding processes in the repository

About Me

Labels

Blog Archive