%% Lesson 13: Event-Related fMRI and 'Deconvolution'
% 
% Since the expected fMRI response is the convolution of the impulse
% response with the neuronal response, it's possible work backwards and
% estimate the hemodynamic response that best predicts a measured fMRI
% signal.  This is essentially undoing the convolution process, or
% 'deconvolution'.
%
% Let 'h' be be the vector representing the hemodynamic response function,
% and 's' be the vector of zeros and ones containing the time-points when
% 'events' occured.  A linear system predicts that the fMRI response, y, is
% the convolution of the h and s.  Here's an example:

dt = 1;  %step size (seconds)
maxt = 15;  %ending time (seconds)
th = 0:dt:(maxt-dt);
n = length(th);  %n is the length of the hdr.

k = 1; %seconds
nCascades = 3;
delay = 2; %seconds
h = gamma(nCascades,k,th-delay)';  %make it a column vector

figure(1)
clf
plot(th,h);
xlabel('Time (s)');

% Here's the event sequence:
maxt = 255;  %ending time (seconds)
t = 0:dt:(maxt-dt);
m = length(t);  %m is the length of the event sequence

prob = .5;  %probability of an event for each time point.
s = floor(rand(1,m)+prob)';
y = conv(s,h)*dt;
y = y(1:m);

%% Deconvolution
%
% Suppose, now, that we only know the response, y, and our event sequence,
% s.  How can we reconstruct the hemodynamic response?  Specifically, if
% the hdr has, say, 15 time points, then we need to find the best 15
% numbers that will predict our fMRI response. A brute-force method would
% be to use a search algorithm to find the best 15 numbers that minimizes
% the error between the predicted and the measured fMRI response.
% Fortunately, there's an easier and more efficent way using linear
% algebra.
%
% The trick is to think of this is a giant linear algebra problem with 15
% unknowns and n knowns, where n is the length of the fMRI response.  We
% can see that this is a linear algebra problem by noting that convolution
% can be conducted through matrix multiplication (indeed, this is how
% matlab does it).  This is done by transforming our stimulus sequence, s,
% into a 'design matrix' X that, when multipied by the hdr, performs the
% convolution.
%
% Remember that convolution is the process of multiplying shifted copies of
% the hdr with the input and adding.  Equivalently (since convolution is
% commutative) this can be done with shifted copies of the input instead.
%
% The columns of the design matrix, X, aregenerated by successive shifts of
% the stimulus, like this:

X = zeros(m,n);
temp = s;
for i=1:n
    X(:,i) = temp;
    temp = [0;temp(1:end-1)];
end

figure(1)
clf
imagesc(X)
axis equal
axis off
colormap(gray)

%%
% convolution of s and h is done by multiplying the design matrix X and h
% (as a column vector):

r = X*h;

% we can compare this to matlab's convolution by taking the norm of the
% difference between the two vectors.  A small number means they're the
% same (the norm of a vector is the square root of the sum of squares).

rconv = conv(s,h); rconv = rconv(1:m);
norm(r-rconv)

%%
% Here's where the linear algebra comes in.  if r = X*h, then if this were
% normal algebra we could solve for h by dividing r by X (h = r/X).  When r
% and h are vectors and X is a matrix, this 'division' is done by finding
% the least-squared solution with the 'psueudo-inverse'.  How this works is
% beyond our scope (but elegant). Details can be found in most engineering
% and linear algebra textbooks.

PX = inv(X'*X)*X';

% Matlab has its own function that does this called 'pinv'

PX = pinv(X);

hest = PX*r;

%%
% We'll use 'norm' again to compare hest to h:

norm(hest-h)

%%
% What did we just do?  To recap, we made up a hemodynamic response
% function, h, convolved it with a random stimulus sequence, s, to get an
% fMRI response, r.  We then did the whole thing in reverse and
% reconstructed h from r and s by deconvolving via the pseudo-inverse.
%
% Real fMRI data is noisy.  Let's add independently and identically
% distributed ('iid') noise to the response and see how it affects our
% ability to reconstruct the hemodynamic response

noiseSD = .2;
rnoise = r+noiseSD*randn(m,1);

figure(1)
clf

plot(t,r,'b-');
hold on
plot(t,rnoise,'bo');
legend({'actual response','response+noise'});

hest = PX*rnoise;

figure(2)
clf
plot(th,h,'b-');
hold on
plot(th,hest,'bo','MarkerFaceColor','k')
xlabel('Time');
legend({'actual hdr','reconstructed hdr'});

norm(hest-h)

%%
% You might be surprised how robust our estimate of the hdr is considering
% how much noise we added to the response.  This is because we have so many
% knowns to help constrain the number of unknowns.  As you can guess, the
% longer the event sequence, the better we will be at estimating the true
% hdr.
%
% Go ahead and play with the magnitude of the noise (noiseSD) and see how
% it affects the estimate of the hdr.
%
% It turns out that the average value of norm(h-hest) grows in proportion
% to the standard deviation of the added noise (or equivalently the sums of
% squared error grows in proportion to the variance of the noise)  We can
% demonstrate this with a simulation:

noiseVar = linspace(0,1,21);  %list of noise variances
nReps = 5000;  %number of repetitions per noise variance level

err=  zeros(nReps,length(noiseVar));
for i=1:length(noiseVar)
    for j=1:nReps
        rnoise = r+sqrt(noiseVar(i))*randn(m,1);
        hest = PX*rnoise;
        err(j,i) = norm(hest-h)^2;
    end
end

meanErr=  mean(err);

figure(1)
clf
plot(noiseVar,meanErr,'ro','MarkerFaceColor','r');
xlabel('Noise Variance');
ylabel('SSE');

%% Efficiency of a stimulus sequence
%
% The slope of this line tells us something about the way the ability to
% reconstruct the hdr falls apart with increasing noise.  A lower slope is
% better.  Different event sequences will have different slopes. Try it out
% yourself by running the program with a different kind of sequence.
%
% Since low slopes are good, the inverse of the slope has a special
% meaning, called the 'efficiency' of the event sequence.  Let's calculate
% this number from our simulated data:
%
%Fit a line to the data
p = polyfit(noiseVar,meanErr,1);
bestLine = polyval(p,noiseVar);
hold on
plot(noiseVar,bestLine,'k-');
legend({'Average of norm(h-hest)','Best-fitting line'});


%%
% It also turns out that there's a simple way of calculating the efficiency
% without a simulation.  Here's how:

E = 1/trace(inv(X'*X));

%%
% The trace of a matrix is the sum of its diagonals.  The derivation of
% this equation is, once again, beyond the scope of this lesson (and beyond
% the scope of my full understanding).  But let's just assume it's true.
% This number should be close to the inverse of the slope from the
% simulation:

title(sprintf('1/Slope = %5.3g, Efficiency = %5.3g',1/p(1),E));

%%
% So we now have an easy way to assess how well a given event sequence will
% be able to reconstruct the hemodynamic response (assuming that the
% 'hemodynamic coupling' process is linear, and that noise is additive,
% which can't be completely right).
%
% Here's something interesting: the calculation of efficiency doesn't
% depend on the shape of the hdr.  It's kind of an amazing fact if you
% think about it.  We could have chosen any crazy shape for the hdr and the
% ability to reconstruct it from noise data will be exactly the same.  This
% is a very convenient fact, too, since we don't know the actual shape of
% the hdr (it's what we're trying to find out).
%
%% m-sequences
%
% There is one kind of sequence that has the greatest efficiency - the
% m-sequence.  M-sequences (or maximum-length shift register sequences)
% were originally developed about 50 years ago and have been used in
% encryption, error-correcting codes - any time you want to generate a
% signal that is minimally corruptible by noise.  Generation of m-sequences
% is not intuitive - it involves prime numbers and modulo arithmetic.  But
% the result is a sequence that has a zero autocorrelation function.  I've
% provided a function 'mseq' that generates these sequences for you. 'mseq'
% takes in a minimum of two parameters.  The first is the number of event
% types (here we've only talked about zeros and ones, so the number is 2,
% since blank trials count as an event type).  The second is a 'power
% value' that determines the length of the sequence, which will be
% nTypes^(powerVal-1).
%
% A more thorough discussion of m-sequences applied to event-related fMRI
% can be found here:
%
% G.T. Buracas and G.,M. Boynton (2002), "Efficient Design of Event-Related
% fMRI Experiments Using M-Sequences" , NeuroImage 16, 801-813

powerVal = 8;
ms = mseq(2,powerVal);

X = zeros(m,n);
temp = ms;

%Calculate the efficiency:
for i=1:n
    X(:,i) = temp;
    temp = [0;temp(1:end-1)];
end
Emseq = 1/trace(inv(X'*X));

disp(sprintf('Efficiency of m-sequence: %5.3g',Emseq));

%This number should be greater than our example random sequence (it had
%better be anyway).

%% Multiple event types
%
% Event-related fMRI often involves more than one type of stimulus in a
% given scan.  Stimuli may include, for example, a parametric range of
% stimulus values, like contrast.  Or they may be different cognitive or
% attentive tasks.  There's an easy way to incorporate multiple event types
% into your design and analysis.  The trick is to simply concatenate the
% separate design matrices horizontally (with zeros and ones in each
% matrix).  The resulting estimate of the hemodynamic response will be a
% single vector with the estimate of the response to each condition in
% succession.  We'll demonstrate this by creating four 'fake' hdrs,
% generating a response with convolution, adding noise, and reconstructing
% the hdrs.

amps = [1,2,3,4];  %amplitudes of the hdr for the four event types (e.g. for increasing stimulus contrast).

h = zeros(n,4);
for i=1:4
    delay = 2; %seconds
    h(:,i) = amps(i)*gamma(nCascades,k,th-delay)';  %make it a column vector
end

figure(1)
clf
plot(th,h);
xlabel('Time (s)');

%%
%get an m-sequence with four event types - plus null events.  This vector
%will have values from 0 to 4 corresponding to what happens on each time
%step.

s = mseq(5,3);
m = length(s);
t = 0:dt:(m-1);
%%
%Generate a concatenated design matrix
X = [];
for j=1:4
    Xj = zeros(m,n);
    temp = s==j;

    for i=1:n
        Xj(:,i) = temp;
        temp = [0;temp(1:end-1)];
    end
    X = [X,Xj];
end

figure(1)
clf
imagesc(X)
axis equal
axis off
colormap(gray)

r = X*h(:);

%add noise
noiseSD = .25;
rNoise = r+noiseSD*randn(size(r));

hest = pinv(X)*rNoise;

%%
% The resulting hdr estimates are all in one long vector.  To show them in
% the form of the original hdr we can reshape this vector back into an nx4
% matrix:

hest = reshape(hest,n,4);

figure(1)
clf
plot(th,h);
hold on
plot(th,hest,'o');
xlabel('Time (s)');

%% Other regressors:
%
% In addition to the additive-type noise, fMRI signals are contaminated
% with low-frequency noise, often characterized as linear trend either up
% or down, along with a large y-intercept.  The y-intercept is just the
% mean of the image which doesn't carry any functional information (the
% mean is a T2-weighted image that contains structural information,
% however).  The linear trend is more troublesome - it can dominate the
% time-course of a voxel when viewed by eye, yet its causes aren't well
% understood.  So we just subtract it out.  Scary?  Yes.
%
% It's easy to remove these trends in the same step as the deconvolution
% process.  This is because each column in the design matrix 'X' represents
% the time-course to be explained by each free parameter.  The first column
% is for the first time-point in the hdr, for example.  To regress out a
% DC, then, we simply add a column of 1's to the end of the design matrix.
% The resulting last parameter will be the best-fitting parameter for the
% DC (which will be the mean). Similarly, if we add a ramp to X, the
% regressor will be the best fitting scale factor that makes that ramp fit
% the data.  We don't use these parameters, instead by adding them to the
% matrix their influences get factored, or regressed out.  
%
% First, let's see how a DC and trend screws up our estimation of the hdr
% before regressing them out:

rtrend = r+100 + linspace(0,10,m)';

figure(1)
clf
plot(t,rtrend);
xlabel('Time (s)');

%%
% Now THAT looks more like fMRI data.  Here's the estimated hdr:

hest = pinv(X)*rtrend;
hest = reshape(hest,n,4);

figure(1)
clf
plot(th,h);
hold on
plot(th,hest,'o');
xlabel('Time (s)');

%%
% What a mess.  Let's add a column of ones to X to deal with the DC:

X(:,n*4+1) = ones(m,1);
hest = pinv(X)*rtrend;

%%
% The DC regressor is the last element in hest.  It should be about 100
dc = hest(end)

hest = reshape(hest(1:n*4),n,4);

figure(1)
clf
plot(th,h);
hold on
plot(th,hest,'o');
xlabel('Time (s)');

%%
% Better, but we have to get rid of the linear trend, too

X(:,n*4+2) = 0*linspace(0,1,m)';

hest = pinv(X)*rtrend;
lowFac = hest(end-1:end);
hest = reshape(hest(1:end-2),n,4);

figure(1)
clf
h1=  plot(th,h);
hold on
h2= plot(th,hest,'o');
xlabel('Time (s)');



%% 
% It's back.  You can see how important it is to get rid of these factors -
% the DC is essential because the deconvolution process only works if you
% either remove the DC via regression, or if you remove it ahead of time by
% subtracting out the mean.  
%
% In this discussion of deconvolution, we assumed that the fMRI response is
% a linear transformation of the neuronal response with iid noise.  It's
% known that the noise in the signal has temporal correlations in it.  For
% a discussion on how to deal with that via deconvolution, see an excellent
% paper by Anders Dale:
%
% Dale, A. (1999) "Optimal Experimental Design for Event-Related fMRI",
% Human Brain Mapping 8:109–114

%% Exercises
%
% 1) The efficiency of a random sequence depends on the probability of an
% event occuring on a given trial (the variable 'prob' above).  Plot a
% graph of efficiency as a function of prob (between 0 and 1) to see what
% probability yields the most efficient sequence.
%
% 2) Generate a bunch of random 50% probability event sequences to see if
% you get anything close to the efficiency of an m-sequence.  Do you think
% this random procedure is worthwhile?  What are its advantages?
%
% 3) Removing the DC and linear trend is one way of removing
% 'low-frequency' factors in the fMRI signal.  Using what we learned from
% Lesson 12, how would you remove low-frequency stuff using the frequency
% domain?  Many fMRI software packages do this sort of thing.