Understanding style transfer

‘Style transfer’ is a deep-network-based method that extracts the style of a painting or picture in order to transfer it to a second picture. For example, the style of a butterfly image (left) is transferred to the picture of a forest (middle; pictures are my own, style transfer done with deepart.io):

[Figure: butterfly image (style, left), forest picture (content, middle), and the style-transferred result (right)]

Early on I was intrigued by these results: How is it possible to clearly separate ‘style’ and ‘content’ and mix them together as if they were independent channels? The seminal paper by Gatys et al., 2015 (link) referred to a mathematically defined optimization loss which was, however, not really self-explanatory. In this blog post, I will try to convey the intuitive step-by-step understanding that I was missing in the paper myself.

The resulting image (right) must satisfy two constraints: content (forest) and style (butterfly). The ‘content’ is well represented by the location-specific activations of the different layers of the deep network. For the ‘style’, Gatys et al. suggest calculating the joint activation patterns, i.e., correlations between the activation patterns of different feature maps. These correlation matrices are, mathematically speaking, the Gram matrices of the network layers. This means that the Gram matrices of the butterfly image (left) and of the resulting style-transferred image (right) should be optimized to be as similar as possible. But what does this Gram matrix actually mean? Why does this method work?
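
Concretely, the Gram matrix of a layer is just the matrix of inner products between all pairs of its feature maps. Here is a minimal numpy sketch; the shapes are made up for illustration, and the loss is only roughly the style term of Gatys et al. (their normalization constants are ignored):

```python
import numpy as np

def gram_matrix(feature_maps):
    """Gram matrix of one layer: inner products between all pairs of feature maps.

    feature_maps: array of shape (height, width, channels), i.e. the activations
    of one convolutional layer for one input image.
    """
    h, w, c = feature_maps.shape
    F = feature_maps.reshape(h * w, c)   # flatten the spatial dimensions
    return F.T @ F / (h * w)             # (channels x channels), position-invariant

def style_loss(gram_a, gram_b):
    """Mean squared difference between two Gram matrices (roughly the style term)."""
    return np.mean((gram_a - gram_b) ** 2)

# toy example: random arrays standing in for one layer's activations
layer_style  = np.random.rand(56, 56, 256)   # e.g. from the butterfly image
layer_result = np.random.rand(56, 56, 256)   # e.g. from the image being optimized
print(style_loss(gram_matrix(layer_style), gram_matrix(layer_result)))
```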

A slightly better understanding comes from an earlier paper by Gatys et al. on texture synthesis (link). From there it becomes clear that the Gram matrix does not appear out of nowhere but is inspired by comparably old-fashioned texture synthesis papers, especially by Portilla and Simoncelli, 2000 (link). This paper deals with the statistical properties that define what humans consistently perceive as the same texture, and the key word here is ‘joint statistics’. More precisely, they argue that it is not sufficient to look at the distributions of features (like edginess or other simple filters), but at the joint occurrence of features. This could be high spatial frequencies (feature 1) co-occurring with horizontal edges (feature 2). Or small spirals (feature 1) co-occurring with a blue color (feature 2). Co-occurrences can be intuitively quantified by the spatial correlation between each pair of feature maps, since correlations are simply a measure of similarity between two (or more) things. As an important side-effect of the inner product underlying the correlation, the Gram matrix is invariant to the positions of features, which makes sense in the context of textures.

On a sidenote, Portilla and Simoncelli were not the first to take a close look at the joint statistics of textures. This goes back at least to Béla Julesz (1962), who conjectured that two images with the same second-order statistics (= joint statistics) have textures that are indistinguishable to humans. (Later, he disproved his own conjecture with counterexamples, but the idea of using joint statistics for texture synthesis remained useful.)

In old-school texture synthesis, features were handcrafted and carefully selected. When working with deep networks, features are much more numerous: they are simply the activation patterns of the layers. Each layer of a deep network consists not of a single representation or feature map, but of many (up to hundreds or thousands). Some are locally activated by edges, others by colors or parallel lines, etc. For the visualizations shown below, I’ve set up a Jupyter notebook to make it as transparent as possible. All of it is based on a GoogLeNet, pre-trained on the ImageNet dataset. Here are the feature maps (= activation patterns) of four input pictures (four columns). Green indicates high activation, blue low activation.

[Figure: feature maps of four input pictures (green: high activation, blue: low activation)]

For the textures of cracked concrete, feature maps 4 and 5 (second and third rows) are very similar to each other (correlated), while feature map 15 is highly dissimilar (anti-correlated). Feature map 15 seems to have learned to detect large, bright and smooth surfaces like clouds. Therefore, the Gram matrix entry for the feature pair [4,5] will be consistently high for input images of cracked concrete, but low for cloud images. These are only a few examples, but I think they make it pretty clear why correlations of feature maps are a better indicator of a texture than the simple mean activation of single feature maps.

To complement this analysis, I generated an input (right-most column) that optimizes the activation of the respective feature maps (see below for an explanation of how I did this). Whereas feature maps 4 and 5 show edges and high-frequency structures, feature map 15 seems to prefer smooth textures.

Next, let’s have a look at what a full-blown Gram matrix looks like! But which layer should I choose for this analysis? I’m using a variant of the Inception network/GoogLeNet here, which seems to be a little less well-suited for style transfer than the typically used VGG network. To find a layer that is indicative of style, I applied 20 images of cloud textures and 20 images of cracked concrete. Then I measured the confusion matrix of the Gram matrices for each layer, which allowed me to find the layer that optimally distinguishes these two textures (it is layer ‘mixed4c_3x3_pre_relu/conv’; more details are in the Jupyter notebook). As inputs, I used greyscale images to prevent the color channels from dominating the similarity measurements.
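
To give a flavor of such a layer-selection criterion, here is a rough numpy sketch under loose assumptions: the Gram matrices are taken as precomputed for each image at one layer, and the separability score (between-class vs. within-class distance) is only a crude stand-in for the confusion-matrix analysis in the notebook:

```python
import numpy as np

def separability(grams_a, grams_b):
    """Crude score for how well one layer separates two texture classes:
    mean between-class distance of the (flattened) Gram matrices divided by
    the mean within-class distance (self-distances included for simplicity)."""
    A = [g.ravel() for g in grams_a]
    B = [g.ravel() for g in grams_b]
    dist = lambda X, Y: np.mean([np.linalg.norm(x - y) for x in X for y in Y])
    return dist(A, B) / (0.5 * (dist(A, A) + dist(B, B)))

# random matrices standing in for per-image Gram matrices at one layer
gram_clouds   = [np.random.rand(256, 256) for _ in range(8)]
gram_concrete = [np.random.rand(256, 256) * 2 for _ in range(8)]
print(separability(gram_clouds, gram_concrete))
```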

[Figure: 16 greyscale input images of clouds and cracked concrete]

For the 16 inputs above, here come the 16 corresponding 256×256 Gram matrices of the chosen layer, arising from 256 feature maps. To clarify the presentation, I have rearranged the features in the matrices to highlight the clustering. The x- and y-axes of each matrix can be interpreted as the features of this layer, and the clustering highlights some of the feature similarities.

[Figure: the 16 corresponding clustered Gram matrices]

From that, it is quite clear that all cloud pictures display similar Gram matrices. The lower two rows with pictures of cracked concrete exhibit a more or less common pattern as well, which in turn is distinct from the cloudy Gram matrices.

As is clearly visible, the feature space is rather large. Therefore, since the contribution of single features is small, it does not make sense to look e.g. at a single feature pair that is highly correlated for clouds and anti-correlated for cracked concrete. Instead, let’s reduce the complexity and have a look at the clusters shown above.

To understand what those clusters of feature maps are encoding, I used the deep dream technique, based on a Jupyter notebook by Alexander Mordvintsev. Basically, it uses gradient ascent on the input image to compute an image that evokes high activity in the respective feature maps. This yields the following deep dreams, starting from a random noise input. The feature maps whose activation has been optimized correspond to the clusters 4, 5 and 7 highlighted above in the Gram matrices (yellow highlights).

[Figure: deep dreams optimizing the feature-map clusters 4, 5 and 7]

Cluster 4 clearly prefers smooth and cloudy inputs, whereas cluster 5 likes smaller tiles separated by dark edges. However, it is difficult to say what the network makes out of those features. First, they will interact with other feature maps. Second, the Gram matrix analysis does not tell whether the feature map clusters are active at all for a given input, or at which locations. Third, as mentioned before, textures are not determined by patterns of feature activations, but by correlations of feature activations.

So let’s go one step further and modify the deep dream algorithm in order to maximize the correlational structure within a cluster of features in a layer, instead of the simple activations of the features. Here is the result for cluster 4, with the deep dream maximizing either the activity in this cluster of features (left) or the correlational structure across features within this cluster (right).

[Figure: deep dreams for cluster 4 – maximizing activity (left) vs. correlational structure (right)]

The result is, perhaps surprisingly, not very informative. It shows that the texture of clouds is not located in a single cluster of the Gram matrix (which is what the right-hand image optimizes for), but distributed across the full spectrum of features, and probably also across several layers.
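
To make the two objectives concrete, here is a minimal numpy sketch of the quantities that the two deep dreams maximize. The cluster indices and feature-map shapes are made up for illustration, and the actual optimization of course runs gradient ascent on the input image through the network rather than on these toy arrays:

```python
import numpy as np

cluster = np.arange(32, 64)     # hypothetical indices of one cluster of feature maps

def activation_objective(feature_maps, cluster):
    """Objective of the plain deep dream: mean activity of the clustered feature maps."""
    return feature_maps[:, :, cluster].mean()

def gram_objective(feature_maps, cluster):
    """Modified objective: sum of Gram matrix entries within the cluster, i.e. the
    correlational structure among those feature maps rather than their mean activity."""
    h, w, c = feature_maps.shape
    F = feature_maps.reshape(h * w, c)[:, cluster]
    G = F.T @ F / (h * w)
    return G.sum()

maps = np.random.rand(28, 28, 256)   # stand-in for one layer's activations
print(activation_objective(maps, cluster), gram_objective(maps, cluster))
```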

Taken together, the analysis so far has shown what Gram matrices look like, how they cluster, and how these clusters can be interpreted. However, the complexity and the distributed nature of computations in the network make it very difficult to intuitively understand what is going on and to predict what would happen to specific layers, feature maps or Gram matrices when exposed to a given input picture.

To sum it up, correlated features (= Gram matrices) can be used to compare the textures of two images and can be employed by a loss function to measure texture similarity. This works both for texture synthesis and style transfer. As a byproduct, the correlation matrix of feature maps, the Gram matrix, can be used to understand how the feature space is divided up by a bunch of clusters of similarly tuned channels. If you want to play around with this, my Jupyter notebook on Github could be a good starting point.

An interesting aspect is the fact that joint statistics – a somewhat arbitrary and empirical measurement – are sufficient to generate textures that seem natural to humans. Would it not be a good idea for the human brain, when it comes to texture rather than object recognition, to read out the correlated activity of ‘feature neurons’ with the same receptive field and then simply average over all receptive fields? The target neurons that read out co-active feature neurons would thus see something like the Gram matrix of the feature activations. There is already work in experimental and theoretical neuroscience that goes somewhat in this direction (Okazawa et al., 2014, link, short summary here).

For further reading, I can recommend Li et al., 2017 (link), who reframe the Gram matrix method by describing it as a Maximum Mean Discrepancy (MMD) minimization with a specific kernel. In addition, they show that other kernels are also useful to measure distances between feature distributions, thereby generalizing the style transfer method. (On the other hand, this paper did not really improve my intuitive understanding of style transfer.)
For an overview of implementations of the style transfer method, there is a nice and recent review on style transfer by Ying et al., 2017 (link). It is not really well-written, but very informative and concise.


Can two-photon scanning be too fast?

The following back-of-the-envelope calculations do not lead to any useful result, but you might be interested in reading through them if you want to get a better understanding of what happens during two-photon excitation microscopy.

The basic idea of two-photon microscopy is to direct so many photons onto a single confined location in the sample that two photons interact with a fluorophore roughly at the same time, leading to fluorescence. The confinement in time seems to be given by the duration of the laser pulse (ca. 50-500 fs). The confinement in space is in the best case given by the resolution limit (let’s say ca. 0.3 μm in xy and 1 μm in z).

However, since the laser beam is moving around, I wondered whether this might influence the excitation efficiency (spoiler: not really). I thought that this would be the case if the scanning speed in the sample were so high that the fs-pulse gets smeared out over a distance greater than the lateral beam size (0.3 μm FWHM).

For normal 8 kHz resonant scanning, the maximum speed (at the center of the FOV) times the temporal pulse width is, assuming a large FOV (1 mm) and a laser pulse that is strongly dispersed through optics and tissue (FWHM = 500 fs):

Δx1 = vmax × Δt = 1 mm × π × 8 kHz × 500 fs = 0.01 nm

This is clearly below the critical limits. Is there anything faster? AOD scanning can run at 100 kHz (reference), although it cannot really scan a 1 mm FOV. TAG lenses are used as scanning devices for two-photon point scanning (reference) and for two-photon light sheet microscopes (reference). They run at up to 1000 kHz sinusoidally. This is performed in the low-resolution direction (z) and usually covers only a few hundred microns, but even if it were to cover 1 mm, the spatial spread of the laser pulse would be

Δx1 = 1 mm × π × 1000 kHz × 500 fs ≈ 1.6 nm

This is already in the range of the size of a typical genetically expressed fluorophore (ca. 2 nm or a bit more for GFP), but clearly less than the resolution limit.

However, even if the infrared pulse were smeared over a couple of micrometers, the excitation efficiency would still not be decreased in reality. Why is this so? It can be explained by the requirement that the two photons arriving at the fluorophore have to be absorbed almost ‘simultaneously’. I was unable to find much data on ‘how simultaneous’ this must be, but the interaction window in time seems to be something like Δt < 1 fs (reference). What does this mean? It reduces the true Δx to a fraction of the above results:

Δx2 = 1 mm × π × 1000 kHz × 1 fs = 0.003 nm

Therefore, smearing the physical laser pulses (Δx1) does not really matter. What matters is the smearing of the temporal interaction window Δt over a spatial distance larger than the resolution limit (Δx2). This, however, would require a line scanning frequency in the GHz range – which will never, ever happen. The scan rate must always be significantly lower than the repetition rate of pulsed excitation, and the repetition rate is limited to <500 MHz due to fluorescence lifetimes of >1-3 ns. Case closed.
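
The back-of-the-envelope numbers above can be reproduced in a few lines of Python (using the values assumed in the text; the scan speed is the peak speed of a sinusoidal line scan at the center of the FOV):

```python
import math

def smear_nm(fov_m, line_rate_hz, window_s):
    """Peak speed of a sinusoidal line scan times a time window, in nm."""
    v_max = math.pi * fov_m * line_rate_hz   # m/s at the center of the FOV
    return v_max * window_s * 1e9

print(smear_nm(1e-3, 8e3, 500e-15))   # resonant scanner, 500 fs pulse    -> ~0.01 nm
print(smear_nm(1e-3, 1e6, 500e-15))   # TAG lens, 500 fs pulse            -> ~1.6 nm
print(smear_nm(1e-3, 1e6, 1e-15))     # TAG lens, 1 fs interaction window -> ~0.003 nm
```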


The basis of feature spaces in deep networks

In a new article on Distill, Olah et al. write up a very readable and useful summary of methods to look into the black box of deep networks by feature visualization. I had already spent some time with this topic before (link), but this review pointed me to a couple of interesting aspects that I had not noticed before. In the following, I will write about one aspect of the article in more depth: whether a deep network encodes features on a per-neuron basis or on a distributed, network-wide basis.

‘Feature visualization’ as discussed here means optimizing the input pattern (the image that is fed into the network) such that it maximizes the activity of a selected neuron somewhere in the network. The article discusses strategies to prevent this maximization process from generating non-naturalistic images (“regularization” techniques). On a sidenote, however, the authors also ask what happens when one optimizes the input image not for a single neuron’s activity, but for the joint activity of two or more neurons.

[Figure] Joint optimization of the activity of two neurons. From Olah et al., Distill (2017) / CC BY 4.0.

Supported by some examples, and pointing at some other examples collected before by Szegedy et al., they write:

Individual neurons are the basis directions of activation space, and it is not clear that these should be any more special than any other direction.

It is a somewhat natural thought that individual neurons are the basis of coding/activation space, and that any linear combination could be used for coding just as well as any single-neuron-based representation/activation. In linear algebra, it is obvious that any rotation of the basis that spans the coding space does not change anything about the processes and transformations taking place in this space.

However, this picture breaks down when switching from linear algebra to non-linear transformations, and deep networks are by construction highly non-linear. My intuition would be that the non-linear transformation of inputs (especially by rectifying units) sparsens activity patterns with increasing depth, thereby localizing the activations to fewer and fewer neurons, without any sparseness constraint during weight learning. This does not necessarily mean that the preferred input images of random directions in activation space would be meaningless; but it would predict that the activation patterns of to-be-classified inputs are not pointing into random directions of activation space, but have an activation direction that prefers the ‘physical’, neuronal basis.

I think that this can be tested more or less directly by analyzing the distributions of activation patterns across layers. If activation patterns were distributed, i.e., pointing into random directions, the distribution would be rather flat across the activation units of each layer. If, on the other hand, activation directions were aligned with the neuronal basis, the distribution would be rather skewed and sparse.

Probably this needs more thorough testing than I’m able to do by myself, but for starters I used the Inception network, trained on the ImageNet dataset, with this Python script on the Tensorflow Github page as a starting point. To test the network activations, I automatically downloaded the first ~200 image hits on Google for 100×100 JPGs of “animal picture”, fed them into the network and observed the activation pattern statistics across layers. I uploaded a Jupyter Notebook with all the code and some sample pictures on Github.
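
The statistics themselves are simple. Here is a minimal numpy sketch of what is computed per layer, assuming the activations have already been extracted into an images × units matrix (the same correlation measure is used for the decorrelation analysis further below):

```python
import numpy as np

def layer_stats(acts):
    """acts: array of shape (n_images, n_units) with one layer's activations.

    Returns the fraction of non-zero activations (a sparseness proxy) and the
    mean pairwise correlation between the activation patterns of different images."""
    nonzero_fraction = np.mean(acts > 0)
    corr = np.corrcoef(acts)                            # image-by-image correlations
    off_diagonal = corr[~np.eye(len(acts), dtype=bool)]
    return nonzero_fraction, off_diagonal.mean()

# toy stand-in for the ReLU activations of one layer for 200 input images
acts = np.maximum(np.random.randn(200, 4096), 0)
print(layer_stats(acts))
```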

The result is that activation patterns are sparse and tend to become sparser with increasing depth of the layers. The distribution is dominated by a lot of zero activations, indicating a net input less than or equal to zero. I have excluded the zeros from the histograms and instead given the percentage of non-zero activations as text in the respective histogram. The y-axis of each histogram is in log scale.

[Figure: per-layer histograms of non-zero activations (log scale), with the percentage of non-zero activations indicated]

It is also interesting that the fraction of non-zero activations decreases with depth, i.e., the patterns become sparser, but reaches a bottleneck at a certain level (here from ‘mixed_7’ until ‘mixed_9’ – the mixed layers are inception modules), after which the activations become less sparse again when approaching the (small) output layer.

A simple analysis (correlation between activation patterns stemming from different input images) shows that de-correlation (red), that is, a decrease of correlation between activations by different input images, is accompanied by sparsening of the activation levels (blue):

[Figure: decorrelation (red) and sparseness of activations (blue) across layers]

It is a bit strange that the network layers 2, 4 and 6 generate sparser activation patterns than the respective previous layers (1, 3 and 5), accompanied by less decorrelated activity. It would be interesting to analyze the correlational structure in more depth. For example, I’d be curious to understand the activation patterns of input images that lead to the same categorization in the output layer, and to see from which layer on they start to exhibit correlated activations.

Of course there is a great body of literature in neuroscience, especially theoretical neuroscience, that discusses local, sparse or distributed codes and the advantages and disadvantages that come with them. For example, according to theoretical work by Kanerva, sparseness of memory systems helps to prevent different memories from interfering too much with each other, although it is still unclear whether something similar is implemented in biological systems (you will find many experimental papers with evidence both for and against it, often for the same brain area). If you would like to read more about sparse and dense codes, Scholarpedia is a good starting point.


All-optical entirely passive laser scanning with MHz rates

Is it possible to let a laser beam scan over an angle without moving any mechanical parts to deflect the beam? It is. One strategy is to use a very short-pulsed laser beam: a short pulse width means a finite spectral width of the laser (-> Heisenberg). A dispersive element like a grating can then be used to automatically diffract the beam into smaller beamlets, which in turn can somehow be used to scan or de-scan an object. This technique is called dispersive Fourier transformation, although there seem to be different names for only slightly different methods. (I have no experience in this field and am not aware of the current state of the art, but I found this short introductory review useful as a primer.)

Recently, I stumbled over an article that describes a similar scanning technique, but without dispersing the beam spectrally: Multi-MHz laser-scanning single-cell fluorescence microscopy by spatiotemporally encoded virtual source array. First I didn’t believe this could be possible, but apparently it is. In simple words, the authors of the study have designed a device that uses a single laser pulse as an input and outputs several laser pulses, separated in time and with different propagation directions – which is scanning.

Wu et al. from the University of Hong Kong describe their technique in more detail in an earlier paper in Light Science & Applications, and in even more detail in its supplementary information, which I found especially interesting. At first it looked like a Fabry-Pérot interferometer to me, but it is actually completely different and is not even based on wave optics.

The idea is to shoot an optically converging pulsed beam (e.g. coming from an ultra-fast Ti:Sa laser) into an area that is bounded by two mirrors that are almost parallel, but slightly misaligned by an angle α<1°. The authors call these two misaligned mirrors a ‘FACED device’. Due to the misalignment, the beam will be reflected multiple times, but come back once it hits the surface orthogonally (see e.g. the black light path below). Therefore, the continuous spectrum of incidence angles will be automatically translated into a discrete set of mini-pulses coming out of this device, because either a part of the beam gets reflected 14 times, or 15 times – obviously, there is no such thing as 14.5 reflections, at least in ray optics. This difference of 1 in number of reflections makes the 15-reflection beam spend more time in the device, Δt ≈ 2S/c, with S being the separation of the two mirrors, and c the speed of light.

It took me some time to understand how this works and what the pulselets coming out of the FACED device look like, but I have to admit that I find it really cool. The schematic drawings in the supplementary information, especially figures S1 and S5, are very helpful for understanding what is going on.

[Figure] Schematic drawing (adapted) from Wu et al., LS&A (2016) / CC BY 4.0.

As the authors note (without showing any experiments), this approach could be used for multi-photon imaging as well. It is probably true that there are some hidden difficulties and finite-size effects that make an implementation of this scanning technique challenging in practice, but let’s imagine for one minute what this could look like.

Ideally, we want laser pulses that are spaced with a temporal distance of the fluorescence lifetime (ca. 3 ns) in order to prevent temporal crosstalk during detection. This would require the two FACED mirrors to be spaced by S ≈ 50 cm, according to the formula mentioned above. Next, we want to resolve, say, 250 points along this fast-scanning axis, which means that the FACED device would need to split the original pulse into 250 delayed pulselets. The input pulsed beam would therefore need a pulse repetition rate of ca. 1.3 MHz (which is then also the line scanning frequency), and each of those pulses would need enough power for the whole line scan.
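
These numbers follow directly from the formula above; a quick sanity check in Python with the same assumed values:

```python
c = 3e8            # speed of light, m/s
dt = 3e-9          # desired pulselet spacing: roughly one fluorescence lifetime, 3 ns
n_points = 250     # resolved points along the fast axis

S = c * dt / 2                    # mirror separation from dt = 2*S/c      -> ~0.45 m
line_rate = 1 / (n_points * dt)   # required pulse repetition / line rate  -> ~1.3 MHz
print(S, line_rate)
```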

How long would the FACED mirrors need to be? This is difficult to answer, since it depends on the divergence angle of the input pulsed beam that hits the FACED device, but I would guess a couple of meters, given the spacing of the mirrors (50 cm) and the high number of desired pulselets (250). (In a more modest scenario, one could envision splitting one pulse of an 80 MHz laser into only 4 pulselets, thereby achieving multiplexing of additional regular scanning similar to approaches described before.)

However, I would also ask myself whether the created beamlets are not too dispersed in time, thereby precluding the two-photon effect. And I also wonder how all this behaves when transitioning from geometric rays to wave optics – complex things might happen in this regime. Certainly a lot of work is required to move this from an optical table to a biologist’s microscope, but I hope that somebody accepts this challenge and maybe, maybe replaces the kHz scanners of typical multi-photon microscopes with a device that achieves MHz scanning in a couple of years.


The most interesting machine learning AMAs on Reddit

It is very clear that Reddit is part of the rather wild zone of the internet. But especially for practical questions, Reddit can be very useful, and even more so for anything connected to the internet or computer technology, like machine learning.

In the machine learning subreddit, there is a series of very nice AMAs (Ask Me Anything) with several of the most prominent machine learning experts (with a bias for deep learning). To me, as somebody who is not working directly in the field, but nevertheless curious about what is going on, it is interesting to read those experts talking about machine learning in a less formal environment, sometimes also ranting about misconceptions or wrong directions of research attention.

Here are my top picks, starting with the ones I found most interesting to read:

  • Yann LeCun, director of Facebook AI research, is not a fan of ‘cute math’.
  • Jürgen Schmidhuber, AI researcher in Munich and Lugano, finds it obvious that ‘art and science and music are driven by the same basic principle’ (which is ‘compression’).
  • Michael Jordan, machine learning researcher at Berkeley, takes an opportunity ‘to exhibit [his] personal incoherence’ and describes his interest in Natural Language Processing (NLP).
  • Geoffrey Hinton, machine learning researcher at Google and Toronto, thinks that the ‘pooling operation used in convolutional neural networks is a big mistake’.
  • Yoshua Bengio, researcher at Montreal, suggests that the ‘subject most relevant to machine learning’ is ‘understanding how learning proceeds in brains’.

And if you want more of that, you can go on with Andrew Ng and Adam Coates from Baidu AI, or Nando de Freitas, a scientist at Deepmind and Oxford. Or just discover the machine learning subreddit yourself.

Enjoy!

P.S. If you think that there might be similarly interesting AMAs with top neuroscientists: No, there aren’t.


How deconvolution of calcium data degrades with noise

How does the noisiness of the recorded calcium data affect the performance of spike-inferring deconvolution algorithms? I cannot offer a rigorous treatment of this question, but some intuitive examples. The short answer: if a calcium transient is not visible at all in the calcium data, the deconvolution will miss the transient as well. It seems that once the signal-to-noise ratio drops below 0.5-0.7, the deconvolution quickly degrades.

To make this a bit more quantitative, I used an algorithm based on convolutional networks (developed by Stephan Gerhard and myself; you can find it on Github, and it’s described here) and a small part of the Allen Brain Observatory dataset.

I assumed that the standard deviation of the raw calcium traces measures ‘Signal’ (a reasonable approximation), and I took the standard deviation of the Gaussian noise that I added on top as ‘Noise’. Then I deconvolved both the noisified and the unchanged calcium traces and computed the correlation between the inferred spiking traces (calcium+noise vs. calcium alone). If the correlation (y-axis) is high, the performance of the algorithm is not much affected by the noise. The curve drops steeply at an SNR of 0.5-0.7.
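
Schematically, the procedure looks like the sketch below; `deconvolve` is only a placeholder standing in for the convolutional-network algorithm mentioned above, and the trace is a random toy signal:

```python
import numpy as np

def noise_robustness(calcium, noise_std, deconvolve):
    """Add Gaussian noise to a calcium trace, deconvolve both versions, and return
    the SNR and the correlation between the two inferred spiking traces."""
    snr = calcium.std() / noise_std
    noisy = calcium + np.random.randn(len(calcium)) * noise_std
    corr = np.corrcoef(deconvolve(calcium), deconvolve(noisy))[0, 1]
    return snr, corr

# toy stand-ins: a random walk as 'calcium trace' and a trivial placeholder deconvolution
trace = np.random.randn(3000).cumsum() * 0.01
print(noise_robustness(trace, noise_std=0.5,
                       deconvolve=lambda x: np.maximum(np.diff(x), 0)))
```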

[Figure: correlation between deconvolutions of noisified and original traces vs. SNR]

To get some intuition, here are some examples, with the calcium trace plus Gaussian noise on the left and the deconvolved spiking probabilities on the right (the numbers to the left indicate the SNR and the correlation to the deconvolution of the unperturbed trace, respectively):

[Figure: neuron 3 – noisified calcium traces (left) and deconvolved spiking probabilities (right)]

The next example was perturbed with the same absolute amount of noise, but due to the larger signal, the spike inference remained largely unaffected for all but the highest noise levels.

[Figure: neuron 20 – same absolute noise levels, but larger calcium signal]

The obvious thing to note is the following: when transients are no longer visible in the calcium trace, they disappear from the deconvolved traces as well. I’d also like to note that both calcium time series from the examples above are from the same mouse, the same recording, and even the same plane, but the SNR of the two neurons is very different. Therefore, lumping together neurons from the same recording but of different recording quality combines different levels of detected detail. An alternative would be to set an SNR threshold for the neurons to be included – depending on the precision required by the respective analysis.


A convolutional network to deconvolve calcium traces, living in an embedding space of statistical properties

As mentioned before (here and here), the spikefinder competition was set up earlier this year to compare algorithms that infer spiking probabilities from calcium imaging data. Together with Stephan Gerhard, a postdoc in our lab, I submitted an algorithm based on convolutional networks. Looking back at the few days at the end of April when we wrote this code, it was a lot of fun to work together with Stephan, who brought in his more advanced knowledge of how to optimize and refactor Python code and how to explore hyper-parameter spaces very efficiently. In addition, our algorithm performed quite well and ranked among the top submissions. Other high-scoring algorithms were submitted by Ben Bolte, Nikolay Chenkov/Thomas McColgan, Thomas Deneux, Johannes Friedrich, Tim Machado, Patrick Mineault, Marius Pachitariu, Dario Ringach, Artur Speiser and their labs.

The detailed results of the competition will be covered and discussed very soon in a review paper (now on bioRxiv), and I do not want to scoop any of this. The algorithm, which is described in the paper in more detail, goes a bit beyond a simple convolutional network. In simple words, the algorithm creates a space of models and then chooses a location in this space for the task at hand, based on statistical properties of the calcium imaging data to be analyzed. The idea behind this step is that it allows the model to generalize to datasets that it has not seen before.

The algorithm itself, which we wrote in Python 3.6/Keras, should be rather straightforward to test with the provided Jupyter notebook or the plain Python file. We do not intend to publish the algorithm in a dedicated paper, since everything will be described in the review paper and the algorithm is already published on Github. It should be pretty self-explanatory and easy to set up (if not, let me know!).

So if you have some calcium imaging data that you would like to deconvolve, and if you want to get some hands-on experience with small-scale deep learning methods, this is your best chance …

I was also curious what some random calcium imaging traces of mine would look like after deconvolution with this network. Sure, there is no spiking ground truth for these recordings, but one can still look at the results and immediately see whether they are a complete mess or something that looks more or less realistic. Here is one example from a very nice calcium recording that I did in 2016 in the dorsal telencephalon of an adult zebrafish using this 2P microscope. The spiking probabilities (blue) seem realistic and very reliable, but the recording quality was also extremely good.

[Figure: calcium trace from the zebrafish recording with deconvolved spiking probabilities (blue)]

I was also curious about the performance of the algorithm with somebody else’s data as input. Probably the most standardized calcium imaging dataset for mice can be retrieved from the Allen Brain Observatory. Fluorescence traces can be accessed via the Allen SDK (yet another Jupyter notebook to start with). I deconvolved 20 traces, each 60 min of recording at a 30 Hz frame rate, which took in total ca. 20 min (on a normal CPU, no GPU!). Let me show you some examples of calcium traces (blue) and the corresponding deconvolved spiking probability estimates (orange) for a couple of neurons; the x-axis is time in seconds, the y-axis is scaled arbitrarily:

Overall, the deconvolved data clearly seem less noisy than most of the predictions from the Spikefinder competition, probably due to the better SNR of the calcium signal. False positives from the baseline are not very frequent. There are still some small, most likely unwanted bumps, depending on the noisiness of the respective recorded neuron. For very strong calcium responses, the network sometimes tends to overdo the deconvolution, leading to a slight negative overshoot of the spiking probability – or, put differently, ringing of the deconvolution filter. This could have been fixed by forcing the network to return only positive values, but the results also look pretty fine without this fix.
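
For reference, pulling dF/F traces from the Allen Brain Observatory for this kind of test might look roughly like the following sketch. It assumes the AllenSDK `BrainObservatoryCache` interface and a local manifest path, and `deconvolve` is again only a placeholder for the network described above:

```python
import numpy as np
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

# placeholder standing in for the convolutional-network deconvolution described above
deconvolve = lambda trace: np.maximum(np.diff(trace), 0)

boc = BrainObservatoryCache(manifest_file='brain_observatory/manifest.json')
experiment = boc.get_ophys_experiments()[0]                  # pick any experiment
data_set = boc.get_ophys_experiment_data(experiment['id'])   # downloads/caches the NWB file
timestamps, dff = data_set.get_dff_traces()                  # dff: (n_cells, n_frames)

spiking = [deconvolve(trace) for trace in dff[:20]]          # first 20 neurons
```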

Of course, if you want to try out your own calcium imaging data with this algorithm, I’d be happy to see the results! And if you are absolutely not into Python yet and don’t want to install anything before seeing some first results, you can also send me some of your calcium imaging traces for a quick test run.
