## Deep learning, part IV (2): Compressing the dynamic range in raw audio signals

In a recent blog post about deep learning based on raw audio waveforms, I showed what effect a naive linear dynamic range compression from 16 bit (65536 possible values) to 8 bit (256 possible values) has on audio quality: overall perceived quality is low, mostly because silence and quiet parts of the audio signal get squished. The Wavenet network by Deepmind, however, uses a non-linear compression of the audio amplitude that allows the signal to be mapped to 8 bit without major losses. In the next few lines, I will describe what this non-linear compression is, and how well it performs on real music.

Nonlinear dynamic range compression

A quick search shows that the compression scheme used by Wavenet is far from original, having been established as an international standard for communication (although there are actually two standards, the European A-law standard and the US µ-law standard). The transformations, here the µ-law for 8 bit,

$F(x) = \operatorname{sgn}(x)\cdot \frac{\ln(1 + 255 \cdot |x|)}{\ln(256)}$

are more or less logarithmic, which corresponds roughly to the psychophysics of human perception. To be precise, the subjective loudness of audio stimuli follows a power law rather than a logarithmic expression, but this is only important for the limit regimes (high and low frequencies).
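To make the formula concrete, here is a minimal Python sketch of the µ-law transform, its inverse, and a comparison against naive linear rounding. The test signal (a sine at 1 % of full scale, standing in for a quiet passage) and all function names are my own assumptions for illustration; the rounding grid follows the Matlab script further below.

```python
import math

MU = 255  # mu for 8 bit, as in the formula above


def mu_law_compress(x, mu=MU):
    """Map an amplitude x in [-1, 1] to the companded value F(x) in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(v, bits=8):
    """Round v in [-1, 1] to the nearest multiple of 2^-(bits-1)."""
    steps = 2 ** (bits - 1)
    return round(v * steps) / steps


def rms_error(signal, reconstruct):
    """Root-mean-square difference between a signal and its reconstruction."""
    return math.sqrt(sum((s - reconstruct(s)) ** 2 for s in signal) / len(signal))


# hypothetical quiet passage: sine with 1 % of the full-scale amplitude
quiet = [0.01 * math.sin(2 * math.pi * k / 50) for k in range(1000)]

err_linear = rms_error(quiet, lambda s: quantize(s))
err_mulaw = rms_error(quiet, lambda s: mu_law_expand(quantize(mu_law_compress(s))))
print("RMS error, linear 8 bit:", err_linear)
print("RMS error, mu-law 8 bit:", err_mulaw)
```

For such a quiet signal, the µ-law round trip leaves a clearly smaller error than linear rounding, because the logarithm spends most of the 8 bit grid on small amplitudes.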

A second possibility of compression that one could think of for reducing audio file size (in order to generate compact training data for machine learning) is the method used for mp3s and similar formats. It is called MDCT (modified discrete cosine transform) and is basically a variant of the discrete Fourier transform (DFT). The output of the transform therefore lives in Fourier/frequency space. Deep learning based on such datasets could still be efficient. But then there would be no real need for convolutional filters, since time as a variable is already eliminated by the MDCT. Also, operating in the frequency instead of the time domain would be less intuitive for exploration, which is a disadvantage: in my opinion, not only the output of a network is important, but also the ease of understanding what happens inside the network during learning and recall.

Minimal dynamic bit depth for J.S. Bach

To examine the performance of the non-linear compression, I will use a real (piano music) example to showcase the effects of linear/non-linear dynamic compression. As a test sample, here is a rather calm intro to a piece by J.S. Bach. First the original, 16 bit:

Next follows the naive, linearly compressed dynamic range (8 bit) version. Brace yourself for some unpleasant noise:

However, if the signal is compressed non-linearly via the µ-law algorithm to 8 bit as described above (I tried the A-law as well: little difference; I slightly preferred the µ-law):

This sounds much better, despite some remaining whispering in the background. Not bad! But are all of those 8 bit really required? Why not reduce it to, e.g., 4 bit dynamic range? To test this, I resampled the excerpt at a bit depth that is increasing over time, starting with 4 bit (everything below is rather painful).

The change in bit depth over time is plotted below. Even in the raw signal, the reduced bit depth can be seen from the coarsely discretized amplitude values at the beginning – at 4 bit, the volume can take only 2^4 = 16 different values. This is clearly not good enough for human ears.
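A bit depth that increases over time can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the segments are equally long, the ramp ends at 12 bit, and a plain uniform rounding grid is used without any companding (the text does not say how exactly the ramp was generated).

```python
import math


def quantize(x, bits):
    """Round x in [-1, 1] onto a uniform grid with step 2^-(bits-1).

    As in the Matlab script below, the grid includes both end points,
    so it has 2^bits + 1 values rather than exactly 2^bits.
    """
    steps = 2 ** (bits - 1)
    return round(x * steps) / steps


def ramp_bit_depth(signal, start_bits=4, end_bits=12):
    """Quantize a signal in equally long segments of increasing bit depth."""
    n_segments = end_bits - start_bits + 1
    seg_len = math.ceil(len(signal) / n_segments)
    out = []
    for i, x in enumerate(signal):
        bits = start_bits + i // seg_len  # bit depth of the current segment
        out.append(quantize(x, bits))
    return out


# hypothetical test tone instead of the Bach excerpt
tone = [math.sin(2 * math.pi * k / 97) for k in range(9000)]
ramped = ramp_bit_depth(tone)
print("distinct values in the 4 bit segment:", len(set(ramped[:1000])))
```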

With 6 bits, it starts sounding better, and is almost fully recovered at a bit depth of 11-12 (this depends on the quality of your earphones – with standard loudspeakers, I would guess that 8-9 bits might already sound the same as the original). However, the decisive quality increase seems to happen between 6 and 8 bit. Let’s make a fairer comparison.

5 bit (which sounds like footsteps in the snow …):

My conclusion: If I wanted to create a dataset with piano music for deep learning, I would go for 6-7 bit dynamic range. 5-6 bit seems to be too low quality to me. The reduction from 8 to 6-7 bit seems small, but it reduces the number of possible values from 256 to ca. 100. This can be important for an implementation like Wavenet, where these 256 or 100 values are discrete categories onto which the input of the network is mapped. If the network is left with too many categories to choose from, it is probably much easier to make it fail at a given task.

P.S. Matlab script

For completeness, here is a small Matlab script that reads in an mp3, reduces the bit depth according to the µ-law and writes it to a sound file (*.wav). If you comment out the lines for the transform and the back-transform (highlighted in blue in the original formatting), you will get the naive linear 8 bit compression variant and a lot of noise.

% read audio file (the file name is a placeholder; adapt it to your mp3)
[audioX,Fs] = audioread('JSBach.mp3');
% transform
bitdepth = 8;
audioX = sign(audioX).*log(1+(2^bitdepth-1).*abs(audioX))./log(2^bitdepth);
% discretize to 8 bit
audioX = double(round(audioX*(2^(bitdepth-1))))/(2^(bitdepth-1));
% transform back to normal
audioX = sign(audioX).*((2^bitdepth).^abs(audioX) - 1)./(2^bitdepth - 1);
% play 8 bit audio
soundsc(audioX,Fs);
% save audio back to file
audiowrite('JSBach_mod.wav',audioX,Fs);

Newer versions of Matlab (which I do not have) already implement the compression algorithm in the function compand(). In Python, the audioop library will help you out (note that audioop has been removed from the standard library as of Python 3.13).


## Preamplifier bandwidth & two ways of counting photons

For two-photon point scanning microscopy, the excitation laser typically pulses at a repetition rate of 80 MHz, that is, one pulse every 12.5 ns. To avoid aliasing, it has been suggested to synchronize the sampling clock to the laser pulses. For this, it is important to know over how much time the signal is smeared, that is, to measure the duration of the transient.

The device that smooths the PMT signal over time is the current-to-voltage amplifier. As far as I know, the two most commonly used ones are the Femto DHPCA-100 (variable gain, although mostly used with the 80 MHz bandwidth setting) and the Thorlabs model (60 MHz fixed bandwidth).

Observing single transients for different preamplifiers

However, 80 MHz bandwidth does not mean that everything below 80 MHz is transmitted and everything beyond suppressed. The companies provide frequency response curves, but in order to get a better feeling, I measured the transients of the above-mentioned preamplifiers when they amplified a single photon detected by a PMT. All transients are rescaled in y-direction (left-hand plot). I also determined a sort of gain for the single events by measuring the amplitude (right-hand plot). I used two preamplifiers of each model, but could not make out any performance difference between two of the same kind.

For the 80 MHz bandwidth setting (Femto), the transient does not fully decay even after 15 ns or later; the Thorlabs preamp is even slower than this. Both exhibit a smooth multi-step shape during the decay phase.

At first glance, the Thorlabs preamp seems to be the obvious best choice, since the bandwidth is similar to the Femto 80 MHz and the gain is 3x higher. But for functional imaging, electrical noise is not a big problem, since the main source of noise is simply photon shot noise. In my hands, neither clearly outperformed the other, although this might be different for a completely different combination of PMT/fluorophore/SNR.

The 200 MHz Femto setting would be perfect for lock-in sampling (and I already used it for that purpose), but the gain is, at least for my PMT, at the limit where electrical noise can become dominant (see figure on the right side). The 15 MHz setting, on the other hand, does not give any advantage, except if one samples with much less than 80 MHz.

Counting photons using an oscilloscope vs. using Poisson statistics

Looking at the raw oscilloscope traces when the microscope is scanning a biological sample, one can make another interesting observation. Shown here is a time window spanning 4000 ns, that is, 320 laser pulses. But those laser pulses only manage to elicit 14 photon-detecting events.

Let’s put this into some context. Normally, I bin every 8 laser pulses to one pixel; therefore, counting the detected photons per 8 pulses gives the photons per pixel. The outcome of doing this for each oscilloscope trace gives a couple of numbers (# photons per pixel bin), which are plotted below as single data points and histogram (figure to the right). Here, the number of photons per pixel bin is between 0.1 and 0.9 photons. In other words, only every 10-50th laser pulse creates a fluorescence photon that is detected later on.
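For the single trace described above (4000 ns window, 14 detected events, 8 pulses per pixel), the arithmetic can be written out as a short Python sketch. All numbers are taken from the text; only the variable names are mine.

```python
window_ns = 4000.0        # duration of the oscilloscope trace
pulse_period_ns = 12.5    # 80 MHz repetition rate
events = 14               # photon detection events in this trace
pulses_per_pixel = 8      # laser pulses binned into one pixel

n_pulses = window_ns / pulse_period_ns            # laser pulses in the window
photons_per_pulse = events / n_pulses             # detection probability per pulse
pulses_per_photon = n_pulses / events             # average pulses between photons
photons_per_pixel = photons_per_pulse * pulses_per_pixel

print("laser pulses in window:", n_pulses)
print("pulses per detected photon:", round(pulses_per_photon, 1))
print("photons per pixel bin:", round(photons_per_pixel, 2))
```

This particular trace therefore corresponds to roughly one detected photon per 23 laser pulses, or about 0.35 photons per pixel bin.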

For the same sample, I also recorded an image and determined the number of photons per pixel as described earlier (based on a Labrigger blog post). As far as I can tell from the small statistics for photon counting using oscilloscope traces, there is a surprisingly close agreement between the two independent measurements. It might be off by a factor of 1.5 or less.

On a side note, the reason for the low number of photons compared to this earlier example is a different fluorophore. In the previously shown example, it was the artificial calcium sensor rhod-2, which features a high baseline fluorescence; here, the tissue is weakly labeled with GCaMP6f, which has a much lower baseline fluorescence (actually, the tissue shown here is probably dead and therefore a little bit brighter than normal GCaMP6f-labeled neurons).

What I learned from this:

1. Neither the Femto DHPCA-100 at 80 MHz nor the Thorlabs preamplifier is very well-suited for lock-in sampling to 80 MHz laser pulses.
2. For normal samples (by which I mean: functional calcium imaging), despite a difference in gain (factor 3), both preamps perform similarly, since for both the signal is stronger than electrical noise.
3. Normally, I (and certainly others with dim samples) get < 1 photon per pixel for GCaMP, and << 1 photon per laser pulse. Given how many infrared photons hit the tissue (around 10^9 photons per pulse!), this is astounding.
4. Calculating the gain of the system and the number of photons per pixel using the method mentioned before yields results that are consistent with (oscilloscope) photon counting. It appears as if one really can rely on this easy method to estimate the number of photons per pixel. This can be an important piece of information when one needs to decide whether to implement a photon counting hardware solution for a given application or not.
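The order of magnitude of 10^9 infrared photons per pulse in point 3 can be checked with a quick estimate. The average power at the sample (30 mW) and the wavelength (920 nm) are assumptions that I plug in for illustration; they are not measured values from this post.

```python
PLANCK_J_S = 6.62607015e-34      # Planck constant in J*s
LIGHT_SPEED_M_S = 299792458.0    # speed of light in m/s


def photons_per_pulse(avg_power_w, wavelength_m, rep_rate_hz):
    """Number of photons in one laser pulse, from average power and repetition rate."""
    pulse_energy_j = avg_power_w / rep_rate_hz
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m
    return pulse_energy_j / photon_energy_j


# assumed values: 30 mW at the sample, 920 nm excitation, 80 MHz repetition rate
n_ir = photons_per_pulse(avg_power_w=30e-3, wavelength_m=920e-9, rep_rate_hz=80e6)
print("infrared photons per pulse: %.2e" % n_ir)
```

With these assumed numbers, the estimate lands between 10^9 and 10^10 photons per pulse, compared to far less than one detected fluorescence photon.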

Update [ 2016-09-22 ] :

Upon Dario’s comment, I looked into what happens if you sample at 80 MHz (i.e., synced to the laser pulses), but at different phase lags, as described on the Scanbox blog by Tobias Rose. Using the transients measured above, one can simulate the phase-dependence of the signal intensity. The intensity ratio of best timing vs. worst timing is around 3 for the Femto 80 MHz, and <2 for the Thorlabs 60 MHz. This means that the sampling phase relative to the laser pulses is definitely important, and more for the Femto preamp than for the Thorlabs one. (I do not see this strong dependence with my microscope, but this may be due to a badly behaving laser sync signal.)

To make things more complicated, this factor of up to 3 means a threefold increase in signal, but not a threefold increase in SNR. SNR is mainly determined by shot noise, not voltage amplitude; however, I would bet that electrical noise can become quite prominent in the trough of the phase plot, and phase optimization still increases SNR.
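The phase simulation can be sketched as follows. Since the measured transients are not part of this text, the sketch uses a hypothetical transient (difference of two exponentials) with made-up time constants, so the resulting ratios are illustrative and do not reproduce the numbers above.

```python
import math

PERIOD_NS = 12.5  # 80 MHz laser


def transient(t_ns, tau_rise_ns, tau_decay_ns):
    """Hypothetical single-photon transient: zero before the photon arrives,
    then a fast rise followed by a slower exponential decay."""
    if t_ns < 0:
        return 0.0
    return math.exp(-t_ns / tau_decay_ns) - math.exp(-t_ns / tau_rise_ns)


def phase_response(tau_rise_ns, tau_decay_ns, n_phases=100, n_pulses=50):
    """Expected sample value (up to a constant factor) for each sampling phase,
    when one sample is taken per laser period and the tails of transients
    from earlier pulses still overlap with the current one."""
    response = []
    for i in range(n_phases):
        phase = PERIOD_NS * i / n_phases
        value = sum(transient(phase + k * PERIOD_NS, tau_rise_ns, tau_decay_ns)
                    for k in range(n_pulses))
        response.append(value)
    return response


def best_to_worst_ratio(tau_rise_ns, tau_decay_ns):
    """Intensity ratio between the best and the worst sampling phase."""
    response = phase_response(tau_rise_ns, tau_decay_ns)
    return max(response) / min(response)


# made-up time constants: a faster and a slower preamplifier
print("faster preamp:", best_to_worst_ratio(1.0, 5.0))
print("slower preamp:", best_to_worst_ratio(1.0, 8.0))
```

The qualitative behavior matches the observation above: the slower the decay, the more the transients of consecutive pulses overlap, and the less the sampling phase matters.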


## Deep learning, part IV: Deep dreams of music, based on dilated causal convolutions

Like many neuroscientists, I’m also interested in artificial neural networks and am curious about deep learning networks. I want to dedicate some blog posts to this topic, in order to 1) approach deep learning from the stupid neuroscientist’s perspective and 2) get a feeling for what deep networks can and cannot do. Part I, Part II, Part III, Part IVb.

One of the most fascinating outcomes of deep networks has been their ability to create ‘sensory’ input based on internal representations of learnt concepts. (I’ve written about this topic before.) I was wondering why nobody tried to transfer the deep dreams concept from image creation to audio hallucinations. Sure, there are some efforts (e.g. this python project; the Google project Magenta, based on Tensorflow and also on Github; or these LSTM blues networks from 2002). But to my knowledge no one had really tried to apply convolutional deep networks to raw music data.

Therefore I downsampled my classical piano library (44 kHz) by a factor of 7 in time (still good enough to preserve the musical structure) and cut it into some 10’000 fragments of 10 sec, which yields musical pieces each with 63’000 data points – this is slightly fewer datapoints than are contained in 256^2 px images, which are commonly used as training material for deep convolutional networks. So I thought this could work as well. However, I did not manage to make my deep convolutional network classify any of my data (e.g., to decide whether a sample was Schubert or Bach), nor did the network manage to dream creatively of music. As so often with deep learning, I did not know the reasons why my network failed.

Now, Google Deepmind has published a paper that focuses on a text-to-speech system based on a deep learning architecture. But it can also be trained using music samples, in order to later make the system ‘dream’ of music. In the Deepmind blog entry you can listen to some 10 sec examples (scroll down to the bottom).

As key to their project, they used not only convolutional filters, but so-called dilated convolutions, thereby being able to span more length-(that is: time-)scales with fewer layers – this really makes sense to me and explains to some extent why I did not get anything with my normal 1d convolutions. (Other reasons why Deepmind’s net performs much better include more computational power, feedforward shortcut connections, non-linear mapping of the 16bit-resolved audio to 8bit for training and possibly other things.)
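Why dilation helps can be seen from a small Python sketch that counts how many input samples a single output can ‘see’. The kernel size of 2, the number of layers and the doubling of the dilation from layer to layer are choices I made for illustration, not a description of Deepmind’s exact configuration.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of stacked causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - j*dilation], with zeros before t = 0.

    'Causal' means that y[t] only depends on x[t] and earlier samples.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


n_layers = 10
plain = receptive_field(2, [1] * n_layers)                       # dilation 1 in every layer
dilated = receptive_field(2, [2 ** i for i in range(n_layers)])  # dilations 1, 2, 4, ..., 512

print("receptive field, normal convolutions:", plain)
print("receptive field, dilated convolutions:", dilated)
```

With the same ten layers, the receptive field grows linearly for normal convolutions, but exponentially when the dilation doubles from layer to layer.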

The authors also mention that it is important to generate the text/music sequence point by point using a causal cut-off for the convolutional filter. This is intuitively less clear to me. I would have expected that musical structure at a certain point in time could very well be determined also by future musical sequences. But who knows what happens in these complex networks and what convergence to a solution looks like.

Another remarkable point is the short memory of the musical hallucinations linked above. After 1-3 seconds, a musical idea has faded because of the exponentially decaying memory; a bigger structure is therefore missing. This can very likely be solved by using networks with dilated convolutions that span 10-100x longer timescales and by subsampling the input data (they apparently did not do this for their model, probably because they wanted to generate naturalistic speech, and not long-term musical structure). With increasing computational power, these problems should be overcome soon. Putting all this together, it seems very likely that in 10 years you can feed the full Bach piano recordings into a deep network, and it will start composing like Bach afterwards, probably better than any human. Or, similar to algorithms for paintings, it will be possible to input a piano piece written by Bach and let a network which has learned different musical styles continuously transform it into Jazz.

On a different note, I was not really surprised to see some sort of convolutional networks excel at hallucinating musical structure (since convolutional filters are designed to interpret structure), but I am surprised to see that they seem to outperform recurrent networks for generation of natural language (this comparison is made in Deepmind’s paper). Long short-term memory recurrent networks (LSTM RNNs, described e.g. on Colah’s blog, invented by Hochreiter & Schmidhuber in ’97) solve the problem of fast forgetting that is inherent in regular recurrent neural networks. I find it a bit disappointing that these problems can also be overcome by blown-up dilated convolutional feed-forward networks, instead of neuron-intrinsic (more or less) intelligent memory in a recurrent network like in LSTMs. The reason for my disappointment is that recurrent networks seem to be more abundant in biological brains (although this is not 100% certain), and I would like to see research in machine learning and neuronal networks also focus on those networks. But let’s see what happens next.

### Update – 30/9/2016 ###

Since I was asked about the piano dataset in the comments, here are a few more words on that topic. First, why did I downsample the recordings by a factor of seven? mp3 recordings are typically encoded at 44.1 kHz, which is roughly twice the upper hearing limit of humans (ca. 20 kHz), as required by the Nyquist criterion. The higher frequencies, however, are costly, but almost unimportant. For example, the frequencies relevant for human speech are well below this limit.

For my dataset, I chose two piano composers of different style, J.S. Bach and F. Schubert. Here is a 10-sec piece by Bach:

And here downsampled to 6.3 kHz:

One can still perceive the structure, and for this project, I was mainly concerned about larger musical structures, not the overtone structures, so this would be totally fine, and I decided for myself that this is roughly the compromise I wanted to make. Now, let’s compress the dynamic range from the standard 16 bit to 8 bit:

This still seems acceptable, although one problem is apparent: the encoding of background silence is not very good, and it seems as if some white noise has been added across all frequencies.

Now, let’s have a look at Schubert, first the unperturbed original:

Then downsampled to 6.3 kHz:

And with reduced dynamic range (8 bit):

Now, it becomes clear that 8 bit is not enough if the dynamic range that is used is large – which is definitely more the case for the typically erratic Schubert sonatas than for Bach prelude recordings. Clarity is difficult to encode properly!

I therefore chose to downsample in time by 7, but keep 16 bit in dynamic range. Deepmind (see above) showed how to compress audio to 8 bit without these losses, by using a non-linear mapping of amplitudes from raw data to 8 bit.

###

Now some details on how I generated the dataset; there are most likely better ways, but here is my improvised solution. I had a couple of folders with relevant mp3s, either Bach or Schubert. In Matlab, I opened each mp3, downsampled the time course by 7, chunked it into 10-sec pieces and saved them as a binary mat file for each mp3 file. The 10-sec pieces are not independent: consecutive pieces are shifted by only 1 sec (and therefore overlap by 9 sec) in order to generate more training data:

And here is the code:

% list of folders
FolderList = dir();

counter_file = 1;
for jj = 3:9 % here, I had 7 folders with different composers; entries 1 and 2 of dir() are usually '.' and '..'

cd(FolderList(jj).name)
% list of mp3s
FileList = dir('*.mp3');

bitelength = 6300; % 1.0 sec; one chunk will be 10 sec

chunk_counter = 0;
for i = 1:numel(FileList)
disp(strcat('mp3 #',32,num2str(i),32,'out of',32, ...
num2str(numel(FileList)),32,'within folder "', ...
FolderList(jj).name,'".'));
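% read the mp3 (y: samples x channels, Fs: sampling rate)
[y,Fs] = audioread(FileList(i).name);
% downsample by factor 7 (average over blocks of 7 samples; assumes stereo)
yy = zeros(floor(size(y,1)/7),2);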
for k = 1:7
yy = yy + y((7:7:end)-k+1,:);
end
yy = yy/7;

% expected number of 10 sec chunks
num_chunks = floor(size(yy,1)/bitelength)-9;

% generate chunked pieces
chunked_song = zeros(bitelength*10,num_chunks,'int16');
for p = 1:num_chunks
piece = yy((1:bitelength*10)+bitelength*(p-1),1);
% 16bit, therefore multiply by 2^15-1=32767 for signed integer
piece = piece*32767;
chunked_song(:,p) = int16(piece);
end
% keep track of number of chunks for this folder
chunk_counter = chunk_counter + num_chunks;
% save as mat file (can be read in Python and Matlab)
save(strcat(FileList(i).name(1:end-4),'_chunks.mat'),'chunked_song');
end
% go back
cd ..
% write a mat file with metadata about the number of chunks contained
% in the respective folder
Chunk_Counter(counter_file).filename = FolderList(jj).name;
Chunk_Counter(counter_file).chun_counter = chunk_counter;
counter_file = counter_file + 1;
end

save('metaData.mat','Chunk_Counter')
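The two core steps of the Matlab script, block-averaging by a factor of 7 and cutting into 10-sec chunks that are shifted by 1 sec, can also be sketched in plain Python. This is not the script from Github but a minimal illustration with function names of my own; file handling, channel selection and the conversion to int16 are left out, and a synthetic tone replaces the mp3.

```python
import math


def downsample_by_averaging(samples, factor=7):
    """Average consecutive blocks of `factor` samples (as in the Matlab loop)."""
    n_out = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n_out)]


def chunk_with_overlap(samples, hop=6300, hops_per_chunk=10):
    """Cut into chunks of hop*hops_per_chunk samples, each shifted by `hop` samples."""
    chunk_len = hop * hops_per_chunk
    n_chunks = max(len(samples) // hop - (hops_per_chunk - 1), 0)
    return [samples[p * hop:p * hop + chunk_len] for p in range(n_chunks)]


# synthetic stand-in for one mp3: 20 sec of a 440 Hz tone at 44.1 kHz
fs_in = 44100
tone = [math.sin(2 * math.pi * 440 * k / fs_in) for k in range(20 * fs_in)]

low_rate = downsample_by_averaging(tone)   # now sampled at 6300 Hz
chunks = chunk_with_overlap(low_rate)      # 10-sec chunks, shifted by 1 sec

print("samples after downsampling:", len(low_rate))
print("number of chunks:", len(chunks), "with", len(chunks[0]), "samples each")
```

As in the Matlab code, a recording of N full seconds yields N - 9 chunks, so the 20-sec tone gives 11 chunks of 63000 samples.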

For training, I wrote a Python script to read in the dataset. Together with the Matlab code, you can find it on Github.

The dataset itself consisted of a complete recording of the piano sonatas by Schubert; the Goldberg variations by Bach with two different pianists; some fugues and preludes by Bach; and then again by Bach some partitas and diverse other recordings. In compressed form, it is ca. 5 GB large.

The deep learning classifier I trained was supposed to learn to assign each 10-sec fragment either to Bach or Schubert; after learning, I would let the network imagine its own music based on its internal structure. However, the network never learned to classify properly, so I had to give up the project.


## Whole-cell patch clamp, part 1: introductory reading

Ever since my interest in neuroscience became more serious, I have been fascinated by the patch clamp technique, especially in its whole-cell configuration. Calcium imaging or multi-channel electrophysiology (recent review) is the way to go in order to get an idea of what a neuronal population is doing on the single-cell level, but these methods miss fast dynamics like bursting, fast oscillations and subthreshold membrane potential dynamics (calcium imaging), or an unambiguous assignment of activity to single neurons (multi-channel ephys). That’s exactly what whole-cell patch clamp can provide (and much more).

Some months ago, I started using the technique on an adult zebrafish brain ex vivo preparation. This image shows a z-stack of a patched cell that was imaged after the electrical recording. The surrounding cells are labeled with GCaMP; the brighter labeling of the patched neuron was done by a fluorophore inside the pipette that was diffusing into the cell, with which the pipette ideally forms a single electrical compartment. The fluorophore fills up the soma and some of the dendrites. The pipette position is shown as an overlay in the right-hand side image.

Electrophysiology is a very unrewarding and difficult activity, compared to calcium imaging. The typical, old-school electrophysiologist is always alone with his rig, through long nights of a never-ending series of failures, interrupted by a few successfully patched and nicely behaving neurons. On average, frustration dominates, no matter how successful he/she is in the end; as a consequence, he fiercely protects his rig from anybody else who wants to touch it and might interfere with the labile stability of his setup. Therefore, over time, he becomes more and more annoyed by any interaction with fellow humans. At least that is what people say about electrophysiologists …

Despite this asocial component, nothing is more encouraging for beginners like me than hearing from others about their struggles with electrophysiology. I will therefore write about my own experience with electrophysiology so far, and although I lack the many years of experience of older electrophysiologists, I share my experience in the hope of encouraging others.

To begin with, here’s a list of useful books and manuals for learning, if one does not have an experienced colleague who shows every single detail:

• Areles Molleman, Patch Clamping: An Introductory Guide to Patch Clamp Electrophysiology
A very short book which does not go into the details e.g. of analog electrical circuits of a cell, but gives useful pragmatic advice and how-to-dos for patching (both single channel and whole-cell). Very useful starting point for the beginner.
• In Labtimes, there’s a 2009 short first-hand report by Steven Buckingham that highlights some of the difficulties of patching and gives precise and concise advice.
• The Axon Guide for Electrophysiology & Biophysics Laboratory Techniques
If you have time for 250 pages of technical descriptions, this is your choice. The document might be quite old, but there haven’t been many revolutions to patching anyway. For several troubleshooting issues, I have found good advice in this document.
• If you are lacking the theoretical background of how neurons, membrane potentials and ions work together, I would recommend online lectures like these slides that have a focus on theoretical underpinnings of measurements and not on measurements and troubleshooting.
• For a more in-depth description of everything related to membrane potentials and ions: Ion Channels of Excitable Membranes (3rd Ed.) by Bertil Hille. It’s 15 years old, but still the best book that I’ve seen so far. Especially for somebody with a physics background, it is very rewarding to read.
• For questions related to applications of patching (and other single neuron-specific tools), I can recommend Dendrites by Stuart, Spruston, Häusser et al., although I have not yet checked the newest, very recent edition (2016).

Soon, I hope that I will have time to write about some more technical aspects of patching.

## The zebrafish, and the other zebrafish

Zebrafish are often used as a model organism for in vivo brain imaging, because they are transparent. Or at least that is what many people think who do not work with zebrafish. In reality, most people use zebrafish larvae for in vivo imaging, typically not older than 5 days (post fertilization). At this developmental stage, the total larval body length is still less than the brain size of the adult fish. After 3-4 weeks, the fish look less like tadpoles and more like fish, measuring 10-15 mm in size (see also video below). They attain the full body length of approx. 25 to 45 mm within 3-4 months.

This video shows a zebrafish larva (7 days old), two adult zebrafish (16 months old) and a juvenile zebrafish (4.5 weeks old).

After 4-5 days, the brain size of larvae exceeds the dimensions that can be imaged with cellular resolution in vivo using light sheet or confocal microscopy when embedded in agarose. After approx. 4 weeks, even for unpigmented fish the thickened skull makes imaging of deeper brain regions very difficult. Superficial brain regions like the tectum are more accessible, but fish of this age are too strong to be restrained by agarose embedding. Brain imaging for adult fish is still possible in ex vivo whole brain preparations [1], but with loss of behavioral readout. Use of toxins for immobilization is an option (e.g. with curare in zebrafish [2] or in other fish species [3]), but not a legal one in some countries, including Switzerland. These are some of the reasons why most people stick to the simple zebrafish larva. My PhD lab is one of the few that does physiology in adult zebrafish.

Recently, I was at the Basel ICON conference, where the recent Nobel laureate Eric Betzig gave an impressive talk on microscopy techniques (including lattice light sheet, SIM and expansion microscopy). Some days ago, I found a similar talk by Eric Betzig (although with less recent results) on YouTube.
The advantages of online videos compared to live talks are obvious, and I wonder why people do not use them more often, both to learn about research and to communicate their own research. Additionally, compared to research papers, the personality of a researcher is much more obvious from a talk – which is important to know for students interested in working in his/her lab. Here is just a small collection of some good talks by neuroscientists which are not as flashy and fancy as TED talks, but much more informative and interesting.

Here is Eve Marder on central pattern generators in lobsters and crabs. She has been developing experiments and models for this seemingly simple system for more than 40 years, thereby exposing the complexity of a system consisting of only 30 neurons.

Ken Harris, a mathematician by training and now more interested in large-scale brain activity recordings in mice, gives a rather technical, but very understandable talk on advances and problems in spike sorting for multielectrode arrays.

Larry Abbott, one of the most well-known theoreticians in neuroscience, with a very interesting talk about experimental findings in the olfactory system of the fruit fly.

Christof Koch on the search for the neuronal correlates of consciousness. In the late 90s, he was one of the pioneers in this field together with Francis Crick; more recently, he is working together with Giulio Tononi.

Edvard Moser on spatial navigation and place cells/grid cells. For this topic he was awarded the Nobel prize 2014.

Haim Sompolinsky with a theoretical perspective on sensory representations. Coming from physics, Haim Sompolinsky helped to transfer the physics of phase transitions to the mathematical modeling of neuronal networks in the late 80s.

Some of the links might be outdated in a couple of years, but I hope that researchers will start uploading more recent and well-prepared talks in the years to come, replacing overcrowded plenary talks by often jet-lagged speakers.

Update [2016-07-20]: Maybe this is the right place to mention a very nice series of podcasts, featuring interviews with leading neuroscientists, e.g. Michael Shadlen or Peter Jonas, or my thesis supervisor in Basel, Rainer Friedrich. Thanks to Anne Urai who posted a link to this webpage on her blog.

## Large field of view microscopes for mouse brain imaging

For typical confocal or two-photon microscopes that maintain (sub)cellular resolution, a high-magnification objective is needed (typically 16x, 20x or 25x). This in turn limits the field of view (FOV) to ⌀ 1.0-1.5 mm.

For imaging in the mouse brain cortex, which is basically a big unwrinkled surface of a size of the order of 10 mm, a bigger FOV would be nice to have for some applications. Recently, a couple of papers came out that tried to increase the FOV, while using optical engineering to maintain the resolution. (Please don’t hesitate to tell me if I missed a relevant publication.)

A few years ago, I would have expected such papers to be published in Nature Methods, but apparently the time has come where optical engineering and improvement of existing techniques is not considered enough for passing the novelty bar. However, the three papers offer some very interesting lessons on engineering a two-photon microscope, of which I want to pick a few:

• The use of large-aperture (15 mm) galvo scanners by Tsai et al. in order to avoid large scanning angles that would create large aberrations. (Thus the design cannot be used with resonant scanners, which have much smaller apertures.) The large beam diameter at the galvos allows the use of a low-magnification scan lens-tube lens pair, which demagnifies the scan angle to a lesser extent. It is important to understand that the scan lens-tube lens telescope magnifies the beam diameter, but at the same time decreases the scan angle by the same factor.
• Due to the extremely large and heavy custom-designed objectives (click here for a picture of the Stirman et al. objective), remote focusing is necessary for fast switching of the axial focus. Stirman et al. use Optotune lenses; Sofroniew, Flickinger et al. take advantage of a remote mirror technique that was developed between 2007 and 2012, but use a voice coil motor for mirror displacement – interestingly, I had converged onto the same solution when I constructed my z-scanning module (see this previous blog post or the dedicated paper).
The Optotune solution by Stirman et al. is in my opinion less well-suited for remote scanning, since resolution cannot be maintained over large z-ranges due to optical issues (although this is not mentioned in the paper). It’s probably good enough for small z-ranges, but it has to be considered in the optical design from the beginning.
• Sofroniew, Flickinger et al. use something they call a virtually conjugated galvo pair (VCGP) to avoid annoying relay optics. I do not understand why they came up with this strange name or whether this design has been used before, but the principle is quite nice.
• Stirman et al. use temporal multiplexing to image two independently chosen locations using separated, delayed light paths, similar to this 2011 paper (also check out this thesis for less polished pictures).
• A chapter that I found interesting to read is “Tolerancing and sensitivity analysis” in the Stirman et al. paper.
• Sofroniew, Flickinger et al. employed oil interfaces close to the PMT surface to increase the photon collection NA.
On a related note, it is interesting to read their speculations about inhomogeneities of the PMT photocathode.
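The trade-off between beam diameter and scan angle in the first point of this list can be written down for an ideal telescope. The sketch below uses the paraxial (small-angle) approximation; apart from the 15 mm beam at the galvos, all focal lengths and angles are hypothetical numbers chosen by me, not values from the papers.

```python
import math


def telescope(beam_diameter_mm, scan_angle_deg, f_scan_mm, f_tube_mm):
    """Ideal scan lens / tube lens telescope in the paraxial approximation.

    Returns the beam diameter and the scan angle behind the tube lens:
    the beam is magnified by M = f_tube / f_scan, the angle is reduced by M.
    """
    magnification = f_tube_mm / f_scan_mm
    return beam_diameter_mm * magnification, scan_angle_deg / magnification


def field_of_view_mm(scan_angle_deg, f_objective_mm):
    """Full field of view for a +/- scan_angle_deg deflection at the objective."""
    return 2 * f_objective_mm * math.tan(math.radians(scan_angle_deg))


# hypothetical example: 15 mm beam at the galvos, +/- 5 deg mechanical-optical scan
beam_out, angle_out = telescope(15.0, 5.0, f_scan_mm=100.0, f_tube_mm=200.0)
fov = field_of_view_mm(angle_out, f_objective_mm=20.0)

print("beam diameter at the objective (mm):", beam_out)
print("scan angle at the objective (deg):", angle_out)
print("field of view (mm):", round(fov, 2))
```

The product of beam diameter and scan angle is the same before and after the telescope, which is why a large beam at the galvos leaves more angle, and therefore more field of view, at the objective.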

The problem with these microscope designs is: how to adapt them to one’s own lab? The goal should be to generate something which does not work for a paper only, but as a reliable and robust tool. I can see two (not mutually exclusive) possibilities for how this could be achieved. The first is transparency, with the free and open distribution of Zemax files and software to anybody who wants it. In this spirit, I liked figure 2 in the paper by Sofroniew, Flickinger et al. a lot, because it clearly shows the optical design a) as a scheme, b) as a CAD drawing, and c) as a real life picture.

A second way would be to license the design to companies like Scientifica, Thorlabs or maybe even smaller spin-offs/start-ups like Neurolabware or Vidrio Technologies. Similar to turn-key femtosecond lasers that revolutionized the field of 2P microscopy when they became available, one can hope for a company that puts together modular units that are stable, robust and working out of the box to enable complex microscopes (with z-scanning, with multi-region scanning, with simultaneous spatially patterned optogenetics, with multiple detection channels, with >1100 nm coatings for excitation, with adaptive optics for deep tissue imaging, etc.) in normal neuroscience labs. I’m probably not the only one who is fascinated by these technologies, but if someone has a neuroscientific question, he does not want to spend his whole life on the development of a high-end microscope. (Moreover, it would not be very rewarding, because this type of engineering and optimization process will not be rewarded by any kind of top-level publication.) Similarly, nobody wants to build his own femtosecond-pulsed laser for 2P imaging these days (although there are always exceptions, for example this one).
