Saturday, 17 June 2017

Prospecting for kryptonite: the value of null results

This blogpost doesn't say anything new – it just uses a new analogy (at least new to me) to make a point about the value of null results from well-designed studies. I was thinking about this after reading this blogpost by Anne Scheel.

Think of science like prospecting for kryptonite in an enormous desert. There's a huge amount of territory out there, and very little kryptonite. Suppose also that the fate of the human race depends crucially on finding kryptonite deposits.

Most prospectors don't find kryptonite. Not finding kryptonite is disappointing: it feels like a lot of time and energy has been wasted, and the prospector leaves empty-handed. But the failure is nonetheless useful: it means that new prospectors won't waste their time looking for kryptonite in places where it doesn't exist. If, however, someone finds kryptonite, everyone gets very excited and there is a stampede to the spot where it was discovered.

Contemporary science works a bit like this, except that the whole process is messed up by reporting bias and poor methods which lead to false information.

To take reporting bias first: suppose the prospector who finds nothing doesn't bother to tell anyone. Then others may come back to the same spot and waste time also finding nothing. Of course, some scientists are like prospectors in that they are competitive and would like to prevent other people from getting useful information. Having a rival bogged down in a blind alley may be just what they want. But where there is an urgent need for new discovery, there needs to be a collaborative rather than competitive approach, to speed up discovery and avoid waste of scarce funds. In this context, null results are very useful.

False information can come from the prospector who declares there is no kryptonite on the basis of a superficial drive through a region. This is like the researcher who does an underpowered study that gets an inconclusive null result. It doesn't allow us to map out the region with kryptonite-rich and kryptonite-empty areas – it just leaves us having to go back and look again more thoroughly. Null results from poorly designed studies are not much use to anyone.

But the worst kind of false information is fool's kryptonite: someone declares they have found kryptonite, but they haven't. So everyone rushes off to that spot to try and find their own kryptonite, only to find they have been deceived. So there are a lot of wasted resources and broken hearts. For a prospector who has been misled in this way, this situation is worse than just not finding any kryptonite, because their hopes have been raised and they may have put a disproportionate amount of effort and energy into pursuing the false information.

Pre-registering a study is the equivalent of a prospector declaring publicly that they are doing a comprehensive survey of a specific region and will declare what they have found, so that the map can gradually be filled in, with no duplication of effort.

Some will say, what about exploratory research? Of course the prospector may hit lucky and find some other useful mineral that nobody had anticipated. If so, that's great, and it may even turn out more important than kryptonite. But the point I want to stress is that the norm for most prospectors is that they won't find kryptonite or anything else. Really exciting findings occur rarely, yet our current incentive structures create the impression that you have to find something amazing to be valued as a scientist.  It would make more sense to reward those who do a good job of prospecting, producing results that add to our knowledge and can be built upon.

I'll leave the last word to Ottoline Leyser, who in an interview for The Life Scientific said: "There's an awful lot of talk about ground-breaking research…. Ground-breaking is what you do when you start a building. You go into a field and you dig a hole in the ground. If you're only rewarded for ground-breaking research, there's going to be a lot of fields with a small hole in, and no buildings."

Sunday, 28 May 2017

Which neuroimaging measures are useful for individual differences research?

The tl;dr version

A neuroimaging measure is potentially useful for individual differences research if variation between people is substantially greater than variation within the same person tested on different occasions. This means that we need to know about the reliability of our measures, before launching into studies of individual differences.
High reliability is not sufficient to ensure a good measure, but it is necessary.

Individual differences research

Psychologists have used behavioural measures to study individual differences - in cognition and personality - for many years. The goal is complementary to psychological research that looks for universal principles that guide human behaviour: e.g. factors affecting learning or emotional reactions. Individual differences research also often focuses on underlying causes, looking for associations with genetic, experiential and/or neurobiological differences that could lead to individual differences.

Some basic psychometrics

Suppose I set up a study to assess individual differences in children’s vocabulary. I decide to look at three measures.
  • Measure A involves asking children to define a predetermined set of words, ordered in difficulty, and scoring their responses by standard criteria.
  • Measure B involves showing the child pictured objects that have to be named.
  • Measure C involves recording the child talking with another child and measuring how many different words they use.
For each of these measures, we’d expect to see a distribution of scores, so we could potentially rank order children on their vocabulary ability. But are the three measures equally good indicators of individual differences?

We can see immediately one problem with Test B: the distribution of scores is bunched tightly, so it doesn’t capture individual variation very well. Test C, which has the greatest spread of scores, might seem the most suitable for detecting individual variation. But spread of scores, while important, is not the only test attribute to consider. We also need to consider whether the measure assesses a stable individual difference, or whether it is influenced by random or systematic factors that are not part of what we want to measure.

There is a huge literature addressing this issue, starting with Francis Galton in the 19th century, with major statistical advances in the 1950s and 1960s (see review by Wasserman & Bracken, 2003). The classical view treats test scores as a compound, with a ‘true score’ part, plus an ‘error’ part. We want a measure that minimises the impact of random or systematic error.
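The true-score-plus-error idea is easy to demonstrate by simulation. The sketch below is in Python rather than the R used for the analyses on this blog, and its numbers (a true-score SD of 15, error SDs of 5, 15 and 30, n = 5000) are illustrative assumptions, not values from the post. The test-retest correlation between two occasions recovers the theoretical reliability, i.e. the proportion of observed variance due to true scores:

```python
import random
import statistics

random.seed(1)

def pearson(x, y):
    """Pearson correlation, computed from scratch to keep the sketch stdlib-only."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

n = 5000
true = [random.gauss(100, 15) for _ in range(n)]  # latent 'true scores'

retest = {}
for error_sd in (5, 15, 30):
    # observed score = true score + fresh random error on each test occasion
    t1 = [t + random.gauss(0, error_sd) for t in true]
    t2 = [t + random.gauss(0, error_sd) for t in true]
    # classical test theory: reliability = true variance / observed variance
    theoretical = 15 ** 2 / (15 ** 2 + error_sd ** 2)
    retest[error_sd] = pearson(t1, t2)
    print(f"error sd {error_sd:2d}: theoretical reliability {theoretical:.2f}, "
          f"observed test-retest r {retest[error_sd]:.2f}")
```

With a small error SD the retest correlation is high; as error swamps the true scores, the same children are ranked quite differently on the two occasions.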

If there is a big influence of random error, then the test score is likely to change from one occasion to the next. Suppose we measure the same children on two occasions a month apart on three new tests, and then plot scores at time 1 vs time 2. (To simplify this example, we assume that all three tests have the same normal distribution of scores, the same as for test A in Figure 1, and that there is an average gain of 10 points from time 1 to time 2.)

Figure 2

We can see that Test F is not very reliable: although there is a significant association between the scores on the two test occasions, individual children can show remarkable changes from one occasion to the next. If our goal is to measure a reasonably stable attribute of the person, then Test F is clearly not suitable.
Just because a test is reliable, it does not mean it is valid. But if it is not reliable, then it won’t be valid. This is illustrated by this nice figure from

What about change scores?

Sometimes we explicitly want to measure change: for instance, we may be more interested in how quickly a child learns vocabulary, rather than how much they know at some specific point in time. Surely, then, we don’t want a stable measure, as it would not identify the change? Wouldn’t test F be better than D or E for this purpose?

Unfortunately, the logic here is flawed. It’s certainly possible that people may vary in how much they change from time to time, but if our interest is in change, then what we want is a reliable measure of change. There has been considerable debate in the psychological literature as to how best to establish the reliability of a change measure, but the key point is that you can find substantial change in test scores that is meaningless, and that the likelihood of it being meaningless is substantial if the underlying measure is unreliable. The data in Figure 2 were simulated by assuming that all children changed by the same amount from Time 1 to Time 2, but that tests varied in how much random error was incorporated in the test score. If you want to interpret a change score as meaningful, then the onus is on you to convince others that you are not just measuring random error.
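The point that apparent individual differences in change can be pure measurement error is easy to show by simulation. In this Python sketch (the 10-point gain matches the earlier example; the error SDs and sample size are illustrative assumptions), every child improves by exactly the same amount, yet the spread of individual change scores grows directly with the unreliability of the test:

```python
import random
import statistics

random.seed(2)
n = 5000
true = [random.gauss(100, 15) for _ in range(n)]
TRUE_GAIN = 10  # every child improves by exactly the same amount

spread_of_change = {}
for error_sd in (3, 15):
    t1 = [t + random.gauss(0, error_sd) for t in true]
    t2 = [t + TRUE_GAIN + random.gauss(0, error_sd) for t in true]
    change = [b - a for a, b in zip(t1, t2)]
    spread_of_change[error_sd] = statistics.stdev(change)
    # the mean change recovers the true gain, but any *individual* variation
    # in change scores here is nothing but measurement error
    print(f"error sd {error_sd:2d}: mean change {statistics.fmean(change):5.1f}, "
          f"sd of individual change scores {spread_of_change[error_sd]:5.1f}")
```

If you saw only the unreliable test's output, you might be tempted to study "why some children improved by 30 points and others got worse", when in this simulation there is nothing to explain.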

What does this have to do with neuroimaging?

My concern with the neuroimaging literature is that measures from functional or structural imaging are often used to study individual differences, yet it is rare to find any mention of the reliability of those measures. In most cases, we simply don't have any data on repeated testing using the same measures - or if we do, the sample size is too small, or too selected, to give a meaningful estimate of reliability. Such data as we have don't inspire confidence that brain measurements achieve the high level of reliability that is aimed for in psychometric tests. This does not mean that these measures are not useful for other kinds of research, but it does make them unsuited to the study of individual differences.

I hesitated about blogging on this topic, because nothing I am saying here is new: the importance of reliability has been established in the literature on measurement theory since 1950. Yet, when different subject areas evolve independently, it seems that methodological practices that are seen as crucial in one discipline can be overlooked in another that is rediscovering the same issues but with different metrics.

There are signs that things are changing, and we are seeing a welcome trend for neuroscientists to start taking reliability seriously. I started thinking about blogging on this topic just a couple of weeks ago after seeing some high-profile papers that exemplified the problems in this area, but in that period, there have also been some nice studies that are starting to provide information on the reliability of neuroscience measures. This might seem like relatively dull science to many, but to my mind it is a key step towards incorporating neuroscience in the study of individual differences. As I commented on Twitter recently, my view is that anyone who wants to use a neuroimaging measure as an endophenotype should first be required to establish that it has adequate reliability for that purpose.

Further reading

This review by Dubois and Adolphs (2016) covers the issue of reliability and much more, and is highly recommended.
Other recent papers of relevance:
Geerligs, L., Tsvetanov, K. A., Cam-CAN, & Henson, R. N. (2017). Challenges in measuring individual differences in functional connectivity using fMRI: The case of healthy aging. Human Brain Mapping.
Nord, C. L., Gray, A., Charpentier, C. J., Robinson, O. J., & Roiser, J. P. (2017). Unreliability of putative fMRI biomarkers during emotional face processing. NeuroImage.

Note: Post updated on 17th June 2017 because figures from R Markdown html were not displaying correctly on all platforms.

Monday, 1 May 2017

Reproducible practices are the future for early career researchers

This post was prompted by an interesting exchange on Twitter with Brent Roberts (@BrentWRoberts) yesterday. Brent had recently posted a piece about the difficulty of bringing about change to improve reproducibility in psychology, and this had led to some discussion about what could be done to move things forward. Matt Motyl (@mattmotyl) tweeted:

I had one colleague tell me that sharing data/scripts is "too high a bar" and that I am wrong for insisting all students who work w me do it

And Brent agreed:

We were recently told that teaching our students to pre-register, do power analysis, and replicate was "undermining" careers.

Now, as a co-author of a manifesto for reproducible science, this kind of thing makes me pretty cross, and so I weighed in, demanding to know who was issuing such rubbish advice. Brent patiently explained that most of his colleagues take this view and are skeptics, agnostics or just naïve about the need to tackle reproducibility. I said that was just shafting the next generation, but Brent replied:

Not as long as the incentive structure remains the same.  In these conditions they are helping their students.

So things have got to the point where I need more than 140 characters to make my case. I should stress that I recognise that Brent is one of the good guys, who is trying to make a difference. But I think he is way too pessimistic about the rate of progress, and far from 'helping' their students, the people who resist change are badly damaging them.  So here are my reasons.

1.     The incentive structure really is changing. The main drivers are funders, who are alarmed that they might be spending their precious funds on results that are not solid. In the UK, funders (Wellcome Trust and Research Councils) were behind a high profile symposium on Reproducibility, and subsequently have issued statements on the topic and started working to change policies and to ensure their panel members are aware of the issues. One council, the BBSRC, funded an Advanced Workshop on Reproducible Methods this April. In the US, NIH has been at the forefront of initiatives to improve reproducibility. In Germany, Open Science is high on the agenda.
2.     Some institutions are coming on board. They react more slowly than funders, but where funders lead, they will follow. Some nice examples of institution-wide initiatives toward open, reproducible science come from the Montreal Neurological Institute and the Cambridge MRC Cognition and Brain Sciences Unit. In my own department, Experimental Psychology at the University of Oxford, our Head of Department has encouraged me to hold a one-day workshop on reproducibility later this year, saying she wants our department to be at the forefront of improving psychological science.

3.     Some of the best arguments for working reproducibly have been made by Florian Markowetz. You can read about them on this blog, see him give a very entertaining talk on the topic here, or read the published paper here. So there is no escape. I won't repeat his arguments here, as he makes them better than I could, but his basic point is that you don't need to do reproducible research for ideological reasons: there are many selfish arguments for adopting this approach – in the long run it makes your life very much easier.

4.     One point Florian doesn't cover is pre-registration of studies. The idea of a 'registered report', where your paper is evaluated, and potentially accepted for publication, on basis of introduction and methods was introduced with the goal of improving science by removing publication bias, p-hacking and HARKing (hypothesising after results are known). You can read about it in these slides by Chris Chambers. But when I tried this with a graduate student, Hannah Hobson, I realised there were other huge benefits. Many people worry that pre-registration slows you down. It does at the planning stage, but you more than compensate for that by the time saved once you have completed the study. Plus you get reviewer comments at a point in the research process when they are actually useful – i.e. before you have embarked on data collection. See this blogpost for my personal experience of this.

5.     Another advantage of registered reports is that publication does not depend on getting a positive result. This starts to look very appealing to the hapless early career researcher who keeps running experiments that don't 'work'. Some people imagine that this means the literature will become full of boring registered reports with null findings that nobody is interested in. But because that would be a danger, journals who offer registered reports impose a high bar on papers they accept – basically, the usual requirement is that the study is powered at 90%, so that we can be reasonably confident that a negative result is really a null finding, and not just a type II error. But if you are willing to put in the work to do a well-powered study, and the protocol passes scrutiny of reviewers, you are virtually guaranteed a publication.

6.     If you don't have time or inclination to go the whole hog with a registered report, there are still advantages to pre-registering a study, i.e. depositing a detailed, time-stamped protocol in a public archive. You still get the benefits of establishing priority of an idea, as well as avoiding publication bias, p-hacking, etc. And you can even benefit financially: the Open Science Framework is running a pre-registration challenge – they are giving $1000 to the first 1000 entrants who succeed in publishing a pre-registered study in a peer-reviewed journal.

7.     The final advantage of adopting reproducible and open science practices is that it is good for science. Florian Markowetz does not dwell long on the argument that it is 'the right thing to do', because he can see that it has as much appeal as being told to give up drinking and stop eating Dunkin Donuts for the sake of your health. He wants to dispel the idea that those who embrace reproducibility are some kind of altruistic idealists who are prepared to sacrifice their careers to improve science. Given arguments 1-6, he is quite right: you don't need to be idealistic to be motivated to adopt reproducible practices. But it is nice when one's selfish ambitions can be aligned with the good of the field. Indeed, I've long suspected that this tension may relate to the growing rates of mental health problems among graduate students and postdocs: many people who go into science start out with high ideals, but are made to feel they have to choose between doing things properly and succeeding by cutting corners, over-hyping findings, or telling fairy tales in grant proposals. The reproducibility agenda provides a way of continuing to do science without feeling bad about yourself.

Brent and Matt are right that we have a problem with the current generation of established academic psychologists, who are either hostile to or unaware of the reproducibility agenda.  When I give talks on this topic, I get instant recognition of the issues by early career researchers in the audience, whereas older people can be less receptive. But what we are seeing here is 'survivor bias'. Those who are in jobs managed to succeed by sticking to the status quo, and so see no need for change. But the need for change is all too apparent to the early career researcher who has wasted two years of their life trying to build on a finding that turns out to be a type I error from an underpowered, p-hacked study. My advice to the latter is don't let yourself be scared by dire warnings of the perils of working reproducibly. Times really are changing and if you take heed now, you will be ahead of the curve.

Sunday, 23 April 2017

Sample selection in genetic studies: impact of restricted range

I'll shortly be posting a preprint about methodological quality of studies in the field of neurogenetics. It's something I've been working on with a group of colleagues for a while, and we are aiming to make recommendations to improve the field.

I won't go into details here, as you will be able to read the preprint fairly soon. Instead, what I want to do here is to expand on a small point that cropped up as I looked at this literature, and which I think is underappreciated.

It's to do with sampling. There's a particular problem that I started to think about a while back when I heard someone give a talk about a candidate gene study. I can't remember who it was or even what the candidate gene was, but basically they took a bunch of students, genotyped them, and then looked for associations between their genotypes and measures of memory. They were excited because they found some significant results. But I was, as usual, sitting there thinking convoluted thoughts about all of this, and wondering whether it really made sense. In particular, if you have a common genetic variant that has such a big effect on memory, would this really show up in a bunch of students – who are presumably people who have pretty good memories? Wouldn't it rather be the case that what you'd expect would be an alteration in the frequencies of genotypes in the student population?

Whenever I have an intuition like that, I find the best thing to do is to try a simulation. Sometimes the intuition is confirmed, and sometimes things turn out differently and, very often, more complicated than expected.

But this time, I'm pleased to say my intuition seems to have something going for it.

So here's the nuts and bolts.

I simulated genotypes and associated phenotypes by just using R's nice mvrnorm function. For the examples below, I specified that a and A are equally common (i.e. minor allele frequency is .5), so we have 25% as aa, 50% as aA, and 25% as AA. The script lets you specify how closely genotype relates to phenotype, but from what we know about genetics, it's very unlikely that a common variant would have a correlation with the phenotype of more than about .25.

We can then test for two things:
1)  How far does the distribution of genotypes in the sample (i.e. people who are aa, aA or AA) resemble that in the general population? If we know that MAF is .5, we expect this distribution to be 1:2:1.
2) We can assign each person a score corresponding to number of A alleles (coding aa as zero, aA as 1, and AA as 2) and look at the regression of the phenotype on the genotype. That's the standard approach to looking for genotype-phenotype association.
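The original script uses R's mvrnorm, but the two tests can be sketched in plain Python. This is a minimal simulation under the assumptions stated above (MAF of .5, genotype-phenotype correlation of .25); the variable name EFFECT and the sample size are my illustrative choices, not values from the post's script:

```python
import random
import math

random.seed(3)
n = 10000
EFFECT = 0.25  # assumed genotype-phenotype correlation, as in the post

def pearson(x, y):
    mx = sum(x) / len(x); my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# genotype = number of A alleles; two independent alleles with MAF .5
# give the expected 1:2:1 proportions of aa, aA, AA
genotype = [random.randint(0, 1) + random.randint(0, 1) for _ in range(n)]
# phenotype: standardized score with EFFECT^2 of its variance from genotype
phenotype = [EFFECT * (g - 1) / math.sqrt(0.5) +
             math.sqrt(1 - EFFECT ** 2) * random.gauss(0, 1)
             for g in genotype]

# Test 1: chi-square on genotype counts against the expected 1:2:1 ratio
counts = [genotype.count(k) for k in (0, 1, 2)]
expected = [n * p for p in (0.25, 0.5, 0.25)]
chisq = sum((o - e) ** 2 / e for o, e in zip(counts, expected))

# Test 2: association between allele count and phenotype
r = pearson(genotype, phenotype)
print(f"counts {counts}, chi-square {chisq:.2f} (.05 cutoff with df=2 is 5.99)")
print(f"genotype-phenotype correlation r = {r:.3f}")
```

In the unselected population, the genotype counts fit 1:2:1 and the regression recovers the specified association.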

If we work with the whole population of simulated data, these values will correspond to those that we specified in setting up the simulation, provided we have a reasonably large sample size.

But what if we take a selective sample of cases who fall above some cutoff on the phenotype? This is equivalent to taking, for instance, a sample from a student population from a selective institution, when the phenotype is a measure of cognitive function. You're not likely to get into the institution unless you have a good cognitive ability. Then, working with this selected subgroup, we recompute our two measures, i.e. the proportions of each genotype, and the correlation between the genotype and the phenotype.

Now, the really interesting thing here is that, as the selection cutoff gets more extreme, two things happen:
a) The proportions of people with different genotypes starts to depart from the values expected for the population in general. We can test to see when the departure becomes statistically significant with a chi square test.
b) The regression of the phenotype on the genotype weakens. We can quantify this effect by just computing the p-value associated with the correlation between genotype and phenotype.
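Both effects can be seen together in a Python sketch (the post's actual script, simulating genopheno cutoffs.R, is in R; the cutoffs 0 and .5 come from the post, while the 1.0 cutoff and the sample size are my own illustrative additions). As the cutoff rises, the genotype-phenotype correlation shrinks while the chi-square statistic for departure from the 1:2:1 genotype ratio grows:

```python
import random
import math

random.seed(4)
n = 20000
EFFECT = 0.25  # assumed genotype-phenotype correlation, as in the post

def pearson(x, y):
    mx = sum(x) / len(x); my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

genotype = [random.randint(0, 1) + random.randint(0, 1) for _ in range(n)]
phenotype = [EFFECT * (g - 1) / math.sqrt(0.5) +
             math.sqrt(1 - EFFECT ** 2) * random.gauss(0, 1)
             for g in genotype]

results = {}
for cutoff in (None, 0.0, 0.5, 1.0):  # None = unselected sample
    keep = [i for i in range(n) if cutoff is None or phenotype[i] > cutoff]
    g = [genotype[i] for i in keep]
    p = [phenotype[i] for i in keep]
    counts = [g.count(k) for k in (0, 1, 2)]
    expected = [len(g) * q for q in (0.25, 0.5, 0.25)]
    # departure of genotype counts from the 1:2:1 population expectation
    chisq = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    results[cutoff] = (pearson(g, p), chisq)
    print(f"cutoff {cutoff}: n={len(g):5d}, "
          f"r={results[cutoff][0]:.3f}, chi-square={results[cutoff][1]:.1f}")
```

The restriction of range weakens the regression, while the genotype frequencies in the selected sample drift away from 1:2:1, which is the cross-over described in the post.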

Figure 1: Genotype-phenotype associations for samples selected on phenotype

Figure 1 shows the mean phenotype scores for each genotype for three samples: an unselected sample, a sample selected with z-score cutoff zero (corresponding to the top 50% of the population on the phenotype) and a sample selected with z-score cutoff of .5 (roughly selecting the top third of the population).

It's immediately apparent from the figure that the selection dramatically weakens the association between genotype and phenotype. In effect, we are distorting the relationship between genotype and phenotype by focusing just on a restricted range. 

Figure 2: Comparison of p-values from conventional regression approach and chi square test on genotype frequencies in relation to sample selection

Figure 2 shows the data from another perspective, by considering the statistical results from a conventional regression analysis when different z-score cutoffs are used, selecting an increasingly extreme subset of the population. If we take a cutoff of zero – in effect selecting just the top half of the population – the regression effect (predicting phenotype from genotype), shown in the blue line, which was strong in the full population, is already much reduced. If you select only people with z-scores of .5 or above (equivalent to an IQ score of around 108), then the regression is no longer significant. But notice what happens to the black line. This shows the p-value from a chi square test which compares the distribution of genotypes in each subsample with the expected population values. If there is a true association between genotype and phenotype, then the greater the selection on the phenotype, the more the genotype distribution departs from expected values. The specific patterns observed will depend on the true association in the population and on the sample size, but this kind of cross-over is a typical result.

So what's the moral of this exercise? Well, if you are interested in a phenotype that has a particular distribution in the general population, you need to be careful when selecting a sample for a genetic association study. If you pick a sample that has a restricted range of phenotypes relative to the general population, then you make it less likely that you will detect a true genetic association in a conventional regression analysis. In fact, if you take a selected sample, there comes a point when the optimal way to demonstrate an association is by looking for a change in the frequency of different genotypes in the selected population vs the general population.

No doubt this effect is already well-known to geneticists, and it's all pretty obvious to anyone who is statistically savvy, but I was pleased to be able to quantify the effect via simulations. It is clear that it has implications for those who work predominantly with selected samples such as university students. For some phenotypes, use of a student sample may not be a problem, provided they are similar to the general population in the range of phenotype scores. But for cognitive phenotypes that's very unlikely, and attempting to show genetic effects in such samples seems a doomed enterprise.

The script for this simulation, simulating genopheno cutoffs.R, should be available here:

(This link updated on 29/4/17).

Sunday, 5 March 2017

Advice for early career researchers re job applications: 1. Work 'in preparation'

I posted a couple of tweets yesterday giving my personal view of things to avoid when writing a job application. These generated a livelier debate than I had anticipated, and made me think further about the issues I'd raised. I've previously blogged about getting a job as a research assistant in psychology; this piece is directed more at early career researchers aiming for a postdoc or first lectureship. I'll do a separate post about issues raised by my second tweet – inclusion of more personal information in your application. Here I'll focus on this one: 
  • Protip for job applicants: 3+ 1st author 'in prep' papers suggests you can't finish things AND that you'll be distracted if appointed

I've been shortlisting for years, and there has been a noticeable trend for publication lists to expand to include papers that are 'in preparation' as well as those that are 'submitted' or 'under review'. One obvious problem with these is that it's unclear what they refer to: they could be nearly-completed manuscripts or a set of bullet points. 
My tweet was making the further point that you need to think of the impression you create in the reader if you have five or six papers 'in preparation', especially if you are first author. My guess is that most applicants think that this will indicate their activity and productivity, but that isn't so. I'd wonder whether this is someone who starts things and then can't finish them. I'd also worry that if I took the applicant on, the 'in preparation' papers would come with them and distract them from the job I had employed them to do. I've blogged before about the curse of the 'academic backlog'. While I am sympathetic about supporting early researchers in getting their previous work written up, I'd be wary of taking on someone who had already accumulated a large backlog right at the start of their career.

Many people who commented on this tweet supported my views:
  • @MdStockbridge We've been advised never to list in prep articles unless explicitly asked in the context of post doc applications. We were told it makes one looks desperate to "fill the space."
  •  @hardsci I usually ignore "in prep" sections, but to me more than 1-2 items look like obvious vita-padding
  • @larsjuhljensen "In prep" does not count when I read a CV. The slight plus of having done something is offset by inability to prioritize content.
  • @Russwarne You can say anything is "in preparation." My Nobel acceptance speech is "in preparation." I ignore it.
  • @DuncanAstle I regularly see CVs with ~5 in prep papers... to be honest I don't factor them into my appraisal.
  • @UnhealthyEcon I'm wary if i see in-prep papers at all. Under review papers would be different.
  • @davidpoeppel Hey peeps in my labs: finish your papers! Run -don't walk -back to your desks! xoxo David. (And imho, never list any in prep stuff on CV...)
  • @janhove 'Submitted' is all right, I think, if turn arounds in your field are glacial. But 'in prep' is highly non-committal.

Others, though, felt this was unfair, because it meant that applicants couldn't refer to work that may be held up by forces beyond their control: 
  • @david_colquhoun that one seems quite unfair -timing is often beyond ones's control
  • @markwarschauer I disagree completely. The more active job applicants are in research & publishing the better.
  • @godze786  if it's a junior applicant it may also mean other authors are holding up. Less power when junior
  • @tremodian All good except most often fully drafted papers are stuck in senior author hell and repeated prods to release them often do nothing.
 But then, this very useful suggestion came up:  
  • @DrBrocktagon But do get it out as preprint and put *that* on CV
  • @maxcoltheart Yes. Never include "in prep" papers on cv/jobapp. Or "submitted" papers? Don't count since they may never appear? Maybe OK if ARKIVed
The point here is that if you deposit your manuscript as a preprint, then it is available for people to read. It is not, of course, peer-reviewed, but for a postdoc position, I'd be less interested in counting peer-reviewed papers than in having the opportunity to evaluate the written work of the applicant. Preprints allow one to do that. And it can be effective:
  • @BoyleLab we just did a search and one of our candidates did this. It helped them get an interview because it was a great paper
But, of course, there's a sting in the tail: once something is a preprint it will be read by others, including your shortlisting committee, so it had better be as good as you can get it. So the question came up, at what point would you deposit something as a preprint? I put out this question, and Twitter came back with lots of advice:
  • @michaelhoffman Preprint ≠ "in prep". But a smart applicant should preprint any of their "submitted" manuscripts.
  • @DoctorZen The term "pre-print" itself suggests an answer. Pre-prints started life as accepted manuscripts. They should not be rough drafts.
  • @serjepedia these become part of your work record. Shoddiness could be damaging.
  • @m_wall I wouldn't put anything up that hadn't been edited/commented by all authors, so basically ready to submit.
  • @restokin If people are reading it to decide if they should give you a job, it would have to be pretty solid. 
All in all, I thought this was a productive discussion. It was clear that many senior academics disregard lists of research outputs that are not in the public domain. Attempts to pad out the CV are counterproductive and create a negative impression. But if work is written up to a point where it can be (or has been) submitted, there's a clear advantage to the researcher in posting it as a preprint, which makes it accessible. It doesn't guarantee that a selection committee will look at it, but it at least gives them that opportunity.

Thursday, 23 February 2017

Barely a good word for Donald Trump in Houses of Parliament

I am beginning to develop an addiction to Hansard, the public record of debates in the House of Commons and the House of Lords. It's a fascinating public record of how major political decisions are debated, and I feel fortunate to live in a country where it is readily available on the internet the day after a debate.

The debate on Donald Trump's state visit was particularly interesting, because it was prompted by a public petition signed by 1.85 million people, which read:

Donald Trump should be allowed to enter the UK in his capacity as head of the US Government, but he should not be invited to make an official State Visit because it would cause embarrassment to Her Majesty the Queen.

I've been taking a look at the debate from 20th February, which divided neatly down party lines, with the Conservatives and a single DUP member supporting the state visit, and everyone else (Labour, Lib Dems, SNP and Green) opposing it.

A notable point about the defenders of the State Visit is that virtually none of them attempted to defend Trump himself. The case that speaker after speaker made was that we should invite Trump despite his awfulness. Indeed, some speakers argued that we'd invited other awful people before – Emperor Hirohito, President Ceausescu, Xi Jinping and Robert Mugabe – so we would be guilty of double standards if we did not invite Trump as well.

It was noted, however, that this argument did not hold much water, as none of these other invitees had been extended this honour within a week of being elected, and other far less controversial US presidents had never had a State Visit.

The principal argument used to support the government's position was a pragmatic one: it will be to the benefit of the UK if we work with the US, our oldest ally. That way we may be able to influence him, and also to achieve good trade deals. Dr Julian Lewis (Con) went even further, and suggested that by cosying up to Trump we might be able to avert World War 3:

…given he is in some doubt about continuing the alliance that prevented world war three and is our best guarantee of world war three not breaking out in the 21st century, do they really think it is more important to berate him, castigate him and encourage him to retreat into some sort of bunker, rather than to do what the Prime Minister did, perhaps more literally than any of us expected, and take him by the hand to try to lead him down the paths of righteousness? I have no doubt at all about the matter.

He continued:
What really matters to the future of Europe is that the transatlantic alliance continues and prospers. There is every prospect of that happening provided that we reach out to this inexperienced individual and try to persuade him – there is every chance of persuading him – to continue with the policy pursued by his predecessors.

I can't imagine this is an argument that would be appreciated by Trump, as it manages to be both patronising and insulting at the same time.

The closest anyone dared come to being positive about Trump was when Nigel Evans (Con) said:

We might not like some of the things he says. I certainly do not like some of what he has said in the past, but I respect the fact that he is now delivering the platform on which he stood. He will go down in history as the only politician roundly condemned for delivering on his promises. I know this is a peculiar thing in the politics we are used to here – politicians standing up for something and delivering – but that is what Trump is doing.

But most of those supporting the visit did so while attempting to distance themselves from Trump's personal characteristics, e.g. Gregory Campbell (DUP):

My view is that Candidate Trump and Mr Trump made some deplorable and vile comments, which are indefensible - they cannot be defended morally, politically or in any other way - but he is the democratically elected President of the United States of America.

Others made the point in rather mild and general terms, e.g. Anne Main (Con):

Any of us who have particular concerns about some of President Trump's pronouncements are quite right to have them; I object completely to some of the things that have been said.

If we turn to the comments made by the speakers who opposed the state visit, then they were considerably more vivid in the negative language they used to portray Trump, with many focusing on the less savoury aspects of his character:
Paul Flynn (Lab) referred to the 'cavernous depths of his scientific ignorance'. Others picked up on Trump's statements on women, Muslims, the LGBT community, torture, and the press:

I think of my five-year-old daughter when I reflect on a man who considers it okay to go and grab pussy, a man who considers it okay to be misogynistic towards the woman he is running against. Frankly, I cannot imagine a leader of this country, of whatever political stripe, behaving in that manner. David Lammy (Lab)

President Trump's Administration so far has been characterised by ignorance and prejudice, seeking to ban Muslims and deny refuge to people fleeing from war and persecution. Kirsten Oswald (SNP)

Even if one were the ultimate pragmatist for whom the matters of equality or of standing against torture, racism and sexism do not matter, giving it all up in week 1 on a plate with no questions asked would not be a sensible negotiating strategy. Stephen Doughty (Lab)

I fought really hard to be elected. I fought against bigotry, sexism and the patriarchy to earn my place in this House. By allowing Donald Trump a state visit and bringing out the china crockery and the red carpet, we endorse all those things that I fought hard against and say, "Do you know what? It's okay." Naz Shah (Lab)

Let me conclude by saying that in my view, Mr Trump is a disgusting, immoral man. He represents the very opposite of the values we hold and should not be welcome here. Daniel Zeichner (Lab)

We are told that Trump is very thin-skinned and gets furious when criticised. It is also said that he doesn't read much, but gets most of his news from social media and cable TV, and is kept happy insofar as his staff feed him only positive media stories. If so, then I guess there is a possibility his team will somehow keep Hansard away from him, and the visit will go ahead. But it's hard to see how it could possibly succeed if he becomes aware of the disdain in which he is held by Conservative MPs as well as the Opposition. They have made it abundantly clear that the offer of a state visit is not intended to honour him. Rather they regard him as a petulant but dangerous despot, who might be bribed to behave well by the offer of some pomp and ceremony.

The petition to withdraw the invitation has been voted down, but it has nevertheless succeeded in forcing the Conservatives to make public just how much they despise the US President.

Saturday, 18 February 2017

The alt-right guide to fielding conference questions

After watching this interview between BBC Newsnight's Evan Davis and Sebastian Gorka, Deputy Assistant to Donald Trump, I realised I'd been handling conference questions all wrong. Gorka, who is a former editor of Breitbart News, gives a virtuoso performance that illustrates every trick in the book for coming out on top in an interview: smear the questioner, distract from the question, deny the premises, and question the motives behind a difficult question. Do everything, in fact, except give a straight answer. Here's what a conference Q&A session might look like if we all mastered these useful techniques.

ED: Dr Gorka, you claim that you can improve children's reading development using a set of motor exercises. But the data you showed on slide 3 don't seem to show that.

SG: That question is typical of the kind of bias from people working at British Universities. You seem hell-bent on discrediting any view that doesn't agree with your own preconceived position.

ED: Er, no. I just wondered about slide 3. Is the difference between those two numbers statistically significant?

SG: Why are people like you so obsessed with trivial details? Here we are showing marvellous improvements in children's reading, and all you can do is to pick away at a minor point.

ED: Well, you could answer the question? Are those numbers significantly different?

SG: It's not as if you and your colleagues have any expertise in statistics. The last talk by your colleague Dr Smith was full of mistakes. She actually did a parametric test in a situation that called for a nonparametric test.

ED: But can we get back to the question of whether your intervention had a significant effect.

SG: Of course it did. It's an enormous effect. And that's only part of the data. I've got lots of other numbers that I haven't shown here. And as for slide 3, just look at those bars: the red one is much higher than the blue one.

ED: But where are the error bars?

SG: That's just typical of you. Always on the attack. Look at the language you are using. I show you all the results in a nice bar chart, and all you can do is talk about error. Don't you ever think of anything else?

ED: Well, I can see we aren't going to get anywhere with that question, so let me try another one. Your co-author, Dr Trump, said that the children in your study all had dyslexia, whereas in your talk you said they covered the whole range of reading ability. That's rather confusing. Can you tell us which version is correct?

SG: There you go again. Always trying to pick holes in everything we do. Seems you're just jealous because your own reading programs don't have anything like this effect.

ED: But don't you think it discredits your study if you can't give a straight answer to a simple question?

SG: So this is what we get, ladies and gentlemen. All the time. Fake challenges and attempts to discredit us.

ED: Well, it's a straightforward question. Were they dyslexic or not?

SG: Some of them were, and some of them weren't.

ED: How many? Dr Trump said all of them were dyslexic.

SG: You'll have to ask him. I've got parents falling over themselves to get their children enrolled, and I really don't have time for this kind of biased questioning.

Chair: Thank you Dr Gorka. We have no more time for questions.

Friday, 17 February 2017

We know what's best for you: politicians vs. experts

I regard politicians as a much-maligned group. The job is not, after all, particularly well paid, when you consider the hours that they usually put in, the level of scrutiny they are subjected to, and the high-stakes issues they must deal with. I therefore start with the assumption that most of them go into politics because they feel strongly about social or economic issues and want to make a difference. Although being a politician gives you some status, it also inevitably means you will be subjected to abuse or worse. The murder of Jo Cox led to a brief lull in the hostilities, but it's resumed with a vengeance as politicians continue to grapple with issues that divide the nation and that people feel strongly about. It seems inevitable, then, that anyone who stays the course must have the hide of a rhinoceros, and so by a process of self-selection, politicians are a relatively tough-minded lot.

I fear, though, that in recent years, as the divisions between parties have become more extreme, so have the characteristics of politicians. One can admire someone who sticks to their principles in the face of hostile criticism; but what we now have are politicians who are stubborn to the point of pig-headedness, and simply won't listen to evidence or rational argument. So loath are they to appear wavering, that they dismiss the views of experts.

This was most famously demonstrated by the previous justice secretary, Michael Gove, who, when asked if any economists backed Brexit, replied "people in this country have had enough of experts". This position is continued by Theresa May as she goes forth in the quest for a Hard Brexit.

Then we have the case of the Secretary of State for Health, Jeremy Hunt, who has repeatedly ignored expert opinion on the changes he has introduced to produce a 'seven-day NHS'. The evidence he cited for the need for the change was misrepresented, according to the authors of the report, who were unhappy with how their study was being used. The specific plans Hunt proposed were described as 'unfunded, undefined and wholly unrealistic' by the British Medical Association, yet he pressed on.

At a time when the NHS is facing staff shortages, and as Brexit threatens to reduce the number of hospital staff from the EU, he has introduced measures that have led to demoralisation of junior doctors. This week he unveiled a new rota system that has a mix of day and night shifts that had doctors, including experts in sleep, up in arms. It was suggested that this kind of rota would not be allowed in the aviation industry, and is likely to put the health of doctors as well as patients at risk.
A third example comes from academia, where Jo Johnson, Minister of State for Universities, Science, Research and Innovation, steadfastly refuses to listen to any criticisms of his Higher Education and Research Bill, either from academics or from the House of Lords. Just as with Hunt and the NHS, he starts from fallacious premises – the idea that teaching is often poor, and that students and employers are dissatisfied – and then proceeds to introduce measures that are designed to fix the apparent problem, but which are more likely to damage a Higher Education system which, as he notes, is currently the envy of the world. The use of the National Student Survey as a metric for teaching excellence has come under particularly sharp attack – not just because of poor validity, but also because the distribution of scores makes it unsuited for creating any kind of league table: a point that has been stressed by the Royal Statistical Society, the Office for National Statistics, and most recently by Lord Lipsey, joint chair of the All Party Statistics Group.

Johnson's unwillingness to engage with the criticism was discussed recently at the Annual General Meeting of the Council for Defence of British Universities (where Martin Wolf gave a dazzling critique of the Higher Education and Research Bill from an expert economics perspective).  Lord Melvyn Bragg said that in years of attending the House of Lords he had never come across such resistance to advice. I asked whether anyone could explain why Johnson was so obdurate. After all, he is presumably a highly intelligent man, educated at one of our top Universities. It's clear that he is ideologically committed to a market in higher education, but presumably he doesn't want to see the UK's international reputation downgraded, so why doesn't he listen to the kind of criticism put forward in the official response to his plans by Cambridge University? I don't know the answer, but there are two possible reasons that seem plausible to me.

First, those who are in politics seldom seem to understand the daily life of people affected by the Bills they introduce. One senior academic told me that Oxford and Cambridge in particular do themselves a disservice when they invite senior politicians to an annual luxurious college feast, in the hope of gaining some influence. The guest may enjoy the exquisite food and wine, but they go away convinced that all academics are living the high life, and give only the occasional lecture between bouts of indulgence. Any complaints are thus seen as coming from idle dilettantes who are out of touch with the real world and alarmed at the idea they may be required to do serious work. Needless to say, this may have been accurate in the days of Brideshead Revisited, but it could not be further from the truth today – in Higher Education Institutions of every stripe, academics work longer hours than the average worker (though fewer, it must be said, than the hard-pressed doctors).

Second, governments always want to push things through because if they don't, they miss a window of opportunity during their period in power. So there can be a sense of, let's get this up and running and worry about the detail later. That was pretty much the case made by David Willetts when the Bill was debated in the House of Lords:

These are not perfect measures. We are on a journey, and I look forward to these metrics being revised and replaced by superior metrics in the future. They are not as bad as we have heard in some of the caricatures of them, and in my experience, if we wait until we have a perfect indicator and then start using it, we will have a very long wait. If we use the indicators that we have, however imperfect, people then work hard to improve them. That is the spirit with which we should approach the TEF today.

However, that is little comfort to those who might see their University go out of business while the problems are fixed. As Baroness Royall said in response:

My Lords, the noble Lord, Lord Willetts, said that we are embarking on a journey, which indeed we are, but I feel that the car in which we will travel does not yet have all the component parts. I therefore wonder if, when we have concluded all our debates, rather than going full speed ahead into a TEF for everybody who wants to participate, we should have some pilots. In that way the metrics could be amended quite properly before everybody else embarks on the journey with us.

Much has been said about the 'post-truth' age in which we now live, where fake news flourishes and anyone's opinion is as good as anyone else's. If ever there was a need for strong universities as a source of reliable, expert evidence, it is now. Unless academics start to speak out to defend what we have, it is at risk of disappearing.

For more detail of the case against the TEF, see here.

Sunday, 8 January 2017

A common misunderstanding of natural selection

My attention was drawn today to an article in the Atlantic, entitled ‘Why Do Humans Still Have a Gene That Increases the Risk of Alzheimer’s?’ It noted that there are variants of the apolipoprotein gene that are associated with an 8- to 12-fold increased risk of the disease. It continued:
“It doesn’t make sense,” says Ben Trumble, from Arizona State University. “You’d have thought that natural selection would have weeded out ApoE4 a long time ago. The fact that we have it at all is a little bizarre.”

The article goes on to discuss research suggesting there might be some compensating advantage to the Alzheimer risk gene variants in terms of protection from brain parasites.

That is as may be – I haven’t studied the research findings – but I do take issue with the claim that the persistence of the risk variants in humans is ‘a little bizarre’.

The quote indicates a common misunderstanding of how natural selection works. In evolution, what matters is whether an individual leaves surviving offspring. If you don’t have any descendants, then gene variants that are specific to you will inevitably disappear from the population. Alzheimer’s is an unpleasant condition that impairs the ability to function independently, but the onset is typically long after child-bearing years are over. If a disease doesn’t affect the likelihood that you have surviving children, then it is irrelevant as far as natural selection is concerned. As Max Coltheart replied when I tweeted about this: “evolution doesn't care about the cost of living in an aged-care facility”.