Tuesday, 25 July 2017

Breaking the ice with buxom grapefruits: Pratiques de publication and predatory publishing

Guest blogpost by 

Ryan McKay, Department of Psychology,  Royal Holloway University of London


Max Coltheart, Department of Cognitive Science, Macquarie University

These days it is common for academics to receive invitations from unfamiliar sources to attend conferences, submit papers, or join editorial boards. We began an attack against this practice by not ignoring such invitations – by, instead, replying to them with messages selected from the output of the wonderful Random Surrealism Generator. It generates syntactically correct but surreal sentences such as “Is that a tarantula in your bicycle clip, or are you just gold-trimmed?” (a hint of Mae West there?). This sometimes had the desired effect of generating a bemused response from the inviter; but we decided more was needed.

So we used the surrealism generator to craft an absurdist critique of “impaired” publication practices (the title of the piece says as much, albeit obliquely). The first few sentences seem relevant to the paper’s title, but the piece then deteriorates rapidly into a sequence of surreal sentences (we threw in some gratuitous French and Latin for good measure, and the piece also quotes itself liberally), so that no one who read the paper could possibly believe that it was serious. We then submitted the paper to every journal that contacted either of us in the period 21 June 2017 to 1 July 2017 inviting us to submit a paper. There were 10 such invitations. We accepted all of them, making minor changes to the title and the first couple of sentences to give the impression that the paper was somehow relevant to the interests of the journal; the bulk of the paper was always the same sequence of surreal sentences.

While we were engaged in this exercise, the blogger Neuroskeptic was doing something similar: we describe that work below. Both of us were of course following the honourable tradition of such submissions as those by Peter Vamplew and Christoph Bartneck. (More generally, there is a fine tradition of hoax articles intended as critiques of certain academic fields, e.g., postmodernism or theology.)

What happened then?

All ten journals responded by informing us that our ms had been sent out for review.  We did not hear anything further from four of them. A fifth, the SM Journal of Psychiatry and Mental Health, eventually responded “The ms was plagiarized so please make some changes to the content”. We did not respond to this request, nor to a subsequent request for resubmission. 

The Scientific Journal of Neurology & Neurosurgery responded by telling us that our paper had been peer-reviewed; the reviewer praised our “scientific methodology” but chided us about our poor English (specifically, they said “English should be rewritten, it is necessary a correction of typing errors (spaces)”). We ignored this advice and resubmitted. However, the journal then noticed the similarity with the article we had submitted to the International Journal of Brain Disorders and Therapy (see below for this), so ceased production of our article.

The paper was accepted by Psychiatry and Mental Disorders: “accepted for publication by our reviewers without any changes”, we were told.

The paper was accepted by Mental Health and Addiction Research, but at that point we were told that a publication fee was due. We protested on the ground that when we had been invited to submit there had been no mention of a fee, and we said that unless a full fee waiver was granted we would take our work to a more appreciative journal. In response, we were granted a full fee waiver, and our paper was published in the on-line journal.

The SM Journal of Disease Markers also accepted the paper, and sent us proofs, which we corrected and returned. At that point, we were told that an article processing fee of US$920 was due. We protested in the same way, asking for a full fee waiver. In response, they offered a reduced fee of $520. We did not respond, so this paper, although accepted, has not been published.

The tenth journal, the International Journal of Brain Disorders and Therapy, sent us one reviewer comment. The reviewer had entered into the spirit of the hoax by providing a review which was itself surrealistic. We incorporated this reviewer’s comment about Scottish Lithium Flying saucers and resubmitted, and the paper was accepted. The journal then noticed irregularities in some (but surprisingly not all) of the references. We replaced these problematic references with citations of recent and classic hoaxes (e.g., Kline & Saunders’ 1959 piece on “psychochemical symbolism”; Lindsay & Boyle’s recent piece on the “Conceptual Penis”), along with a citation of Pennycook et al’s article “On the reception and detection of pseudo-profound bullshit”. The paper was then published in the on-line journal.  Later this journal asked us for a testimonial about the review process, which we supplied: "The process of publishing this article was much smoother than we anticipated".

In sum: all ten journals to which we submitted the paper sent it out for review, even though any editor had only to read to the end of the first paragraph to come across this:
“Of course, neither cognitive neuropsychiatry nor cognitive neuropsychology is remotely informative when it comes to breaking the ice with buxom grapefruits. When pondering three-in-a-bed romps with broken mules, therefore, one must refrain, at all costs, from driving a manic-depressive lemon-squeezer through ham (Baumard & Brugger, 2016).”

Of these ten journals, two tentatively accepted the paper and four fully accepted it for publication. Two of these journals have already published it.

The blogger Neuroskeptic did this a little differently (see http://blogs.discovermagazine.com/neuroskeptic/2017/07/22/predatory-journals-star-wars-sting/#.WXbIstP5hTF ). A hoax paper entitled “Mitochondria: Structure, Function and Clinical Relevance” was prepared. It did not contain any nonsensical sentences, as our paper did, but its topic was the fictional cellular entities “midi-chlorians” (which feature in Star Wars). The paper was submitted to nine journals. Four accepted it. One of these charged a fee, which the author declined to pay; the other three charged no fee, and so the paper has been published in all three of these journals: the International Journal of Molecular Biology: Open Access (MedCrave), the Austin Journal of Pharmacology and Therapeutics (Austin) and the American Research Journal of Biosciences (ARJ). In order to know that this paper was nonsense, one would need some knowledge of cell biology. But our paper is blatantly nonsensical to any reader; and yet it boasted an acceptance rate very similar to that of Neuroskeptic’s paper.

What can be learned from our exercise? Several things:

(a) It is clear that with these journals there is no process by which a submission is initially read by an editor to decide whether the paper should be sent out for review, because our paper could not possibly have survived any such inspection.

(b)  But nor should our paper have survived any serious review process, since any reviewer reading the paper would have pointed out its nonsensical content. Only twice did a journal send us feedback from a reviewer, one which said we should discuss Lithium Flying Saucers, and one which seemed suspect to us because its criticism of our English was expressed in such poor English.

(c) In contrast to this apparent lack of human intervention in the article-handling process, there was some software intervention: some of these journals appear routinely to apply plagiarism-detection software to submitted articles.

(d) What’s in this for the journals? We assumed that they exist solely to make money by charging authors. We presume that, just as they attempt to build apparently legitimate editorial boards (see here), these journals will sometimes waive their fees so as to get some legitimate-seeming articles on their books, the better to entice others to submit.

Sunday, 2 July 2017

The STEP Physical Literacy programme: have we been here before?

One day in 2003, I turned on BBC Radio 4 and found myself listening to an interview on the Today Programme with Wynford Dore, the founder of an educational programme that claimed to produce dramatic improvements in children's reading and attentional skills. The impetus for the interview was a press release about a study published in the journal Dyslexia, reporting results from a trial of the programme with primary schoolchildren. The interview seemed more like an advertisement than a serious analysis, but the consequent publicity led many parents to sign up for the programme, both in the UK and in other countries, notably Australia.

The programme involved children doing two 10-minute sessions per day of exercises designed to improve balance and eye-hand co-ordination. These were personalised to the child, so that the specific exercises would be determined by level of progress in particular skills. The logic behind the approach was that these exercises trained the cerebellum, a part of the brain concerned with automatizing skills. For instance, when you first play the piano or drive a car, it is slow and effortful, but after practice you can do it automatically without thinking about it. The idea was that cerebellar training would lead to a general cerebellar boost, helping other tasks, such as reading, to become more automatic.

Various experts who were on the editorial board of Dyslexia were unhappy with the quality of the research and asked for the paper to be retracted. When no action was taken, a number of them resigned. In 2007, I published a detailed critique of the study, which by that time had been complemented by a follow-up – which had prompted further editorial resignations.
Meanwhile, Wynford Dore, who had considerable business acumen, continued to promote the Dore Programme, writing a popular book describing its origins, and signing up celebrities to endorse it. Among these were rugby legends Kenny Logan and Scott Quinnell. In addition, Dore was in conversations with the Welsh Assembly about the possibility of rolling the programme out in Welsh schools. He had also persuaded Conservative MP Christopher Chope that the Dore programme was enormously effective but was being suppressed by government.
Various bloggers were interested in the amazing uptake of the Dore Programme, and in 2008, Ben Goldacre wrote a trenchant piece on his Bad Science blog, noting among other things that Kenny Logan was paid for some of his promotional work. The nail in the coffin of the Dore Programme was an Australian documentary in the Four Corners series, which included interviews with Dore, some of his customers, and scientists who had been involved both in the evaluation and the criticisms. The Dore business, which had been run as a franchise, collapsed, leaving many people out of pocket: parents who had paid up-front for a long-term intervention course, and staff at Dore centres, who found themselves out of a job.
The Dore programme did not die completely, however. Scott Quinnell continued to market a scaled-down version of the programme through his company Dynevor, but was taken to task by the Advertising Standards Authority for making unsubstantiated claims. Things then went rather quiet for a while.
This year, however, I have been contacted by concerned teachers who have told me about a new programme, STEP Physical Literacy, which is being promoted for use in schools, and which bears some striking similarities to Dore.  Here are some quotes from the STEP website:
  • Pupils undertake 2 ten minute exercise sessions at the start and end of each school day. The exercises focus on the core skills of balance, eye-tracking and coordination.
  • STEP is a series of personalised physical exercises that stimulate the cerebellum to function more efficiently.
  • The STEP focus is on the development of physical capabilities that should be automatic such as standing still, riding a bike or following words on a page.
In addition, STEP Physical Literacy is being heavily promoted by Kenny Logan, who features several times on the News section of the website.
As with Dore, STEP has been promoted to politicians, who argue it should be introduced into schools. In this case, the Christopher Chope role is fulfilled by Liz Smith MSP, who appears to be sincerely convinced that Scotland's literacy problems can be overcome by having children take two 10 minute sessions out of lessons to do physical exercises.
On Twitter, Ben Goldacre noted that the directors of Dynevor CIC overlap substantially with the directors of Step2Progress, who own STEP. The registered address is the same for the two companies.
When asked about Dore, those involved with STEP deny any links. After I tweeted about this, I was emailed by Lucinda Roberts Holmes, Managing Director of STEP, to reassure me that STEP is not a rebranding of Dore, and to suggest we meet so she could "talk through the various pilots and studies that have gone on both in the UK and the US as well as future research RCTs planned with Florida State University and the University of Edinburgh." I love evidence, but I find it best to sit down with data rather than have a conversation, so I replied explaining that and saying I'd be glad to take a look at any written reports. So far nothing has materialised. I should add that I have not been able to find any studies on STEP published in the peer-reviewed literature, and the account of the pilot study and case studies on the STEP website does not give me confidence that these would be publishable in a reputable journal.
In short, the evidence to date does not justify introducing this intervention into schools: there's no methodologically adequate study showing effectiveness, and it carries both financial costs and opportunity costs to children. It's a shame that the field of education is so far behind medicine in its attitude to evidence, and that we have politicians who will consider promoting educational interventions on the basis of persuasive marketing. I suggest Liz Smith talks to the Education Endowment Foundation, who will be able to put her in touch with experts who can offer an objective evaluation of STEP Physical Literacy.

8th July 2017: Postscript. A response from STEP
I have had a request from Lucinda Roberts-Holmes, Managing Director of Step2Progress, to remove this blogpost on the grounds that it contains defamatory and inaccurate information. I asked for more information on specific aspects of the post that were problematic and obtained a very long response, which I reproduce in full below. Readers are invited to form their own interpretation of the facts, based on the response from STEP (in italics) and my comments on the points raised.

Preamble: To be clear your blog in its current form includes a number of statements which are factually incorrect. In particular, the suggestion that STEP is simply a reincarnation of the Dore programme is not true as I have already explained to you (see my email of 29 June). The fact that you chose to ignore that assurance and instead publish the blog is very concerning to us. The suggestion, also, that I had chosen not to reply to your email ("so far nothing has materialised") is, I am afraid, disingenuous particularly in circumstances where you did not even set a deadline in your email and you waited only 72 hours to post your blog. Had you, of course, waited to receive a response to your email, we would have explained the correct position to you. Similarly, had you carried out an objective comparison of the two programmes you would have noted the many differences between STEP and Dore and, more significantly, identified the fact that STEP makes absolutely none of the assertions about cures for Dyslexia and other learning difficulties or any other of the hypotheses that Wynford Dore concocted. They are not the same programme evidenced not least by the fact that STEP states its programme is not a SEN learning intervention.

Comment: a) I did not state in the blog that STEP is 'simply a reincarnation of the Dore Programme'. I said it bears some striking similarities to Dore.

b) I did not ignore Lucinda's reassurance that STEP is not a rebranding of Dore. On the contrary, I stated in the blogpost that I had received that reassurance from her.

c) I did not suggest that Lucinda had chosen not to reply to my email: I simply observed that I had not so far received a response. As my blogpost points out, I had made it clear in my initial email that I did not want her to 'explain the correct position' to me. I had specifically requested written reports documenting the evidence for effectiveness of STEP.

1. Despite what Ben Goldacre may believe, Kenny Logan (KL) was not paid by the Dore programme for "promotional work". He was, in fact, a paying customer of the programme who went from being unable to read at the start of the programme to being literate by the end of it. KL was happy to share his experience publicly and was very clear with Dore that he would not be paid to do this. Whilst it is true that in 2006, he was contracted and paid by Wynford Dore for his professional input into a sports programme that he was seeking to develop that is an entirely different matter. The suggestion that KL was only promoting the Dore programme for his own financial benefit is clearly defamatory of him (and indeed of us).

I asked Ben Goldacre about this. The claim about Logan's payment for promotional work was made in a Comment is Free article in the Guardian. Ben told me it all went through a legal review at the Guardian to ensure everything was robust, and no complaints were received from Kenny Logan at the time. If the claim is untrue, then Kenny Logan needs to take this up with the Guardian. It's unclear to me why Kenny Logan promoting Dore would be defamatory of STEP, given that STEP claims to have no association with Dore.

2. The fact that KL previously promoted the Dore programme also does not support the allegation that the STEP programme is the same as the Dore programme. They are very different programmes and we are a very different organisation to Dore. Incorrectly stating that KL was paid for the promotion of Dore and trying to draw an inference that therefore he is paid to promote STEP (which he is not) is also misleading.

Comment: I made no claims that Kenny Logan is paid to promote STEP. He is a shareholder in STEP2Progress, which is a different matter.

3. Dynevor was never "Scott Quinnell's Company". Dynevor was primarily owned by Tim Griffiths and was the organisation that purchased the intellectual property rights in Dore after it went bankrupt. Tim Griffiths had no prior connection to Wynford Dore or the Dore programme but did have an interest in the link between exercise and ability to learn. As many thousands of people had been left in a difficult position when Dore collapsed into administration having purchased a programme they could not continue the directors at Dynevor agreed to commit the funding necessary to allow those who wanted to continue the programme the opportunity to do so. Scott Quinnell had a shareholding of less than 1% in Dynevor. STEP has absolutely no association with Scott Quinnell.

Comment: The role of Scott Quinnell in Dynevor is not central to my description of Dore, but this account of his role seems disingenuous. According to Companies House, Quinnell was appointed as one of two Directors of Dynevor C.I.C in 2009, and his interest in the company in 2011 was 2.6% of the shareholding, at a time when Wynford Dore had a shareholding of 4.3%.

I have not claimed that Scott Quinnell has any relationship with STEP. My account of his dealings was to provide a brief history of the problems with Dore for readers unfamiliar with the background.

4. You refer to the claims Ben Goldacre has made on Twitter that the directors of Dynevor CIC "overlap substantially" with the directors of STEP. In fact, of the 8 Directors of Dynevor only 2 hold directorships at STEP. In any event that misses the point which is that none of the directors of STEP had any association with the Dore Programme prior to the purchase of intellectual property rights in 2009.

Comment: According to Companies House, the one 'active person with significant control' in Dynevor CIC is Timothy Griffiths, and the 'one active person with significant control' in STEP2Progress is Conor Davey. If I have understood this correctly, this is based on shareholdings. Timothy Griffiths is one of four Directors of STEP2Progress, and Conor Davey is the Chairman of Dynevor CIC. Dynevor CIC and STEP2Progress have the same postal address.

It wasn't quite clear if Lucinda was saying that Dynevor CIC is now disassociated from Dore, but if that is the case, it would be wise to update the company's LinkedIn Profile, which states that the company 'provides the Dore Programme to individual clients and schools around the UK and licences the rights to provide the Dore Programme in a number of overseas countries'.

5. It is not correct to state that STEP denies any links to the Dore programme. There is, of course, a link, as there is also to the work of Dr Frank Belgau and his studies into balametrics. There is also a link to other movement programmes such as Better Movers and Thinkers and Move to Learn. What we have said is that the STEP programme is not the Dore programme and we stand by this. You may seek to draw similarities between them as I could between apples and pears.

Comment: Nowhere in my blogpost did I state that STEP denies any links to the Dore programme.

Re Belgau: I have just done a search on Web of Science that returned no articles for either author = Belgau or topic = balametrics.

6. May I also ask how you can state that "the evidence to date does not justify introducing this intervention in to schools" when you have refused so far to meet with me or even seen the evidence or read the full Pilot Study? Have you asked any teachers or head teachers who have experience of delivering the STEP Programme whether they would recommend to their peers the use of the programme in their schools?

Comment: There is a fundamental misunderstanding here about how scientists evaluate evidence. If you want to find out whether an intervention is effective, the worst thing you can do is to talk to people who are convinced that it is. There are people who believe passionately in all sorts of things: the healing powers of crystals, the harms of vaccines, the benefits of homeopathy, or the evils of phonics instruction. They will, understandably, try to convince you that they are right, but they will not be objective. The way to get an accurate picture of what works is not by asking people what they think about it, but by doing well-controlled studies that compare the intervention with a control condition in terms of children's outcomes. It is for this reason that I have been asking for any hard evidence that STEP2Progress has from properly conducted studies or information about future-planned studies, which I am told are in the pipeline. I would love to read the full Pilot Study, but am having difficulty accessing it (see below).

7. You say in your blog "It is a shame that... We have politicians who will consider promoting educational interventions on the basis of persuasive marketing" Presumably this is a reference to Liz Smith MSP (LS) who you refer to separately in the blog? For your information, LS has read the full research report of the 2015/2016 Pilot Study as well as the other case studies. In light of that information, she has indicated the she is impressed with the STEP programme and that the Scottish Government should consider piloting it and looking more widely at the impact of physical literacy on academic attainment. At the point she expressed this view there had not been any marketing of the STEP programme in Scotland so I do not understand the evidence to support the statement you make in the blog.

Comment: In this regard Liz Smith has the advantage. Although Lucinda has now sent me three emails since my blogpost appeared, in none of them did she send me the reports I had initially requested. In my latest email I asked to see the 'full research report' that Liz Smith had access to. I got this reply from Lucinda:

Dear Dorothy,

Thank you for your email. With the greatest respect, I think the first step should be for you to correct or remove your blog and apologise for the inaccuracies I have outlined below. Alongside that I repeat my offer to come and talk you through the STEP programme and the studies that have been carried out so far. As I say, we are not the same programme as the Dore programme and it is wrong to allege otherwise.

Kind regards

Nevertheless, with her penultimate email, Lucinda attached a helpful Excel spreadsheet documenting differences between Dore and STEP, as follows:

Difference 1. The Dore Programme was a paper book of 100 exercises followed sequentially. Dore's assertions that they were personalised were untrue. STEP software contains over 350 exercises delivered through an adaptive learning software platform that is individualised to the child based on previous performance. The Programme also contains 10 minutes of 1-1 time with each pupil twice per day (nurture) and involves pupils overcoming a series of physical challenges (resilience) in a non class-competitive environment (success cycle) which displays their commitment levels (engagement) and is overseen by committed members of staff who also work with them in the classroom (mentoring and translational trust building).

Comment: The question of interest is where do these exercises come from? How were they developed? Usually for an adaptive learning process, one needs to do prior research to establish difficulty levels of items for children of different ages. I raised this issue with the original Dore programme: there is no published evidence of the kind of foundational work you'd normally expect for an educational programme. Readers will no doubt be interested to hear that STEP has more exercises than Dore and delivers these in a specific, personalised sequence, but what is missing is a clear rationale explaining how and why specific exercises were developed. It would also be of interest to know how many of Dore's original 100 exercises are incorporated in STEP.

Difference 2. Dore was an exercise programme completed by adults and children at home supervised by untrained parents. STEP is delivered in schools and overseen by teaching staff trained through industry leader Professor Geraint Jones' teacher training programme. This also includes training on how to assess pupil performance.

Comment. If the intervention is effective, then standardized administration by teachers is a good thing. If it is not effective, then teachers should not be spending time and money being trained. Everything hinges on evidence for effectiveness (see below).

Difference 3. Dore asserted that the programme was a cure for dyslexia and and other learning difficulties. It further claimed to know the cause of these learning difficulties. STEP makes absolutely no assertions about Dyslexia, ADHD or other learning difficulties and absolutely no assertions about the medical cause for these.

Comment. I am sure that there are many people who will be glad to have the clarification that STEP is not designed to treat children with specific learning difficulties or dyslexia, as there appears to be some misunderstanding of this. This may in part be the consequence of Kenny Logan's involvement in promoting STEP. Consider, for instance, this piece in the Daily Mail, which first describes how Kenny's dyslexia was remediated by the Dore programme, and then moves to talk of his worries over his son Reuben, who was having difficulties in school:

"The answer was already staring him in the face, however, and within months, Kenny decided to try putting Reuben through a similar 'brain-training' technique to the one that transformed his own life just 14 years ago. Reuben, it transpired, had mild dyspraxia - a condition affecting mental and physical co-ordination - and the outcome for him has been so successful that Kenny is currently trying to persuade education chiefs to implement the technique in the country's worst-performing state schools, to raise attainment levels."

Another reason for confusion may be that the STEP home page lists the British Dyslexia Association as a partner and has features in the News section of its website on Dyslexia Awareness Month, on unidentified dyslexia, and a case study describing use of STEP with dyslexic children in Mississippi.

The transcript of the debate in the Scottish Parliament (scroll down to the section on Motion debated: That the Parliament is impressed by the STEP physical literacy programme) shows that many of the MSPs who took part in the debate with Liz Smith were under the impression that STEP was a treatment for specific learning disabilities such as dyslexia and ADHD, as is evident from these quotes:

Daniel Johnson: 'It is vital that we understand that there is a direct link between physical understanding, learning, knowledge and ability and educational ability. Overall - and specifically - there would be key benefits for people who have conditions such as ADHD and dyslexia... There is a growing body of evidence about the link between spatial awareness and physical ability and dyslexia. Likewise, the improvements on focus and concentration that exercises such as those that are outlined in the STEP programme can have for people with ADHD are clear. Improvements in those areas are linked not only to training the mind to concentrate, but to the impacts on brain chemistry.'

Elaine Smith: With regard to STEP, we have already heard that it is a programme of exercises performed twice a day for 10 minutes and focuses in particular on balance, eye tracking and co-ordination with the aim of making physical activity part of children's everyday learning. Improving physical literacy is particularly advantageous for children and young people who can find it difficult to concentrate, such as those with dyslexia and autism... STEP also has the backing of the British Dyslexia Association, which supported the findings of the pilot study.

Shirley-Anne Somerville: We are aware that the STEP programme has been promoted for children who have dyslexia.

Difference 4. Dore claimed that completing the exercises would repair a damaged or underdeveloped cerebellum. It is known that repetitive physical exercises stimulate the cerebellum but STEP makes no assertions of science that any physiological changes take place. STEP involves using repetitive physical exercises to embed actions and make them automatic.

Comment: It is good to see that some of the more florid claims of Dore are avoided by STEP, but the fact remains that the underlying theory is similar, namely that cerebellar training will improve skills beyond motor skills. The idea that training motor skills will produce effects that generalise to other aspects of development is dubious, because the cerebellum is a complex organ subserving a range of functions, and controlled studies typically find that training effects are task-specific. I discussed these issues in relation to the Dore programme here.

Specific statements about the cerebellum on the STEP website are:

'After going on national television to tell his heart-breaking story about facing up to the frustrations of overcoming a childhood stumbling block bigger than Mount Everest, Kenny (Logan) is determined to highlight the positive effects of using cerebellum specific teaching and learning programmes in primary school settings.'

And on this page of the website we hear: 'In the last century, academics experimenting with balametrics, dance and movement, established that specifically stimulating the cerebellum through exercise improves skill automation. The STEP Programme is built upon this foundation.'

Difference 5. Dore was a "medical" treatment that required participants to regularly visit treatment centres for "medical" evaluations to determine whether their learning difficulty was being cured. STEP is a primary school physical literacy programme delivered by teaching assistants or other teaching staff. It is to date shown to be most impactful on the lower quartile of the classroom in terms of academic improvement.

This is a rather odd interpretation of the Dore programme, which perhaps is signalled by the use of quotes around 'medical'. I never had the impression it was medical – it was not prescribed or administered by doctors. It is true that Dore did establish centres for assessment and this proved to be a major reason for its commercial failure: there were substantial costs in premises, staffing and equipment. But there was no necessity to run the intervention that way: some people at the time of the collapse suggested it would be feasible to offer the exercises over the internet at much lower cost.

The second point, re the greatest benefits for the lower quartile of the classroom, is on the one hand of potential interest, but on the other hand raises the concern that the benefits could be a classic case of regression to the mean. This is one of many ways in which scores can improve on an outcome measure for spurious reasons - which is why you need proper randomised controlled trials. Improvements are largely uninterpretable without these because increases in scores can arise because of practice, maturation, regression to the mean or placebo effects.
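To make the regression-to-the-mean concern concrete, here is a minimal Python sketch. It has nothing to do with the actual STEP data: the score scale, the amount of measurement error and the function name are all assumptions chosen for illustration. Each simulated child has a stable ability that is measured twice with random error, with no intervention in between, and we look only at the children who fell in the lowest quartile on the first occasion.

```python
import random
import statistics


def simulate_regression_to_mean(n_children=10000, error_sd=10.0, seed=1):
    """Two noisy measurements of a stable ability, with no intervention."""
    rng = random.Random(seed)
    # Hypothetical scale: true ability has mean 100 and SD 15.
    true_scores = [rng.gauss(100, 15) for _ in range(n_children)]
    time1 = [t + rng.gauss(0, error_sd) for t in true_scores]
    time2 = [t + rng.gauss(0, error_sd) for t in true_scores]

    # Select the lowest quartile using the time 1 score only.
    cutoff = statistics.quantiles(time1, n=4)[0]
    low = [i for i, score in enumerate(time1) if score <= cutoff]

    mean_t1 = statistics.mean(time1[i] for i in low)
    mean_t2 = statistics.mean(time2[i] for i in low)
    return mean_t1, mean_t2


low_t1, low_t2 = simulate_regression_to_mean()
print(f"lowest quartile at time 1: {low_t1:.1f}; same children at time 2: {low_t2:.1f}")
```

In this sketch the selected group scores higher on the second occasion even though nobody's true ability has changed, because children picked for low scores include many whose first score was pulled down by error. A control group selected in the same way is what allows this effect to be separated from a genuine benefit.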

Difference 6. Dore determined "progress" and "cure" via a series of physical assessments. STEP empirically measures the academic progress of pupils with baseline data and presents reports against actual physical skills developed inviting schools to draw their own conclusions in the context of their school setting.

Comment. Agree that Dore's method of measuring progress and cure was a major problem, because a child could improve on the measures of balance and eye-hand co-ordination and be deemed 'cured' even though their reading had not improved at all. But the account of STEP sounds too vague to evaluate - and the evidence on their website from the pilot study is so underspecified as to be uninterpretable. It is not clear what the measures were, and which children were involved in which measures. I would like to see the full report to have a clearer idea of the methods and results.

Difference 7. Dore claimed that the exercises were developed and delivered in a formulaic manner that was a trade secret. STEP focuses on determining whether a pupils core physical capabilities in balance, eye tracking and coordination. There is no secret formula or claims of one. The genesis of STEP is in balametrics as well as other movement programmes such as Better Movers and Thinkers https://www.ncbi.nlm.nih.gov/pubmed/27247688 and Move to Learn https://www.movetolearn.com.au/research/

Comment. In STEP, how are the scores on core physical capabilities standardized for age and sex? This refers back to my earlier comment about the development work needed to underpin an effective programme. The impression is that people in this field borrow ideas from previous programmes but there is no serious science behind this.

Difference 8. The Dore Programme cost over £2000 per person and was paid for individually. STEP costs £365 per year per child and is completed over 2 years. It is largely paid for through schools that have the discretion to ask parents to fund the programme if it is an additional intervention being offered. STEP also commits a significant number of places to schools free of charge. The fee includes year round school support

Comment. Good to have the differences in charging methods clarified.

Difference 9. Dore published research based around a single school with hypotheses relating to the cerebellum and dyslexia that could not be substantiated. It used dyslexic tendencies as a measure of improvement and selection. STEP as an organisation is wholly open to independent research and evaluation. Its initial pilot study was designed and led by the IAPS Education Committee and conducted by Innovation Bubble, led by Dr Simon Moore, University of Middlesex and Chartered Psychologist. It was held across 17 schools. Further pilot studies have taken place carried out by education districts in Mississippi and ESCCO as well as independent case studies. These have always been presented openly and in the context they were compiled. STEP believes it has sufficient evidence to warrant a large scale evaluation of the Programme.

Comment. In the context of intervention evaluation, quantity of research does not equate with quality. Here is Wikipedia's definition of a pilot study: 'A small scale preliminary study conducted in order to evaluate feasibility, time, cost, adverse events, and effect size (statistical variability) in an attempt to predict an appropriate sample size and improve upon the study design prior to performance of a full-scale research project.' I agree that a large-scale evaluation of the Programme is warranted. It's a bit odd to say the results have been presented openly while at the same time refusing to send me reports unless I take down my blogpost.

It is clear that the MSPs in the debate in the Scottish Parliament were all, without exception, convinced that we already had evidence for the effectiveness of STEP. If they based these impressions on the information on the STEP website (as suggested by Liz Smith's initial statement), then this is worrying, as this came from the pilot study, where the methods were not clearly described, and the description of the results is unclear and looks incomplete, or from uncontrolled case studies.

Here are some of the statements from MSPs:

Liz Smith: As members know, the programme has been used successfully in both England and the United States, and it has been empirically evidenced to reduce the attainment gap in primary school pupils. Pupils who have completed STEP have shown significant improvements academically, behaviourally, physically and socially. A United Kingdom pilot last year compared more than 100 below-attainment primary school pupils who were on the STEP programme to a group of pupils at the same attainment level who were not. The improved learning outcomes that the study showed are extremely impressive: 86 per cent of pupils on the programme moved to on or above target in reading, compared with 56 per cent of the non-STEP group; 70 per cent of STEP pupils met their target for maths, compared with 30 per cent of the non-STEP group; and 75 per cent and 62 per cent of STEP pupils were on or above target for English comprehension and spelling respectively, compared with 43 per cent and 30 per cent of the non-STEP group.
In Mississippi, in the USA, more than 1,000 pupils have completed the programme over the past three years, and it is no coincidence that that state has seen significant improvement in fourth grade - which is the equivalent of P6 - reading and maths, which has resulted in the state being awarded a commendation for educational innovation.

Brian Whittle: The STEP programme is tried and tested, with measured physical, emotional and academic outcomes, especially in the lower percentiles.

Daniel Johnson: Perhaps most impressive is the STEP programme's achievements on academic improvement – it has led to improved English for 76 per cent of participants, and to improved maths, reading and spelling for 70 per cent of participants. The benefits that physical literacy can bring to academic attainment are clear.

Oliver Mundell: the STEP programme has been shown to work and is popular with both the teachers and the pupils who have benefited from it in England and the USA.

Conclusion. This has been a very long postscript, but it seems important to be clear about what the objections to STEP are. I have not claimed that STEP is exactly the same as Dore. My sense of déjà vu arises because of the similarities, in the people involved, in the use of cerebellar exercises involving balance and eye-hand coordination delivered in short sessions, and in the successful promotion of the programme to politicians and schools in the absence of adequate peer-reviewed evidence. Given that the basic theory does not have strong scientific plausibility, it is this latter point that is the source of greatest concern. We can agree that we all want children to succeed in school and any method that can help them achieve this is to be welcomed. There is also, however, a need for better education of our politicians, so that they are equipped to evaluate evidence properly. They have a responsibility to ensure we do the best for our children, but this requires a critical mindset.

Saturday, 17 June 2017

Prospecting for kryptonite: the value of null results

This blogpost doesn't say anything new – it just uses a new analogy (at least new to me) to make a point about the value of null results from well-designed studies. I was thinking about this after reading this blogpost by Anne Scheel.

Think of science like prospecting for kryptonite in an enormous desert. There's a huge amount of territory out there, and very little kryptonite. Suppose also that the fate of the human race depends crucially on finding kryptonite deposits.

Most prospectors don't find kryptonite. Not finding kryptonite is disappointing: it feels like a lot of time and energy has been wasted, and the prospector leaves empty-handed. But the failure is nonetheless useful. It means that new prospectors won't waste their time looking for kryptonite in places where it doesn't exist.  If, however, someone finds kryptonite, everyone gets very excited and there is a stampede to rush to the spot where it was discovered.

Contemporary science works a bit like this, except that the whole process is messed up by reporting bias and poor methods which lead to false information.

To take reporting bias first: suppose the prospector who finds nothing doesn't bother to tell anyone. Then others may come back to the same spot and waste time also finding nothing. Of course, some scientists are like prospectors in that they are competitive and would like to prevent other people from getting useful information. Having a competitor bogged down in a blind alley may be just what they want for their rivals. But where there is an urgent need for new discovery, there needs to be a collaborative rather than competitive approach, to speed up discovery and avoid waste of scarce funds. In this context, null results are very useful.

False information can come from the prospector who declares there is no kryptonite on the basis of a superficial drive through a region. This is like the researcher who does an underpowered study that gets an inconclusive null result. It doesn't allow us to map out the region with kryptonite-rich and kryptonite-empty areas – it just leaves us having to go back and look again more thoroughly. Null results from poorly designed studies are not much use to anyone.

But the worst kind of false information is fool's kryptonite: someone declares they have found kryptonite, but they haven't. So everyone rushes off to that spot to try and find their own kryptonite, only to find they have been deceived. So there are a lot of wasted resources and broken hearts. For a prospector who has been misled in this way, this situation is worse than just not finding any kryptonite, because their hopes have been raised and they may have put a disproportionate amount of effort and energy into pursuing the false information.

Pre-registering a study is the equivalent of a prospector declaring publicly that they are doing a comprehensive survey of a specific region, and will declare what they have found, so that the map can gradually be filled in, with no duplication of effort.

Some will say, what about exploratory research? Of course the prospector may hit lucky and find some other useful mineral that nobody had anticipated. If so, that's great, and it may even turn out more important than kryptonite. But the point I want to stress is that the norm for most prospectors is that they won't find kryptonite or anything else. Really exciting findings occur rarely, yet our current incentive structures create the impression that you have to find something amazing to be valued as a scientist.  It would make more sense to reward those who do a good job of prospecting, producing results that add to our knowledge and can be built upon.

I'll leave the last word to Ottoline Leyser, who in an interview for The Life Scientific said: "There's an awful lot of talk about ground-breaking research…. Ground-breaking is what you do when you start a building. You go into a field and you dig a hole in the ground. If you're only rewarded for ground-breaking research, there's going to be a lot of fields with a small hole in, and no buildings."

Sunday, 28 May 2017

Which neuroimaging measures are useful for individual differences research?

The tl;dr version

A neuroimaging measure is potentially useful for individual differences research if variation between people is substantially greater than variation within the same person tested on different occasions. This means that we need to know about the reliability of our measures, before launching into studies of individual differences.
High reliability is not sufficient to ensure a good measure, but it is necessary.

Individual differences research

Psychologists have used behavioural measures to study individual differences - in cognition and personality - for many years. The goal is complementary to psychological research that looks for universal principles that guide human behaviour: e.g. factors affecting learning or emotional reactions. Individual differences research also often focuses on underlying causes, looking for associations with genetic, experiential and/or neurobiological differences that could lead to individual differences.

Some basic psychometrics

Suppose I set up a study to assess individual differences in children’s vocabulary. I decide to look at three measures.
  • Measure A involves asking children to define a predetermined set of words, ordered in difficulty, and scoring their responses by standard criteria.
  • Measure B involves showing the child pictured objects that have to be named.
  • Measure C involves recording the child talking with another child and measuring how many different words they use.
For each of these measures, we’d expect to see a distribution of scores, so we could potentially rank order children on their vocabulary ability. But are the three measures equally good indicators of individual differences?

We can see immediately one problem with Test B: the distribution of scores is bunched tightly, so it doesn’t capture individual variation very well. Test C, which has the greatest spread of scores, might seem the most suitable for detecting individual variation. But spread of scores, while important, is not the only test attribute to consider. We also need to consider whether the measure assesses a stable individual difference, or whether it is influenced by random or systematic factors that are not part of what we want to measure.

There is a huge literature addressing this issue, starting with Francis Galton in the 19th century, with major statistical advances in the 1950s and 1960s (see review by Wasserman & Bracken, 2003). The classical view treats test scores as a compound, with a ‘true score’ part, plus an ‘error’ part. We want a measure that minimises the impact of random or systematic error.

If there is a big influence of random error, then the test score is likely to change from one occasion to the next. Suppose we measure the same children on two occasions a month apart on three new tests, and then plot scores on time 1 vs time 2. (To simplify this example, we assume that all three tests have the same normal distribution of scores - the same as for test A in Figure 1 - and that there is an average gain of 10 points from time 1 to time 2.)

Figure 2

We can see that Test F is not very reliable: although there is a significant association between the scores on two test occasions, individual children can show remarkable changes from time to time. If our goal is to measure a reasonably stable attribute of the person, then Test F is clearly not suitable.
Just because a test is reliable, it does not mean it is valid. But if it is not reliable, then it won’t be valid. This is illustrated by this nice figure from https://explorable.com/research-methodology:

What about change scores?

Sometimes we explicitly want to measure change: for instance, we may be more interested in how quickly a child learns vocabulary, rather than how much they know at some specific point in time. Surely, then, we don’t want a stable measure, as it would not identify the change? Wouldn’t test F be better than D or E for this purpose?

Unfortunately, the logic here is flawed. It’s certainly possible that people may vary in how much they change from time to time, but if our interest is in change, then what we want is a reliable measure of change. There has been considerable debate in the psychological literature as to how best to establish the reliability of a change measure, but the key point is that you can find substantial change in test scores that is meaningless, and that the likelihood of it being meaningless is substantial if the underlying measure is unreliable. The data in Figure 2 were simulated by assuming that all children changed by the same amount from Time 1 to Time 2, but that tests varied in how much random error was incorporated in the test score. If you want to interpret a change score as meaningful, then the onus is on you to convince others that you are not just measuring random error.
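The way the Figure 2 data are described can be sketched in a few lines of Python. This is not the original simulation (the figures came from R Markdown); the error SDs, sample size, score scale and function names are assumptions for illustration, and unlike the figure the sketch does not hold the observed score distribution constant across tests. Every simulated child gains exactly the same amount, so any spread in the change scores is pure measurement error.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_test(error_sd, n_children=2000, gain=10.0, seed=2):
    """Observed score = true score + random error; same true gain for all."""
    rng = random.Random(seed)
    true = [rng.gauss(100, 15) for _ in range(n_children)]
    time1 = [t + rng.gauss(0, error_sd) for t in true]
    time2 = [t + gain + rng.gauss(0, error_sd) for t in true]
    change = [b - a for a, b in zip(time1, time2)]
    return pearson(time1, time2), statistics.mean(change), statistics.stdev(change)


# Hypothetical error levels standing in for a reliable and an unreliable test.
for label, error_sd in [("low error", 3.0), ("medium error", 8.0), ("high error", 20.0)]:
    r, mean_change, sd_change = simulate_test(error_sd)
    print(f"{label}: test-retest r = {r:.2f}, "
          f"mean change = {mean_change:.1f}, SD of change = {sd_change:.1f}")
```

The mean change stays close to the built-in gain whatever the error level, but the spread of individual change scores grows with the error. With a noisy test, children appear to differ widely in how much they changed, even though in this sketch they did not differ at all.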

What does this have to do with neuroimaging?

My concern with the neuroimaging literature is that measures from functional or structural imaging are often used to measure individual differences, but it is rare to find any mention of reliability of those measures. In most cases, we simply don’t have any data on repeated testing using the same measures - or if we do, the sample size is too small, or too selected, to give a meaningful estimate of reliability. Such data as we have don’t inspire confidence that brain measurements achieve the high level of reliability that is aimed for in psychometric tests. This does not mean that these measures are not useful, but it does make them unsuited for the study of individual differences.

I hesitated about blogging on this topic, because nothing I am saying here is new: the importance of reliability has been established in the literature on measurement theory since 1950. Yet, when different subject areas evolve independently, it seems that methodological practices that are seen as crucial in one discipline can be overlooked in another that is rediscovering the same issues but with different metrics.

There are signs that things are changing, and we are seeing a welcome trend for neuroscientists to start taking reliability seriously. I started thinking about blogging on this topic just a couple of weeks ago after seeing some high-profile papers that exemplified the problems in this area, but in that period, there have also been some nice studies that are starting to provide information on reliability of neuroscience measures. This might seem like relatively dull science to many, but to my mind it is a key step towards incorporating neuroscience in the study of individual differences. As I commented on Twitter recently, my view is that anyone who wants to use a neuroimaging measure as an endophenotype should first be required to establish that it has adequate reliability for that purpose.

Further reading

This review by Dubois and Adolphs (2016) covers the issue of reliability and much more, and is highly recommended.
Other recent papers of relevance:
Geerligs, L., Tsvetanov, K. A., Cam-CAN, Henson, R. N. 2017 Challenges in measuring individual differences in functional connectivity using fMRI: The case of healthy aging. Human Brain Mapping
Nord, C. L., Gray, A., Charpentier, C. J., Robinson, O. J., Roiser, J. P. 2017 Unreliability of putative fMRI biomarkers during emotional face processing. Neuroimage.

Note: Post updated on 17th June 2017 because figures from R Markdown html were not displaying correctly on all platforms.

Monday, 1 May 2017

Reproducible practices are the future for early career researchers

This post was prompted by an interesting exchange on Twitter with Brent Roberts (@BrentWRoberts) yesterday. Brent had recently posted a piece about the difficulty of bringing about change to improve reproducibility in psychology, and this had led to some discussion about what could be done to move things forward. Matt Motyl (@mattmotyl) tweeted:

I had one colleague tell me that sharing data/scripts is "too high a bar" and that I am wrong for insisting all students who work w me do it

And Brent agreed:

We were recently told that teaching our students to pre-register, do power analysis, and replicate was "undermining" careers.

Now, as a co-author of a manifesto for reproducible science, this kind of thing makes me pretty cross, and so I weighed in, demanding to know who was issuing such rubbish advice. Brent patiently explained that most of his colleagues take this view and are skeptics, agnostics or just naïve about the need to tackle reproducibility. I said that was just shafting the next generation, but Brent replied:

Not as long as the incentive structure remains the same.  In these conditions they are helping their students.

So things have got to the point where I need more than 140 characters to make my case. I should stress that I recognise that Brent is one of the good guys, who is trying to make a difference. But I think he is way too pessimistic about the rate of progress, and far from 'helping' their students, the people who resist change are badly damaging them.  So here are my reasons.

1.     The incentive structure really is changing. The main drivers are funders, who are alarmed that they might be spending their precious funds on results that are not solid. In the UK, funders (Wellcome Trust and Research Councils) were behind a high profile symposium on Reproducibility, and subsequently have issued statements on the topic and started working to change policies and to ensure their panel members are aware of the issues. One council, the BBSRC, funded an Advanced Workshop on Reproducible Methods this April. In the US, NIH has been at the forefront of initiatives to improve reproducibility. In Germany, Open Science is high on the agenda.
2.     Some institutions are coming on board. They react more slowly than funders, but where funders lead, they will follow. Some nice examples of institution-wide initiatives toward open, reproducible science come from the Montreal Neurological Institute and the Cambridge MRC Cognition and Brain Sciences Unit. In my own department, Experimental Psychology at the University of Oxford, our Head of Department has encouraged me to hold a one-day workshop on reproducibility later this year, saying she wants our department to be at the forefront of improving psychological science.

3.     Some of the best arguments for working reproducibly have been made by Florian Markowetz. You can read about them on this blog, see him give a very entertaining talk on the topic here, or read the published paper here. So there is no escape. I won't repeat his arguments here, as he makes them better than I could, but his basic point is that you don't need to do reproducible research for ideological reasons: there are many selfish arguments for adopting this approach – in the long run it makes your life very much easier.

4.     One point Florian doesn't cover is pre-registration of studies. The idea of a 'registered report', where your paper is evaluated, and potentially accepted for publication, on the basis of the introduction and methods, was introduced with the goal of improving science by removing publication bias, p-hacking and HARKing (hypothesising after results are known). You can read about it in these slides by Chris Chambers. But when I tried this with a graduate student, Hannah Hobson, I realised there were other huge benefits. Many people worry that pre-registration slows you down. It does at the planning stage, but you more than compensate for that by the time saved once you have completed the study. Plus you get reviewer comments at a point in the research process when they are actually useful – i.e. before you have embarked on data collection. See this blogpost for my personal experience of this.

5.     Another advantage of registered reports is that publication does not depend on getting a positive result. This starts to look very appealing to the hapless early career researcher who keeps running experiments that don't 'work'. Some people imagine that this means the literature will become full of boring registered reports with null findings that nobody is interested in. But because that would be a danger, journals who offer registered reports impose a high bar on papers they accept – basically, the usual requirement is that the study is powered at 90%, so that we can be reasonably confident that a negative result is really a null finding, and not just a type II error. But if you are willing to put in the work to do a well-powered study, and the protocol passes scrutiny of reviewers, you are virtually guaranteed a publication.

6.     If you don't have time or inclination to go the whole hog with a registered report, there are still advantages to pre-registering a study, i.e. depositing a detailed, time-stamped protocol in a public archive. You still get the benefits of establishing priority of an idea, as well as avoiding publication bias, p-hacking, etc. And you can even benefit financially: the Open Science Framework is running a pre-registration challenge – they are giving $1000 to the first 1000 entrants who succeed in publishing a pre-registered study in a peer-reviewed journal.

7.     The final advantage of adopting reproducible and open science practices is that it is good for science. Florian Markowetz does not dwell long on the argument that it is 'the right thing to do', because he can see that it has as much appeal as being told to give up drinking and stop eating Dunkin Donuts for the sake of your health. He wants to dispel the idea that those who embrace reproducibility are some kind of altruistic idealists who are prepared to sacrifice their careers to improve science. Given arguments 1-6, he is quite right. You don't need to be idealistic to be motivated to adopt reproducible practices. But it is nice when one's selfish ambitions can be aligned with the good of the field. Indeed, I'd go further: I've long suspected that this may relate to the growing rates of mental health problems among graduate students and postdocs. Many people who go into science start out with high ideals, but are made to feel they have to choose between doing things properly vs. succeeding by cutting corners, over-hyping findings, or telling fairy tales in grant proposals. The reproducibility agenda provides a way of continuing to do science without feeling bad about yourself.

Brent and Matt are right that we have a problem with the current generation of established academic psychologists, who are either hostile to or unaware of the reproducibility agenda.  When I give talks on this topic, I get instant recognition of the issues by early career researchers in the audience, whereas older people can be less receptive. But what we are seeing here is 'survivor bias'. Those who are in jobs managed to succeed by sticking to the status quo, and so see no need for change. But the need for change is all too apparent to the early career researcher who has wasted two years of their life trying to build on a finding that turns out to be a type I error from an underpowered, p-hacked study. My advice to the latter is don't let yourself be scared by dire warnings of the perils of working reproducibly. Times really are changing and if you take heed now, you will be ahead of the curve.

Sunday, 23 April 2017

Sample selection in genetic studies: impact of restricted range

I'll shortly be posting a preprint about methodological quality of studies in the field of neurogenetics. It's something I've been working on with a group of colleagues for a while, and we are aiming to make recommendations to improve the field.

I won't go into details here, as you will be able to read the preprint fairly soon. Instead, what I want to do here is to expand on a small point that cropped up as I looked at this literature, and which I think is underappreciated.

It's to do with sampling. There's a particular problem that I started to think about a while back when I heard someone give a talk about a candidate gene study. I can't remember who it was or even what the candidate gene was, but basically they took a bunch of students, genotyped them, and then looked for associations between their genotypes and measures of memory. They were excited because they found some significant results. But I was, as usual, sitting there thinking convoluted thoughts about all of this, and wondering whether it really made sense. In particular, if you have a common genetic variant that has such a big effect on memory, would this really show up in a bunch of students – who are presumably people who have pretty good memories? Wouldn't it rather be the case that what you'd expect would be an alteration in the frequencies of genotypes in the student population?

Whenever I have an intuition like that, I find the best thing to do is to try a simulation. Sometimes the intuition is confirmed, and sometimes things turn out different and, very often, more complicated.

But this time, I'm pleased to say my intuition seems to have something going for it.

So here's the nuts and bolts.

I simulated genotypes and associated phenotypes by just using R's nice mvrnorm function. For the examples below, I specified that a and A are equally common (i.e. minor allele frequency is .5), so we have 25% as aa, 50% as aA, and 25% AA. The script lets you specify how closely these are related to the phenotype, but from what we know about genetics, it's very unlikely that a common variant would have a value more than about .25.

We can then test for two things:
1)  How far does the distribution of genotypes in the sample (i.e. people who are aa, aA or AA) resemble that in the general population? If we know that MAF is .5, we expect this distribution to be 1:2:1.
2) We can assign each person a score corresponding to number of A alleles (coding aa as zero, aA as 1, and AA as 2) and look at the regression of the phenotype on the genotype. That's the standard approach to looking for genotype-phenotype association.

If we work with the whole population of simulated data, these values will correspond to those that we specified in setting up the simulation, provided we have a reasonably large sample size.

But what if we take a selective sample of cases who fall above some cutoff on the phenotype? This is equivalent to taking, for instance, a sample from a student population from a selective institution, when the phenotype is a measure of cognitive function. You're not likely to get into the institution unless you have a good cognitive ability. Then, working with this selected subgroup, we recompute our two measures, i.e. the proportions of each genotype, and the correlation between the genotype and the phenotype.

Now, the really interesting thing here is that, as the selection cutoff gets more extreme, two things happen:
a) The proportions of people with different genotypes start to depart from the values expected for the population in general. We can test to see when the departure becomes statistically significant with a chi square test.
b) The regression of the phenotype on the genotype weakens. We can quantify this effect by just computing the p-value associated with the correlation between genotype and phenotype.
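The whole exercise can be sketched end to end in standard-library Python. This is a rough stand-in for the R script, not a copy of it, and it makes several assumptions: genotypes come from cutting a latent normal variable at the quartiles (MAF of .5), the selected sample is everyone in a fixed simulated population who scores above the cutoff, the chi square p-value uses the closed form for 2 degrees of freedom, and the regression p-value is a large-sample normal approximation. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate(n, r, seed=1):
    """Simulate (genotype, phenotype) pairs with 25% aa, 50% aA, 25% AA."""
    rng = random.Random(seed)
    cut = NormalDist().inv_cdf(0.75)  # upper quartile; lower cut is -cut
    people = []
    for _ in range(n):
        latent = rng.gauss(0, 1)
        pheno = r * latent + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        geno = 0 if latent < -cut else (2 if latent > cut else 1)
        people.append((geno, pheno))
    return people


def summarise(pairs):
    """Genotype proportions, correlation and both p-values for one sample."""
    n = len(pairs)
    counts = [sum(1 for g, _ in pairs if g == k) for k in (0, 1, 2)]
    expected = [0.25 * n, 0.5 * n, 0.25 * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    p_chi = math.exp(-chi2 / 2)  # exact for 2 degrees of freedom
    mg = sum(g for g, _ in pairs) / n
    mp = sum(p for _, p in pairs) / n
    sgp = sum((g - mg) * (p - mp) for g, p in pairs)
    sgg = sum((g - mg) ** 2 for g, _ in pairs)
    spp = sum((p - mp) ** 2 for _, p in pairs)
    r = sgp / math.sqrt(sgg * spp)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    p_reg = math.erfc(abs(t) / math.sqrt(2))  # normal approximation to t
    return {"n": n, "props": [c / n for c in counts],
            "r": r, "p_reg": p_reg, "p_chi": p_chi}


def cutoff_sweep(n=20000, r=0.25, cutoffs=(None, 0.0, 0.5, 1.0), seed=1):
    """Summarise the full population (None) and each selected subsample."""
    people = simulate(n, r, seed)
    rows = []
    for c in cutoffs:
        sample = people if c is None else [x for x in people if x[1] > c]
        rows.append((c, summarise(sample)))
    return rows


for cutoff, s in cutoff_sweep():
    label = "none" if cutoff is None else f"{cutoff:.1f}"
    props = ", ".join(f"{p:.2f}" for p in s["props"])
    print(f"cutoff {label:>4}  n={s['n']:>5}  aa/aA/AA={props}  "
          f"r={s['r']:.3f}  p_reg={s['p_reg']:.2g}  p_chi={s['p_chi']:.2g}")
```

With a population this large both p-values are likely to be tiny at every cutoff, so the columns to watch are the genotype proportions, which drift away from 1:2:1, and r, which shrinks. Whether and where the two p-values cross over depends on the sample size and the size of the true association, so it is worth rerunning with smaller values of `n` and `r`.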

Figure 1: Genotype-phenotype associations for samples selected on phenotype

Figure 1 shows the mean phenotype scores for each genotype for three samples: an unselected sample, a sample selected with z-score cutoff zero (corresponding to the top 50% of the population on the phenotype) and a sample selected with z-score cutoff of .5 (roughly selecting the top third of the population).

It's immediately apparent from the figure that the selection dramatically weakens the association between genotype and phenotype. In effect, we are distorting the relationship between genotype and phenotype by focusing just on a restricted range. 

Figure 2: Comparison of p-values from conventional regression approach and chi square test on genotype frequencies in relation to sample selection

Figure 2 shows the data from another perspective, by considering the statistical results from a conventional regression analysis when different z-score cutoffs are used, selecting an increasingly extreme subset of the population. If we take a cutoff of zero, in effect selecting just the top half of the population, the regression effect (predicting phenotype from genotype), shown in the blue line, which was strong in the full population, is already much reduced. If you select only people with z-scores of .5 or above (equivalent to an IQ score of around 108), then the regression is no longer significant. But notice what happens to the black line. This shows the p-value from a chi square test which compares the distribution of genotypes in each subsample with expected population values. If there is a true association between genotype and phenotype, then the greater the selection on the phenotype, the more the genotype distribution departs from expected values. The specific patterns observed will depend on the true association in the population and on the sample size, but this kind of cross-over is a typical result.
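The conversions between z-score cutoffs, IQ-style scores and the proportion of the population selected are easy to check. The short Python sketch below assumes the usual IQ scaling of mean 100 and standard deviation 15, and a normally distributed phenotype; the function name is made up for illustration.

```python
from statistics import NormalDist


def describe_cutoff(z, iq_mean=100, iq_sd=15):
    """Return (equivalent IQ score, proportion of population above the cutoff)."""
    return iq_mean + iq_sd * z, 1 - NormalDist().cdf(z)


for z in (0.0, 0.5):
    iq, top = describe_cutoff(z)
    print(f"z = {z}: IQ {iq:.1f}, top {top:.1%} of the population")
```

A cutoff of .5 corresponds to an IQ of 107.5 and selects just under 31% of the population, which is where the figures of "around 108" and "roughly the top third" come from.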

So what's the moral of this exercise? Well, if you are interested in a phenotype that has a particular distribution in the general population, you need to be careful when selecting a sample for a genetic association study. If you pick a sample that has a restricted range of phenotypes relative to the general population, then you make it less likely that you will detect a true genetic association in a conventional regression analysis. In fact, if you take a selected sample, there comes a point when the optimal way to demonstrate an association is by looking for a change in the frequency of different genotypes in the selected population vs the general population.

No doubt this effect is already well-known to geneticists, and it's all pretty obvious to anyone who is statistically savvy, but I was pleased to be able to quantify the effect via simulations. It is clear that it has implications for those who work predominantly with selected samples such as university students. For some phenotypes, use of a student sample may not be a problem, provided they are similar to the general population in the range of phenotype scores. But for cognitive phenotypes that's very unlikely, and attempting to show genetic effects in such samples seems a doomed enterprise.

The script for this simulation, simulating genopheno cutoffs.R, should be available here: 

(This link updated on 29/4/17).

Sunday, 5 March 2017

Advice for early career researchers re job applications: 1. Work 'in preparation'

Image from: https://fdudhwala.wordpress.com/2017/01/05/first-time-publishing-woerries/
I posted a couple of tweets yesterday giving my personal view of things to avoid when writing a job application. These generated a livelier debate than I had anticipated, and made me think further about the issues I'd raised. I've previously blogged about getting a job as a research assistant in psychology; this piece is directed more at early career researchers aiming for a postdoc or first lectureship. I'll do a separate post about issues raised by my second tweet – inclusion of more personal information in your application. Here I'll focus on this one: 
  • Protip for job applicants: 3+ 1st author 'in prep' papers suggests you can't finish things AND that you'll be distracted if appointed

I've been shortlisting for years, and there has been a noticeable trend for publication lists to expand to include papers that are 'in preparation' as well as those that are 'submitted' or 'under review'. One obvious problem with these is that it's unclear what they refer to: they could be nearly-completed manuscripts or a set of bullet points. 
My tweet was making the further point that you need to think of the impression you create in the reader if you have five or six papers 'in preparation', especially if you are first author. My guess is that most applicants think that this will indicate their activity and productivity, but that isn't so. I'd wonder whether this is someone who starts things and then can't finish them. I'd also worry that if I took the applicant on, the 'in preparation' papers would come with them and distract them from the job I had employed them to do. I've blogged before about the curse of the 'academic backlog': while I am sympathetic about supporting early researchers in getting their previous work written up, I'd be wary of taking on someone who had already accumulated a large backlog right at the start of their career.

Many people who commented on this tweet supported my views:
  • @MdStockbridge We've been advised never to list in prep articles unless explicitly asked in the context of post doc applications. We were told it makes one looks desperate to "fill the space."
  •  @hardsci I usually ignore "in prep" sections, but to me more than 1-2 items look like obvious vita-padding
  • @larsjuhljensen "In prep" does not count when I read a CV. The slight plus of having done something is offset by inability to prioritize content.
  • @Russwarne You can say anything is "in preparation." My Nobel acceptance speech is "in preparation." I ignore it.
  • @DuncanAstle I regularly see CVs with ~5 in prep papers... to be honest I don't factor them into my appraisal.
  • @UnhealthyEcon I'm wary if i see in-prep papers at all. Under review papers would be different.
  • @davidpoeppel Hey peeps in my labs: finish your papers! Run -don't walk -back to your desks! xoxo David. (And imho, never list any in prep stuff on CV...)
  • @janhove 'Submitted' is all right, I think, if turn arounds in your field are glacial. But 'in prep' is highly non-committal.

Others, though, felt this was unfair, because it meant that applicants couldn't refer to work that may be held up by forces beyond their control: 
  • @david_colquhoun that one seems quite unfair -timing is often beyond ones's control
  • @markwarschauer I disagree completely. The more active job applicants are in research & publishing the better.
  • @godze786  if it's a junior applicant it may also mean other authors are holding up. Less power when junior
  • @tremodian All good except most often fully drafted papers are stuck in senior author hell and repeated prods to release them often do nothing.
 But then, this very useful suggestion came up:  
  • @DrBrocktagon But do get it out as preprint and put *that* on CV
  • @maxcoltheart Yes. Never include "in prep" papers on cv/jobapp. Or "submitted" papers? Don't count since they may never appear? Maybe OK if ARKIVed
The point here is that if you deposit your manuscript as a preprint, then it is available for people to read. It is not, of course, peer-reviewed, but for a postdoc position, I'd be less interested in counting peer-reviewed papers than in having the opportunity to evaluate the written work of the applicant. Preprints allow one to do that. And it can be effective:
  • @BoyleLab we just did a search and one of our candidates did this. It helped them get an interview because it was a great paper
But, of course, there's a sting in the tail: once something is a preprint it will be read by others, including your shortlisting committee, so it had better be as good as you can get it. So the question came up, at what point would you deposit something as a preprint? I put out this question, and Twitter came back with lots of advice:
  • @michaelhoffman Preprint ≠ "in prep". But a smart applicant should preprint any of their "submitted" manuscripts.
  • @DoctorZen The term "pre-print" itself suggests an answer. Pre-prints started life as accepted manuscripts. They should not be rough drafts.
  • @serjepedia these become part of your work record. Shoddiness could be damaging.
  • @m_wall I wouldn't put anything up that hadn't been edited/commented by all authors, so basically ready to submit.
  • @restokin If people are reading it to decide if they should give you a job, it would have to be pretty solid. 
All in all, I thought this was a productive discussion. It was clear that many senior academics disregard lists of research outputs that are not in the public domain. Attempts to pad out the CV are counterproductive and create a negative impression. But if work is written up to a point where it can be (or has been) submitted, there's a clear advantage to the researcher in posting it as a preprint, which makes it accessible. It doesn't guarantee that a selection committee will look at it, but it at least gives them that opportunity.