I had thought the saga of climate science critic Edward Wegman and the various allegations of misconduct in his recent work could not possibly get any more bizarre, especially in the wake of manifestly contradictory findings in two recently concluded investigations at George Mason University.
But in a shocking new development, it turns out that two problematic overview articles by Wegman and his protege and congressional report co-author Yasmin Said in Wiley Interdisciplinary Reviews: Computational Statistics (WIREs CS) have been completely revised. Those revisions saw the removal or rewriting of massive swathes of copy-and-paste scholarship, as well as correction of many errors identified by myself and others. In each case, the comprehensive revisions came “at the request of the Editors-in-Chief and the Publisher”, following complaints to Wiley alleging wholesale plagiarism. But Wegman and Said also happen to be two of the three chief editors of WIREs CS, thus raising compelling concerns of conflict of interest, to say the least.
In fact, it is very clear that Wiley’s own process for handling misconduct cases was egregiously abused in favour of a face-saving “redo” manoeuvre. And this latest episode raises disturbing new questions about the role of the third WIREs CS editor-in-chief (and “hockey stick” congressional report co-author) David Scott, and indeed Wiley management itself, in enabling the serial misconduct of Wegman and Said.
BACKGROUND AND INTRODUCTION: A PAIR OF DUBIOUS PAPERS
Readers are no doubt aware of dubious scholarship in the infamous 2006 Wegman Republican-commissioned report on paleoclimatology (not to mention shoddy analysis) and a follow-up 2008 article in Computational Statistics and Data Analysis by Said, Wegman and two others (retracted last year). Those two works were the subject of contradictory decisions in recently concluded misconduct proceedings at George Mason University. (Weirdly, even though the CSDA paper was retracted last year, and the plagiarism finding was upheld by GMU, the paper is still listed in Wegman’s recent publications as if nothing had happened.)
However, in 2011 the same pattern of error-ridden and lightly edited copy-and-paste scholarship was discovered in two long review articles by Wegman and Said in WIREs CS (which, as previously mentioned, they also co-edit along with David Scott). In order of exposure, these were:
- “Color Theory and Design” (2011): Discussion part 1 & part 2 with side-by-side analysis.
- “Roadmap for Optimization” (2009): Discussion and side-by-side analysis.
Both papers were the subject of separately filed complaints to Wiley within a short time of the discovery of apparent plagiarism. The complaint timelines are outlined in John Mashey’s just-released study of various investigations at GMU, See No Evil at George Mason University (see section 5.2 on page 25 of the full report).
Mashey’s chronology also makes clear that little substantive response has been received from Wiley, leaving the impression that no action had been taken. But in each case, the complaint was apparently handled by allowing a complete “redo” of the questionable paper that removed all traces of plagiarism and, in the case of “Roadmap”, added dozens of new citations, as well as many references.
In the two following sections, I’ll briefly describe the chronology of events for each paper, and give examples of some of the wholesale changes made in attempted remediation. Then in the concluding section, I will look at the troubling issues raised by this sorry episode, and point at some tentative recommendations for resolving the resulting fiasco.
EVOLUTION OF “COLOR THEORY AND DESIGN”
On March 26, I published the first of a two-part series on Wegman and Said’s 2011 article, “Color Theory and Design”. There I showed that much of the article bore evidence of flow-through copy-and-paste of material from three 2000 websites used extensively in earlier Wegman lectures, as seen in the following diagram:
Two days later, John Mashey filed a complaint to Wiley pointing to my initial discussion, as well as to my detailed paragraph-by-paragraph analysis (snapshot below, showing identical text from unattributed antecedents in cyan, and trivial changes in yellow).
In this case, a paragraph, a table and a chart of secondary primaries were copied from an obscure (and no longer existing) web page by Ted Park into Wegman’s 2002 color theory lecture, and from there to the 2011 article with very little change. Similarly, other passages from this and two other websites found their way into the article via Wegman’s color theory lecture, augmented by unattributed passages and diagrams from Wikipedia. Some latter passages in “Color Theory and Design” did acknowledge generally Marc Green’s work (which is still available on the internet), but even here there were several unacknowledged block quotes and lightly paraphrased passages; unattributed antecedents were discovered on 13 pages of the 15 page article.
Part 2 of my discussion pointed out that 12 of 17 diagrams were not attributed (these were either from Wikipedia or the antecedent 2000 websites). In all, 12 unattributed article antecedents were discovered and documented. Only 17 references were provided (WIREs’ own guidelines call for 50-100); of these, 10 appear to have been incorporated from Wikipedia. As I wrote at the time:
This pattern strongly suggests that these are not bona fide references, and are simply padding and obfuscation. Meanwhile, of course, the real Wikipedia sources are unacknowledged.
Over the ensuing months, Mashey received little substantive response from Wiley concerning these well documented problems. But behind the scenes, something was going on.
As I discovered two weeks ago, at some point it was decided by Wiley and the WIREs CS team to solve the problem by mandating a complete rewrite of the article.
The new abstract gives a broad hint:
This article, first published online on February 4, 2011 in Wiley Online Library (http://www.wileyonlinelibrary.com), has been revised at the request of the Editors-in-Chief and the Publisher. References and links have been added to aid the reader interested in following up on any technique.
All of the obviously copied material has either been removed, revised or (in the case of several diagrams) finally attributed. For example, the above passage and the diagram from Ted Park have both been removed, while the table has been reformatted and attributed.
Of course, attribution to an old website is itself curious; in this case, the only choice is Ted Park’s website at Beer.org as stored at Archive.org, certainly a fairly unique reference in a scholarly journal.
Park T. 2001. Available at: http://replay.waybackmachine.org/20011217234921/http:/www.beer.org/∼tpark/color.html. (Accessed April 19, 2011).
This also suggests that the revision may have been underway as early as April of last year (just weeks after my original post). However, the “creation date” of the current PDF version of the article is dated December 2011, so the exact timeline of revision and acceptance by Wiley is very imprecise.
Anyway, I’ve checked a few other “cyan” passages, and they all show a similar pattern.
This is not to say the article is much improved in other respects, and its relevance to the subject of computational statistics remains woefully unclear. Despite the post hoc assertion that the emphasis is “on use in statistical, scientific, and data visualization”, the subject of the article remains inexpertly explained and sourced.
And at least one major error, apparently based on a misunderstanding of Marc Green, is still uncorrected. Green’s original read:
Moreover, the elderly have difficulty discriminating colors which differ primarily in their blue content: blue-white, blue-gray, green-blue green, red-purple, etc.
Wegman and Said’s mangled rendition renamed the difficult-to-distinguish pairs (e.g. red vs purple) into mixed color descriptions (e.g. magenta is a mix of red and purple)! Thus, the original point was completely obscured.
Of course, colors that have a blue component will shift in the perception of the elderly so that cyan, blue-gray, light blue, magenta will be affected and more difficult to distinguish.
This wrongly implies that the elderly will have difficulty distinguishing light blue, say, from magenta. The revised version is not much better, despite the dropping of the final phrase.
Of course, colors that have a blue component will shift in the perception of the elderly so that cyan, blue-gray, light blue, magenta will be affected.
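For readers who want to check such before-and-after wording themselves, here is a minimal sketch of a word-level comparison using Python's standard `difflib` module. It is only an illustration: the side-by-side analyses referred to in this post were not necessarily produced this way, and the helper name `word_diff` is my own invention. The two strings are the original and revised sentences quoted above.

```python
import difflib

# The originally published sentence and its revised counterpart,
# both as quoted in this post.
original = ("Of course, colors that have a blue component will shift in the "
            "perception of the elderly so that cyan, blue-gray, light blue, "
            "magenta will be affected and more difficult to distinguish.")
revised = ("Of course, colors that have a blue component will shift in the "
           "perception of the elderly so that cyan, blue-gray, light blue, "
           "magenta will be affected.")


def word_diff(a, b):
    """Compare two passages word by word.

    Returns a list of (tag, words_from_a, words_from_b) tuples, where tag is
    one of difflib's opcodes: 'equal', 'replace', 'delete' or 'insert'.
    """
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words, autojunk=False)
    return [(tag, " ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


# Keep only the runs where the two versions differ.
changes = [op for op in word_diff(original, revised) if op[0] != "equal"]
for tag, old, new in changes:
    print(f"{tag}: {old!r} -> {new!r}")
```

Run on these two sentences, the only non-equal run is at the very end, where the closing phrase was dropped. The same approach, applied to a suspect passage and its presumed antecedent, separates identical runs from lightly edited ones.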
The acknowledgments have seen an important change as well. The original stated the article was based on previous lectures, but only acknowledged the one still existing antecedent (out of five or six).
This article is based on lectures given by one of us (E.J.W.) in graduate courses in Statistical Data Mining and in Scientific and Statistical Visualization. Much of the discussion in the Section on Color Deficiencies in Human Vision and the Subsection on Hard-Wired Perception is based on material in Green (2004). The inspiration of Marc Green is hereby gratefully acknowledged.
Now this has been revised in order to remove any reference to the problematic lectures.
As with any overview article, this discussion was synthesized from many sources including the cited Wikipedia articles. Early discussion in the sections on Human Visual System and Color Theory were based on Park and Eastman Kodak which are now no longer directly accessible. Much of the discussion in the section on ‘Color Deficiencies in Human Vision’ and the subsection on ‘Hardwired Perception’ is based on material in Green. The inspiration of Marc Green is hereby gratefully acknowledged.
Thus, the true main antecedents have now finally been acknowledged, but the new version leaves the reader clueless as to why such old and ephemeral sources were used in the first place.
A TWISTED “ROADMAP FOR OPTIMIZATION”
In the case of the flagship “Roadmap” article (the very first article to appear in WIREs CS at its launch in 2009), the chronology is much more complex. Not long after I published my analysis of “Color Theory and Design”, I was advised by SFU professor Ted Kirkpatrick that the “Roadmap” article contained similar problems. According to him, a key passage on the simplex method was derived from Wikipedia (an assertion easily confirmed), and at least one or two other passages were suspect, not to mention incorrect.
A few days later, Kirkpatrick advised me that he had decided to launch a complaint to Wiley based on his preliminary analysis, at which point we terminated discussion of the matter. I now have found out that John Mashey had advised Wiley of this development in late April 2011, followed by the more extensive communication from Ted himself a couple of weeks later.
Meanwhile, I launched my own independent examination of “Roadmap” (after kicking myself for not paying heed to a comment months earlier from “Amoeba”, who pointed out that the article opening appeared to come from Wikipedia). It was not until October 2011 that I posted my discussion and analysis, as summarized at the time.
- No fewer than 15 likely online antecedent sources, all unattributed, have been identified, including 13 articles from Wikipedia and two others from Prof. Tom Ferguson and Wolfram MathWorld.
- Numerous errors have been identified, apparently arising from mistranscription, faulty rewording, or omission of key information.
- The scanty list of references appears to have been “carried along” from the unattributed antecedents; thus, these references may well constitute false citations.
Unattributed antecedents were summarized in the following table (as found on p.3 in the complete analysis document Suboptimal Scholarship: Antecedents of Said and Wegman 2009).
As can be seen, all sections were derived at least in part from unattributed antecedents, and the first eight pages were almost all derivative.
Here are two of the most extreme examples, once again with copied material in cyan and trivially edited in yellow. The first shows a comparison of the section on Karush-Kuhn-Tucker conditions and the corresponding part of the Wikipedia article on KKT (Suboptimal Scholarship, p. 14).
The second example is on linear programming, much of which came from an online article by Tom Ferguson (Suboptimal Scholarship, p. 21).
Thirteen egregious errors were also identified, nine of which were introduced by mistranscription or else misunderstanding of the lightly edited copied material (see pp. 4-5 of Suboptimal Scholarship).
My original post discussed three of these in detail, including the infamous “2d (or, not 2d)” howler where the Wikipedia superscript was apparently rendered as a regular character.
… the simplex method visits all 2d vertices before arriving at the optimal vertex.
The original has:
the simplex method … visits all 2^n vertices before arriving at the optimal vertex.
And all this time I thought a cube had eight vertices, not six. Who knew!
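For anyone who wants to verify the arithmetic, here is a small sketch that enumerates the vertices of a unit hypercube and compares the count with the mistranscribed "2d" (read as two times d). The function name `cube_vertices` is just an illustrative choice.

```python
from itertools import product


def cube_vertices(d):
    """Return all vertices of the unit hypercube in d dimensions as 0/1 tuples."""
    return list(product((0, 1), repeat=d))


# Compare the true vertex count (2 to the power d) with 2 times d.
# The two agree only for d = 1 and d = 2; from d = 3 on they diverge.
for d in (2, 3, 4):
    print(f"d={d}: vertices={len(cube_vertices(d))}, 2*d={2 * d}")
```

For the ordinary cube (d = 3) the enumeration yields eight vertices, while the flattened superscript reads as six.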
Other gems included:
- “Mathematical programming is the simplest case of optimization.” [This is a mangling of Wikipedia; in the given context, "mathematical programming" is a synonym for "optimization".]
- “The conjugate gradient method is a recursive numerical method.” [The Wikipedia version has "iterative"; Said & Wegman use these two terms as interchangeable synonyms.]
- “2. Optimally solve the subproblems using the three step process iteratively.” [Here we have the opposite problem - the middle step of dynamic programming is recursive, as in the Wikipedia original, not iterative.]
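The distinction between "recursive" and "iterative" in the last two items is not mere pedantry. As a rough illustration only, the sketch below computes the same quantity both ways. The Fibonacci numbers are a stand-in of my own choosing, not an example taken from either article or from Wikipedia, and the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_recursive(n):
    """Recursive: the function is defined in terms of calls to itself on
    smaller subproblems; caching the subproblem results is the dynamic
    programming idea."""
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n):
    """Iterative: a loop repeatedly updates a running state, with no
    self-reference in the definition."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print(fib_recursive(10), fib_iterative(10))
```

Both functions return the same values, but they are structured differently, which is why swapping one term for the other changes the meaning of a description of a method.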
Once again, a recent visit to WIREs CS revealed wholesale changes to the original “Roadmap” article.
This article, first published online on July 13, 2009 in Wiley Online Library (http://www.wileyonlinelibrary.com), has been revised at the request of the Editors-in-Chief and the Publisher. References and links have been added to aid the reader interested in following up on any technique.
This time, the abstract itself contained significant changes. First, the original:
This article focuses broadly on the area known as optimization. The intent is to provide in broad brush strokes a perspective on the area in order to orient the reader to more detailed treatments of specific subdisciplines of optimization throughout WIREs: Computational Statistics. In this article we provide background on mathematical programming, Lagrange multipliers, Karush-Kuhn-Tucker Conditions, numerical optimization methods, linear programming, dynamic programming, the calculus of variations, and metaheuristic algorithms.
And now the new abstract:
This article is intended as a broad overview of optimization. While often considered as a subset of operations research, optimization is a central concept for statistical theory, e.g., maximum likelihood, least squares, minimum entropy, minimum loss and risk, and so on. As data set sizes become larger, the computational framework of optimization becomes more important. In this article we cover mathematical programming, linear programming, dynamic programming, calculus of variations, and metaheuristic methods.
This appears to be an attempt to justify the treatment of optimization within the rubric of computational statistics. It is still not clear, however, whether all of the optimization methods discussed have much applicability to statistics, although certainly some do.
The article itself has been almost completely rewritten. The most egregious cases of block copying have either been removed altogether (e.g. Tom Ferguson on the simplex method), or greatly reduced and referenced (e.g. Table 1 on KKT conditions). The latter has been reformatted, with the corresponding Wikipedia reference cited at the beginning of the section, along with three others (though the actual antecedent is still clearly Wikipedia).
Indeed, the number of citations has grown from a mere seven to more than a hundred, including all the previously missing Wikipedia references! As for the obvious errors, the ones I checked have been more or less fixed, or else avoided altogether. In general, though, the article is still somewhat vacuous and aimless.
The timing of the revision shows a similar pattern to that of “Color”. All online references (mainly Wikipedia and Wolfram MathWorld) were accessed on July 27, 2011. However, as in the case of “Color”, the published PDF is dated December 2011, so again the intervening timeline is unclear.
The original contained no acknowledgment section, but the new version has the following:
As with any overview article, this discussion was synthesized from many sources including the cited Wikipedia and Mathematica articles. There is no intent in this article to claim that this article represents original research work on our part, but this article is offered with the intent of providing the Roadmap to the field. We are grateful to the two external referees who reviewed this article and whose suggestions have much improved the discussion.
It’s certainly true that “any overview article” would be “synthesized from many sources”. However, I very much doubt that Wikipedia is commonly cited in such articles. And the massive copy-and-paste previously in evidence cannot simply be waved away by the assertion that “[T]here is no intent in this article to claim that this article represents original research work on our part”.
Even more interesting, though, is the reference to external referees. Since the article is so greatly changed, presumably they reviewed the revised version, not the original, leading to questions about the original process in place at WIREs CS. And with that, I will now move on to the editorial process at WIREs CS, including the handling of misconduct complaints.
EDITORIAL PROCESS AT WILEY
It should be clear at the outset that the editorial process relies on clear separation between the roles of editor and author. Although perhaps not the norm, it is hardly exceptional for editors of specialized journals to also write in those journals. However, there is always an implicit or explicit expectation that in such cases, the articles will be overseen by another editor.
In the case of WIREs CS, the only other available editor was David Scott, and so I have presumed that he must have overseen the original articles by Wegman and Said, and led the subsequent complaint procedure.
Aside from the copy-and-paste problems, my original analyses showed errors and generally shoddy scholarship, accompanied by a poor understanding of the subject area. The original peer review of these articles failed to find even those problems, leading to the inevitable conclusion that either Scott’s choice of peer reviewers was wholly inadequate, or that the articles were not peer reviewed at all.
The handling of the two plagiarism complaints by Scott and Wiley was inexcusable, especially given Wiley’s comprehensive ethics policy. Here is the process laid out by Wiley for the handling of plagiarism complaints concerning previously published work (with original flow chart found here).
The first thing to notice is that Wiley recommends that instructions to authors should include a “definition of plagiarism and the journal’s policy on it”. The WIREs CS author guide contains nothing on the subject, although this appears to be true of all the WIREs journals.
The first check point of interest is “Check degree of copying”. Here there are two possible responses: “Clear plagiarism” and “minor copying of short phrases”. As we have seen above, the evidence is overwhelmingly in favour of the first; were this not the case, massive revisions would not have been necessary to remove all evidence of copy-and-paste material, and the matter could have been handled by “publishing correction giving reference to original paper(s)”.
That leaves two possible characterizations of responses to “clear plagiarism”:
- Unsatisfactory explanation/admits guilt
- Satisfactory explanation (honest error/journal instructions unclear/very junior researcher)
Of the three bases for “satisfactory explanation”, the only one that can possibly obtain is “honest error” (given the involvement of two of the journal’s own editors). But, here again, the sheer magnitude of copying, the addition of deliberate slight edits and the failure to attribute almost all of the actual sources make this explanation a non-starter. The only way that this pattern can possibly be ascribed to “honest error” is if Wegman and Said truly do not understand the very concept of plagiarism. This level of delusion cannot be ruled out, but nor can it be reasonably anticipated or accommodated by a research misconduct policy.
Also note that within the Wiley framework, the only actions possibly engendered by a “satisfactory explanation” are:
- Write to author (all authors if possible) explaining position and expected future behaviour.
- Inform readers/victim(s) of outcome and action.
So even if, against all logic and evidence, Wiley arrived at a finding of “honest error”, the mandated information to readers was not provided.
Of course the most obvious limb of the flow chart is the one following “unsatisfactory explanation/admits guilt”. Here the actions to “consider” or perform include:
- Consider publishing retraction.
- Inform editors other journal(s) involved or publisher of plagiarized books.
- Consider informing author’s superior and/or person responsible for research governance at author’s institution.
There is no possible justification in any of the above for the course actually chosen by Wiley, which was to mandate massive revisions that in effect allowed all hint of plagiarism to be covered up, with no admission of problems.
And a particularly disturbing aspect is the statement that the two revisions came at the request of all three editors in chief (in the plural). This could be interpreted to mean that Scott and Wiley improperly allowed Wegman and Said to play some role in the decision making process. If so, that was clearly unacceptable and an egregious conflict of interest.
It is now crystal clear that the research misconduct procedures went seriously awry at Wiley. The only possible response is to have an outside review of the whole matter, excluding the three WIREs CS editors and anyone else from Wiley who may have been involved in the original plagiarism response.
As for David Scott, he has clearly failed to exercise proper oversight on the work of Wegman and Said. An offer to resign from WIREs CS should not be out of the question.
It is also high time for David Scott to take some responsibility with regard to the Wegman Report. Rumour has it that his involvement may have been minimal. If he had nothing to do with the copy-and-paste scholarship of the background portions (see analysis 1, 2, 3, 4) or supplementary sections of the report, about 35 pages in all, or the incompetent and biased analysis, he should do the honourable thing.
David Scott should remove his name from the list of authors and disavow the Wegman Report once and for all.