Wegman report update, part 2: GMU dissertation review

By Deep Climate

Several posts in past months have highlighted highly questionable scholarship in the 2006 Wegman report on the “hockey stick” temperature reconstruction (and revelations of much more will come soon, with  the imminent release of John Mashey’s massive analysis). Today I present yet another analysis of background material of “striking similarity” to antecedents, this time found in a trio of dissertations by recent George Mason University PhD students under the supervision of Edward Wegman.

Wegman Report co-author Yasmin Said’s 2005 dissertation on the “ecology” of alcohol consumption appears to presage some of the questionable scholarship techniques employed in the Wegman Report. And later dissertations from two other Wegman proteges, Walid Sharabati (2008) and Hadi Rezazad (2009), both have extensive passages that closely follow the Wegman Report’s social networks background section, which in turn is based on unattributed material from Wikipedia and two widely used textbooks. Thus, as in the case of Donald Rapp, there appears to be serial propagation of unattributed, “strikingly similar” material. Astonishingly, all three Wegman acolytes were honored with an annual GMU award for outstanding dissertations in statistics and computational science. However, a closer look betrays not only scholarship problems in the work, but clear failure in the PhD supervision process itself.

It may also be that some heat is being felt behind the scenes. For one thing, Said’s 2005 dissertation was recently deleted from the George Mason University website. And around the same time, most traces of Said’s eye-opening presentation on the Wegman panel process [PDF] were also deliberately removed. That appears to be a clumsy attempt to cover up embarrassing details about the U.S. House Energy and Commerce Committee 2005-2006 climate investigation, including the key role of Republican  staffer Peter Spencer, Representative “Smoky” Joe Barton’s long time point man on climate change issues. (These disappearances were pointed out to me by the ever-vigilant John Mashey).

Before diving into the details of Wegman’s proteges, here is a statistical summary of the material in section 2 of the Wegman report, found to be “strikingly similar” to various unattributed sources.

Section (with links)           WC     SS     ID+TC   ID     BL
2.1 Tree rings [Cmp]           703    67%    51%     38%    2.9
2.1 Ice cores & corals [Cmp]   335    93%    90%     65%    4.8
2.2 PCA and stats [Cmp]        1200   32%    28%     26%    4.3
2.3 Social networks [Cmp]      2351   87%    85%     76%    8.8

The section column has links to each relevant post, along with links to the detailed side-by-side comparisons. The other columns list total word count (WC), along with percentage of identical words (ID), identical plus trivially changed words (ID+TC), and overall percentage of “strikingly similar” material (SS).  The final column gives the average block length (BL), i.e. average length of all exactly identical phrases.
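For readers who want to experiment with measures of this kind, here is a minimal Python sketch that approximates the identical-word percentage (ID) and average block length (BL) for a pair of passages. It is not the procedure used to produce the table above, which relied on manual side-by-side comparison. The function names `tokenize` and `similarity_stats` are hypothetical, and the sketch assumes that "identical" means an in-order run of matching words as found by Python's standard `difflib` module, so reordered words are not counted and trivially changed wording (ID+TC) is not detected at all.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity_stats(source, derived):
    """Approximate WC, ID and BL for `derived` relative to `source`.

    Assumption: identical material is whatever in-order matching word
    runs difflib finds; a manual comparison may count more or fewer.
    """
    src, der = tokenize(source), tokenize(derived)
    matcher = SequenceMatcher(None, src, der, autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel; skip it.
    blocks = [b.size for b in matcher.get_matching_blocks() if b.size > 0]
    identical = sum(blocks)
    return {
        "wc": len(der),  # total words in the derived passage
        "id_pct": 100.0 * identical / len(der) if der else 0.0,
        "bl": identical / len(blocks) if blocks else 0.0,  # mean run length
    }


# Toy example (made-up sentences, not taken from any of the sources).
stats = similarity_stats("the cat sat on the mat", "the cat slept on the mat")
print(stats)
```

Because the matcher only follows in-order runs, heavily reshuffled sentences will score lower here than they would in a hand comparison.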

Yasmin Said: In the beginning

Yasmin Said’s 2005 dissertation, Agent-Based Simulation of Ecological Alcohol Systems, seeks to “establish a modeling framework for alcohol abuse that allows evaluation of interventions meant to reduce adverse effects of alcohol overuse without the financial, social and other costs of imposing interventions that are ultimately ineffective (or even simply not cost effective)” [Cached Google doc version].

I’d skimmed the dissertation and had the impression of a less than fully-developed concept with little practical application. As there was no apparent connection to the Wegman report, I gave it no further consideration.

But a recent comment from “terry” got me thinking about the Said dissertation in a new way.

Previously Prof. D. Climate found that many strikingly similar passages in the Wegman Report are in the background sections. These passages are characterized by minor alterations to large blocks of text copied from other sources together with insufficient attribution. With this in mind, I had a quick look at Said’s Ph.D. thesis, “AGENT-BASED SIMULATION OF ECOLOGICAL ALCOHOL SYSTEMS” and observed a similar pattern of strikingly similar text.

“Terry” went on to note that Said’s background section on alcohol “follows both the structure and phrasing” of the web page Chemical of the Week: Ethanol by University of Wisconsin professor Bassam Shakhashiri  “extremely closely”.

Indeed, all five pages of Said’s section 1.1 appear to be derived from Shakhashiri’s web page on ethanol, as evidenced in the detailed,  side-by-side comparison.

Let’s look at a few examples, using the same conventions as I introduced in my latest summary of dubious scholarship in the Wegman report. Identical wording is highlighted in cyan, and trivial changes are in yellow; issues created by changes are underlined.

Shakhashiri, paragraph 3:

Starches from potatoes, corn, wheat, and other plants can also be used in the production of ethanol by fermentation.

And Said has (at p. 7):

Starches supplied  from corn, potatoes, and other wheat plants including barley can also yield ethanol through fermentation.

Notice how many of the same words have been used, but have now been split up into single words or two-word phrases. This reworking has even introduced an obvious error as the revised sentence has classified potatoes and corn as forms of a mysterious group called “wheat plants”.

Shakhashiri concludes that same paragraph:

Thus, the germination of barley, called malting, is the first step in brewing beer from starchy plants, such as corn and wheat.

Said has slightly reworked this:

the germination of barley, which is therefore required to be the first step in producing alcohol from starchy plants (Petrucci, 2001; Shakhashiri, 2005).

Here the rephrasing has introduced yet another error. Of course, there are other methods of producing alcohol from starchy plants. The germination of barley is used in the production of beer, but not in the production of vodka from potatoes. And this paragraph ends with a strange double citation, a pattern repeated for some of the paragraphs in this section, although most paragraphs have no citation at all.

Here are a couple more examples, both of which appear in paragraphs without any citation.

Shakhashiri: Ethanol acts as a drug affecting the central nervous system.

Said: The central nervous system is significantly affected by the consumption of ethanol.

Shakhashiri: Most people begin to show measurable mental impairment at around 0.05 percent blood alcohol.

Said: Impairment of brain functions for most people begin to become noticed at around a blood alcohol percentage of 0.05

Interestingly, Said decided to remove the one explicit reference to alcohol acting as a drug, even though this fact would appear to be at the center of the motivation for research, and is discussed at length in the abstract.

The final example comes in the penultimate sentence (once again, this final paragraph is unattributed).

Shakhashiri: Above 0.5 percent, the breathing center of the brain or the beating action of the heart can be anesthetized, resulting in death.

Said: at a level of 0.5 percent the brain’s breathing center and the pumping of the heart can become anesthetized resulting in their impediment.

Here, the order of the identical wording has been preserved. But the amount of trivial changes (no less than six!) is truly astonishing:

  • Above  ->  at a level of
  • of the brain  ->  the brain’s
  • or  ->  and
  • beating action  ->  pumping
  • be  ->  become
  • death  ->  their impediment

And what can one say about that last change? The reference to the mortal consequences of alcohol overdose has been transformed into “impediment” of vital organs. It’s doubtful that this should be construed as extreme scholarly understatement, but rather appears to be yet another example of error introduced by the poorly conceived changes. In any case, it’s a howler for the ages (once again, hat tip to John Mashey).

The GMU Writing Center weighs in

Overall, nearly 50% of the wording in Said is identical to the Shakhashiri source, with another 20% involving trivial changes of the sort highlighted in yellow above, and the rest consisting of expanded verbiage that adds little of substance.

Clearly this section falls well short of the expected standard for summarization or paraphrasing, at least according to the extensive explanation of plagiarism available from the George Mason University Writing Center.

What is paraphrasing, and how do I do this?

First, for paraphrasing, a good idea is to read the original, make sure that you understand it, lay it aside, and then write it down in your own words imagining that you are explaining it to someone who will read your paper. If you are having trouble putting it into your own words, then you probably don’t understand it well enough to write about it. When you are finished, cite the author according to the style you are using.

Always remember, borrowing (both language and syntax) too heavily from a source, even if you cite it, is plagiarism. A good thing to keep in mind is to use no more than two of the author’s original words.

Clearly, Said did not follow any of this sage advice. Here is a telling example given by the GMU Writing Center. First, here is the original:

The park [Caspers Wilderness Park] was closed to minors in 1992 after the family of a girl severely mauled there in 1986 won a suit against the county. The award of $2.1 million for the mountain lion attack on Laura Small, who was 5 at the time, was later reduced to $1.5 million. – Reyes and Messina, “More Warning Signs,” p. B1.

And now here is one example of an attempted paraphrase that falls short, with identical and trivially changed wording marked in the same manner as I have done above:

Reyes and Messina report that Caspers Wilderness Park was closed to children in 1992 after the family of a girl brutally mauled there in 1986 sued the county. The family was ultimately awarded $1.5 million for the mountain lion assault on Laura Small, who was 5 at the time (B1).

This example has a proper citation and is rather short. Yet it still is a clear example of unacceptable paraphrasing rising to plagiarism.

In Said’s case, the entire section of five pages consists of a similar sort of supposed paraphrasing strung together from one source. The source is cited at the end of some of the paragraphs, but not the majority of them.

And an extra source has also been given in each case, Ralph Petrucci’s 2001 General Chemistry: Principles and Modern Applications. But since Said’s entire section can be traced sentence by sentence to the Shakhashiri web page, this citation is almost certainly bogus.

Moreover, Said has made no effort to distill, as it were, her chosen source into a summary of material actually relevant to her own study, choosing instead to indulge in what “terry” rightly called a “minimal rewrite” of the entire original article.

All of this eerily presages the Wegman Report background sections, where, for example, section 2.3 on social networks contained much copied unattributed material that was clearly irrelevant (or at least not adduced in any way) in Wegman et al’s use of SNA to characterize co-author relationships in paleoclimatology.

Nevertheless, the possibility exists that Said (or even Wegman himself) does not understand that this kind of “scholarship” is entirely unacceptable. That would be very sad indeed.

Walid Sharabati joins the team

At the time of the Wegman report, Walid Sharabati was an up-and-coming Wegman protege; after receiving his PhD he went on to  a position at Purdue University. Although he was not involved in the report itself, Wegman tapped him to  produce an appendix to Wegman’s supplementary congressional testimony, a written response to supplementary questions from Rep. Bart Stupak  [PDF 2.7 Mb].

In the original report, Wegman et al had claimed that “hockey stick” author Michael Mann’s wide social network of co-authors, including several connected co-author “cliques”, implied that these same co-authors were reviewing each others’ work, to the detriment of the quality of paleoclimatology research. Wegman’s response to Stupak further claimed that Wegman’s own social network of co-authors (many of whom were ex-students) exhibited a so-called “mentor-scholar” style of co-authorship that was much less prone to this problem than Mann’s “entrepreneurial” style.

The claim seems absurd on its face, as Mann’s seminal (if controversial) 1998 and 1999 papers were written before he had ever collaborated with most of the co-authors in his “social network”.

Nevertheless, Sharabati’s appendix analyzed Wegman’s co-authorship network at length. The credibility of the whole exercise was hardly enhanced by Sharabati’s claim:

Of all the work that has been done on social networks, very few investigators have considered coauthorship network. Therefore, what we are about to observe in this paper is a brand new approach in the social networks field.

In fact, a Google scholar search shows dozens of hits in the literature for the combined terms “social network analysis” and “co-author analysis”.

Eventually, this highly speculative and weak social network analysis of co-author relationships was developed into an article and submitted to the journal Computational Statistics and Data Analysis in mid-2007. Lead author Said was joined by Wegman, Sharabati, and yet another Wegman protege, John Rigsby, who had performed the original analysis of Mann’s co-author network. Social Networks of Author–Coauthor Relationships was published in early 2008.

As noted in my previous discussion of the social network background section in the Wegman report, the CS&DA article’s background section was a reduced version of that found in the Wegman report. In turn, the Wegman et al background section was clearly copied from three unattributed sources (as seen in the complete and updated side-by-side comparison):

  • Wikipedia article on Social Networks (2006 version)
  • Wasserman and Faust’s seminal Social Network Analysis: Methods and Applications (1994)
  • Exploratory Social Network Analysis with Pajek by W. de Nooy, A. Mrvar and V. Batagelj (2005)

Not only that, but the background material was largely irrelevant to the article’s analysis, and the analysis no less fatuous and unconvincing than the original found in Wegman et al.

Yet despite these problems, not to mention the veritable paucity of citations, the article sailed through peer review in six days! Perhaps Wegman’s presence on the journal’s advisory board and Said’s previous tenure as an editor had something to do with the lightning acceptance. In any event, as John Mashey has pointed out, the article and its history stand out as an excellent example of a self-refuting paper.

Perhaps less noticed at the time was a comment where I discussed Sharabati’s subsequent dissertation, submitted a year later, entitled Multi-Mode and Evolutionary Networks.  It contained a social network background section virtually identical to that found in Said et al (itself reduced from Wegman et al). I have since updated the relevant side-by-side comparison to incorporate the colour highlighting scheme and to show the minimal differences between Said et al and Sharabati’s section 1.1, as well as their clear derivation from the unattributed original sources.

In fact, I have identified only two small changes in the actual text of Sharabati’s section 1.1, compared to the earlier Said et al.

First, Sharabati’s version has no attribution whatsoever, just like the Wegman material it was based on, whereas Said et al has added a citation of Granovetter (1973) in a section discussing “weak ties”.

The second small difference is quite telling. Here is a passage mostly identical to Wasserman and Faust:

Social ties link actors to one another. The range and type of social ties can be quite extensive. A tie establishes a linkage between a pair of actors. Linkages are represented by edges of the graph. Examples of linkages include the evaluation of one person by another (such as expressed friendship, liking, respect), transfer of material resources (such as business transactions, lending or borrowing things), association or affiliation (such as jointly attending the same social event or belonging to the same social club), behavioral interaction (talking together, sending messages), movement between places or statuses [states] {statues} (migration, social or physical mobility), physical connection (a road, river, bridge connecting two points), formal relations such as authority and biological relationships such as kinship or descent.

As before, we see the same pattern of identical text (highlighted in cyan), interspersed with truly trivial changes (highlighted in yellow), all carried over from Wegman et al.  The phrase “for example” was carefully changed to “such as” four times. That suggests that Said may have done the original rendition found in the Wegman report.

However, Wegman et al had the nonsensical “movement between places or statues” instead of Wasserman and Faust’s “movement between places or statuses” (a mistake treated with much-deserved derision last time I discussed this).

In Said et al, this was “corrected” to read “movement between places or states” (shown in square brackets above). But notice that in Sharabati’s dissertation this has reverted to “statues” (in curly brackets).

One plausible explanation is that Sharabati performed the reduction from Wegman et al for Said et al, and then used his own reduction in his dissertation. Meanwhile, someone else, possibly Said herself, added the Granovetter citation and attempted to correct the obvious error in “statues”.

Sharabati did redeem himself somewhat by omitting the Said et al passage on centrality and writing his own section on this concept, properly citing Wasserman and Faust and even relating the section to subsequent analysis.

Nevertheless, Sharabati’s section 1.1 is noteworthy as a possible example of apparent triple serial plagiarism, a unique occurrence to my knowledge. And it’s also important to observe that in this case, both Wegman and Said played the role of mentor, as they acted as Sharabati’s joint dissertation advisors. Indeed, it is hard to avoid the conclusion that the supreme weakness of the “mentor-scholar” style of co-authorship, at least as practiced at the Computational Data Sciences department of GMU, has now been exposed.

And then there were three: the curious case of Hadi Rezazad

After “terry’s” discovery of dubious scholarship in Yasmin Said’s dissertation, John Mashey pointed out a list of prize winners of the verbosely named Center for Computational Statistics/Computational Data Sciences Outstanding Ph.D. Dissertation Award. The most recent recipient of the honour was Hadi Rezazad, following in Said’s and Sharabati’s footsteps, for the 2009 dissertation entitled Enhancement of Network Robustness and Efficiency through Evolutionary Computing, Statistical Computation and Social Network Analysis.

Here is the key passage of the abstract:

Through this work, I develop a novel method to assess and improve the robustness and efficiency of computer networks. This method uses computer network analysis, social network analysis, evolutionary computing, statistical methods, and graph theory. Specifically, my aim is to achieve enhanced network robustness and efficiency with a primary focus on architecture and topology of networks.

In the scholarly literature, social network analysis has generally been applied to computer networks in such intuitively apt ways as studies of collaboration via the internet, or development of security protocols based on trust.

In Rezazad’s chosen “primary focus” of network architecture and topology, the application of SNA is less obvious, but has occasionally been used. For example, the concept of betweenness centrality informed the discussion of “attack-robust network topology” in Holme et al’s Attack vulnerability of complex networks (2002), as well as the development of optimized network traffic filtering strategies in Bloem et al’s Malware Filtering for Network Security Using Weighted Optimality Measures (2007).

Curiously, though, the only article cited as a previous example of SNA applied to Rezazad’s focus is a brief article from d’Ambrosio and Birmingham, Achieving agent coordination via Distributed Preferences. As Rezazad explains:

A research effort (D’Ambrosio, et al, 1996) for applying social network analysis to Local Area Network (LAN) topologies takes a close look at the network activity, betweenness, and centrality. Here, a link between a pair of nodes represents a bidirectional information flow or knowledge exchange between two nodes. The total number of direct connections of a node is referred to as the degree of that node.

But as the title implies, the cited article discussed “agent-based systems”, in this case focusing on concurrent engineering, not LAN topologies. The relevance is unclear, and in any event there is no reference in the actual article by d’Ambrosio and Birmingham to SNA nor to SNA concepts such as centrality and betweenness. It is possible that Rezazad meant to cite some other work, as Rezazad cites “d’Ambrosio et al” but only lists d’Ambrosio and Birmingham; however, chasing down these citations and references is beyond the scope of my present analysis. Suffice it to say that Google scholar returns no hits for a search on the author “J d’Ambrosio” and the term “centrality”. And the same reference is cited, more sensibly, in Rezazad’s later discussion of “agent-based” systems.

Nevertheless, Rezazad’s sections 2.2.1 and 2.2.2 give eight pages of SNA background material and are derived almost entirely from Wegman et al’s section 2.3 (in turn based on three unattributed sources, namely Wikipedia, Wasserman and Faust, and de Nooy et al, as noted previously).

The side-by-side comparison leaves no doubt that these sections have largely reproduced Wegman et al (which needless to say is not cited or listed as a reference). Of the three underlying sources, only Wasserman and Faust is listed as a reference, albeit with the mistaken date of 1999.

As before, I use highlighting to show the relationship to unattributed antecedents, as well as curly brackets {} for additions by Rezazad and square brackets [with strikeout] to show material in Wegman, but omitted by Rezazad. The opening of Rezazad (and Wegman) is one of the very few passages not copied from any of the three sources, as far as I can tell.

{Networks are useful mechanisms for modeling and understanding
the existing relationships in the world.}

Networks operate anywhere that energy and information are
exchanged: between neurons and cells, computers and people,
genes and proteins, atoms and atoms, and people and people.
{ (Wasserman, 1999) }

Oddly, this opener from Wegman is the one sentence for which I could not find an antecedent; yet, it is the only one where Rezazad has added a citation! (The bibliography lists Wasserman and Faust, not Wasserman, and the date appears to be mistaken).

For the rest, Rezazad has been careful to change many (but by no means all) references to “persons” or “people” to “actors” or “nodes”; either that, or simply omit occasional sentences that are an especially poor fit. In some cases, a sentence or two attempting to tie the material to computer network topology has been added, as in the following definition originally from Wasserman and Faust, via Wegman et al:

Actor: Social network analysis is concerned with understanding the linkages among social entities and the implications of these linkages. The social entities are referred to as actors. Actors do not necessarily have the desire or the ability to act. Most social network applications consider a collection of actors that are all of the same type. These are known as one-mode networks.

{ In the domain of computer networks, an actor is a network
component, which may be a server, hub, a router, or a workstation.}

Since the concepts of centrality and closeness are actually used in subsequent analysis, Rezazad has made a special effort to eschew references to “people” (or even worse the second person “you”), in favour of “actors” and “nodes” (p. 17):

The concepts of vertex centrality and network centralization are best understood by considering undirected communication networks. If social relations are channels that transmit information between [people], {actors} central [people] {actors} are those [people] {actors} who have access to information circulating in the network or who may control the circulation of information.

Closeness – The accessibility of information is linked to the concept of distance. If [you are] { a node A is} closer to the other [people] {nodes} in the network, the paths that information has to follow to reach [you] {node A} are shorter, so it is easier for [you] {node A} to acquire information.

Presumably, Rezazad was encouraged to incorporate SNA into his research by his mentors Wegman and Said (a presentation by the trio, with Rezazad in the lead position, entitled A Statistical Social Network Approach to Computer Network Optimization was given at the 2007 Joint Statistical Meetings, sponsored by the American Statistical Association).

And Rezazad (and possibly even Sharabati) may not have realized there was anything problematic with the provenance of the large swathes of unattributed background material. It could even be that Wegman or Said actually encouraged the wholesale use of this material of dubious provenance from the Wegman report.

That makes the apparent failure of PhD supervision even worse; quite simply, Wegman and Said have failed to uphold, much less instill, minimal standards of scholarship. The reputation of the current cohort of PhD students at the Center for Computational Statistics and Computational Data Sciences, and indeed George Mason University itself, will undoubtedly be damaged by their actions.

The pattern of shoddy scholarship outlined here cannot be excused or easily explained away. Such a pattern not only bespeaks a lack of integrity, but also a willingness to cut corners and substitute ignorance, obfuscation and incompetence for diligent scholarship.

And now that the sad truth is emerging, it would seem that an attempted cover up is under way. Not only was Yasmin Said’s dissertation removed from the GMU website after the initial revelations of its dubious scholarship, but a key Said presentation on the Wegman panel was also recently excised.

That talk was the very first event in the GMU Fall 2007 Statistics Colloquium Series. The original colloquium web page linked to the abstract and presentation slides, but on August 20, all references to the September 7, 2007 talk were suddenly removed from the revised colloquium web page.

Experiences with Congressional Testimony: Statistics and The Hockey Stick gave an eye-opening account of the Wegman panel saga. Earlier this year, I showed how Said revealed that Joe Barton staffer Peter Spencer had controlled the process from the start, going so far as to supply the material that Wegman et al should examine. Barton had previously rejected a National Academy of Sciences offer to study paleoclimatology, yet Said claimed the request to Wegman was in part based on the “independent recommendation” of the National Academy of Sciences! And despite the obvious partisanship and bias of the exercise, Said even tried to claim that the Wegman team were acting as objective and impartial “referees”.

No wonder Wegman and Said received a “bad invitation” from top administrators at GMU “to explain our testimony”. Perhaps it is time for another such meeting, in light of the mounting evidence of shoddy scholarship and poor leadership exhibited by Wegman and Said.

GMU can no longer ignore the obvious. Two faculty members have not only exhibited a lack of scholarly standards, but they have also participated in an unscrupulous attack on climate science and scientists, part of a blatantly partisan and dishonest effort to mislead the U.S. Congress and the public at large.


74 responses to “Wegman report update, part 2: GMU dissertation review”

  1. LOL…It isn’t just Wegman who figured out that Mann’s scholarship was terrible. That is obvious for anyone who cares to look at the statistical methods and the data being used. Why is it that the AGW movement can’t accept the fact that the Hockey Stick is dead and move on to some of the areas where it has a chance of being convincing?

  2. 1) re: triple serial plagiarism
    Well, it might not be that rare, given that
    original => SWSR => {Sharabati, Rezazad} is a pair of triples
    Alternatively, there is
    original => (internal version) => SWSR
    (internal version) => {Sharabati, Rezazad}

    2) An odd effect in Rezazad (and earlier in Rigsby’s work, and I think he’s currently doing a PhD at GMU) is the seeming attempt to jam a bunch of SNA terminology into work where it isn’t particularly relevant and doesn’t even get used that much. It is really weird to copy terminology like dyad and triad and never use them.

    It is really weird to use SNA terminology for *computer* networks, not for the human networks that happen to use computers, i.e., including the “social network applications” people now use. Really, graph theory has been around a long time, and people have been using it to model computer networks since they’ve existed. For example, I grabbed my wife’s Andrew S. Tanenbaum, Computer Networks, 1981 off the shelf. Chapter 2, “Network Topology” has nodes and networks, but uses the typical graph theory terminology. It doesn’t try to call nodes actors. It was hardly new then. It certainly goes back many decades, before modern computers, into telephone switching machines and their topologies. I worked at Bell Labs 1973-1983. This wasn’t new then.

    3) Sharabati has a similar citation/reference problem to that of the Wegman report: half (or more) of the references are uncited.

    4) Without commenting on the *quality or novelty* of most of the work in Rezazad or Sharabati (that takes way more expertise to check), both of them clearly spent a lot of time.
    They each wrote 200+-page dissertations covering much ground. This is pretty sad.

    They each plagiarized a few pages of text that was marginal for Sharabati and irrelevant/distracting for Rezazad. Quite possibly, both will end up with PhDs revoked, and while they hardly seem blameless, this is *not* good PhD supervision, especially since it almost seems they were “pushed” into this.

  3. Gavin's Pussycat

    The Ethanol.pdf link is broken — prefixed by deepclimate.org. The proper link is

    Click to access Ethanol.pdf

  4. But,but,but! This unfair attack on Wegman is worse than anything we did to the communist Michael Mann! And what about the emails? You cannot attack people unless you steal their emails!1!!

    • No doubt if we had access to some of these guys’ emails the revelations would make this orgy of deceit look like a grammatical error.

      The truth will out but not by osmosis. Superb job here and elsewhere DC of exposing the ‘auditors’ for the craven integrity-starved hacks they are. I for one am eagerly anticipating Mr. Mashey’s opus.

  5. The NAS and Wegman came to the same conclusion about the statistical methods and use of bristlecones and Foxtail Pines. You can’t get around the fact that Mann’s paper reached conclusions that were not based on the evidence by attacking Wegman for something else that may or may not be valid. Stick to what is material and make your case there. Unless you can show how short centering is appropriate, why stripbark proxies should be used, why it is OK to backfill missing data and cut off data series when real data is available, etc., etc., etc., you have no case and can’t defend Mann’s fraud.

    • WHAT?
      This article has to do with Wegman et al. [edit] articles and PhD theses.
      Nice try at diverting the discussion without providing any evidence.
      It is obvious from your comments that you have not stopped beating your wife. TRUE or FALSE!!!
      SEE? I too can make comments pulled out of the hat.

  6. Oh look, yet another Hockey Stick! This one is from Thibodeau et al. (2010, GSL), “Twentieth century warming in deep waters of the Gulf of St. Lawrence: A unique feature of the last millennium”

    Their conclusion:

    “We conclude that the 20th century warming of the incoming intermediate North Atlantic water has had no equivalent during the last thousand years.”

    And the folks at ClimateAudit [edited] say that they do not know whether or not we have a problem.

    If Mann et al. had committed fraud they would have been found out by their peers long before McIntyre and Barton arrived on the scene. Funny how Mann et al. keep getting vindicated (NAS, PSU, HoC, Muir etc.), but some are just so obsessed with ideology that they can’t see the forest for the trees or is that the trees for the forest of political spin?

    The really juicy story here is the convincing case for plagiarism committed by groups with ties to McIntyre.

    [DC: That looks to be an interesting paper. If you post some commentary on it on the Open Thread, no doubt some will comment on it there (hint, hint). ]

  7. For everyone’s edification on the consequences of plagiarism, see:


    How about that? Having your thesis revoked and being told to send your diploma back to the university! Plus, they removed her thesis from all online and library sources.

  8. Hank,
    I take your point. Moderation will be tighter from here on.

    But I’ll briefly respond to Vangel before moving on and returning to the actual topic under discussion.

    Both the NAS and the later Wegman report agreed that “short-centred” PCA was inappropriate and should not have been used. The NAS also agreed that firm probabilistic statements about specific years or specific decades in temperature reconstructions could not be sustained.

    However, the NAS disagreed with Wegman in several important respects.

    – The NAS was careful to note that other proxy studies, whether using corrected PCA (Wahl and Ammann) or non-PCA methodologies, arrived at similar findings of anomalous late 20th century warmth. The NAS also noted several other lines of evidence for this assertion.

    – The NAS recommended that “strip-bark samples” be “avoided”, implying that the bristlecone and foxtail proxies could yield useful climatic information, if properly handled.

    – The NAS overview of paleoclimatological methods and findings avoided the errors and bias evident in the Wegman report.

    – Lead author Gerald North vehemently disagreed with Wegman’s assertion of the failure of peer review in paleoclimatology. North also denigrated the social network analysis on which the assertion was based.

    Finally, there is no evidence whatsoever of fraud on the part of Michael Mann. But in the case of Wegman and Said, there is ample evidence of questionable scholarship, apparently rising to research misconduct.

    The latter is the subject of this thread, and I expect all future comments to address this issue or else be subject to moderation at my discretion. Thanks!

  9. There may well be extenuating circumstances for Rezazad, and even for Sharabati. The unattributed material on social network analysis was not crucial to the dissertations, and there is no evidence of a wider pattern elsewhere in the dissertations or other work.

    The same cannot be said for Wegman and Said, however.

  10. [DC: This comment ignores my previous warnings. It is off-topic, full of false accusations and way too long. You’ll have to discuss Wegman’s cherrypicking and misinterpretation of Wahl and Ammann somewhere else. Thanks! ]

  11. Of the 3 PhDs, Said is pretty clear.
    1) and I recommend going back and reviewing her Sept 2007 talk that has been disappeared. Of course, that didn’t quite happen… and actually, someone missed another seminar list:
    the Washington Statistical Society, and happily, it even links to an abstract, which is informative:
    “Rarely does the federal government need advice on theoretical statistics. I would like to talk about one exception. Efforts to persuade Congress to enact legislation that affects public policy are constantly being made by lobbyists who are paid by special interests. While this mode of operation is frequently extremely effective for achieving the goals of the special interest groups, it often does not serve the public interests in the best possible way. As counterpoint to this mode of operation, pro bono interaction with individual legislators and especially testimony in Congressional hearings can be remarkably effective in presenting a balanced picture. The debate on anthropogenic global warming has in many ways left scientific discourse and landed in political polemic. In this talk I will discuss our positive and negative experiences in formulating testimony on this topic. ”

    2) And every time I look at that, more things pop up. We’d noticed a while back that the WR used a distorted version of the long-obsolete 1990 IPCC FAR graph. By chance, it over-emphasized the MWP. They didn’t have a copy of the FAR, so got it from someone else. I always wondered if it was distorted before or afterwards.

    Slide 8 of Said’s talk has a copy of the 1990 IPCC FAR temperature sketch, the right one.

  12. Hmm DC
    What this reminds me of is forensics performed in trying to determine if computer software has been copied from a previous employer and used at a new company. I have been involved in analyzing some code in support of legal actions for copyright infringement based on this. At what point does plagiarism become copyright infringement? As I understand the current laws, just the publishing of your text invokes copyright so that you now own what you have written. The fact that you change a few words, does not invalidate the ownership of the original copyright holder.

    E.g., in one famous case, the infringement was determined by the tabs/spaces found at the beginning of the lines of code, which were identical, even if the code had had variables etc. changed.
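
    As a rough illustration of the tabs/spaces idea above, here is a minimal sketch, not the method used in any actual case. The function names `indent_fingerprint` and `indent_match_ratio` are hypothetical; the sketch assumes two source files are compared line by line and only the leading whitespace of non-blank lines is considered.

```python
def indent_fingerprint(source):
    """Return the exact leading tabs/spaces of each non-blank line."""
    prints = []
    for line in source.splitlines():
        if line.strip():  # skip blank lines
            stripped = line.lstrip(" \t")
            prints.append(line[: len(line) - len(stripped)])
    return prints


def indent_match_ratio(a, b):
    """Fraction of line positions whose leading whitespace is identical."""
    fa, fb = indent_fingerprint(a), indent_fingerprint(b)
    if not fa or not fb:
        return 0.0
    same = sum(1 for x, y in zip(fa, fb) if x == y)
    return same / max(len(fa), len(fb))


# Hypothetical example: the same odd tab/space mix survives a variable rename.
original = "int main() {\n\t  int count = 0;\n    \treturn count;\n}\n"
renamed = "int main() {\n\t  int total = 0;\n    \treturn total;\n}\n"
reformatted = "int main() {\n    int total = 0;\n    return total;\n}\n"

ratio_renamed = indent_match_ratio(original, renamed)
ratio_reformatted = indent_match_ratio(original, reformatted)
```

    Identical idiosyncratic whitespace in otherwise reworded code is suggestive rather than conclusive; a high ratio only flags files worth a closer manual look.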

  13. And, more famously, the cases won against Led Zeppelin for “plagiarism”/“copyright” infringement.

    [DC: More here: http://en.wikipedia.org/wiki/Led_Zeppelin#Allegations_of_plagiarism.

    But enough about musical examples, entertaining though they may be. ]

  14. harvey,
    Well, yes, plagiarism often involves copyright infringement. I suppose it would be up to the authors or publishers to decide if this should be pursued.

  15. Rattus Norvegicus

    Hmm, this is interesting. I guess, given the level of scholarship in Wyner & McShane, that this sort of problem may be endemic to the sort of “scholarship”

    It’s sort of like the old cliche: “we’re from the statistics department, and we’re here to help”. There seem to be a lot of statisticians who are more than willing to leap into a field w/o understanding the science, the data problems, or the questions the scientists in a given field (mostly climate science) face. I’d like to see Mashey (he has just a wee bit of expertise in this area) comment more on network architecture and SNA. This seems a rather odd application of SNA and, as he points out, graph theory has been the basic method for analyzing network architecture for, well, pretty much since computer networks were invented. Does SNA add anything?

    I suppose this adds fuel to the ASA ethics requirement that the statistician should understand the underlying science. Wegman (and his students) clearly did not.

  16. 1) Recall that last year I found 200+ physicists willing to sign that silly Austin/Happer/Singer anti-AGW petition for the American Physical Society.

    2) I don’t know that statisticians are any more ready to jump into unfamiliar turf than physicists. After all, so far we’ve only identified a very small set of them, basically Wegman & a few of his students, and McShane and Wyner.

    3) The last thing in the world we need is the “faux fight between statisticians and climate scientists” that Wegman tried to start. In any case, ASA has some good folks, and I recommend ASA Workshop 2007.
    Jim Berger’s talk, the part on role of statisticians in climate, is good.
    In fact, all but one of the talks is good.

    On the other issues around SNA, I’ve written a few pages on that, Coming Very Soon.

  17. DC: — “Experiences with Congressional Testimony: Statistics and The Hockey Stick gave an eye-opening account of the Wegman panel saga.”

    Interesting presentation. Does anyone know who dropped out and why they dropped out (page 5)?

    Page 6: “None of our team had any real expertise in paleoclimate”, meaning NO expertise in paleoclimate reconstruction, surely?

    Page 8: “The 1990 IPCC report showed a very different curve with a warmer-than-current period from 1000 to about 1450.”
    Wasn’t that based on Central England alone? Was this pointed out orally at the presentation?

    On page 20: “This was obviously coached by the “Hockey Team” asking very detailed statistical questions.” Big deal. Were the questions invalid? Not likely.

    Nice to see she got a load of contacts, though. It was obviously good for her career.

    Does anyone on the contrarian side of the fence take this climate s**t seriously?

    • The 1990 FAR chart is discussed here (I believe William Connolley wrote the bulk of this article).

      There was more discussion of the use of the chart in the Wegman report at WC’s Stoat, including comments by John Mashey and myself.

      I think John has a section on this in his upcoming report.

    • Thanks DC. For some reason it keeps cropping up lately as proof that the IPCC is rubbish so the links are handy.

    • Nice to see she got a load of contacts, though. It was obviously good for her career.

      Notably the “contacts” slide seems to be all Republicans or “skeptics”.

      From the PPT:

      Our approach was to serve as an honest broker…


      Demonstrated mathematically that the Mann et al. procedure introduces a bias that preferentially selects “hockey stick” shapes

      Why yes, it does. However Dr Said didn’t feel it was important enough to mention on the slide that apparently these preferentially selected “hockey stick” shapes are *much* smaller than the one in Mann et al, and that they are equally likely to have the blade pointing down as pointing up. If this is true and she didn’t mention it, one wonders about the “honest broker” bit…

      …but I suspect this is starting to (?) head OT…

  18. jklinesmith12@gmail.com

    It isn’t just Wegman who figured out that Mann’s scholarship was terrible. [etc. ….]

    [DC: Here’s a bit of comic relief. This comment reproduced a previous one from Vangel, and its URL (now removed) pointed to a website called DissertationMaster, where you can get “dissertation writing help” and even “a list of suggested topics”. ]

  19. Yes, I believe that Vangel’s comment is a perfect example of the bandwagon technique.

    How is that? I figured out the problems with the Mann paper years ago and was arguing that the verification statistics showed that there was no ‘there’ there. It is the AGW clowns who are jumping on bandwagons. They were unable to argue against the statisticians’ critiques of the MBH papers so they tried to smear them by making up charges that divert attention from the actual issues.

    [DC: You are grossly mistaken. There are several substantive critiques of M&M and Wegman. For example, M&M and Wegman never addressed the need for an objective criterion for retention/selection of principal components, an issue I examined (among many others) in my discussion of McShane and Wyner.

    And the poor and biased scholarship of the Wegman report has now been amply supported by reams of evidence. Should we believe you, or our own “lying eyes”? ]

    • > How is that [Vangel is exemplifying the bandwagon technique]?

      For memory’s sake, here is the interesting bit:

      > That is obvious for anyone who cares to look at the statistical methods and the data being used. Why is it that the AGW movement can’t accept the fact that the Hockey Stick is dead and move on to some of the areas where it has a chance of being convincing?

      The first sentence amounts to saying: we won. The second sentence amounts to saying: if you join the AGW movement, whatever that is, you’re a loser. The bandwagon technique relies on the fact that people want to be on the side of the winners, not the losers. The bandwagon here is the anti-AGW movement.

      There are lots of entailments that are presupposed by that bit. The most important one is that Mann’s work is essential for the AGW movement to stay alive. So we have the usual “we broke the hockeystick, so we won” meme.

  20. [DC: Again with the charges of fraud. I’ve been patient with you over the past few months, but no more. Thanks! ]

  21. Aside: DC, I wasn’t criticizing your moderation, which is commendably patient — you give people the chance to think and improve what they write, no matter how poor their initial posts.
    I was noting the persistently poor posts by ‘vangel’ — the same userid (though we can’t tell if it’s the same person) has been trolling and gotten moderated down on other blogs, on both climate and US politics, in the last few years.

  22. Alan D McIntire

    Take a look at this paper

    Click to access jf97.pdf

    and this one

    Click to access an%20improved%20approximation%20to%20the%20estimation%20of%20the%20critical%20f%20values%20in%20best%20subset%20regression.pdf

    From your links, Mann was picking about 90 proxies out of a
    potential 1100 or so. Plug

    into that critical F value equation, and you’ll see that the AVERAGE
    RE correlation from random data using 119 years, leaving 30 for validation, will be over 90%.

    After reading and understanding those two papers, you’ll understand why
    Mann, Wahl, and Ammann are laughingstocks in the statistics community.

    • Although it’s not clear, I think you are referring to the 90 or so proxies used by McShane and Wyner in their reconstruction back to 1000. That’s because only 90 are available back to that century. (Alternatively one can build the reconstruction in “steps” using progressively fewer proxies as one goes further back).

      So it’s not 90 out of 1100, it’s 90 out of … well, 90.

      In Mann et al’s CPS reconstruction, the screening process selected something like 25-30% of all available proxies, IIRC. EIV uses the whole proxy set.

      Probably, though, it would be best to continue this at the McShane and Wyner thread.

  23. Alan D McIntire

    [DC: I’ve posted this to the McShane and Wyner thread. Please continue there. Thanks! ]

  24. Could someone provide a bit of clarity on what appears to be the core textual analysis technique being used in this post?

    AFAIK, we’re dealing with comparison of original and paraphrased texts. The complaints seem to be a) Too much similarity in the paraphrase; b) insufficient citation/credit given.

    The three confusing things I see are:
    a) When paraphrasing a factual data-based statement, I would expect the word count of “data communication” to be passed through pretty much intact. Wouldn’t analysis of ANY such brief section produce a similar result? OR… is the unstated complaint that a paraphrase should cover a larger section of text than a paragraph?
    b) If GMU’s own example fails your test… I wonder if you have ever applied this test to a variety of other “training texts”? After all, if your test fails on the instructional examples provided to students, is it any surprise that the students’ papers produce a similar result?
    c) AFAIK, each “chunk” of material bringing in information from a source should properly cite the original. Two questions about situations like this where there are a number of citations: 1) with lots of citations, is it still required for the “citation rate” to be 100%, or are you just being picky? and 2) what do we see in other typical papers?

    My bottom line query: we have incredibly powerful textual analysis tools today, to tear into written material… before we can confidently claim these tools have found something significant, it seems to me they need to be applied to a variety of similar original/paraphrased texts to understand what ought to be expected in the first place.

    I had no real question about this until I saw that your technique produced a “fail” even for the GMU “good” example. At that point I began to wonder if the problem lay more with your measuring stick than the thing being measured 🙂

    • I’m afraid you have misconstrued the GMU Writing Centre example (which, by the way, was a constructed example, not student work).

      The writing centre proffered this as an example of improper paraphrasing rising to plagiarism. The point of the Writing Centre example is that using mostly the same wording in a paraphrase is unacceptable, even if the citation itself is properly done, and even if the paraphrase is relatively short.

      All I did was to show clearly why this example is plagiarism. Obviously, if the GMU Writing Centre considers that example plagiarism, there can be little doubt about their finding on the Said passage.

  25. In general, use of key technical terms is sometimes inescapable, so when I was doing the exhausting analysis of the Summaries, I did not think “plagiarism” just because I saw a few of the obvious terms. Since I was looking at Summaries, whose sources were identified, I started with a slightly different, and more restrictive approach, to be as conservative as possible, and give them every benefit of the doubt.

    Mine worked like this:
    1) Use a manual approximation to the “longest common subsequence” problem, i.e., UNIX “diff” looking for *exact* word matches in order when comparing the WR versus the antecedent, i.e., ID in cyan. The common technical terms were generally nowhere near as useful in locating the antecedent sentences as were unusual words. However, inescapable technical terms adjacent to a lot of cyan got marked cyan as well, since it was obvious.

    2) I gave them the benefit of the doubt on inescapable common words.
    I never started with such.

    3) Obvious Trivial Changes, in order, were marked in yellow, and included in SS, which also picked up obvious rearrangements and rephrasings.

    In practice, I think my version tends to be a little more restrictive on ID, but we end up roughly the same on SS. Mine is slightly more algorithmic, with less low-level judgment; DC’s captures “cut-paste-rearrange” plagiarism a little better. Think of them as two slightly different reconstruction techniques.

    By my (restrictive) rules, 50% of the WR Summaries words (as a group) are ID, another 31% SS, and of course, I was giving them every break.
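
    The ID step in 1) was done by hand, but the underlying idea can be sketched in code. The following is a minimal illustration under stated assumptions, not the procedure actually used: it tokenizes both texts into lowercase words, computes one longest common subsequence by dynamic programming, and reports the share of the suspect text's words that fall in it. The names `tokenize`, `lcs_mask` and `identical_fraction` are hypothetical, and the SS category (trivial changes, rearrangements) needs human judgment and is not modelled.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lcs_mask(a, b):
    """Mark which tokens of `a` belong to one longest common subsequence with `b`."""
    n, m = len(a), len(b)
    # table[i][j] = LCS length of the suffixes a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk forward through the table to recover one LCS.
    mask = [False] * n
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            mask[i] = True
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return mask


def identical_fraction(suspect, antecedent):
    """Share of the suspect's words that match the antecedent exactly, in order."""
    a, b = tokenize(suspect), tokenize(antecedent)
    if not a:
        return 0.0
    return sum(lcs_mask(a, b)) / len(a)


# Made-up sentences, purely for illustration.
suspect = "The quick brown fox jumps over the lazy dog."
antecedent = "A quick brown fox leaps over a lazy dog."
fraction = identical_fraction(suspect, antecedent)
```

    In this toy pair, six of the nine words in the suspect sentence match in order. Real passages would also need the benefit-of-the-doubt handling of unavoidable technical terms described in 2), which this sketch leaves out.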

    In any case, it doesn’t matter. This stuff is so clearly plagiarism that a few percentage points here or there are irrelevant. The most amazing one was MBH99, which was 100% SS … obviously, that was an unimportant paper, could be dealt with very quickly.

    Of course, the hard part, of which DC is the master, is *finding* the antecedents in the first place, especially when not available online. Of course, that step wasn’t needed for the Summaries.

    However, fairly soon, you will be able to see even more evidence of the visual similarity of plagiarism style between the WR and Said’s dissertation. DC has shown 10 WR pages so far, I’ll add another 25 for anyone not yet convinced. That’s 35 of 91 pages … that is mostly plagiarism. The plagiarism in MW is more sophisticated, as it blends 2 sources (sometimes wrongly), but its problem is odd tip-off words.
    Said’s stuff (except for the weird howlers) mostly looks OK because the *words* are OK words, since they are mostly cut-and-paste. As best as I can tell, even experts miss this stuff because they skim introductory material, see what they expect, and go on … without studying every sentence to notice the silly things.

    [DC: For the record, I don’t agree there is a strong case for plagiarism in M&W. But I would say the evidence is overwhelming that they have cited sources that were not in fact consulted, while not fully disclosing the actual sources. While that is not acceptable, I’m not quite sure what might be the correct term for such lapses of scholarship standards. ]

    • Rattus Norvegicus

      DC: how about “ideologically driven incompetence”?

    • DC,

      Probably it’s ‘unethical citation practices’ which comes generally under academic misconduct, though it generally seems to be considered a minor type. On the other hand, as correct referencing is meant to be your protection from allegations of plagiarism, consistently using ‘unethical citation practices’ may leave you in a sticky situation.

      It’s ok to use secondary referencing, but it should be made crystal clear that is what is occurring (eg. A reports (cited in B, 19xx)), and only the material actually read should be listed in the reference list, not the original source.

  26. Oh, to be clear, this is nothing like the massive obvious plagiarism of the WR. MW are much more subtle.

    The correct term for most of the issues is fabrication. The plagiarism is relatively minor, but simply shows how they mangled two sources together, but with tip-offs that show where the ideas came from. You might post the new source you found, and see what people think.

  27. RN,
    I think there is also another “i” word involved – “indolence”.

    Anyway, in some cases, an “as cited” reference should have been used. This appears to be the case for the Bradley references, which were almost certainly not consulted directly, but rather really “as cited” in Wegman et al.

    In other cases, the actual source was likely something even less “citable” than Wegman et al. For example, M&W’s mixup over the unpublished Mann et al 2004 reply, was almost certainly the result of poor understanding of ClimateAudit.

    As John mentions, there is a rather embarrassing apparent antecedent for M&W. I’ll be posting on this within the next day or two.

  28. Marion Delgado

    I would say, having studied graph theory undergrad and courses that included it in grad classes, the strong computer network-graph theory connection is visible from the other side as well.

  29. DC,

    Have you or anyone else that you know of submitted a formal complaint to GMU about plagiarism in the Wegman report?

    • Who would this complaint be submitted to? Is the Wegman report considered a scholarly (peer-reviewed) work? If it is just an opinion piece and not research, then it is not research misconduct. Copyright violation maybe, but that is entirely different.

      With that said, plagiarism in a scholarly publication or dissertation is considered research or academic misconduct, respectively. Research misconduct would be under the purview of the federal agency that funded the work (NIH in case of Said) and the university. Academic misconduct would only concern the university. Universities handle thesis plagiarism differently, as indicated in the link included in my earlier post, reposted below. Some universities look the other way, and other universities take concrete action.

      BTW, the Board of Trustees of Ohio State University voted on Friday to revoke Nixon’s doctoral degree.


    • Sam,

      The venue of the plagiarism is immaterial. Professors have been disciplined for plagiarizing in op-ed pieces and I would think a report to Congress would be held to as high a standard as other scholarly work.

  30. DC (and John Mashey), what do you expect GMU will do with this? Do you expect they will do something at all?

    I respect you both immensely for taking on this work.

    • Isn’t Ken Cuccinelli coming down hard on Virginia university staff who may be involved in dodgy goings on? Will he be producing a civil investigative demand for all documents pertaining to Wegman and demanding his emails? Is this an ideal case for him to sink his teeth into?

  31. Thanks for the kind words. Ask again in the near future after my tome on all this appears here.

    Although I cringe at the idea of another gate, DC’s discovery of the plagiarism was really like the Watergate break-in. It is just the tip of the iceberg, and much more has come to light.

  32. New poll:

    What percentage of people asserting things like “fraudulent hockey stick is dead”, “Mann’s statistical techniques have been debunked” actually have a clue as to what they’re talking about?

  33. J Bowers; Re Cuccinelli
    Well, I’ve got that on my list of 30 {academic misconduct, copyright, ethical guidelines, possible mis-use of Federal funds, the Congress-related felony possibles, and destruction of evidence} … but that’s one I’m not holding my breath on…

    a) Cuccinelli and his assistant, Wesley Russell, both got their JD’s at GMU.

    b) The Kochs have funded GMU well, its Mercatus Center, its Institute for Humane Studies (where Fred Singer used to be attached),
    see pp.93-95 in CCC.

    c) The Kochs have also given to Cuccinelli.
    See energy>natural gas, although of course coal does even more. I doubt the Kochs need to spend a lot of money on Cuccinelli.

    An interesting, and totally nonobvious funder is Qustfore Communications, run by Ken’s father Ken Cuccinelli, Sr, previously a natural gas executive, and if you look closely still does consulting for Latin America and Europe. (Europe: hmm).

    • Thanks John. I remembered Cuccinelli went to GMU. My, how these networks intertwine 😉

      Correlation or causation?

  34. I didn’t know about Michael E O’Neill, another GMU professor accused of plagiarism. Maybe GMU will be known as Copy and Paste U.


  35. You present very compelling evidence and arguments. Have you considered the theses of his other students?

  36. John and DC,

    I just had a thought. I bet that Peter Sinclair would love to do a “Crock of the week” on this. Perhaps one of you should contact him and try and work something out. Your story would get a lot of exposure that way.

  37. Pingback: Wegman Report Revealed as Fraudulent Academic Plagiarism - The Blob

  38. Well, GMU is accredited by
    SACS; their next one is coming up Spring 2011.

    2) At least as of 2001, they were having
    problems with academic integrity:

    “The major conclusion of the Task Force was that large segments of both students and faculty ignore the Code’s provisions. We need to remedy this.”
    They promised to do better.

    MapleLeaf: Crock of the Week
    I’m not sure how well this works in Peter’s visual style, but it’s worth a thought … and don’t worry, we’re not unknown to each other.

  39. I have emailed the DoJ and GMU about this. I recommend that others do likewise.

  40. Deep Climate | October 6, 2010 at 6:32 am

    Why the DoJ?

    Misleading Congress

  41. Amoeba,

    I was thinking specifically of the PhD dissertation problems. But, yes, in the larger context, there is strong evidence that Congress was misled. The real villains here are Joe Barton and Ed Whitfield, with Peter Spencer co-ordinating the attack on climate science and scientists on their behalf.

  42. DC, you are of course correct.

    Let me take this opportunity to apologise to you wholeheartedly.

    I am together with many others very grateful for your discovery of plagiarism in the WR, investigation of the associated theses following a tip-off and the fact that your diligent and tireless groundwork inspired and ultimately grew into Dr Mashey’s magnum opus.

    I truly expect that the DoJ and GMU both investigate this to the fullest possible extent, following that I hope justice is served and that the right and proper consequences follow.

    The can of worms that has been revealed shows the extent of the forces arrayed against science. When this conspiracy is combined with the Kochtopus network that has been revealed by Greenpeace [1]; The New Yorker [2]; The New York Times [3]; DirtyEnergyMoney [4]; it shows that democracy itself is under attack.

    References for new readers:
    1. Koch Industries Secretly Funding the Climate Denial Machine
    2. COVERT OPERATIONS – The billionaire brothers who are waging a war against Obama.
    3. The Billionaires Bankrolling the Tea Party
    4. http://dirtyenergymoney.com

  43. Pingback: Scientific misconduct and skepticgate | Open Parachute

  44. Pingback: Wegman et al miscellany | Deep Climate

  45. Pingback: Wegman and Said 2011: Dubious Scholarship in Full Colour, part 1 | Deep Climate

  46. Pingback: Mining new depths in scholarship, part 1 | Deep Climate