Statement of the Minnesota Physician-Patient Alliance on “pay for performance” to the Minnesota Citizens Forum on Health Care Costs, December 17, 2003
Advocates of managed care have asserted that quality of care can be measured so precisely that patients would soon be able to compare the quality of plans, hospitals, clinics, and even individual physicians. Although this prophecy has been made now for more than three decades, it has yet to come true. It is as impossible today for the average patient to “shop” for a doctor or hospital or insurance policy based on quality as it was in 1970 when HMO advocate Paul Ellwood first proposed “performance reports” for HMOs. (1)
Despite the failure of accurate report cards to materialize, insurance companies and managed care advocates have lately begun to promote a method of paying doctors known as “pay for performance,” a method which, like capitation, assumes the existence of accurate report cards. Last year, the Centers for Medicare and Medicaid Services announced, per instructions from Congress, it would conduct a three-year pay-for-performance demonstration project, starting in 2003 (2). Also in 2002, the Integrated Healthcare Association, a coalition of seven California plans, announced a pay-for-performance project that will issue its first report cards on physicians in 2004 (3). In Minnesota, HealthPartners, Blue Cross and Blue Shield, and Medica have all announced pay-for-performance programs. (4)
The phrase “pay for performance” is usually used in a context which indicates it means “more pay for higher quality of care.” However, the phrase is sometimes used in a manner which indicates it means “more pay for lower utilization rates” in addition to “more pay for higher quality.” (5) This conflation of phrases with two different meanings is not unusual for the HMO industry. HMO officials and managed care advocates have in the past frequently treated the phrases “quality assurance” and “utilization review” as synonymous when, of course, they are not. MPPA opposes, unconditionally, “pay for performance” if it means “pay for reduced utilization.” In this paper, “pay for performance” is defined to be synonymous with “pay for improved quality of care.”
Because pay-for-performance (PFP) schemes are spreading, MPPA anticipates that the Citizens Forum will be asked to recommend PFP to the governor, either as a means to improve quality or to reduce cost. We make three points in this letter:
(1) The Citizens Forum was established to recommend methods of containing health care costs, not improving quality. Because there is no solid evidence that PFP can contain cost, the Forum should leave PFP off its agenda.
(2) If the Forum is going to expand its scope to include an examination of methods of improving quality, the Forum should focus first on solutions that are far more likely to improve quality than PFP, including eliminating managed care, the nursing shortage, and lack of insurance.
(3) PFP is unlikely to improve quality for the vast majority of medical services; in the few cases in which it will work the expense of “grading” physician performance accurately will be substantial and may well offset, or more than offset, any savings derived from improved quality; PFP has the potential to damage quality.
POINT 1: THE FORUM’S FOCUS IS COST CONTAINMENT
The mission of the Minnesota Citizens Forum on Health Care Costs is to find ways to reduce costs, not improve quality. This mission is obvious from the Forum’s name, and from the governor’s press release announcing the creation of the Forum (6). We do not mean to suggest that the Forum should ignore the effect that poor quality has on cost, nor that the Forum should ignore the impact its cost-containment recommendations could have on quality. We are saying that the Forum should recommend PFP, and any other quality-improvement proposal, as a cost-containment measure only if proponents of the proposal produce some empirical evidence that it can reduce costs.
At this date, no such evidence exists for PFP. For PFP to reduce costs, three conditions must be met. First, PFP must be shown to improve quality. Second, the quality improvement must be shown to reduce costs. Third, the cost savings caused by the quality improvement must be shown to exceed the cost of implementing PFP. Studies documenting all three of these conditions have not been published in peer-reviewed journals. (7)
POINT 2: IF THE FORUM IS GOING TO ADD QUALITY IMPROVEMENT TO ITS MISSION STATEMENT, IT SHOULD FOCUS FIRST ON OTHER SOLUTIONS
If the Forum decides, however, that quality improvement is part of its mission statement, then the Forum should make sure that it gives high priority to factors that contribute substantially to degraded quality of care, including managed care (to be specific, capitation and other methods of pay-for-denial-of-care, as well as utilization review), the nursing shortage, and the absence of universal health insurance. The evidence implicating each of these factors in reduced quality of care is extensive. MPPA would be happy to deliver papers to the Forum documenting this statement if we are asked.
POINT 3: PAY FOR PERFORMANCE IS UNLIKELY TO WORK AS ADVERTISED
PFP is unlikely to work for the vast majority of medical services provided to patients. There are several reasons for this, the most important of which is the very low probability that experts will ever develop accurate report cards for the vast majority of medical services. The reason why accurate report cards are required for PFP is obvious: An insurer cannot pay doctors for better or worse performance unless the insurer can grade performance.
Report cards are based on one or both of two types of quality measures: measures of outcomes, and measures of processes used in treatment. An example of an outcome measure is mortality rates among patients who had coronary artery bypass surgery. Another example is cholesterol levels, which fall into a category labeled by some as “intermediate outcomes.” Examples of process measures include prescribing beta blockers for patients who have suffered a heart attack, and ordering blood tests to check cholesterol levels in diabetics.
Patient satisfaction surveys are rarely accurate, but, accurate or not, they can pose questions designed to measure both outcomes and processes. The question, “Was your health improved by your care?” may be construed to be an outcome measure, while the question, “Did the doctor listen to you?” would fall into the process measure category.
Outcome measures are very expensive (in terms of dollars and lost privacy) because they have to be adjusted to reflect differences in patient health and other factors beyond physician control (a process known as “risk adjustment”). Process measures are usually expensive because they require agreement on standards of care, which do not exist for thousands of medical services. Some process measures must also be risk-adjusted.
Report cards based on outcome measures
Outcome measures are expensive and invade patient privacy because they must be risk-adjusted in order to be accurate. Adjusting for differences in patient health nearly always requires access to patient medical records (as opposed to “administrative data,” typically defined as data found in claim forms collected by insurers and/or discharge reports prepared by hospitals). The New York State Department of Health report card on heart surgeons is a good example. It uses mortality rates following coronary artery bypass grafts (an outcome measure) as the measure of quality, and it does expensive risk adjustment using patient medical records. Examples of “patient risk factors” that are taken into account by the New York Department of Health include several measures of left ventricular function (ejection fraction, heart attack within the last seven days, and congestive heart failure) and the presence of several comorbidities (including diabetes and obesity). (8)
Risk adjustment of outcome measures that will be used to punish or reward physicians (either with adjustments to reimbursement or with more or less market share) is necessary primarily to protect patients, and secondarily to ensure fair treatment of doctors. If risk adjustment is insufficiently accurate, or if it is even perceived by physicians to be insufficiently accurate, physicians will be under pressure to avoid sicker patients. The following statement by Fowles et al. was made about capitation, but it applies as well to PFP because PFP, like capitation, exposes physicians to risk of lost income if their patients are sicker than average:
Unless capitated payment rates can be adjusted for enrollee health status, physicians or physician groups with patient populations that are sicker than average will be at a financial disadvantage. Sick patients, too, will be in jeopardy. Without adequate risk adjusters, risk-bearing organizations have a financial incentive to select healthier individuals or jettison sicker ones. (9)
Despite the consensus that risk adjustment of outcome measures is required, few of the report cards now touted by the insurance industry and large employer groups are adjusted for anything other than age and sex. According to Hofer et al., “The profiling approach most commonly used by payers and administrators is to calculate simple age- and sex-adjusted measures that are averaged by physician to generate a physician profile.” (10) In view of the scarcity of research on the effect of implementing PFP without accurate report cards, it is fair to say the managed-care industry is once again implementing a managed-care method without first testing it to assure it is safe and effective and worth its cost.
The few reliable studies that have been done confirm what theory and common sense predict: that providers whose income is contingent upon their performance as measured by inaccurate report cards are under great pressure to cherry pick. Shen found that providers of substance abuse treatment avoided sicker patients following Maine’s implementation of “performance-based contracting,” a system that threatened low-scoring providers with loss of their contracts and promised high-performing providers more funding. The scores that Maine used to assess performance were not adjusted for risk (11). But even report cards with sophisticated risk adjustment may cause some providers to avoid the sickest patients. The New York bypass report card, as sophisticated as it is, may not be accurate enough to take the pressure off surgeons to cherry pick. In any case, some surgeons in New York think it is not, and, as a result, there is some evidence that quality of care for the sickest patients has declined as surgeons who do not trust the risk-adjustment methodology find ways to avoid sick patients in order not to have those patients drive their mortality rates up. (12)
Some experts have suggested that outcomes cannot be measured accurately enough to eliminate the incentive to reject sicker patients. Hofer et al. examined the accuracy of an outcome measure commonly used in physician report cards today – hemoglobin A1C (HbA1c) levels in patients with type 2 diabetes. After adjusting the HbA1c levels for differences in patient health and socioeconomic status (adjustments far more sophisticated and expensive than those used by the average insurer today), the investigators found that physicians would still be better off getting rid of their one to three sickest patients (out of an average diabetic base of about 21 patients per doctor) (13). Doing so would improve their score “dramatically,” said the authors. “This advantage from gaming could not be prevented by even detailed case-mix adjustment,” they concluded. (14)
Hofer et al. blamed the inaccuracy of the HbA1c outcome measure on two factors: the relatively small effect that differences in physician practice styles have on HbA1c levels, and the small number of diabetic patients seen by individual physicians. The authors determined that only 3 percent of the variation in physicians’ average HbA1c levels could be attributed to physician behavior; the rest was caused by factors outside of physician control, including patient behavior and chance. They stated that “at least 100 patients [per doctor] would be needed to reach 80 percent reliability (often considered the minimum for making decisions about individuals).” They took note of the possibility that patient pools and differences among physician practice styles might be larger for other diseases, then minimized that possibility with this observation: “However, diabetes is one of the most common diseases in the United States. Apart from hypertension, it is difficult to imagine that there would be enough cases per primary care physician to construct disease-specific profiles for almost any other chronic condition.” (15)
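The link between the number of patients per physician and the reliability of a physician's score can be sketched with the Spearman-Brown formula, a standard result in measurement theory. The sketch below is an illustration only, not a reproduction of the calculation by Hofer et al. The function names are hypothetical, and the 3 percent figure is used as an assumed share of score variance attributable to the physician.

```python
import math


def profile_reliability(physician_share, n_patients):
    """Spearman-Brown reliability of a physician's average score.

    physician_share: assumed fraction of variance in patient scores
                     attributable to the physician (0.03 = 3 percent).
    n_patients:      number of patients averaged into the score.
    """
    return (n_patients * physician_share) / (1 + (n_patients - 1) * physician_share)


def patients_needed(physician_share, target):
    """Smallest number of patients whose average reaches the target reliability."""
    n = target * (1 - physician_share) / (physician_share * (1 - target))
    return math.ceil(n)


if __name__ == "__main__":
    share = 0.03  # assumption: 3 percent of variance is due to the physician
    for n in (16, 21, 100):
        print(n, "patients ->", round(profile_reliability(share, n), 2))
    print("patients needed for 0.80 reliability:", patients_needed(share, 0.80))
```

Under this assumed share, a panel of 16 to 21 patients yields reliability of less than 0.40, and a panel on the order of 100 or more is needed to approach 0.80. That is in line with the authors' statement that at least 100 patients per doctor would be required.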
Of course, the question of whether PFP should proceed is dependent not just on how the question about report card accuracy is resolved. Two other questions must also be asked: What does it cost to produce accurate report cards based on outcome measures and to carry out the other tasks necessary to implement a PFP program, and does that cost outweigh the benefits achieved by the PFP program? The Forum should make sure it has sound, evidence-based answers to these questions before recommending PFP. What little evidence we have on the cost question is not encouraging. For example, one of the oldest report card projects, the Cleveland Health Quality Choice program, was terminated four years ago because the Cleveland Clinic concluded the cost was not worth it. After spending $2 million a year for a decade, the Clinic withdrew its nine hospitals from the project on the ground that the report cards’ effect on quality was too insubstantial to warrant $2 million a year. (16) Report cards on physician performance in hospitals are probably less expensive to risk adjust than report cards on outpatient care because records of inpatient medical care are more centralized. Collecting the data necessary to risk adjust outcomes for services provided on an outpatient basis will unquestionably be more expensive because patient medical records are scattered over many more sites.
Report cards based on process measures
Process measures (e.g., did the surgeon prescribe a beta blocker for the heart attack patient?) often do not require risk adjustment, and, for diagnoses for which a standard of care has been reached, are therefore likely to be less expensive than outcome measures. This is true, however, only where the patient population examined is limited to patients who actually saw a doctor. For example, a process measure that measured whether a doctor advised a patient to quit smoking would not need to be risk adjusted as long as the study was limited to patients who visited the doctor. But a process measure that used all patients assigned to a clinic by an HMO as the denominator would have to be risk adjusted for both health and socioeconomic factors that influence patients’ ability and inclination to see a doctor. These factors are partially or wholly outside the doctor’s control and, if uncontrolled, could bias physician “scores.” If patients refuse or are unable to see their doctor, or refuse or are unable to comply with physician recommendations, that is not the doctor’s fault, and it is irrational to punish doctors for conditions beyond their control. For example, a PFP scheme that paid doctors more if a higher proportion of their female patients got mammograms would have to risk adjust physician “scores” so that the report card would not be confounded by the following factors: the woman’s insurance status (is she insured, if so, does her policy cover mammograms, and if so, how big is the deductible?), her income, her education, and the presence or absence of language and transportation barriers. (17)
However, even process measures that do not require risk adjustment present a hurdle that is almost as daunting as risk adjustment, namely, the need for an agreed-upon standard of care that applies to all patients with a given diagnosis. Relative to the thousands of medical services rendered in America today, evidence-based standards are few. The proportion of medical services for which a science-based consensus on standard of care exists is apparently no more than 15 to 20 percent (18). According to Landon et al., “[F]ew medical specialties have an evidence base that is robust and comprehensive enough to support PCPA [physician clinical performance assessment].” (19)
The main advantage of process measures – the fact that many do not have to be risk adjusted – is, of course, their main disadvantage – they may bear little relation to patient health. A physician could score high, for example, on a report card that measured how often the physician’s diabetic patients had their cholesterol measured (a process measure), but that score may say little about how well the patients have kept their cholesterol levels within the normal range (an outcome measure).
Report cards based on patient surveys
Many advocates of report cards have expressed the hope that patient surveys will prove to be an inexpensive alternative to report cards based on risk-adjusted outcome measures or process measures. It is no doubt true that most of the survey-based report cards being published today are relatively inexpensive, but, unfortunately, they are not accurate and should not be used by patients. Surveys face the same problems outcome- and process-based report cards face. If the survey question seeks information on outcomes, the “grade” has to be risk adjusted, that is, the respondents’ health status must be taken into account. It is well established that sicker patients are more critical of their caregivers. (20) If the survey question seeks information about processes of care, the process being measured must be shown to have a robust relationship with high-quality outcomes. It is not clear that questions commonly asked in patient surveys meet these criteria. Consumer surveys do not, in other words, provide a low-cost method of deriving accurate report cards on physician services.
The typical consumer survey suffers from a defect that is closely related to the problem of inadequate risk adjustment – the “bundled product” problem. Whereas physician outcome and process measures typically measure the quality of a single, discrete medical service (e.g., bypass surgery or cholesterol checks for diabetics), publishers of consumer surveys typically make no effort to limit their sample to patients receiving a single medical service. The Buyers Health Care Action Group in Minnesota, for example, makes no effort to limit its surveys of patients who visit BHCAG “care systems” to patients receiving specific services. Instead, BHCAG bundles all patient responses into one score. Thus, readers of BHCAG “satisfaction” surveys have no idea what services patients sought, and, therefore, no way to determine whether one care system’s ostensibly superior score was caused by factors within or without the system’s control. To take an extreme example, consider how easy it would be for Care System A to outscore Care System B if the former saw only children who needed immunization shots while the latter saw only elderly cancer patients with numerous comorbidities.
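The bundled-product problem can be illustrated with a few lines of arithmetic. The sketch below uses entirely hypothetical satisfaction rates and respondent counts (they are not BHCAG data). Both care systems are assumed to earn identical satisfaction within each service, yet their bundled scores differ because their mix of patients differs.

```python
# Hypothetical share of satisfied patients for each service.
# Assumption: the rate is the same in both care systems.
SATISFACTION_BY_SERVICE = {"immunization": 0.95, "cancer care": 0.70}

# Hypothetical number of survey respondents by service in each care system.
CASE_MIX = {
    "Care System A": {"immunization": 900, "cancer care": 100},
    "Care System B": {"immunization": 100, "cancer care": 900},
}


def bundled_score(case_mix, satisfaction):
    """Overall share of satisfied respondents when all services are pooled."""
    total = sum(case_mix.values())
    satisfied = sum(count * satisfaction[service] for service, count in case_mix.items())
    return satisfied / total


if __name__ == "__main__":
    for name, mix in CASE_MIX.items():
        print(name, round(bundled_score(mix, SATISFACTION_BY_SERVICE), 3))
```

With these made-up figures, Care System A scores about 0.925 and Care System B about 0.725, even though patients receiving the same service are equally satisfied in both systems. The entire gap comes from case mix, which a bundled score cannot reveal.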
The most obvious problem with “satisfaction” surveys is that they may bear little relation to technical quality of care. Unfortunately, as with so many other managed-care tools, little research has been done on whether the “satisfaction” survey actually works. As recently as 2003, Edlund et al. noted, “[G]iven the widespread use of satisfaction surveys, surprisingly little work has been done to investigate the relationship between subjective patient satisfaction and objective measure of quality of care.” (21) But after comparing a process-measure-based report card on mental health providers with the results of a risk-adjusted survey of those providers’ patients with addictions or mental disorders, Edlund et al. concluded that there is a modest correlation between the scores based on a well-designed survey and those derived from process measures. The crucial phrase here is “well-designed survey.” The survey used by Edlund et al. was unusually well adjusted for risk, and the bundled-product problem was greatly diminished by examining only patients needing mental health services. Very few, perhaps none, of the “consumer satisfaction” surveys published by plans, magazines, and entrepreneurs with Web sites are risk adjusted and limited to a single service, or to services provided by a single specialty. The reason for that is obvious: Accurate risk adjustment is expensive, and survey-based report cards would have to be far more numerous if they were restricted to a single service or specialty.
For three reasons, the Forum should not recommend PFP as a cost-containment method. First, assuming that PFP can improve quality, it is not clear that quality improvement always leads to cost reduction. Second, it is not at all clear that PFP can improve quality. Third, putting PFP on the front burner when more pressing quality and cost problems remain unsolved (managed care, the nursing shortage, and the absence of universal insurance are among them) does not make sense, but if it must be on any burner, then research should be done first on how accurate report cards on doctors can be and what they will cost in both dollars and lost privacy. PFP should not be implemented until these questions about PFP’s cost and accuracy have been answered. Inaccurate report cards are a disservice to both doctors and patients.
This paper has focused on only some of the impediments to and problems associated with PFP. For a discussion of other problems, we refer readers to the previously cited papers by Hofer et al. and Landon et al., as well as a paper with the unequivocal title, “The toxicity of pay for performance.” (22)
The primary obstacle outcome-based report cards must overcome is the difficulty of accurate risk adjustment. Accurate risk adjustment is expensive and requires either time-consuming collection of patient consent or routine violation of patient privacy. But even sophisticated, expensive risk adjustment may be unable to eliminate the incentive for physicians to game the system – to overstate risk factors and to jettison sick patients.
The primary problem with process measures is that they require consensus on standards of care, and such consensus does not now exist for the great majority of medical services. Moreover, some process measures must be risk adjusted.
Finally, “consumer satisfaction” surveys are not a short cut to accurate report cards. They too must be risk adjusted. Moreover, the typical survey-based report card needs to be unbundled so that readers (including the doctors who presumably will attempt to alter their behavior upon reading the report card) can understand which type of service respondents had in mind when they answered the survey questions.
We agree with this statement by Landon et al.: “At the current time, given the state of technology and the existing infrastructure to support performance assessment, broad-based mandatory clinical performance assessment for individual physicians as a means of determining the competence of individual physicians . . . appears to be infeasible.” (23) It appears now that only a few simple process measures can serve as the basis for PFP schemes, and that sufficiently accurate outcome and process measures for the vast majority of other medical services will probably never materialize, and those that do will be expensive to prepare.
Perhaps the most startling finding by the few investigators who have attempted to investigate the accuracy of report cards and the usefulness of PFP schemes is that variations in physician practice style account for a very small proportion of the variance. Hofer et al. reported that physicians accounted for only 3 percent of the variation in HbA1c levels, and they cited another study which found that “the practitioner accounted for a maximum of about 24 percent of the variance in a process-of-care score related to the management of digoxin and a minimum of 3 percent in process scores related to cancer screening.” (24) It is difficult to conceive of a more fundamental question than, Are these estimates correct, and are they representative of all medical services?
If more research confirms that physician practice style accounts for a small percentage of the variation in quality for most medical services, we could then state with confidence that it is irrational to focus on physician behavior as a means of improving quality when patient behavior and other factors outside physician control account for the vast majority of the variation in quality. Because managed-care plans now have considerable control over physicians and much less control over patients and the factors that affect patient care-seeking behavior (such as education and availability of child care), it is no doubt tempting to managed care officials and their allies in the health policy community to focus on physician behavior. But, if the physician effect on variation in quality is in fact small, the strategy of focusing on doctors may be compared to the strategy of the drunk who has lost his keys, he knows not where, but persists in restricting his search to a small area under a street light because that is where the light is good. Before the Forum spends any time debating PFP, it should first resolve the question of whether physician practice style explains a substantial portion of the variation in report card scores.
Managed care advocates and health policy experts, some of whom sit on the Forum, endorse evidence-based medicine. MPPA likewise endorses evidence-based medicine. However, what is good for physicians is also good for health policy experts and managed-care advocates. MPPA endorses evidence-based health policy as well as evidence-based medicine. There is, at this date, no convincing evidence that PFP schemes will improve quality or reduce costs, and some convincing evidence that they will damage quality. We urge the Forum to adopt evidence-based health policy, and to refrain from endorsing PFP, either as a method of improving quality or of reducing cost.
(1) Paul M. Ellwood, Jr., “Health maintenance strategy,” Medical Care 1971;9:291-298. This paper is based on a paper Ellwood wrote in 1970 for the Nixon administration explaining his theory that pushing Americans into HMOs would reduce health care costs.
(2) Markian Hawryluk, “Medicare experiments with quality incentive programs,” American Medical News, November 4, 2002, 7.
(3) Integrated Healthcare Association statement, http://www.iha.org, accessed December 17, 2003.
(4) Dan McLaughlin and Brian Campion, “Pay for performance,” Minnesota Physician, October 2003, 1; Douglas Hiza, “BCBSM launches provider incentive programs,” Minnesota Physician, October 2003, 11.
(5) For example, Paul R. Reich, MD, the medical director of Blue Cross and Blue Shield of Rhode Island, recently wrote an article entitled “Pay for performance” in which he stated, “With the decline of capitation as a means of compensating doctors, ‘paying for performance’ has become a viable alternative.” Because capitation payments reduce utilization and have not been shown to improve quality, it is reasonable to infer from Dr. Reich’s statement that he defines “pay for performance” to mean “paying for reduced utilization.” Dr. Reich confirmed the accuracy of this inference later in the article when he said, “[I]f the goals [of the pay-for-performance program] comprise too many decreased-utilization targets, some may view the plan as asking physicians to reduce the amount of care provided to members to enrich themselves” (Paul R. Reich, “Paying for performance,” Managed Care Interface 2003;16:14). Clearly, Blue Cross and Blue Shield of Rhode Island defines “pay for performance” to mean both “pay for quality improvement” and “pay for utilization reduction.”
A recent article in Minnesota Physician offers another example of the misuse of phrases similar to “pay for performance.” Douglas Hiza, MD, the medical director for Blue Cross and Blue Shield of Minnesota (BCBSM), wrote that BCBSM “recently launched two new outcomes-based provider incentive programs,” one of which was dubbed “Recognizing Excellence.” He went on to say the purpose of these programs was “rewarding provider performance relative to proven clinical outcomes.” These phrases would lead anyone to think BCBSM perceives “pay for performance” to mean improving quality only. But Dr. Hiza then noted that the programs included “a goal of increasing appropriate use of generic drugs” (Douglas Hiza, “BCBSM launches provider incentive programs,” Minnesota Physician, October 2003, 11). Increasing the use of generic drugs is clearly a cost-containment goal, not a quality-improvement goal. There is no evidence that generic drugs are as a class more effective than brand-name drugs.
(6) “Governor announces citizens forum on health care,” press release, September 8, 2003, Office of Governor Tim Pawlenty, http://www.governor.state.mn.us/Tpaw_View_Article.asp?artid=543, accessed December 17, 2003.
(7) A search of Medline using the phrase “pay for performance” turned up 42 articles published since 1979. Half of these were published after 1997. None presented evidence that PFP, as defined in this paper, improved quality of care or reduced costs.
(8) Edward L. Hannan et al., “Improving the outcomes of coronary artery bypass surgery in New York State,” Journal of the American Medical Association 1994;271:761-766.
(9) Jinnet B. Fowles et al., “Taking health status into account when setting capitation rates: A comparison of risk-adjustment methods,” Journal of the American Medical Association 1996;276:1316-1321, 1317.
(10) Timothy P. Hofer et al., “The unreliability of individual physician ‘report cards’ for assessing the costs and quality of care of a chronic disease,” Journal of the American Medical Association 1999;281:2098-2105, 2101.
(11) Yujing Shen, “Selection incentives in a performance-based contracting system,” Health Services Research 2003;38:535-552.
(12) Jesse Green and Neil Wintfeld, “Report cards on cardiac surgeons: Assessing New York State’s approach,” New England Journal of Medicine 1995;332:1229-1232.
(13) The study sample included 3,642 diabetic patients seen by 232 physicians. This averages out to 16 patients per physician. However, the actual number of diabetics seen by the 232 physicians was one-third higher because a third of these physicians’ diabetic patients were excluded from the sample, either because they declined a telephone request to participate in the study or because they failed to return a questionnaire used to risk-adjust the scores.
(14) Hofer et al., op cit., 2098.
(15) Ibid., 2104.
(16) Linda O. Prager, “Cleveland Clinic’s withdrawal dooms decade-old report card project,” American Medical News, April 12, 1999, 9.
(17) Franks et al. reported that patient income affects health care utilization even within a sample restricted to patients with private insurance. Lower-income patients were less likely to get Pap smears, mammograms, and diabetic eye exams, were less likely to make an office visit, and were more likely to be hospitalized and generate more expenditures for tests (Peter Franks et al., “Effects of patient and physician practice socioeconomic status on the health care of privately insured managed care patients,” Medical Care 2003;41:842-852).
(18) “In fact, several studies estimate that only 15 to 20 percent of medical practices can be justified on the basis of rigorous scientific data establishing their effectiveness. For most conditions, something other than rigorous data on efficacy or effectiveness must be used to determine criteria of appropriateness” (Paul G. Shekelle et al., “The reproducibility of a method to identify the overuse and underuse of medical procedures,” New England Journal of Medicine 1998;338:1888-1895, 1888).
(19) Bruce E. Landon et al., “Physician clinical performance assessment,” Journal of the American Medical Association 2003;290:1183-1189.
(20) See for example, R.A. Hoff et al., “Mental illness as a predictor of satisfaction with inpatient care at Veteran Affairs hospitals,” Psychiatric Services 1999;49:929-934.
(21) Mark J. Edlund et al., “Does satisfaction reflect the technical quality of mental health care?” Health Services Research 2003;38:631-645, 631.
(22) Donald M. Berwick, “The toxicity of pay for performance,” Quality Management in Health Care 1995;4:27-33.
(23) Landon et al., op cit., 1188.
(24) Hofer et al., op cit. The study Hofer et al. cited was E. J. Orav et al., “Issues of variability and bias affecting multisite measurement of quality of care,” Medical Care 1996;34 (supplement 9):SS87-SS101.