U.S. Department of Justice Office of Justice Programs Bureau of Justice Statistics ************************************************************* Bureau of Justice Statistics Working Paper Series ************************************************************* Federal Sentencing Disparity: 2005–2012 ************************************************************* ************************************************************ ------------------------------------------------------------ This file is text only without graphics and many of the tables. A Zip archive of the tables in this report in spreadsheet format (.csv) and the full report including tables and graphics in .pdf format are available on the BJS website at: http://www.bjs.gov/index.cfm?ty=pbdetail&iid=5432 ------------------------------------------------------------ ************************************************************** William Rhodes, Ph.D. Ryan Kling, M.A. Jeremy Luallen, Ph.D. Christina Dyous, M.A. Abt Associates, 55 Wheeler St, Cambridge, MA 02138, www.abtassociates.com ************************************************************* ************************************************************* WP-2015:01, October 22, 2015 The authors acknowledge the support of the Bureau of Justice Statistics, Award #2013-MU-CX-K057. ************************************************************** ************************************************************** Disclaimer: This paper is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the Bureau of Justice Statistics or the U.S. Department of Justice. The authors accept responsibility for errors. 
************************************************************** ************************************************************** ************ Abstract: ************ Federal Sentencing Disparity, 2005-2012, examines patterns of federal sentencing disparity among white and black offenders, by sentence received, and looks at judicial variation in sentencing since Booker vs. United States, regardless of race. It summarizes U.S. Sentencing Guidelines, discusses how approaches of other researchers to the study of sentencing practices differ from this approach, defines disparity as used in this study, and explains the methodology. This working paper was prepared by Abt Associates for BJS in response to a request by the Department of Justice’s Racial Disparities Working Group to design a study of federal sentencing disparity. Data are from BJS’s Federal Justice Statistics Program, which annually collects federal criminal justice processing data from various federal agencies. The analysis uses data mainly from the U.S. Sentencing Commission. 
************************************************************ ****************** Table of Contents ****************** Table of Contents ***************** List of Figures ***************** List of Tables **************** *************** Introduction *************** 1.0 Federal Sentencing Guidelines 2.0 Recent Studies of Sentencing Disparity 3.0 Defining disparity 3.1 Disparity and the rule of law 3.2 How to define disparity post-Booker 3.3 Race is bundled with other factors 4.0 Statistical methodology: A random effect model 4.1 Estimation 4.2 Data and variables 5.0 Analysis and interpretation 5.1 Operational variables entering the analysis 5.2 Findings regarding sentencing disparity 5.2.1 Converting findings on disparity from standardized units to original units 5.2.2 Racial disparity across guideline cells 5.2.3 Racial disparity across judges 5.2.4 Increases in disparity: Variance about the guidelines 5.3 Evidence of prosecutorial discretion 5.3.1 Facts surrounding the case 5.3.2 Gaming drug amounts near mandatory minimums 6.0 Conclusions References Appendix A: Mechanics of guidelines Offense level Criminal history category Departures Appendix B: Detailed findings for sentencing disparity – U.S. citizens Data partition: Males, no weapons offenses, no sex offenders Data partition: Females, no weapons offenses, no sex offenders Data partition: Sex offenders Appendix C: Detailed findings for sentencing disparity: Non-U.S. 
citizens Data partition: Males, no weapons offenses, no sex offenders Data partition: Females, no weapons offenses, no sex offenders Data partition: Weapons offenders, no sex offenders Data partition: Sex offenders Appendix D: Detailed findings for prosecutorial discretion ***************** List of Figures ***************** Figure 1 – A causal model of how offense and offender facts affect the sentence Figure 2 – Increases in racial disparity over time for four partitions: Males convicted for non-weapons violations (overall significance p < 0.01) Figure 3 – Increases in racial disparity over time for four partitions: Males convicted of crimes involving weapons violations (overall significance p < 0.05) Figure 4 – Increases in racial disparity over time for four partitions: Alternative specification to figure 2 Figure 5 – Increases in racial disparity over time for four partitions: Alternative specification to figure 3 Figure 6 – Variation in racial sentencing disparity across judges for males convicted of non-weapons violations Figure 7 – Variation in racial sentencing disparity across judges for females convicted of nonweapons violations Figure 8 – Distribution of offenders within 100 grams of the 500-gram mandatory minimum threshold, by race and ethnicity Figure B.1 – Changes over time for three guideline cells: Males Figure B.2 – Predicted and actual sentences across maximum sentence in guideline cells: Males Figure B.3 – Differences in judge distributions: Males Figure B.4 – Changes over time for three guideline cells: Females Figure B.5 – Predicted and actual sentences across maximum sentence in guideline cells: Females Figure B.6 – Differences in judge distributions: Females Figure B.7 – Changes over time for three guideline cells: Weapons offenders Figure B.8 – Predicted and actual sentences across maximum sentence in guideline cells: Weapons offenders Figure B.9 – Differences in judge distributions: Weapons offenders Figure B.10 – Changes over time for 
three guideline cells: Sex offenders Figure B.11 – Predicted and actual sentences across maximum sentence in guideline cells: Sex offenders Figure B.12 – Differences in judge distributions: Sex offenders Figure C.1 – Changes over time for three guideline cells: Males Figure C.2 – Predicted and actual sentences across maximum sentence in guideline cells: Males Figure C.3 – Differences in judge distributions: Males Figure C.4 – Changes over time for three guideline cells: Females Figure C.5 – Predicted and actual sentences across maximum sentence in guideline cells: Females Figure C.6 – Differences in judge distributions: Females Figure C.7 – Changes over time for three guideline cells: Weapons offenders Figure C.8 – Predicted and actual sentences across maximum sentence in guideline cells: Weapons offenders Figure C.9 – Differences in judge distributions: Weapons offenders Figure C.10 – Changes over time for three guideline cells: Sex offenders Figure C.11 – Predicted and actual sentences across maximum sentence in guideline cells: Sex offenders Figure C.12 – Differences in judge distributions: Sex offenders Figure D.1 – Cocaine: 500g Threshold Figure D.2 – Cocaine: 5000g Threshold Figure D.3 – Crack, Pre-2010: 5g Threshold Figure D.4 – Crack, Pre-2010: 50g Threshold Figure D.5 – Crack, Post-2010: 28g Threshold Figure D.6 – Crack, Post-2010: 280g Threshold Figure D.7 – Heroin: 100g Threshold Figure D.8 – Heroin: 1000g Threshold Figure D.9 – Marijuana: 100,000g Threshold Figure D.10 – Marijuana: 1,000,000g Threshold Figure D.11 – Methamphetamine (Mixture): 50g Threshold Figure D.12 – Methamphetamine (Mixture): 500g Threshold Figure D.13 – Methamphetamine (Pure): 5g Threshold Figure D.14 – Methamphetamine (Pure): 50g Threshold *************** List of Tables **************** Table 1 – Regression results: Males, no substantial assistance, no weapons or drugs Table 2 – An estimated skedastic function based on residuals Table 3 – 
Trends in prosecutorial behavior Table 4 – Conditional differences (male and female) Table B.1 – Number of observations for each guideline cell - Citizens Table B.2 – Parameter estimates from mixed models: Males Table B.3 – Parameter estimates from mixed models: Females Table B.4 – Parameter estimates from mixed models: Weapons offenders Table B.5 – Parameter estimates from mixed models: Sex offenders Table C.1 – Number of observations for each guideline cell - Citizens Table C.2 – Parameter estimates from mixed models: Males Table C.3 – Parameter Estimates from mixed models: Females, no weapons offenses, no sex offenders Table C.4 – Parameter estimates from mixed models: Weapons offenders Table C.6 – Parameter estimates from mixed models: Sex offenders Table D.1 – Boundaries chosen around drug threshold amounts based on graphical inspection Table D.2 – Percent missing weights, by race and drug Table D.3 – For range check estimation: Conditional differences (males and females) Table D.4 – For logistic estimation: Conditional differences (males and females) Table D.5 – For male versus female estimation: Conditional differences (males only) Table D.6 – For Male versus female estimation: Conditional differences (females only) ************** Introduction ************** As part of a cooperative agreement for the Federal Justice Statistics Program (FJSP), the Bureau of Justice Statistics (BJS) instructed Abt Associates to design a study of federal sentencing disparity, as requested by the U.S. Department of Justice’s Racial Disparities Working Group. This report responds to those instructions by presenting a new methodology for studying sentencing disparity. Although this report is principally a discussion of methods, findings are also discussed. 
For the purposes of this study, the broad research question is-- * Do non-Hispanic African American or black (hereafter black) offenders receive prison terms that are longer on average than the prison terms received by non-Hispanic white (hereafter white) offenders, accounting for the apparent facts surrounding the crime and the offender’s criminal record? We call observed differences disparity, although based on the evidence we cannot attribute disparate decision-making to racial bias. The principal research question concerns the sentencing disparity between white and black offenders. However, using the above measure of disparity, the broad research question is divided into the following specific questions: 1. Did the degree of disparity change between 2005 and 2012? As explained later in this report, the years are important because a 2005 Supreme Court decision (Booker v. United States) (hereafter Booker) rendered the Federal Sentencing Guidelines advisory. 2. Did the degree of disparity change with the seriousness of the offense and the offender’s criminal history? 3. To what extent was the disparity systematic and to what extent was it specific to individual judges? 4. Between 2005 and 2012, a period in which the guidelines were advisory rather than mandatory, did judges increasingly disagree about the appropriate sentences for offenders? The first two questions pertain directly to patterns regarding the differences in sentences received by white and black offenders. The third and fourth questions pertain to judicial disagreement about sentences without regard to race. Several recent studies have examined how the Booker decision affected disparity by giving judges increased latitude to impose sentences. Our study, which uses post-Booker data exclusively, does not purport to examine the impact of Booker. 
***Footnote 1 Program evaluators recognize that assessing the impact of Booker is an application of program evaluation, which is complicated and uncertain outside of randomized experimentation because causation is difficult to establish. At the least, a study of Booker’s impact would require the use of pre- and post-Booker sentencing data, but the study reported here uses post-Booker data only. Even if it included pre-Booker data, attributing changes to Booker would raise validity challenges. The methodology discussed in the study reported here does not deal with methods that might be used to overcome those validity challenges***. Data for this study come from the FJSP, sponsored by BJS, which annually assembles federal criminal justice processing data from various federal agencies. The analysis rests heavily on data from the U.S. Sentencing Commission (USSC) because those data are the richest source of offense and offender information, as the USSC is the principal source of data for sentencing. However, this study draws on other parts of the FJSP for judicial identity. The data used in this study and the study itself do not identify specific judges by name. ***************************************** 1.0 Federal Sentencing Guidelines ***************************************** Federal Sentencing Guidelines are a set of rules and policy statements for federal judges to use when imposing sentences. (See appendix A for more information.) At the time of sentencing, a judge considers the facts surrounding the case along with the offender’s criminal history and his or her cooperation with the government and then assigns the offender to a cell in a two-dimensional 43x6 sentencing grid. The grid’s vertical axis corresponds to the facts surrounding the case (e.g., brandishing a weapon during a bank robbery). The grid’s horizontal axis corresponds to the offender’s criminal history (e.g., the offender previously served a prison term in excess of 1 year). 
If the offender cooperated with the government, the sentencing judge can move the offender from one cell to another, according to prescribed rules. The guideline cell stipulates a recommended sentence based on the facts surrounding the case (e.g., the charge, use of a weapon, and amount of drugs involved), the offender’s criminal record, and the offender’s cooperation with the government. Some of the cells allow for probation sentences and some allow for a combination of probation and prison. All of the cells identify a lower and upper limit for any recommended prison term, each with no more than a 25% difference between the lower and upper limit (excluding cells recommending the shortest sentences). When promulgated in 1987, the guidelines were mandatory and judges were expected to sentence within the lower and upper limits, although they could depart from the guidelines with written justification subject to appellate review. Since 2005, the guidelines have been advisory and the scope of appellate review is limited. Although our study examines the current application of the guidelines, a historical perspective is helpful for defining current practice: * Mandatory guidelines went into effect for most criminal cases in 1987. The guidelines have been revised at the USSC’s discretion, subject to Senate approval. * In 1996, Koon v. United States (hereafter Koon) clarified the role of appellate court review. Deference was paid to fact findings at the district court level; i.e., an appellate court had to accept the facts determined by the sentencing judge. This meant that review was limited to mechanical errors in applying the guidelines and the legitimacy of reasons for departure. * In 2003, Congress passed the PROTECT Act (hereafter PROTECT), which required justification for departures, thereby reducing judicial latitude to depart from the guidelines. 
The Commission strengthened the guidelines consistent with PROTECT and formalized some provisions for reducing sentences in exchange for cooperation with the government. Congress specified that higher court review would be de novo, meaning that circuit courts no longer had to defer to lower court findings of fact. As a result, Koon was nullified. * In 2005, in Booker v. United States (hereafter Booker), the Supreme Court ruled that the guidelines were advisory rather than mandatory and reestablished the level of deference to findings of fact consistent with Koon. The PROTECT Act retained the features that reward cooperation with the government. * In 2007, the Supreme Court ruled in Gall v. United States (hereafter Gall) that the federal appeals courts may not presume that a sentence falling outside the range recommended by the guidelines is unreasonable. This decision strengthened the authority of district court judges to depart from the guidelines. The USSC identifies four periods in the evolution of the guidelines (Commission, 2012). Ignoring pre-Koon, the periods are Koon to PROTECT, PROTECT to Booker, Booker to Gall, and post-Gall. The Commission’s report shows how disparity has changed over these four periods. However, BJS is concerned with disparity under current sentencing laws. Our analysis is focused on post-Booker sentencing, meaning that we examine sentences imposed during the last two periods. In Gall, the Supreme Court specified the procedure for post-Booker sentencing. Although the guidelines are advisory, a sentencing judge must compute and consider the guideline range and the Commission’s policy statements. Thus, although the guidelines are currently advisory, they are not irrelevant. 
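The guideline range that a judge must compute can be pictured as a lookup in the 43x6 grid described at the start of this section. The following is a minimal sketch of that idea, not the Commission's implementation: the class name, method names, and the month values in the usage note are assumptions introduced for illustration, and the 25% check mirrors only the rule of thumb stated in the text (it does not model the exception for cells recommending the shortest sentences).

```python
from dataclasses import dataclass

# Grid dimensions as described in the text: 43 offense levels (vertical axis)
# by 6 criminal history categories (horizontal axis).
N_OFFENSE_LEVELS = 43
N_HISTORY_CATEGORIES = 6


@dataclass(frozen=True)
class GuidelineCell:
    """One grid cell. The month limits are caller-supplied, illustrative inputs."""
    offense_level: int      # 1..43
    history_category: int   # 1..6
    lower_months: float
    upper_months: float

    def __post_init__(self):
        if not 1 <= self.offense_level <= N_OFFENSE_LEVELS:
            raise ValueError("offense level must be between 1 and 43")
        if not 1 <= self.history_category <= N_HISTORY_CATEGORIES:
            raise ValueError("criminal history category must be between 1 and 6")
        if not 0 <= self.lower_months <= self.upper_months:
            raise ValueError("need 0 <= lower_months <= upper_months")

    def width_ratio(self):
        """Upper limit divided by lower limit, or None when the lower limit is zero."""
        if self.lower_months == 0:
            return None
        return self.upper_months / self.lower_months

    def within_25_percent(self):
        """True when the upper limit exceeds the lower limit by no more than 25%."""
        ratio = self.width_ratio()
        return ratio is not None and ratio <= 1.25

    def position(self, sentence_months):
        """Classify an imposed sentence as below, within, or above the range."""
        if sentence_months < self.lower_months:
            return "below"
        if sentence_months > self.upper_months:
            return "above"
        return "within"
```

For example, a hypothetical cell built as GuidelineCell(20, 1, 33, 41) passes the 25% check, and position(30) classifies a 30-month sentence as falling below the range. Classifying sentences relative to the cell in this way is what allows an analysis to treat the cell as the starting point for measuring variation.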
An empirical study can still treat the elements entering into guideline computations (as reported by the Commission) as representing the facts surrounding the case and the offender’s criminal history as established by a preponderance of the evidence (i.e., the evidentiary standard for application of the guidelines).***Footnote 2 Chapter 6 § 6A1.3 specifies the evidentiary rules: “In resolving any dispute regarding a factor important to the sentencing determination, the court may consider relevant information without regard to its admissibility under the rules of evidence applicable at trial, provided the information has sufficient indicia of reliability to support its probable accuracy”***. For our purposes, this means that we can consider the guideline cell as the starting point for studying disparity under the guidelines. This is extremely important because otherwise we would be unable to distinguish between variation in sentences that are attributed to facts surrounding the case or criminal record and systematic unwarranted variation. The judge must consider the factors set forth in 18 U.S.C. § 3553(a) taken as a whole. ***Footnote 3 18 U.S.C. § 3553(a) states the purposes of sentencing, states the role of the guidelines when imposing a sentence, and provides justification for sentencing below mandatory minimums and for rewarding offender assistance to the government***. There are disagreements in circuits and among legal scholars regarding when courts may disregard commission policy--and even congressional policy--and the permissible grounds for doing so have not been resolved. 
Further, the courts are divided on two important questions: “How much weight should be given to guidelines resulting from congressional directives to the Commission?” and “What is the appropriate interaction between the proscriptions and limitations on consideration of offender characteristics in section 994 of Title 28 and the courts’ consideration of offender characteristics in section 3553(a)?” ***Footnote 4 28 U.S. Code § 994 prescribes duties of the USSC***. Booker has given judges substantial discretion, reinforced by Gall, to impose sentences using subjective decisions about the adequacy of the sentences recommended by the Federal Sentencing Guidelines. This observation is important because it provides motivation for studying how that discretion is being exercised and whether sentencing disparity is associated with that judicial discretion. A principal difficulty when studying disparity is that the facts surrounding the case cannot be known with certainty. Assistant U.S. Attorneys and defense counsel may manipulate facts to bind the judge or to avoid mandatory minimum sentences (Commission, 2011). Even when the facts surrounding the case accurately reflect offense behavior and consequences, the judge may observe additional facts (relevant for sentencing) that do not appear in the guidelines and, as a result, do not appear in our data. Fact manipulation and data limitations raise difficult problems with interpretation, which are addressed later in this report. (See section 5.3.) ******************************************* 2.0 Recent Studies of Sentencing Disparity ******************************************* Many researchers have examined disparity under the Federal Sentencing Guidelines, but fewer researchers have focused their attention on the post-Booker era. We limit this review to selected studies that examine post-Booker federal sentencing. 
All analyses of sentencing disparity are predicated on a normative position that similarly situated offenders who have been convicted of similar crimes should receive similar sentences. The exact meaning of this normative position is debatable, but it seems as though most people would agree that black and white offenders, convicted of the same crime under the same conditions, should receive equivalent sentences. Researchers examining the post-Booker era have taken two approaches to testing the null hypothesis of sentencing equality. One approach has been to start with the facts surrounding the case as relevant to application of the Federal Sentencing Guidelines and to determine whether whites and blacks receive comparable sentences. Several studies (Motivans & Snyder, 2009; Ulmer, Light, & Kramer, 2011; Commission, 2012) follow this approach. An alternative approach is to start with the facts surrounding the case at the time of prosecution, with the assertion that prosecutors manipulate facts before they are considered for guideline application and determine whether offenders accused of the same crime receive the same treatment. Other works (Starr & Rehavi, 2013; Rehavi & Starr, 2013; Fishman & Schanzenbach, 2012; Yang, 2014) are consistent with this alternative approach. These two lines of inquiry answer different research questions, although both are framed as studies of sentencing disparity. This section summarizes these studies and compares the current study’s approach with extant studies. Consistent with its role in the federal justice system, the USSC frequently studies federal sentencing patterns. Its Report on the Continuing Impact of United States v. Booker on Federal Sentencing (2012) is a comprehensive assessment of how the Booker decision affected the application of federal sentences. 
Much of that assessment is tabulation and graphical representation; the descriptive nature of the analysis appropriately tells a story suitable for the Commission’s audience. Part of the Commission’s assessment also includes an empirical analysis that is multivariate and inferential, and is similar to the methods presented in our study. To assess racial disparity at the time of sentencing, the Commission used an ordinary least squares (OLS) regression model, with the length of the prison term as the dependent variable, the minimum recommended sentence range as the principal covariates, and race and sex as multiplicative factors. ***Footnote 5 The Commission used OLS to regress the logarithm of time served on the logarithm of the minimum sentence and a linear combination of variables, including race. This is equivalent to assuming that the race variable has a multiplicative effect on the sentence imposed. The specification is cumbersome because sentences and guideline minimums are frequently zero and the logarithm of zero is undefined. The Commission set the logarithm of zero to a small positive number***. The Commission concluded that “…unwarranted disparities in federal sentencing appear to be increasing” (Commission, 2012, p. 3). Summarizing its findings: “The Commission’s updated multivariate regression analysis showed, among other outcomes, that black male offenders have continued to receive longer sentences than similarly situated white male offenders.... In addition, female offenders have received shorter sentences than similarly situated male offenders.” (Commission, 2012, p. 9) As with the analysis reported in our study, the Commission used the recommended sentence in the Federal Sentencing Guidelines as the starting point for an analysis, asking how sentences for blacks differed systematically from sentences for whites. 
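The log-log specification described in footnote 5 can be illustrated with a small sketch. It is not the Commission's code or data: the function names and the toy inputs in the usage note are assumptions made for illustration. The sketch regresses the logarithm of the sentence on an intercept, the logarithm of the guideline minimum, and a race indicator, so that the exponential of the indicator's coefficient is the multiplicative effect the footnote refers to. It accepts strictly positive values only and makes no attempt to reproduce the Commission's handling of zeros.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_log_log(sentences, guideline_minimums, black):
    """OLS of log(sentence) on [1, log(guideline minimum), race indicator].

    All sentences and minimums must be strictly positive because log(0) is
    undefined. Returns [intercept, elasticity, race coefficient].
    """
    rows = [[1.0, math.log(g), float(d)]
            for g, d in zip(guideline_minimums, black)]
    y = [math.log(s) for s in sentences]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

As a usage example with made-up numbers, suppose white offenders are sentenced exactly at guideline minimums of 12, 24, 36, 60, and 120 months and black offenders receive sentences 10% longer at the same minimums. Fitting the model returns a race coefficient c with exp(c) equal to 1.10 up to rounding, which is the sense in which the race variable acts multiplicatively rather than adding a fixed number of months.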
Regarding salient differences between our study and the Commission’s study, the Commission used a pre- and post-Booker selection of data, given its concern with the impact of Booker, while our analysis is concerned only with post-Booker trends. The Commission used what we view as strong assumptions about the underlying sentencing decisions of the structural model, while we have used a structural model that is more flexible. The Commission used an OLS regression model; our approach uses a linear random effects regression model. Our principal analysis excludes noncitizens while the Commission included noncitizens, and our analysis makes a somewhat different selection of offenses than was made by the Commission. (This report provides a separate analysis of the sentencing of noncitizens.) Motivans and Snyder (2009) analyzed USSC data for fiscal years 1994 through 2008. Their results show that blacks receive mean prison terms that are, in general, longer than those received by whites, and that remain longer after adjusting for offense seriousness and criminal history, both together and separately. Much of that disparity disappeared once a regression was used to control for departure type, ***Footnote 6 There are many reasons for departures. The most important reasons when studying sentencing disparity are departures attributable to the initiative of the assistant U.S. Attorney and departures attributable to judicial sentencing discretion***. offense type, and whether there was a weapons charge. Motivans and Snyder (2009) examined sentences for whites and blacks within each guideline cell and then averaged over guideline cells. Their approach to estimating differences within guideline cells and then summarizing over the cells is in the spirit of the approach taken in our study. 
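The within-cell comparison just described can be sketched as follows. This is an illustration of the general idea only, not the procedure used by Motivans and Snyder or by this study: the function name, the record layout, and in particular the choice to weight each cell by its number of black offenders are assumptions made for the example.

```python
from collections import defaultdict


def within_cell_gap(records):
    """Average black-white sentence gap computed within guideline cells.

    records: iterable of (cell_id, group, sentence_months) tuples, where
    group is "white" or "black".

    Returns (weighted_gap, per_cell_gaps). Each per-cell gap is the mean
    black sentence minus the mean white sentence in that cell. The overall
    gap weights cells by their number of black offenders (an assumed choice).
    """
    by_cell = defaultdict(lambda: {"white": [], "black": []})
    for cell_id, group, months in records:
        by_cell[cell_id][group].append(months)

    per_cell = {}
    weights = {}
    for cell_id, groups in by_cell.items():
        if not groups["white"] or not groups["black"]:
            continue  # no within-cell comparison is possible for this cell
        mean_white = sum(groups["white"]) / len(groups["white"])
        mean_black = sum(groups["black"]) / len(groups["black"])
        per_cell[cell_id] = mean_black - mean_white
        weights[cell_id] = len(groups["black"])

    total = sum(weights.values())
    if total == 0:
        raise ValueError("no cell contains both groups")
    weighted = sum(per_cell[c] * weights[c] for c in per_cell) / total
    return weighted, per_cell
```

Because each comparison is made between offenders assigned to the same cell, differences in offense seriousness and criminal history across groups do not enter the gap. Cells that contain only one group are dropped, which is one reason a regression summary across cells can be preferable when many cells are sparse.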
However, we adopted a regression model that provides a systematic summary of variation in disparity across the cells and over time, that often uses stratification instead of covariates, and that leads to standard statistical testing. Ulmer, Light, and Kramer (2011) wrote another study as a critique of a 2010 commission report (the predecessor of the report cited above). They examined a period that started well before 2005 and ended with fiscal year 2009. Their findings proved sensitive to the short post-Booker period; based on a reanalysis reported by the Commission (Commission, 2012, p. 11, part E), revised Ulmer, Light, and Kramer findings for the post-Booker period are similar to those reported in the Commission’s study. An anonymous reviewer of an earlier draft of this BJS study reports replicating the Ulmer, Light, and Kramer approach using data extended through 2012. The reviewer found that disparity increases initially and then stops increasing. This BJS study will report similar findings. Ulmer, Light, and Kramer made decisions about excluding some cases that are consistent with our decisions, and they made decisions about including or not including covariates that are inconsistent with our decisions. They used an estimation procedure (a two-step estimator) with which we disagree, ***Footnote 7 The approach to two-step models is complicated (Rhodes, 2014). We do not object to estimating the first-stage equation of whether the judge imposes a prison term, which is a principal part of the Ulmer, Light, and Kramer study. However, consistency of the parameters in the second-stage equation depends on strong assumptions regarding independence of the first-stage and second-stage decisions or else instrumental variables. When independence does not hold, estimated parameters will be biased estimates of their population counterparts. Ulmer, Light, and Kramer carefully attempted to counter these problems, but there is no good solution. 
An alternative approach is to use a generalized linear model to estimate the conditional mean instead of underlying parameters***. but we still find their results informative and credible. Starr and Rehavi (2013) are critical of the above tradition that treats the guideline recommendations as the starting point for analysis; while they provide a methodological critique, their harshest criticism is that the above researchers are asking the wrong research question. ***Footnote 8 We are concerned with the Starr and Rehavi methodology, much of which is described in a second paper (Rehavi & Starr, 2012). We are not convinced that the initial charge is a good starting point for a disparity study. Our investigation shows that the charges registered by the U.S. Marshals Service are very broad and not good descriptions of underlying offense conduct. Also see Rehavi and Starr (2013)***. Ulmer, Light, and Kramer were concerned with disparity, conditional on the facts surrounding the case as determined at the time of guideline administration. Starr and Rehavi opine that the correct concern is with disparity, conditional on the original offense. They dismiss answers to the first question because apparent disparity at the sentencing stage may merely reflect charging and bargaining decisions by prosecutors. Another recent study (Johnson, 2014) expands on this line of inquiry by examining racial disparities and prosecutorial decision-making in the context of a federal prosecutor’s decision to decline prosecution and his or her decision to prosecute an offender under a lesser charge than the arresting charge. Johnson’s work uses both fixed and random effects (logit) estimation to study a cohort of federal arrestees from 2003 to 2005, and finds at least some evidence to support the argument that racial disparities exist in prosecutorial decision-making--although disparities tend to favor blacks, not whites. 
It is difficult to conclude from this study whether Johnson’s findings ultimately support or refute the argument by Starr and Rehavi that disparate sentences are a reaction by judges to bargaining decisions by prosecutors. On the one hand, Johnson demonstrates that charge reductions result in materially lower sentence lengths, but this decrease is not as large as it should have been given the decrease in the presumptive sentence that results from moving to a new guideline cell (Johnson, 2014, p. 74, table 9). On the other hand, Johnson shows that, even after controlling for the presence of charge reduction and the associated presumptive sentence, blacks still receive significantly longer sentences (Johnson, 2014, p. 74, table 9). It is difficult to conclude how prosecutorial behavior ultimately affects sentencing disparities on average. However, Johnson’s work emphasizes the importance of considering prosecutorial practices in studying sentencing disparities. Fishman and Schanzenbach’s (2012) study is in the spirit of Starr and Rehavi’s (2013) study. ***Footnote 9 Fishman and Schanzenbach introduce the terms endogenous and exogenous, although not necessarily in a way that we find helpful. Their argument might be summarized to say that prosecutors make charging and bargaining decisions that are endogenous because they take judicial responses into account. Nevertheless, we consider the charging and bargaining outcomes as being exogenous to the judge’s decision***. It is straightforward, examining whether transitions from more to less restrictive guidelines (as a result of Supreme Court decisions) caused disparity to increase or decrease. They find that court decisions sometimes greatly alter the administration of justice (especially departures and sentencing at the statutory minimum), but our interpretation of their work is that the alteration in the administration of justice did not have a large impact on racial disparity. 
Yang (2013) takes a similar approach of basing inferences on short-term changes. After accounting for the exercise of prosecutorial discretion with regard to charging decisions, Yang (2013, p. 2) finds evidence of a 4% increase in racial sentencing disparity post-Booker. Our study reports an increase of the same magnitude. ***Footnote 10 Although Yang’s findings are similar to ours, there are large differences in methodology. Yang uses a regression model that imposes structural restrictions that are much more restrictive than those adopted for our study; includes noncitizens in the analysis while, for reasons explained later, we limit our analysis to citizens; and attempts to control for the exercise of prosecutorial discretion, while we examine the exercise of discretion but do not build it into our statistical model***. Our view is that both lines of inquiry pose valid research questions. We agree with Starr and Rehavi that it may be impossible to fully discount an explanation that other unobserved variables--perhaps variables that can be attributed to prosecutorial discretion--account for estimated differences in the sentences for white and black offenders. However, if we observe trends toward increasing or decreasing disparity during a period where those unobserved factors are presumably or demonstrably constant, then those trends provide strong evidence of disparity attributable to race. ***Footnote 11 Although they are critical of the traditional sentencing guidelines, Starr and Rehavi (2013, p. 40) recognize the advantage of studies based on changes***. We can further strengthen the inferences by checking trends in other observed variables. If there are no strong trends in observed variables, it seems reasonable to conclude that there are no strong trends in unobserved variables. Trend analysis plays an important role in the inferences drawn in our study. Figure 1 summarizes the arguments. 
Presumably there is a true set of facts regarding the offender and the offense, although offenders attempt to hide their criminal behaviors; as a result, the facts may be known imperfectly. The facts are filtered by an assistant U.S. Attorney, who decides what to charge and attempts to prove what he or she charges, bargains with defendants and defense attorneys, and ultimately presents his or her set of facts to the judge. A probation officer investigates the offense conduct according to police reports (e.g., Federal Bureau of Investigation and Drug Enforcement Administration), learns about the offender’s criminal history, and studies the offender’s background (e.g., employment, and residential and marital status), and prepares and submits a presentence investigation report to the judge. Given the convicting charges and the facts surrounding the case, the judge imposes a sentence. *********************************************** Figure 1 – A causal model of how offense and offender facts affect the sentence *********************************************** The first set of studies examines the causal path that runs from “conviction charges” and “apparent facts” through the judge to the sentence. Conditional on the charge and facts that result from prosecutorial discretion, the judge imposes the sentence, and one can say that sentencing is disparate, conditional on those prosecutor-mediated facts. The second set of studies examines the causal path that runs from the facts through the sentence. Discretion exercised by prosecutors and judges is cumulative, so the disparity is jointly attributable to the prosecutor and judge. Our study shows that the first branch of the causal path (which involves the prosecutor) has changed little during the period of our study; that is, prosecutorial charging and bargaining practices have remained constant over the study period, or when practices have changed, the change has been the same for white and black offenders. 
In contrast, the second path (which involves the judge) has changed considerably; that is, judges have changed their behaviors conditional on the facts presented to them. While prosecutorial behaviors have remained fairly constant, racial disparity has increased. Although these trends toward greater disparity postdate Booker, we cannot say that Booker caused them and we cannot say what would happen if Booker were reversed. Another line of inquiry has investigated inter-judge sentencing disparity post-Booker. This work has been limited because the USSC has not provided guidelines data matched with judge identifiers. As a result, studying inter-judge disparity on a wide scale using data that account for extensive variation in offense seriousness and offender criminal records has been restricted. Except for studies done at the Commission (Commission, 2012) and a study by Yang (2013) discussed below, researchers have had to work with data that have limited geographic (Scott, 2010) or offense (Mason & Bjerk, 2013) coverage, or with datasets that have limited detail about offenses and offenders (Hofer, 2012; Yang, 2014), or the studies have examined variation across districts rather than judges (Lynch & Omori, 2014). In contrast, the data used in this study have judge identifiers for all felonies and serious misdemeanors sentenced in federal courts between 2005 and 2012. Having similarly matched judges to guideline cases, Yang (2014) reports that judges who are appointed to the bench post-Booker are more likely to depart from the guidelines, but we could not replicate those findings.12 Using the same data and applying a simple analysis of variance technique, Yang (2013) shows that inter-judge disparity has increased from pre-Booker to post-Gall. 
***Footnote 13 Yang assumed that cases are randomly assigned to judges within districts, an assumption that seems reasonable given the rules used by most districts to assign cases and an assumption that passes diagnostic testing for that analysis. Our analysis is based on a similar assumption, but we control for offense and offender characteristics. As a result, our findings are less sensitive to whether or not assignment is random***. As explained in the methodology section (section 4.0), we examine systematic differences across judges in the imposition of sentences for white and black offenders. Others have reported intra-judge disparity; applying more formality to the problem, similar to an approach used by others (Anderson & Spohn, 2010), our statistical model uses a random effect regression to estimate intra-judge disparity. Intra-judge disparity--as estimated using random effects--is less easily attributed to disparate prosecutorial decision making or to unobserved factors relevant to sentencing because judges essentially receive a random selection of cases for sentencing. We consider the estimation of random effects to be a methodological advance. Finally, our study differs from other studies by examining a longer period of post-Booker sentencing. Extant studies discussed previously have examined a shorter period. Our study may reveal trends that were obscured by a shorter period. ************************ 3.0 Defining disparity ************************ There is no universally accepted definition of sentencing disparity. We propose a working definition to support empirical analysis. The working definition is necessarily abstract and readers who are less concerned with methodology may wish to skip sections 3.0 and 4.0 after reviewing the following summary. This section presents an argument for estimating the following: * How blacks are disadvantaged compared to whites at the time of sentencing. 
* How that disadvantage varies with offense seriousness and criminal record. * How that disadvantage has varied over time. * How the dispersion of sentences in general has varied over time. We specify an operational model where the effect of race can be divided into four components: a basic race effect (first bullet), a race effect interacted with the guideline cell (second bullet), a race effect interacted with time (third bullet), and a skedastic function (dispersion about the regression line) as a function of time (fourth bullet). ********************************** 3.1 Disparity and the rule of law ********************************** In the abstract, under the rule of law, offenders should know what will happen to them if they are convicted of a crime, and offenders convicted of similar crimes under similar circumstances should expect to receive similar sentences. In the abstract, then, there should be a function where a sentence S follows from the facts surrounding the case A, the offender’s criminal history B, and any consideration offered by the government in exchange for cooperation C:

[1] S = f(A, B, C)

Throughout this report, we measure the sentence as the length of time sentenced to prison. ***Footnote 14 For less serious crimes, we could examine the decision to impose a prison term. Most federal crimes for which the guidelines are relevant result in prison terms. As a result, the decision whether to sentence to prison is of secondary importance. The analysis of this binary outcome poses a few new analytic problems that are not considered when studying the prison sentence. Therefore, this design report does not consider the analysis of binary outcomes***. The function represents a normative standard, and any departure from this standard is called disparity. In the abstract, a study of disparity is straightforward: Given equation [1], a researcher simply needs to measure the extent to which sentences depart from this standard. 
An immediate problem is determining the standard to which a researcher can identify and measure disparity. There are three issues. First, what are the specific components of A, B, and C? Second, how should those components be weighted by importance? Third, how should the weighted components be combined to determine a sentence? Congress has set broad parameters on the imposition of sentences in the form of statutory minimums and maximums. Presumably, sentences that fall outside these parameters are disparate, but that standard is not especially helpful because the parameters are wide and few sentences are imposed illegally outside these bounds. Within these broad parameters, Congress has delegated to the USSC, subject to Congressional veto, the power to determine the standard. The guidelines identify the elements of A, B, and C; specify how those elements should be weighted; and instruct how those weighted elements should be combined to impose a sentence. Essentially, the guidelines provide equation [1], subject to major caveats. One caveat is that the guidelines were never an exact formula for imposing a sentence. The guidelines always gave judges latitude to sentence within a 25% range and always allowed judges to depart from the range when warranted. The justification is that while the guidelines should apply to most cases, the sentencing judge may uncover exceptional cases that require less severe or more severe punishment. Although the guidelines always allowed judicial latitude, the Commission stipulated factors that could not be considered-- such as race or sex--when departing from the guidelines. Prior to Booker, disparity might be defined as sentences that were explained by forbidden factors (e.g., race) or that departed from the guidelines without acceptable justification. Prior to Booker, sentencing disparity was conceptually simple to define. Post-Booker, existence of a standard is arguable and disparity is more difficult to define. 
The guidelines are advisory but, under Booker, they are still important as a standard that a judge must consult prior to imposing a sentence. The problem is that a sentence departing from the advisory guidelines cannot be identified as disparate because Booker and subsequent court decisions have ceded judges authority to impose sentences they see as appropriate given the purposes of sentencing. This means that after giving due consideration to the guidelines, an individual judge can make his or her own determination of equation [1]--the elements that are relevant for sentencing, how they should be weighted, and how they should be combined. One might even conclude that post-Booker, there is no meaningful standard. ****************************************** 3.2 How to define disparity post-Booker ****************************************** How, then, should a researcher define and estimate sentencing disparity? The answer requires expanding the vocabulary used to describe sentencing disparity. A researcher can always observe how the sentence imposed differs from the sentence recommended by the guidelines. A difference cannot be declared to be disparate given that judges have discretion to depart from the guidelines. Nevertheless, some patterns in those differences are suggestive of disparity. To explain, this section identifies a model of how sentences are imposed and explains what that model tells or suggests about disparity. We start with a relatively simple model and progressively incorporate complexity. We start with a model of how sentences are imposed within a given guideline cell. Rewrite equation [1] as equation [2]. The subscripts identify the ith offender sentenced by the jth judge in the kth guideline cell. The e_{ijk} represents the difference between the average sentences imposed within guideline cell k and the sentence actually imposed:

[2] S_{ijk} = \bar{S}_k + e_{ijk}

\bar{S}_k is the average sentence imposed for offenders sentenced within the kth guideline cell. 
A researcher would be concerned with observed patterns of e. We consider a guideline cell to be an anchor; specifically, we consider the mean sentence within the guideline cell as a standard. It is possible that judges, as a collective, have common rules for departing from the guidelines. One way to account for a common rule is to introduce additional explanatory variables:

[3] S_{ijk} = \bar{S}_k + X_{ijk}\beta_k + e_{ijk}

X is a row vector of variables associated with the ith offender sentenced by the jth judge in the kth cell. In this relatively simple model, the weight assigned to these additional factors (the column vector \beta) is the weighted average over judges. (Weights are proportional to the number of offenders sentenced per judge.) To the extent that these variables explain some of the residual variance, sentencing decisions will be uniform (but different from the guidelines) and the variance of e will be smaller. Provided it excludes clearly inappropriate considerations, such as race, the X_{ijk}\beta_k term is not disparity; rather, it represents a judicial consensus of how the average sentence should vary systematically within a guideline cell. Many A, B, and C variables exist, and it is impractical and arguably unnecessary to include all of the variables in a statistical model. ***Footnote 15 Some researchers have followed a tradition of introducing a presumably comprehensive set of explanatory variables into a regression. A problem with this approach is that it altogether ignores the structure imposed by the guidelines and replaces that known structure with a statistical search for the correct model. This is a daunting and uncertain exercise that we forego by anchoring our analysis on the guideline cell and then looking for systematic departures from that anchor***. 
Because the guidelines already factor most X variables into the identification of the guideline cell (e.g., the seriousness of the offense and the danger posed by the offender), there is limited variation on which to base inferences. ***Footnote 16 For example, for property crimes the dollar value of the loss is a primary determinant of offense seriousness. But within a guideline cell, there is likely to be little variation in dollar loss. A regression (the statistical procedure used to estimate the parameters in equation [3]) will be uninformative when the independent variable (the X in the equation) has small variation***. Consequently, this study will make limited attempts to add X variables to the model, only incorporating them into the analysis when they are obviously important. Nevertheless, we know from reviewers’ comments on an earlier draft that the decision to include some X variables and exclude others is contentious, and we return to the issue in the next subsection. So far, the discussion has considered residual variance within a guideline cell as variance that is unexplained by the legitimate factors (including knowledge of the applicable guideline cell) incorporated into the estimation. Unexplained residual variance is not the same as disparity, but it is probative. Expanding the model specification goes more directly to the issue:

[4] S_{ijk} = \bar{S}_k + X_{ijk}\beta_k + Z_{ijk}\gamma_k + e_{ijk}

The Z represents factors that Congress and the Commission recognize as inappropriate considerations during sentencing (see appendix A). For our purposes, the important Z variable is race. Again, currently there is disagreement about the legal standing of the Commission’s policy statements, but we doubt that many readers would argue that race--the principal focus of this analysis--is ever a valid consideration at the time of sentencing. Given the model represented by equation [4], interesting aspects of the problem are-- * Does the \gamma in equation [4] differ from zero? 
The question is whether there is systematic difference across offenders in sentences with regard to race. * In equation [4], after accounting for systematic differences in X (common rules used by judges for departing from the guidelines) and Z (race and other factors that should not be considered during sentencing), what is the residual variation in e (the average difference between the average sentence imposed within the guideline cell and the sentence actually imposed)? * Has the residual variation in e changed over time? An extension to [4] almost completes the modeling. Although the \beta and \gamma parameters vary across the guideline cells, as written in equation [4], the parameters are otherwise fixed as researchers frequently use that term with hierarchical linear models. An extension is to write the parameters as being random, so that [4] is written as [5]:

[5] S_{ijk} = \bar{S}_k + X_{ijk}\beta_k + Z_{ijk}\gamma_{ijk} + e_{ijk}

Note that [4] and [5] are the same, except for the new subscripts that appear on the \gamma parameter. The appearance of the j subscripts allows for the weight that each individual judge assigns to the variable Z to vary. Some additional structure is required to understand this. Presuming for simplicity that Z is a single variable (a dummy variable representing the condition that the offender is black), we might express \gamma as a function of a new vector of variables W. ***Footnote 17 Model specifications need to provide additional variables for whites and other races entering the analysis. For simplicity, we do not show those additional races***.

[6] \gamma_{ijk} = \gamma_k + W_{ijk}\gamma_{Wk} + u_j

Here \gamma_{ijk} is the effect that being black has on the sentence within the kth cell. The W_{ijk} are interesting variables that explain systematic patterns in the effects of race (being black) on sentences within the kth cell. This formulation is often called a hierarchical linear model or a random effect model. It decomposes the effect of race into three parts. Parameter \gamma_k 
captures the direct effect that race has on the sentence imposed. For example, this parameter might cause us to infer that, on average, black offenders receive longer prison terms than do white offenders, taking other factors into account. Parameter \gamma_{Wk} captures the effect of interacting race with some other variable W. For example, if the other variable (W) is the year when the sentence was imposed, this parameter might cause us to infer that, over time, black offenders receive prison terms that are increasingly longer than the terms received by white offenders, when taking other factors into account. This is helpful for studying trends. The first two decomposition effects are called fixed effects and the third is called a random effect. The random effect u_j is attributable to the specific judge imposing a sentence. Statisticians might say that judge is nested within race, but regardless of the wording, the point is that judges have different reactions to an offender’s race. ***Footnote 18 One reviewer of an earlier draft of this report observed that researchers using hierarchical linear models typically follow our approach of identifying a variable, such as race, whose effect is allowed to vary with time, while other variables are presumed to have fixed effects. The reviewer’s objection is that this is a specification error. We grant the validity of this point but assert that justification for using this potentially misspecified model comes from the advantage of simplifying a model that otherwise would become so complex as to be uninterpretable***. A final question concerns what to include as X, Z, and W variables. The question has no simple answer and we know from reviewers’ comments that the answer is contentious. We explain our approach using the concept of bundling, explained in the next subsection. 
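It may help to see the two-level model of section 3.2 collapsed into a single equation. The sketch below substitutes equation [6] into equation [5]. The symbol names \beta (coefficient on X) and \gamma (coefficient on Z) are labels assumed here for illustration, because the Greek letters do not survive in this text-only file.

```latex
% Substituting [6] into [5]. \beta and \gamma are assumed symbol names.
\begin{align*}
S_{ijk} &= \bar{S}_k + X_{ijk}\beta_k
           + Z_{ijk}\left(\gamma_k + W_{ijk}\gamma_{Wk} + u_j\right) + e_{ijk} \\
        &= \bar{S}_k + X_{ijk}\beta_k + Z_{ijk}\gamma_k
           + Z_{ijk}W_{ijk}\gamma_{Wk} + \left(Z_{ijk}u_j + e_{ijk}\right)
\end{align*}
```

Read this way, a white offender (Z = 0) contributes only the cell mean, the X term, and e. For a black offender (Z = 1), the expected sentence shifts by the direct effect plus the W interaction, and the judge-specific u_j enters the composite error term.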
**************************************** 3.3 Race is bundled with other factors **************************************** When studying racial disparity in sentencing, it is necessary to define race and racial disparity. The definition may be comparatively simple for a geneticist, who might observe that whites and blacks fundamentally share a genetic pool with a few differences that account for skin color; predisposition toward certain diseases, such as sickle cell anemia; and other factors. However, when used in the context of social responses to race, including criminal sentencing, race seems to mean something other than genetic variation. Numerous studies show that blacks have been sociologically and economically disadvantaged; as a racial group, they have less education, lower earnings, more serious criminal records, and other factors distinguishable, on average, from whites. The authors of this report think of genetically defined race as being bundled with these social and economic factors. When studying racial disparities in sentencing, we must make a decision: Should we seek to estimate a pure genetic race effect by controlling for the bundled factors? Or should we treat those bundled factors as inseparable from race and not account for them in the analysis? Or should we account for some of the bundled factors and ignore others? Reviewers of an earlier draft of this report had differing opinions. We have a preferred approach, although we also attempted to accommodate different opinions. Overall, we prefer to study the effect of race as a bundle of factors, so that our analysis has a race variable but no control variables for education, employment, and other factors associated with race. With exceptions, identified and justified below, the only control variables are those recognized by the guidelines. ***Footnote 19 One can argue that the Federal Sentencing Guidelines have a built-in, but not necessarily intentional, racial bias. 
The recently changed crack cocaine guidelines illustrate this built-in bias. As another example, blacks may acquire lengthier criminal histories than whites for reasons that have nothing to do with inherent criminality, and because the guidelines take criminal record into account, blacks may be disadvantaged. This study, which takes the guidelines as a normative standard, is not a study of built-in racial disparity***. One exception is that we include a covariate that captures nuances in criminal records. To explain, the guidelines already identify criminal history categories, derived from collapsing more detailed criminal history scores. Because judges potentially see the criminal history scores, because the differences between these scores and the criminal history categories may be considered at the time of sentencing, and because criminal history is generally considered an appropriate consideration at sentencing, we included a transformation of the criminal history scores as covariates. The transformation is explained in section 5.1. In addition to criminal history scores, our preferred model controls for the judicial circuit in the regression specification. Empirically, circuits that tend to sentence both white and black offenders harshly, compared to other circuits, tend to have a higher proportion of black offenders. Circuits that tend to sentence both white and black offenders leniently, compared to other circuits, tend to have a lower proportion of black offenders. Even if blacks suffered no racial disadvantage within every circuit, black offenders would appear to be disadvantaged at the national level if the circuit were not taken into account. In the case of circuits, we have unbundled circuit as a race attribute. This approach is arguable, so we have made accommodations in the form of sensitivity testing. 
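The aggregation argument about circuits can be illustrated with a small arithmetic sketch. All numbers below are made up for illustration and do not come from the study data. Two hypothetical circuits each give black and white offenders the same average sentence, yet a national gap appears because the harsher circuit has a higher proportion of black offenders.

```python
# Hypothetical illustration only: none of these numbers come from the study.
# Within each circuit, black and white offenders receive the same mean sentence.
circuits = {
    "harsh":   {"mean_months": 60.0, "n_black": 700, "n_white": 300},
    "lenient": {"mean_months": 40.0, "n_black": 300, "n_white": 700},
}


def national_mean(group):
    """Case-weighted national mean sentence for 'black' or 'white' offenders."""
    total_months = sum(c["mean_months"] * c[f"n_{group}"] for c in circuits.values())
    total_cases = sum(c[f"n_{group}"] for c in circuits.values())
    return total_months / total_cases


# Zero by construction: each circuit has a single mean for both groups.
within_circuit_gap = 0.0

# Nonzero, purely because the groups are distributed differently across circuits.
national_gap = national_mean("black") - national_mean("white")
print(national_mean("black"), national_mean("white"), national_gap)
```

Conditioning on circuit, as the preferred model does, removes this compositional gap. Omitting circuit leaves it bundled with race.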
While we focus attention on the regression specification with circuit dummy variables, we also report findings from a regression that lacks circuits as a control variable and from a regression that substitutes districts for circuits. ***Footnote 20 One review noted that we should use districts to remove regional variations, and while the reviewer’s arguments are persuasive, the use of districts instead of circuits does not materially change results. We found that some models could not be estimated when district was substituted for circuit***. Thus, our preferred model includes a transformed version of criminal history score and circuits as the X variables, but at the request of reviewers, we have added education and employment to the model to see if these factors account for some of the racial effect. We report findings as a sensitivity test of the preferred model. Despite reviewer recommendations, we do not include covariates that may differentiate offenders within guideline cells. For example, the guidelines may place some property offenders and some drug-law violators into the same guideline cell. Despite the fact that these two types of offenders fall into the same guideline cell, judges may treat drug offenders more severely than property offenders. If drug offenders are more likely to be black, then failing to distinguish between property offenders and drug offenders within the same guideline cell may mistakenly associate valid reasons for sentence differences to race. From this perspective, the analysis should account for within-guideline differences in crimes. While giving credit to this argument, we find it difficult to deal with the problem. The differences within guideline cells are idiosyncratic and accounting for them would require complicated statistical analysis that might obscure more than it explains.***Footnote 21 Reviewers have suggested adding dummy variables to account for offense type, but on reflection, this simple solution loses its appeal. 
Depending on the refinement of the offense type variable, certain offenses will appear in a limited number of guideline cells, where they will be proxies for those cells. The parameters associated with those dummy variables will not have their usual interpretation as shift parameters. The use of offense variables may introduce specification errors that actually mask race effects***. Furthermore, unless the intra-cell differences uniformly favor whites (or blacks) across all of the cells, the existence of those intra-cell differences will not bias estimates of black-white disparity across the cells. Because we have no reason to presume that the guideline cells were constructed systematically to disadvantage blacks, we prefer to treat the intra-cell differences as essentially random across cells. We recognize that this argument will not satisfy all readers. ***Footnote 22 Furthermore, considering the bundling argument, the question about intra-cell variation may be unanswerable. In the property crime and drug crime example, one could plausibly argue that within a guideline cell, judges sentence drug offenders more severely because they are predominately black, while property offenders are predominately white***. Although we adopt a simple model with few X variables, based on reading and discussions with others, we felt that the structure of equation [5] may differ across settings, an expectation confirmed by statistical testing. Evidence shows that females are sentenced less severely than are males. Some analysts might deal with this difference by using a dummy variable in a single regression to distinguish males from females, but we disagree with that approach. Adding a sex variable as a linear additive effect will not account for the fact that the \beta and \gamma parameters differ for males and females. (The truth of this assertion is demonstrated when discussing empirical results.) 
Instead of using a dummy variable to distinguish sex, we partition the data into 14 strata (see section 4.1) and estimate separate regressions for each partition. The \beta and \gamma parameters differ demonstrably across these partitions. As a result, we cannot simply add dummy variables as control variables. ***Footnote 23 It is possible to specify a model that has interactions between the partitions and the \beta and \gamma parameters, but that is essentially the same as estimating separate models for each partition***. The disadvantage of this approach is that with 14 partitions, we have 14 results. However, this is only a bookkeeping disadvantage. As explained in the next section, the dependent variable is standardized. As a result, the parameters of interest are in the same units and can be averaged across partitions, allowing us to make summary statements about disparity in sentencing without reporting the results for all 14 partitions. ******************************************************** 4.0 Statistical methodology: A random effect model ******************************************************** This methodology section has two components. The first section discusses estimation and the second section describes the data and variables used in the analysis. ***************** 4.1 Estimation ***************** The previous section identified a theoretical model for measuring sentencing disparity. The model is potentially useful because it answers questions relevant to this study. Unfortunately, the model is not practically useful as written for two reasons: (1) there are too few observations per guideline cell to estimate parameters reliably and (2) there is no simple way to summarize results across the 258 guideline cells. Recognizing these problems, this section provides a model simplification that retains the important aspects of modeling already discussed. We have defined S_{ijk} as the sentence imposed on the ith offender by the jth judge in the kth guideline cell. 
For reasons to be explained, it will be convenient and useful to rescale the sentence. Let N_k be the number of sentences imposed within cell k.

M_k = \frac{\sum_{i,j \in k} S_{ijk}}{N_k}, \qquad SC_k = \sqrt{\frac{\sum_{i,j \in k} (S_{ijk} - M_k)^2}{N_k - 1}}

The notation \sum_{i,j \in k} means summation over all i and j in cell k. M_k is the average sentence imposed in the kth cell. SC_k is the standard deviation for sentences imposed within the kth cell. Then the rescaled measure of sentence is--

[7] s_{ijk} = \frac{S_{ijk} - M_k}{SC_k}

This rescaled measure has two useful properties: (1) within any cell, the average rescaled sentence will be zero and (2) within any cell, the standard deviation for the rescaled sentence will be one. Using this rescaled version of the sentence, we write the model using all the cells as--

[8] s_{ijk} = X_{ijk}\beta + Z_{ijk}\gamma_{ij} + e_{ijk}, \qquad \gamma_{ij} = \gamma_0 + W_{ijk}\gamma_W + u_j

Equation [8] looks similar to equation [5], but there are important differences: * Equation [5] pertained to a specific guideline cell, while equation [8] applies to all guideline cells. * In equation [5], the \beta and \gamma have k subscripts, implying that those parameters vary from cell to cell, but the k subscript has been dropped in equation [8], implying that those parameters are invariant across the cells. Given 258 guideline cells, this reduces the parameter space by a factor of 258, greatly reducing the estimation problem and providing a useful summary measure across cells. This simplification may be an incorrect specification, and we will subsequently introduce some model flexibility. * Although we have retained the \beta and \gamma notation, these are not the same parameters as those in equation [5]. Rescaling S_{ijk} causes the interpretation of the parameters to change, so parameters are now interpreted as changes in standard deviation units. Although this is analytically convenient, readers may have trouble interpreting standard deviation units, but we will discuss how standard deviation units can be translated back into natural units to facilitate interpretation. 
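The within-cell rescaling described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code. The cell labels and sentence lengths are hypothetical, and the function names are invented for this example. It computes the cell mean and standard deviation (called M_k and SC_k in the text), rescales each sentence, and shows how an effect in standard deviation units maps back to months for a given cell.

```python
from collections import defaultdict
from math import isclose
from statistics import mean, stdev


def standardize_within_cell(records):
    """Rescale sentences within each guideline cell.

    records: list of (cell_label, sentence_in_months) pairs.
    Returns (rescaled, stats) where stats[cell] = (cell mean, cell SD).
    Each cell needs at least two sentences and a nonzero SD.
    """
    by_cell = defaultdict(list)
    for cell, months in records:
        by_cell[cell].append(months)
    # statistics.stdev uses N - 1 in the denominator (sample standard deviation).
    stats = {cell: (mean(vals), stdev(vals)) for cell, vals in by_cell.items()}
    rescaled = [
        (cell, (months - stats[cell][0]) / stats[cell][1])
        for cell, months in records
    ]
    return rescaled, stats


# Hypothetical sentences (months) in two made-up guideline cells.
records = [
    ("cell_A", 10.0), ("cell_A", 12.0), ("cell_A", 14.0),
    ("cell_B", 100.0), ("cell_B", 120.0), ("cell_B", 140.0),
]
rescaled, cell_stats = standardize_within_cell(records)


def to_months(cell, effect_in_sd):
    """Translate an effect in standard deviation units into months for one cell."""
    return effect_in_sd * cell_stats[cell][1]


print(rescaled)
print(to_months("cell_A", 0.1), to_months("cell_B", 0.1))
```

The same effect in standard deviation units corresponds to more months in a cell where sentences are more dispersed. This is why the translation back to natural units has to be done cell by cell.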
The dependent variable now has a mean of zero and a variance of one for every guideline cell, so treating the β as the same across the cells is equivalent to saying that a unit change in variable X changes the sentence by β standard deviations, regardless of the cell. Likewise, treating the γ as the same across the cells is equivalent to saying that a unit change in variable Z changes the sentence by γ standard deviations, regardless of the cell. Using standardized scores greatly simplifies interpretation of the statistical analysis because a standard deviation change is the same, regardless of the cell.

Although using standardized scores simplifies the model, the simplification may be an incorrect specification leading to misleading conclusions. We take two steps to guard against misspecification. The first step is to introduce the year when the sentence was imposed as a W variable. This allows us to determine whether the severity of sentences is increasing or decreasing over time.***Footnote 24 More accurately, given the model specification, we are able to determine whether sentence severity for white offenders is increasing or decreasing over time. The interaction of time with black offenders tells us how the sentences for black offenders relative to white offenders--the measure of racial disparity--has changed over time***. We also introduce the year when the sentence was imposed, interacted with the variable black, as another W variable. This allows the proportionality of sentences imposed on white and black offenders to vary systematically across time. (For additional flexibility, we use a linear spline with the date of the Gall decision as the join point.) The second step is to introduce the maximum of the guideline cell as a W variable. This allows the proportionality of sentences imposed on white and black offenders to vary systematically across the guideline cells.
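Substituting the judge-level equation into the sentence equation shows why the W variables enter the estimated model as interactions with the Z variables. This is a sketch under assumed notation: \beta and \gamma are assumed symbol names for the coefficients on X and Z, \gamma_0 and \gamma_W for the intercept and slope of the judge-level equation, and u_j for the judge random effect.

```latex
\begin{aligned}
s_{ijk} &= X_{ijk}\beta + Z_{ijk}\gamma_{ij} + e_{ijk},
\qquad \gamma_{ij} = \gamma_0 + W_{ijk}\gamma_W + u_j \\
\Rightarrow\; s_{ijk} &= X_{ijk}\beta + Z_{ijk}\gamma_0
  + (Z_{ijk} W_{ijk})\,\gamma_W + Z_{ijk} u_j + e_{ijk}
\end{aligned}
```

Read this way, if Z contains the indicator for a black offender and W contains the sentencing year or the cell maximum, the product Z x W is an interaction of race with time or with cell severity, and Z u_j is a judge-specific deviation.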
Think of the statistical model as estimating a smoothed version of the relationship between sentences for white and black offenders across the guideline cells and over time. An attractive feature of building a statistical model capturing variation in sentences for white and black offenders is that the results from the statistical model are readily expressed using figures.

The analysis will deal with special considerations. For relatively minor crimes committed by offenders with minor criminal records, there is a practical lower limit on the sentence imposed: Time served may be zero for offenders sentenced to probation. For comparatively serious crimes committed by offenders with major criminal records, there is a practical upper limit on the sentence imposed, and some sentences may be very long (e.g., when a judge imposes consecutive sentences). Researchers have struggled with ways to deal with censored variables, as the above problem is known in the econometrics literature (Sullivan, McGloin, & Piquero, 2008; Britt, 2009; Ulmer, Light, & Kramer, 2011), but there is a special consideration for this study. Many of the solutions do not lend themselves to hierarchical modeling, at least not using conventional software. Examining sentences imposed under the guidelines shows that, within most guideline cells, most offenders receive jail or prison terms. We can include most guideline cells within the study by setting a rule: Include a cell when 85% or more of the offenders sentenced within the cell receive some prison time.***Footnote 25 This still leaves a lower limit problem, but a linear model will provide a consistent estimate of the average sentence when there is a lower limit. Standard errors are corrected using robust standard errors. The upper limits are a minor problem. A reviewer of an earlier draft disagreed with this statement, so some clarification is required.
Using OLS when data are censored will lead to inconsistent estimates of the parameters in an underlying latent variable model, a problem discussed extensively in the econometrics literature. However, a correctly specified OLS model will be consistent for the conditional mean by definition. This distinction is discussed by Angrist and Pischke (2009), among others. Our claims about consistency pertain to the conditional mean***. Within each cell, sentences for offenders who received non-prison sentences are set to zero. (See the table at the beginning of appendix B.) Adopting this convention allows us to use a linear hierarchical model for the analysis.***Footnote 26 With exceptions, guidelines are not required for misdemeanors below class A misdemeanors. Consequently, the least serious federal crimes are not sentenced under the guidelines. U.S. Attorneys often employ declination standards that limit federal prosecution to serious crimes, and less serious crimes are referred to state courts. These two observations may explain why most federal offenders sentenced under the guidelines receive some prison time***.

Limiting the analysis to certain cells does not bias the analysis because selection is based on an exogenous variable: guideline cell.***Footnote 27 One might argue that the guideline cell is not exogenous. The argument is that prosecutors wait to determine the judge appointed to the case and then manipulate the facts surrounding the case in recognition of judge sentencing proclivities. If this happens, the effect does not appear to be large (Yang, 2013), so we treat the guideline cell as exogenous***. However, we acknowledge that the findings pertain strictly to these included cells. We could have relaxed the inclusion rule, but dealing with probation terms requires special considerations.
In the federal system, terms of probation typically come with onerous behavioral restrictions, including house arrest and electronic monitoring. When only a few sentences are to probation, we do not set the sentence to zero. However, if a larger number of probation sentences were included, we would have to deal with the terms of probation varying in severity.***Footnote 28 Some reviewers of an earlier version of this report encouraged us to extend the analysis to probation terms, suggesting that we estimate separate regressions with binary outcomes (e.g., logistic or probit models). We see the required analysis as more difficult than suggested by the reviewers, and resource and time limitations precluded taking probation sentences into account***.

Expecting application of the guidelines to vary across certain offense and offender groups, we partition our data into 14 strata.***Footnote 29 One might argue that there should be more or fewer partitions. In its study of the use of mandatory minimum penalties across 13 districts, the USSC partitioned offenses into drug offenses, firearm offenses, and child pornography offenses (Commission, 2011, pp. 111-115), essentially the same partitioning that we have used. The Commission also distinguished aggravated identity theft, but we did not make that partition because there are too few cases to form a separate partition***. Separate analyses are performed within each stratum. The strata are defined as follows:

* The USSC makes a useful distinction between upward departures (always attributable to the judge), downward departures that are attributable to the judge, and downward departures that are attributable to the government.***Footnote 30 A departure is a sentence that is higher than the upper guidelines limit or lower than the lower guidelines limit. Downward departures attributable to the government are legitimate rewards for cooperating with the government to further criminal cases against others***.
It seems best to assume that departures attributable to the government fundamentally alter the application of the guidelines. As a result, the analysis described above (and illustrated below) should be done separately for cases that have departures attributable to the government and for cases that do not have departures attributable to the government. We make that distinction for this study, so the explanation of sentence variation is found exclusively in judicial decision making. * Federal criminal law always sets an upper limit on the sentence and, for some crimes, federal criminal law sets a lower limit greater than zero. Mandatory minimum sentences are especially likely for drug violations, and these mandatory minimums are often so severe that Congress has provided provisions allowing judges to ignore the mandatory minimums for a class of offenders (see appendix A). Weapons enhancements, which often trigger mandatory minimum sentences, are incorporated into the guidelines. The rules for sentencing drug offenders, those who receive weapons enhancements, and other offenders are so different that we have created a second partition determined by four broad offense categories: - drug violations that do not involve weapons enhancement - nondrug violations that do not involve weapons enhancements - drug violations that involve weapons enhancements - nondrug violations that involve weapons enhancements. In the federal system, sex offenses are mostly for child pornography, and most of the convictions for sex offenses are of white offenders. As a result, analyzing disparity among sex offenses is of little interest. Nondrug violations account for all nondrug crimes, excluding sex offenses. However, when examining sentence variation across judges, where race is not a consideration, sex offenses make up their own class. * Although being male or female is not a legitimate consideration according to guideline policy statements, sex has an important effect on sentence imposed. 
This justifies treating sex as a stratifying variable. Females are rare participants in offenses that involve weapons enhancements. As a result, every partition involving females and weapons enhancements is treated as null. Also, females are rarely participants in sex violations.

In summary, ignoring sex violations, we partition the data into 12 subsets: eight subsets for males and four subsets for females. We repeat the same analyses for each of the 12 subsets. When estimating the skedastic function, we use all 14 strata because race is not a factor for the skedastic function.

The principal analysis eliminates noncitizens. The treatment of noncitizens differs from the treatment of citizens for at least four reasons. First, the guidelines factors--especially those required to construct the criminal history category--are likely to be imprecise for noncitizens. Second, noncitizens can be deported, so the stakes at issue are different for citizens and noncitizens. Third, race appears to be inaccurately reported for noncitizens, who are (according to the data) predominately white Hispanics. Fourth, the fast-track provision introduces a complication that requires special attention going beyond the analysis reported in this study (McClellan & Sands, 2006; Gorman, 2010; Cole, 2012). Consequently, we drop noncitizens from the principal analysis, but given that noncitizens are a large proportion of federal offenders, we report a preliminary analysis of noncitizens in appendix C.

*************************
4.2 Data and variables
*************************

The USSC maintains a detailed database pertaining to the application of guidelines; this study uses an extract of variables from those data. The box below lists the specific variables from the USSC database that entered into our analysis. This list allows interested readers to track variables back to the USSC’s data. The box puts variables into three categories.
The first category includes stratification variables that were used to identify the 14 strata and to determine if an offender was a U.S. citizen. The second category--the dependent variable--is the sentence. The third category identifies variables used to determine independent variables.

**********************************************************

**************************
Stratification variables
*************************
FEMALE--A variable coded by the USSC that equals 1 if the offender is female.
REAS*--The first, second, etc. reason given by the court for why the sentence imposed was outside the range (there are 10 variables, for REAS1 to REAS10). The value indicating substantial assistance is “19” ((5K 1.1) substantial assistance with government motion). These departures are explained in appendix A.
BookerCD--An additional variable used to identify a handful of additional substantial assistance cases. The post-Booker reporting categories (there are 12 categories) are based on the relationship between the sentence and guideline range and the reason(s) given for being outside of the range. The value indicating substantial assistance is “5.”
OFFTYPE2--The offense type variable used to determine drug offenders (values “10,” “11,” and “12”) and sex offenders (values “4,” “28,” “42,” “43,” and “44”).
WEAPON--Used to identify weapons offenders. It is a USSC-created variable, where 1 indicates the offender received a specific offense characteristic (SOC) weapons enhancement or had an 18:924(C) charge present.
NEWCIT--Used to identify U.S. citizens. The value “0” indicates a U.S. citizen.
*****************************************************
*****************************************************

********************
Dependent variable
********************
TOTPRISN--The total number of months of imprisonment ordered. Sentences of less than 1 month are coded as zero.
In its analysis, the USSC chose to include alternative sentences (i.e., home confinement, community confinement, and intermittent confinement), but these alternatives are not considered to be prison terms in our analysis. This variable is top-coded at 470 months, meaning that any sentence in excess of 470 months (including life terms and death sentences) is recoded to 470 months.

***********************
Independent variables
***********************
XCRHISSR--The offender’s final criminal history category (I–VI), as determined by the court. The guidelines translate the total criminal history points into the six categories defined by the criminal history category.
XFOLSOR--The final offense level, as determined by the court. The combination of XCRHISSR and XFOLSOR indicates the guideline cell.
MONRACE--The race of the offender: white, black, and other race. Race is independent of Hispanic ethnicity. Note that while our findings are focused on non-Hispanic whites and non-Hispanic blacks or African Americans, the analysis uses data representing all races identified by the USSC. We removed any offender with an unreported race from the analysis.
HISPORIG--An indication of Hispanic ethnicity. We designate an offender as Hispanic if this variable is coded “2” (i.e., we consider an offender of Hispanic ethnicity only if there is an indication of Hispanic origin).
XMAXSOR--The maximum guidelines range for imprisonment, as determined by the court. When used in the analysis, XMAXSOR is recoded to a maximum of 470 (including life terms) and is rescaled by dividing by 470. The recoded variable runs from 0 to 1.
SENTYR--The year the sentence was imposed. When used in this analysis, SENTYR is recoded from 2005-2012 to 0-1 by subtracting 2005 and dividing by 7. We allow the time trend to break near the Gall v. United States decision. Since it was decided on December 10, 2007, we allowed the break for sentences filed beginning on January 1, 2008.
TOTCHPTS--The offender’s total criminal history points. To capture the variation of criminal history within each criminal history level that may explain differences in sentencing, we use the criminal history points interacted with the offender’s criminal history level (XCRHISSR). In particular, for every offender, we standardize his or her criminal history points within the offender’s criminal history level. For example, if an offender has a criminal history level of III, his or her criminal history points are standardized with all other offenders that have a criminal history level of III. Then, we interact the standardized history points with the criminal history categories and insert this measure into the statistical models (as six variables, one for each of the six criminal history categories).
NEWCNVTN--Indicates that the adjudication decision was reached by a trial. This variable enters the model as a covariate.
MONCIRC--Indicates the judicial circuit in which the offender was sentenced. We used fixed effects to indicate the judicial circuit, using the 9th judicial circuit as a reference category.
CIRCDIST--Indicates the judicial district in which the offender was sentenced.
JUDGE--A unique identifier assigned to a judge, used as a random effect in the statistical models.
***************************************************

For the analyses, we use USSC data for all offenders who were sentenced. To distinguish cases where government-initiated departures were not a factor from cases where government-initiated departures were a factor, we inspected the variables REAS1 through REAS10. If any of these variables reported “(5K 1.1) substantial assistance with government motion,” then we assigned the case to the government-initiated departure category. The BookerCD variable identified a few additional government-sponsored downward departures. We identified the guideline cells by interacting the variables XFOLSOR and XCRHISSR.
We computed the percentage of cases within each cell that resulted in prison and we discarded all cells where that rate was lower than 85%. Others (Ulmer, Light, & Kramer, 2011) have considered disparity in the imposition of prison terms, and we do not investigate that question. The dependent variable is a transformation of TOTPRISN. The transformation was described earlier using the formula:

s_{ijk} = \frac{S_{ijk} - M_k}{SC_k}

Substitute TOTPRISN for S_{ijk}.

**********************************
5.0 Analysis and interpretation
**********************************

Results are presented in two sections. The first section on sentencing disparity is divided into four subsections that pertain to the research questions raised in the introduction. Regression results are reported in appendices B and D; appendix C reports analyses for noncitizens. The second section pertains to prosecutorial discretion; this section tests whether the exercise of prosecutorial discretion has changed during the study period. Additional analyses regarding prosecutorial discretion appear in appendix D.

Statistical output is voluminous; therefore, to assist the reader, we have taken the following steps in this section. First, we interpret the statistical results in detail for the analysis of one partition of the data. Interested readers can apply that interpretation to the tables appearing in appendices B and C. Second, we provide summary statements that appear in bold. These summary statements take advantage of the standardization of the dependent variable, which allows parameters to be averaged across strata. When we average, we take a simple average across the strata. Third, we report sensitivity testing in exhibits 1 to 3.

**************************************************
5.1 Operational variables entering the analysis
**************************************************

Estimation uses a multilevel mixed-effects linear model fitted with the mixed command in version 13 of Stata.
(This is xtmixed in earlier versions of Stata.) This section explains the modeling, interpretation, and presentation of results. (See appendix B for detailed findings and appendix C for results of analysis of noncitizens.) As described previously, the dependent variable is the rescaled total months of prison imposed. The fixed effects are:

Race/Ethnicity: Black--This is a dummy variable that is coded one when the offender is black and coded zero otherwise. White offenders are the omitted category. Hispanic and other enter into the analysis, but we will not discuss these racial or ethnic categories separately because modeling for them is identical to modeling for blacks.
year_pregall--Calendar time is modeled with a linear spline. This variable models calendar time up to the Gall decision. Calendar time has been recoded to run from 0 at the beginning of the observation period to 1 at the end of the observation period.
year_postgall--This variable models calendar time after the Gall decision.
Black x Year Pre-2008--This is the year_pregall variable interacted with the Race/Ethnicity: Black variable.
Black x Year Post-2008--This is the year_postgall variable interacted with the Race/Ethnicity: Black variable. Note that the parameters associated with the last two variables allow us to estimate how the sentencing disparity for white and black offenders has changed over time. Interpreting these parameter estimates is the principal concern of this study.
Max Sentence in Guideline Cell--This is the maximum sentence specified for the guideline cell. It has been top coded at 470 months and rescaled to run from 0 to 1.
Black x Max Sentence--This is Max Sentence in Guideline Cell interacted with the Race/Ethnicity: Black variable. The introduction of these two variables relaxes the restrictive model specification by allowing the degree of racial disparity to vary across the guideline cells.
Possibly, disparity is pronounced within guideline cells that call for relatively short sentences but is insignificant for guideline cells that call for comparatively long sentences. The use of the above two variables expands the model’s flexibility, and the parameters associated with the last variable estimate how disparity varies across the guideline cells.
Newcnvtn--This is a dummy variable indicating that the offender was convicted by trial.
c_pts_n--There are six of these criminal history point variables distinguished by allowing n to run from 1 to 6. The c_pts_n variable requires comment: Guideline calculations assign criminal history points based on the offender’s criminal record and then collapse the criminal history points into six criminal history levels that form one dimension of the guideline grid. Because of the collapsing, actual criminal histories are heterogeneous within guideline cells, and the c_pts_n variables take that heterogeneity into account. For example, c_pts_1 is the standardized criminal history points for offenders whose criminal history is classified in the first criminal history category. If the typically subtle distinctions in criminal history category matter at the time of sentencing, the parameters associated with the c_pts_n variables should be positive. Except as control variables, these parameters are of no interest to this study.
circ_n--There are 11 dummy variables, distinguished by allowing n to run from 1 to 11, that denote the circuit. The ninth circuit is the omitted reference category.

Additionally, a sequence number distinguishing judge identity (JUDGE) enters the model as a random effect. For every judge, there is a judge random effect for white and black offenders.
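The time, maximum-sentence, and interaction variables described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the spline basis (time before the break, time after the break) is one common way to code a linear spline and is assumed here, and placing the break at the recoded value for 2008 is likewise an assumption.

```python
def recode_year(sentyr, first=2005, last=2012):
    """Map the sentencing year onto the 0-1 scale (subtract 2005, divide by 7)."""
    return (sentyr - first) / (last - first)


def spline_terms(t, knot):
    """Two-segment linear spline basis: time before the knot and time after it."""
    return min(t, knot), max(t - knot, 0.0)


def rescale_max(xmaxsor, cap=470):
    """Top-code the cell maximum at 470 months and rescale it to 0-1."""
    return min(xmaxsor, cap) / cap


def design_row(sentyr, black, xmaxsor, knot=recode_year(2008)):
    """Return the time, maximum-sentence, and interaction regressors for one case."""
    pre, post = spline_terms(recode_year(sentyr), knot)
    max_sent = rescale_max(xmaxsor)
    return {
        "black": black,
        "year_pregall": pre,
        "year_postgall": post,
        "black_x_year_pre": black * pre,
        "black_x_year_post": black * post,
        "max_sentence": max_sent,
        "black_x_max": black * max_sent,
    }


# Hypothetical case: black offender sentenced in 2012, cell maximum of 235 months.
row = design_row(sentyr=2012, black=1, xmaxsor=235)
```

Because the interaction terms are zero for white offenders, the coefficients on the uninteracted variables describe white offenders and the coefficients on the interactions describe the black-white difference.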
**********************************************
5.2 Findings regarding sentencing disparity
**********************************************

Table 1 reports selected regression results for the regression using the data partition males, no drug violations or weapons enhancements, and no government-sponsored departures for substantial assistance.***Footnote 31 The data include Hispanic and other offenders, but they are not of principal interest for this report; therefore, results pertaining to Hispanic and others are suppressed. Similarly, we suppress the random effects for these two racial and ethnic groups. We also suppress the fixed circuit effects. Complete results appear in appendix B***. ***Footnote 32 We only show the parameter estimates that are of interest for estimating racial disparity. The appendix shows complete regression results***. This partition includes 76,405 offenders who were sentenced by 1,292 judges. An average judge sentenced 60 offenders from this partition, but some sentenced only a single offender and at least one judge sentenced 460 offenders.

The first column in table 1 identifies selected variables that entered the regression. The table reports the parameter estimate in the second column, a heteroscedasticity-robust standard error in the third column, and a 95% confidence interval in the last two columns. When the confidence interval does not include 0, the parameter is deemed statistically significant at p < 0.05. The shading denotes parameters that are statistically significant at p < 0.01.

The first two parameter estimates are associated with the variables “year_pregall” and “year_postgall.” Collectively, these provide a spline telling us how sentences have changed for white offenders between 2005 and 2012.
These two parameters tell us roughly that the sentences imposed on white offenders have changed by -(3/8) x 0.246 - (5/8) x 0.207 = -0.222 standardized units between 2005 and 2012, a decrease. This change is statistically significant at p < 0.01. For white males convicted of crimes other than drug violations and weapons offenses, and who did not provide substantial assistance to the government, sentences became more lenient between 2005 and 2012.

Moreover, based on the results reported in appendix B, examining the estimated changes over the 8-year period, it appears that white males and females received sentences that decreased in severity. When we include sex offenders in the analysis, there is an estimated reduction in 13 of 14 strata. The change is statistically significant at p < 0.01 in six tests and at p < 0.05 in another three tests. Given that the dependent variable is measured in standardized units, the changes appear large: The simple average change over the 14 strata is -0.22 standardized units, which is significant at p < 0.001. Apparently, judges have exercised discretion during the post-Booker era to reduce the average length of time that offenders serve in prison. Although overall trends in sentences are not the main concern of this report, we note that the study is being conducted within an overall context of increasing leniency in federal sentencing. Sensitivity testing reported in exhibit 1 indicates that findings are insensitive to model specification.

Our interest is principally focused on the parameters associated with Black x Year Pre-2008 and Black x Year Post-2008 because these parameters estimate how disparity (the difference between the average sentence for blacks and whites) has changed over time. From Booker to Gall, the disparity increased (p < 0.01), but it does not appear to have changed further since Gall. To consider the entire period post-Booker, see the parameter associated with Trend from 2005 to 2012.
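The weighted sum quoted earlier for the change in sentences for white offenders can be checked with a short calculation. The magnitudes 0.246 and 0.207 and the weights 3/8 and 5/8 come from the expression in the text; treating the weights as the shares of the eight sentencing years falling before and after the spline break is an interpretation, not something the text states.

```python
# Coefficient magnitudes as quoted in the text; the negative signs follow
# from the way the expression in the text is written.
b_pre, b_post = -0.246, -0.207

# Assumed reading of the weights: 3 of the 8 sentencing years (2005-2007)
# fall before the break and 5 of the 8 (2008-2012) fall after it.
w_pre, w_post = 3 / 8, 5 / 8

total_change = w_pre * b_pre + w_post * b_post
print(round(total_change, 3))  # -0.222
```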
Over the entire post-Booker period, the sentences imposed on black offenders have increased by 0.146 standardized units relative to those imposed on white offenders, an amount that is statistically significant at p < 0.01. Interestingly, we previously reported that the sentences imposed on white offenders decreased by 0.222 standardized units, and now we report that racial disparity in sentencing has increased by 0.146 standardized units, implying that blacks did not receive harsher sentences between 2005 and 2012; rather, blacks have not benefited as much from the increased leniency afforded to whites, and this has widened disparity.

***************************************************************

************************************
Exhibit 1. There has been a trend toward more lenient sentences
************************************
In the preferred model, sentence severity for white offenders decreased by an average of 0.222 standardized units between 2005 and 2012 (p < 0.01). When the model substitutes district fixed effects for circuit fixed effects, the estimated decrease in overall sentence severity is 0.194 standardized units (p < 0.01). When fixed effects are omitted from the model, the decrease is 0.226 (p < 0.01). Returning to the preferred model and including age and education as covariates, we find that sentence severity fell by 0.215 standardized units (p < 0.01). The decrease in sentence severity for white offenders appears robust to model specification.
***************************************************************

Excluding sex offenses (because there are few black sex offenders), in all eight strata where we contrasted the sentences for black males and white males, sentences for black males grew increasingly longer relative to the sentences for white males. The contrast was statistically significant at p < 0.01 in four of the eight contrasts, and it was significant at p < 0.05 in a fifth.
The simple average across the eight contrasts showed that at the end of 2012, blacks received sentences that were 0.173 standardized units higher than their white counterparts, a difference that was statistically significant at p < 0.01. Evidence reported in exhibit 2 indicates that these findings are robust to model specification. We do not find that black females are disadvantaged compared with white females.

***************************************************************

**************************************
Exhibit 2. Sentencing disparity has increased for black males
**************************************
Using the preferred model with circuit fixed effects, sentencing disparity has increased by an average of 0.173 standardized units (p < 0.01). Exchanging districts for circuits, we estimate that sentencing disparity has increased by an average of 0.161 standardized units (p < 0.01). Dropping both circuits and districts from the model specification, sentencing disparity increased by an estimated 0.190 standardized units (p < 0.01). Using the preferred model, but including age and education in the model specification, the increase is 0.168 standardized units (p < 0.01). Findings regarding trends in disparity are robust to model specification.
*****************************************************************

According to the Black x Max Sentence parameter, the disadvantage suffered by blacks decreases as the maximum sentence for a guideline cell increases. This does not mean that blacks are disadvantaged for minor crimes and advantaged for serious crimes; rather, the statistics imply that the disparity between blacks and whites narrows as crimes become more serious.

Table 1 also reports random effects holding the judicial circuit constant.***Footnote 33 Differences across circuits are large.
Averaging across the 14 partitions (including sex offenses), the second circuit sentence is an average of 0.46 standard deviation units below the national average and the fifth circuit sentences are an average of 0.31 standard deviation units above the national average. That disparity is much larger than the racial disparity reported in this study***. Statistical modeling assumes that judges differ regarding the sentences imposed on white offenders, the sizes of those differences are randomly distributed across judges, and that distribution is normal. The variance for that distribution is 0.075, which appears to be large in terms of standardized sentences. The confidence interval is tight about this estimate. Likewise, judges differ regarding sentences imposed on blacks. The variance for that distribution is 0.077, and again, that variance seems large in terms of standardized sentences. Observe that the correlation between the random effect across judges for whites and the random effect across judges for blacks is very high: 0.808. Looking across the 14 strata and converting from variances to standard deviations, on average the standard deviation for the random judge effect is 0.36 for whites and 0.40 for blacks (see exhibit 3). This is evidence of high disparity in general: Judges who impose above average prison terms on black offenders tend to impose above average prison terms on white offenders, and judges who impose below average prison terms on white offenders tend to impose below average prison terms on black offenders. In this regard, sentences are disparate in the sense that similarly situated offenders who have committed similar crimes receive sentences that differ depending on the judge who imposes the sentence.***Footnote 34 Generalizing this statement is difficult because we could only estimate covariances for four of the regressions. This is a practical limitation of working with random effect models. 
The correlation was close to 0.85 in three regressions and closer to 0.6 in one regression. In standard deviation units, the random effects are 15% to 20% larger for females than for males, implying that judges disagree more about the sentences to impose on females than the sentences to be imposed on males. *************************************************************** ******************************************** Exhibit 3. Judges disagree about sentences ******************************************** According to the preferred model, when sentencing whites, the distribution of judge random effects has a standard deviation of 0.36 on average across 12 data partitions. When sentencing blacks, the standard deviation is 0.40. (There is no change when age and education are added to the model.) As expected, when district is substituted for circuit, the standard deviations for the random effects fall to 0.23 for whites and 0.30 for blacks because the districts account for more residual variance than do the circuits. Likewise, when neither district nor circuit enters the model, the standard deviations for judge random effects increase to 0.44 for whites and 0.50 for blacks. These patterns are expected because circuit fixed effects account for some of the residual variance otherwise attributed to judges and district fixed effects account for even more of the residual. *************************************************************** Although the analysis reported in table 1 pertains to a regression using the standardized scores for sentences, estimates based on these results are readily translated back into the original metric of months sentenced by reversing the transformation represented by equation [7]. We have applied this reverse transformation when producing the figures reported below. 
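The reverse transformation can be illustrated with a minimal sketch. It assumes that a standardized sentence is converted to months by multiplying by the standard deviation of sentences within the guideline cell and then adding the cell mean. The function names and the cell values (a mean of 40 months and a standard deviation of 12 months) are hypothetical and chosen only for illustration; they are not estimates from table 1.

```python
def to_months(z, cell_mean, cell_sd):
    """Map a standardized sentence back to months for one guideline cell."""
    return z * cell_sd + cell_mean


def gap_in_months(z_gap, cell_sd):
    """Convert a black-white gap measured in standardized units to months.

    Both sentences are in the same cell, so the cell mean cancels and
    only the within-cell standard deviation matters.
    """
    return z_gap * cell_sd


# Hypothetical guideline cell (assumed values, not from the report).
CELL_MEAN = 40.0  # months
CELL_SD = 12.0    # months

# 0.173 is the standardized gap quoted in the text for the end of 2012.
gap = gap_in_months(0.173, CELL_SD)
print(round(gap, 2))
```

Because the conversion depends on the within-cell standard deviation, the same standardized gap corresponds to more months in cells where sentences are more dispersed, which is why the translation has to be reported for specific guideline cells.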
*******************************************
5.2.1 Converting findings on disparity from standardized units to original units
*******************************************
Statistical testing shows that the table 1 parameters associated with time interacted with blacks (Black x Year Pre-2008 and Black x Year Post-2008) imply positive, statistically significant trends. Apparently, blacks have been increasingly disadvantaged over time. The parameters are difficult to interpret, but they can be translated into natural units (months) and their implications can be graphed. Graphing also allows us to extend findings to other partitions of the data. Figure 2 represents trends in racial disparity for the first four partitions of data: Males convicted of crimes that do not involve weapons violations, both with and without substantial assistance to the government. The horizontal axis in each panel is the year that the sentence was imposed. When we translate backwards from standardized units to original units, we have to make the translation for specific guideline cells. (The translation first multiplies by the standard deviation of sentences within a cell and then adds the mean for that cell.) The translation is reported for maximum guideline sentences at the break for the first quartile of cells used in the analysis, for the mean, and at the break for the third quartile. For example, in the first panel (males, no drugs or weapons, and no substantial assistance), a quarter of offenders were sentenced under guidelines calling for maximum sentences of 33 months or less, half were sentenced under guidelines calling for maximum sentences of 46 months or less, and a quarter were sentenced under guidelines calling for maximum sentences of 87 months or more. The quartile breaks differ across the data partitions. The first two panels appear to show a sharp break at the Gall decision. (We reference panels from left to right and then from top to bottom.)
We are not inclined to take the break seriously, as it could result from our decision to place the knot for the spline at the time of Gall. A different placement might show a different pattern, but our principal concern is with the level of disparity at the end of 2012, and the estimates at the end of the period should be relatively insensitive to the placement of the knot. All four panels agree that disparity increased from 2005 to 2012, although the change is not statistically significant in the fourth panel. If we consider all panels jointly, giving each an equal weight in the calculations, we find they are jointly significant at p < 0.01. The next set of four partitions (figure 3) pertains to males convicted of crimes that involve weapons enhancements. Some of these crimes are drug-related and others are not. Again, we distinguish between cases where the offender was rewarded for substantial assistance to the government and cases where there was no reward for substantial assistance. The figure shows patterns of increasing racial disparity for offenses that involve weapons enhancement. The increases are not statistically significant when offenders receive reductions for substantial assistance to the government, but the increases are always positive. If we consider the four trends jointly, they are statistically significant at p < 0.05. Results for females are different. There are too few cases involving females with weapons violations to allow testing. Considering the four offenses without weapons enhancements, one partition shows a statistically significant (p < 0.05) trend while the other three do not, and the joint effect is not statistically significant. We do not graph the results. Increasing racial disparity in sentencing appears to be limited to males. 
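The linear spline with a knot at Gall, discussed above, can be sketched as two time variables whose coefficients are the pre-Gall and post-Gall slopes. This is a minimal illustration under stated assumptions: time is measured in years from the start of 2005, the knot sits at the start of 2008 (matching the Pre-2008 and Post-2008 labels), and the slopes used in the example are hypothetical rather than estimates from table 1.

```python
def spline_terms(year, start=2005.0, knot=2008.0):
    """Return (pre-knot years, post-knot years) for a linear spline.

    The first term grows until the knot and then stays flat; the second
    term is zero until the knot and grows afterward.
    """
    pre = min(year, knot) - start
    post = max(year - knot, 0.0)
    return pre, post


def trend(year, slope_pre, slope_post, start=2005.0, knot=2008.0):
    """Cumulative change implied by the two spline slopes."""
    pre, post = spline_terms(year, start, knot)
    return slope_pre * pre + slope_post * post


# Hypothetical slopes in standardized units per year (assumed values).
SLOPE_PRE = 0.04
SLOPE_POST = 0.01

change = trend(2012.0, SLOPE_PRE, SLOPE_POST)
print(round(change, 2))
```

The cumulative change at the end of the period is a weighted sum of the two slopes, with weights equal to the years elapsed on each side of the knot. Moving the knot changes where the slope shifts, but the fitted level near the end of the period depends mostly on the later data.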
Although other factors may have been changing during this same 8-year period, a subsection on “evidence of prosecutorial discretion” (see section 5.3) does not find trends in prosecutorial behavior, causing us to discount prosecutorial behavior as the cause of increasing disparity. Other explanations are plausible, but it seems reasonable to conclude these trends could be attributed to judicially induced disparity in the treatment of black offenders, compared to white offenders. Some reviewers of an earlier draft were concerned that findings might be sensitive to the choice of a linear spline and suggested replacing the linear spline with an alternative approach. The data definitely suggest that the trend is nonlinear, so using a simple linear trend would be a misspecification. (One of the reviewers performed a reanalysis using the methodology of Ulmer, Light, & Kramer (2011), reporting that the trend was positive post-Booker and then appears to have reached a plateau, consistent with the spline.) We tried using a polynomial but--as often happens with polynomials--the end points seem uninformative and perhaps misleading about the trend as it nears 2012. A nonparametric alternative is to substitute dummy variables for every post-Booker year. Figures 4 and 5 are the exact counterparts to figures 2 and 3, with dummy year variables substituted for the spline. Readers can bring their own interpretations to figures 4 and 5. There is considerable random fluctuation from year to year, which is why we employed a smoothing technique--linear splines--to provide a summary. (We discourage readers from making much of year-to-year changes.) Some of the figures suggest why using polynomials can be misleading: There are apparently random spikes in the latter years. Our interpretation is that the general trend toward increasing racial disparity seems to reach a rough plateau sometime after the Gall decision.
The linear splines appear to be reasonably consistent with this interpretation. *********************************************** 5.2.2 Racial disparity across guideline cells *********************************************** Allowing the sentences to vary across the guideline cells is of some importance for model specification (significant in 5 of 14 partitions), but this finding is not of substantive interest. ***Footnote 35 By construction, the mean sentence is zero across the guideline cells. Given the regression specification, the statement pertains to the variation in average sentences for whites across the guideline cells***. Racial disparity is statistically significant at p < 0.01 in 3 of 12 comparisons. We do not consider this to be an important finding. The signs of coefficients associated with the race effect vary, and the coefficients are statistically significant in only 3 of 12 comparisons. The results are sensitive to model specification. Evidence is that standardizing the sentences has been useful for model specification. ************************************** 5.2.3 Racial disparity across judges ************************************** Table 1 also reports random-effects parameters. We have already discussed those findings by presenting variance estimates in standardized units, and this subsection provides a visual impression after converting back to original units. Random effects may be difficult to interpret, so some intuition may be useful. The fixed effects (i.e., all of the parameters except the random effects) provide an estimate of the average sentence imposed on white and black offenders, conditional on the facts surrounding the case. Given that average, we can estimate the amount by which an individual judge differs from the average when sentencing black offenders and when sentencing white offenders. These estimated differences are the random effects. 
Subtracting the random effect for a judge when sentencing a white offender from the random effect for a judge when sentencing a black offender translates the random effects into a difference score--how much a judge sentences blacks more severely than whites. The statistical model assumes that the random effects are distributed as bivariate normal. This is probably not exactly true, but we assume it is sufficiently approximate that we can graph the implied difference scores. By construction, the difference scores are themselves distributed as normal, explaining why the graph in figure 6 has the familiar shape of a normal distribution. This discussion is focused on three parameters: the variance in judge effects for whites ($\sigma_w^2$), the variance in judge effects for blacks ($\sigma_b^2$), and the correlation in the judge effects for whites and blacks ($\rho_{wb}$). The variance for the distribution of differences in effects for whites and blacks is $\sigma_b^2 + \sigma_w^2 - 2\sigma_{wb}$, where $\sigma_{wb} = \rho_{wb}\sigma_w\sigma_b$ is the covariance of the two effects. Because the analysis was done using standardized sentences, the disagreement across judges in sentences for blacks is large, ***Footnote 36 The analysis reported in table 1 was based on transformed sentences, where the distribution of sentences within a guideline cell had a mean of zero and a standard deviation of one. The judge random effects are scaled according to these unitary standardized deviations, suggesting that the random effects are large***. but then so too is the disagreement across judges in sentences for whites. Furthermore, when we were able to compute the correlation in the effects for blacks and whites, that correlation is high. The story is that some judges typically apply relatively harsh sentences to both blacks and whites, while other judges typically apply relatively lenient sentences to both blacks and whites. This is disparity but it is not necessarily racial disparity. When we performed the analysis with all of the data (as reported previously), we discovered that the covariance estimate was unstable.
We could improve the estimates by limiting the data to judges who sentenced at least five offenders. The results we present below pertain to the random effects after imposing that data limitation. Even with this data restriction, we could not always estimate a covariance reliably. We only present figures when a covariance estimate was available. Figure 6 represents the variance for males convicted of non-weapons violations. Figure 7 is similar but pertains to females. Figures 6 and 7 aid in interpreting regression results. For reasons already discussed, we draw the distribution of the difference scores for three levels of guideline cells. These levels are at the 25th, median, and 75th percentiles of the maximum sentence lengths of the guideline cells. For each judge, we compute the difference in predicted sentences for blacks and whites for the three sentence lengths. The two figures depict the extent of disagreement across judges regarding the sentencing of white and black offenders. The horizontal axis reports the difference in months of the predicted difference in sentence lengths for black and white offenders for each judge. The vertical axis reports the density of the distribution. The estimates are highly inferential and intended to approximate and illustrate the extent to which federal judges disagree about sentences by race. By construction, the judge random effects are centered on zero. The differences in random effects will also be centered on zero. However, for purposes of data visualization, we have centered them on the mean differences in sentence for black and white offenders. Thus, for males, we observe that the centers of the distributions are all above zero. Blacks receive longer sentences than whites for the average judge; for females, the distributions are centered near zero because female white and black offenders receive similar terms. 
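The spread of the difference scores can be worked through with a short sketch. It applies the standard identity for the variance of a difference of two correlated normal variables (the sum of the two variances minus twice their covariance) to the variance and correlation estimates quoted earlier for one regression (0.075, 0.077, and 0.808). The conversion to months uses a hypothetical within-cell standard deviation and a hypothetical mean gap; those two values and the function names are assumptions for illustration only.

```python
import math


def diff_sd(var_w, var_b, corr_wb):
    """SD of (black effect - white effect) under a bivariate normal."""
    cov_wb = corr_wb * math.sqrt(var_w * var_b)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(var_b + var_w - 2.0 * cov_wb, 0.0))


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, as drawn in a difference-score plot."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Estimates quoted in the text for one regression (standardized units).
sd_std = diff_sd(var_w=0.075, var_b=0.077, corr_wb=0.808)

# Hypothetical values for converting to months and centering the curve.
cell_sd_months = 12.0   # assumed within-cell SD of sentences
mean_gap_months = 2.0   # assumed average black-white gap
sd_months = sd_std * cell_sd_months

print(round(sd_std, 2))
```

With these inputs the standard deviation of the difference is roughly 0.17 standardized units. The high correlation is what keeps it small: the individual judge effects have standard deviations near 0.27, but most of that variation is shared between black and white offenders and cancels in the difference.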
The distributions are approximations, but if we took them literally, we would conclude that judges tend to sentence blacks more severely than they sentence whites. For some judges, the sentencing disparity seems especially large, while for others it seems substantively insignificant. For a few judges, blacks appeared to be advantaged compared to whites, and while this may be true, the advantage typically appears to be small, and it may occur because we have used the bivariate normal as a useful approximation of reality. If judges sentenced blacks and whites to equivalent terms, conditional on the facts surrounding the case, the distributions of the difference in sentences for blacks and whites would collapse to zero. This does not happen. (Table 1 provides a test of the null hypothesis that these distributions have a variance of zero, meaning that they collapse to zero. The test rejects the null, although we recognize that the estimated variance has a large confidence interval, so there is some uncertainty about the spread of these distributions.) It seems likely that black and white offenders differ systematically in ways that cause judges to sentence them differently and, if we could observe those differences, the sentencing differences might be appropriate under the rule of law. However, that does not explain why some judges sentence blacks especially severely compared to whites. Unobserved, systematic differences between whites and blacks cannot account for the fact that the average difference in sentences for black and white offenders varies across judges.
**************************************************************
5.2.4 Increases in disparity: Variance about the guidelines
**************************************************************
We re-estimated the regression discussed previously without the judge random effects. From the regression, we estimated the squared residual, equal to $e_{ijk}^2$ in the notation used earlier.
***Footnote 37 Our intent is to examine changes in the variance of the regression***. Then we regressed the squared residual on the linear splines representing post-Booker trends and onto the maximum sentence specified by the guidelines. Table 2 shows the results of repeating this exercise for the 12 partitions of the data. As noted previously, we can estimate how dispersion has increased about the average as of 2012 by weighting and summing year_pregall and year_postgall parameters. Standard error calculations are provided in the table. Except for the weapons violations, the trends are toward higher dispersion and all trends are statistically significant at p < 0.05 or better. Post-Booker, we previously saw that sentencing has become more lenient. We now see that it has become more disparate. Excluding weapons violations, similarly situated offenders convicted of similar crimes are increasingly sentenced differently. As a robustness check, we find that the qualitative patterns (and tests of statistical significance) do not change when district fixed effects are included in the model. ***************************************** 5.3 Evidence of prosecutorial discretion ***************************************** Because prosecutors exercise wide discretion to charge and bargain with offenders, prosecutorial discretion may be exercised to disadvantage blacks. That possibility is difficult to discount using FJSP data because FJSP data do not provide a rich description of offenses and offenders at the time prosecution is initiated. The full story may not emerge before a federal probation officer writes a presentence investigation report based on a narrative of the crime provided by a law enforcement source, a criminal record check, and interviews with the offender and his or her associates. Furthermore, the current version of FJSP data provides limited means to link data from the Executive Office of U.S. 
Attorneys with sentenced offenders, so any study using Executive Office data necessarily works with a selected dataset. Even if U.S. Attorneys treat blacks differently than whites, it is nevertheless difficult to discount the findings reported in the previous section. If federal prosecutors discriminate against blacks, and if judges could somehow recognize that differential treatment,***Footnote 38 The federal probation officer prepares a presentence investigation report for the judge. The report includes a description of the crime according to case files, the offender’s criminal history according to a records check, and interviews with the offender and his or her associates. In theory, then, the judge could form an opinion about the case that is independent of the rendition communicated by the prosecution and defense. However, others have reported that judges are typically deferential (Commission, 2011; Commission, 2012)***. we might expect federal judges to partially rectify the injustice by being more lenient with black offenders than with white offenders, conditional on the guideline cell. That relative leniency is the opposite of what we find. Rather, if federal prosecutors discriminate against blacks, federal judges appear to reinforce discrimination by disparate sentences. It is possible that federal prosecutors discriminate in favor of black offenders and we are simply estimating corrective action by federal judges (although we cannot prove otherwise, this explanation seems unlikely and we do not pursue it). Figure 2 shows trends toward increasing disparity. If prosecutorial behavior explained those trends, we would expect to see coincident trends in prosecutorial behavior. Evidence reported in the following subsections suggests otherwise. ********************************** 5.3.1 Facts surrounding the case ********************************** Table 3 shows summary statistics reflecting changes in prosecutorial behavior. 
It is possible that prosecutors have increasingly manipulated offenders’ criminal history scores. If so, we should be able to observe changes over time, and we are especially interested in determining whether blacks are increasingly advantaged or disadvantaged, as this may explain what we are observing about trends in sentencing disparity. Table 3 shows seven indicators of prosecutorial behavior. We are interested in whether those indicators change materially over time. Evidence shows that blacks have higher criminal history scores than do whites, but we do not observe a strong difference in the trends for whites and blacks (table 3). Blacks tend to have higher offense level scores. We see some narrowing in the difference in the scores for whites and blacks, but overall we do not see large trends in the offense severity score and certainly no trends that disadvantage blacks. The original commission considered acceptance-of-responsibility adjustments as a substitute for plea bargaining. As table 3 shows, blacks receive slightly smaller acceptance-of-responsibility adjustments. (The numbers reflect the average number of reductions in the offense level, so the more negative the reduction, the greater the benefit to the offender.) There has been a modest increase in the size of acceptance-of-responsibility adjustments, but the increase has not been large and both whites and blacks have benefited to the same degree. Prosecutors exercise choice over petitioning the court for substantial assistance departures. As shown in table 3, blacks and whites receive substantial assistance departures at about the same rate. Over time, the rate has been constant for blacks and has decreased for whites. However, these changes have not been large. Prosecutors also exercise choice over petitioning the court for other downward departures.
We see modest trends in other government-sponsored departures, and blacks and whites receive other government-sponsored departures at about the same rates. Data for government-sponsored departures below the guideline range are only available for cases sentenced since the Booker v. United States decision. Therefore, no data appear for 2003 and 2004. Continuing this logic, table 3 also shows the proportion of cases involving the imposition of a mandatory minimum, regardless of the offense associated with the minimum (i.e., for a drug offense or a weapons offense).***Footnote 39 We used the USSC variables MAND1 to MAND6 to determine whether any mandatory minimum sentence was imposed in the case. These variables are only available since 2005***. It shows that blacks are more likely to receive mandatory minimums, and it apparently shows random fluctuations. Overall, the proportion of cases where a mandatory minimum was imposed has not changed significantly over time for either blacks or whites. Few offenders demand a trial, but blacks may be more likely to demand trials if they are disadvantaged by plea agreements. Blacks are convicted at trial rather than by plea more frequently than are whites, but the differences are not large (because trials are infrequent) and there is no evidence that the decreasing frequency of trials is disadvantaging black offenders. The evidence does not show that prosecutorial behavior changed from 2003 through 2012. The relative constancy of prosecutorial practices cannot explain the trends reported in figure 2. ********************************** 5.3.2 Gaming drug amounts near mandatory minimums ********************************** The evidence presented in the previous subsection pertained to trends in indicators of prosecutorial discretion. In this subsection, we examine charging decisions with respect to drug amounts. This is not based on trend analysis, but it is nevertheless informative about prosecutorial decision-making. 
Drug offenses provide one venue for finding evidence of whether prosecutors have exercised discretion to the disadvantage of blacks. Mandatory minimums for drug violators are triggered by the amount of drugs that were trafficked. For example, when an offender is convicted of trafficking 500 or more grams of cocaine, he or she is subject to a mandatory minimum sentence, absent some mitigating considerations. If there were evidence of discretion favoring one group over others around a statutory mandatory minimum, we would expect to observe larger percentages of favored groups having drug amounts just shy of the minimum, compared to the other groups. Our analysis suggests that this is not happening. We look for evidence of prosecutorial manipulation in recorded drug weights for drug trafficking cases. Specifically, we look to see whether blacks are more likely than whites to be above a mandatory minimum threshold for drug cases, with amounts near a threshold that triggers the application of mandatory minimum sentencing laws. If blacks are systematically disadvantaged, then blacks should be more likely than whites to be above a mandatory minimum threshold. We limit this investigation to the six major drugs that make up the overwhelming majority of sentenced cases: cocaine, crack, heroin, marijuana, mixture methamphetamine, and pure methamphetamine. Figure 8 provides an example based on powder cocaine. We have included Hispanics in the figure, so the categories are non-Hispanic whites (white), non-Hispanic blacks or African Americans (black), and Hispanics. This figure shows three distributions of drug amounts for offenders in a broadly defined range (+/- 100 grams) around the lower mandatory minimum threshold amount for cocaine (500 grams). For this figure, offenders have been grouped into discrete bins spanning approximately 10 grams (i.e., 480 to 489.99, 490 to 499.99, 500 to 509.99).
The horizontal axis shows the grams of cocaine associated with the offense, and the vertical axis shows the percentages of offenders within each grouping that fall into each bin. The bins themselves have been mapped to a smoothed line to aid in visualization. Figure 8 – Distribution of offenders within 100 grams of the 500-gram mandatory minimum threshold, by race and ethnicity Five hundred grams of cocaine is one-half kilogram, and perhaps this is a standard unit of transaction, explaining the concentration around 500 grams. A more likely explanation is that prosecutors are most interested in establishing that offenders have transacted at least 500 grams, which triggers a mandatory minimum, and that prosecutors have little incentive to demonstrate that offenders have transacted somewhat more than this amount until the transaction reaches about 5 kilograms, which triggers the next application of a mandatory minimum. This figure depicts a divergence between the three racial or ethnic groups on the interval between 480 and 500 grams and suggests that, relative to whites, blacks and Hispanics are more likely to fall just below the 500-gram cutoff. The differences are not large, however, implying that during the post-Booker period, prosecutors have not discriminated against blacks when establishing that a case meets the mandatory minimum threshold. To formally test for differences by race, we estimate a linear regression that models the probability of falling above a threshold amount, controlling for offender criminal history, education, sex, substantial assistance to the prosecution, sentencing year, whether the case went to trial, and circuit fixed effects. We estimate separate models for each drug and each mandatory minimum threshold amount. In addition, the sample is restricted to cases where the safety valve provision was not applied. Based on drug statute 21 U.S.C. § 841, we identify two thresholds for each drug--low and high--defined in table D1 in appendix D. 
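The linear probability model described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' estimation code: the data are made up, the covariate list is reduced to race and a trial indicator (the report also controls for criminal history, education, sex, substantial assistance, sentencing year, and circuit), and the helper names `ols` and `make_cell` are hypothetical. The outcome is 1 when the recorded drug amount is at or above the threshold, so the race coefficient is a difference in probabilities.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def make_cell(black, trial, n_above, n_total=10):
    """Toy records (above_threshold, black, trial) for one covariate cell."""
    return [(1.0 if i < n_above else 0.0, black, trial)
            for i in range(n_total)]


# Made-up data, not FJSP records: four balanced cells of ten cases each.
records = (make_cell(0.0, 0.0, 5) + make_cell(1.0, 0.0, 3)
           + make_cell(0.0, 1.0, 8) + make_cell(1.0, 1.0, 6))

X = [[1.0, black, trial] for _, black, trial in records]  # intercept, race, trial
y = [above for above, _, _ in records]

intercept, black_gap, trial_gap = ols(X, y)
print(round(black_gap, 3))  # difference in probability, black vs. white
```

The toy cells are constructed so that the share above the threshold is additive in race and trial status, which makes the fitted race coefficient equal to the built-in gap of -0.20. With real data the coefficient would be read the same way, as the estimated difference in the probability of falling above the threshold, holding the other covariates fixed.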
Overall, we do not find strong evidence to support the argument that blacks face systematic prosecutorial discrimination. Rather, racial inequities around minimum thresholds appear more idiosyncratic or drug-specific. Table 4 shows the result of our estimation. This table shows the estimated difference in the probability of being just above a threshold amount by race, drug, and threshold. For cocaine, it shows that blacks have a probability that is 0.10 lower than whites to be above the 500-gram threshold, but not statistically more or less likely to be above the 5,000-gram threshold (i.e., the next step in the mandatory minimum gradient). Hispanics are also less likely than whites to be above either the 500- or 5,000-gram threshold. For crack, heroin, and methamphetamine, no strong differences emerge. For marijuana, blacks are more likely to be above the lower (1,000 kilogram) threshold, with no detectable differences for Hispanics. Based on these results, there is no obvious pattern of preference that favors or disadvantages blacks. In addition to the results presented here, there are other aspects of the data that warrant further discussion and from which we conduct additional sensitivity analysis. The first is that drug amounts are not always recorded as exact weights or amounts in the sentencing data. Instead, amounts are often recorded as ranges. We find that this occurs roughly 25% of the time for drug trafficking cases overall, although this percentage varies depending on the drug type. Table D2 in appendix D shows the percentage of cases in which no exact drug amount is reported, stratified by drug type and race category. Given the sizeable number of such cases, we would not want to exclude them from analysis. However, because the reported ranges themselves are relatively wide, we cannot identify offenders as being close to the thresholds. Our solution is to analyze these cases as a separate group. 
For this group, we find that reported ranges generally (1) do not overlap thresholds and (2) often use threshold amounts as range boundaries. Given this, we select and analyze offenders with recorded ranges bounded by an amount that is also a threshold cutoff (e.g., as in a range of cocaine amount [from > 0 to 500 grams or from 500 to 1,500 grams]) and test whether black offenders are more likely to be strictly at or above the threshold, relative to whites. The specification for this test is identical to the estimation performed using the exact weights discussed earlier (i.e., estimating a linear probability model using covariates, such as criminal history, sex, race and circuit, and separate models for each drug). Table D3 in appendix D reports the results of these estimations. This table shows estimates that are largely consistent with our earlier findings. And while there are some differences from our earlier results, some of these differences may simply be due to chance. We find no discernible evidence of systematic bias in prosecutorial practice.
********************
6.0 Conclusions
********************
At least since Marvin Frankel’s 1973 book, Criminal Sentences: Law Without Order, was published, sentencing disparity has been a concern of federal justice administration. That concern led Congress to pass the Comprehensive Crime Control Act in 1984, which created the USSC. The duly appointed commission crafted the first Federal Sentencing Guidelines in 1987. For decades, scholars have debated whether the guidelines have reduced disparity; with the Booker decision, which rendered the guidelines advisory, scholars have argued whether disparity has subsequently increased or decreased, and they have debated whether a return to some form of mandatory guidelines would benefit or harm justice administration.
Our study does not attempt to answer the question of whether the guidelines increased or decreased disparity, whether the Booker decision increased or decreased disparity, and whether a new mandatory guideline system that passed Supreme Court scrutiny would improve justice administration. Commissioned by BJS, our study has proposed a way of studying sentencing disparity that helps answer questions about the level of disparity and post-Booker trends. The methodology could be extended to study the causal effects of Booker, although the grounds for making causal statements in a non-experimental setting are treacherous. Like earlier studies, our study treats the guideline cell as the anchor point for any further analysis of sentencing patterns. Using data transformations that standardize sentences within each guideline cell, we have introduced a regression-based methodology that allows us to make summary statements about how racial disparity varies across guideline cells and over time. By using a linear random effects regression model, we are able to make summary statements about how racial disparity varies across judges. We do not claim this is the only valid methodology for studying disparity, and for some research questions it may not even be the best; however, for the questions posed by BJS, this methodology has strong appeal. The methodology is solidly within the tradition of studying disparity, given the facts known at the time of sentencing, but some researchers claim that locating a study of disparity this late in the judicial process ignores disparity in prosecutorial decision-making. While we do not necessarily find this counterargument compelling, we have dealt with the critique indirectly by showing that prosecutorial discretion does not appear to have changed much since 2005 (the beginning of our study), although we find trends toward increased racial disparity between 2005 and 2012.
These trends are likely attributable to judicial behavior, not prosecutorial behavior. This conclusion is strengthened by evidence from the estimated random effects of considerable inter-judge differences in the sentences for white and black offenders. What we find is that black males receive harsher sentences than white males after accounting for the facts surrounding the case, and we also find that the sentencing disparity has grown over the 8 years since Booker. We find that females receive sentences that are less harsh than their male counterparts, but curiously we find that black and white females receive similar sentences. Something other than skin color and racial prejudice per se is driving these results. We find it difficult to attribute racial disparity to skin color alone. While it is an obvious distinction, in the United States race is bundled with a large number of unobserved characteristics. We have observed that blacks are more concentrated within circuits that impose harsh sentences compared with more lenient circuits. It is possible that blacks receive the same sentences as whites within every circuit, but that blacks receive harsher sentences than whites nationally. After we account for these circuit differences, racial disparity remains, but the point is that race is correlated with other characteristics that may account for different sentences among whites and blacks. For example, we know that blacks sentenced in the federal justice system are, on average, less educated than are whites sentenced in the federal justice system. Therefore, if judges take education into account (along with correlates such as earnings and demeanor), then racial disparity could be explained by factors that might be deemed to be reasonable desiderata when imposing sentences. A study of disparity is not a study of bias. 
Our study cannot get at the ultimate reasons why black males receive harsher sentences than do white males, after accounting for the facts surrounding the case. We are concerned that racial disparity has increased over time since Booker. Perhaps judges, who feel increasingly emancipated from their guidelines restrictions, are improving justice administration by incorporating relevant but previously ignored factors into their sentencing calculus, even if this improvement disadvantages black males as a class. But in a society that sees intentional and unintentional racial bias in many areas of social and economic activity, these trends are a warning sign. It is further distressing that judges disagree about the relative sentences for white and black males because those disagreements cannot be so easily explained by sentencing-relevant factors that vary systematically between black and white males. (The judge-specific effects take random variation into account.) We take the random effect as strong evidence of disparity in the imposition of sentences for white and black males.

***************
References
***************

Anderson, A. L., & Spohn, C. (2010). Lawlessness in the federal sentencing process: A test for uniformity and consistency in sentence outcomes. Justice Quarterly, 27(3), 362. Retrieved from: http://search.proquest.com.libproxy.highpoint.edu/docview/228169004?accountid=11411

Booker v. United States. 543 US 220. Supreme Court of the United States. 2005.

Britt, C. L. (2009). Modeling the distribution of sentence length decisions under a guidelines system: An application of quantile regression models. Journal of Quantitative Criminology, 25(4), 341-370. doi:10.1007/s10940-009-9066-x

Cole, J. M. (2012). DOJ Memo to Prosecutors: Department Policy on Early Disposition or “Fast-Track” Programs. Federal Sentencing Reporter, 25(1), 53-56.

Commission, U. S. (2011). U.S.
Sentencing Commission Report on Mandatory Minimum Penalties in Federal Sentencing. Washington, D.C.: U.S. Sentencing Commission.

Commission, U. S. (2012). Report on the Continuing Impact of United States v. Booker on Federal Sentencing. Washington, D.C.: U.S. Sentencing Commission. Federal Sentencing Reporter, Vol. 25 No. 5, June 2013; (pp. 327-333) DOI: 10.1525/fsr.2013.25.5.327

Fishman, J., & Schanzenback, M. (2012). Racial Disparities under the Federal Sentencing Guidelines: The Role of Judicial Discretion and Mandatory Minimums. Journal of Empirical Legal Studies vol 9 (4).

Frankel, M. E. (1973). Criminal sentences: Law without order.

Gall v. United States. 552 US 38. Supreme Court of the United States. 2007.

Gorman, T. E. (2010). Fast-Track Sentencing Disparity: Rereading Congressional Intent to Resolve the Circuit Split. University of Chicago Law Review, 77, 479.

Hofer, P. J. (2012). Data, disparity, and sentencing debates: Lessons from the TRAC report on inter-judge disparity. Federal Sentencing Reporter, 25(1), 37-45. doi:10.1525/fsr.2012.25.1.37

Johnson, B. (2012). The missing link: Examining prosecutorial decision-making across Federal District Courts. In ACJS 2012 conference, New York, NY.

Koon v. United States. 518 US 81. Supreme Court of the United States. 1996.

Lynch, M., & Omori, M. (2014). Legal change and sentencing norms in the wake of Booker: The impact of time and place on drug trafficking cases in federal court. Law & Society Review, 48(2), 411-445. Retrieved from: http://search.proquest.com.libproxy.highpoint.edu/docview/1553397580?accountid=1141

Mason, C., & Bjerk, D. (2013). Inter-judge sentencing disparity on the federal bench: An examination of drug smuggling cases in the Southern District of California. Federal Sentencing Reporter, 25(3), 190. Retrieved from: http://search.proquest.com.libproxy.highpoint.edu/docview/1354453416?accountid=11411

McClellan, J. L., & Sands, J. M. (2006).
Federal Sentencing Guidelines and the Policy Paradox of Early Disposition Programs: A Primer on Fast-Track Sentences. Ariz. St. LJ, 38, 517.

Prosecutorial Remedies and Other Tools to End the Exploitation of Children Today (PROTECT) Act of 2003, Pub. L. No. 108-21, 117 Stat. 650 § 151 (2003-2004)

Starr, S. B., & Rehavi, M. M. (2013). Mandatory sentencing and racial disparity: Assessing the role of prosecutors and the effects of Booker. Yale Law Journal, 123, 1, 2-80.

Scott, R. (2010). Inter-Judge Sentencing Disparity After Booker: A First Look. Stanford Law Review vol 63 (2).

Starr, S. B., & Rehavi, M. M. (2013). On Estimating Disparity and Inferring Causation: Sur-Reply to the US Sentencing Commission Staff. Yale LJ Online, 123, 273-2559.

Sullivan, C. J., McGloin, J. M., & Piquero, A. R. (2008). Modeling the deviant Y in criminology: An examination of the assumptions of censored normal regression and potential alternatives. Journal of Quantitative Criminology, 24(4), 399-421. doi:10.1007/s10940-008-9051-9

Ulmer, J., Light, M. T., & Kramer, J. (2011). The “Liberation” of Federal Judges’ Discretion in the Wake of the Booker/Fanfan Decision: Is There Increased Disparity and Divergence between Courts?. Justice Quarterly, 28, 6, 799-837. doi:10.1080/07418825.2011.553726

United States Sentencing Commission, Guidelines Manual, §3E1.1 (Nov. 2013)

Yang, C. (2013). Free at Last? Judicial Discretion and Racial Disparities in Federal Sentencing. Chicago: Coase-Sandor Institute for Law and Economics Working Paper No. 661: The University of Chicago Law School.

Yang, C. (2014). Have Inter-Judge Sentencing Disparities Increased in an Advisory Guideline Regime? Evidence from Booker. Coase-Sandor Institute for Law and Economics: Research Paper No. 662.
****************************************
Appendix A: Mechanics of guidelines
****************************************

The United States Sentencing Commission Guidelines Manual (2013) provides instructions for applying the guidelines. These instructions are detailed, and interested readers should consult them in the original. Our current intention is to provide an overview.

The guidelines stipulate a sentencing table that has 43 rows and 6 columns defining 258 cells. The applicable row is determined by the offense seriousness and the applicable column is determined by the offender’s criminal history. Calculations required to identify the row and column are described below.

The cells are clustered into four zones. A probation term is authorized in zones A and B, but a probation term in zone B must be accompanied by an alternative to confinement, such as home detention. A prison term is required in zones C and D. The sentencing table cell specifies the length of the prison term. For example, an offense level of 16 and a criminal history category of IV require a prison term between 33 and 41 months.

Even when the guidelines were mandatory, a judge could depart upward or downward from the stipulated range. Now that the guidelines are advisory, there is no obligation to adhere to the range. We discuss departure reasons later in this appendix; first we review how the guidelines establish the offense level.

*****************
Offense level
*****************

Determination of offense level begins with the basic offense. For example, chapter 2 in the guidelines defines an aggravated assault. A judge who is sentencing an offender convicted of an aggravated assault starts with a baseline offense level of 14 and considers other aspects of the case. Instructions are--

* If the crime involved more than minimal planning, add 2 points, increasing the offense level from 14 to 16.

* Add 3 to 5 points depending on the nature of the weapon (if any) and how it was used.
* Add 3 to 10 points depending on the extent of injury.

* Add 2 points if the crime was motivated by profit.

* Add 2 points if certain statutory provisions are met.

The guidelines provide detailed instructions and definitions. Other offenses have different baseline offense levels and recognize case elements that distinguish cases based on severity within an offense type. Points for property crimes are determined by the dollar loss. Points for drug crimes are determined by the type and amount of drugs bought or sold. Offense categories sometimes overlap, and the guidelines provide cross-references to resolve ambiguities.

Some elements of criminal cases are common to multiple types of cases. These elements are identified in chapter 3, where they are called adjustments, and consist of five categories:

* Victim adjustments (e.g., an enhancement for a vulnerable victim, as defined by the guidelines).

* The offender’s role in the offense (e.g., points are added if the offender leads a criminal enterprise, and points are subtracted if the offender was a minimal participant).

* Points are added if the offender obstructed or impeded the administration of justice.

* Offenders are often convicted of multiple counts for the same type of offense or for different offenses. The guidelines provide rules for imposing a sentence given multiple counts of conviction.

* The guidelines apply what they call “acceptance of responsibility provisions.” If the offender clearly demonstrates acceptance of responsibility for his or her offense, the offense level is decreased by 2 levels. Under some conditions, on motion of the government stating that the offender has assisted authorities in the investigation or prosecution of his or her misconduct by timely notifying authorities of an intention to enter a guilty plea, the offense level is decreased by 1 additional level.
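
The offense-level arithmetic and table lookup described in this appendix can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not the Commission's procedure: the function and variable names are hypothetical, only the point values quoted in this appendix are used, and the caps, cross-references, and eligibility conditions found in the Guidelines Manual are omitted. The only sentencing-table cell included is the one cited above (offense level 16, criminal history category IV, 33 to 41 months).

```python
# Illustrative sketch only. All names are hypothetical; point values are the
# ones quoted in this appendix, and the manual's caps and conditions are omitted.

OFFENSE_LEVELS = range(1, 44)                              # 43 rows
HISTORY_CATEGORIES = ("I", "II", "III", "IV", "V", "VI")   # 6 columns
N_CELLS = len(OFFENSE_LEVELS) * len(HISTORY_CATEGORIES)    # 43 * 6 = 258

# Only the cell cited in the text is filled in; all other cells are left out.
KNOWN_RANGES_MONTHS = {(16, "IV"): (33, 41)}

BASE_LEVEL_AGGRAVATED_ASSAULT = 14  # baseline level given in the text


def _check(points, low, high, label):
    """Accept 0 (adjustment does not apply) or a value between low and high."""
    if points != 0 and not low <= points <= high:
        raise ValueError(f"{label} must be 0 or between {low} and {high}")
    return points


def aggravated_assault_level(planning=False, weapon_points=0, injury_points=0,
                             profit_motive=False, statutory_provision=False,
                             accepted_responsibility=False,
                             government_motion=False):
    """Add the listed adjustments to the baseline offense level."""
    level = BASE_LEVEL_AGGRAVATED_ASSAULT
    level += 2 if planning else 0                           # more than minimal planning
    level += _check(weapon_points, 3, 5, "weapon_points")
    level += _check(injury_points, 3, 10, "injury_points")
    level += 2 if profit_motive else 0
    level += 2 if statutory_provision else 0
    if accepted_responsibility:
        level -= 2                                          # acceptance of responsibility
        if government_motion:
            level -= 1                                      # additional level on motion
    return level


def guideline_range_months(level, category):
    """Look up a cell; returns None for cells not reproduced in this sketch."""
    if level not in OFFENSE_LEVELS or category not in HISTORY_CATEGORIES:
        raise ValueError("not a cell of the sentencing table")
    return KNOWN_RANGES_MONTHS.get((level, category))


level = aggravated_assault_level(planning=True)
print(N_CELLS, level, guideline_range_months(level, "IV"))
# prints: 258 16 (33, 41)
```

In this sketch, an aggravated assault with more than minimal planning moves from level 14 to level 16, and for an offender in criminal history category IV that cell corresponds to the 33-to-41-month range cited above.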
Criminal history category
******************************

Chapter 4 provides rules for determining the criminal history category. This chapter applies points to the offender’s criminal record, taking into account prior sentences and whether the instant crime was committed while the offender was under community supervision. The criminal history score makes special provisions for career criminals and criminal livelihoods.

Departures
*************

Using the offense level from chapter 3 and the criminal history category from chapter 4, calculations identify the guideline cell. The guidelines sometimes use the term heartland to mean that the guidelines capture most of the elements of the offense and offender. As a result, most sentences should be imposed consistent with the guideline cell. The guidelines prohibit departures under some conditions and allow departures for others. In fact, because the guidelines are now advisory, there is great latitude for departures. Some latitude for departures is built into the guidelines, and its presence needs to be recognized by the study of disparity--a point made below.

Mandatory minimum sentences
****************************

Federal criminal codes specify maximum sentences for all crimes. For example, a code might specify that an offender can serve 0 to 5 years if convicted for a count of larceny. There is no minimum prison term. Other federal criminal codes--especially for drug violations--specify both a minimum and a maximum. For example, if someone is convicted of distributing X grams of cocaine, the code may specify that the sentence is between 2 and 5 years. The guidelines might then require a sentence between 2 and 2½ years. The minimum sentence sets a lower limit on the guidelines and on any legitimate sentence. However, federal law (18 U.S.C. § 3553(f)(1)-(5)) allows the court to sentence below the mandatory minimum when the following hold:

* The offender has no more than 1 criminal history point.
* He or she did not use violence or credible threats.

* There was neither death nor serious bodily injury.

* The offender was not an organizer, leader, manager, or supervisor of a criminal enterprise.

* The offender revealed all known information about the crime.

Provided the minimum sentence is 5 years or more, the minimum guidelines range may be reduced, but no lower than level 17.

Substantial assistance
************************

The sentencing judge is able to depart downward on a motion by the government that the offender “…has provided substantial assistance in the investigation or prosecution of another person who has committed an offense…” Commentary in the guidelines instructs: “Substantial weight should be given to the government’s evaluation of the extent of the defendant’s assistance, particularly where the extent and value of the assistance are difficult to ascertain.” Government-initiated downward departures are frequent.

Warranted departures
*********************

While the guidelines are intended to cover most circumstances, the Commission indicates that the sentencing judge may confront situations where the circumstances faced by the court are so unusual that applying the guidelines would be an injustice. In those cases, the sentencing judge can depart from the guidelines provided he or she provides an explanation. The guidelines also provide policy statements identifying special circumstances when a departure would apply. For example, if a victim or victims suffered psychological injury much more serious than that normally resulting from commission of the offense, the court may increase the sentence above the authorized guidelines range. In addition, the guidelines offer numerous examples of when a departure would be appropriate.

Prohibited departures
************************

The guidelines identify factors that cannot be taken into account when departing from the guidelines range.
A judge cannot base a sentence on race, sex, national origin, or religion, and cannot reconsider the weighting of factors, such as acceptance of responsibility and role in the offense, that are already incorporated into the guidelines.

Characteristics of the offender
*********************************

Chapter 5, part H, discusses some specific characteristics of offenders that may not be taken into account at the time of sentencing. Referring to the Sentencing Reform Act, the guidelines manual states:

First, the act directs the Commission to ensure that the guidelines and policy statements "are entirely neutral" as to five characteristics--race, sex, national origin, creed, and socioeconomic status. See 28 U.S.C. § 994(d).

Second, the act directs the Commission to consider whether 11 specific offender characteristics, "among others," have any relevance to the nature, extent, place of service, or other aspects of an appropriate sentence, and to take them into account in the guidelines and policy statements, only to the extent that they do have relevance. See 28 U.S.C. § 994(d).

Third, the act directs the Commission to ensure that the guidelines and policy statements, in recommending a term of imprisonment or length of a term of imprisonment, reflect the "general inappropriateness" of considering five of those characteristics--education, vocational skills, employment record, family ties and responsibilities, and community ties. See 28 U.S.C. § 994(e).

Fourth, the act also directs the sentencing court, in determining the particular sentence to be imposed, to consider, among other factors, "the history and characteristics of the defendant." See 18 U.S.C. § 3553(a)(1).
According to the Commission:
*****************************

The Supreme Court has emphasized that the advisory guideline system should "continue to move sentencing in Congress’ preferred direction, helping to avoid excessive sentencing disparities while maintaining flexibility sufficient to individualize sentences where necessary." See United States v. Booker, 543 U.S. 220, 264-65 (2005). Although the court must consider "the history and characteristics of the defendant" among other factors, see 18 U.S.C. § 3553(a), in order to avoid unwarranted sentencing disparities the court should not give them excessive weight. Generally, the most appropriate use of specific offender characteristics is to consider them not as a reason for a sentence outside the applicable guideline range but for other reasons, such as in determining the sentence within the applicable guideline range, the type of sentence (e.g., probation or imprisonment) within the sentencing options available for the applicable Zone on the Sentencing Table, and various other aspects of an appropriate sentence. To avoid unwarranted sentencing disparities among defendants with similar records who have been found guilty of similar conduct, see 18 U.S.C. § 3553(a)(6), 28 U.S.C. § 991(b)(1)(B), the guideline range, which reflects the defendant’s criminal conduct and the defendant’s criminal history, should continue to be "the starting point and the initial benchmark." Gall v. United States, 552 U.S. 38, 49 (2007).

Accordingly, the purpose of this part is to provide sentencing courts with a framework for addressing specific offender characteristics in a reasonably consistent manner. Using such a framework in a uniform manner will help "secure nationwide consistency" (see Gall v. United States, 552 U.S. 38, 49 (2007)), "avoid unwarranted sentencing disparities" (see 28 U.S.C. § 991(b)(1)(B), 18 U.S.C. § 3553(a)(6)), "provide certainty and fairness" (see 28 U.S.C.
§ 991(b)(1)(B)), and "promote respect for the law" (see 18 U.S.C. § 3553(a)(2)(A)).

The Commission identified several offender characteristics regarding which sentencing judges may have dissenting views. The Commission deemed that age may be relevant and considered the situation in which frail offenders may not require prison. Education and vocational skills are considered irrelevant, unless they are pertinent to the crime. Mental and emotional conditions may be relevant but only in extreme circumstances. Similar to age, physical condition may be relevant. Drug and alcohol dependence is ordinarily not a reason for a departure, unless the departure accomplishes a specific treatment purpose. Employment is ordinarily irrelevant. Family ties and responsibilities are not ordinarily relevant, although the Commission makes exceptions for loss of caretaking and financial support.