29. On qualitative scales and the mistakes made when using them


(Reading time: 15 minutes) This time we look at the construction of qualitative scales and their use in multicriteria decision models. The text is extracted from my doctoral thesis. Unfortunately I do not have the time to translate it into Spanish, but I trust that the great majority of this blog's readers are fluent readers of English.

Text extracted from: Sanchez-Lopez, R., 2008. Evaluating development projects based on multiple intangible criteria: Theoretical framework and applications to coca producing regions of Bolivia, Ph.D. thesis, Ghent University, Ghent, Belgium, ISBN: 978-90-5989-214-9
 

A scale can be understood as a set of referential entities, known as scale levels, associated with an attribute. Scale levels are instrumental for judging the performance of an object regarding an attribute of interest. The main purpose of the scale is to provide information as to how an object (i.e., an alternative) performs with respect to the scale levels that compose the scale.

Accordingly, let us understand the term "scaling" as the process of identifying a collection of referential entities and ordering them with respect to their performance on a given attribute. The process of scaling normally includes some visual representation, usually a linear or multidimensional "map", which should retain the basic information about the objects and the references that form part of the scale.

Types of scales

In very broad terms, scales can be either quantitative or qualitative. In what follows, we offer an overview of the different ways of describing the performance of objects with respect to attributes. We adopt the terminology of Bana e Costa, who uses the term "descriptor" to refer to an ordered set of impact levels associated with an attribute (Bana e Costa and Beinat, 2005). The term "descriptor" can be seen as a synonym of "scale", and "impact level" as a synonym of "scale level". Strictly speaking, "descriptor" and "scale" are not exactly the same thing: a descriptor refers to how scale levels are defined semantically. Since the difference is so subtle, we will use the terms "scale" and "descriptor" interchangeably.

There are different kinds of descriptors:

Natural descriptors are descriptors whose impact levels directly reflect the state of objects. For instance: the weight of an object can naturally be measured in kilograms.

Proxy descriptors are descriptors whose impact levels indicate causes more than effects. The levels of a proxy descriptor reflect the degrees to which an associated goal is met but do not directly measure the object. Thus, proxy descriptors indirectly measure the achievement of the object with respect to a stated goal (Keeney and Raiffa, 1993, Page 55). For instance: Gross Domestic Product to measure the economic health of a country, or the Human Development Index to measure the general development of a particular society.

Constructed descriptors are descriptors whose impact levels describe the performance of objects on attributes that are too complex to be captured by natural or proxy descriptors.

In turn, there are different types of constructed descriptors:

One-dimensional qualitative scales are descriptors whose impact levels are non-numerical descriptions of the performance of objects on attributes. An example taken from Belton and Stewart (2002) is the Beaufort wind force scale, developed by Francis Beaufort in 1805, which is an "empirical" measure for describing wind intensity based on observed sea conditions:
  • (Level 1) Calm, sea like a mirror;
  • (Level 2) Light air, ripples only;
  • (Level 3) Light breeze, small wavelets (0.2m), crests have a glassy appearance;
  • (Level 4) Gentle breeze, small wavelets (0.6m), crests begin to break;
  • (Level 5) Moderate breeze, small waves (1m), some white horses;
  • (Level 6) Fresh breeze, moderate waves (1.8m), many white horses;
  • (Level 7) Strong breeze, large waves (3m), probably some spray;
  • (Level 8) Near gale, mounting sea (4m) with foam blown in streaks downwind;
  • (Level 9) Gale, moderately high waves (7m), dense foam, visibility affected;
  • (Level 10) Storm, very high waves (9m), heavy sea roll, visibility impaired. Surface generally white;
  • (Level 11) Violent storm, exceptionally high waves (11m), visibility poor;
  • (Level 12) Hurricane, 14m waves, air filled with foam and spray, visibility bad.
Another example is the Mercalli intensity scale, developed by Giuseppe Mercalli in 1883, which is used for describing the intensity of an earthquake. Notice that this scale uses the terms "Instrumental", "Feeble", "Slight", ... but, in order to avoid ambiguity, a description of its meaning is attached to every scale level.
  • "Instrumental". Not felt except by a very few under especially favourable conditions.
  • "Feeble". Felt only by a few persons at rest, especially on upper floors of buildings. Delicately suspended objects may swing.
  • "Slight". Felt quite noticeably by persons indoors, especially on the upper floors of buildings. Many do not recognize it as an earthquake. Standing motor cars may rock slightly. Vibration similar to the passing of a truck. Duration can be estimated.
  • "Moderate". Felt indoors by many, outdoors by few during the day. At night, some awakened. Dishes, windows, doors disturbed; walls make cracking sound. Sensation like heavy truck striking building. Standing motor cars rocked noticeably. Dishes and windows rattle.
  • "Rather Strong". Felt by nearly everyone; many awakened. Some dishes and windows broken. Unstable objects overturned. Clocks may stop.
  • "Strong". Felt by all; many frightened and run outdoors, walk unsteadily. Windows, dishes, glassware broken; books off shelves; some heavy furniture moved or overturned; a few instances of fallen plaster. Damage slight.
  • "Very Strong". Difficult to stand; furniture broken; damage negligible in building of good design and construction; slight to moderate in well-built ordinary structures; considerable damage in poorly built or badly designed structures; some chimneys broken. Noticed by persons driving motor cars.
  • "Destructive". Damage slight in specially designed structures; considerable in ordinary substantial buildings with partial collapse. Damage great in poorly built structures. Fall of chimneys, factory stacks, columns, monuments, walls. Heavy furniture moved.
  • "Ruinous". General panic; damage considerable in specially designed structures, well designed frame structures thrown out of plumb. Damage great in substantial buildings, with partial collapse. Buildings shifted off foundations.
  • "Disastrous". Some well built wooden structures destroyed; most masonry and frame structures destroyed with foundation. Rails bent.
  • "Very Disastrous". Few, if any masonry structures remain standing. Bridges destroyed. Rails bent greatly.
  • "Catastrophic". Total damage - Almost everything is destroyed. Lines of sight and level distorted. Objects thrown into the air. The ground moves in waves or ripples. Large amounts of rock may move.
Pictorial descriptors are descriptors whose impact levels are visual representations (pictures, graphs, etc.) of the performance of options, used when descriptive linguistic statements are not enough to convey the notions of the attribute being analyzed. For instance, this type of scale can be used to evaluate performance on "Landscape" when it is used as an evaluation criterion.

Constructed indices are descriptors whose levels are the mathematical combination of two or more quantitative variables. According to Bana e Costa and Beinat (2005), despite their quantitative nature, indices always represent a compromise between scientific accuracy and concise information.

Multidimensional qualitative scales are descriptors whose levels are linguistic combinations of the levels of other one-dimensional qualitative scales.

Common methodological mistakes

Unable to find natural empirical references for certain issues, decision makers often handle "soft" criteria through the use of qualitative scales. This is especially common in the social sciences, where the use of qualitative scales is often flawed by the following three methodological mistakes.

  • Qualitative scales are constructed using ambiguous scale levels:
Common examples are: 

{"very good", "good", "acceptable", "bad", "very bad"} 

or 

{"no impact", "minor impact", "major impact"}. 

Ambiguity arises when two evaluators are able to use the same scale level to describe two objects that are clearly different. For instance, the term "good" is an ambiguous term since its meaning depends on the evaluator's demands. 

In contrast, subjectivity arises when two evaluators can legitimately use two different scale levels to describe the same object.

Subjectivity is inherent to multicriteria analysis, but ambiguity must be eliminated from the model. In order to prevent decision makers from being misled by ambiguous scale levels, the qualitative levels of the scale need either to be attached to corresponding unambiguous descriptions, for instance in the form of written statements or graphical representations (as in the Beaufort and Mercalli scales described above), or to be interpreted only in their ordinal sense, without attaching to them any notion of difference measurement or strength of preference (Krantz et al., 1971).
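Both remedies can be mirrored in how a scale is stored in software. The following is a minimal sketch, not a method taken from the cited authors: the names `ScaleLevel`, `QualitativeScale` and `at_least_as_high` are assumptions made for this illustration. Every level must carry a written description, and the scale offers only order comparisons, never arithmetic. The three levels and their descriptions are taken from the Mercalli list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleLevel:
    label: str
    description: str  # written statement that pins down what the label means


class QualitativeScale:
    """Levels ordered from lowest to highest; supports comparison only."""

    def __init__(self, levels):
        for level in levels:
            if not level.description.strip():
                # A bare label such as "good" would be ambiguous, so reject it
                raise ValueError(f"level {level.label!r} has no description")
        self._rank = {level.label: i for i, level in enumerate(levels)}

    def at_least_as_high(self, a: str, b: str) -> bool:
        """True if label a sits at or above label b in the ordering."""
        return self._rank[a] >= self._rank[b]


mercalli_start = QualitativeScale([
    ScaleLevel("Instrumental",
               "Not felt except by a very few under especially favourable conditions."),
    ScaleLevel("Feeble",
               "Felt only by a few persons at rest, especially on upper floors of buildings."),
    ScaleLevel("Slight",
               "Felt quite noticeably by persons indoors, especially on the upper floors of buildings."),
])

print(mercalli_start.at_least_as_high("Slight", "Feeble"))        # True
print(mercalli_start.at_least_as_high("Instrumental", "Feeble"))  # False
```

Because the class exposes no numeric value for a level, there is nothing to add or average by accident.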

  • Arbitrary numerical values assigned to qualitative scale levels are decisive in the final ranking of alternatives:
Obvious as it may be, this concept is seldom well understood by practitioners of multicriteria analysis: the way a given scale is constructed should not affect the ordering of alternatives. A common example is the use of arbitrary numerical values of an ordinal nature, such as 

{"very good = 5", "good = 4", "acceptable = 3", "bad = 2", "very bad = 1"}, 

that are utilized in arithmetical operations (Bana e Costa and Beinat, 2005). 

Instead of the arbitrary set {5, 4, 3, 2, 1} of numerical values attached to the scale levels, we could equally have proposed {10^5, 10^4, 10^3, 10^2, 10^1} or {5, 4.5, 3.2, 1.8, 1}, since all three preserve the same ordering of the levels. French (1988, Page 76) shows that arithmetic calculations (like the statistical mean) on ordinal values are meaningless. 
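The effect of such a recoding can be checked numerically. The sketch below applies the three codings mentioned above to two hypothetical alternatives, A and B, each rated on three criteria (the ratings are invented for this illustration). It reports which alternative has the higher mean, and, for contrast, which has the higher median, a statistic that depends only on the ordering of the levels.

```python
from statistics import mean, median

LEVELS = ["very bad", "bad", "acceptable", "good", "very good"]  # worst to best

# Three order-preserving codings of the same five levels
CODINGS = {
    "1..5": dict(zip(LEVELS, [1, 2, 3, 4, 5])),
    "powers of ten": dict(zip(LEVELS, [10**1, 10**2, 10**3, 10**4, 10**5])),
    "uneven": dict(zip(LEVELS, [1, 1.8, 3.2, 4.5, 5])),
}

# Hypothetical ratings of two alternatives on three criteria
RATINGS = {
    "A": ["very good", "very bad", "very bad"],
    "B": ["good", "good", "good"],
}


def winner(coding, statistic):
    """Return the alternative with the highest value of `statistic`."""
    scores = {
        name: statistic([coding[level] for level in levels])
        for name, levels in RATINGS.items()
    }
    return max(scores, key=scores.get)


for label, coding in CODINGS.items():
    print(label, "| mean:", winner(coding, mean), "| median:", winner(coding, median))
# 1..5 | mean: B | median: B
# powers of ten | mean: A | median: B
# uneven | mean: B | median: B
```

With these ratings the mean prefers B under two codings and A under the third, although the ratings themselves never changed. The median gives the same answer under all three.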

Another example refers to intervals attached to scale levels, which define abrupt changes in value due to small changes in performance. For instance, 

"Level 1": [From 0% to 10%] 
"Level 2": [From 10.1% to 20%]
"Level 3": [From 20.1% to 30%]
"Level 4": ... 

In this example, two alternatives performing at 10.0% and 10.1% respectively would be assigned to two different scale levels (Level 1 and Level 2), despite being almost identical. In our view, conclusions should not be influenced by arbitrary choices such as the boundaries of categories or the cardinal meaning of ordinal values.
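The boundary effect is easy to reproduce. The sketch below assumes ten-point bands like those in the example and a performance expressed in percent. The function name `level_for` and the `band_width` parameter are hypothetical.

```python
import math


def level_for(performance_pct: float, band_width: float = 10.0) -> int:
    """Map a percentage to its band: up to 10 -> 1, above 10 up to 20 -> 2, etc."""
    if not 0.0 <= performance_pct <= 100.0:
        raise ValueError("performance_pct must lie between 0 and 100")
    # ceil() sends 10.0 to band 1 and 10.1 to band 2; max() keeps 0% in band 1
    return max(1, math.ceil(performance_pct / band_width))


print(level_for(10.0), level_for(10.1))  # 1 2
print(level_for(10.1), level_for(20.0))  # 2 2
```

A difference of 0.1 percentage points separates the first pair into two levels, while a difference of 9.9 points leaves the second pair in the same level.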

  • Hidden uncertainty behind the use of precise numbers:
In our experience with the evaluation of intangible criteria, we have observed that practitioners in charge of evaluations frequently equate a numerical evaluation with an objective and accurate one. Precise numbers do not always reflect accurate measurements, since the concept of accuracy is meaningful only when there is a valid claim that a true value exists, and when the measurement instrument is consistent with the precision of the results. When the evaluation process requires the intervention of a group of evaluators, scales become the instrument that the parties use for communication. Hidden uncertainty is therefore especially problematic when participatory decision-making processes are undertaken in the context of qualitative criteria evaluation.

In what follows we offer a real-world example of a potentially misleading scale:

Example:

In a real study, the analysis is based on two evaluation scales. The first measures the importance of the criteria using a qualitative scale of three levels, "Neutral", "Important" and "Very important", to which the values 1, 2 and 3 are arbitrarily assigned.

Scale for evaluating the importance of the criteria:

"Neutral" = 1
"Important" = 2
"Very important" = 3

The second is a qualitative scale for evaluating performance on the criteria, also composed of three levels, "Unfavorable", "Favorable" and "Very favorable", to which the values 1, 2 and 3 are likewise arbitrarily assigned:

Scale for evaluating the performance of the alternatives:

"Unfavorable" = 1
"Favorable" = 2
"Very favorable" = 3

Why are the numbers 1, 2 and 3 assigned, and not, for example, 10^1, 10^2 and 10^3? There is no answer to this question, because the values have been assigned arbitrarily. In this case, quantitative values were assigned to the levels of the scales without any process that could justify them. A cardinal scale (that is, quantitative, numerical) carries more information than an ordinal scale (that is, qualitative, linguistic), and that enrichment of information, that "leap" from the ordinal to the cardinal, must be justified methodologically. It must rest on a mathematical, rational, logical process.

Suppose that the analysis concerns two alternatives: alternative V2, with a global value of 34, and alternative Vh, with a global value of 33:

CRITERION      WEIGHT                  SCORE V2      SCORE Vh
Criterion 1    "Very important"=3      1x3=3         1x3=3
Criterion 2    "Important"=2           3x2=6         3x2=6
Criterion 3    "Important"=2           3x2=6         1x2=2
Criterion 4    "Very important"=3      1x3=3         2x3=6
Criterion 5    "Very important"=3      1x3=3         1x3=3
Criterion 6    "Very important"=3      1x3=3         1x3=3
Criterion 7    "Important"=2           2x2=4         1x2=2
Criterion 8    "Very important"=3      1x3=3         2x3=6
Criterion 9    "Neutral"=1             3x1=3         2x1=2
                                       Total V2=34   Total Vh=33

V2 > Vh. That is, V2 is better than Vh.
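The totals in the table can be reproduced with a few lines of Python. The weights and scores below are copied from the table (each cell is score times weight). The variable and function names are illustrative choices.

```python
# Weights of criteria 1..9: 1 = Neutral, 2 = Important, 3 = Very important
WEIGHTS = [3, 2, 2, 3, 3, 3, 2, 3, 1]

# Scores of each alternative on criteria 1..9:
# 1 = Unfavorable, 2 = Favorable, 3 = Very favorable
SCORES = {
    "V2": [1, 3, 3, 1, 1, 1, 2, 1, 3],
    "Vh": [1, 3, 1, 2, 1, 1, 1, 2, 2],
}


def global_value(scores, weights):
    """Weighted sum: each score multiplied by the weight of its criterion."""
    return sum(score * weight for score, weight in zip(scores, weights))


totals = {name: global_value(scores, WEIGHTS) for name, scores in SCORES.items()}
print(totals)  # {'V2': 34, 'Vh': 33}
```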

Let us now make the following consideration:

Note that in the scale {"Unfavorable"=1; "Favorable"=2; "Very favorable"=3} the distance (or difference) between consecutive levels is always the same, namely 1. But is it reasonable to accept that the distance between "Very favorable" and "Favorable" is the same as that between "Favorable" and "Unfavorable"? It seems not: "Very favorable" and "Favorable" should be closer to each other than "Favorable" and "Unfavorable". After all, an alternative that is "Very favorable" or merely "Favorable" could be accepted, whereas we would be far more cautious about accepting an alternative that is "Unfavorable".

One could therefore propose (just as arbitrarily) the following scale, which shortens the distance between "Very favorable" and "Favorable" and lengthens the distance between "Favorable" and "Unfavorable":

"Unfavorable" = 1
"Favorable" = 2.5
"Very favorable" = 3

With this change alone, and redoing the corresponding calculations, alternative V2 obtains a global value of 35 and alternative Vh a value of 36.5.

Therefore, Vh > V2. That is, Vh is better than V2.

The order of the alternatives has changed because of an entirely arbitrary consideration, namely the a priori association of numerical values with the levels of a qualitative scale.
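To see how fragile the ranking is, the value attached to "Favorable" can be varied while "Unfavorable" stays at 1 and "Very favorable" at 3. The sketch below does this in steps of 0.1, using the weights and scores from the table. The function name and the use of `Fraction` (to keep the tie exact) are choices made for this illustration.

```python
from fractions import Fraction

WEIGHTS = [3, 2, 2, 3, 3, 3, 2, 3, 1]   # criteria 1..9
LEVELS = {                               # 1 = Unfavorable, 2 = Favorable, 3 = Very favorable
    "V2": [1, 3, 3, 1, 1, 1, 2, 1, 3],
    "Vh": [1, 3, 1, 2, 1, 1, 1, 2, 2],
}


def global_value(levels, favorable):
    """Weighted sum with 'Favorable' recoded to `favorable`; 1 and 3 stay fixed."""
    coding = {1: Fraction(1), 2: Fraction(favorable), 3: Fraction(3)}
    return sum(coding[level] * weight for level, weight in zip(levels, WEIGHTS))


for tenths in range(20, 30):             # Favorable = 2.0, 2.1, ..., 2.9
    favorable = Fraction(tenths, 10)
    v2 = global_value(LEVELS["V2"], favorable)
    vh = global_value(LEVELS["Vh"], favorable)
    verdict = "V2" if v2 > vh else "Vh" if vh > v2 else "tie"
    print(float(favorable), float(v2), float(vh), verdict)
```

With these data the totals are 30 + 2f for V2 and 19 + 7f for Vh, where f is the value given to "Favorable". They tie at f = 2.2, so any value above 2.2 puts Vh first and any value below it puts V2 first.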

This is why the "leap" from an ordinal scale to a cardinal scale, that is, the process of associating quantitative values with the levels of qualitative scales, must always be accompanied by a methodological justification based on mathematical logic and rationality.

---

Arbitrary (unjustified) considerations in a multicriteria analysis model must not affect the ordering of the alternatives!

---

References:

C. A. Bana e Costa and E. Beinat (2005), 'Model-structuring in public decision-aiding', Working Paper Series, The London School of Economics and Political Science.

R. L. Keeney and H. Raiffa (1993), Decisions with Multiple Objectives: Preferences and Value Tradeoffs, Cambridge University Press.

D. H. Krantz, R. D. Luce, P. Suppes and A. Tversky (1971), Foundations of Measurement, Volume I: Additive and Polynomial Representations, New York: Academic Press.

S. French (1988), Decision Theory: An Introduction to the Mathematics of Rationality, Ellis Horwood Limited.

V. Belton and T. J. Stewart (2002), Multiple Criteria Decision Analysis: An Integrated Approach, Kluwer Academic Publishers.
