29. On qualitative scales and the errors made when using them
(Reading time: 15 minutes) This time we address the construction of qualitative scales and their use in multicriteria decision models. The text is extracted from my doctoral thesis. Unfortunately I lack the time to translate it, but I trust that the vast majority of this blog's readers read English fluently.
Text extracted from: Sanchez-Lopez, R., 2008. Evaluating development projects based on multiple intangible criteria: Theoretical framework and applications to coca producing regions of Bolivia, Ph.D. thesis, Ghent University, Ghent, Belgium, ISBN: 978-90-5989-214-9
Visit us: Ramiro A. Sánchez López | Vista Sur
A scale can be understood as a set of referential entities, known as scale levels, associated with an attribute. Scale levels are instrumental for judging the performance of an object regarding an attribute of interest. The main purpose of a scale is to show how an object (i.e., an alternative) performs relative to the scale levels that compose it.
Accordingly, let us understand the term "scaling" as the process of identifying a collection of referential entities and ordering them with respect to their performance on a given attribute. The process of scaling normally includes some visual representation, usually a linear or multidimensional "map", which should retain the basic information about the objects and the references that form part of the scale.
In very broad terms, scales can be either quantitative or qualitative. In what follows, we offer an overview of the different ways of describing the performance of objects with respect to attributes. We adopt the terminology of Bana e Costa, who uses the term "descriptor" to refer to an ordered set of impact levels associated with an attribute (Bana e Costa and Beinat, 2005). The term "descriptor" can be seen as a synonym of "scale", and "impact level" as a synonym of "scale level". Strictly speaking, however, "descriptor" and "scale" are not exactly the same thing: a descriptor refers to how scale levels are defined semantically. Since the difference is so subtle, we will use the terms "scale" and "descriptor" interchangeably.
There are different kinds of descriptors:
Natural descriptors are descriptors whose impact levels directly reflect the state of objects. For instance, the weight of an object can naturally be measured in kilograms.
Proxy descriptors are descriptors whose impact levels indicate causes more than effects. The levels of a proxy descriptor reflect the degree to which an associated goal is met but do not directly measure the object. Thus, proxy descriptors indirectly measure the achievement of the object with respect to a stated goal (Keeney and Raiffa, 1993, page 55). Examples are the Gross Domestic Product as a measure of the economic health of a country, or the Human Development Index as a measure of the general development of a particular society.
Constructed descriptors are descriptors whose impact levels describe the performance of objects on attributes that are too complex to be captured by natural or proxy descriptors.
In turn, there are different types of constructed descriptors:
One-dimensional qualitative scales are descriptors whose impact levels are non-numerical descriptions of the performance of objects on attributes. An example, taken from Belton and Stewart (2002), is the Beaufort wind force scale, developed by Francis Beaufort in 1805, an "empirical" measure that describes wind intensity based on observed sea conditions:
- (Level 1) Calm, sea like a mirror;
- (Level 2) Light air, ripples only;
- (Level 3) Light breeze, small wavelets (0.2m), crests have a glassy appearance;
- (Level 4) Gentle breeze, small wavelets (0.6m), crests begin to break;
- (Level 5) Moderate breeze, small waves (1m), some white horses;
- (Level 6) Fresh breeze, moderate waves (1.8m), many white horses;
- (Level 7) Strong breeze, large waves (3m), probably some spray;
- (Level 8) Near gale, mounting sea (4m) with foam blown in streaks downwind;
- (Level 9) Gale, moderately high waves (7m), dense foam, visibility affected;
- (Level 10) Storm, very high waves (9m), heavy sea roll, visibility impaired. Surface generally white;
- (Level 11) Violent storm, exceptionally high waves (11m), visibility poor;
- (Level 12) Hurricane, 14m waves, air filled with foam and spray, visibility bad.
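Another example is the Mercalli scale, which describes earthquake intensity through observed effects:
- "Instrumental". Not felt except by a very few under especially favourable conditions.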
- "Feeble". Felt only by a few persons at rest, especially on upper floors of buildings. Delicately suspended objects may swing.
- "Slight". Felt quite noticeably by persons indoors, especially on the upper floors of buildings. Many do not recognize it as an earthquake. Standing motor cars may rock slightly. Vibration similar to the passing of a truck. Duration can be estimated.
- "Moderate". Felt indoors by many, outdoors by few during the day. At night, some awakened. Dishes, windows, doors disturbed; walls make cracking sound. Sensation like heavy truck striking building. Standing motor cars rocked noticeably. Dishes and windows rattle.
- "Rather Strong". Felt by nearly everyone; many awakened. Some dishes and windows broken. Unstable objects overturned. Clocks may stop.
- "Strong". Felt by all; many frightened and run outdoors, walk unsteadily. Windows, dishes, glassware broken; books off shelves; some heavy furniture moved or overturned; a few instances of fallen plaster. Damage slight.
- "Very Strong". Difficult to stand; furniture broken; damage negligible in building of good design and construction; slight to moderate in well-built ordinary structures; considerable damage in poorly built or badly designed structures; some chimneys broken. Noticed by persons driving motor cars.
- "Destructive". Damage slight in specially designed structures; considerable in ordinary substantial buildings with partial collapse. Damage great in poorly built structures. Fall of chimneys, factory stacks, columns, monuments, walls. Heavy furniture moved.
- "Ruinous". General panic; damage considerable in specially designed structures, well designed frame structures thrown out of plumb. Damage great in substantial buildings, with partial collapse. Buildings shifted off foundations.
- "Disastrous". Some well built wooden structures destroyed; most masonry and frame structures destroyed with foundation. Rails bent.
- "Very Disastrous". Few, if any masonry structures remain standing. Bridges destroyed. Rails bent greatly.
- "Catastrophic". Total damage - Almost everything is destroyed. Lines of sight and level distorted. Objects thrown into the air. The ground moves in waves or ripples. Large amounts of rock may move.
Constructed indices are descriptors whose levels are the mathematical combination of two or more quantitative variables. According to Bana e Costa and Beinat (2005), despite their quantitative nature, indices always represent a compromise between scientific accuracy and concise information.
Multidimensional qualitative scales are descriptors whose levels are linguistic combinations of the levels of other one-dimensional qualitative scales.
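As a rough illustration of how the levels of a multidimensional qualitative scale can be formed, the sketch below combines two one-dimensional scales. Both scales (ACCESS and SERVICES) and their level wordings are hypothetical and are not taken from the thesis.

```python
from itertools import product

# Two hypothetical one-dimensional qualitative scales (illustrative only),
# each listed from worst to best level.
ACCESS = ["no road access", "seasonal road access", "year-round road access"]
SERVICES = ["no basic services", "basic services available"]

# Each level of the multidimensional scale is a linguistic combination
# of one level from each one-dimensional scale.
combined_levels = [f"{a}, {s}" for a, s in product(ACCESS, SERVICES)]

for level in combined_levels:
    print(level)
```

Note that the combined levels are only enumerated here. Arranging them into a single ordered scale would still require the judgment of the decision maker, since some combinations are not obviously better or worse than others.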
Since natural empirical references cannot be found for certain issues, decision makers often handle "soft" criteria through the use of qualitative scales. This is especially common in the social sciences, where the use of qualitative scales is often marred by the following three methodological mistakes.
- Qualitative scales are constructed using ambiguous scale levels:
Common examples are:
{"very good", "good", "acceptable", "bad", "very bad"}
or
{"no impact", "minor impact", "major impact"}.
Ambiguity arises when two evaluators are able to use the same scale level to describe two objects that are clearly different. For instance, the term "good" is an ambiguous term since its meaning depends on the evaluator's demands.
In contrast, subjectivity arises when two evaluators can legitimately use two different scale levels to describe the same object.
Subjectivity is inherent to multicriteria analysis, but ambiguity must be eliminated from the model. To prevent decision makers from being misled by ambiguous scale levels, the qualitative levels of the scale need either to be attached to unambiguous descriptions, for instance written statements or graphical representations (as in the Beaufort or Mercalli scales described above), or to be interpreted in their ordinal sense only, without attaching to them any notion of difference measurement or strength of preference (Krantz et al., 1971).
- Arbitrary numerical values assigned to qualitative scale levels are decisive in the final ranking of alternatives:
A common example is an assignment such as
{"very good = 5", "good = 4", "acceptable = 3", "bad = 2", "very bad = 1"},
whose values are then used in arithmetical operations (Bana e Costa and Beinat, 2005).
Instead of the arbitrary set {5, 4, 3, 2, 1} of numerical values attached to the scale levels, we could equally have proposed, for instance, {10^5, 10^4, 10^3, 10^2, 10^1} or {5, 4.5, 3.2, 1.8, 1}. French (1988, page 76) shows that arithmetic calculations (like the statistical mean) on ordinal values are meaningless.
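The effect of the encoding can be checked numerically. The following is a minimal sketch, assuming two hypothetical alternatives A and B rated on three criteria (the ratings are invented for illustration only). It compares their mean scores under the three order-preserving encodings mentioned above.

```python
from statistics import mean

LEVELS = ["very bad", "bad", "acceptable", "good", "very good"]

# Three encodings that all preserve the order of the five levels.
ENCODINGS = {
    "linear": dict(zip(LEVELS, [1, 2, 3, 4, 5])),
    "powers_of_ten": dict(zip(LEVELS, [10**1, 10**2, 10**3, 10**4, 10**5])),
    "uneven": dict(zip(LEVELS, [1, 1.8, 3.2, 4.5, 5])),
}

# Hypothetical ratings of two alternatives on three criteria.
RATINGS = {
    "A": ["very good", "bad", "bad"],
    "B": ["good", "acceptable", "acceptable"],
}

def preferred(encoding):
    """Return the alternative with the highest mean encoded rating."""
    return max(RATINGS, key=lambda alt: mean(encoding[lvl] for lvl in RATINGS[alt]))

winners = {name: preferred(enc) for name, enc in ENCODINGS.items()}
print(winners)
```

With these invented ratings the preferred alternative is B under the linear and uneven encodings but A under the powers-of-ten encoding, even though the ordinal information is identical in all three cases.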
Another example refers to intervals attached to scale levels, which define abrupt changes in value due to small changes in performance. For instance,
"Level 1": [From 0% to 10%]
"Level 2": [From 10.1% to 20%]
"Level 3": [From 20.1% to 30%]
"Level 4": ...
In this example, two alternatives performing at 10.0% and 10.1% respectively would be assigned to two different scale levels (Level 1 and Level 2), despite being almost identical. In our view, conclusions should not be influenced by arbitrary choices such as the boundaries of categories or the cardinal meaning of ordinal values.
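The boundary effect can be sketched in a few lines. The function below is a hypothetical helper, not part of the original study. It assumes levels of equal width (10 percentage points) with the upper bound included in each level, which is one possible reading of the intervals listed above.

```python
import math

def level_from_percentage(pct: float, width: float = 10.0) -> int:
    """Map a percentage to a level: [0, 10] -> 1, (10, 20] -> 2, and so on."""
    return max(1, math.ceil(pct / width))

# Two nearly identical performances fall on different levels...
print(level_from_percentage(10.0), level_from_percentage(10.1))

# ...while two clearly different performances share the same level.
print(level_from_percentage(10.1), level_from_percentage(20.0))
```

A difference of 0.1 percentage points changes the level, whereas a difference of 9.9 points does not. Any arithmetic performed on the level numbers inherits this distortion.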
- Hidden uncertainty behind the use of precise numbers:
In our experience with the evaluation of intangible criteria, practitioners in charge of evaluations frequently associate a numerical evaluation with an objective and accurate one. Precise numbers do not always reflect accurate measurements, since the concept of accuracy is meaningful only when there is a valid claim that a true value exists and when the measurement instrument is coherent with the precision of the results. When the evaluation process requires the intervention of a group of evaluators, scales become the instrument that the parties use to communicate. Hidden uncertainty is therefore especially harmful in participatory decision-making processes that evaluate qualitative criteria.
In what follows we offer a real-world example of a potentially misleading scale (the example was originally written in Spanish):
Example:
In a real study, the analysis is based on two evaluation scales. The first measures the importance of the criteria using a three-level qualitative scale: "Neutral", "Important" and "Very important", to which the values 1, 2 and 3 are arbitrarily assigned.
Scale for evaluating the importance of the criteria:
"Neutral" = 1
"Important" = 2
"Very important" = 3
The second is a qualitative scale for evaluating performance on the criteria, also composed of three levels: "Unfavourable", "Favourable" and "Very favourable", to which the values 1, 2 and 3 are likewise arbitrarily assigned:
Scale for evaluating the performance of the alternatives:
"Unfavourable" = 1
"Favourable" = 2
"Very favourable" = 3
Why are the numbers 1, 2 and 3 assigned, and not, for example, 10^1, 10^2 and 10^3? There is no answer to this question, because the values have been assigned arbitrarily. In this case the quantitative values were attached to the scale levels without any process that could justify them. A cardinal scale (that is, quantitative, numerical) carries more information than an ordinal scale (that is, qualitative, linguistic), and that gain in information, that "jump" from the ordinal to the cardinal, must be justified methodologically. A mathematical, rational, logical process must support it.
Suppose the analysis concerns two alternatives: alternative V2, with an overall score of 34, and alternative Vh, with an overall score of 33. Each entry below is computed as performance level x criterion weight:
CRITERION     WEIGHT                  V2 score    Vh score
Criterion 1   "Very important" = 3    1x3 = 3     1x3 = 3
Criterion 2   "Important" = 2         3x2 = 6     3x2 = 6
Criterion 3   "Important" = 2         3x2 = 6     1x2 = 2
Criterion 4   "Very important" = 3    1x3 = 3     2x3 = 6
Criterion 5   "Very important" = 3    1x3 = 3     1x3 = 3
Criterion 6   "Very important" = 3    1x3 = 3     1x3 = 3
Criterion 7   "Important" = 2         2x2 = 4     1x2 = 2
Criterion 8   "Very important" = 3    1x3 = 3     2x3 = 6
Criterion 9   "Neutral" = 1           3x1 = 3     2x1 = 2
Total                                 V2 = 34     Vh = 33
V2 > Vh. That is, V2 is better than Vh.
Let us now consider the following:
Note that in the scale {"Unfavourable" = 1; "Favourable" = 2; "Very favourable" = 3} the distance (or difference) between consecutive levels is always the same, namely 1. But is it reasonable to accept that the distance between "Very favourable" and "Favourable" equals the distance between "Favourable" and "Unfavourable"? It seems not: "Very favourable" and "Favourable" should be closer to each other than "Favourable" and "Unfavourable". After all, an alternative that is "Very favourable" or merely "Favourable" could be accepted, whereas we would be far more cautious about accepting one that is "Unfavourable".
One could therefore propose (just as arbitrarily) the following scale, which shortens the distance between "Very favourable" and "Favourable" and lengthens the distance between "Favourable" and "Unfavourable":
"Unfavourable" = 1
"Favourable" = 2.5
"Very favourable" = 3
With this change alone, repeating the calculations gives alternative V2 an overall score of 35 and alternative Vh an overall score of 36.5.
Therefore, Vh > V2.
That is, Vh is better than V2.
The ranking of the alternatives has changed because of an absolutely arbitrary choice: the a priori assignment of numerical values to the levels of a qualitative scale.
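The reversal can be reproduced with a short calculation. The sketch below takes the weights and performance levels from the table above. The variable and function names are illustrative choices made for this sketch, not part of the original study.

```python
# Criterion weights: "Very important" = 3, "Important" = 2, "Neutral" = 1
WEIGHTS = [3, 2, 2, 3, 3, 3, 2, 3, 1]

# Ordinal performance levels per criterion:
# 1 = "Unfavourable", 2 = "Favourable", 3 = "Very favourable"
LEVELS_V2 = [1, 3, 3, 1, 1, 1, 2, 1, 3]
LEVELS_VH = [1, 3, 1, 2, 1, 1, 1, 2, 2]

# Two arbitrary numerical encodings of the same three ordinal levels.
ORIGINAL = {1: 1, 2: 2, 3: 3}
ALTERNATIVE = {1: 1, 2: 2.5, 3: 3}

def overall(levels, encoding):
    """Weighted sum of the encoded performance levels."""
    return sum(encoding[lvl] * w for lvl, w in zip(levels, WEIGHTS))

v2_orig, vh_orig = overall(LEVELS_V2, ORIGINAL), overall(LEVELS_VH, ORIGINAL)
v2_alt, vh_alt = overall(LEVELS_V2, ALTERNATIVE), overall(LEVELS_VH, ALTERNATIVE)

print(v2_orig, vh_orig)  # 34 33   -> V2 ranks first
print(v2_alt, vh_alt)    # 35.0 36.5 -> Vh ranks first
```

The only difference between the two runs is the number attached to the middle level, yet the order of the alternatives is inverted.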
For this reason, the "jump" from an ordinal scale to a cardinal scale, that is, the process of attaching quantitative values to the levels of qualitative scales, must always be accompanied by a methodological justification based on mathematical logic and rationality.
---
Arbitrary (unjustified) choices in a multicriteria analysis model must have no effect on the ranking of the alternatives!
---
References:
C. A. Bana e Costa and E. Beinat (2005), 'Model-structuring in public decision-aiding', Working Paper Series, The London School of Economics and Political Science.
R. L. Keeney and H. Raiffa (1993), Decisions with Multiple Objectives: Preferences and Value Tradeoffs, Cambridge University Press.
D. H. Krantz, R. D. Luce, P. Suppes and A. Tversky (1971), Foundations of Measurement. Volume I: Additive and Polynomial Representations, New York: Academic Press.
S. French (1988), Decision Theory: An Introduction to the Mathematics of Rationality, Ellis Horwood Limited.
V. Belton and T. J. Stewart (2002), Multiple Criteria Decision Analysis: An Integrated Approach, Kluwer Academic Publishers.