Beyond categorical definitions of life:
a data-driven approach to assessing lifeness
Christophe Malaterre(*), Département de philosophie et Centre
interuniversitaire de recherche sur la science et la technologie (CIRST),
Université du Québec à Montréal, Case postale 8888, succursale Centre-Ville,
Montréal (Québec) H3C 3P8, Canada
Phone: +1 514 850 9781
Email: malaterre.christophe@uqam.ca
https://orcid.org/0000-0003-1413-6710
(*) To whom correspondence should be addressed
Jean-François Chartier, Centre interuniversitaire de recherche sur la science et
la technologie (CIRST), Université du Québec à Montréal, Case postale 8888,
succursale Centre-Ville, Montréal (Québec) H3C 3P8, Canada
Abstract
The concept of “life” certainly is of some use to distinguish birds and beavers
from water and stones. This pragmatic usefulness has led to its construal as a
categorical predicate that can sift out living entities from non-living ones
depending on their possessing specific properties—reproduction, metabolism,
evolvability etc. In this paper, we argue against this binary construal of life. Using
text-mining methods across over 30,000 scientific articles, we defend instead a
degrees-of-life view and show how these methods can contribute to experimental
philosophy of science and concept explication. We apply topic-modeling
algorithms to identify which specific properties are attributed to a target set of
entities (bacteria, archaea, viruses, prions, plasmids, phages and the molecule of
adenine). Eight major clusters of properties were identified together with their
relative relevance for each target entity (two that relate to metabolism and
catalysis, one to genetics, one to evolvability, one to structure, and—rather
unexpectedly—three that concern interactions with the environment broadly
construed). While aligning with intuitions—for instance about viruses being less
alive than bacteria—these quantitative results also reveal differential degrees of
performance that have so far remained elusive or overlooked. Taken together,
these analyses provide a conceptual “lifeness space” that makes it possible to
move away from a categorical construal of life by empirically assessing the
relative lifeness of more-or-less alive entities.
Keywords
definition of life; lifeness space; topic-modeling; text-mining; experimental
philosophy of science; concept explication
1 Introduction
“Life” is of significant scientific and philosophical interest, and not just for the
definitional disputes it creates. It is often said that a clear concept is demanded
to adjudicate whether there is such a thing as life on other planets or whether its
de novo synthesis has been realized in the test tube, or still whether one has
succeeded in tracing its origin on Earth. But no one fully agrees on the
characteristics of life. There is, however, an implicit assumption that is often
made in conjunction with such claims: the assumption that there exists a strict
delineation between life and non-life (Joyce 1994; Maynard Smith and
Szathmáry 1997; Luisi 1998). Some have localized the threshold of life at the
level of self-replicating informational polymers capable of Darwinian evolution
(Gilbert 1986; De Duve 2005), others at the level of self-organized cross-catalytic
networks and metabolism (Boden 1999; Nghe et al. 2015), at a particular level of
specific thermodynamic properties (Prigogine, Nicolis, and Babloyantz 1972;
England 2013) or at the level of organizational closure or autopoiesis (Bitbol and
Luisi 2004; Moreno and Mossio 2015), yet others at the level of membrane and
vesicle formation (Morowitz 1992; Segré et al. 2001) or at the level of entities
such as viruses (Forterre 2010), and still others at junctures of several of the
above (Luisi 1998; Joyce 1994; Koshland 2002; Ruiz-Mirazo, Peretó, and
Moreno 2004; Dupré and O’Malley 2009; Benner 2010) or even at the
all-encompassing level of all extant and past organisms (Mariscal and Doolittle
2018).
Yet, numerous hard-to-classify entities are now known that challenge our very
intuitions about life and non-life (Pirie 1937; Lederberg 1960; Cairns-Smith
1982; Hazen 2005). These include macro-molecules that store information and
replicate themselves (Lincoln and Joyce 2009), complex chemical networks that
cross-catalyze their reactions (Ashkenasy et al. 2004), crystals that grow and reorganize spontaneously (Cairns-Smith 1982; Palacci et al. 2013), viruses that are
larger in size and genomic complexity than some of the smallest bacteria (Raoult
2004), and bacteria that are so functionally and genetically reduced that one
doubts whether they are truly alive (Nakabachi et al. 2006). The menagerie of
microbiology also includes artificially simplified versions of natural organisms
(Hutchison et al. 1999), specific RNA viral particles or viroids (Dimmock,
Easton, and Leppard 2009), viral agents named satellites (Saunders and Stanley
1999), viruses of viruses that go by the name of virophages (La Scola et al. 2008),
autonomous DNA strands or plasmids (Norman, Hansen, and Sørensen 2009),
prions (Prusiner 1982), and still others. This state of affairs raises three
intertwined questions. First, what are the proper properties of life; how can they
be identified? Second, on the basis of these properties, does such lifeness come
in a binary or in a more gradual way; and how should one measure lifeness? And
third, if it turns out that lifeness is gradual, how do the different more-or-less
alive entities fare relative to one another?
Our objective, in this paper, is to argue against the binary assumption that the
distinction between life and non-life is a dichotomy, and to do so by turning to
science for a best-informed view. We aim to provide an alternative gradualist and
multi-dimensional view of lifeness on the basis of a data-driven analysis of the
scientific literature.1 The project fits within the metaphilosophical framework of
Carnapian conceptual explication and its recent revival (Carnap 1950; Maher
2007; Kitcher 2008; Justus 2012; Brun 2016), and more generally within the
framework of conceptual engineering broadly construed (Burgess and Plunkett
2013a, 2013b; Eklund 2015; Machery 2017; Cappelen 2018). Such frameworks
aim at addressing the representational deficiencies of our concepts—e.g.
inexactness, vagueness, inadequacy to certain technical contexts, incoherence,
unfruitfulness, political or social non-optimality—and at recommending
appropriate conceptual revisions. In support of our argument, we use text-mining
1
This paper builds on the conceptual idea of “lifeness signatures” as proposed by Malaterre (2010b). One
aspect of our contribution is to show that such conceptual construal of life can be operationalized and
rendered measurable. As noted by one referee for Synthese, our views bear some resemblance to Godfrey-Smith’s Darwinian space (2009) in that both are multidimensional. The two projects, however, pursue
different objectives: a characterization of more-or-less paradigmatically Darwinian populations for Godfrey-Smith, a characterization of more-or-less alive entities in our case. The methods are also different, relatively
qualitative in the case of Godfrey-Smith, more quantitative and data-driven in our case. Both instances show
the value of thinking multi-dimensionally for conceptual explication.
methodologies that we apply to a large corpus of scientific texts with the view to
elaborating a perspective on life that is deeply anchored in recent scientific
practice. We hope to show that such data-driven methodologies can provide
complementary tools to perform experimental philosophy of science, notably
when it comes to conceptual explication (Shepherd and Justus 2015; Machery
2016; Schupbach 2017).
Performing topic-modeling analyses across over 30,000 full-text biology
articles retrieved from the BioMed Central collection (see SI Appendix for
details), we identified topics that coalesced around a set of selected target
entities—such as bacteria, viruses or prions—and used them as a basis for
inferring, from the ground up, the clusters of properties that related the most to
these entities. Results show eight major clusters of properties: two that relate to
metabolism, one to structure, one to genetics, another to evolvability, and three
others to environment-interactions broadly construed. By measuring the semantic
proximity of the words representing the target entities to the set of words
associated with each one of the property clusters, we found that this semantic
proximity varied. We hence defined an 8-dimensional “lifeness space” on the
basis of these property clusters and used semantic proximity as a metric for
assessing the relative positioning of the target entities along these dimensions.
These findings corroborate the intuition that different entities perform differently
along each lifeness dimension. They also show how different degrees of overall
lifeness can be achieved in different ways. This multidimensional way of
mapping borderline entities exemplifies the need to reformulate questions of
the type “Is X alive?” into questions of the type “What is the lifeness of X?”.
Building on these dimensions-related differences, we also computed a single
value of overall lifeness for each more-or-less-alive entity, thereby providing an
empirically-founded quantification of degrees of lifeness.
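The positioning procedure summarized above can be illustrated with a minimal sketch. It rests on assumptions that are ours for illustration only: semantic proximity is taken to be cosine similarity between word vectors, each property cluster is represented by a single vector, and overall lifeness is an unweighted mean across dimensions. The function names, the toy vectors and the two-cluster setup are hypothetical and do not reproduce the actual analysis (see SI Appendix for the method used).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def lifeness_profile(entity_vec, cluster_vecs):
    """Proximity of one entity to each property cluster (one value per dimension)."""
    return {name: cosine(entity_vec, vec) for name, vec in cluster_vecs.items()}

def overall_lifeness(profile):
    """Assumed aggregation: unweighted mean over all dimensions."""
    return sum(profile.values()) / len(profile)

# Toy, made-up vectors: two property clusters and two hypothetical entities.
clusters = {"metabolism": [1.0, 0.0, 0.0], "genetics": [0.0, 1.0, 0.0]}
entity_a = [1.0, 1.0, 0.0]  # equally close to both clusters
entity_b = [0.0, 1.0, 0.0]  # aligned with "genetics" only

profile_a = lifeness_profile(entity_a, clusters)
profile_b = lifeness_profile(entity_b, clusters)
```

In this toy setting the two entities reach their overall values in different ways, which is the point of treating lifeness as a profile across dimensions rather than as a single yes-or-no answer.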
The paper is structured as follows. To set the stage (section 2), we first unfold
the “problem of defining life” and, specifically, the “binary assumption” that
accompanies it and that is the target of our conceptual explication. We review
existing gradualist arguments about life that undermine this very assumption
(section 3), and, while endorsing the objectives of these arguments, we identify
some of their limitations with regard to defining a scale of lifeness. To
circumvent these limitations, we argue (section 4) that attention must be paid to
the different functional activities that are performed, at different degrees, by
entities with varying lifeness. In order to empirically ground our argument, we
propose to identify these functional activities by investigating the scientific
literature; we show how large-scale text-mining analyses (that we describe in
section 5) can be used to define a multidimensional lifeness space (section 6).
Such lifeness space, we argue, provides good grounds for doubting the binary
assumption and for operationalizing degrees of lifeness (section 7). After
discussing methodological limitations of our approach, we also highlight
consequences of our analyses of lifeness.
2 The “problem of defining life” and the “binary assumption”
The “problem of defining life” is often equated with the problem of identifying
an adequate definition of life. The scientific and philosophical literature abounds
with such definitions.2 In broad terms, this approach can be formulated as asking
the question:
(DL) What is the definition of “life”?
Answering (DL) presupposes that one agrees on what defining formally means.3
However, the real difficulty lies elsewhere: in the surprising diversity of
borderline cases—giant viruses, sterile organisms, extreme microsymbionts,
etc.—and disputes that follow their categorization as alive or not. We will address
this point in section 3. But before we do, let us explicate the problem of defining
life along three additional perspectives that will later prove useful:
epistemological, methodological and ontological.
Indeed, some authors propose to frame the problem of defining life as a
problem that concerns the very possibility of identifying a definition of life, in
particular in light of available scientific knowledge (Cleland and Chyba 2007;
Machery 2012; Smith 2016). This epistemological question and the like can be
formulated as:
(DL-E) Given current knowledge, is an answer to (DL) within reach?
The motivation behind (DL-E) is that the transition from non-living matter to
living matter still remains unexplained by science, and that no comprehensive
“theory of life” is yet available. It is also motivated by the argument according to
which, despite its apparent diversity, life on Earth is just a single instance of life,
2
For lists of definitions, see e.g. (Popa 2004; Pályi, Zucchi, and Caglioti 2002). See also (Tirard, Morange,
and Lazcano 2010) for an historical perspective on definitions of life.
3
The analysis of how defining impacts (DL) is beyond the scope of this paper, but see (Malaterre 2010c).
hence an insufficient basis for any inductive generalization about the nature of
life or its definition.
Methodological questions also arise over the most appropriate approach to
elaborating and justifying any definition of life (Cleland and Chyba 2007).
Should “life” be defined on the basis of the common properties shared by
particular instances of life? But in this case, how should such instances of life be
identified in the first place? Or should “life” be first defined, with the resulting
definition only then being used to identify particular instances of life? Yet in that
case, where would such a definition come from and how would it be justified?
These methodological questions can be reformulated as:
(DL-M) What is the most appropriate method for answering (DL)?
Third, the “problem of defining life” can be explicated as an ontological
problem. In that case, questions of interest typically concern whether life picks
out a definite natural kind by delineating entities that are alive in some sense from
entities that are not, or whether some specific account of natural kinds is better
suited than another to capture “life”. Let us subsume these ontological questions
under:
(DL-O) Is life a natural kind?
In a sense, (DL-O) presupposes a philosophical account of natural kinds. But the
hard issue at stake is whether, given such an account, life is indeed a natural kind
made possible by that account.4
The purpose of making explicit these different facets of the “problem of
defining life” is that they all relate to another more fundamental issue: the
underlying assumption that life is binary in character. Indeed, asking (DL) is
often understood as searching for the criteria capable of sorting out entities that
are alive from entities that are not (e.g. Joyce 1994; Maynard Smith and
Szathmáry 1997; Luisi 1998; Griesemer 2003; Ruiz-Mirazo, Peretó, and Moreno
2004; Cleland and Chyba 2007; Mix 2015; Knuuttila and Loettgers 2017). And,
when asking (DL-E) or (DL-M), this assumption is also implicitly made since
both questions are intertwined with (DL). This is all the more so with (DL-O):
positive answers often result in sorting the furniture of the universe into living
4
Investigating whether life delineates a natural kind or, possibly, imposes adjustments to existing accounts
of natural kinds is beyond the scope of this paper, but see e.g. (Lange 1996; Khalidi 1998; Diéguez 2013;
Ferreira Ruiz and Umerez 2018).
entities on the one hand, and non-living entities on the other.5 This
binary-character-of-life assumption can be formulated as:
(Bi) Given any entity E, E is either alive or not alive.
Assessing whether we have good reasons to accept (Bi) matters when one
considers any of the aspects of the problem of defining life, and all the more so
if (Bi) is not warranted. In that case, the usual ways in which the problem of
defining life is construed would be, at best, ill-framed.
This issue is significant because, in fact, many accept (Bi), either explicitly or
implicitly. This stance is deeply anchored in the defining-life literature broadly
construed, from Antiquity till today. And many contemporary authors who
endorse (Bi) often do so without justifying its adoption: the existence of a
clear-cut divide between non-living matter and living matter is simply taken for granted
(e.g., Schuster 1984; Joyce 1994; De Duve 2005).6
Assumption (Bi) is even endorsed by authors who do not necessarily subscribe
to the project of answering (DL). Some argue that a definition of life is useless
on the basis that we all intuitively know when something is alive or not. Others
dismiss (DL) on the basis of a negative answer to (DL-E), arguing for instance
that positively answering these questions would require a theory of life that
remains to be identified, all the while endorsing the view that once such a theory
is available, life will be definable in a clear-cut fashion (Cleland and Chyba
2007). Yet others argue that definitions of life should just be considered
operational definitions that only serve as tools in the scientific practice, while
recognizing at the same time that such definitions play a role in how models
represent a system as living or non-living (Bich and Green 2018).7
5
The question whether natural kinds should partition the entities of the universe into unambiguous and
non-intersecting sets is the object of debate; see e.g. (Boyd 1999; Ellis 2001). For the sake of the present
discussion, suffice it to say that at least some authors argue simultaneously for a positive answer to (DL-O)
and a binary partitioning.
6
For guidance, one may look at the 100+ definitions listed in (Pályi, Zucchi, and Caglioti 2002; Popa
2004), most of which take the form of biconditionals delineating life in a binary fashion and without further
justification of that binary view. Some authors, though, appeal to phase-transitions or emergent properties
that would sharply distinguish life from non-life, e.g. (Lange 1996; Luisi 2006); for a critical assessment of
these, see (Malaterre 2010a).
7
Along the same lines, see also the operational role of definitions of life in the search for life
elsewhere than on Earth— be it on Mars, on the Jovian moons or, much further away, on exoplanets circling
other suns in our galaxy—with the mediation of “biosignatures” (Raulin 2010; Seager, Bains, and Petkowski
2016), as well as in the search for life in the test-tube to assess whether experimental attempts at creating
life have been successful or not (Blain and Szostak 2014).
Conversely, other authors defend the view that we do not have good reasons
to posit (Bi), and should, instead, assume the opposite. Some such positions result
from responses to ontological questions of the type (DL-O), most notably by
proponents of some version of homeostatic property cluster kinds, on the basis
that attribution of lifeness is more complex than the binary assumption (Diéguez
2013; Ferreira Ruiz and Umerez 2018). Yet others stem from empirical research
in biology broadly construed (Pirie 1937; Lederberg 1960; Cairns-Smith 1982;
Hazen 2005; Bruylants, Bartik, and Reisse 2010; Bedau 2011). Because we are
interested in how science can help us explicate the concept of life, we turn to
these presently.
3 Degrees of lifeness
From spontaneously self-reorganizing crystals to mega-viruses to drastically
reduced micro-endosymbionts, all these microscopic entities that we mentioned
in the introduction (section 1) constitute a set of entities that are hard to categorize
as either truly living or truly non-living. They definitely appear more alive than
water and stones, yet less than birds and beavers in many respects. As such, they
fuel the intuition that there exists some sort of a gray-zone of “lifeness” populated
by entities that certainly are complex by molecular standards yet simpler than
common microorganisms.8 These entities are such that they carry out some—but
not necessarily all—of the activities that are intuitively attributed to clearly living
organisms—e.g. metabolic self-sustenance, growth, replication, information
encoding—yet not at the level of performance of clearly living entities such as
common bacteria. The existence of such a gray-zone undermines assumption (Bi)
by supporting the very opposite view: that life is a matter of degrees. It also
contributes to explaining why attempts at defining life as a binary property have
led to a plurality of discordant cutting points.
8
As pointed out to us by a referee, were the categories “alive” and “not-alive” to be considered as determinate
membership categories (i.e., once something has been found to be a member, that thing is a full member),
entities of the gray-zone of lifeness could be considered as borderline cases with indeterminate membership
(neither members nor non-members of the two categories “alive” and “not-alive”). As a result, they could
be considered as delineating a third determinate membership category. Alternatively, they could be taken as
evidence that the categories “alive” and “not-alive” are not determinate membership categories but
categories with degrees of membership. Independently of which interpretation one chooses, our point is to
stress that construing the two categories “alive” and “not-alive” as being mutually exclusive and
collectively exhaustive categories with determinate membership does not do justice to the variety of entities
that populate the world. Hence our argument against (Bi).
It is worth noting that the view that life is a matter of degrees allows for two
perspectives. First, one can interpret it in a synchronic context as stating that—at
any given time and place, notably now on Earth—entities exist that exhibit
different degrees of lifeness.9 This synchronic-gradualism view can be
formulated as:
(SG) Being alive is a matter of degrees
(SG) receives empirical support from present-day more-or-less-alive entities. It
does not, however, specify any historical or genealogical relationships between
these entities: it could be the case that all (or most) more-or-less-alive entities
that we see today are derived from clearly-alive entities, or the other way around,
or even that both result from some form of co-evolution.10
Second, life as a matter of degrees can be interpreted in a diachronic context
as stating that the historical chain of events that led from non-living matter to
living matter, on Earth and possibly elsewhere, was populated by entities of
varying and, on average, increasing lifeness (e.g. Morowitz 1992; Eigen 1992;
Dennett 1995; De Duve 2005). This diachronic-gradualism view can be
formulated as:
(DG) The origination of life is a matter of degrees
(DG) is often taken as providing good reasons for (SG). The gradual
appearance of life on Earth, entailing the existence of a succession of
more-or-less-alive entities, helps make sense of the present-day profusion of
more-or-less-alive entities, though these entities are likely to be strikingly different from those
that populated Earth four billion years ago. Conversely, (SG) can be taken to lend
support to (DG) in the sense that, if the present-day gap between non-life and life
is populated by more-or-less-alive entities, then the same could have been the
case in the past, and in particular during the origination of life. Yet, neither (SG)
nor (DG) formally entails the other. It is possible that the origination of life
involved more-or-less-alive entities without implying that those ancient entities
bear any relationships with current ones, the latter being secondary by-products
of present life forms. We could therefore conceive of a world in which (DG)
would be true yet (SG) false. On the other hand, despite the numerous more-or-
9
This synchronic view can contribute, for instance, to arguments about the lack of clear-cut delineation of
biodiversity’s scope when considering ever smaller biological entities (Malaterre 2013).
10
As an illustration of this point, see the debate on the role of viruses in the origin of DNA in (Forterre
2006).
less-alive entities we observe today, the origins of life may have resulted from a
sudden phase-transition-like event. In that case, (DG) would be false yet (SG)
true.
Both versions of gradualism relate to yet another more fundamental issue: the
existence of a lifeness scale. For argument’s sake, take this lifeness scale to range
from a 0- to a 1-value. In this context, gradualists of either the (DG) or (SG) types
need to answer two foundational questions:
(G1) What are the 0- and 1-values of the lifeness scale?
(G2) What is the gradation of the lifeness scale?
Together, these questions specify the start- and end-point of the lifeness scale, as
well as how one moves along this scale. Consider the following tentative answer
to these questions.
In the explication of their gradualist view of life, Bruylants, Bartik and Reisse
(2010) argue that the 1-value of lifeness should correspond to any organism that
belongs to any of the three domains of the tree of life—Archaea, Bacteria or
Eucarya—since all organisms of these domains are clearly alive. As for the
0-value of lifeness, it should correspond to entities that lack one or more of the
following characteristics: (i) consisting of organic molecules and liquid water;
(ii) consisting of interacting/reacting molecules and (iii) exhibiting properties
which are different from those of their isolated components. This results in
attributing a 0-value to pure liquid water or solutions of amino acids or nucleic
bases (e.g. adenine). So much for (G1). As regards (G2), they argue that the
gradation scale will possibly result from further classificatory practice, but will
likely vary depending on scientific disciplinary interests (2010, 140–41). While
their objectives may be sound, we take issue with their answers to both (G1) and
(G2). First, defining the 1-value by reference to the three domains of life proves
circular since inclusion of entities into these domains presupposes that such
entities have been deemed worthy of belonging to the tree of life, and hence have
been previously characterized as alive.11 This way of setting the 1-value of
lifeness also characterizes as alive entities that are classified within one of the
three domains, even if their status as clearly living entities is disputed, as is the
case of reduced bacterial endosymbionts such as Carsonella ruddii (Nakabachi
et al. 2006). In addition, as Bruylants and colleagues recognize, the 1-value
11
See for instance the debate about whether viruses should be included or not in the tree of life (Moreira
and López-García 2009; Forterre 2010).
would need to be redefined whenever one would be confronted with any instance
of life not falling within the three known Terran domains, be it a novel domain
of terrestrial life, a novel form of extraterrestrial life, or a novel form of artificial
life. Second, the criteria for the 0-value appear too restrictive: they a priori limit
life to carbon-based chemistry, while it is conceivable that other chemistries
could be used (however unlikely such views may turn out in fact). Finally, it is
debatable whether the gradation scale should depend on scientific disciplines and
research interests, and not be grounded in more objective features of nature, at
least as they would surface from a well-informed transdisciplinary perspective at
a given point in time.
4 Lifeness as functionally multi-dimensional
Functional approaches to life have been defended by many, from Aristotle who
characterized the life of plants, animals and humans depending on their capacities
to absorb nutrients and grow, respond to stimuli, move and think, to the more
recent characterization of living entities as those entities that can self-sustain and
evolve through natural selection (Joyce 1994). The central question to be
answered for any such functional approach is that of identifying the key
functional activities or dimensions of lifeness. If, in addition, more-or-less alive
entities differ in how they perform along these dimensions, then it also matters
to identify the scales on which to grade them. Hence the following questions that
any functional-gradualist must answer:
(FG0) What are the functional dimensions of lifeness?
(FG1) What are the 0- and 1-values of each functional dimension of lifeness?
(FG2) What is the gradation of each functional dimension of lifeness?
One common approach to answering (FG0) is to trust our intuitions about the
properties of life. Because such intuitions are shaped by our experience about
living entities, they should help us capture the salient features of life. As shown
by Aristotle’s and Joyce’s positions, this approach often leads to identifying
activities such as metabolism, self-sustenance, growth, replication,
compartmentation, information storage or evolvability. However, disagreements
are many, notably when it comes to precisely defining each one of these
functions, justifying the choice of some functional activities as being more
relevant than others, and sifting out problematic borderline cases into living and
non-living entities. The last point in particular raises circularity worries about
answering (FG0) on the basis of intuitions that are shaped by our experience
about living entities, while at the same time relying on those intuitions to sort
entities into those that should and those that should not be considered as truly alive,
and thus have an impact on (FG0).
Bedau (2011) explores such a functional approach in the context of protocell
synthesis. He argues that being alive is a matter of possessing three functionalities
(containment, metabolism, program) linked by coupling interactions (to a
maximum of nine interactions that include six directed interactions between
different functionalities, and three self-regulatory ones). In this framework, the
1-value of lifeness corresponds to entities that possess all three functionalities
and all nine interactions, and the 0-value to entities that possess only one
functionality and no interactions. In between is a discrete scale of nine steps
corresponding to the number of interactions. While we agree with Bedau on the
value of a functionalist approach, we take issue with his theoretical framework.
First, it is not obvious why life should revolve around the three functionalities of
containment, metabolism, and program, and only those three. Bedau argues that
the triad of functionalities follows from open-ended evolution, a central feature
of life (2011, 79). Yet, open-ended evolution does not entail the triad view:
proponents of the RNA-world, for instance, may argue that autocatalytic RNAs
are capable of open-ended evolution, yet not of compartmentation nor
metabolism. Conversely, the triad view does not entail open-ended evolution: one
can conceive of systems that would exhibit the three functionalities while being
only capable of sustaining themselves without replicating, hence not capable of
open-ended evolution. This in turn casts doubt on the justification of the 0- and
the 1-values of lifeness. Second, the scale of lifeness is nearly exclusively
interaction-centered and cannot differentiate entities that would differ only in
their degree of functional performance. Indeed, in the framework, entities possess
functionalities in an all-or-nothing mode and their position on the scale of lifeness
mostly depends on the number of interactions.12 Yet what an entity does and how
12
In Bedau’s framework, the position on the scale of lifeness is a function of the number of interactions. As
a consequence, a value of 1 interaction can be achieved either by having 1 functionality and 1 retroactive
interaction from this functionality onto itself, or by having 2 functionalities linked by 1 interaction. With
the same reasoning, one can figure out that step 2 entities necessarily have 2 or 3 functionalities, and that
step 5 entities necessarily have all three functionalities. From step 5 to step 9, all entities necessarily have
well it does it intuitively makes more of a difference to its characterization as
“more-or-less-alive” than how it coordinates what it does. In other words,
functions and functional performance are often seen as the most significant
differences between entities, and not the extent to which functions interact with
one another.13
Other approaches have been proposed to overcome disagreements about
(FG0). Some have proposed the adoption of a majority view on the concept of
“life” following meta-analyses of definitions of life extracted from textbooks and
the scientific literature. For instance, Trifonov (2011) analyzed keywords used in
123 definitions of life, grouped them into thematic clusters and quantified their
frequency. In a similar fashion, Bains (2014) analyzed keywords in 27 textbook
and monograph definitions of life, interpreted their contextual meaning and
assigned them to thematic categories. Whereas both authors focused on explicit
definitions of life that they manually analyzed, we wanted to investigate the
possibility of assessing the dimensions of life in a ‘bottom-up’ way by looking at
the general biological literature at a broader scale and with the help of text-mining
tools. Also, while we concur with the functional approach that is present in both
studies, we take issue with methodological drawbacks of such meta-analyses
which (i) are highly dependent on the very limited sets of definitions they take as
input, (ii) rely on definitions that may themselves be based on intuitions—in the
general sense defined above—and may therefore be prone to the circularity
objections mentioned earlier, and (iii) construe life’s functional activities as all-
all 3 functionalities and are only distinguished by their number of interactions. Functionalities therefore play
a secondary role compared to that of interactions.
13
A referee pointed out to us the importance of interactions between functions. Interactions do indeed capture
a very relevant feature of living organisms. This point is emphasized notably by Gánti (2003), and is
central in Bedau’s framework and in other perspectives that are largely focused on designing or engineering
protocells. Yet, we argue that characterizing the lifeness of entities concerns, above all, identifying what
these entities do (this is a project that is different from the one pursued by Bedau, and complementary).
Consider a simple analogy: a sidecar motorbike can be depicted as possessing the functions of ‘motorized
propulsion’, of ‘providing seating for a driver’ and of ‘providing seating for a passenger’. This third function
clearly is what differentiates a sidecar motorbike from a regular motorbike. The fact that there is an
interaction between the sidecar itself and the motorbike to which it is attached is of course crucial but it does
not spontaneously arise in the top list of differentiating properties of such sidecar motorbikes compared to
regular motorbikes. In biology, the characterization of microorganisms through the identification of genes
belonging to different clusters of orthologous groups clearly also illustrates the importance of this
perspective (Galperin et al. 2015). Indeed, the significant differentiating factors between microorganisms
are taken to be genes that correspond to specific functions, not genes that correspond to interactions between
functions.
or-nothing capacities, which, in turn, lead to the kind of disagreement we
mentioned above as to why certain entities should or should not be said to possess this
or that set of activities, and hence be considered alive.
We argue that a proper answer to (FG0) should go back to the roots of
functional approaches to life. If, indeed, the functional dimensions of lifeness are
nothing other than the types of activities that more-or-less-alive entities engage
in, then, only an understanding of what those entities do will inform us about the
functional dimensions of lifeness. And if, in addition, by virtue of intellectual
humility, we endorse the view that it is the scientific community rather than any
particular scientist or philosopher that has the richest knowledge of these more-or-less-alive entities, then we should turn to the practice of science as a whole to
uncover these functional dimensions and provide us with the insights necessary
for a proper conceptual explication. This is what we propose to do with semi-supervised text-mining of the biological literature (sections 5 and 6). For now,
we still need to clarify a few points about (FG1) and (FG2).
First concerning (FG2), we argue that functional activities come in levels of
performance or degrees, the measure of which should be empirically grounded.
To grasp why, consider, as an example, the capacity that some more-or-less alive
entities have to develop some form of cellular compartmentation. At first sight,
one may think that compartmentation comes in a binary way: there is segregation
of “inner” molecular compounds from an “outer” environment, or there is none.
Yet, empirical findings have shown that compartmentation comes in varying
degrees of performance: for instance, a very permeable and transient form of
compartmentation can be realized through thermophoresis-induced micro-spatialization in liquid flows (Duhr and Braun 2006); surface adsorption on
minerals provides another form of segregation from the environment, possibly
more robust too (Cairns-Smith 1982), as do mineral pores in hydrothermal
chimneys (Martin and Russell 2003); membrane-like structures such as self-assembling lipid vesicles offer still more flexibility and nutrient permeability
(Blain and Szostak 2014), up to the sophistication of present-day microorganism
membranes that display motility-enabling features, signaling and defense
mechanisms, specialized transporters, proton-gradients or active catalysts
(Madigan 2015). Scales can therefore be developed at the level of individual
functionalities, even those that may appear binary at first sight. Ideally, once an
answer has been given to (FG0), we argue that such scales should be empirically
defined for each mode of lifeness, as is currently best understood in science. We
propose an operationalization of this idea in section 6 below.
Concerning (FG1), we propose to set the 0- and 1-value of lifeness by
choosing paradigmatic and uncontroversial references for each. Take the 1-value
of lifeness. It is uncontroversial that common bacteria such as E. coli are clearly
and truly living entities. They are so in virtue of the broad range of activities they
engage in, of “what they do for a living”.14 At the same time, they are probably
among the simplest entities known today that carry out such a range of activities,
implying that they possess the minimal set of functions of lifeness (or near
enough). Thus, we can assume that whatever functions and degrees of
performance they possess are also possessed by all other uncontroversially alive
entities.15 Conversely, it is uncontroversial that simple chemical compounds such
as water, sugars, amino acids, purine bases and other organic molecules clearly
are non-living entities. For these reasons, we propose to take common bacteria
such as E. coli as the reference point for the 1-value of lifeness, and simple molecular
compounds such as those mentioned above as the reference point for the 0-value of
lifeness.16
In contrast to the proposal of Bruylants and colleagues, entities are ranked at the
1-value of lifeness not because they belong to one of the three domains of the tree
of life, but because of how they functionally compare to common
bacteria (the latter being taken as paradigmatic reference points for the 1-value).
Likewise, taking simple molecular compounds as functional
reference points for the 0-value of lifeness has the advantage of making superfluous
any criteria of the kind that Bruylants and colleagues specify: any entity that is ranked
at the 0-value of lifeness is so because of how it functionally compares to simple
compounds, not because of how it fulfills a number of specific criteria. Also,
14
Note however that not all entities that are classified within the domain Bacteria would appear as
uncontroversially alive: as mentioned earlier, the status of the endosymbiont Carsonella ruddii and others
clearly is disputed.
15
This provides an additional element of answer to (FG0), namely that the functional dimensions of common
bacteria are the ones to be considered as functional dimensions of lifeness.
16
One anonymous referee pointed out to us the risk that we might be introducing a form of circularity by
adopting bacteria as reference for the 1-value of lifeness (somewhat similar to the circularity of Bruylants
and colleagues). The difference between the two approaches is that, in our case, we take common bacteria
such as E. coli as reference point, and then assess the lifeness of any other entity by comparison to this
reference point, whereas Bruylants et al. consider that any entity of the tree of life automatically deserves to
be at the 1-value of lifeness (relegating the question whether entities ought to be part of the tree of life to
scientists’ classificatory practices).
contrary to Bedau’s triadic proposal, there is no need for any a priori list of
functions and interactions to be had at the 1-value of lifeness (and conversely not
to be had at the 0-value), since the activities and levels of performance with which
we propose to characterize the 1-value (and conversely the 0-value) of lifeness
are those that scientists empirically observe and investigate when studying
common bacteria (and conversely simple molecular compounds). Let us now turn
to the text-mining approach that we used to answer (FG0) and (FG2).
5 Methodological preliminaries
In scientific texts, as in any text, words are not used randomly but in
meaningful associations. As a result, the associative patterns that words form are
informative about their semantic proximity. As Firth stated, “you shall know a
word by the company it keeps” (Firth 1957, 11). Text-mining methods, and most
notably topic-modeling, make use of this underlying principle. For any text
corpus, words can be represented as vectors in a high-dimensional semantic
vector space (SVS), their coordinates depending on the combinatorial patterns
they form in that corpus. Though the technical details may appear complex to the
general reader, the approach is fairly straightforward. In what follows, we briefly
summarize the main steps, and then offer a detailed description of each one in
turn (this detailed description can be skipped without loss; more technical
information has been relegated to a separate SI Appendix document).
In short, text-mining theory has shown that words tend to cluster in regions
which correlate with the subject matters found in corpora. Studying the
partitioning of SVSs therefore makes it possible to identify groups of words,
or “topics”, that tend to co-occur within a given corpus. By using specific
metrics, one can also analyze the relative proximity of topics to one another,
group topics accordingly or build topic proximity networks. One can also study
the extent to which topics relate to specific target words (often those that express
a concept of interest). Our methods here identified 200 topics in a corpus of over
30,000 biological articles and quantified the patterns of associations between
these topics and sets of words representing 7 target entities: bacteria, archaea,
viruses, phages, plasmids, prions and the molecule of adenine. We chose these
entities with a view to providing a tractable and representative set of the types of
entities populating the intuitive grey-zone between the 0- and 1-values of
lifeness.17 At one end of the spectrum, we took bacteria as representative of
intuitively clearly-alive entities; at the other, the molecule of adenine as
representative of an entity that is clearly not-alive. The five other entities were
entities whose lifeness we wanted to assess and that were often mentioned in the
corpus.
Our approach, which can be taken as one particular way to do experimental
philosophy of science, relies on two premises. First, we endorse the view that,
for any of these entities, the more salient a property of that entity is, the more
scientific research it will draw; and the more research is published on a given
topic, the more likely that topic will be retrieved by text-mining algorithms. Our
premise is therefore that the topics that emerge from topic-modeling analyses
include the most typical properties of our target entities as they are characterized
in research papers. Among these topics, those that are the most strongly
associated with the set of target entities we chose shall indicate the main
properties of these entities. Our second premise is that the relative strength with
which topics are associated with the sets of words representing the target entities
captures how relevant any given topic is for any given entity. Our view is
therefore that, given any entity, the more sophisticated its properties are, the more
challenging these properties are for scientists, the more research they trigger, and,
as a result, the stronger the association between these properties and the words
representing that entity will be.18 In other words, we posit that variation in
association strength between entities and topics will give an indication of the
17
Ideally, we would have liked to conduct our analyses at the granularity of species—so as to clearly identify,
in particular, the performance of E. coli and use it as reference point for comparing other species (such as
C. ruddii and others). In practice, this was unfortunately not possible: first, it is quite difficult to identify
which particular species generic words such as ‘bacteria’ or ‘virus’ refer to; second, species names are rare
throughout the corpus, and therefore cannot be reliably used for text-mining purposes (this is a limitation of
the methodology we discuss in section 7). We thereby decided to conduct our analyses at a coarser-grained
level (hence the choice of target entities such as ‘bacteria’, ‘virus’ etc.). This implies that the lifeness of
particular species could not be assessed. It also implies that the reference point for the 1-value of
lifeness is in practice the averaged performance of all bacterial species present in the corpus (and not
exclusively of E. coli). This is an area where further research could be conducted, on larger corpora and
with more sophisticated text-mining tools.
18
As one referee pointed out to us, there are other reasons why properties could be strongly present in
relation to entities throughout the corpus. This could be the case, for instance, for properties of well-studied organisms such as Drosophila melanogaster or Escherichia coli compared to lesser-studied ones. In
our analyses, we did not seek to investigate the lifeness at the level of specific species but adopted a much
coarser-grained perspective (bacteria, archaea, viruses etc.). Such a perspective contributes to averaging out
such possible biases and to justifying our second premise.
relative degree of sophistication of each entity with regard to the properties
depicted by the corresponding topics.
We structured our semi-supervised text-mining analyses in five steps (the
general reader may want to skip ahead to Section 6; if needed, see also
SI Appendix for details). First, the corpus was prepared on the basis of 30,622
full-text articles from 54 journals of the BioMed Central collection published
between 1969 and 2012 (74% of articles were published between 2007 and 2012,
thereby providing a representative state of current scientific knowledge).19 The
corpus was pre-processed in four standard stages: (i) removal of bibliographies,
figures and tables, (ii) filtering out of functional- and stop-words, (iii) filtering
out of rare words, and (iv) lemmatization (i.e., the grouping of the various
inflected forms of a word so they can subsequently be analyzed as a single item).
These operations resulted in a working corpus containing 12,671 different
lemmatized terms in 943,508 paragraphs.
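The pre-processing stages (ii) to (iv) above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the stop-word list, the lemma dictionary, the rarity threshold `min_count` and the three toy paragraphs are all hypothetical stand-ins chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-ins for real linguistic resources (assumptions).
STOP_WORDS = {"the", "of", "are", "a", "and", "in"}
LEMMAS = {"viruses": "virus", "cells": "cell", "bacteria": "bacterium"}

def preprocess(paragraphs, min_count=2):
    """Filter stop words, then rare words, then lemmatize each paragraph."""
    # Stage (ii): tokenize and drop stop words.
    tokenized = [
        [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
        for text in paragraphs
    ]
    # Stage (iii): drop words that are rare across the whole corpus.
    counts = Counter(t for tokens in tokenized for t in tokens)
    # Stage (iv): map each remaining inflected form to its lemma.
    return [
        [LEMMAS.get(t, t) for t in tokens if counts[t] >= min_count]
        for tokens in tokenized
    ]

paragraphs = [
    "The viruses infect the cells of bacteria.",
    "Viruses infect cells rapidly.",
    "Bacteria are cells.",
]
print(preprocess(paragraphs))
```

In this toy run the word "rapidly" occurs only once and is therefore filtered out as rare, while the remaining inflected forms are reduced to their lemmas.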
Second, to construct the “semantic vector space” (SVS) of the corpus, we
measured the co-occurrence frequencies of words in paragraphs by calculating
the mutual information coefficient for all pairs of words. The mutual information
between two words $w_i$ and $w_j$ is calculated as follows:
$$
\begin{aligned}
I(w_i, w_j) ={} & \frac{n_{11}}{N} \log_2 \frac{N \times n_{11}}{(n_{11} + n_{10})(n_{11} + n_{01})} \\
& + \frac{n_{10}}{N} \log_2 \frac{N \times n_{10}}{(n_{10} + n_{11})(n_{10} + n_{00})} \\
& + \frac{n_{01}}{N} \log_2 \frac{N \times n_{01}}{(n_{01} + n_{11})(n_{01} + n_{00})} \\
& + \frac{n_{00}}{N} \log_2 \frac{N \times n_{00}}{(n_{00} + n_{10})(n_{00} + n_{01})}
\end{aligned}
$$
where $n_{11}$ is the number of contexts (i.e. paragraphs) where the words $w_i$ and $w_j$
are co-present; $n_{10}$ is the number of contexts where the word $w_i$ appears but not
the word $w_j$; $n_{01}$ is the number of contexts where $w_i$ does not appear but where
$w_j$ does; $n_{00}$ is the number of contexts where the words $w_i$ and $w_j$ do not appear;
and $N$ is the total number of contexts in the corpus. The resulting SVS is the high-dimensional sparse matrix $[I(w_i, w_j)] \in \mathbb{R}^{m \times m}$, with $m$ the length of the dictionary
19
One of the reasons for choosing the BioMed Central collection was its open-access availability for text-mining via a dedicated API, as well as its diversified content in biology-related journals in which we had
strong reasons to suspect that the target entities we were interested in would be mentioned. See also
discussion in Section 7.
($m$ = 12,671 in the present study) and $I(w_i, w_j)$ the mutual information between
two words $w_i$ and $w_j$. Each word vector $w_i = (I(w_i, w_1), \ldots, I(w_i, w_m))$ models
the combinatorial pattern of this word with all other words in the corpus.
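As an illustration, the mutual information of a word pair can be computed directly from the four paragraph counts described above (both words present, only the first, only the second, neither). The following is a minimal sketch under our own naming assumptions; the function name, the parameter names and the convention that empty cells contribute zero are illustrative choices rather than a description of the actual implementation.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between two words from paragraph counts.

    n11: paragraphs containing both words; n10: only the first word;
    n01: only the second word; n00: neither word.
    """
    n = n11 + n10 + n01 + n00
    # Each cell is paired with its row total (first word present or absent)
    # and its column total (second word present or absent).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n10 + n11, n10 + n00),
        (n01, n01 + n00, n01 + n11),
        (n00, n00 + n01, n00 + n10),
    ]
    total = 0.0
    for count, row_total, col_total in cells:
        if count > 0:  # assumed convention: an empty cell contributes 0
            total += (count / n) * math.log2(n * count / (row_total * col_total))
    return total

# Two words that always occur together or are both absent: 1 bit.
print(mutual_information(50, 0, 0, 50))
# Two words that occur independently of each other: 0 bits.
print(mutual_information(25, 25, 25, 25))
```

The two calls bracket the behavior of the coefficient: it vanishes when the presence of one word says nothing about the presence of the other, and it grows as the two words become more predictive of each other.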
Third, we used k-means to partition the SVS into 200 clusters (the choice of
k=200 was made by trial and error, as no general rule exists; lower values of k resulted in
heterogeneous topics, while larger values led to topic redundancy). One can find
a detailed presentation of the k-means algorithm in many data mining textbooks
(Aggarwal 2015). Its objective function is:
$$
\underset{P}{\arg\max} \sum_{R_j \in P} \sum_{w_i \in R_j} \cos(w_i, c_j)
$$
The algorithm is an iterative moving-centers procedure that shifts the
geometrical center $c_j$ of each region $R_j$ in a partition $P$ so as to iteratively
maximize the proximity $\cos(w_i, c_j)$ between a word vector $w_i$ and its nearest
center $c_j$ in the space, where the cosine $\cos(w_i, w_j)$ between two vectors $w_i$ and $w_j$
is given by:
$$
\cos(w_i, w_j) = \frac{w_i \cdot w_j}{|w_i| \cdot |w_j|}
$$
The larger the cosine between two vectors, the more semantically similar the two
words they represent are. We used Arthur and Vassilvitskii’s optimized
implementation (Arthur and Vassilvitskii 2007). The semantic interpretation of
the clusters was done manually by analyzing the top-50 words closest to each
cluster center. This interpretation was confirmed by comparison with text
excerpts that were retrieved from the corpus on the basis of the compounded
proximity of their words with the center of each cluster. Based on this
interpretation, similar topics were subsequently gathered into groups that we
named “categories”. This grouping of topics into categories was further aided by
the calculation of topic proximity within the SVS. These steps resulted in the
hierarchical clustering of the 200 interpreted topics into 11 higher-level
categories for which we calculated the center vector in the SVS.
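The clustering step can be sketched as a cosine-based variant of k-means. The sketch below is a simplified illustration and not the implementation used here: it takes the initial centers as an argument instead of using the seeding of Arthur and Vassilvitskii, it works on a handful of hypothetical two-dimensional vectors, and all function names are our own.

```python
import math

def unit(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_kmeans(vectors, initial_centers, max_iter=50):
    """Alternate cosine-based assignment and center update until stable."""
    points = [unit(v) for v in vectors]
    centers = [unit(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment: the cosine of unit vectors is their dot product.
        new_labels = [
            max(range(len(centers)), key=lambda j, p=p: dot(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update: move each center to the mean direction of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = unit([sum(col) for col in zip(*members)])
    return labels, centers

# Hypothetical word vectors forming two obvious groups.
word_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels, centers = cosine_kmeans(word_vectors, [word_vectors[0], word_vectors[2]])
print(labels)
```

Normalizing the vectors first means that moving a center to the mean direction of its members is the update that maximizes the summed cosine within that cluster, which is the quantity the objective function above rewards.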
In a fourth step, we focused on the entities whose lifeness we wanted to assess.
We started by defining an a priori set of target entity-keywords that represented
the reference points for the 0- and 1-value of lifeness, respectively a simple
compound (we chose “adenine”) and “bacteria”, as well as a number of other
more-or-less alive entities (“archaea”, “virus”, “phage”, “plasmid” and “prion”).
We then ran an analysis of the corpus, looking for related words so as to enrich
the set of terms describing each target entity (for instance, for “phage”, we
retrieved “prophage” and “bacteriophage” that we grouped together).
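One simple way to look for related words, offered here only as an assumption about how such an enrichment could be carried out, is to rank the vocabulary by cosine proximity to a seed keyword in the SVS. The vectors below are hypothetical three-dimensional toy values and `related_words` is an illustrative helper, not a function of any particular library.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def related_words(seed, word_vectors, top_n=2):
    """Return the top_n words whose vectors are closest to the seed's vector."""
    seed_vec = word_vectors[seed]
    scored = [
        (cosine(seed_vec, vec), word)
        for word, vec in word_vectors.items()
        if word != seed
    ]
    scored.sort(reverse=True)  # highest cosine first
    return [word for _, word in scored[:top_n]]

# Hypothetical toy vectors (assumed values, for illustration only).
word_vectors = {
    "phage": [0.9, 0.1, 0.0],
    "bacteriophage": [0.8, 0.2, 0.0],
    "prophage": [0.7, 0.1, 0.1],
    "climate": [0.0, 0.1, 0.9],
}
print(related_words("phage", word_vectors))
```

Candidate words retrieved in this way would still need to be checked by hand before being grouped with the seed keyword.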
Fifth, to investigate the relationship between entities and categories, we
calculated their pair-wise proximities in the SVS with the cosine metric, which
we then normalized with respect to “bacteria”.20 We also functionally
reinterpreted the categories from the point of view of the entities considered, on
the basis of their most strongly associated topics. To assess the relative ranking
of entities along a single axis, we computed the Euclidean distance of each entity
relative to bacteria in the eight-dimensional lifeness space by giving an equal
weight to each dimension, and normalized to bacteria for the 1-value and adenine
for the 0-value.
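The fifth step can be sketched as follows. The exact rescaling is not spelled out above, so the linear form used here, one minus the distance to bacteria divided by the distance between adenine and bacteria, is an assumption on our part, as are the function names and the two-dimensional toy proximities (the actual space has eight dimensions).

```python
import math

def normalize_to_reference(proximities, reference="bacteria"):
    """Divide each per-dimension proximity by that of the reference entity.

    Assumes the reference entity has a non-zero proximity on every dimension.
    """
    ref = proximities[reference]
    return {
        entity: [p / r for p, r in zip(values, ref)]
        for entity, values in proximities.items()
    }

def lifeness_score(profiles, entity, one="bacteria", zero="adenine"):
    """Assumed rescaling: 1 - d(entity, one) / d(zero, one), equal weights."""
    d_entity = math.dist(profiles[entity], profiles[one])
    d_zero = math.dist(profiles[zero], profiles[one])
    return 1.0 - d_entity / d_zero

# Hypothetical cosine proximities of three entities to two categories.
proximities = {
    "bacteria": [0.40, 0.50],
    "virus": [0.20, 0.50],
    "adenine": [0.08, 0.10],
}
profiles = normalize_to_reference(proximities)
for name in proximities:
    print(name, round(lifeness_score(profiles, name), 3))
```

By construction the two reference entities receive the scores 1 and 0. Under this assumed rescaling, an entity lying farther from bacteria than adenine does would receive a negative score, and an entity that outperforms bacteria on some dimension is penalized for that distance just as it would be for underperforming.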
6 Inferring lifeness from topic modeling
6.1. Topic modeling results
The topic-modeling analyses of the BioMed corpus resulted in the identification
of 200 topics. These topics were individually labeled on the basis of a manual
interpretation of the semantics that emerged from analyzing the top-50 words for
each topic as well as the top-10 text excerpts. Based on this interpretation, the
topics were then manually grouped into 11 categories that made the most sense
given the semantic content of each topic (Table 1). As could be expected, the 200
topics cover a broad range of subject matters.
A first major group of topics concerns cellular and biomolecular research
questions. Categories A and B include topics that concern metabolism broadly
speaking (including digestion, fermentation, oxido-reduction processes, but also
thermodynamic- and energy-related considerations), as well as proteins and
catalysts more specifically (with topics that relate to the proteasome, to amino acids, peptides and proteins). Category C includes topics about cellular and
structural features (for instance topics that refer to organelles, transmembrane
transporters, cell division, or the effect of stress on cells) while category H
20
In short, this metric measures the distance between the average vector that represents the set of topics
belonging to a given category, and the vector that represents a given entity in the semantic vector space
constructed on the basis of the corpus. Note that this approach neutralizes—to a certain extent—possible
biases due to the differing frequencies with which the target entities are mentioned throughout the corpus:
what matters is how often entities and categories are jointly present in the same text excerpts (paragraphs),
independently of how often entities are present in the corpus.
concerns genetics-related topics (notably topics about genes, gene sequences but
also about methylation and post-translational modifications).
INSERT TABLE 1 ABOUT HERE
A second group of topics is more related to organism-environment interactions
broadly speaking. Category D includes topics that relate to cell-environment
interactions (including bacteriophages, signaling, quorum sensing, cell receptors
or motility) as well as interactions with the biotic and abiotic environment at
different scales (climate, geographic distribution, symbiosis). Category E
includes numerous topics that relate to either animals (cattle species, cattle
pathology, insects, crustaceans, fish, poultry, primates) or plants (cereals,
fruits/vegetables, plant morphology and pathology). Category F gathers topics
that directly concern human beings and associated pathologies (including
anatomy, nervous and ocular systems, odontology, stem cells, immune response
or cancer). Finally, we have grouped under category G topics that relate to
evolution and speciation (fitness, adaptation, mutation, recombination, selection,
speciation, phylogeny or taxa).
A third set of topics—that we decided to set aside in what follows—concerns
research applications, methods, experiments, and a few varia: category I gathers
topics that are applications-oriented (for instance about bioactivity,
bioremediation, diagnosis, vaccination, but also various other topics about nanoparticles, epidemiology or clinical studies). Category J relates to experimental
settings (e.g., light and humidity conditions, protein tags, centrifugation,
fluorescence or microscopy) while category K includes methods- and
epistemology-related topics (with topics such as belief, claim, hypothesis
preference, model, reliability).
6.2. Dimensions of lifeness
Topics capture general patterns of speech that spread throughout the corpus. In
our model, dimensions of lifeness are those groups of topics that repeatedly
gravitate around the set of words referring to the 7 target entities. We evaluated
the proximity of the eight topic categories A-H to the set of words representing
the target entities by means of a cosine metric (see Section 5, step 5 of the
methodology). Results show that this proximity significantly depends on the
entities: whereas all topic categories were found to be associated with bacteria, only
some of these categories were strongly associated with the other entities, and with
varying degrees (Fig. 1). These associative patterns capture differing semantic
patterns in the articles that reflect how the target entities are empirically
characterized in the scientific literature. These eight topic categories can
therefore be understood as representing eight major dimensions along which to
assess lifeness.
INSERT FIGURE 1 ABOUT HERE
To facilitate the interpretation, we slightly relabeled three topic categories into
lifeness dimensions from the functional point of view of the entities themselves.
For instance, category F “Humans related” was relabeled “Humans related
(interactions)” to make sense of the semantic association between “bacteria” and
those topics grouped under category F from the point of view of bacteria (indeed,
from the point of view of bacteria, humans represent a wealth of interaction
opportunities). We also sorted them by subject matter (Table 2).
INSERT TABLE 2 ABOUT HERE
Several of these dimensions represent functionalities that are often intuitively
ascribed to living entities in many definitions of life. For instance, metabolism is
strongly present through both dimension A that captures metabolic activity in
terms of chemical processes made possible by thermodynamics, and dimension
B that relates to the ability of synthesizing proteins and other catalysts that are
required for the kinetics of metabolism. Dimension C shows how structural
elements (e.g. membrane, membrane transporters, organelles) fulfill significant
functional roles for some entities. Dimension G concerns the ability of entities to
evolve through natural selection. And dimension H gathers a number of topics
that describe the abilities of entities to possess a genome, replicate, translate and
transcribe it when needed. Interestingly however, other dimensions are also
present, and their relative importance is not always in line with intuition-based
definitions of life. In particular, our results show strong ties between the entities
and their environment broadly construed: entities are not just active on their own,
but interact in many directions, as can be seen with dimension D that concerns
interactions with the abiotic environment (hence “macro”) and other microscopic
entities (e.g. motility, signaling, symbiosis or parasitic interactions, hence the
“micro”). Dimensions E and F can also be interpreted as representing abilities
that entities have developed to interact with plants, animals and humans, notably
in using them as resources for their own benefit. One could argue that such
interactions-related categories are just the result of research biases due to
anthropocentric interests (in the sense that bacteria, for instance, that impact our
health or crop yields likely trigger more research than bacteria that only affect
other bacteria). Yet we take them to reveal a significant dimension of lifeness in
that more-or-less-alive entities do not exist in a vacuum but are strongly
interconnected with their environment in its broadest sense, be it abiotic or
constituted by other entities.21 The fact that bacteria possess strong interaction
capabilities with humans, or plants or animals, is indeed one of the significant
things bacteria do, if one accepts the view that science gives us today as the
best-informed one. Such interaction dimensions therefore matter when engineering a
concept of lifeness that is rooted in the current practice of science.
6.3. Measuring dimensions of lifeness
Taken together, the eight dimensions define a multidimensional “lifeness space”.
Fig. 1 shows how the semantic association patterns vary depending on
dimensions and entities. Note that some entities outperform “bacteria” on a
number of dimensions. In particular, “viruses” rank higher on dimensions E and
F that can be interpreted as abilities to interact with large organisms (humans,
plants and animals). In this sense, viruses can be said to be more alive than
bacteria along those functional dimensions. But “viruses” also rank lower than
21
One anonymous referee pointed out to us a risk of circularity, lifeness scores being assessed along dimensions
that refer to humans, plants and animals, which are themselves living systems. But this circularity is only
apparent: lifeness scores along these dimensions do not depend on whether we consider humans, plants or
animals alive or not. What matters is the extent to which entities—whose lifeness is to be assessed—interact
or not with other entities that are labeled humans, plants or animals, independently of whether humans,
plants or animals are themselves considered alive or not. A second concern is whether our approach might
be too heavily based on life-as-we-know-it, in its existing context. As a result, lifeness dimensions and scores
would change depending on the properties of newly-found entities (e.g. Martian bacteria) or on
modifications in the environment (e.g. humans being present or not, as captured by dimension F). We do not
see this as a problem: the best perspective we can have on lifeness, we argue, is one that emerges from the
scientific practice in its recent state (hence without Martian bacteria and with humans around). Yet, as the
state of science changes, lifeness may also change (Martian bacteria might be discovered that have unheard-of properties or humans may go extinct while robots still pursue the project of assessing lifeness). The
methodology we propose makes it possible to include novel entities and contexts, and revise our construal
of lifeness by re-running the analyses on updated scientific corpora (see also the discussion in section 7).
“bacteria” on all other dimensions, most notably on metabolism and
proteins/catalysis (dimensions A and B) and environment interactions
(dimension D), which is in line with the general characterization of viruses as
lacking metabolism and being incapable of making a living by themselves
(Moreira and López-García 2009). Note that “viruses” rank quite high in terms
of genetics (dimension H), and, interestingly, in terms of cellular/structural
features (dimension C), which may relate to their capsids and other structures.
“Archaea” have a fairly similar pattern to “bacteria”, but rank lower overall along
most themes, despite intuitions that they would be as alive as bacteria.22 Their
lower ranking in metabolism and proteins/catalysis (dimensions A and B) may
be attributable to a narrower range of metabolic activities compared to bacteria.
Similar comments could apply to their cellular/structural features (dimension C).
Their abilities of interaction are also lower than those of “bacteria”, which makes
sense especially when it comes to interactions with higher organisms (dimensions
E and F); however, this is harder to interpret in the case of environmental
interactions (dimension D). “Prions” have a much more restricted pattern. Their
association patterns with dimensions such as metabolism, genetics, evolvability,
environment interactions, cellular/structural features are all very low compared
to “bacteria”. Salient dimensions for “prions” concern catalysis (dimension B),
which is due to prions being proteinaceous entities, and interactions with higher
organisms (dimensions E and F), which also makes sense since prions are
pathogenic agents in these organisms.
“Phages”, “plasmids” and “adenine” have weaker association patterns with all
eight dimensions, when compared to “bacteria”. Note how “phages” stand out
with higher scores on the genetics and environment-interaction dimensions (H, D).
This is in line with intuitions since phages are RNA or DNA viral agents that
target bacteria (and not higher organisms directly). The association pattern of
“plasmids” is relatively stronger along the genetics dimension (H), which is also
to be expected since plasmids are small circular DNA molecules. Finally,
22
The overall lower performance of archaea compared to bacteria could come from two factors. (1) It could
be the case that archaea have been less studied than bacteria, and therefore that less is currently known about
what archaea do (compared to bacteria); in the future, therefore, as publications on archaea increase in
number and research themes, one will likely see an increase in archaea lifeness. On the other hand (2), it
could be the case that archaea are simply less sophisticated in many respects when compared to bacteria. If
this is the case, their relative lifeness compared to bacteria will not change in the future. In any case, one
should bear in mind that the entities’ performance is relative to the current state of knowledge of the scientific
community (as sampled by the corpus).
“adenine” appears to have the narrowest pattern, which results from scoring
lowest (or nearly so) on all dimensions (the fact that adenine is a component of DNA
and RNA, and is involved in metabolism and protein synthesis could explain its
relatively better scores in dimensions A, B and H).
4.4. Overall lifeness
In a second stage of analysis, we used the text-mining analyses to assess the
entities’ relative positioning along a single axis by computing the Euclidean
distance of the target entities in the eight-dimensional lifeness space relative to
bacteria, each dimension being given equal weight (see SI Appendix for a
discussion of the robustness of the results depending on the weights attributed to
each dimension). This resulted in the lifeness gradient represented in Fig. 2.
Results show a relative ranking of entities, from lesser-living ones such as
“adenine”, “prions”, “phages”, “plasmids” up to “bacteria”, with “viruses” and
“archaea” somewhere halfway. While the relatively poor ranking of “archaea”
compared to “bacteria” may be surprising, the overall ranking of all entities
compared to one another fits reasonably well with intuitions (see for instance the
debate over the lifeness status of viruses, which some consider definitely not alive
for lack of metabolic functionalities, yet others consider quite alive owing to
their many other functionalities (Moreira and López-García 2009; Forterre 2010)).
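The distance computation described above can be sketched as follows. This is a minimal illustration only, assuming that each entity is represented by a vector of eight scores (dimensions A to H) in which bacteria score 1 on every dimension. The numerical values, the function name `distance_to_reference` and the optional `weights` argument are hypothetical; they are not the values underlying Fig. 2.

```python
import math

# Hypothetical scores on dimensions A..H, for illustration only
# (bacteria serve as the reference and score 1 on each dimension).
scores = {
    "bacteria": [1.0] * 8,
    "viruses":  [0.3, 0.4, 0.8, 0.5, 1.2, 1.1, 0.6, 0.9],
    "adenine":  [0.2, 0.2, 0.05, 0.05, 0.05, 0.05, 0.05, 0.2],
}

def distance_to_reference(vector, reference, weights=None):
    """Weighted Euclidean distance between an entity and the reference entity."""
    if weights is None:
        weights = [1.0] * len(reference)  # equal weight for each dimension
    return math.sqrt(
        sum(w * (v - r) ** 2 for w, v, r in zip(weights, vector, reference))
    )

reference = scores["bacteria"]
distances = {
    name: distance_to_reference(vec, reference) for name, vec in scores.items()
}

# Entities closest to bacteria come first in the gradient.
gradient = sorted(distances, key=distances.get)
```

Passing a different `weights` list is one way to probe how sensitive the ranking is to the equal-weight assumption. Note that, in this sketch, scoring above bacteria on a dimension increases the distance just as scoring below does, since the Euclidean distance ignores the sign of the difference.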
INSERT FIGURE 2 ABOUT HERE
7 Discussion
The results, generally speaking, are consistent with intuitions about life: lifeness
dimensions that relate to matter-energy, structure, evolution and information-program corroborate existing intuition-based definitions of life. At the level of
entities too, results tend to agree with intuitions: “bacteria” score higher on all
dimensions than most other entities, and all entities generally score high on
dimensions that concur with existing knowledge (for instance “plasmids” scoring
high on information/genetics). Yet the results also reveal often overlooked
factors, namely the significance of interaction-related dimensions: more-or-less
alive entities are not just entities that survive and reproduce, they are also entities
that very significantly interact with all features of their environment, including
other such entities. In this respect, the relatively higher scores of “viruses” along
two environment-interaction dimensions compared to “bacteria” are revealing.
Such results naturally contribute to undermining the binary assumption (Bi)
while supporting a degrees-of-life view that is anchored to the scientific practice.
They also offer elements of response to the questions identified in section 1, in
particular the definition of life question (DL). Indeed, establishing the existence
of a multidimensional grey-zone of lifeness in between what is clearly-non-living
and what is clearly-living leads to construing life as a relatively high degree of
performance along each critical functional activity more-or-less-alive entities
engage in. In other words, we can now formulate a functional gradualist
definition of life that would run as:
(FGDL) Given an entity E, E is alive to the extent that E functions to a high level of
performance along all the dimensions of the lifeness space.
Of course, such a definition does not entail any precise cut-off point: a high level
of performance is all that is required. And this performance is relative to that of
common bacteria, which serve as the reference point for the 1-value of lifeness along
all identified functional dimensions.23
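The role of bacteria as the reference point can be illustrated with a short sketch. It assumes, purely for illustration, that relative scores are obtained by dividing each entity's raw association value on a dimension by that of bacteria on the same dimension; the function name `relative_scores` and the raw values are hypothetical.

```python
def relative_scores(raw, reference="bacteria"):
    """Divide each raw association value by the reference entity's value,
    dimension by dimension, so that the reference scores 1 everywhere."""
    ref = raw[reference]
    return {
        entity: {dim: value / ref[dim] for dim, value in dims.items()}
        for entity, dims in raw.items()
    }

# Hypothetical raw topic-entity association values (illustration only).
raw = {
    "bacteria": {"B": 0.40, "E": 0.20},
    "viruses":  {"B": 0.10, "E": 0.25},
}

rel = relative_scores(raw)
```

Under this assumption an entity can score above 1 on a dimension, as “viruses” do on dimension E in the toy data, without any cut-off point being needed.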
This definition, in turn, makes it possible to answer (DL-E) in a positive way,
since the lifeness space is in itself a means to answering (DL). In addition, the
methodology that we have used to undermine (Bi) also serves as a concrete and
operational answer to (DL-M), addressed within the framework of experimental
philosophy of science and conceptual engineering. Finally, the multifunctional
lifeness space, though not directly answering (DL-O), provides constraints with
regard to accounts of natural kinds, since only those accounts that can
accommodate multidimensional and gradual boundaries between kinds would be
compatible with our account of lifeness space. Independently of the details of
such answers, our main objective was to undermine the binary assumption (Bi)
and provide support to the degrees-of-life thesis. Incidentally, from an
23
Note that an extension of the methodology to entities more complex than bacteria—for instance unicellular
or pluricellular eukaryotes—could place such entities at a higher level of performance than bacteria
(extending the methodology would raise a number of practical issues, such as extending the corpus to
properly cover the entities in question, but we see no principled reason why it would not work). We would
see no problem extending the dimensions in such a way as to reflect this state of affairs, and thereby confer
an even stronger degree of lifeness to such organisms, were it to be the case. Note that viruses already score
higher than 1 with regard to some dimensions. Here our objective was to undermine the binary assumption
(Bi) and to re-engineer the concept of life so as to better capture the existence of different degrees and modes
of lifeness as suggested by the current state of science. Hence our focus on entities that can intuitively be
characterized as belonging to the grey-zone of lifeness.
experimental philosophy of science perspective, we also wanted to show how
data-mining methodologies could provide insights for conceptual explication or
engineering grounded in the practice of science.
Let us stress again the empirical basis of our findings: the lifeness space, its
dimensions and scales are all inferred from the way researchers describe what
they find when studying such more-or-less-alive entities. In this respect, the
lifeness space tracks the diversity of more-or-less-alive entities that, in fact,
populate our world as science currently best describes it. It also stresses the
importance of functionally classifying more-or-less-alive entities. Indeed, these
entities play significant epistemic roles: they structure research programs within
biology or at the interface with biology (e.g., virology, research on prions, on
protocells etc.), and make possible the sort of inductive inferences that
taxonomies enable in other areas of science. In addition, the dimensions of the
lifeness space highlight the underlying—and otherwise hidden—reasons for
ranking some entities as more-or-less-alive than others, and by so doing, explain
why some such entities appear to us as somehow more alive than a simple
molecule yet maybe not as alive as a common bacterium.
The lifeness space also stresses the empirical and conceptual significance of
reformulating questions of the type “Is X alive?” into questions of the type “What
is the lifeness of X?”. This matters strongly to such scientific endeavors as
astrobiology or synthetic biology where the problem of defining life is so often
discussed. Stating that life has been found on other planets or has been created
anew in the test tube is at best misleading and should be replaced by statements
of the kind “Entity E has been identified, with a lifeness of l_i along dimension
d_i”. Replacing the categorical view of life by a multidimensional and gradual
lifeness space is—we believe—fruitful in that it opens up the possibility to
compare entities of different types and to understand how they differ in lifeness.
It illustrates not only that an entity may be more-or-less-alive, but also that some
degree of lifeness can be achieved in quite specific ways, hence also encouraging
the development of a taxonomy of more-or-less-alive entities.
Text-mining constraints unfortunately limit the robust application of the
methods to words that are relatively frequent in the corpus. It was thus not
possible to find out how entities such as “protocell” or “LUCA” (Last Universal
Common Ancestor) scored in terms of lifeness as these words appear too rarely
(in <100 paragraphs). It was also not possible to assess lifeness at the level of
specific species for each one of the different target entities (species of bacteria,
archaea, viruses etc.). Comparing lifeness between, for instance, the common
bacterium E. coli and the reduced endosymbiont C. ruddii would have been very
interesting.24 Yet species names are usually rare words. In addition, these words
are often replaced by generic words—such as “bacteria”—and it is still difficult,
from a methodological point of view, to identify which species these generic
words refer to. This is an area that could benefit from novel algorithms for corpus
pre-treatment (step 1 of the methodology described in section 5), and from larger
corpora (so as to increase the number of occurrences of rare yet interesting
words). Another limitation is that the methods rely on a symmetric measure of
proximity and therefore cannot account for such asymmetric relationships as “X
has property Y”. This limitation typically shows in the results for “adenine” (as
mentioned above, adenine cannot be said to have the property of metabolism, yet
the results show a non-null association with this theme due to the fact that adenine
participates in metabolism). One should note that operations such as topic
interpretation and clustering into categories involve manual operations and
methodological choices. To buffer against possible biases, we checked the overall
robustness of the results by testing different granularities of topic modeling (that
is to say, by implementing models with fewer or more than 200 topics) as well as
different measures of topic-entity association (e.g. conditional probability
measures vs. cosine metrics). For all topics, we also retrieved the top-10 most
strongly associated text excerpts to confirm topic interpretation and computed
inter-topic distances to assist with clustering (see SI Appendix for details).
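The difference between a symmetric proximity measure and a directional one can be sketched as follows. This is a toy illustration under stated assumptions: paragraphs are reduced to presence/absence flags, the data are invented, and the two functions are generic textbook versions of the cosine and conditional probability measures, not the implementation used for the analyses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: symmetric in u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def conditional_probability(target_flags, given_flags):
    """Estimate P(target | given) as the share of 'given' paragraphs
    in which the target is also present."""
    both = sum(1 for t, g in zip(target_flags, given_flags) if t and g)
    return both / sum(given_flags)

# Hypothetical presence/absence over six paragraphs (illustration only).
metabolism = [1, 1, 1, 1, 0, 0]  # paragraph is about the "metabolism" topic
adenine    = [1, 1, 0, 0, 0, 0]  # paragraph mentions "adenine"

sym_ab = cosine(metabolism, adenine)
sym_ba = cosine(adenine, metabolism)
p_topic_given_entity = conditional_probability(metabolism, adenine)
p_entity_given_topic = conditional_probability(adenine, metabolism)
```

In this toy case the cosine value is the same in both directions, whereas the two conditional probabilities differ. Both measures nevertheless rest on co-occurrence alone: they register that “adenine” appears in paragraphs about metabolism, not whether adenine has metabolism as a property or merely participates in it.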
Text-mining analyses are also dependent on the corpus (in an obvious sense,
since such analyses, of course, aim at revealing the latent semantic content of the
corpora being studied). In the present case, this raises the question whether
similar analyses conducted on a different corpus would have produced a
significantly different lifeness space or not. To answer this question, let us
consider the main stages of our analyses. With regard to the topic modeling
itself, it is likely that the list of topics would have been different on a different
corpus.25 Yet, our methodology is such that, for building the lifeness space, we
24
Note however that studying lifeness at the level of species may raise other methodological difficulties, in
particular linked to the fact that certain species of organisms are more studied than others due to their being
model organisms or simply because they are easier to study (e.g. micro-organisms that can be cultivated as
opposed to those that cannot), or still because they affect human health or economy. In this respect, adopting
a more aggregated level averages out these differences (see also footnote 17).
25
Imagine a corpus that would have included journals in astronomy or physics, or even on origins of life,
synthetic biology or theoretical biology: other topics specific to these disciplines would have emerged (e.g.,
only retained topic-categories that were present in those text excerpts
(paragraphs) that mentioned the target entities we were interested in. As a result,
there are good reasons to believe that many novel topics—not found in
conjunction with our target entities—would not have made it as dimensions of
lifeness, though some may have. However, for a novel topic-category to create a
novel dimension of lifeness, many new topics would have to frequently be found
in conjunction with the target entities. This is not impossible, but we doubt that it
is likely. A more likely outcome is that novel topics could integrate and modify
existing topic-categories, hence dimensions of lifeness. The question of interest
then becomes whether these topics would significantly influence the relative
ranking of entities along the dimensions of lifeness. Our approach provides some
robustness in this respect, since we averaged out topic-entity proximity measures
at the level of topic-categories. For novel topics to strongly change lifeness
scores, they would have to be either relatively numerous or very strongly
correlated to the target entities. We therefore believe that this two-stage process—
first, selection of only those text paragraphs that included target entities; second,
definition of lifeness dimensions by averaging topics into categories—provides
robustness to the results (though, of course, the only way to be certain would be
to conduct the analysis on different corpora).26
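The two-stage process can be sketched in a few lines. This is a simplified illustration, assuming that each paragraph comes with topic weights produced by a topic model and that each topic has already been assigned to a topic-category. The function name `category_scores`, the data layout, the topic names and the category label "X" are all hypothetical.

```python
from statistics import mean

def category_scores(paragraphs, entity, topic_to_category):
    """Two-stage sketch of how lifeness dimensions are scored for one entity."""
    # Stage 1: retain only the paragraphs that mention the target entity.
    kept = [p for p in paragraphs if entity in p["text"].lower()]
    if not kept:
        return {}
    # Topic-entity proximity: mean topic weight over the retained paragraphs.
    proximity = {
        topic: mean(p["topics"].get(topic, 0.0) for p in kept)
        for topic in topic_to_category
    }
    # Stage 2: average the topic-level proximities within each topic-category.
    grouped = {}
    for topic, category in topic_to_category.items():
        grouped.setdefault(category, []).append(proximity[topic])
    return {category: mean(values) for category, values in grouped.items()}

# Hypothetical corpus of three paragraphs (illustration only).
paragraphs = [
    {"text": "Plasmids carry resistance genes.",
     "topics": {"gene_transfer": 0.6, "replication": 0.2}},
    {"text": "Plasmids replicate independently.",
     "topics": {"replication": 0.8}},
    {"text": "Stars form in molecular clouds.",
     "topics": {"astronomy": 0.9}},
]
topic_to_category = {"gene_transfer": "H", "replication": "H", "astronomy": "X"}

scores = category_scores(paragraphs, "plasmids", topic_to_category)
```

In this toy corpus the astronomy topic never co-occurs with the target entity, so its category receives a score of zero and would not be retained as a dimension. A single new topic added to category H would shift that category's score only in proportion to its share of the average.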
That being said, the text-mining methods provide empirical grounding to
conceptual claims and offer the advantage of fallibility: there was indeed no
guarantee at the outset that entities would display differing association patterns
depending on activities, and no guarantee that this would generally fit with
intuitions while revealing interesting novel insights. In addition, conducting
similar analyses on other corpora has the potential to confirm or disconfirm our
findings.
The methods are also generalizable. By adding research articles, in particular
about novel entities—be they entities possibly found on Mars or synthesized in
about ‘stars’, ‘planets’, ‘elementary particles’ or about ‘prebiotic chemistry’, ‘engineered micro-organisms’
or ‘theoretical models’). These additional topics could then result in either creating novel topic-categories
or modifying existing ones. Yet one also has to consider whether these new topics are correlated or not to
the target entities: while one rarely talks about ‘stars’ and ‘bacteria’ in the same paragraphs, some articles
may consider ‘microbial contamination’ in the context of space exploration or investigate ‘genetically
engineered properties’ of bacteria. Hence the possibility for novel topics to influence the lifeness space.
26
Interestingly, adding texts that specifically focus on the question of defining life is unlikely to affect the
results. Indeed, such texts typically weigh the relative significance of different criteria for life, yet rarely
mention target entities. As a result, few of their paragraphs would be retained.
the laboratory (provided there are enough publications to overcome the
methodological limitations noted above)—the methods make it possible to assess
novel lifeness scores. Note also that re-running the analyses as novel scientific
articles accumulate may result in a re-evaluation of the lifeness dimensions. The
eight dimensions here identified stem from the best understanding we have of
nature as provided by the current status of science. But the methods we use
provide a means of re-assessing that understanding as science unfolds. As a
result, novel entities that we discover or create will likely influence the
dimensions that make most sense to capture their diverse functionalities as well
as the relative ranking of existing entities along such new or modified
dimensions. Explicating concepts by means of experimental philosophy of
science approaches and data-mining methodologies has the drawback of being
revisable depending on how science unfolds, but also has the advantage of
anchoring our concepts and representational devices to the most up-to-date
science. If concepts are to be more exact and fruitful, the drawbacks of the
framework are, we would argue, outweighed by its advantages.
8 Conclusion
Many of the controversies surrounding the problem of defining life stem from
the need to justify the choice of particular properties of life over others, and from
the assumption that these properties should translate into a categorical binary
view of life. We see this as a symptom of conceptual inadequacy, hence our
proposal to engineer a concept of lifeness that better accords with the current
status and practice of science. Adopting the stance of experimental philosophy of
science, we have mobilized text-mining methodologies to extract, from a subset
of the scientific literature, the contours of a lifeness space. We hope to have
shown how such methodologies can contribute to explicating and engineering
concepts in ways that are tightly connected to the practice of science. Our focus
was on the binary assumption of life. Our results go against that assumption and
justify a degrees-of-life view stemming from multiple gradual dimensions of
lifeness. Also, by making it possible to compute lifeness scores for different
more-or-less-alive entities, the quantitative and data-driven methodologies that
we implemented shed new light on the grey-zone of lifeness that is found in
between clearly-living entities (such as bacteria) and clearly non-living ones
(such as small organic molecules). If this view is correct, one should no longer
worry about defining life in a categorical way. Rather, one ought to tackle the
task of further identifying and classifying types of more-or-less-alive entities. We
see this as one of the ways to assess the fruitfulness of our conceptual
reengineering. Another is the extent to which it will, in turn, open up new
questions about the historical or genealogical relationships, if any, that obtain
between these types of entities, as well as about the predominance, or not, of
certain types of entities in certain areas of the lifeness space. As (very) large
organisms, we, human beings, have often had the tendency to conceive of nature
as consisting of two broad categories: life and non-life. At the finer-grained
microscopic scale where it all started on Earth, four billion years ago, and where
so much still takes place today, there is no such distinction: it is all a matter of
shades.
Acknowledgements
Access to the BioMed Central Collection is gratefully acknowledged. The
authors thank Marc Bedau, Mark Ereshefsky, Michel Morange, Kepa Ruiz-Mirazo and Eran Tal for comments on earlier versions or parts of the manuscript.
They also thank the audience of the 2015 “Origins” conference organized by
COST Action TD 1308, as well as the participants in the UQAM and McGill
2017 conferences where this work was presented. Thanks are also extended to
three anonymous reviewers for Synthese for thoughtful suggestions. Research
conducted with funding from Canada Research Chair Program (Grant 950-230795), Canada Foundation for Innovation (Grant 34555), and Canada Social
Sciences and Humanities Research Council (Grant 430-2018-00899).
References
Aggarwal, Charu C. 2015. Data Mining: The Textbook. Springer.
Arthur, David, and Sergei Vassilvitskii. 2007. “K-Means++: The Advantages of
Careful Seeding.” In Proceedings of the Eighteenth Annual ACM-SIAM
Symposium on Discrete Algorithms, 1027–1035. Society for Industrial and
Applied Mathematics.
Ashkenasy, Gonen, Reshma Jagasia, Maneesh Yadav, and M. Reza Ghadiri.
2004. “Design of a Directed Molecular Network.” Proceedings of the
National Academy of Sciences of the United States of America 101 (30):
10872–10877.
Bains, William. 2014. “What Do We Think Life Is? A Simple Illustration and Its
Consequences.” International Journal of Astrobiology 13 (Special Issue 02):
101–111. https://doi.org/10.1017/S1473550413000281.
Bedau, Mark A. 2011. “A Functional Account of Degrees of Minimal Chemical
Life.” Synthese 185 (1): 73–88. https://doi.org/10.1007/s11229-011-9876-x.
Benner, Steven A. 2010. “Defining Life.” Astrobiology 10 (10): 1021–30.
https://doi.org/10.1089/ast.2010.0524.
Bich, Leonardo, and Sara Green. 2018. “Is Defining Life Pointless? Operational
Definitions at the Frontiers of Biology.” Synthese 195 (9): 3919–46.
https://doi.org/10.1007/s11229-017-1397-9.
Bitbol, M., and P. L. Luisi. 2004. “Autopoiesis with or without Cognition:
Defining Life at Its Edge.” Journal of The Royal Society Interface 1 (1): 99–
107. https://doi.org/10.1098/rsif.2004.0012.
Blain, J. Craig, and Jack W. Szostak. 2014. “Progress Toward Synthetic Cells.”
Annual Review of Biochemistry 83 (1): 615–40.
https://doi.org/10.1146/annurev-biochem-080411-124036.
Boden, M. 1999. “Is Metabolism Necessary?” The British Journal for the
Philosophy of Science 50 (2): 231–48. https://doi.org/10.1093/bjps/50.2.231.
Boyd, Richard. 1999. “Homeostasis, Species, and Higher Taxa.” In Species:
New Interdisciplinary Essays, edited by R. A. Wilson, 141–85. MIT Press.
Brun, Georg. 2016. “Explication as a Method of Conceptual Re-Engineering.”
Erkenntnis 81 (6): 1211–41. https://doi.org/10.1007/s10670-015-9791-5.
Bruylants, Gilles, Kristin Bartik, and Jacques Reisse. 2010. “Is It Useful to
Have a Clear-Cut Definition of Life? On the Use of Fuzzy Logic in Prebiotic
Chemistry.” Origins of Life and Evolution of Biospheres 40 (2): 137–43.
https://doi.org/10.1007/s11084-010-9192-3.
Burgess, Alexis, and David Plunkett. 2013a. “Conceptual Ethics I.” Philosophy
Compass 8 (12): 1091–1101. https://doi.org/10.1111/phc3.12086.
———. 2013b. “Conceptual Ethics II.” Philosophy Compass 8 (12): 1102–10.
https://doi.org/10.1111/phc3.12085.
Cairns-Smith, Alexander Graham. 1982. Genetic Takeover and the Mineral
Origins of Life. Cambridge: Cambridge University Press.
Cappelen, Herman. 2018. Fixing Language: An Essay on Conceptual
Engineering. First edition. Oxford, United Kingdom: Oxford University
Press.
Carnap, Rudolf. 1950. Logical Foundations of Probability. Chicago: University
of Chicago Press.
Cleland, Carol E., and Christopher F. Chyba. 2007. “Does ‘life’ Have a
Definition?” In Planets and Life: The Emerging Science of Astrobiology,
edited by Woodruff T. Sullivan III and John A. Baross, 119–31. Cambridge:
Cambridge University Press.
De Duve, Christian. 2005. “The Onset of Selection.” Nature 433: 581–82.
Dennett, Daniel C. 1995. Darwin’s Dangerous Idea: Evolution and the Meanings
of Life. New York: Simon and Schuster.
Diéguez, Antonio. 2013. “Life as a Homeostatic Property Cluster.” Biological
Theory 7 (2): 180–86. https://doi.org/10.1007/s13752-012-0052-4.
Dimmock, Nigel, Andrew Easton, and Keith Leppard. 2009. Introduction to
Modern Virology. John Wiley & Sons.
Duhr, Stefan, and Dieter Braun. 2006. “Why Molecules Move along a
Temperature Gradient.” Proceedings of the National Academy of Sciences of
the United States of America 103 (52): 19678–82.
https://doi.org/10.1073/pnas.0603873103.
Dupré, John, and Maureen A. O’Malley. 2009. “Varieties of Living Things: Life
at the Intersection of Lineage and Metabolism.” Philosophy & Theory in
Biology 1:e003.
Eigen, Manfred. 1992. Steps towards Life: A Perspective on Evolution. Oxford:
Oxford University Press.
http://repository.library.georgetown.edu/xmlui/handle/10822/545290.
Eklund, Matti. 2015. “Intuitions, Conceptual Engineering, and Conceptual
Fixed Points.” In The Palgrave Handbook of Philosophical Methods, edited
by Chris Daly, 363–385.
Ellis, Brian. 2001. Scientific Essentialism. Cambridge University Press.
England, Jeremy L. 2013. “Statistical Physics of Self-Replication.” The Journal
of Chemical Physics 139 (12): 121923. https://doi.org/10.1063/1.4818538.
Ferreira Ruiz, María J., and Jon Umerez. 2018. “Dealing with the Changeable
and Blurry Edges of Living Things: A Modified Version of Property-Cluster
Kinds.” European Journal for Philosophy of Science 8 (3): 493–518.
https://doi.org/10.1007/s13194-018-0210-z.
Firth, John R. 1957. “A Synopsis of Linguistic Theory 1930–1955.” In Studies
in Linguistic Analysis, 1–32. Oxford: Blackwell.
Forterre, Patrick. 2006. “The Origin of Viruses and Their Possible Roles in
Major Evolutionary Transitions.” Virus Research 117 (1): 5–16.
———. 2010. “Defining Life: The Virus Viewpoint.” Origins of Life and
Evolution of Biospheres 40 (2): 151–60. https://doi.org/10.1007/s11084-010-9194-1.
Galperin, Michael Y., Kira S. Makarova, Yuri I. Wolf, and Eugene V. Koonin.
2015. “Expanded Microbial Genome Coverage and Improved Protein Family
Annotation in the COG Database.” Nucleic Acids Research 43 (Database
issue): D261–69. https://doi.org/10.1093/nar/gku1223.
Gánti, Tibor. 2003. The Principles of Life. Oxford; New York: Oxford
University Press.
Gilbert, Walter. 1986. “Origin of Life: The RNA World.” Nature 319 (6055):
618.
Godfrey-Smith, Peter. 2009. Darwinian Populations and Natural Selection.
Oxford; New York: Oxford University Press.
Griesemer, James. 2003. “The Philosophical Significance of Gánti’s Work.” In
The Principles of Life, by Tibor Gánti, 169–94. Oxford: Oxford University
Press.
Hazen, Robert M. 2005. Genesis: The Scientific Quest for Life’s Origin.
Washington, DC: Joseph Henry Press.
Hutchison, Clyde A., Scott N. Peterson, Steven R. Gill, Robin T. Cline, Owen
White, Claire M. Fraser, Hamilton O. Smith, and J. Craig Venter. 1999.
“Global Transposon Mutagenesis and a Minimal Mycoplasma Genome.”
Science 286 (5447): 2165–2169.
Joyce, Gerald F. 1994. “Foreword.” In Origins of Life: The Central Concepts,
edited by David W. Deamer and Gail R. Fleischaker, xi–xii. Oxford: Jones
and Bartlett.
Justus, James. 2012. “Carnap on Concept Determination: Methodology for
Philosophy of Science.” European Journal for Philosophy of Science 2 (2):
161–79. https://doi.org/10.1007/s13194-011-0027-5.
Khalidi, Muhammad Ali. 1998. “Natural Kinds and Crosscutting Categories.”
The Journal of Philosophy 95 (1): 33–50.
Kitcher, Philip. 2008. “Carnap and the Caterpillar.” Philosophical Topics 36 (1):
111–27.
Knuuttila, Tarja, and Andrea Loettgers. 2017. “What Are Definitions of Life
Good for? Transdisciplinary and Other Definitions in Astrobiology.” Biology
& Philosophy 32 (6): 1185–1203. https://doi.org/10.1007/s10539-017-9600-4.
Koshland, Daniel E. 2002. “The Seven Pillars of Life.” Science 295 (5563):
2215–2216.
La Scola, Bernard, Christelle Desnues, Isabelle Pagnier, Catherine Robert, Lina
Barrassi, Ghislain Fournous, Michèle Merchat, et al. 2008. “The Virophage
as a Unique Parasite of the Giant Mimivirus.” Nature 455 (7209): 100–104.
https://doi.org/10.1038/nature07218.
Lange, Marc. 1996. “Life, ‘Artificial Life,’ and Scientific Explanation.”
Philosophy of Science, 225–244.
Lederberg, Joshua. 1960. “Exobiology: Approaches to Life beyond the Earth.”
Science 132 (3424): 393–400. https://doi.org/10.1126/science.132.3424.393.
Lincoln, T. A., and G. F. Joyce. 2009. “Self-Sustained Replication of an RNA
Enzyme.” Science 323 (5918): 1229–32.
https://doi.org/10.1126/science.1167856.
Luisi, Pier Luigi. 1998. “About Various Definitions of Life.” Origins of Life
and Evolution of the Biosphere 28 (4–6): 613–622.
———. 2006. The Emergence of Life: From Chemical Origins to Synthetic
Biology. Cambridge: Cambridge University Press.
Machery, Edouard. 2012. “Why I Stopped Worrying about the Definition of
Life... and Why You Should as Well.” Synthese 185 (1): 145–64.
https://doi.org/10.1007/s11229-011-9880-1.
———. 2016. “Experimental Philosophy of Science.” In A Companion to
Experimental Philosophy, edited by Justin Sytsma and Wesley Buckwalter,
473–90. John Wiley & Sons, Ltd.
https://doi.org/10.1002/9781118661666.ch33.
———. 2017. Philosophy within Its Proper Bounds. New York: Oxford
University Press.
Madigan, Michael T. 2015. Brock Biology of Microorganisms. Fourteenth
edition. Boston: Pearson.
Maher, Patrick. 2007. “Explication Defended.” Studia Logica 86 (2): 331–41.
https://doi.org/10.1007/s11225-007-9063-8.
Malaterre, Christophe. 2010a. Les origines de la vie: émergence ou explication
réductive? Paris: Hermann.
———. 2010b. “Lifeness Signatures and the Roots of the Tree of Life.”
Biology & Philosophy 25 (4): 643–58. https://doi.org/10.1007/s10539-010-9220-8.
———. 2010c. “On What It is to Fly Can Tell Us Something About What It is
to Live.” Origins of Life and Evolution of Biospheres 40 (2): 169–77.
https://doi.org/10.1007/s11084-010-9196-z.
———. 2013. “Microbial Diversity and the Lower-Limit Problem of
Biodiversity.” Biology & Philosophy 28: 219–39.
Mariscal, Carlos, and W. Ford Doolittle. 2018. “Life and Life Only: A Radical
Alternative to Life Definitionism.” Synthese, July.
https://doi.org/10.1007/s11229-018-1852-2.
Martin, W., and M. J. Russell. 2003. “On the Origins of Cells: A Hypothesis for
the Evolutionary Transitions from Abiotic Geochemistry to
Chemoautotrophic Prokaryotes, and from Prokaryotes to Nucleated Cells.”
Philosophical Transactions of the Royal Society B: Biological Sciences 358
(1429): 59–85. https://doi.org/10.1098/rstb.2002.1183.
Maynard Smith, John, and Eörs Szathmáry. 1997. The Major Transitions in
Evolution. Oxford: Oxford University Press.
Mix, Lucas John. 2015. “Defending Definitions of Life.” Astrobiology 15 (1):
15–19. https://doi.org/10.1089/ast.2014.1191.
Moreira, David, and Purificación López-García. 2009. “Ten Reasons to Exclude
Viruses from the Tree of Life.” Nature Reviews Microbiology 7 (4): 306–
311.
Moreno, Alvaro, and Matteo Mossio. 2015. Biological Autonomy. Vol. 12.
History, Philosophy and Theory of the Life Sciences. Dordrecht: Springer
Netherlands. https://doi.org/10.1007/978-94-017-9837-2.
Morowitz, Harold J. 1992. Beginnings of Cellular Life: Metabolism
Recapitulates Biogenesis. New Haven: Yale University Press.
Nakabachi, A., A. Yamashita, H. Toh, H. Ishikawa, H. E. Dunbar, N. A. Moran,
and M. Hattori. 2006. “The 160-Kilobase Genome of the Bacterial
Endosymbiont Carsonella.” Science 314 (5797): 267–267.
https://doi.org/10.1126/science.1134196.
Nghe, Philippe, Wim Hordijk, Stuart A. Kauffman, Sara I. Walker, Francis J.
Schmidt, Harry Kemble, Jessica A. M. Yeates, and Niles Lehman. 2015.
“Prebiotic Network Evolution: Six Key Parameters.” Molecular BioSystems
11 (12): 3206–3217.
Norman, Anders, Lars H. Hansen, and Søren J. Sørensen. 2009. “Conjugative
Plasmids: Vessels of the Communal Gene Pool.” Philosophical Transactions
of the Royal Society B: Biological Sciences 364 (1527): 2275–2289.
Palacci, J., S. Sacanna, A. P. Steinberg, D. J. Pine, and P. M. Chaikin. 2013.
“Living Crystals of Light-Activated Colloidal Surfers.” Science 339 (6122):
936–40. https://doi.org/10.1126/science.1230020.
Pályi, Gyula, Claudia Zucchi, and Luciano Caglioti. 2002. Fundamentals of
Life. Paris: Elsevier.
Pirie, N. W. 1937. “The Meaninglessness of the Terms Life and Living.” In
Perspectives in Biochemistry, edited by Joseph Needham and David E.
Green, 11–22. Cambridge: Cambridge University Press.
Popa, Radu. 2004. Between Necessity and Probability: Searching for the
Definition and Origin of Life. Vol. 2. Springer.
Prigogine, Ilya, Gregoire Nicolis, and Agnes Babloyantz. 1972.
“Thermodynamics of Evolution.” Physics Today 25 (12): 38–44.
https://doi.org/10.1063/1.3071140.
Prusiner, Stanley B. 1982. “Novel Proteinaceous Infectious Particles Cause
Scrapie.” Science 216 (4542): 136–144.
Raoult, D. 2004. “The 1.2-Megabase Genome Sequence of Mimivirus.” Science
306 (5700): 1344–50. https://doi.org/10.1126/science.1101485.
Raulin, François. 2010. “Searching for an Exo-Life in the Solar System.”
Origins of Life and Evolution of Biospheres 40 (2): 191–93.
https://doi.org/10.1007/s11084-010-9199-9.
Ruiz-Mirazo, Kepa, Juli Peretó, and Alvaro Moreno. 2004. “A Universal
Definition of Life: Autonomy and Open-Ended Evolution.” Origins of Life
and Evolution of the Biosphere 34 (3): 323–346.
Saunders, Keith, and John Stanley. 1999. “A Nanovirus-like DNA Component
Associated with Yellow Vein Disease of Ageratum Conyzoides: Evidence for
Interfamilial Recombination between Plant DNA Viruses.” Virology 264 (1):
142–152.
Schupbach, Jonah N. 2017. “Experimental Explication.” Philosophy and
Phenomenological Research 94 (3): 672–710.
https://doi.org/10.1111/phpr.12207.
Schuster, Peter. 1984. “Evolution between Chemistry and Biology.” Origins of
Life and Evolution of Biospheres 14 (1): 3–14.
Seager, S., W. Bains, and J. J. Petkowski. 2016. “Toward a List of Molecules as
Potential Biosignature Gases for the Search for Life on Exoplanets and
Applications to Terrestrial Biochemistry.” Astrobiology 16 (6): 465–85.
https://doi.org/10.1089/ast.2015.1404.
Segré, Daniel, Dafna Ben-Eli, David W. Deamer, and Doron Lancet. 2001.
“The Lipid World.” Origins of Life and Evolution of the Biosphere 31 (1–2):
119–145.
Shepherd, Joshua, and James Justus. 2015. “X-Phi and Carnapian Explication.”
Erkenntnis 80 (2): 381–402. https://doi.org/10.1007/s10670-014-9648-3.
Smith, Kelly C. 2016. “Life Is Hard: Countering Definitional Pessimism
Concerning the Definition of Life.” International Journal of Astrobiology 15
(4): 277–89. https://doi.org/10.1017/S1473550416000021.
Tirard, Stephane, Michel Morange, and Antonio Lazcano. 2010. “The
Definition of Life: A Brief History of an Elusive Scientific Endeavor.”
Astrobiology 10 (10): 1003–1009.
Trifonov, Edward N. 2011. “Vocabulary of Definitions of Life Suggests a
Definition.” Journal of Biomolecular Structure and Dynamics 29 (2): 259–
266.
Figures and tables
Table 1. List of categories and topics per category
Categories (number of topics): Topics
A-Metabolism (13): CARBOHYDRATE METABOLISM; CELLULOSE DIGESTION; FERMENTATION; ENERGY DISSIPATION; THERMODYNAMICS; FATTYACIDS/PHOSPHOLIPIDS; LIPIDS/CHOLESTEROL; NUTRITION; COENZYMES/ENZYMES/CATALYSTS; COFACTORS; METABOLITES; OTHER ORGANICS; OXYDO-REDUCTION
B-Proteins/catalysis (10): PROTEOSOME; PEPTIDES; PROTEINS; GLYCAN; PROTEIN INTERACTIONS; AMINO-ACIDS; MOIETY; MOLECULAR STRUCTURE; PROTEIN MOTIFS; PROTEIN STRUCTURE
C-Cellular/structural features (8): CELL DIVISION; CYTOPLASM/NUCLEUS; CYTOSKELETON; ORGANELLES; ABIOTIC STRESS; CHELATION/METALTRANSPORTERS; OXYDATIVE STRESS; TRANSMEMBRANE TRANSPORTERS
D-Micro/macro environment (12): BACTERIOPHAGES; CELL RECEPTORS; DEPOLARIZATION; EXTRACELLULAR MATRIX; SIGNALING; CLIMATE; GEOGRAPHIC DISTRIBUTION; MARINE; PHYLOGEOGRAPHY; SYMBIOSIS; MOTILITY; QUORUM SENSING/BIOFILM
E-Plants/animals related (20): CATTLE MEAT; CATTLE PATHOLOGY; CATTLE SPECIES; INSECT GENOMES; INSECTS/CRUSTACEANS; LARVAE; MOSQUITOES; OOGENESIS; PARASITIC INFECTIONS; SALIVARY GLANDS (TICKS, MOSQUITOES); SEXUAL REPRODUCTION; SOCIAL INSECTS; FISH; POULTRY/AVIAN VIRUSES; PRIMATES; CEREALS; FRUITS/VEGETABLES; OTHER PLANTS; PLANT MORPHOLOGY; PLANT PATHOLOGY
F-Humans related (23): BLOOD; BLOOD CIRCULATION; GEOMETRY; HISTOPATHOLOGY/LESIONS; MORPHOLOGY; MORPHOLOGY/ANATOMY; NERVOUS SYSTEM; OCULAR; ODONTOLOGY; ONTOGENY; OSSIFICATION; VENTRICULAR; CELL LINES; DNA DAMAGE; EPITHELIUM CELL DIFFERENTIATION; IMMUNE RESPONSE; INDUCED APOPTOSIS; LEUKEMIA/CANCER; STEM CELLS; TISSUE INFLAMMATION; TRANSPLANTATION; TUMORS; PATHOGENICITY/INTESTINAL
G-Variation/adaptation/speciation (16): ADAPTATION; ADAPTATIONISM; FITNESS/COMPETITION; CHROMOSOMES; MUTATION; POLYPLOIDY; RECOMBINATION; TRANSPOSONS; ENDOSYMBIOSIS; GENETIC DRIFT; GENOTYPE; LINEAGE/EVOLUTION; PHYLOGENY; SEGREGATION/GENOTYPE; SPECIATION; TAXA
H-Genetics (13): METHYLATION; METHYLATION/CHROMATIN; POSTTRANSLATIONAL MODIFICATIONS; DATA-BASE; GENE SEQUENCES; SEQUENCES; REPLICATION; TRANSCRIPTION FACTORS; TRANSLATION; ADDITIONAL DATA; GENE EXPRESSION/TRANSCRIPTION; GENE TRANSCRIPTION; MRNA/SPLICING
I-Applications/varia (19): BIOACTIVITY; ORGANIC CHEMISTRY; BACTERIAL IDENTIFICATION/DIAGNOSIS; BIOREMEDIATION; IMMUNIZATION; INFECTION; VACCINATION; VIRION/INFECTIOUS MECHANISM; NANO-PARTICLES; BLOOD; CLINICAL STUDIES; DIAGNOSTICS/BIOPSIES; EPIDEMIOLOGY; ETIOLOGY/PATHOLOGY; FOETUS MALFORMATION; MALFORMATION; CONDITION/OXYGENATION; MUSCLE TISSUE/FIBERS; INTESTINAL; ANTIBIOTIC SUSCEPTIBILITY
J-Experiment (28): EUTHANASIA AND ANESTHESIA; LIGHT AND HUMIDITY CONDITIONS; PRECIPITATION; PROTEIN TAGS; TRANSDUCTION; TRANSFECTION; BOTTLE; CENTRIFUGATION; FREEZE AND GRIND; INCUBATOR AND CELL CULTURE; MICROPLATE PREPARATION; PLASTIC CONTAINER; CHROMATOGRAPHY; FLUORESCENCE; MICROSCOPY; SECTION; SPECTROMETRY; SPECTROSCOPY; CLONING/VECTOR; ELECTROPHORESIS; EXPERIMENTAL ANTIBODIES; MOLECULAR LABELING; PCR; PROTOCOL; GROWTH RATE; OSCILLATION; LOGTRANSFORM; MICROARRAY
K-Method (38): BELIEF; CLAIM; EXTRANEOUS; INTERDEPENDENCE; ACADEMIC INSTITUTIONS/GUIDELINES; AUTHORSHIP/SOURCES; HIERARCHICAL CLUSTERING METHOD; PUBLIC INSTITUTIONS; SOCIETY/ECONOMY; EFFECT; FACT/UNDERDETERMINATION; INHIBITORY MECHANISM; KNOCKOUT/MUTATION EFFECT; MECHANISM; ABBREVIATION; COMPUTER; DATA CURATION; UPLOAD/DOWNLOAD; APPROXIMATION; BAYESIAN MODELING; CALCULATION; COVARIATION; CURVE FITTING; DATA DISTRIBUTION; DATA PLOT; HYPOTHESIS PREFERENCE; MATHEMATICAL MODEL; RELIABILITY; STATISTICAL ANALYSIS
Table 2. Dimensions of lifeness
Dimensions of lifeness                              Subject matter
A – Metabolism                                      MATTER-ENERGY
B – Catalysis and synthesis of catalysts            MATTER-ENERGY
C – Elaboration of cellular/structural features     STRUCTURE
D – Micro/macro environment (interactions)          ENVIRONMENT INTERACTIONS
E – Plants/animals related (interactions)           ENVIRONMENT INTERACTIONS
F – Humans related (interactions)                   ENVIRONMENT INTERACTIONS
G – Evolvability                                    EVOLUTION
H – Information encoding and genetics               INFORMATION-PROGRAM
Dimensions of lifeness as inferred from topic categories, and sorted by subject
matter.
Fig. 1. Lifeness dimensions association strengths. (Left) Lifeness space for “bacteria”, “archaea”, “virus” and “prion”. (Right) Lifeness space for
“bacteria”, “phage”, “plasmid” and “adenine”. All measures normalized to “bacteria”.
Fig. 2. Degrees of lifeness. Euclidean distance of each entity to bacteria, computed by giving an equal
weight to each dimension, and normalized to bacteria for the 1-value and adenine for the 0-value.
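The computation described in the captions of Fig. 1 and Fig. 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the linear rescaling between adenine and bacteria, and all association strengths are hypothetical, and each profile is assumed to list the eight lifeness dimensions A to H in the same order.

```python
import math


def normalize_to_reference(strengths, reference):
    """Divide each association strength by the reference entity's value.

    Assumes every reference value is non-zero.
    """
    return [s / r for s, r in zip(strengths, reference)]


def degrees_of_lifeness(profiles, top="bacteria", bottom="adenine"):
    """Map each entity to a score with `top` at 1 and `bottom` at 0."""
    reference = profiles[top]
    normalized = {
        name: normalize_to_reference(vec, reference)
        for name, vec in profiles.items()
    }
    # Unweighted Euclidean distance: every dimension counts equally.
    distance = {
        name: math.dist(vec, normalized[top])
        for name, vec in normalized.items()
    }
    # Assumed linear rescaling: distance 0 maps to 1, the distance of
    # `bottom` maps to 0.
    return {name: 1.0 - d / distance[bottom] for name, d in distance.items()}


# Hypothetical association strengths (dimensions A to H),
# NOT the values reported in the paper.
profiles = {
    "bacteria": [0.8, 0.6, 0.5, 0.7, 0.4, 0.5, 0.6, 0.9],
    "virus": [0.2, 0.5, 0.2, 0.5, 0.3, 0.5, 0.5, 0.8],
    "adenine": [0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1],
}

scores = degrees_of_lifeness(profiles)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

With this rescaling, an entity that sits farther from bacteria than adenine does would receive a negative score; whether such values are clipped is not specified in the captions.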