Validation of the Flourishing Scale (FS), Greek Version and Evaluation of Two Well-Being Models
1. Introduction
Well-being has been a focal point of Western and Eastern moral philosophy for thousands of years, and it is also the central focus of positive psychology (Linley, Maltby, Wood, Osborne, & Hurling, 2009; Cummins, 2003; Diener & Lucas, 1999; Keyes et al., 2002). The positive psychology movement highlighted that well-being is not the absence of ill-being. That is, a lack of negative affect and illness does not equal the presence of positive affect and well-being or flourishing (Seligman, 2002; Seligman, 2011). Generally, two approaches to well-being emerged: subjective (hedonic) well-being and psychological (eudaimonic) well-being, or hedonia and eudaimonia (Ryan & Deci, 2001).
Hedonia emphasizes pleasure and the avoidance of displeasure (Diener et al., 2003), subjective happiness, and enjoyment (Delle Fave et al., 2011). Eudaimonia, on the other hand, focuses on personal growth, self-actualization, and purpose in life (Ryan & Deci, 2001; Ryff, 1989; Ryff & Keyes, 1995), drawing on humanistic psychology (Silva & Caetano, 2013). Additional eudaimonic traits are optimism, self-esteem, enjoyment from self-expressive activities, autonomy, and vitality (Ruini, 2016; Ryan & Deci, 2001; Ryff, 1989; Waterman, 2008).
Generally, flourishing is closely related to eudaimonic well-being (Deci & Ryan, 2008; Ryan & Deci, 2001). In fact, Diener et al. (2010) initially called the FS “Psychological Well-being” (Diener et al., 2009), but this name was subsequently changed to Flourishing Scale (FS) because the scale comprised additional dimensions beyond eudaimonic well-being. Flourishing is defined as a state of optimal psychological functioning (Keyes et al., 2002; Giuntoli et al., 2017). Keyes et al. (2002) described flourishing as an integration of three states: emotional, psychological, and social well-being. Specifically, flourishing integrates self-determination theory (Deci & Ryan, 2000), the optimal experience or flow model (Csikszentmihalyi, 1990), the psychological well-being model (Ryff, 1989), and psychological, emotional and social well-being (Keyes et al., 2002).
The operationalization of the above flourishing models brought about a plethora of eudaimonic well-being measures. The Flourishing Scale (FS), developed by Diener et al. (2010), added to these instruments a brief eight-item measure that combines the essential components of previous models: life meaning and purpose (Ryff, 1989; Seligman, 2002; Steger et al., 2006), quality in interpersonal relationships (Deci & Ryan, 2000; Ryff, 1989), flow and engagement (Csikszentmihalyi, 1990; Seligman, 2011), contribution to the well-being of others (Putnam, 2000), personal competence (Deci & Ryan, 2000; Ryff, 1989), self-acceptance (Ryff, 1989), optimism (Carver & Scheier, 2009), and receiving others’ respect (Brown, Nesse, Vinokur, & Smith, 2003).
In the original FS study (Diener et al., 2010), internal consistency reliability was .87. Principal axis factor analysis indicated a single factor with an eigenvalue of 4.24, explaining 53% of the variance, and loadings ranging from .61 to .77. Scores across six samples of university students ranged from 42.6 to 48.1. FS was reported to have high convergent validity with other well-being measures (Basic Needs Satisfaction Scale by Ryan & Deci, 2001; Scale of Psychological Well-being by Ryff, 1989; and Satisfaction with Life Scale by Diener et al., 1985). Validations of the FS in Italian (Giuntoli et al., 2017), Indian (Singh et al., 2017), Portuguese (Silva & Caetano, 2013), Japanese (Sumi, 2014), and Iranian (Khodarahimi, 2013) samples followed.
The Italian FS version (Giuntoli et al., 2017) was validated in two samples, with Confirmatory Factor Analysis (CFA) again indicating a single-factor structure. Measurement invariance was also examined. Internal consistency was .88 and convergent validity was examined with SPANE (Diener et al., 2010), PANAS (Watson, Clark, & Tellegen, 1988), SHS (Lyubomirsky & Lepper, 1999), PWI (International Well-being Group, 2006), and SGI (International Well-being Group, 2006). The Indian FS version (Singh et al., 2017) was validated in three samples with Exploratory Factor Analysis (EFA) and two Confirmatory Factor Analyses (CFAs), which confirmed the single-factor structure. Mean scores by gender were 43.10 for males and 44.65 for females, and Cronbach’s alpha in the three samples was .80, .91 and .85. Convergent validity was examined using SPANE (Diener et al., 2010) and the Mental Health Continuum-Short Form (Keyes, 2009), and it was supported.
In the Portuguese version (Silva & Caetano, 2013), internal consistency reliability in two samples was .83 (adults) and .78 (university students). The unidimensionality of the FS factor structure was supported by a CFA and a Multi-group CFA (MGCFA). Mean scores were 42.92 (adults) and 44.51 (students). Convergent validity was established by statistically significant correlations with the Scale of Positive and Negative Experience (SPANE; Diener et al., 2010), the Satisfaction with Life Scale (SWLS; Diener et al., 1985), the Subjective Happiness Scale (SHS; Lyubomirsky & Lepper, 1999) and the Single-item measure of happiness (Fordyce, 1977, 1988). The Japanese adaptation of FS (Sumi, 2014) had a Cronbach’s alpha of .95 and the single-factor structure was confirmed. The mean score was 36.63, and convergent and discriminant validity were supported by examining the relationship of FS with SPANE (Diener et al., 2010), SWLS (Diener et al., 1985), SHS (Lyubomirsky & Lepper, 1999), the Revised Life Orientation Test (R-LOT; Scheier et al., 1994), the PANAS (Watson et al., 1988), and the Hopkins Symptoms Checklist (HSCL; Derogatis et al., 1974). Finally, the Iranian version of FS was validated in a sample of adults. EFA indicated a single factor explaining 59.46% of the variance. Cronbach’s alpha was .89 (Khodarahimi, 2013). Concurrent validity was established using the Satisfaction with Life Scale (Diener et al., 1985).
Regarding the validation of FS in different age groups, studies in an Indian cultural context used FS with school children (Singh & Junnarkar, 2015; Singh, Ruch, & Junnarkar, 2014) and adolescents (Singh et al., 2017). MGCFA findings suggested that FS is age invariant (Singh et al., 2017). The reported score was higher in the adolescent sample (11 - 17 years; 45.99) than in the adult sample (40.29). Moreover, the Italian FS version (Giuntoli et al., 2017) was tested in a special population, namely unemployed adults, for whom the authors reported lower FS scores than for the control group. In sum, the unidimensionality of the FS was validated in Western, Eastern and Asiatic cultures, i.e. individualistic and collectivistic ones (Hofstede, 2001; Triandis, 1995), in special populations, and in different age groups.
Flourishing showed a positive correlation with resilience, positive affect, the mental health continuum and its dimensions (Keyes, 2009), and SPANE-P (positive experiences; Diener et al., 2010). It was negatively correlated with SPANE-N (negative experiences; as cited in Singh et al., 2017). Additionally, Diener et al. (2010) suggested that FS and SPANE could be integrated into a three-factor model of well-being (flourishing, positive feelings, and negative feelings). This hypothesis was later evaluated in other studies (Howell & Buro, 2015; Singh et al., 2017; Giuntoli et al., 2017).
The well-being literature also devised the two continua model of mental health (see Keyes, 2005; Lim, 2014; Lupano Perugini et al., 2017; Petrillo et al., 2015) to empirically evaluate the hypothesis that mental health and mental illness are two distinct but linked dimensions (Keyes, 2005). This is yet another compound model, containing a mental illness measure and a mental health measure. Tests of alternative one-factor and two-factor models with EFA and CFA suggested that mental health and mental illness are not the opposite ends of a bipolar continuum; instead, they are two distinct but correlated dimensions (see Lupano Perugini et al., 2017; Lamers et al., 2011; Petrillo et al., 2015).
The present study has the following objectives: 1) to validate the Greek version of the FS in a sample of the general adult population, using the 3-faced construct validation method (Kyriazos, Stalikas, Prassa, & Yotsidi, 2018a, 2018b) with EFA and CFA; 2) to test strict measurement invariance across gender; 3) to test two well-being models using the FS as a well-being measure, namely the two continua model (Keyes, 2002, 2005) and the Tripartite Model of Mental well-being, also proposed by Keyes (2002; see also Diener et al., 2010); 4) to examine internal consistency reliability and convergent/discriminant validity with 12 different measures.
2. Method
2.1. Participants
The sample consisted of 2272 Greek adults from the general population (63% female) with a mean age of 35.54 years (SD = 12.35). Fifty-one percent of the participants were older than 33 years, and the same percentage (51%) were single. The rest were either married or living together (41%) or divorced (8%). Most participants (59%) did not have children; the rest had one child (14%), two children (22%), or more (5%). Regarding education, 42% held a bachelor's degree, 24% had finished high school, 19% held a postgraduate degree, 14% were undergraduate university students, and 1% had received primary education.
2.2. Materials
1) Flourishing Scale (FS)
The FS (Diener et al., 2009, 2010) is an 8-item unidimensional measure of general aspects of positive human functioning (e.g., “I am a good person and live a good life”). Responses are rated on a 7-point Likert scale from strong disagreement (1) to strong agreement (7). All items are positively worded. The possible score ranges from 8 (minimum flourishing) to 56 (maximum flourishing). Diener et al. (2010) reported an internal consistency reliability of α = .87. FS was translated into Greek by Stalikas, Kyriazos, & Kotsoni (2017) using the translation/back-translation method (Brislin, 1970).
2) Scale of Positive and Negative Experience (SPANE-12)
This is a 12-item scale of subjective well-being. It contains two dimensions of positive and negative experiences, with six one-word items each. Items are scored on a Likert scale from 1 (very rarely or never) to 5 (very often or always). Experiences are evaluated on a 4-week time frame. Score on each dimension (SPANE-P and SPANE-N) can range from 6 to 30 and it is separately calculated. Their difference (Affect Balance or SPANE-B) can vary from −24 to 24. In this study, Cronbach’s alpha for SPANE-P, SPANE-N and SPANE total was .90, .85 and .91 respectively.
3) Scale of Positive and Negative Experience 8 (SPANE-8)
In addition to the original version (SPANE-12), this study also included a second version (SPANE-8) with 8 items (4 in SPANE-P and 4 in SPANE-N). SPANE-8 (Kyriazos, Stalikas, Prassa, & Yotsidi, 2018b) is a revised structure containing one general feeling per dimension (the item Pleasant in positive experiences and Bad in negative ones) instead of the 3 in the original SPANE (Diener et al., 2010: p. 145). This resulted in a briefer structure with four positive (Pleasant, Happy, Joyful, Contented) and four negative (Bad, Sad, Afraid, Angry) items. Cronbach’s alpha in this study was .85 for SPANE-8 P, .75 for SPANE-8 N and .84 for the total SPANE-8.
4) Warwick-Edinburgh Mental Well-Being Scale (WEMWBS)
The WEMWBS (Universities of Warwick and Edinburgh; Tennant et al., 2007 ) is a 14-item unidimensional measure of mental well-being and psychological functioning. Items are rated on a 5-point Likert scale from “None of the time” to “All of the time” indicating frequency. Score is summed ranging from 14 to 70. WEMWBS has been reported to have adequate internal consistency reliability (.89 in student samples and .91 in adult samples; Tennant et al., 2007 ). Internal consistency reliability in this study was α = .91.
5) Brief Resilience Scale (BRS)
BRS (Smith, Dalen, Wiggins, Tooley, Christopher, & Bernard, 2008) is a 6-item measure of resilience, focusing on the ability to bounce back from stress and hardship. Responses are rated on a 5-point Likert scale from Strongly Disagree (1) to Strongly Agree (5). Smith et al. (2008) reported Cronbach’s alpha from .80 to .91. In this study Cronbach’s alpha was α = .80.
6) Mental Health Continuum-Short Form (MHC-SF)
Mental Health Continuum-Short Form (Keyes et al., 2008) is a 14-item measure of well-being proposed by Keyes (2002) with 3 factors: emotional (EWB), social (SWB) and psychological well-being (PWB). Responses are rated on a 6-point Likert scale, indicating frequency (never, once or twice a month, about once a week, two or three times a week, almost every day, every day) using a 4-week time frame. Internal consistency reliability for the total MHC-SF scale was reported by Keyes (2005) to be adequate (Cronbach’s alpha > .80). Internal reliability for the total scale in this study was α = .90.
7) The Gratitude Questionnaire (GQ-6)
The GQ-6 (McCullough, Emmons, & Tsang, 2002) is a 6-item measure of gratitude experience in everyday life. Items are rated on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree). GQ-6 has a unidimensional structure. Possible scores range from 6 (less grateful) to 42 (most grateful). Items 3 and 6 are reverse-scored. The internal consistency reliability of GQ-6 in the original study was .82 (McCullough, Emmons & Tsang, 2002) and in this study it was estimated to be α = .68.
8) Meaning in Life Questionnaire (MLQ)
The MLQ (Steger et al., 2006) measures the presence of and search for meaning in life, with a total of 10 items in two factors (presence of meaning and search for meaning). All items are rated on a 7-point Likert scale (from “Absolutely True” to “Absolutely Untrue”). With five items per factor, possible scores range from 5 to 35 for both Presence of Meaning and Search for Meaning. Internal reliability in this study for the total scale was α = .78.
9) Satisfaction with Life Scale (SWLS)
The Satisfaction with Life Scale (Diener, Emmons, Larsen, & Griffin, 1985) is a 5-item measure of perceived life satisfaction. Items are rated on a 7-point Likert scale (1 = Strongly Disagree, 7 = Strongly Agree). Internal consistency reliability (Cronbach’s alpha) has been reported to range from .79 to .89 (Pavot & Diener, 1993). In this study it was α = .88.
10) Trait Hope Scale (HS)
The Trait Hope Scale (Snyder et al., 1991) measures trait hope with two factors: Agency and Pathways. Items are rated from 1 (Definitely False) to 8 (Definitely True), resulting in a possible score from 8 to 64. Snyder et al. (1991) reported that Cronbach’s alphas for the total scale varied from .74 to .84. Internal reliability in this study was α = .89.
11) World Health Organization Quality of Life-Brief scale (WHOQOL-BREF)
The WHO Quality of Life-Brief scale (WHOQOL Group, 1998a, 1998b) is an assessment tool of perceived quality of life. It is the short version of the WHOQOL-100 (cf. Skevington, 1999). It contains 26 items reflecting all life quality facets of the WHOQOL-100. Responses are rated on a 5-point Likert scale indicating intensity, capacity, frequency, or judgment (Skevington et al., 2004). The instrument is divided into four QOL domains: Physical health, Psychological health, Social Relations, and Environment. Cronbach’s alphas are .82, .81, .68, and .80 respectively (Skevington, Lotfy, & O’Connell, 2004). Cronbach’s alpha for the total scale in this study was α = .91.
12) Depression Anxiety Stress Scale (DASS)
DASS (Lovibond & Lovibond, 1995) measures emotional distress with three 7-item dimensions, namely depression, anxiety and stress. All 21 items are rated on a 4-point Likert scale assessing the intensity/frequency of distress (from 0 = did not apply to me at all to 3 = applied to me very much, or most of the time) over the past week. The higher the score, the more intense or frequent the emotional distress. Each factor has a distinct score ranging from 0 to 21. Internal consistency reliability was reported to be α = .97 for adults of the general population (Henry & Crawford, 2005). Internal consistency reliability in this study for Depression, Anxiety, Stress and DASS-21 Total was α = .90, .88, .89 and .95 respectively (see also Kyriazos et al., 2018a).
13) Depression Anxiety Stress Scale, Short form (DASS-9)
DASS-9 (Yusoff, 2013; in Greek by Kyriazos, Stalikas, Prassa, & Yotsidi, 2018a) is a short form of DASS-21 (Lovibond & Lovibond, 1995). This version is a post hoc measure, empirically derived by Yusoff (2013). DASS-9 evaluates emotional distress with three 3-item dimensions, like the original DASS (depression, anxiety, stress). All nine items are rated on a 4-point Likert scale evaluating both intensity and frequency of symptoms over the last week (from 0 = did not apply to me at all to 3 = applied to me very much, or most of the time). The higher the score, the more intense/frequent the emotions of distress. Each factor has a discrete score varying from 0 to 9. Cronbach’s alphas for the Depression, Anxiety and Stress factors were .52, .57, and .55 respectively, as reported by Yusoff (2013). In this study, internal reliability for Depression, Anxiety, Stress and DASS-9 Total was .79, .77, .73 and .89 respectively (Kyriazos et al., 2018a).
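The scoring rules described above for the FS, SPANE-12, and GQ-6 can be illustrated with a minimal Python sketch. The function names and the input format (a plain list of integer responses in item order) are assumptions made for illustration; they are not part of the instruments or of the software used in this study.

```python
def _check(responses, n_items, low, high, scale):
    """Validate the number of items and the allowed response range."""
    if len(responses) != n_items:
        raise ValueError(f"{scale} expects {n_items} responses")
    if any(r not in range(low, high + 1) for r in responses):
        raise ValueError(f"{scale} responses must be integers from {low} to {high}")


def score_fs(responses):
    """FS: sum of 8 items rated 1-7, so the total ranges from 8 to 56."""
    _check(responses, 8, 1, 7, "FS")
    return sum(responses)


def score_spane(positive, negative):
    """SPANE-12: two 6-item sums (6-30 each) and their difference (-24 to 24)."""
    _check(positive, 6, 1, 5, "SPANE-P")
    _check(negative, 6, 1, 5, "SPANE-N")
    spane_p, spane_n = sum(positive), sum(negative)
    return {"SPANE-P": spane_p, "SPANE-N": spane_n, "SPANE-B": spane_p - spane_n}


# 1-based positions of the reverse-scored GQ-6 items named in the text.
GQ6_REVERSED = {3, 6}


def score_gq6(responses):
    """GQ-6: sum of 6 items rated 1-7 after reversing items 3 and 6 (6-42)."""
    _check(responses, 6, 1, 7, "GQ-6")
    return sum(
        (8 - r) if position in GQ6_REVERSED else r
        for position, r in enumerate(responses, start=1)
    )
```

On a 1-7 scale a reversed response is 8 minus the raw response, which is why a respondent answering 7 on the regular items and 1 on items 3 and 6 obtains the maximum GQ-6 score.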
2.3. Procedure
One hundred and fifty undergraduate psychology students assisted with the online data collection. Students forwarded a link to an electronic test battery (in Google Forms© format) to 15 - 20 adults from their social milieu. Students received extra credit for their participation in the study. All the fields of the digital battery form were set as required. An introduction informed participants about the purpose of the study. Data were collected as follows. First, students received a brief, free workshop about the administration of psychology questionnaires in digital form. Then, a period of pilot testing followed to track potential problems in the procedure and record the completion time (approximately 15 minutes). After successful pilot testing, students received a link to the official study.
2.4. Research Design
Research was carried out at two levels: 1) on three subsamples (EFA, CFA1 and CFA2) to evaluate construct validity with EFA and confirm it with CFA; 2) on the full sample to evaluate strict measurement invariance across gender. We call this construct validation procedure the “3-faced construct validation method” (see Kyriazos et al., 2018a, 2018b). See Table 1 for an overview of the method.
Table 1. Overview of the 3-faced construct validation method as implemented in FS.
EFA = Exploratory factor analysis, ICM-CFA = Independent cluster model confirmatory factor analysis.
Regarding the factor analysis methods used in this study, Exploratory Factor Analysis (EFA) was carried out in the first subsample (EFA subsample). Independent Cluster Model Confirmatory Factor Analysis (ICM-CFA) was applied in the second subsample (CFA 1 subsample). Only ICM-CFA models were tested because FS is a unidimensional scale, and when an Exploratory Structural Equation Modeling (ESEM) model includes one factor, it becomes an ICM-CFA model (Asparouhov & Muthen, 2009; Marsh, Morin, Parker, & Kaur, 2014). Similarly, second-order CFA models were not possible in a unidimensional structure (Wang & Wang, 2012). The optimal model that emerged from the CFA 1 subsample was cross-validated in a different subsample of equal power (CFA 2). Then, a multi-group CFA (MGCFA) was carried out in the entire sample (N = 2272) using the CFA 2 optimal model as a baseline model, to test for strict measurement invariance across gender (see Table 1 for an overview of this method). A reliability analysis (α and ω) was carried out in the total sample. AVE-based convergent validity and convergent/discriminant validity based on correlation analysis were evaluated in the entire sample using 12 measures of mental distress, well-being, positivity and quality of life. Next, two CFA well-being models were evaluated: one using FS and DASS (Two Continua Model; Keyes, 2005) and a second using FS and SPANE (Diener et al., 2010). Finally, normative data were calculated.
Data were collected electronically on Google Forms® and were analyzed with SPSS Version 25 (IBM, 2017) , Stata Version 14.2 (StataCorp, 2015) , and MPlus Version 7.0 (Muthen & Muthen, 2012) .
3. Results
3.1. Missing Values and Sample Power
The total sample included N = 2272 cases. There were no missing values in the data because all the digital test-battery fields were set as required (see details in the Procedure section). To validate the FS factor structure, the total sample (N = 2272) was randomly split into three parts (20%, 40% and 40%). The first 20% of the total sample was used for EFA (nEFA = 452), the second 40% for CFA (nCFA1 = 910), and the third 40%, a sample of equal power, for another CFA (nCFA2 = 910). Sample-splitting (Guadagnoli & Velicer, 1988; MacCallum, Browne, & Sugawara, 1996) serves as a cross-check of construct validity because the researcher tests the optimal model in a different sample (Byrne, 2010; Brown, 2015). We termed the above analysis procedure “the 3-faced construct validation method” (see details in Kyriazos et al., 2018a, 2018b). The sample-to-variable ratio for the EFA subsample (nEFA = 452) was 56.5 cases per variable (452 cases / 8 items). For both the CFA1 subsample (nCFA1 = 910) and the CFA2 subsample (nCFA2 = 910) it was 113.75 cases per variable. A sample-to-variable ratio of 10:1 (Osborne & Costello, 2004) to 20:1 (Schumacker & Lomax, 2015) is generally accepted. Alternatively, 500 - 1000 cases are generally regarded as adequate to excellent for factor analysis of scales with < 40 items like FS (Comrey & Lee, 1992; Singh et al., 2016; DeVellis, 2017).
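The sample split and the cases-per-variable arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subsample sizes are those reported above, while the function names, the use of integer case IDs, and the random seed are hypothetical and do not reproduce the split actually used in the study.

```python
import random

N_ITEMS = 8  # the FS has eight items
SUBSAMPLE_SIZES = {"EFA": 452, "CFA1": 910, "CFA2": 910}


def split_sample(case_ids, sizes, seed=0):
    """Randomly assign cases to non-overlapping subsamples of the given sizes."""
    ids = list(case_ids)
    if sum(sizes.values()) != len(ids):
        raise ValueError("subsample sizes must add up to the number of cases")
    random.Random(seed).shuffle(ids)  # seed is an arbitrary, hypothetical choice
    parts, start = {}, 0
    for name, n in sizes.items():
        parts[name] = ids[start:start + n]
        start += n
    return parts


def cases_per_variable(n_cases, n_variables=N_ITEMS):
    """Sample-to-variable ratio: number of cases divided by number of items."""
    return n_cases / n_variables


parts = split_sample(range(2272), SUBSAMPLE_SIZES)
ratios = {name: cases_per_variable(len(ids)) for name, ids in parts.items()}
print(ratios)
```

With eight items, 452 cases correspond to 56.5 cases per variable and 910 cases to 113.75, both well above the 10:1 to 20:1 guideline cited above.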
3.2. Univariate and Multivariate Normality
The data in the total sample and the three subsamples (N = 2272, nEFA = 452, nCFA1 = 910, nCFA2 = 910) were non-normally distributed. Kolmogorov-Smirnov tests (Massey, 1951) on each FS item were statistically significant without exception, at p < .001, indicating that all items for the total sample violated the univariate normality assumption.
Multivariate normality was examined with four tests: 1) Mardia’s multivariate kurtosis test (Mardia, 1970) ; 2) Mardia’s multivariate skewness test (Mardia, 1970) ; 3) Henze-Zirkler’s consistent test (Henze & Zirkler, 1990) , and 4) Doornik-Hansen omnibus test (Doornik & Hansen, 2008) . The null hypothesis was rejected for all four tests with all p values < 0.0001, suggesting a violation of multivariate normality in all four samples (N = 2272, nEFA = 452, nCFA1 = 910, nCFA2 = 910).
3.3. Exploratory Factor Analysis (EFA)
In this phase of the 3-faced construct validation method, the unidimensional factor structure of FS was examined with EFA in the first subsample (20%, nEFA = 452). MLR was used as the parameter estimator (cf. Muthen & Muthen, 2012). MLR is a rescaling-based estimator suitable for non-normal distributions that, unlike similar estimation methods, provides standard errors and a chi-square test (Wang & Wang, 2012; Brown, 2015). Furthermore, MLR is suitable for small sample sizes (Bentler & Yuan, 1999; Muthen & Asparouhov, 2002; Wang & Wang, 2012) like this split sample. Geomin factor rotation was used in all EFA models. EFA model fit was evaluated against the limits proposed by Hu & Bentler (1999) and Brown (2015): RMSEA (≤.06, 90% CI ≤ .06), SRMR (≤.08), CFI (≥.95), TLI (≥.95). Additionally, the chi-square/df ratio ≤ 3 rule was used (Kline, 2016). Multiple fit measures used simultaneously offer a more conservative estimation of model fit (Brown, 2015).
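To make the simultaneous use of these cutoffs concrete, the following Python sketch checks a set of fit statistics against each criterion. It assumes that "90% CI ≤ .06" refers to the upper bound of the RMSEA confidence interval; the function name and the example values are hypothetical and are not taken from the study's tables.

```python
def check_fit(chi2, df, rmsea, rmsea_ci_upper, srmr, cfi, tli):
    """Return a dict mapping each fit criterion to True (met) or False (not met)."""
    return {
        "chi2/df <= 3": chi2 / df <= 3,
        "RMSEA <= .06": rmsea <= 0.06,
        # Assumption: the CI criterion applies to the upper bound of the 90% CI.
        "RMSEA 90% CI upper <= .06": rmsea_ci_upper <= 0.06,
        "SRMR <= .08": srmr <= 0.08,
        "CFI >= .95": cfi >= 0.95,
        "TLI >= .95": tli >= 0.95,
    }


# Hypothetical fit values for illustration only.
result = check_fit(
    chi2=40.0, df=20, rmsea=0.045, rmsea_ci_upper=0.058,
    srmr=0.030, cfi=0.962, tli=0.941,
)
failed = [name for name, met in result.items() if not met]
print(failed)
```

Reporting which criteria fail, rather than a single pass/fail flag, mirrors the way the models are discussed in the text, where a model can be tolerable overall while individual measures such as TLI or RMSEA fall outside the limits.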
Two EFA models were evaluated in the EFA subsample (n = 452). MODEL 1 was proposed by Diener et al. (2010) and contains all FS items in one factor. MODEL 2 is a two-factor model extracted from the current dataset. Both models had generally tolerable fit; however, some fit measures (TLI, RMSEA) were beyond acceptable limits (see Table 2). Next, we examined the FS factor structure with CFA in a different subsample, to re-evaluate model fit.
3.4. Confirmatory Factor Analysis (CFA)
In this phase of the 3-faced construct validation method, we further examined FS dimensionality with Confirmatory Factor Analysis in a different subsample (40%, n = 910). CFA model fit was evaluated with the following criteria (Hu & Bentler, 1999; Brown, 2015) : RMSEA (≤.06, 90% CI ≤ .06), SRMR (≤.08), CFI (≥.95), TLI (≥.95), and the chi-square / df ratio less than 3 (Kline, 2016) . Again, MLR was used as a parameter estimator (c.f. Muthen & Muthen, 2012 ).
Only ICM-CFA models were tested because an ESEM model with one factor is equivalent to the classic CFA/SEM model (Asparouhov & Muthen, 2009; Marsh et al., 2014). Similarly, second-order CFA models were not possible in a single-factor structure (Wang & Wang, 2012). Given the above restriction, the following three ICM-CFA models were evaluated. MODEL 1 is the original single-factor model proposed by Diener et al. (2010). MODEL 2 is a variation of MODEL 1 with error covariances added. MODEL 3 is a two-factor model with items 1, 3, and 7 in Factor 1 and items 2, 4, 5, 6, 8 in Factor 2, extracted in the previous EFA.
MODEL 1 had a poor fit. MODEL 2 had an acceptable fit with all fit measures within adequate fit limits (TLI = .939 > .90). Factor loadings were also acceptable, ranging from 0.400 to 0.589. MODEL 3 also had some fit measures marginally below the required values. To sum up, in the CFA 1 subsample (n = 910), MODEL 2 (the single-factor model with error covariances added) showed optimal fit, taking into consideration fit measures and factor loadings. Table 3 contains the fit statistics for all three models tested.
3.5. Cross-Validating the Optimal CFA Model in a Different Subsample
In this phase of the 3-faced construct validation method, we cross-validated the FS model that emerged from the CFA 1 subsample (40%, n = 910) with a second CFA in a new subsample of equal power (CFA 2, 40%, n = 910).
Table 2. Model fit for the EFA models of FS.
Factor 1 = Items 1, 3, 7, Factor 2 = items 2, 4, 5, 6, 8, FI = Factor Intercorrelations; Estimator = MLR, Factor rotation = Geomin.
Table 3. Model fit for the CFA 1 models of FS.
Factor 1 = Items 1, 3, 7, Factor 2 = items 2, 4, 5, 6, 8, FI = Factor intercorrelations; Estimator = MLR; Bold indicates optimal model fit.
The optimal FS structure that emerged from the CFA 1 subsample was the single factor proposed by Diener et al. (2010) with error covariances added. This model was successfully validated in the new subsample of equal power. All fit statistics were within acceptable limits achieving a good fit. Factor loadings were also within adequate limits (0.482 - 0.642). All model fit statistics are presented in Table 4 and the path diagram in Figure 1.
3.6. Measurement Invariance
In this phase of the 3-faced construct validation method, we used the optimal FS model cross-validated in the CFA 2 subsample (40%, n = 910) as a baseline model to test strict measurement invariance across gender in the entire sample (N = 2272). The following measurement invariance criteria were used for comparisons between nested models: a decrease in CFI of no more than .01 (ΔCFI ≥ −.01) and an increase in RMSEA of no more than .015 (ΔRMSEA ≤ .015) (Chen, 2007).
First, the single-factor FS model with error covariances was tested separately in each gender group (males, N = 832 versus females, N = 1440), to establish a baseline model. This model had a good fit for males and an equally good fit for females (see baseline model in Table 5). Then, this unidimensional model was tested concurrently in both gender groups (M1), presenting a good fit (see nested models in Table 6); therefore, configural invariance was confirmed. Next, to examine weak invariance, factor loadings were constrained to equality. As presented in Table 6, ΔCFI and ΔRMSEA for this nested model (M2) supported weak invariance. Subsequently, indicator intercepts were constrained to equality (M3), and both ΔCFI and ΔRMSEA suggested strong invariance. Finally, for the ultimate test of measurement invariance, strict invariance (Wang & Wang, 2012), indicator residuals were constrained to equality. The nested model comparison showed that strict measurement invariance could not be supported, with ΔCFI (but not ΔRMSEA) too high to be acceptable (see Table 5 and Table 6).
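The decision rule applied at each step of this nested sequence can be sketched as follows. The sketch reads the ΔCFI and ΔRMSEA criteria in the way they are commonly applied: CFI may drop by at most .01 and RMSEA may rise by at most .015 when constraints are added. The function name and the example values are hypothetical and are not the values reported in Table 6.

```python
def invariance_step_holds(cfi_prev, cfi_next, rmsea_prev, rmsea_next,
                          max_cfi_drop=0.010, max_rmsea_rise=0.015):
    """Compare a more constrained model (next) with the less constrained one (prev).

    Differences are rounded to three decimals because fit indices are
    usually reported at that precision.
    """
    delta_cfi = round(cfi_next - cfi_prev, 3)
    delta_rmsea = round(rmsea_next - rmsea_prev, 3)
    return delta_cfi >= -max_cfi_drop and delta_rmsea <= max_rmsea_rise


# Hypothetical values: CFI drops by .005 and RMSEA rises by .002.
print(invariance_step_holds(0.970, 0.965, 0.048, 0.050))  # True

# Hypothetical values: CFI drops by .020, so this level is not supported.
print(invariance_step_holds(0.970, 0.950, 0.048, 0.050))  # False
```

Because both conditions must hold, a model can fail on ΔCFI alone even when ΔRMSEA stays within its limit, which is the pattern described above for the strict invariance step.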
3.7. Reliability and AVE Validity
We evaluated the reliability and validity of FS over the entire sample (N = 2272) and in the three subsamples (nEFA = 452, nCFA1 = 910, nCFA2 = 910) using three measures: 1) Cronbach’s alpha (Cronbach, 1951) to assess internal consistency of item responses. Alpha values ≥.70 are considered adequate (Hair et al., 2010) and ≥.80 satisfactory (Nunnally & Bernstein, 1994); 2) Omega Total coefficient (ω total; McDonald, 1999; Werts, Linn, & Joreskog, 1974) to examine construct reliability (Hoque et al., 2017). For omega, a value of ≥.70 is acceptable (Hair et al., 2010); 3) Average Variance Extracted (AVE; Fornell & Larcker, 1981) to evaluate convergent validity. Omega alone is an unstable reliability measure, permitting an error variance as high as 50%. AVE in combination with the ω coefficient offers a more reliable measure of convergent validity (Malhotra & Dash, 2011). The suggested cutoff value for AVE is .50 (Fornell & Larcker, 1981; Hair et al., 2010; Awang et al., 2015).
Figure 1. The path of the optimal, unidimensional model with error covariances for FS confirmed in CFA 1 and cross-validated in CFA 2 in a sample of equal power.
Table 4. Model fit for the optimal model for FS in CFA 1 in a different subsample of equal power.
Factor 1 = Items 1, 3, 7, Factor 2 = items 2, 4, 5, 6, 8, FI = Factor intercorrelations; Estimator = MLR; Error Covariances added were item 7 - 1, item 7 - 4, item 3 - 1, item 5 - 2, item 6 - 3, item 6 - 2.
Table 5. FS Baseline model for measurement invariance across gender.
Estimator = MLR.
Table 6. Fit Measures of the nested models tested to establish measurement invariance of FS.
Estimator = MLR.
The internal reliability for all 8 items of FS in the total sample (N = 2272), measured by Cronbach’s alpha, was α = .81. Omega Total reliability (McDonald, 1999; Werts, Linn, & Joreskog, 1974) was .75 and AVE was .28. In the three subsamples (nEFA = 452, nCFA1 = 910, nCFA2 = 910) Cronbach’s alpha was .79, .74, and .78 respectively.
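For readers who want to see how the three coefficients are obtained, a minimal Python sketch follows. It uses the textbook formulas: alpha from item and total-score variances, and omega and AVE from standardized factor loadings. The omega formula shown assumes uncorrelated errors, which is a simplification relative to the model with error covariances used in this study; the function names and all input values are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def omega_total(loadings):
    """Composite reliability from standardized loadings (uncorrelated errors assumed)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return squared_sum / (squared_sum + error_variance)


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


# Hypothetical standardized loadings for an eight-item, single-factor scale.
example_loadings = [0.5] * 8
print(round(omega_total(example_loadings), 3))
print(average_variance_extracted(example_loadings))
```

The example illustrates why the two coefficients can diverge: with eight items loading uniformly at .50, the composite reliability exceeds .70 while AVE is only .25, because AVE averages squared loadings and does not benefit from the number of items.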
3.8. Correlation Analysis to Examine Convergent and Discriminant Validity
The relationship between FS and other constructs was examined over the total sample (N = 2272). The constructs evaluated were categorized in five groups: 1) mental distress, with the 3 dimensions of the DASS-21 (Lovibond & Lovibond, 1995) and the DASS-9 (Yusoff, 2013; Kyriazos et al., 2018a); 2) well-being, including WEMWBS (Tennant et al., 2007), MHC-SF (Keyes, 2008), and the Satisfaction with Life Scale (SWLS; Diener et al., 1985); 3) affect measures, comprising the Scale of Positive and Negative Experiences (SPANE; Diener et al., 2010) and SPANE-8 (Kyriazos et al., 2018b); 4) positivity scales, comprising trait HOPE (Snyder et al., 1991), the Brief Resilience Scale (BRS; Smith et al., 2008), the Meaning in Life Questionnaire (MLQ; Steger et al., 2006) and the Gratitude Questionnaire (GQ-6; McCullough et al., 2002); 5) the WHOQOL-BREF (WHOQOL Group, 1998a, 1998b). All correlations are presented in Table 7.
The correlations between FS and Group 1 (Mental Distress Scales) were negative and ranged from moderate (−.26; DASS-21 Stress) to strong (−.41; DASS-21 Depression). The correlations between FS and Group 2 (Well-Being Scales) were on average strong (M = .56). Note that the strongest correlations were between FS and MHC-SF and WEMWBS.
Concerning the correlations between FS and Group 3 (Affect Measures), FS and SPANE-8 P, SPANE-8 N and SPANE-8 B had on average a moderate to strong correlation of .50, −.34, and .47 respectively. The correlations between FS and SPANE-12 P, SPANE-12 N and SPANE-12 B were moderately strong, .52, −.37 and .49 respectively. Group 4 (Positivity Measures) showed positive, weak to strong correlations with FS. HOPE and Presence of Meaning were at the highest end of the range and Search for Meaning at the lowest. Regarding the correlations of FS with Group 5 (Quality of life Scales), FS had on average strong correlation with them (M = .46). All values were significant at p < 0.01 level (see Table 7 for details).
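The bivariate correlations reported in this section are Pearson coefficients between total scores. As a reminder of what is being computed, a minimal Python sketch is given below; the function name and the score lists are hypothetical and do not reproduce any value in Table 7.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical total scores for five respondents.
fs_scores = [40, 44, 48, 52, 56]
wellbeing_scores = [45, 50, 52, 60, 66]   # expected to rise with FS
distress_scores = [30, 22, 25, 12, 8]     # expected to fall as FS rises

print(pearson_r(fs_scores, wellbeing_scores) > 0)  # convergent pattern
print(pearson_r(fs_scores, distress_scores) < 0)   # discriminant pattern
```

In this framing, convergent validity corresponds to positive coefficients with well-being and positivity measures, and discriminant validity to negative coefficients with distress measures such as the DASS.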
3.9. Evaluation of Well-Being Models
The following two well-being models were tested in the entire sample (N = 2272).
The Tripartite Model of Mental well-being (Keyes, 2002)
The Tripartite model of hedonic and eudaimonic well-being was evaluated using FS and SPANE (Diener et al., 2010). It was hypothesized that FS would be equivalent to PWB and SPANE to SWB, since Flourishing and PWB are related to the eudaimonic facet of well-being, whereas SPANE and SWB are linked to the hedonic facet (Singh et al., 2017). The fit of this model was acceptable, and the factor intercorrelations were as follows: SPANE-P to FS .592, SPANE-N to FS −.457, and SPANE-N to SPANE-P −.743. Fit measures and factor loadings for the models are presented in Table 8, and the path diagram in Figure 2(a).
Table 7. Bivariate correlations between FS and other measures.
All p values < .01.
Estimator = MLR, FI = Factor Intercorrelations, F = FS, SP = SPANE Positive, SN = SPANE Negative, D = DASS-21.
The Two-Continua Model (Keyes, 2002, 2005)
The two-continua model (Keyes, 2002, 2005) suggests that mental health and mental illness are two distinct but correlated dimensions rather than opposite ends of a single continuum (Keyes et al., 2008). In other words, it depicts the central assumption of Positive Psychology, i.e., that well-being is not the absence of ill-being (Seligman & Csikszentmihalyi, 2000; Seligman, 2002; Seligman, 2011).
To obtain evidence for the two-continua model (Keyes, 2005), a CFA was carried out using the FS to measure mental health and the three DASS dimensions (Lovibond & Lovibond, 1995) to measure mental illness (see Table 8 and Figure 2(b)). Specifically, we created a two-factor model in which factor 1 was mental health, represented by FS (Diener et al., 2010), and factor 2 was mental distress, represented by DASS-21 (Lovibond & Lovibond, 1995) with its three dimensions collapsed into one. This model, with FS and DASS-21 as two correlated factors, showed acceptable fit. The correlation between FS and DASS was −.386. The Mental Health dimension (FS) had factor loadings ranging from .553 to .663, while Mental Illness (DASS) had factor loadings ranging from .537 to .801.
3.10. Standardization of FS Scores
The means for the FS in the total sample (N = 2272) are presented in Table 9. Our data were non-normally distributed, so the means were not representative of FS scores (Crawford & Henry, 2004). Therefore, Table 9 also converts FS raw scores to percentiles. Fifty percent of the respondents scored ≤ 46. For the original FS, more than half of the respondents (53%) in the US also scored ≤ 46, range 8 - 56 (Diener et al., 2010).
Table 9. Summary statistics and FS raw scores converted to percentiles.
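The conversion of raw scores to percentiles can be sketched as follows. This is a minimal illustration, assuming the percentile of a raw score is the percentage of respondents scoring at or below it (consistent with the "scored ≤ 46" phrasing, although the exact definition used for Table 9 is not stated). The scores in `example_scores` are hypothetical and are not the study data.

```python
from collections import Counter


def percentile_table(scores):
    """Map each observed raw score to the percentage of respondents at or below it."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(scores)
    n = len(scores)
    table = {}
    cumulative = 0
    for raw in sorted(counts):
        cumulative += counts[raw]          # respondents scoring <= raw
        table[raw] = 100.0 * cumulative / n
    return table


# Hypothetical FS totals within the possible 8 to 56 range (illustrative only).
example_scores = [30, 38, 41, 44, 46, 46, 48, 50, 52, 56]
table = percentile_table(example_scores)
for raw, pct in table.items():
    print(raw, pct)
```

With skewed score distributions, a lookup of this kind lets an individual score be interpreted against the sample without relying on the mean and standard deviation.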
4. Discussion
The purpose of the present study was to validate the Flourishing Scale (Diener et al., 2010) in a Greek adult sample of the general population. We adopted the construct validation procedure called the “3-faced construct validation method” (Kyriazos et al., 2018a, 2018b). First, the sample was split (Guadagnoli & Velicer, 1988; MacCallum, Browne, & Sugawara, 1996) into three subsamples: 20% for EFA, 40% for a first CFA (CFA 1), and 40% for a second, equal-power CFA (CFA 2). Generally, sample-splitting is considered a cross-check method of construct validity (Brown, 2015; Byrne, 2010). The sample power was several times above the proposed limits (Osborne & Costello, 2004; Schumacker & Lomax, 2015; Singh et al., 2016; DeVellis, 2017), suggesting that the factor loadings were robust (Linley et al., 2009; Kline, 2016). In the first subsample (20%), a factor structure was established with EFA. Two models were extracted in this phase: the unidimensional FS structure proposed by Diener et al. (2010) and a two-factor model, with factor 1 containing items 1, 3, and 7 and factor 2 containing items 2, 4, 5, 6, and 8. Both models showed barely tolerable fit, with some goodness-of-fit measures (TLI and RMSEA) outside acceptable limits.
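The 20% / 40% / 40% split described above can be sketched as follows. This is a minimal illustration, assuming a simple random split without stratification (the text does not describe how cases were assigned); the function name `split_sample` and the seed are hypothetical, and the resulting subsample sizes depend on rounding and need not match those used in the study.

```python
import random


def split_sample(ids, fractions=(0.2, 0.4, 0.4), seed=42):
    """Randomly partition ids into consecutive, non-overlapping subsamples."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n = len(shuffled)
    parts = []
    start = 0
    for i, frac in enumerate(fractions):
        # The last part takes the remainder so every case is assigned exactly once.
        end = n if i == len(fractions) - 1 else start + round(frac * n)
        parts.append(shuffled[start:end])
        start = end
    return parts


# Respondent indices for the total sample size reported in the text (N = 2272).
efa_ids, cfa1_ids, cfa2_ids = split_sample(range(2272))
print(len(efa_ids), len(cfa1_ids), len(cfa2_ids))
```

Keeping the three subsamples disjoint is what allows the structure found with EFA to be tested, and then cross-validated, on cases that played no part in deriving it.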
Next, we examined the FS factor structure further in the second subsample (40%) using Confirmatory Factor Analysis. This was the second phase of the 3-faced construct validation method. Three ICM-CFA models were tested because single-factor ESEM models are equivalent to the ICM-CFA models (Asparouhov & Muthen, 2009; Marsh et al., 2014). Similarly, the single-factor structure of FS excluded the possibility of evaluating higher-order structures such as second-order CFA (Wang & Wang, 2012). The optimal FS structure that emerged from the CFA 1 subsample was the single factor proposed by Diener et al. (2010) with error covariances added. The goodness-of-fit measures of this model were at satisfactory levels. Error covariances were added between items 7 and 1, 7 and 4, 3 and 1, 5 and 2, 6 and 3, and 6 and 2. Generally, error covariances are regarded as overfitting when theoretically unfounded; in this case, however, FS unifies different well-being theories, and error covariance is to some extent tolerable to account for content overlap and the complexity of the theories.
In the next phase of the 3-faced construct validation method, we cross-validated the optimal unidimensional model of FS confirmed in the CFA 1 subsample (40%) in a different subsample (CFA 2) of equal power (40%). This model also showed good fit, with all goodness-of-fit measures at acceptable values. This unidimensional solution is a widely validated factor structure for FS in Western cultures such as Italian (Giuntoli et al., 2017) and Portuguese (Silva & Caetano, 2013), Asian cultures such as Japanese (Sumi, 2014) and Indian (Singh et al., 2017), and Eastern cultures such as Iranian (Khodarahimi, 2013). Thus, this unidimensional structure has been successfully adapted in both collectivistic and individualistic cultural contexts (Hofstede, 2001; Triandis, 1995).
In the next phase of the 3-faced construct validation method, we used the optimal FS model, cross-validated in the CFA 2 subsample, as a baseline model to evaluate strict measurement invariance across gender over the entire sample. We evaluated configural, weak, strong, and strict measurement invariance in turn using nested models. Configural, weak, and strong measurement invariance were supported. Strict measurement invariance was partially supported. Measurement invariance is a very important property for a measure because it suggests that no measurement bias exists when measuring males and females (Damasio & Koller, 2015).
Additionally, we evaluated the reliability and validity of FS with the following three measures: 1) Cronbach’s alpha (Cronbach, 1951) to assess internal consistency; 2) the Omega Total coefficient (ω total; McDonald, 1999; Werts, Linn, & Joreskog, 1974) to examine construct reliability (Hoque et al., 2017); and 3) Average Variance Extracted (AVE; Fornell & Larcker, 1981) to evaluate convergent validity. The internal consistency reliability of the Greek adaptation of FS was adequate, achieving a value ≥ .80 (Nunnally & Bernstein, 1994). Cronbach’s alpha was comparable to the values reported by Diener et al. (2010) and other studies (e.g., Singh et al., 2017). Omega reliability was equally satisfactory. This was not the case with AVE. Nevertheless, FS was designed to measure a different mental well-being dimension per item (Diener et al., 2010), so a low AVE is probably not surprising, because AVE is an indicator of convergent validity (Malhotra & Dash, 2011), that is, of how much variance the items share through the common factor.
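The three indices named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: standardized loadings, a single factor, and uncorrelated errors for AVE and omega (the optimal model in this study included error covariances, which this simplified omega formula ignores). The loadings below are hypothetical, chosen inside the .553 to .663 range reported for the FS factor in Section 3.9 but otherwise invented.

```python
def average_variance_extracted(loadings):
    """AVE for standardized loadings: the mean of the squared loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def omega_total(loadings):
    """Omega for one factor, standardized loadings, uncorrelated errors."""
    common = sum(loadings) ** 2                  # variance due to the factor
    unique = sum(1 - l * l for l in loadings)    # summed error variances
    return common / (common + unique)


def cronbach_alpha(items):
    """Cronbach's alpha from item columns (one list of scores per item)."""
    k = len(items)
    n = len(items[0])

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical standardized loadings for eight items (illustrative only).
loadings = [0.55, 0.58, 0.60, 0.62, 0.63, 0.64, 0.65, 0.66]
ave = average_variance_extracted(loadings)
omega = omega_total(loadings)
print(round(ave, 3), round(omega, 3))
```

With loadings of this size, omega exceeds .80 while AVE stays below the .50 level commonly cited from Fornell and Larcker (1981). This shows how a scale whose items each tap a different facet can be reliable as a composite and still show a low AVE.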
Convergent and discriminant validity were evaluated using 12 measures divided into five groups: 1) mental distress, with the three factors (Depression, Anxiety, and Stress) of the DASS-21 (Lovibond & Lovibond, 1995) and the DASS-9 (Yusoff, 2013; Kyriazos et al., 2018a); 2) well-being, including the Warwick-Edinburgh Mental Well-being Scale (WEMWBS; Tennant et al., 2007), the Mental Health Continuum-Short Form (MHC-SF; Keyes, 2008), and the Satisfaction with Life Scale (SWLS; Diener et al., 1985); 3) affect measures, comprising the Scale of Positive and Negative Experiences (SPANE; Diener et al., 2010) and SPANE-8, a briefer version of SPANE (Kyriazos et al., 2018b); 4) positivity scales, comprising trait HOPE (Snyder et al., 1991), the Brief Resilience Scale (BRS; Smith et al., 2008), the Meaning in Life Questionnaire (MLQ; Steger et al., 2006), and the Gratitude 6 Questionnaire (McCullough et al., 2002); and 5) quality of life dimensions, measured by the WHOQOL-BREF (WHOQOL Group, 1998a, 1998b). FS showed moderate to strong negative correlations with the mental distress dimensions, with Depression showing the strongest negative value. FS had strong correlations on average with the Well-Being Scales. The relationships between FS and MHC-SF (Keyes, 2008) and between FS and WEMWBS (Tennant et al., 2007) had the strongest magnitude. The correlations between FS and the affect measures (SPANE; Diener et al., 2010) were on average of moderate to strong magnitude. The positivity measures of hope, life meaning, and gratitude correlated positively with FS, ranging from weak (MLQ Search for meaning in life; Steger et al., 2006) to strong (MLQ Presence of meaning in life and Trait Hope; Snyder et al., 1991). Finally, the correlations of FS with dimensions of life quality (WHOQOL-BREF; WHOQOL Group, 1998a, 1998b) were strong on average.
Generally, the magnitude of the positive correlations ranged from low (MLQ Search for meaning in life; Steger et al., 2006 ) to strong (MHC-SF by Keyes, 2008 and WEMWBS by Tennant et al., 2007 ). Similar results were reported by other studies (Singh et al., 2017) .
Moreover, two well-being models were evaluated. First, we replicated the Tripartite well-being model. This model had FS on one factor, SPANE Positive Experiences on a second factor, and SPANE Negative Experiences on a third factor. This composite structure has also been evaluated using FS and SPANE (Diener et al., 2010) by Singh et al. (2017) in an Indian sample, by Howell and Buro (2015), and by Giuntoli et al. (2017) in an Italian sample. The model had acceptable fit, with factor intercorrelations below the cutoff value of .80. These results support the hypothesis that mental well-being can be represented by three distinct but related dimensions using FS to measure mental health. This is in line with previous studies using FS (Singh et al., 2017; Howell & Buro, 2015) and with other measures of well-being (Joshanloo, 2017).
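The comparison with the .80 cutoff can be made explicit with the factor intercorrelations reported in Section 3.9. The sketch below uses those three reported values and assumes the cutoff applies to the absolute value of each correlation; the helper `exceeds_cutoff` is a hypothetical name introduced for illustration.

```python
# Factor intercorrelations reported in Section 3.9 for the Tripartite model.
intercorrelations = {
    ("SPANE-P", "FS"): 0.592,
    ("SPANE-N", "FS"): -0.457,
    ("SPANE-N", "SPANE-P"): -0.743,
}
CUTOFF = 0.80  # cutoff value named in the text


def exceeds_cutoff(pairs, cutoff):
    """Return the factor pairs whose absolute correlation reaches the cutoff."""
    return [pair for pair, r in pairs.items() if abs(r) >= cutoff]


too_high = exceeds_cutoff(intercorrelations, CUTOFF)
print(too_high)  # prints []
```

An empty result means no pair of factors is so strongly correlated that the two would be hard to distinguish, which is the sense in which the three dimensions are described as distinct but related.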
Second, the Two Continua well-being model was evaluated. Originally attributed to Keyes (2002, 2005), this model suggests that mental health and mental illness are not opposite poles of the same dimension but two distinct, related constructs. Essentially, this is the core assumption of Positive Psychology, that well-being is not merely the absence of ill-being (Seligman & Csikszentmihalyi, 2000; Seligman, 2002; Seligman, 2011), i.e., it is not a zero-sum game. The model here had DASS (Lovibond & Lovibond, 1995) on one factor as a mental distress dimension and FS as a well-being dimension on a second, oblique factor. This model showed good fit, suggesting that the Two Continua model is tenable in this cultural context using FS to measure mental health. Similar models, using MHC-SF as the mental health dimension, have been proposed by Keyes et al. (2008) and replicated by Petrillo et al. (2015) and Perugini, Iglesia, Solano, & Keyes (2017).
5. Conclusion
The general conclusion of this work is that the unidimensional structure of FS established by Diener et al. (2010) is confirmed in the Greek cultural context. The FS showed satisfactory psychometric properties and is a reliable and valid well-being measure. As proposed by Diener et al. (2010), the Greek FS can complement existing subjective well-being measures as a brief measure of eudaimonic well-being. Another important finding is that the Greek adaptation of FS is equivalent across gender.
However, this study has limitations. First, psychology students were involved in the data collection. The effect of this process, if any, should be taken into account, and any generalization of the results should be made with caution. Of course, an adequate sample size minimizes these potential effects. Moreover, the error covariances used in the optimal model possibly suggest overlapping item content (Brown, 2015). A similar issue was reported by Singh et al. (2017) for the Indian adaptation of FS. Nevertheless, FS was designed to measure multiple well-being dimensions (a different one in each item), so error covariances are possibly to be expected. Besides, this model was validated further in two different subsamples, as suggested by Byrne (2010) and Brown (2015). With the above limitations considered, the present study reports strong evidence for the construct validity, measurement invariance across gender, reliability, and convergent and discriminant validity of the Greek version of the FS.