Participants and procedures
Data for this study came from two datasets: the first was collected from January to December 2010 (n = 216) [23], and the second from April to June 2014 (n = 86). Together, these datasets constituted a convenience sample of 302 patients who were seeking or receiving treatment at one basic-care service or two specialized-care services belonging to a public rehabilitation network in Brazil. To be eligible, participants had to have an orthopaedic or neurological health condition (acute or chronic), be at least 18 years old, and be able to understand and answer the interview questions. There was no upper age limit and no other restriction on the type of health condition. All participants were informed about the study and gave informed consent; they were then interviewed using a socio-demographic questionnaire and the P-Scale. The study was approved by the Ethics Committee of the Universidade Federal de Minas Gerais, Brazil (approval no. 426.982).
The interviews were conducted by one of two authors (FCMSD or MAPS). Before data collection, both interviewers carefully read the P-Scale manual and discussed any doubts or issues with a third researcher (RFS). Questions the researchers could not resolve among themselves were emailed to the author of the P-Scale (Wim van Brakel). In the 2010 dataset, interviews were conducted using an earlier version of the scale [10], whereas in the 2014 dataset the most recent version of the P-Scale was used [24]. The two versions are essentially similar; the main differences are the sequence of the items and the replacement of one item (i.e., Item N16 – In home, are the eating utensils you use kept with those used by the rest of the household? in the earlier version was replaced by Item N10 – Do you have the same opportunities as your peers to start or maintain a long-term relationship with a life partner? in the latest version). For this study, only the items present in both versions were used.
Participation scale
The P-Scale has 18 items, in which respondents are asked whether they perceive their level of participation as equal to that of their "peer" in each of the situations described by the items. If respondents consider their level of participation lower than that of their peer, representing a possible participation restriction, they are also asked to indicate to what degree this is a problem in their daily routine [10]. Each item is scored 0 (zero) if the individual does not consider his/her participation lower than that of his/her peer; otherwise, it is scored "no problem" = 1, "small" = 2, "medium" = 3, or "large" = 5. The total score is obtained by adding the item scores and ranges from 0 (zero) to 90, where 0 = "no restriction on participation" and 90 = "complete restriction in participation" [10].
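The item scoring rule and total score described above can be sketched as follows. This is a minimal illustration, not an official scoring tool: the response labels used as dictionary keys (such as "not restricted") are hypothetical, and the item count is not enforced because only the items common to both scale versions were used in this study.

```python
# Hypothetical response labels mapped to the P-Scale item scores described above.
# Note that a score of 4 is never assigned: "large" jumps from 3 to 5.
ITEM_SCORES = {
    "not restricted": 0,  # participation not lower than that of the peer
    "no problem": 1,
    "small": 2,
    "medium": 3,
    "large": 5,
}


def p_scale_total(responses):
    """Sum item scores for one respondent; rejects labels not in ITEM_SCORES."""
    unknown = [r for r in responses if r not in ITEM_SCORES]
    if unknown:
        raise ValueError(f"unrecognized responses: {unknown}")
    return sum(ITEM_SCORES[r] for r in responses)
```

With all 18 items answered "large", the total reaches the maximum of 18 × 5 = 90; with no restriction on any item, it is 0.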
Van Brakel et al. (2006) performed an initial validation study in which the P-Scale showed a Cronbach's alpha of 0.92, an intra-class correlation coefficient of 0.83 for intra-interviewer stability, and an inter-interviewer reliability of 0.80; in factor analysis, the first factor explained 90% of the variability [10]. Other studies that also conducted factor analyses found a better fit for a two-factor model ("work-related participation" and "general participation") [11, 25], although these factors may also form part of a unidimensional scale because they were strongly correlated with each other [25].
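Cronbach's alpha, reported above as a measure of internal consistency, follows the standard formula α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜ the variance of the total score. The sketch below is a plain illustration of that formula using sample variances; it is not the computation used in the cited validation study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items table (list of rows).

    Uses sample variances (n - 1 denominator) for items and totals.
    """
    k = len(scores[0])
    columns = list(zip(*scores))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)
```

Perfectly parallel items give α = 1, whereas items that vary independently of each other push α towards 0.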
Data entry and statistical analysis
The Statistical Package for the Social Sciences (SPSS), version 16, was used for data entry, dataset management, and descriptive statistics. After data entry, the dataset was double-checked by the researchers FCMSD or MAPS; the final dataset had no missing data. Rasch analysis was conducted using the Winsteps software, version 3.81.0.
Item and person fit were analysed using Infit and Outfit statistics, which indicate how well the data conform to the Rasch model. For each of these fit statistics, Winsteps provides the mean square (MNSQ) and the Z-standardized value (ZSTD) [21]. As a general rule, fit analysis should begin with Outfit before Infit, and with MNSQ before ZSTD. The expected MNSQ value is approximately 1.0, and values between 0.5 and 1.5 are considered productive for measurement [26]. If an MNSQ value falls outside this range, the corresponding ZSTD should be checked; ZSTD values of 2.0 or more indicate statistically significant misfit to the model [26]. If misfitting items or individuals are found, the iterative process proposed by Linacre (2010) may be used to address them [27]. The "really really bad" items and individuals are deleted first, and the analysis is rerun. Next, the "really bad" items and individuals are deleted, the analysis is rerun, and the results are compared with those of the previous step. The process continues until the most adequate fit statistics are obtained.
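The two mean squares and the decision rule above can be sketched as follows. Outfit is the unweighted mean of squared standardized residuals, so it is sensitive to unexpected responses far from the item's difficulty; Infit weights squared residuals by the model variance (information). The sketch assumes the model-expected scores and variances have already been obtained from a fitted Rasch model (e.g., exported from the Rasch software); the function names are hypothetical, and ZSTD itself is taken as given rather than computed.

```python
def fit_mean_squares(observed, expected, variances):
    """Infit and Outfit MNSQ for one item (or one person).

    observed, expected, variances: sequences over that item's responses;
    expected and variances are assumed to come from a fitted Rasch model.
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: mean of squared standardized residuals (unweighted).
    outfit = sum(s / w for s, w in zip(sq_resid, variances)) / len(sq_resid)
    # Infit: information-weighted mean square.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


def flag_misfit(mnsq, zstd, low=0.5, high=1.5, z_crit=2.0):
    """Apply the rule in the text: check ZSTD only when MNSQ is out of range."""
    if low <= mnsq <= high:
        return False  # productive for measurement
    return zstd >= z_crit  # statistically significant misfit
```

For example, `flag_misfit(1.8, 2.5)` flags an item, whereas `flag_misfit(1.8, 1.0)` does not, because the ZSTD is not significant.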
Unidimensionality was examined using principal component analysis (PCA) of the residuals. For a unidimensional measure, the observed variance explained by the measures is expected to roughly match the variance explained as expected by the model. PCA also extracts components, called contrasts, from the correlation matrix of the residuals; the "first contrast" is the component that explains the largest possible amount of variance in the residuals [26]. If the eigenvalue of the unexplained variance in the first contrast is 2.0 or less, the largest possible secondary dimension has the strength of no more than two items. The decision to treat a measure as unidimensional or multidimensional is usually made by the researcher according to the purposes of the test. However, a first contrast with an eigenvalue greater than 2.0 may indicate the presence of a second dimension [26].
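The contrast eigenvalues can be sketched with NumPy, assuming a persons × items matrix of standardized residuals has already been exported from the fitted Rasch model. Because the trace of a correlation matrix equals the number of items, the eigenvalues are expressed in "item units", which is why an eigenvalue of 2.0 corresponds to the strength of about two items. This is a simplified illustration; Winsteps' own variance decomposition reports additional quantities not reproduced here.

```python
import numpy as np


def residual_contrasts(std_residuals):
    """Eigenvalues (descending) of the item correlation matrix of residuals.

    std_residuals: persons x items array of standardized residuals,
    assumed to come from an already-fitted Rasch model.
    """
    z = np.asarray(std_residuals, dtype=float)
    corr = np.corrcoef(z, rowvar=False)  # items x items
    return np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending


def suggests_second_dimension(eigenvalues, threshold=2.0):
    """True if the first contrast exceeds the threshold used in the text."""
    return bool(eigenvalues[0] > threshold)
```

Residuals that are pure noise give a first contrast close to 1, whereas a cluster of items sharing an extra source of variance produces a first contrast well above 2.0.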
Reliability was evaluated using the indices provided by Winsteps: person separation index, person reliability, item separation index, and item reliability. The separation indices estimate the spread of items or individuals along the measured continuum and reflect the number of distinct strata into which the sample or the items can be divided [28]. Reliability indicates how reproducible the ordering of person and item measures (i.e., their locations on the continuum) is [26]. A person separation index of 1.5 or a person reliability of 0.7 represents an acceptable level of separation and is considered the minimum required to divide the sample into two distinct strata (i.e., low and high ability) [29], whereas a person separation index of 2.0 and a person reliability of 0.8 represent a good level of separation and are considered the minimum preferable values [26]. Item separation and item reliability are interpreted using the same criteria. According to Rasch guidelines, if item reliability and separation are below the required values, a larger sample is needed; if person reliability and separation are below the required values, the test needs more items [26].
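Separation (G) and reliability (R) are two views of the same ratio of "true" to error variance, related by R = G²/(1 + G²). This explains why the paired cut-offs above go together: G = 2.0 gives R = 0.8 exactly, and G = 1.5 gives R ≈ 0.69, i.e., about 0.7. The sketch below also includes the commonly used strata formula (4G + 1)/3 for the number of statistically distinct levels; that formula is an addition here, not a criterion applied in this study.

```python
import math


def reliability_from_separation(g):
    """Reliability implied by a separation index G."""
    return g ** 2 / (1 + g ** 2)


def separation_from_reliability(r):
    """Separation index implied by a reliability R (0 <= R < 1)."""
    return math.sqrt(r / (1 - r))


def strata(g):
    """Number of statistically distinct levels, (4G + 1) / 3."""
    return (4 * g + 1) / 3
```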
Differential item functioning (DIF) was examined across subgroups defined by gender (men and women) and by duration of symptoms (acute or chronic, with chronic defined as symptoms lasting more than 6 months). Noticeable DIF was defined by two criteria: 1) a DIF contrast greater than 0.5 logits, and 2) a difference significant enough not to have occurred by chance (t > 2.0) [26].
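The two DIF criteria can be sketched for a single item, assuming its difficulty and standard error have been estimated separately in each subgroup. The DIF contrast is the difference between the two difficulties, and t divides that contrast by the joint standard error. Both criteria are checked in absolute value here, as an assumption, because a contrast can favour either group; the function name and argument order are hypothetical.

```python
import math


def dif_test(d_group1, se_group1, d_group2, se_group2,
             min_contrast=0.5, t_crit=2.0):
    """Return (contrast, t, noticeable) for one item across two subgroups.

    d_*: item difficulty (logits) estimated within each subgroup.
    se_*: standard error of that estimate.
    """
    contrast = d_group1 - d_group2
    t = contrast / math.sqrt(se_group1 ** 2 + se_group2 ** 2)
    noticeable = abs(contrast) > min_contrast and abs(t) > t_crit
    return contrast, t, noticeable
```

A contrast of 0.6 logits with standard errors of 0.2 in each group gives t ≈ 2.12 and meets both criteria; a contrast of 1.0 logit with standard errors of 0.4 is large but not significant (t ≈ 1.77).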
P-Scale item difficulty and person ability were plotted in a person-item map. Person-item maps (also called Wright maps) allow visual analysis of the relationship between person and item measures. These maps help identify strengths and weaknesses of a scale, such as item redundancy (i.e., items at the same difficulty level), gaps along the trait (which may indicate the need for more items to fill them), whether the ordering of items matches the predictions of the test author or users (i.e., construct validity), and targeting between items and sample (i.e., whether the range of item difficulty matches the range of sample ability) [21].
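Some of the features read visually from a person-item map can also be summarized numerically. The sketch below, given person and item measures in logits, reports targeting as the mean person measure minus the mean item measure, lists gaps between adjacent item difficulties, and lists item pairs that are nearly coincident (possible redundancy). The thresholds `max_gap` and `min_spacing` are illustrative assumptions, not values used in this study.

```python
from statistics import mean


def map_diagnostics(person_measures, item_measures,
                    max_gap=1.0, min_spacing=0.1):
    """Numeric summary of a person-item map (all measures in logits)."""
    items = sorted(item_measures)
    adjacent = list(zip(items, items[1:]))
    return {
        # > 0: persons sit above the items on average; < 0: below.
        "targeting": mean(person_measures) - mean(item_measures),
        # Stretches of the continuum with no item (possible trait gaps).
        "gaps": [(a, b) for a, b in adjacent if b - a > max_gap],
        # Items at nearly the same difficulty (possible redundancy).
        "near_duplicates": [(a, b) for a, b in adjacent if b - a < min_spacing],
    }
```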
Sample size
The sample size (302 participants) was considered adequate for Rasch analysis. According to Linacre (1994), a sample of 243 respondents is sufficient to provide 99% confidence that item calibrations are within ±0.5 logits [30].
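The link between sample size and calibration precision can be sketched with a simplified normal approximation for a single dichotomous Rasch item, whose standard error is approximately 1/√(N·p(1 − p)), with p the probability of endorsing the item. Requiring z·SE ≤ 0.5 logits at 99% confidence (z ≈ 2.576) gives a minimum N. This is an illustration under stated assumptions, not Linacre's derivation, and P-Scale items are polytomous rather than dichotomous.

```python
import math


def min_n_item_calibration(half_width=0.5, z=2.576, p_correct=0.5):
    """Smallest N so that z * SE(item) <= half_width, for a dichotomous item.

    Assumes SE ~= 1 / sqrt(N * p * (1 - p)), i.e., every respondent has the
    same probability p_correct of endorsing the item.
    """
    info_per_person = p_correct * (1 - p_correct)
    return math.ceil((z / half_width) ** 2 / info_per_person)
```

For a perfectly targeted item (p = 0.5), the sketch gives 107 respondents. Poorer targeting lowers the information each respondent contributes and raises the required N, which is consistent with a larger figure such as 243 being recommended when targeting is not known in advance.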