Telecare and telehealth: critically examining the evidence
Mike Clark reports on a presentation by Anna Davies and Professor Stan Newman
On 10 December 2009 at the WSDAN event held in Manchester, Professor Newman gave a presentation examining the issues involved in planning and evaluating telecare and telehealth. This feature reports on the follow-up presentation, made at our event held on 11 February at Stansted, on how to make critical judgements about the evidence and what it is really telling us about the impact of telehealth and telecare.
Both presentations are available on WSDAN’s past events pages.
Planning an evaluation of telecare and telehealth
Professor Newman recapped key points from the Manchester presentation. These included an understanding of the hierarchy of evidence, how to structure and design a research study, and the mechanisms you can use to do so. He also explained how using less than a ‘gold standard’ design can still provide very useful (qualitative) information, asking somewhat different questions – for example: ‘What is the nature of the problems in adoption?’, ‘How can they be overcome?’ and ‘How do people view the technology?’.
The randomised controlled trial (RCT) is the ‘gold standard’. You randomise people. Some receive the intervention and others do not. Then you look at the impact of the intervention, which is exactly what we are doing in the Whole System Demonstrator (WSD) programme. It is a randomised controlled trial where half the people receive the intervention and the other half do not. We are evaluating it on many levels, including cost savings such as reductions in hospitalisation. We are also evaluating it from the patients’ and participants’ perspectives, in both social care and health care. We are looking at people’s experiences of telecare and telehealth, its impact on their quality of life, their psychological well-being, their behaviour – all sorts of assumptions that are found in the literature.
The trial is complex, costly and time-consuming, as the three WSD pilot sites (Kent, Cornwall and Newham) will attest. There are 6,000 participants randomised across three studies – one study on telehealth, one on telecare and one on carers (or caregivers).
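The allocation step of a two-arm trial can be sketched in a few lines. This is a minimal illustration of simple 1:1 randomisation, not the WSD programme's actual allocation procedure; the function name, the participant identifiers and the seed are all hypothetical.

```python
import random


def randomise(participant_ids, seed=None):
    """Randomly split participants into two arms (simple 1:1 allocation).

    Hypothetical helper for illustration; not the WSD allocation procedure.
    """
    rng = random.Random(seed)        # a seeded generator makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)            # random order, so nobody chooses who gets the kit
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Illustrative identifiers only
arms = randomise(range(1, 21), seed=2010)
print(len(arms["intervention"]), len(arms["control"]))
```

Because allocation is left to chance, the two arms should differ only in whether they receive the intervention, which is what allows a difference in outcomes to be attributed to it.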
There are some advantages in having a big study with a control group that follows people over time. Indeed, one of the things we suggested in Manchester (December 2009) is that you can develop and design local research studies that can be directly compared to the control group developed for the WSD Pilots. We can advise on how to do an evaluation at your own site, and we can give you some comparable data and the instruments that you might use to do the evaluation.
When you are planning to implement a study and you want to audit or evaluate it, start at the very beginning, thinking about the evaluation as you attempt to present the study. This will help you plan what you do next. ‘In whom it works’ is perhaps the most crucial question that we need to consider.
Much of the literature and data are about groups in general, and there is a question about individual differences. People clearly have strong attitudes towards devices and we need to find a way of identifying ‘in whom it works’ – what are the attitudinal barriers, and are there proven techniques for dealing with them?
Stan Newman introduced Anna Davies.
Aspects of evaluation quality
It is important to evaluate or critically review the evidence that you might come across – casting a critical eye over it.
In Manchester, we defined a good-quality evaluation as ‘an evaluation that provides an accurate or truthful picture of what is actually happening’.
There is a hierarchy of evidence, from systematic reviews and meta-analysis, through to randomised controlled trials, to case studies and expert opinion (the latter being more subjective).
Key aspects of evaluation quality include:
1. Clear objectives and questions
2. Clear statement of methods used to answer them
- Study design, comparison groups guided by questions
3. Sampling
- Representative: do people being evaluated represent those that will eventually receive telecare or telehealth?
- A sufficiently large sample – power to detect an effect
4. Measures used
- Comprehensive, valid and reliable – established measures
5. Confounding variables identified and controlled for
- Factors other than intervention that may account for findings
6. Appropriate analyses to answer the questions
7. Appropriate conclusions, feasibility of findings, and extent to which they can be generalised (see National Institute for Health and Clinical Excellence (NICE) Centre for Public Health Excellence (CPHE) manual, or the Critical Appraisal Skills Programme (CASP) for additional quality criteria).
Economic evaluation: the Bergmo study
One area that people are particularly interested in is ‘making a business case’ or ‘economic evaluation’. Do telecare and telehealth provide as much benefit as, or greater benefit than, current standards of care, and how much do they cost compared with current service provision?
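One common way to put these two questions together is an incremental cost-effectiveness ratio (ICER): the extra cost divided by the extra benefit, relative to current care. The sketch below is illustrative only. The function name and all the figures are hypothetical and are not taken from any study discussed here.

```python
def icer(cost_new, cost_current, effect_new, effect_current):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of benefit."""
    delta_cost = cost_new - cost_current
    delta_effect = effect_new - effect_current
    if delta_effect == 0:
        raise ValueError("No difference in benefit: the ratio is undefined")
    return delta_cost / delta_effect


# Hypothetical per-person annual costs (GBP) and benefits (QALYs)
ratio = icer(cost_new=2500.0, cost_current=2000.0, effect_new=0.75, effect_current=0.70)
print(f"Extra cost per QALY gained: {ratio:.0f}")
```

The ratio should always be read alongside the separate cost and benefit differences: a negative value can mean either cheaper and better, or dearer and worse.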
A recent article by TS Bergmo (2009) looks at the quality of the evidence (see Bergmo TS (2009) 'Can economic evaluation in telemedicine be trusted? A systematic review of the literature')
In Bergmo’s study, telemedicine was defined as ‘real time’ or ‘store and forward’ data transfer – video conferencing for consultations, emailing of x-rays, and sending information on glucose monitoring via personal digital assistants to doctors to be assessed.
The review aimed to look at both the costs of providing these services and other outcomes not related to costs (eg, quality of life, whether people were more depressed, etc).
Out of 790 possible studies that included an economic analysis of these types of devices, only 33 provided an analysis of both the costs of providing the devices and the outcomes.
The studies varied in the interventions they covered and the conditions to which they were applied. There were different patient groups (people with diabetes, cardiology patients) and the technologies were used in various combinations.
Out of the 33 studies that analysed both costs and outcomes, only six fulfilled the author’s criteria for a good-quality evaluation. This really limits what we can learn from these evaluations.
Instead of describing the outcomes of these evaluations, the author thought it more appropriate to focus on their quality to try and help others carry out better-quality evaluations.
The author asked these questions:
- Were there clear statements of the methods used?
- Were the measures used appropriate?
- Were the analyses appropriate?
- Were the conclusions mapped onto the findings that were described in the studies?
In relation to methods, the studies were very varied: 50 per cent were RCTs and some were poorly designed or poorly described. In a couple of these studies, two groups were mentioned, but they did not say how they came up with the groups or what these groups were doing. There were six decision-modelling studies – this is where secondary data are used in a mathematical model to generate information to make decisions. Very few described the model they were using and very few described how they got the data to populate their model. However, in general, the comparison groups were clear.
In relation to the measures that were used, the cost measures varied markedly, were poorly articulated, or not even described. Unfortunately, this makes it very difficult to make comparisons between the studies. Also, if you are looking at the study yourself, it is not possible to make it applicable to your own setting and the kind of things that you are interested in looking at.
On the other hand, the non-resource outcomes were very varied but they were clearly described. The studies considered areas like health-related quality of life, quality adjusted life years (QALYs), diagnostic accuracy, glucose control and psychological outcomes (eg anxiety and depression).
A further issue was the appropriateness of the analyses used.
This next point relates to the mathematical modelling studies. A hallmark of a good economic evaluation is a sensitivity analysis – that is, whether variations in the characteristics of the studies affect the findings.
For instance, you might be interested in finding out whether commercially funded research results in different outcomes from non-commercially funded research. Fewer than 50 per cent of the studies carried out this type of analysis.
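In modelling studies, one common form of this is a one-way sensitivity analysis: the model is re-run while a single uncertain input is varied, to see whether the conclusion holds. The sketch below assumes a deliberately simple cost model. The model, the function names and every figure are hypothetical, chosen only to show the technique.

```python
def net_saving(admissions_avoided, cost_per_admission, kit_cost):
    """Net saving per person per year in a deliberately simple, hypothetical model."""
    return admissions_avoided * cost_per_admission - kit_cost


def one_way_sensitivity(values, **fixed):
    """Re-run the model across a range of values for one uncertain input."""
    return {v: net_saving(admissions_avoided=v, **fixed) for v in values}


# Hypothetical inputs: vary admissions avoided, hold the other inputs fixed
results = one_way_sensitivity(
    [0.1, 0.2, 0.3, 0.4],
    cost_per_admission=2000.0,
    kit_cost=500.0,
)
for avoided, saving in results.items():
    verdict = "saves money" if saving > 0 else "does not save money"
    print(f"{avoided:.1f} admissions avoided -> net saving {saving:8.2f} ({verdict})")
```

If the verdict flips within the plausible range of an input, as it does here, the finding is sensitive to that assumption and should be reported as such.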
In addition, very few studies discussed the extent to which their findings could be generalised. So it is not clear how you can apply them to different contexts. Are they going to be applicable to your setting or to the type of clients or patients you are seeing? The effectiveness of interventions is affected by the types of people you are studying, the technology that you are using, and also by local procedural and organisational factors.
The following conclusions can be drawn.
- Several points relating to quality assessment criteria were not addressed in the studies considered by Bergmo’s literature review.
- This limits the ability to consider their findings to be an accurate representation of reality.
- A systematic review is only as good as the studies it includes.
A social care study
We now turn to a social care study on carers or caregivers, entitled ‘The effect of home-monitoring technology on reducing burden in caregivers of older adults with disabilities’ (an unpublished 2006 PhD thesis by L. Russ, an occupational therapist, from The University of Buffalo, USA). This has clear strengths and weaknesses but it is also unusual, in that much of the research that looks at outcomes for carers is qualitative. There are very few studies that look at quantitative outcomes – fewer than 10 worldwide.
The thesis by Russ is similar in design to those available in the published literature. The study design is a quasi-experiment – the people receiving the telecare (and their carers) being randomised to an intervention or control group. The research assessed people before they received the telecare intervention, and again after 12 months.
The study aimed to involve more than 100 people who were carers of people who were frail or elderly but did not have a cognitive impairment. They did a power calculation to find out how many people they needed to detect an effect of the intervention – 34 people were required in each group.
The intervention was the Personal Assistance Security System (PASS) – a pendant alarm that dialled the carer or, if unavailable, one of four stored numbers (family members or friends).
In the study, they measured depression (CES-D10), which is also used in the WSD programme, as well as carer burden, satisfaction with caring, etc. They also measured people’s subjective evaluation of the system, but did not describe how they did so. Age, ethnicity and other demographic variables were noted.
Summary of measures used in the study:
- depression (CES-D10)
- burden (Zarit Burden Interview (ZBI))
- satisfaction with caring (Picot Caregiver Rewards Scale (PCRS))
- subjective evaluation of PASS (not described)
- age, ethnicity
- care-situation characteristics (eg, care-recipient health).
Analytic methods: Analysis of Variance (ANOVA)
Findings included:
- depression (CES-D10): No change in depression over time for either group when demographics controlled for
- caregiver burden (ZBI): No change in either group
- caregiver satisfaction (PCRS): No change in either group, but trend for lower satisfaction in PASS group.
However, it should be noted that carers evaluated PASS positively (subjective).
Strengths of the study included:
- clear research objectives and appropriate study design
- but it was unclear what care the control group were receiving
- good quality measures: valid, reliability reported
- potential confounders identified included demographics, care-recipient factors
Weaknesses in sampling in the study included:
- power calculation required 34 per group; final sample comprised 19 in control group, and 31 in intervention group
- insufficient numbers to detect the expected effect
- no information about:
  - those who agreed/did not agree to participate
  - those who did and did not complete all evaluation materials
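The consequence of falling short of the planned sample can be illustrated with an approximate power calculation. This is a sketch using the normal approximation for a two-sided comparison of two group means. The effect size the thesis expected is not given here, so the value of 0.7 below is a hypothetical assumption chosen only for illustration, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation; effect_size is the standardised difference
    (Cohen's d). The tiny chance of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(noncentrality - z_crit)


# ASSUMPTION: d = 0.7 is hypothetical, not a figure from the thesis.
d = 0.7
planned = approx_power(d, 34, 34)     # sample required by the power calculation
achieved = approx_power(d, 19, 31)    # final sample: control and intervention groups
print(f"planned 34 + 34: power ~ {planned:.2f}")
print(f"achieved 19 + 31: power ~ {achieved:.2f}")
```

Whatever effect size is assumed, the smaller and unbalanced final sample has less power than the planned one, so a finding of ‘no change’ is weaker evidence that the intervention has no effect.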
Weaknesses in analysis and confounders in the study included:
- analysis looked at change in each group separately
- whether each group changed over time. Appropriate analysis would look at change in groups relative to one another
- did not control for all confounders (care-recipient factors)
- secondary analysis carried out: people bought commercially available equipment.
It is better to analyse in assigned groups: intention-to-treat (ITT) analysis is a better indicator of effectiveness in real life.
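The difference between the two approaches can be sketched as follows: rather than asking whether each group changed, compare the change in one group with the change in the other, keeping people in the group to which they were assigned. The scores below are hypothetical, and a real analysis would use a formal statistical test (for example, a group-by-time comparison) rather than a simple difference of means.

```python
from statistics import mean


def mean_change(records):
    """Average follow-up minus baseline score for a list of (baseline, follow_up) pairs."""
    return mean(follow_up - baseline for baseline, follow_up in records)


def difference_in_change(intervention, control):
    """Change in the intervention group relative to change in the control group."""
    return mean_change(intervention) - mean_change(control)


# Hypothetical depression scores (baseline, 12 months), grouped by ASSIGNED arm,
# regardless of whether people kept using the equipment (intention to treat).
intervention = [(12, 9), (10, 10), (14, 11), (9, 8)]
control = [(11, 10), (13, 12), (10, 10), (12, 10)]

effect = difference_in_change(intervention, control)
print(f"Change in intervention group relative to control: {effect:+.2f}")
```

In this made-up example both groups improve, so looking at the intervention group alone would overstate the effect; only the difference between the two changes can be attributed to the intervention.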
For evaluations, it is important to look beyond the report summary.
Overall assessment/summary of this evaluation:
- some serious shortcomings jeopardise the conclusions of the study
- small sample
- weak analytic methods
- conclusions not supported by primary data analysis
- unable to draw conclusions about generalisation of findings: not clear whether differences existed between those who agreed and those who did not agree to take part.
Take-home messages
In considering the two examples studied, there were a number of take-home messages:
- avoid taking evaluation findings at face value – look beyond the summary
- evaluate primary studies
- checklists are a useful tool for doing this
- a review’s findings are only as good as the studies that go into it.
Questions and discussion following the presentation
Martin Scarfe, Newham WSD
I have read the Bergmo study and do not know which of the six economic studies I can take as reliable evidence. Which of the six studies can we rely on from the Bergmo paper?
Stan Newman
The clear issue is: for each study there are ways of evaluating the quality of the primary studies, and there is a way of evaluating it to determine which are the best studies. We can provide you with the different ways in which quality of studies is evaluated, especially when they go into a systematic review that sits at the pinnacle of evidence – it is bringing together lots of studies. There are weaknesses of course – studies are different, samples are different and interventions are different.
Richard Giordano, The King’s Fund
Part of what we do at The King’s Fund is to help people to use evidence and the evidence from the peer-reviewed literature in order to make decisions. In Bergmo, there were around 130 peer-reviewed articles. Of those, only six really satisfied all the criteria that were set out for a well-formed evaluation, if I remember correctly. Was there a cluster of journals that tended to publish better-quality evaluations than other journals?
Stan Newman
There is a cluster of journals. One of the other issues is that in the systematic review, you are able to establish and generalise to the number of studies that have not appeared in the literature. The peer-review process is really quite a rigorous approach where studies are subjected to a peer review and people are asked to alter them to make sure they are very clear. Some journals do that better than others. But the value of doing it in that way is that you end up with a much better article in the end. There are some journals that are very good, and there is a hierarchy of journals.
Louise Perkins, Suffolk
Are there any results from our own WSD RCTs as yet?
Stan Newman
We have just sent out the final list of the three-month assessments for the sample. You have to be careful not to analyse the data as you go along. The three-month data have just been accumulated. They will, however, come with a health warning. As you all know, people like technology when they get it early on, use it, find it very attractive and then they grow tired of it (‘the exercise bike phenomenon’ – great purchase over Christmas, avid cycling over January, reducing over February, and by Easter the bike is occupying too much space and is put in the garage or sold).
The same is true in the evidence overall about the use of technology. The new, exciting element of it very often is not done unless it is formally incorporated into people’s lifestyles. Here is an issue about implementation – how you integrate these devices both into care packages and pathways but also into people’s lives. The analogy we use is people with congestive heart failure (CHF) who have to weigh themselves every morning – it needs to become like brushing your teeth.
None of us worry about whether we should or should not brush our teeth in the morning. It is the integration into lifestyle that is the real challenge. Without that, it is seen as an add-on, a chore (like the exercise bike). Often, technology tends to drop off and not be used.
The real data come in at the 12-month follow-up. We will then have a pattern of usage over time, an attitude towards the devices – their impact on self-care behaviour and on quality of life. If we do find that these devices bring about change that persists over 12 months (albeit that people stop using the kit) you have a different model about how you might use some of the kit. You may, for example, find that you give it to people for three months and then you say ‘let’s take it away and move it on to somebody else’ and ‘let’s see whether they have got their behaviour right’ in terms of, say, monitoring their diabetes, and they don’t need close scrutiny. That’s why we need to think carefully about segmenting the population that we are going to look at. Their needs will differ and change over time and that will affect the model we have about introducing the kit and care.
Gary Raynor, Essex
I was interested in your exercise bike analogy, having almost been there. We have just done a telephone survey of about 240 users out of 2,000 in the sample. One of the outcomes is that people are more satisfied the longer they have had the service. Those who have had it for only a very short time are a bit ambivalent about it, but with those who have had it for longer you can see quite a steady progression in the percentage satisfaction.
Stan Newman
That is a really important point and you need to incorporate that with the people who refuse it and the people who ask you to take it away. What you are left with is a group of people who persist in using it and are more likely to integrate it into their lives. So it’s a selected group and that’s why you need to look very carefully at those people who choose not to have the kit or, once they have got it, say ‘not for me’. There are lots of reasons why people reject kit. A lot of people are fearful about the loss of a face-to-face exchange; a lot of them feel they are not ill enough and feel stigmatised by using it. There are all sorts of patient-driven factors but you have a reduced sample of the long-term users and it raises a question about whether you can identify people, which we plan to do in the WSD study. Do we have early identifiers at the time they enter the study that predict who is going to stick with the kit?
Wendy Hardicker, NHS Norfolk
Just picking up on that point, I think the assessment and the reason why you are actually using, whether it’s telecare or telehealth, is the pivotal point here. From a commissioning perspective, our organisation would be bankrupt if we were putting all (telehealth) equipment into a person from the time they were assessed right the way through to the time they did not need it any more. It has to be part and parcel of an ongoing assessment and evaluation in terms of what patients need and how those needs will change. They may get so used to understanding how their trends and disease processes are working that you would want to remove it, and there is a whole range of evaluation measures that you would put into that assessment. So, I would never anticipate us putting something in long term until we had got that robust evaluation on an individual basis right.
Stan Newman
You might find that will differ. It is one of the reasons why we have two follow-on time points in WSD, to look at the changing pattern and that kind of assessment. If you take somebody with dementia and a dementia carer, you might see that the piece of kit is not going to be moved while the person is in-house. If you take somebody with diabetes who is not well controlled, you might find that they learn the techniques, adapt their behaviour, have behaviour change and after a period of six months maintain their HbA1c at 6.5. There is a time that you might want to reconsider and, indeed, they may want to reconsider whether they need it. And that is the kind of issue about thinking clearly about patterns of use, trajectories of conditions, changing needs, and the need maybe to accentuate other areas of telecare and telehealth for particular individuals.
Sheena Hobbs, Newham WSD programme
Are you going to measure the dropouts against all three sites – in other words, those who drop out and why they have dropped out? And are you measuring why one site might have more dropouts – what are the sites doing differently as to what might cause the dropout rates?
Stan Newman
The simple answer is ‘yes’. The more complicated answer is that our brief is to look across the three sites. I think it is important to distinguish between two issues around dropouts and the stage at which rejection of the device occurs. For example, someone agrees to see the device but they say it is not for them. We have done some assessments. We have qualitative studies on people who rejected it at the early stages. Then the next interesting question is people who have direct experience of the device and ask for it to be removed or indeed stop using it. We will have all that information around what people did. Again, we have qualitative studies about the reasons why people rejected the kit and we have a whole host of data.
We are going to integrate the qualitative data (the reasons people give) with who drops out, when and where. We will also have that data by site. The other question is, ‘Can we identify predictors of people who are more likely to reject it than others?’ We have some suspicions, some little glimmers from the literature – people with milder conditions, as you would expect (eg, a diabetic who is working particularly well, quite robust in the way they are handling their condition); and those who don’t feel they need to have it and reject the idea of their illness as something to be ignored often won’t accept the kit.
Interestingly, it is often family members that put pressure on people to have the kit. So it is both two-way around people who reject it and indeed people who stick with it and continue to use it, as well as who gets benefit and at what level. For example, you might get a quality of life benefit, but you might not get a cost benefit, and in this case, is the pay-off worthwhile? You might get a clinical benefit but a poorer quality of life benefit. The outcomes are very complex when you look at them in a multivariate way, of the range of different benefits and the different types of outcomes.
Maggie Ellis, London School of Economics (LSE)
The one thing that Brussels does know about is the UK Telecare Business Case Toolkit and they are pushing us to use that very strongly in our project. That is an estimator of whether a system will be cost-effective or not and the potential savings that can be made. I am concerned about this because I think it is quite a complicated toolkit, and demands quite a lot of time. I am desperate to meet somebody who has used it. Although there is a list of boroughs that have used it, I haven’t been in a room with anyone that has – if there is anyone here, I would like to know. Any comments from you about that toolkit and any others you can suggest.
Stan Newman
I do not claim expertise in economic toolkits. I can tell you we have a clear comparison on both the direct benefits that you would get from having the kit and economic costs and a cost-benefit analysis in terms of the returns. This is being run by the Nuffield Trust and LSE. They are really doing that kind of analysis.
One of the things we are looking at is what is the cost of getting a quality of life improvement? How much additional cost do you need to have for getting an improvement in a clinical outcome?
It is looking at all of those other kinds of benefits you get and the costs that you have to put in to get them. It’s a range of different questions and a range of different ways. We are also looking at the key data set: ‘Do we have fewer hospital admissions?’ Here is a very interesting question. We know that hospital admissions are very costly and the footprint of hospitals needs to be reconsidered, and we need to think about the costs that go into it. What we currently do is quite an issue. If we put the devices in the community and we monitor people more closely, does it lead to increased care or a reduction of care? Indeed, is it better but is it also more costly? Those are some of the questions that we hope to answer.
I know you all have business cases to consider and I think that there is a real issue around the quality of evidence. You will allow me to be somewhat sceptical – the quality of evidence that is available, even on the economic case, to support a very clear business case… What I think that really means is that I would encourage you to think carefully about audit and evaluation. I think that you need that cycle – introduce something new, evaluate it, feed back, and improve your service. It is that feedback loop that will allow us to learn about the services and, indeed, improve them.
Peter Range, Home Telehealth Limited
As a business, we have been managing nurse-led disease management programmes for two and a half years. Wendy’s question was quite interesting in terms of how do you determine the episode of care for monitoring. We would be happy to share with any commissioners in this room our evidence over the last two and a half years. We have some 50,000–55,000 patient interactions a year with different trusts with COPD, diabetes and heart failure. We know, for instance, the evidence of a number of patients you might have on a 12–13 week episode of care and how they fall off into perhaps a second episode. Many of those patients are stable, lead a better quality of life and can manage their medication, etc. Typically, on a COPD patient cohort, you may have 20 per cent of those re-offending and need a second episode of care and maybe then 5 per cent fall out into a third episode. We’ve got a lot of data because most systems in telehealth don’t analyse that sort of stuff. They have a traffic light solution but don’t analyse the utilisation of the service. They don’t analyse the important stuff that you want to collect in terms of what are the savings to the trust in terms of hospital admissions and bed days.
We’ve been collecting that data because we have had to develop a third-party solution ourselves – a software package that collects those statistics. We’ve got a lot of valuable data, which we think is fairly unique. It is a bit of ‘middleware’ (a third party solution) which doesn’t exist commercially so we have had to create it to give reports back to primary care trusts (PCTs). We would be happy to share that with you and your team, and be happy to share it with any commissioners, because it does collect data that most telehealth systems don’t and we can suck data out of telehealth solutions into this package to analyse information and statistics, which is important to commissioners who really want to know the facts.
Stan Newman
Thank you very much. Indeed, that would be very helpful and the simple answer to that is ‘yes’, that would be very good to do it, and one can actually reference that data and make comparisons. We are collecting data from the devices in the WSD programme. So we need to know when and how often alerts occur, and we can look at that in terms of severity measures and impact on quality of life and, indeed, extrapolate from there to the costs of the intervention effectively by individual patient. That is going to be painstaking and even slower than the main results.
Stephen Pattenden, The Application Home Initiative (TAHI)
Thinking about devices that you might use for intervention, do you have a hierarchy or shopping list of things that you would find useful upstream for specific interventions? What would, for instance, be useful for everyone over a certain age to make weighing easy instead of having to step on your scales? Perhaps if there was a mat on the floor that would automatically weigh everybody so that you would know roughly what their body mass index (BMI) was and if there were multiple sensors you might be able to pick up balance as well. Are there obvious devices that would be useful in the home space that everyone would have and therefore they would be cheap because they would be mass market?
Stan Newman
I think the real question about generating lots of data is who is going to look at it? I think the real issue is to balance that. I think that a combination of telehealth and telecare devices along with self-management is useful. If you take, for example, how much somebody weighs, that is a very useful indicator for people to have and to record that very easily. We do know we have got fantastic devices, because we have used some of them. There is a life vest that you put on and that measures your stuff constantly. We did a big study looking at hypoxia and basically people wore these life vests and we were able to assess them. There are very good sensors that you can have in beds but I think there is a real problem about lots and lots of data, who looks at it and what value it has.
I think you are describing a very interesting and potentially valuable application of the use of data. People have blood pressure monitors at home. People weigh themselves regularly at home, making that nice and simple, and I think that one of the issues that we need to consider is how you accumulate the displays to show people their behaviour over the long term, and indeed how they impose goals and targets on their behaviour on those measures would be extremely useful. But we need to think quite simply here and it seems to be a situation that people can volunteer to do. If you take BMI, a potentially useful measure or hip/waist ratio – being able to take that routinely and for people to have it displayed on their TV screens or automatically ‘Bluetoothed’ to their computer might satisfy some people. Other people will find it absolutely appalling. That is the real dilemma about it; some people don’t want to know and, indeed, one can have a certain degree of sympathy with them.
Mike Clark, WSDAN
Can I just see how many people are currently doing evaluations or audits or extended monitoring of their service? Do any of you want to say a bit about what you are doing?
Jane Crawford-White, Cambridgeshire Community Services
What we have done is an annual audit of all of our telecare users. The audit has involved getting feedback from the service user as well as their carers through a questionnaire. If they don’t complete the questionnaire, it would be a telephone interview. The evaluations also included an audit of the case notes – and these are multidisciplinary case notes – to see if there are any episodes where we feel we can demonstrate that care costs were avoided, potentially admissions to hospital were avoided, respite events were avoided. So it is looking at that story based on the case note audit and then identifying the potential avoided costs for the investment we have made. The feedback evaluations are extremely positive and we do have some people four years later still using the telecare that they were issued originally.
Gary Raynor, Essex County Council
We have just completed a small research project. Essex operates a series of pledges every year designed to make it the best place to live and work in the country. We have seen a lower take-up of the telecare pledge, which is where we are offering people a 12-month free service if they are over a certain age. It is fair to say that we have not had the response we had hoped for, so we commissioned a small telephone survey to find out why.
We have just got the results and there are some interesting points in there. We have also commissioned York University to do a long-term study of the pledge recipients to see if we can come up with some economic models of cost benefits. But we understand that they would be lower than what we call mainstream telecare because it is a prevention initiative and therefore you won’t get that immediate cost avoidance that you do get with some traditional telecare installations. That is probably 12 months away to do the survey, collate the data. This would be peer-reviewable research. There is not a huge body of reliable evidence out there – everybody can find something wrong with it. So we are hoping that we will be able to add to that as a benefit from our pledge.
Stan Newman
We would be very keen to hear about the studies that you are doing. Just to hear what you are doing and see how it feeds into the body of evidence. So I would be grateful if people would email me details and we would be happy to talk to you and help and advise in any way we can.
Mike Clark is co-project lead for WSDAN
Professor Stan Newman and Anna Davies are from University College London. Professor Newman is the principal investigator for the WSD Pilot evaluation.