ABSTRACT
Objective
This study aims to evaluate the educational quality and reliability of YouTube videos on laparoscopic hydatid cyst surgery (LHCS), focusing on various factors such as narration, subtitles, and user engagement metrics.
Methods
A cross-sectional analysis was conducted on 34 YouTube videos related to LHCS. Videos were assessed using Laparoscopic Surgery Video Educational Guideline (LAP-VEGaS), Journal of the American Medical Association (JAMA), and Global Quality Score (GQS). Parameters including video duration, presence of spoken commentary, subtitles, number of likes, total views, and average daily views were recorded. Statistical analyses, including descriptive statistics, correlation assessments, and linear regression models, were utilized to evaluate the impact of these factors on the educational quality scores.
Results
Videos with spoken commentary scored significantly higher across LAP-VEGaS, JAMA, and GQS. Subtitled videos showed a borderline significant increase in GQS but not in other metrics. Significant positive correlations were found between LAP-VEGaS scores and JAMA scores, GQS, annual likes, total views, and daily average views. Univariate regression analysis identified video duration and presence of spoken commentary as significant predictors for LAP-VEGaS scores. In multivariate regression, spoken commentary and upload time were significant variables influencing LAP-VEGaS and JAMA scores.
Conclusion
The presence of spoken commentary significantly enhances the educational value of LHCS videos on YouTube. While subtitles provide additional support, they are not as impactful as spoken commentary. Regular updates and professional production are crucial to maintain the relevance and accuracy of these educational resources.
Introduction
Hydatid cyst (HC) is a zoonotic disease that is endemic in our country, particularly in our region. The most prevalent causative agent is Echinococcus granulosus. In humans, cysts localize to the liver in 70% of cases, the lungs in 20%, and other organs in 10% (1, 2).
Although HC was previously treated with laparotomy and total or partial pericystectomy, HC surgery, like many other surgical procedures, is now performed laparoscopically. One of the most significant drawbacks of laparoscopic procedures for surgeons is the lengthy training period. Because the learning curve in laparoscopic surgery is longer than in open surgery, surgeons are increasingly turning to alternative training models, particularly for laparoscopic procedures. To this end, they try to familiarize themselves with the surgical procedure and shorten their learning time by watching surgical videos online (3).
YouTube is a platform that hosts a massive video-sharing network. Today, this platform assists many surgeons in their training. However, because the published videos are not subject to supervision, it is unclear whether they are educationally valuable or constitute an adequate educational resource. In particular, the presence of conflicting information and techniques across videos of the same surgical procedure makes it difficult to locate accurate and appropriate content. Several scoring systems have been developed to assess the quality and reliability of educational video content.
Educators from 26 international institutions created the Laparoscopic Surgery Video Educational Guideline (LAP-VEGaS) to standardize the quality of online educational videos on laparoscopic procedures (3). The Journal of the American Medical Association (JAMA) and Global Quality Score (GQS) scoring systems evaluate video reliability and content. Silberg et al. (4) designed the JAMA scoring system to assess the transparency of video sources and published data. It is used to identify untrustworthy videos of unknown origin. Bernard defines the GQS as a scoring system that categorizes videos based on their content (5).
Although there are several laparoscopic HC surgery (LHCS) videos on YouTube, no research has been conducted to evaluate the quality of these videos in terms of their contribution to surgical education. In this regard, our study is the first of its kind.
The purpose of this study was to evaluate the educational quality and reliability of LHCS videos on YouTube using the LAP-VEGaS, the JAMA scoring system, and the GQS system.
Methods
Study Design
We searched YouTube on December 15, 2023, using the keywords LHC and LHCS, without changing the default search preferences and after selecting the “sort by relevance” option. Because YouTube is a public platform and no personal information is used, ethics committee approval was not required for this study (6). A total of 45 videos with at least 1,000 views were identified.
Inclusion and Exclusion Criteria
Exclusion criteria included videos in which total cyst excision was performed, videos in which the entire procedure was not shown, the presence of an accompanying surgical procedure, videos of LHCS performed outside the liver, and duplicate videos. Only videos featuring English verbal narration or subtitles were included. Furthermore, because the educational value of videos shorter than 4 min was assumed to be insufficient, such videos were excluded, as were videos with fewer than 1,000 views, on the grounds that they were not popular among surgeons. In total, 34 videos were included, all of them liver-related LHCS predominantly demonstrating laparoscopic (partial) cystectomy/pericystectomy techniques. Of these, 8 were uploaded by academic sources (university-affiliated channels or conference presentations), while 26 were shared via individual physician accounts.
Data Collection and Assessment of Quality and Reliability of Videos
The number of likes, dislikes, verbal or subtitled narration, video duration, time since the video upload date, the daily number of views, and the total number of views were recorded. Videos were evaluated using the LAP-VEGaS, as well as JAMA and GQS scores. LAP-VEGaS was created by educators from 26 international institutions as a video evaluation tool to standardize the quality of online educational videos about laparoscopic procedures (3). It enables video evaluation using nine parameters (Table 1). Each parameter is awarded 0 points if it is not presented in the video, 1 point if it is partially presented, and 2 points if it is fully presented. The total score ranges from 0 to 18. Videos with scores of 0 to 6 are of poor educational quality, those with scores of 7 to 12 are of medium quality, and those with scores of 13 or higher are of good quality. Silberg et al. (4) defined the JAMA scoring system to assess the transparency of the video source and its publication information. It is used to detect untrustworthy videos of unknown origin. It has four criteria, each worth 1 point (Table 2). According to this scoring system, videos with 1 point are considered inadequate, videos with 2 to 3 points partially adequate, and videos with 4 points entirely adequate. Bernard defines the GQS as a scoring system that categorizes videos based on their content (5). This scoring system assigns scores ranging from 1 to 5 (Table 2). Videos were considered low quality (1 or 2), medium quality (3), or high quality (4 or 5).
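The three scoring schemes described above can be summarized as simple threshold rules. The following is a minimal illustrative sketch (not part of the study's actual workflow); the function names are ours, and the cut-offs are those stated in the text.

```python
# Illustrative sketch of the quality bands used by the three scoring
# systems described above. Function names are hypothetical; thresholds
# follow the definitions given in the text.

def lap_vegas_category(score: int) -> str:
    """LAP-VEGaS: nine items scored 0-2, total 0-18."""
    if not 0 <= score <= 18:
        raise ValueError("LAP-VEGaS total must be between 0 and 18")
    if score <= 6:
        return "poor"
    if score <= 12:
        return "medium"
    return "good"

def jama_category(score: int) -> str:
    """JAMA benchmark: four criteria, 1 point each."""
    if score <= 1:
        return "inadequate"
    if score <= 3:
        return "partially adequate"
    return "entirely adequate"

def gqs_category(score: int) -> str:
    """GQS: a single rating from 1 to 5."""
    if score <= 2:
        return "low"
    if score == 3:
        return "medium"
    return "high"

# Example: a video scoring 5 / 2 / 3 on the three systems
print(lap_vegas_category(5), jama_category(2), gqs_category(3))
# → poor partially adequate medium
```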
Before the videos were evaluated, three general surgeons (M.S.B., H.Y., H.E.) with experience in LHCS in our clinic discussed the evaluation criteria for the LAP-VEGaS, JAMA, and GQS scores and developed a common standard. Two general surgeons then scored the videos according to this guide, blinded to each other's scores. When scores differed, a third surgeon's opinion was sought.
Statistical Analysis
Statistical analyses were carried out with the Jamovi software package (version 2.3.28, The Jamovi project, 2023) and the Jeffreys’s Amazing Statistics Program (JASP) software package (version 0.18.3, 2024). Descriptive statistics were used to summarize the study’s results. Results for continuous numerical variables were presented as mean ± standard deviation or as median, minimum, and maximum, depending on the distribution. Categorical variables were summarized using numbers and percentages. The normality of numerical variables was assessed using appropriate tests and visual tools, taking into account the sample size and data characteristics. For small samples (n<50), the Shapiro-Wilk test was preferred. In addition, visual tools such as histograms and quantile-quantile plots were used to assess the assumption of normality. To compare differences in categorical variables across groups, the Pearson chi-square test was used for 2×2 tables with expected cell counts of 5 or more, as larger sample sizes provide more accurate results. For 2×2 tables with expected cell counts of less than 5, Fisher’s exact test was preferred due to its higher precision with small sample sizes. In R×C tables with expected cell counts of less than 5, the Fisher-Freeman-Halton test was used because it is appropriate for small samples. When numerical variables did not follow a normal distribution and were compared between two independent groups, the Mann-Whitney U test was preferred. Spearman’s ρ correlation coefficient was used to assess relationships between numerical variables that did not follow a normal distribution. In this study, univariate and multivariate linear regression analyses were used to identify factors that predict the LAP-VEGaS score, JAMA score, and GQS in LHCS videos.
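Spearman's ρ, the correlation measure used above, is the Pearson correlation of the rank vectors, with tied values sharing the mean of their positions. The sketch below is an illustrative stdlib-only implementation, not the software actually used in the study (Jamovi/JASP); the data values are hypothetical.

```python
# Illustrative sketch of Spearman's rank correlation (used above for
# non-normally distributed variables). Not the study's actual pipeline;
# example data are hypothetical.

def rank(values):
    """Assign 1-based average ranks; ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend the group while the next value is tied with the current one
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# A perfectly monotone increasing relationship gives rho = 1.0
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))  # → 1.0
```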
In univariate analyses, the impact of independent variables such as annual likes, video duration, average daily views, time since upload, presence of spoken commentary, and presence of subtitles on the LAP-VEGaS score, JAMA score, and GQS was assessed separately. β coefficients, 95% confidence intervals, and p-values were computed for each independent variable. In multivariate linear regression analyses, the combined effects of these variables were assessed while controlling for the impact of other factors, with β coefficients, 95% confidence intervals, and p-values provided for each variable. A p-value of <0.05 indicated statistical significance.
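For a univariate model, the β coefficient reported above is the ordinary least-squares slope relating one predictor to one outcome. The following is a minimal sketch of that computation on hypothetical data; it is not the study's analysis code, and the example numbers are invented for illustration.

```python
# Illustrative sketch of a univariate OLS beta coefficient, mirroring
# the regression analyses described above. Data are hypothetical.

def univariate_beta(x, y):
    """Return (intercept, beta) for the OLS fit y = intercept + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)           # variance term of x
    sxy = sum((xi - mx) * (yi - my)                  # covariance term
              for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta

# e.g., hypothetical video duration (min) vs. an educational score
intercept, beta = univariate_beta([4, 6, 8, 10], [4, 5, 6, 7])
print(round(beta, 2))  # → 0.5
```

In the multivariate models, each β is instead estimated jointly with the others, which is why a predictor significant alone (such as video duration here) can lose significance once the other variables are controlled for.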
Results
This study included 34 videos about LHCS. The median time since the videos’ initial publication was 7.2 years. The median number of likes was 18.5, and the average number of likes per year was 2.4. The videos had a median duration of 8.4 min. The median total number of views was 2,400, with a median daily view count of 1. The median LAP-VEGaS score was 5.5, the JAMA score was 2, and the GQS was 3 (Table 3).
Videos with narration had significantly more likes (p=0.045), longer durations (p=0.017), higher average daily views (p=0.042), and higher LAP-VEGaS (p<0.001), JAMA (p<0.001), and GQS scores (p<0.001). Videos with spoken narration were significantly more likely to score 4 points in JAMA and 5 points in GQS (Table 3).
There was no significant difference between videos with and without narration in terms of total time on air, like status, annual likes, total number of views, or presence of subtitles (p>0.05). The GQS was marginally higher in subtitled videos (p=0.054). The LAP-VEGaS score was also higher in subtitled videos, although the difference was not statistically significant (p=0.076). Other variables, such as total time on air, like status, number of likes, annual average of likes, video duration, total number of views, daily average number of views, JAMA score, JAMA score distribution, and presence of voiceover, showed no significant difference between the subtitle groups (p>0.05, Table 4).
When evaluating videos on LHCS, the LAP-VEGaS score was correlated with the JAMA score (r=0.737, p<0.001), GQS (r=0.896, p<0.001), annual average number of likes (r=0.560, p<0.001), total number of views (r=0.423, p=0.013), and average daily number of views. The JAMA score showed significant positive correlations with the GQS (r=0.802, p<0.001), annual average number of likes (r=0.568, p<0.001), total number of views (r=0.533, p=0.001), and daily average number of views (r=0.539, p<0.001). The GQS also showed significant positive correlations with the annual average number of likes (r=0.523, p=0.002), total number of views (r=0.391, p=0.022), and daily average number of views (r=0.500, p=0.003). However, there was a weak-to-moderate negative correlation between GQS scores and the total time the video was on air (r=-0.343, p=0.047). Positive correlations were found between the annual average number of likes and both the daily average number of views and the total number of views (r=0.803, p<0.001 and r=0.514, p=0.003, respectively); however, a strong negative correlation was found with the total time the video was on air (r=-0.710, p<0.001). There was a strong positive correlation between total views and average daily views (r=0.844, p<0.001) and a moderate negative correlation between total time on air and average daily views (r=-0.443, p=0.009). Other pairwise comparisons revealed no significant relationships (p>0.05, Figure 1).
The univariate analysis of the linear regression model for predicting the LAP-VEGaS score revealed that video duration (p=0.034) and spoken commentary (p<0.001) were significant variables. A one-unit increase in video duration was associated with a 0.16-unit increase in LAP-VEGaS scores. Videos with spoken commentary, in contrast, showed a markedly higher increase in LAP-VEGaS scores, up 7.59 units. However, the annual average number of likes, daily average number of views, upload time, and presence of subtitles were non-significant (p>0.05). Significant variables in the multivariate linear regression analysis were spoken commentary (p<0.001) and upload time (p=0.015). Accordingly, a one-unit increase in the total time since the video’s upload resulted in a 0.37-unit decrease in LAP-VEGaS scores, whereas LAP-VEGaS scores increased by 7.5 units in videos with spoken commentary. Video duration was not a significant predictor (p=0.966, Table 5).
The univariate analysis found that the annual average number of likes (p=0.044) and the presence of spoken commentary (p<0.001) were significant predictors of the JAMA score. A one-unit increase in the annual average number of likes resulted in a 0.02-unit increase in the JAMA score, whereas the JAMA score increased by 1.87 units in videos with spoken commentary. The effects of video duration, average daily views, upload time, and presence of subtitles were non-significant (p>0.05). Significant variables in the multivariate linear regression analysis were spoken commentary (p<0.001) and the annual average number of likes (p=0.003). Accordingly, a one-unit increase in the annual average number of likes resulted in a 0.01-unit increase in the JAMA score, whereas in videos with spoken commentary, the JAMA score increased by 1.91 units (Table 6).
Univariate analysis of the linear regression model for predicting GQS revealed that the presence of spoken commentary was the only significant variable (p<0.001). GQS increased by 1.75 units in videos with spoken commentary. The effects of annual average likes, video duration, daily average number of views, upload time, and presence of subtitles were non-significant (p>0.05). Spoken commentary was a significant variable in the multivariate linear regression analysis (p<0.001). GQS increased by 1.68 units in videos with spoken commentary. The total time the video had been on air was marginally significant (p=0.051); every one-unit increase in this duration resulted in a 0.1-point decrease in GQS scores. Conversely, the presence of subtitles was not a significant predictor (p=0.071, Table 7).
The links of the videos included in the study are provided in Table 8.
Discussion
Surgical education, like our lives, has changed as a result of technological advancements in recent years. The most significant development is that, in addition to traditional face-to-face surgical education, online education has begun to gain traction. Although many factors have been proposed to explain this shift, the most important factor in surgeons turning to online education appears to be the lengthy learning curve associated with laparoscopic surgery practices. Many surgeons want to accelerate their learning curve by watching online videos. For this reason, online training videos are becoming increasingly popular among surgeons seeking to improve their knowledge and skills, particularly in laparoscopic surgery (7-10).
At this point, publishing videos with accurate, up-to-date, and reliable information on online platforms is critical. Unfortunately, YouTube ranks its videos based on the number of views or comments rather than the quality of the content. This sorting is not appropriate for education. In a study emphasizing the significance of this situation, only one of the 10 most popular laparoscopic cholecystectomy videos was found to be appropriate for surgical training (11). Moreover, studies have shown that information obtained from YouTube may be inaccurate or misleading. A review of this issue revealed that the majority of the videos contained incorrect, out-of-date information, resulting in false teachings (12).
Previous research has found varying levels of educational content on YouTube for various surgical procedures. For example, Wu et al. (13) assessed the educational quality of cholesteatoma surgery videos and identified significant areas for improvement, emphasizing the importance of high-quality educational content on public platforms such as YouTube. Similarly, Unal et al. (14) found low educational quality in laparoscopic hysterectomy videos, emphasizing the importance of peer-reviewed educational resources during the coronavirus disease 2019 era. Shapiro et al. (15) noted the low quality of endoscopic sinus surgery videos and advised against relying solely on them for surgical training. Tan et al. (16) found that laparoscopic distal pancreatectomy videos on YouTube lacked educational quality. Our current study likewise found significant gaps in the educational value of LHCS videos, particularly those lacking spoken commentary or professional production standards, and demonstrated that the presence of spoken commentary significantly improves the educational value of surgical videos. In a study of laparoscopic cholecystectomy videos, the most commonly performed procedure, only 15.1% were found to be educationally sufficient. The same study found that video duration, number of views, and likes did not correlate with video quality (17). In contrast, our study found that high-scoring videos were watched and liked significantly more, although there was no correlation with video duration. Chapman et al. (18) found that the LAP-VEGaS score was very low, on average 6, which is consistent with our findings.
Other studies have examined the relationship between user engagement metrics (such as likes and views) and educational quality. Zhang et al. (19) assessed laparoscopic gastrectomy videos and found varying levels of information completeness and reliability, indicating similar challenges in user engagement and educational quality. In our study, we found significant positive correlations between LAP-VEGaS scores, JAMA scores, and user engagement metrics, suggesting that higher engagement is associated with better educational content.
Videos with spoken commentary consistently performed better on educational metrics. This is supported by findings from studies on other surgical procedures, such as that by Balta et al. (20), who found that the use of videos in training could enhance surgical education. The presence of subtitles resulted in a borderline significant increase in GQS but was less effective than spoken commentary. This finding suggests that, while subtitles can aid comprehension, they are not a substitute for detailed spoken explanations.
Study Limitations
This study has several limitations that must be addressed. First, the sample size of 34 videos may not fully represent the range of LHCS videos available on YouTube. The limited sample size may have affected the generalizability of our findings. Future research with larger sample sizes is required to validate our findings and provide a more complete analysis. Second, the scoring systems (LAP-VEGaS, JAMA, and GQS) are open to subjective interpretation, which may introduce bias. Although these tools are standardized, variations in individual scorers’ assessments may influence the results. Implementing a more objective and automated scoring system could help address this issue. Another limitation is the reliance solely on YouTube for video content. While YouTube is a popular platform, it does not host all educational videos available online. Other platforms, such as specialized medical education websites, may host higher-quality videos that were not considered in our analysis. Future research should consider combining videos from various sources to provide a more balanced evaluation. Furthermore, the study did not take into account the diverse backgrounds and levels of expertise among video creators. Videos produced by experienced surgeons or medical institutions may have higher educational value than those created by less experienced individuals. A stratified analysis of the creators’ credentials could yield more nuanced results. Finally, the study’s cross-sectional design limits the ability to infer causality. Longitudinal studies that track the impact of video quality on learning outcomes over time would provide stronger evidence of the educational value of these videos. Despite these limitations, this study provides valuable insights into the current state of educational videos on YouTube and identifies areas for improvement.
Conclusion
We conducted this study to assess the educational quality and reliability of LHCS videos available on YouTube. Several key findings emerged from our research. First, videos with spoken commentary significantly improved educational quality, as evidenced by higher scores on the LAP-VEGaS, JAMA, and GQS systems. This indicates that spoken explanations provide useful context and clarity, making complex procedures more understandable to viewers.
Second, while subtitles were beneficial, they had less of an impact than spoken commentary. This demonstrates that, although subtitles are useful, they cannot completely replace the effectiveness of a well-narrated video. The relationship between user engagement metrics, such as likes and views, and educational quality emphasizes the significance of viewer interaction in determining the value of educational content. Higher engagement typically indicates better educational content, implying that users interact more with videos that contain clear and useful information.
Furthermore, the time since a video was uploaded negatively correlated with educational scores, implying that newer videos may be more current and thus more useful for learning purposes. This finding emphasizes the importance of continuous updates and revisions to keep educational content relevant and accurate.
Overall, this study emphasizes the importance of high-quality, professionally produced educational videos in medical education. It emphasizes the importance of spoken commentary in improving learning experiences and the need for regular updates to keep educational materials relevant. Future efforts should be directed toward improving the production quality and peer-review processes of educational videos to ensure that they meet the educational needs of medical professionals and students.