ABSTRACT
Objective
Orthognathic surgery addresses facial deformities by improving both function and aesthetics, with its success relying heavily on accurate planning. This study aimed to assess the long-term accuracy of virtual surgical planning (VSP) by comparing three-dimensional (3D) preoperative virtual models with actual postoperative outcomes and identifying regions prone to deviation.
Methods
A retrospective study was conducted and approved by the Ethics Committee of Bezmialem Vakıf University. Patients who underwent bimaxillary surgery for Class II or III malocclusion and had postoperative computed tomography scans ≥6 months later were included. VSP was performed using NemoFab software. Standardized Le Fort I and bilateral sagittal split ramus osteotomies were conducted. The preoperative virtual planning model and the postoperative surgical model were aligned using surface-based registration in 3-matic and analyzed in Mimics. Linear and directional (sagittal, coronal, axial) deviations were measured at cephalometric landmarks, and 3D color-coded deviation maps were generated with a ±2 mm threshold. Distance differences at 15 cephalometric points between the preoperative planning and postoperative models were compared with a 2 mm test value using a one-sample t-test.
Results
Forty-two patients (aged 18-40) were included. The mean total deviation was 2.19±0.82 mm. Deviations at the anterior nasal spine (ANS) and posterior nasal spine (PNS), as well as the mean maxillary deviation, were significantly greater than 2 mm (p<0.05), whereas mean deviations in all three spatial planes were significantly below 2 mm (p<0.001). No significant differences were observed based on sex, skeletal class, or surgical sequence. Most discrepancies occurred in the anterior maxilla, chin, and posterior mandible. Preoperative asymmetry and pogonion deviation were not predictive of discrepancies. Intraclass correlation coefficient values >0.90 confirmed measurement reliability.
Conclusion
VSP shows high overall accuracy; however, ANS and PNS remain prone to deviation, warranting further investigation in larger studies.
Introduction
The primary goal of orthognathic surgery is to correct facial deformities and address both functional and aesthetic concerns. Its success depends not only on surgical technique but also on precise and detailed treatment planning (1). With advances in modern technology, orthognathic surgery planning has evolved into a three-dimensional (3D) virtual process. Using 3D imaging and digitally reconstructed models, surgeons can anticipate potential intraoperative challenges and predict postoperative outcomes more accurately (2-4). Virtual surgical planning (VSP) allows highly accurate visualization of the jawbones and surrounding anatomical structures, reducing the risk of complications such as unfavorable fractures, nerve injuries, or malunions during surgery. Furthermore, VSP helps determine whether additional procedures are necessary, allowing patients to be informed in advance. A linear difference of 2 mm or less and an angular difference of 4 degrees or less between the VSP and the actual postoperative outcome are widely regarded as acceptable thresholds for clinical accuracy, whereas deviations exceeding these values are typically considered clinically significant (5-8).
Several studies have compared orthognathic surgery outcomes with VSP (2, 7, 9). However, few have clearly identified the specific anatomical regions where discrepancies occur between the virtual plan and the postoperative outcome. This study aims to evaluate the long-term accuracy of VSP by comparing 3D models representing the preoperative virtual plans and actual postoperative jaw positions, and to identify specific anatomical regions where deviations commonly occur between planned and actual outcomes.
Methods
Study Design/Sample
A retrospective study was designed and approved by the Ethics Committee of Bezmialem Vakıf University (decision no: 2023/202, date: 14.07.2023). Informed consent was waived due to the retrospective design. Patients aged 18 to 40 years who underwent bimaxillary orthognathic surgery for Class II or III dentofacial deformities and had a postoperative computed tomography (CT) scan taken at least six months after surgery between 2020 and 2023 were included in the study. VSP was performed using the NemoFab software (Nemotec, Madrid, Spain; 2020), and surgical splints were used during the procedure. All patients underwent Le Fort I osteotomy and bilateral sagittal split ramus osteotomy, performed by the same surgical team using a standardized technique.
Additional inclusion criteria were:
• Presence of a sufficient number of teeth to ensure preoperative and postoperative occlusal stability
• At least two occlusal contact points on both sides (tripod contact)
• Rigid fixation in all segments
• Adherence to a standardized protocol for preoperative and postoperative CT imaging
Patients were excluded if they:
• Underwent single-jaw surgery or genioplasty in addition to bimaxillary orthognathic surgery
• Had facial deformities due to trauma or cleft lip and palate
• Underwent orthognathic surgery using the model surgery technique
• Had a history of temporomandibular joint disorders or autoimmune diseases
• Lacked VSP records via NemoFab software
Tomographic Data Collection and 3D Model Analysis
3D models representing the predicted postoperative positions of the jaws were generated from the preoperative VSP of patients meeting the inclusion criteria and exported in standard tessellation language (STL) format. Postoperative CT scans obtained at the 6-month follow-up were recorded in Digital Imaging and Communications in Medicine (DICOM) format and imported into Mimics Innovation Suite software (Materialise, Belgium, v.21.0), where postoperative 3D craniofacial models were reconstructed and exported in STL format. Both preoperative and postoperative 3D models were then imported into 3-matic software (Materialise, Belgium, v.13.0). At least 10 distinct, corresponding anatomical landmarks on the skull were selected on both the preoperative and postoperative models, and the models were aligned using surface-based registration. The registered models were then transferred back to Mimics software, where the following cephalometric landmarks were identified:
• Maxillary and mandibular dental midlines
• Anterior nasal spine (ANS) and posterior nasal spine (PNS)
• Cusps of the right and left upper and lower canines
• Mesiobuccal cusps of the right and left upper and lower first molars
• Pogonion, A point, and B point
The linear distances between corresponding points on the preoperative and postoperative models were measured and recorded. In addition, the 3D coordinates of each landmark were determined, and the distance differences in the sagittal, coronal, and axial directions were calculated separately (Figure 1). Preoperative mandibular asymmetry, maxillary and mandibular midline deviations, and pogonion deviation were measured, and the presence of preoperative mandibular asymmetry was recorded.
The “create part comparison analysis” function in 3-matic software was used to visualize discrepancies between the aligned models, and 3D color-coded deviation maps were generated to represent the degree of surface deviation. A threshold of ±2 mm (5) was set to define the range of the color mapping. The maxillomandibular complex was divided into six regions: the chin, the right and left posterior mandible, the anterior maxilla, and the right and left posterior maxilla. The region with the most pronounced discrepancy was identified and recorded for each model (Figure 2).
A single observer performed all measurements. To assess intra-observer reliability, 20% of all measurements were randomly selected and repeated by the same observer after an interval of at least two weeks. Intraclass correlation coefficients (ICC) were calculated for each cephalometric point and deviation measurement to evaluate measurement repeatability.
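For readers reproducing the landmark analysis outside Mimics, the distance calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's software workflow: it assumes landmark coordinates (in mm) have already been exported from the registered models, assumes an axis convention (x = coronal/transverse, y = sagittal/anteroposterior, z = axial/vertical), and reports directional deviations as absolute differences, which the text does not specify. The function name `landmark_deviations` is hypothetical.

```python
import numpy as np

# Assumed axis convention (not stated in the study): x = coronal (transverse),
# y = sagittal (anteroposterior), z = axial (vertical).
AXES = ("coronal", "sagittal", "axial")


def landmark_deviations(planned, actual):
    """For each landmark in `planned` (it must also appear in `actual`), return the
    Euclidean (linear) deviation and the absolute difference along each axis, in mm."""
    deviations = {}
    for name, planned_xyz in planned.items():
        diff = np.asarray(actual[name], dtype=float) - np.asarray(planned_xyz, dtype=float)
        entry = {"linear": float(np.linalg.norm(diff))}
        # Absolute per-axis differences (whether the study used signed values is not stated)
        entry.update({axis: float(abs(d)) for axis, d in zip(AXES, diff)})
        deviations[name] = entry
    return deviations
```

For example, `landmark_deviations({"ANS": (0.0, 60.0, 40.0)}, {"ANS": (1.0, 62.0, 42.0)})` gives a linear deviation of 3.0 mm with per-axis components of 1.0, 2.0, and 2.0 mm, illustrating how a landmark can exceed 2 mm overall while no single direction does.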
Variables
The primary outcome variables included the location of discrepancies; total, maxillary, and mandibular deviation amounts; and deviations at cephalometric points between the planned and actual postoperative measurements. The following patient characteristics were recorded as potential influencing factors: age, sex, skeletal malocclusion type (Class II or Class III), and surgical sequencing (maxilla-first or mandible-first approach). Patients were divided into two groups according to whether their deviation amounts were ≤2 mm or >2 mm (5).
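The grouping and the subsequent crosstabulation against a patient characteristic can be sketched in Python. The text does not name the test applied to the crosstabulations; Fisher's exact test is used here only as an assumption, chosen because some cells are small (for example, only nine patients had Class II malocclusion). The function names are hypothetical, and the study itself used SPSS.

```python
from scipy.stats import fisher_exact

THRESHOLD_MM = 2.0  # clinically acceptable limit cited in the text (reference 5)


def deviation_group(mean_deviation_mm: float) -> int:
    """Return 0 for deviations <= 2 mm and 1 for deviations > 2 mm."""
    return 0 if mean_deviation_mm <= THRESHOLD_MM else 1


def crosstab_2x2(factor, deviations, levels):
    """Build a 2x2 table: rows = the two factor levels (e.g. sex),
    columns = (<= 2 mm, > 2 mm) deviation groups."""
    table = [[0, 0], [0, 0]]
    for level, dev in zip(factor, deviations):
        table[levels.index(level)][deviation_group(dev)] += 1
    return table


def association_p_value(factor, deviations, levels):
    """Two-sided Fisher's exact test on the 2x2 table (assumed test, see lead-in)."""
    table = crosstab_2x2(factor, deviations, levels)
    _, p_value = fisher_exact(table)
    return table, p_value
```

For instance, `association_p_value(["F", "F", "M", "M"], [1.5, 2.5, 1.0, 3.0], ("F", "M"))` places one patient of each sex in each deviation group, so no association is detectable.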
Statistical Analysis
All statistical analyses were performed using IBM SPSS Statistics software (IBM Corporation, New York, USA, v.26). Descriptive statistics were calculated, including means, minimum and maximum values, medians, standard deviations (SD), and variances. The distance differences at the 15 cephalometric points identified on both the preoperative virtual planning and postoperative surgical outcome models were compared using a one-sample t-test with a test value of 2 mm, the deviation threshold considered clinically acceptable in the literature (5). This test identified the regions where deviations differed significantly from the threshold. The normality of the data was assessed using the Shapiro-Wilk test. Independent samples t-tests and Mann-Whitney U (Wilcoxon rank-sum) tests were used to compare deviations by sex, skeletal malocclusion type, and surgical sequencing. Additionally, the relationship between sex, skeletal malocclusion type, surgical sequencing (maxilla-first or mandible-first), and the presence of deviation was analyzed using crosstabulations. All tests were two-sided, and a p-value of <0.05 was considered statistically significant.
Results
Forty-two patients were included in the study (26 females, 16 males), with a mean age of 23.07±3.40 years (mean ± SD). Among them, nine patients had Class II skeletal malocclusion, while 33 had Class III skeletal malocclusion. Surgeries were performed using a maxilla-first approach in 24 patients and a mandible-first approach in 18 patients.
The total mean deviation was 2.19±0.82 mm. In half of the patients, the mean total deviation was below 2 mm, whereas in the other half, it exceeded 2 mm. The total mean deviation was 1.55±0.24 mm in patients without clinically significant deviation (≤2 mm), whereas it was 2.84±0.66 mm in those with deviations greater than 2 mm. Descriptive statistics for deviations at each cephalometric point, as well as maxillary, mandibular, and total deviation values, along with deviations in the coronal, sagittal, and axial planes, are presented in Table 1.
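As an arithmetic consistency check, the subgroup summaries can be recombined into the overall values. This assumes that "half of the patients" means 21 per group, that the subgroup SDs are sample SDs (n−1 denominator), and that all inputs are rounded as reported:

```latex
\begin{aligned}
\bar{x} &= \frac{21(1.55) + 21(2.84)}{42} = \frac{92.19}{42} \approx 2.195\ \text{mm},\\
s^2 &= \frac{20(0.24)^2 + 20(0.66)^2 + 21(1.55 - 2.195)^2 + 21(2.84 - 2.195)^2}{41}
     = \frac{9.864 + 17.473}{41} \approx 0.667,\\
s &\approx 0.82\ \text{mm}.
\end{aligned}
```

Both values agree with the reported total of 2.19±0.82 mm within rounding; the between-group term (17.473) dominates the within-group term (9.864), reflecting the clear separation of the two groups.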
According to the one-sample t-test, deviations at the ANS and PNS points were significantly higher than the test value of 2 mm (p=0.030 and 0.007, respectively). The maxillary mean deviation was also significantly higher than the test value (p=0.030). In contrast, the mean deviations in each of the three directions (coronal, sagittal, and axial) were significantly lower than the test value (p<0.001). Maxillary, mandibular, and total deviation amounts, as well as mean deviations in the three directions, did not differ significantly by sex, skeletal malocclusion type, or surgical sequence (maxilla-first vs. mandible-first) (p>0.05). Crosstabulations examining the relationship between sex, skeletal malocclusion type, surgical sequence, and the presence of deviation are presented in Table 2; no statistically significant relationship was found between any of these variables and the presence of deviation (p>0.05).
Evaluation of the 3D color-coded deviation maps revealed that the most prominent discrepancies were located in the anterior maxilla in 17 patients, the chin in 12, the posterior mandible in 11, and the posterior maxilla in 2. There was no statistically significant relationship between the presence of preoperative mandibular asymmetry and the presence of postoperative deviation (p=0.710). Additionally, no significant difference was found between the amount of preoperative pogonion deviation from the midline and the presence of postoperative deviation (p=0.300). ICC values for repeated measurements exceeded 0.90 for all evaluated variables, supporting the robustness and reliability of the 3D analysis.
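The ICC form used is not stated in the text. A common choice for intra-observer repeatability is the two-way, absolute-agreement, single-measure ICC, and the sketch below assumes that form; it is not the SPSS procedure used in the study, and the function name `icc_a1` is hypothetical. Each row is one measured item and each column one measurement session (here two: the original measurement and the repeat taken at least two weeks later).

```python
import numpy as np


def icc_a1(ratings) -> float:
    """Two-way, absolute-agreement, single-measure ICC (McGraw and Wong's ICC(A,1),
    numerically equal to Shrout and Fleiss's ICC(2,1)).

    ratings: array of shape (n_items, k_sessions), e.g. k = 2 repeated measurements.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand_mean = y.mean()
    # Sums of squares for items (rows), sessions (columns), and residual error
    ss_rows = k * np.sum((y.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((y.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((y - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    # Mean squares from the two-way ANOVA without replication
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return float((ms_rows - ms_error)
                 / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n))
```

Identical repeated measurements give an ICC of 1.0, and small session-to-session differences relative to the spread between items keep the value close to 1, which is the pattern behind ICCs above 0.90.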
Discussion
Virtual planning techniques and 3D-printed surgical splints are now widely adopted in orthognathic procedures (10-12). VSP enables comprehensive visualization of the dental arches in relation to surrounding skeletal structures within a single 3D model. Compared with traditional planning methods, this digital approach offers multiple advantages. It allows detailed diagnostic analysis in a 3D environment and enables surgeons to simulate various surgical scenarios to determine the optimal outcome. It also supports the assessment and correction of centric relation at the temporomandibular joint and serves as a practical educational resource. In computer-assisted surgical simulation systems, the finalized virtual plan can be accurately transferred to the clinical setting through surgical splints produced directly from the digital model using computer-aided design and computer-aided manufacturing technologies (13). However, presurgical plans do not always match the actual surgical results. Although surgical notes can be helpful, surgeons may differ in estimating the amount of movement, and these notes often lack the precision needed to evaluate the accuracy of virtual surgery properly. Postoperative models provide the most reliable means of measuring the actual surgical changes. The present study aimed to assess the long-term accuracy of VSP by comparing 3D models representing the preoperative virtual plans and actual postoperative jaw positions, and to identify specific anatomical regions where deviations commonly occur between planned and actual outcomes.
In this study, the total mean deviation was 2.19±0.82 mm. Although this value is slightly above the commonly accepted clinical threshold of 2 mm, the difference was not statistically significant. This may indicate that the minor postoperative changes resulting from factors such as soft tissue adaptation or bone remodeling are clinically negligible and may not significantly affect surgical outcomes. Neither preoperative mandibular asymmetry nor the degree of pogonion deviation from the midline showed a statistically significant association with postoperative discrepancies. This suggests that while preoperative asymmetry is an important clinical consideration, it may not be a reliable predictor of surgical inaccuracy when modern virtual planning and execution protocols are used.
A notable portion of the maxillary discrepancy may be attributed to deviations at the ANS and PNS points, both of which were statistically significant. These landmarks are particularly susceptible to intraoperative manipulation, such as dissection or trimming with burs, and may also undergo greater postoperative remodeling. Additionally, the maxillary mean deviation was significantly higher than the test value (p=0.030), whereas the mean deviations in the three directions (coronal, sagittal, and axial) were significantly lower (p<0.001). Intraoperative factors, such as splint seating, fixation technique, or maxillary positioning errors, may play a more prominent role in the development of anterior maxillary deviation, as suggested by the overrepresentation of ANS deviation and by the anterior maxilla being the most frequent site of discrepancy on the color-coded maps (17 patients). These findings highlight the importance of carefully evaluating maxillary positioning during surgery, particularly in the anterior region, and suggest that even minor technical imprecisions can result in clinically perceptible deviations.
There appears to be a clear gap in the literature regarding well-validated assessment methods, and there is little consensus across studies on the criteria and approaches used for evaluation and validation. Han et al. (14) and Baan et al. (1) applied voxel-based registration using the cranial region as the reference, which strengthened the accuracy of their assessments. Hsu et al. (5) and Hernández-Alfaro and Guijarro-Martínez (15) proposed a reliable superimposition technique using surface best-fit registration, whereas Zinser et al. (16) used point-based registration, a method more susceptible to observer-induced error. In this study, surface-based registration was adopted to align the preoperative and postoperative models.
Xia et al. (17) used a hybrid approach combining surface best-fit alignment with reference point-based assessment. The reference point discrepancies were quantified as both linear and angular deviations across all three spatial dimensions. In a sample of five patients, the mean linear discrepancy was reported as 0.12 mm, with an SD of 0.19 mm (16, 17). Hernández-Alfaro and Guijarro-Martínez captured the intraoperative dentition position within the intermediate splint using an intraoral scanner. These scanned surfaces were then compared with the preoperative virtual plans using MathWorks (Natick, MA) software, which generated color-coded deviation maps, and the authors reported the mean and SD of the surface distance discrepancies (15). Multiple authors have suggested that a discrepancy of up to 2 mm between the virtual surgical plan and the actual postoperative outcome can be considered an acceptable threshold for surgical accuracy (5, 10, 13, 17, 18); the 2 mm criterion can therefore be regarded as the surgical goal.
In the present study, deviations at the ANS and PNS points were significantly greater than the test value of 2 mm (p=0.030 and 0.007, respectively). Maxillary, mandibular, and total deviation amounts, as well as deviations in the three directions, showed no statistically significant differences by sex, skeletal malocclusion type, or surgical sequence (p>0.05). As noted above, the ANS and PNS are particularly susceptible to intraoperative manipulation and postoperative remodeling. At the ANS, minor bony reductions performed either to dissect the nasal muscles from the spine or to preserve the nasal tip may account for the observed changes. At the PNS, bone reduction extending from the ANS to the PNS is often performed to allow proper repositioning of the nasal septum along the midline without deviation.
Changes in muscle orientation and traction forces resulting from superior or inferior repositioning of the maxilla may also contribute to this remodeling. Although these alterations may not be clinically significant, they offer plausible explanations for the observed changes.
Perez and Ellis (19) argue that errors that would be transferred to the occlusion when mandibular surgery is performed last could potentially be eliminated if the maxilla were positioned last instead. For instance, a 1 mm malposition of the mandible performed after maxillary surgery would create a malocclusion, whereas the same malposition performed first would not; the maxilla would instead be malpositioned by this slight amount to accommodate the appropriate occlusion. Slight skeletal malpositions (i.e., 1 mm or less), even in the incisor area, are not usually clinical problems, whereas a 1 mm malocclusion could be. Bozok et al. (20) reported that the absolute mean difference at B point and pogonion was significantly higher in the maxilla-first group than in the mandible-first group. Several studies have evaluated the accuracy of maxillary positioning following orthognathic surgery; however, limited attention has been given to the predictability of VSP in cases where mandibular surgery is performed first. This has led to ongoing discussion about whether the surgical sequence influences the accuracy of VSP, and whether additional measures, such as more rigid fixation, may be necessary when a mandible-first approach is used. In the present study, no statistically significant difference in deviation was found between patients who underwent mandible-first and maxilla-first approaches.
Study Limitations
One of the key limitations of this study is the relatively small sample size (n=42), which may reduce the ability to detect subtle but potentially meaningful differences, particularly in subgroup analyses (for example, only nine patients had Class II malocclusion). Additionally, using ANS and PNS as maxillary landmarks may have overestimated surgical discrepancies, as these points are prone to intraoperative reduction and postoperative remodeling. Excluding them led to a notable decrease in the measured maxillary deviation, underscoring the importance of landmark selection in accuracy assessment. Furthermore, because cutting guides were not used, deviations may also have resulted from differences between the osteotomy lines defined during virtual planning and those performed intraoperatively by the surgeon. Larger-scale studies are needed to draw more definitive conclusions.
Conclusion
Taken together, the findings suggest that while VSP provides a high degree of overall accuracy, specific anatomical landmarks, such as the ANS and PNS, remain susceptible to deviation. Moreover, clinically meaningful discrepancies may arise independently of traditionally assumed predictors such as skeletal classification or surgical sequencing. These results highlight the need for further research with larger, adequately powered samples.


