Automatic quality evaluation of machine translation (MT) output is known to overestimate translation quality and to show low agreement with human quality judgments. At the same time, human evaluation based on the Multidimensional Quality Metrics (MQM) framework is complex, tracks excessive linguistic detail, and was primarily designed for assessing human translations. This presentation describes a human MQM implementation that addresses these issues and enables fast, reliable human evaluation of MT quality: it correlates well with human judgment, is indicative of post-editing effort, and reflects the MT quality perceived by humans. We will also discuss key parameters that affect MT quality evaluation and the applicability of evaluation results to production parameters such as time and labor costs. The metric is simple and easy to use, and it makes it possible not only to estimate the quality of an MT engine but also to compare engines of similar quality, which was previously hard to do. The metric is intended to serve as a reliable and inexpensive tool for research, ML engineering, and the industry.
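For orientation, the sketch below shows how an MQM-style score is typically derived from human error annotations: annotated errors are weighted by severity, summed, and normalized by segment length. The severity weights, error categories, and normalization used here are illustrative assumptions drawn from common MQM practice, not the specific parameters of the metric described in this presentation.

```python
# Generic MQM-style scoring sketch (illustrative assumptions only; the
# severity weights and normalization follow common MQM conventions and are
# not the exact parameters of the metric presented here).

from dataclasses import dataclass

# Hypothetical severity weights, following widely used MQM defaults.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

@dataclass
class ErrorAnnotation:
    category: str   # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str   # "minor", "major", or "critical"

def mqm_score(errors: list[ErrorAnnotation], word_count: int) -> float:
    """Convert a per-word weighted error penalty into a 0-100 quality score."""
    penalty = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    return max(0.0, 100.0 * (1.0 - penalty / word_count))

# Example: two minor errors and one major error in a 120-word segment.
errors = [
    ErrorAnnotation("fluency/grammar", "minor"),
    ErrorAnnotation("accuracy/mistranslation", "major"),
    ErrorAnnotation("terminology", "minor"),
]
print(f"MQM-style score: {mqm_score(errors, word_count=120):.1f}")
```

Comparing two MT engines then reduces to annotating the same sample with both outputs and comparing the resulting scores, which is what makes a lightweight error typology attractive for distinguishing engines of similar quality.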