{{Short description|Computer recognition of music notation}} |
{{Use MDY dates|date=June 2021}} |
{{Use American English|date=June 2021}} |
'''Optical music recognition''' ('''OMR''') is a field of research that investigates how to computationally read [[musical notation]] in documents.<ref>{{cite thesis |type=PhD |last=Pacha |first=Alexander |date=2019 |title=Self-Learning Optical Music Recognition |publisher=TU Wien, Austria |degree=PhD |url=http://katalog.ub.tuwien.ac.at/AC15448215 |doi=10.13140/RG.2.2.18467.40484}}</ref> The goal of OMR is to teach the computer to read and interpret [[sheet music]] and produce a machine-readable version of the written music score. Once captured digitally, the music can be saved in commonly used file formats, e.g. [[MIDI]] (for playback) and [[MusicXML]] (for page layout). |
In the past it has, misleadingly, also been called "music [[optical character recognition]]". Due to significant differences, this term should no longer be used.<ref name=":0">{{cite journal |last1=Calvo-Zaragoza |first1=Jorge |last2=Hajič |first2=Jan jr. |last3=Pacha |first3=Alexander |title=Understanding Optical Music Recognition |journal=ACM Computing Surveys |year=2020 |volume=53 |issue=4 |pages=1–35 |doi=10.1145/3397499 |arxiv=1908.03608 |s2cid=199543265}}</ref> |
==History== |
[[File:FirstPublishedDigitalScanOfMusic-Prerau1971.png |thumb |First published digital scan of music scores by David Prerau in 1971]] |
Optical music recognition of printed sheet music started in the late 1960s at the [[Massachusetts Institute of Technology]] when the first [[image scanner]]s became affordable for research institutes.<ref>{{cite AV media |url=https://www.youtube.com/watch?v=Mr7simdf0eA |archive-url=https://ghostarchive.org/varchive/youtube/20211221/Mr7simdf0eA |archive-date=2021-12-21 |url-status=live|title=Optical Music Recognition for Dummies - Part 2 - Introduction and History |date=2018-10-03 |access-date=2021-06-24 |publisher=[[YouTube]] |website=youtube.com}}{{cbignore}}</ref><ref>{{cite thesis |type=PhD |last=Pruslin |first=Dennis Howard |date=1966 |title=Automatic Recognition of Sheet Music |publisher=Massachusetts Institute of Technology, Cambridge, Massachusetts, USA |degree=PhD}}</ref><ref name=prerau1971>{{cite conference |last=Prerau |first=David S. |date=1971 |title=Computer pattern recognition of printed music |conference=Fall Joint Computer Conference |pages=153–162}}</ref> Due to the limited memory of early computers, the first attempts were limited to only a few measures of music. |
In 1984, a Japanese research group from [[Waseda University]] developed a specialized robot, called WABOT (WAseda roBOT), which was capable of reading the music sheet in front of it and accompanying a singer on an [[electric organ]].<ref>{{cite web |url=https://www.humanoid.waseda.ac.jp/booklet/kato_2.html |title=WABOT – WAseda roBOT |publisher=Waseda University Humanoid |access-date=July 14, 2019 |website=waseda.ac.jp}}</ref><ref>{{cite web |url=https://robotsguide.com/robots/wabot/ |title=Wabot 2 |publisher=IEEE |access-date=July 14, 2019 |website=[[IEEE]]}}</ref>
Early research in OMR was conducted by Ichiro Fujinaga, Nicholas Carter, Kia Ng, David Bainbridge, and Tim Bell. These researchers developed many of the techniques that are still being used today. |
The first commercial OMR application, MIDISCAN (now [[SmartScore]]), was released in 1991 by Musitek Corporation. |
The availability of [[smartphone]]s with good cameras and sufficient computational power paved the way for mobile solutions, where the user takes a picture with the smartphone and the device directly processes the image.
==Relation to other fields== |
[[File:RelationToOtherFields.svg |thumb |Relation of optical music recognition to other fields of research]] |
Optical music recognition relates to other fields of research, including [[computer vision]], document analysis, and [[music information retrieval]]. It is relevant for practicing musicians and composers, who could use OMR systems as a means to enter music into the computer, easing the process of [[musical composition |composing]], [[transcription (music) |transcribing]], and editing music. In a library, an OMR system could make music scores searchable,<ref>{{cite conference |last1=Laplante |first1=Audrey |last2=Fujinaga |first2=Ichiro |date=2016 |title=Digitizing Musical Scores: Challenges and Opportunities for Libraries |conference=3rd International Workshop on Digital Libraries for Musicology |pages=45–48}}</ref> and for musicologists it would enable quantitative musicological studies at scale.<ref>{{cite conference |last1=Hajič |first1=Jan jr. |last2=Kolárová |first2=Marta |first3=Alexander |last3=Pacha |first4=Jorge |last4=Calvo-Zaragoza |date=2018 |title=How Current Optical Music Recognition Systems Are Becoming Useful for Digital Libraries |conference=5th International Conference on Digital Libraries for Musicology |pages=57–61 |location=Paris, France}}</ref>
===OMR vs. OCR=== |
Optical music recognition has frequently been compared to optical character recognition.<ref name=":0" /><ref name="Bainbridge2001" /><ref name=":1" /> The biggest difference is that music notation is a featural writing system: while its alphabet consists of well-defined primitives (e.g., stems, noteheads, or flags), it is their configuration – how they are placed and arranged on the staff – that determines the semantics and how they should be interpreted.
Finally, music notation involves ubiquitous two-dimensional spatial relationships, whereas text can be read as a one-dimensional stream of information, once the baseline is established. |
==Approaches to OMR== |
[[File:Excerpt from Nocturne Op. 15, no. 2 by Frédéric Chopin.png |thumb |upright=1.5 |Excerpt of [[Nocturnes, Op. 15 (Chopin) |Nocturne Op. 15]], no. 2, by Frédéric Chopin – challenges encountered in optical music recognition]] |
The process of recognizing music scores is typically broken down into smaller steps that are handled with specialized [[pattern recognition]] algorithms. |
Many competing approaches have been proposed, with most of them sharing a pipeline architecture, where each step in the pipeline performs a certain operation, such as detecting and removing staff lines before moving on to the next stage. A common problem with this approach is that errors and artifacts introduced in one stage propagate through the system and can heavily affect performance. For example, if the staff line detection stage fails to correctly identify the existence of the music staffs, subsequent steps will probably ignore that region of the image, leading to missing information in the output.
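The error-propagation problem described above can be sketched in a few lines; the stage names and toy logic here are purely illustrative, not from any real OMR system:

```python
from typing import List, Optional

# Hypothetical pipeline sketch: each stage consumes the previous stage's
# output, so a failure in staff detection silently empties the final result.

def detect_staffs(image: List[List[int]]) -> Optional[List[int]]:
    # Toy detector: returns row indices that look like staff lines
    # (mostly ink), or None when nothing is found.
    rows = [y for y, row in enumerate(image) if sum(row) > len(row) // 2]
    return rows or None

def detect_symbols(image: List[List[int]], staff_rows: Optional[List[int]]) -> List[str]:
    # The downstream stage only looks near detected staffs; with no staffs,
    # the whole region is ignored and nothing is recognized.
    if not staff_rows:
        return []
    return ["note"] * len(staff_rows)  # placeholder recognition result

blank_page = [[0] * 8 for _ in range(8)]      # staff detection fails here
print(detect_symbols(blank_page, detect_staffs(blank_page)))  # -> []
```

A page whose staffs are missed produces an empty result, even though later stages never "fail" themselves.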
Optical music recognition is frequently underestimated due to the seemingly easy nature of the problem: If provided with a perfect scan of typeset music, the visual recognition can be solved with a sequence of fairly simple algorithms, such as projections and template matching. However, the process gets significantly harder for poor scans or handwritten music, which many systems fail to recognize altogether. And even if all symbols were detected perfectly, it is still challenging to recover the musical semantics due to ambiguities and frequent violations of the rules of music notation (see the example of Chopin's Nocturne). Donald Byrd and Jakob Simonsen argue that OMR is difficult because modern music notation is extremely complex.<ref name=":1">{{cite journal |last1=Byrd |first1=Donald |last2=Simonsen |first2=Jakob Grue |date=2015 |title=Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images |journal=[[Journal of New Music Research]] |volume=44 |issue=3 |pages=169–195 |doi=10.1080/09298215.2015.1045424}}</ref>
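The "projections" technique mentioned above can be illustrated with a minimal sketch (assuming NumPy is available): on a clean binary image, staff lines are simply the rows whose horizontal ink projection approaches the page width. The function name and threshold are hypothetical.

```python
import numpy as np

def staff_line_rows(binary_img: np.ndarray, threshold: float = 0.8) -> list:
    """Return row indices whose horizontal projection looks like a staff line."""
    projection = binary_img.sum(axis=1)     # ink count per row
    width = binary_img.shape[1]
    return [int(y) for y in np.where(projection >= threshold * width)[0]]

# Toy page: 10 rows x 20 columns (1 = ink), staff lines at rows 2, 4, 6.
page = np.zeros((10, 20), dtype=int)
for y in (2, 4, 6):
    page[y, :] = 1
page[3, 5] = 1  # a stray ink blob must not be mistaken for a line

print(staff_line_rows(page))  # -> [2, 4, 6]
```

On skewed, curved, or handwritten staffs this simple projection breaks down, which is exactly why the problem is harder than it looks.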
Donald Byrd also collected a number of interesting examples<ref>{{cite web |url=https://homes.sice.indiana.edu/donbyrd/InterestingMusicNotation.html |title=Gallery of Interesting Music Notation |first=Donald |last=Byrd |date=November 2017 |access-date=July 14, 2019 |website=indiana.edu}}</ref> as well as extreme examples<ref>{{cite web |url=https://homes.sice.indiana.edu/donbyrd/CMNExtremes.htm |title=Extremes of Conventional Music Notation |first=Donald |last=Byrd |date=October 2018 |access-date=July 14, 2019 |website=indiana.edu}}</ref> of music notation that demonstrate its sheer complexity.
==Outputs of OMR systems== |
Typical applications for OMR systems include the creation of an audible version of the music score (referred to as replayability). A common way to create such a version is by generating a [[MIDI]] file, which can be [[Synthesizer |synthesised]] into an audio file. MIDI files, though, are not capable of storing engraving information (how the notes were laid out) or [[enharmonic]] spelling. |
If the music scores are recognized with the goal of human readability (referred to as reprintability), the structured encoding has to be recovered, which includes precise information on the layout and engraving. Suitable formats to store this information include [[Music Encoding Initiative |MEI]] and [[MusicXML]]. |
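To make the reprintability output concrete, here is a minimal sketch of a MusicXML fragment for a single quarter note, built with Python's standard library. The element names follow the MusicXML format mentioned above; the helper function itself is hypothetical, and a real OMR system would emit far more layout and engraving detail.

```python
import xml.etree.ElementTree as ET

def single_note_musicxml(step: str = "C", octave: int = 4) -> str:
    """Build a tiny (partial) MusicXML document containing one quarter note."""
    score = ET.Element("score-partwise", version="3.1")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"
    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"   # 1 division per quarter
    note = ET.SubElement(measure, "note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step
    ET.SubElement(pitch, "octave").text = str(octave)
    ET.SubElement(note, "duration").text = "1"
    ET.SubElement(note, "type").text = "quarter"
    return ET.tostring(score, encoding="unicode")

print("<step>C</step>" in single_note_musicxml())  # -> True
```

Unlike a MIDI rendering of the same note, this encoding keeps the notated spelling (step C, octave 4) rather than only a key number.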
Apart from those two applications, it might also be interesting to just extract metadata from the image or enable searching. In contrast to the first two applications, a lower level of comprehension of the music score might be sufficient to perform these tasks. |
==General framework (2001)== |
[[File:Optical Music Recognition Architecture by Bainbridge and Bell (2001).svg |thumb |upright=2 |Optical Music Recognition Architecture by Bainbridge and Bell (2001)]] |
In 2001, David Bainbridge and Tim Bell published their work on the challenges of OMR, where they reviewed previous research and extracted a general framework for OMR.<ref name=Bainbridge2001/> Their framework has been used by many systems developed after 2001. The framework has four distinct stages with a heavy emphasis on the visual detection of objects. They noticed that the reconstruction of the musical semantics was often omitted from published articles because the operations used were specific to the output format.
==Refined framework (2012)== |
[[File:Optical Music Recognition Architecture by Rebelo (2012).svg |thumb |upright=2 |The general framework for optical music recognition proposed by Ana Rebelo et al. in 2012]] |
In 2012, Ana Rebelo et al. surveyed techniques for optical music recognition.<ref>{{cite journal |last1=Rebelo |first1=Ana |last2=Fujinaga |first2=Ichiro |last3=Paszkiewicz |first3=Filipe |last4=Marcal |first4=Andre R.S. |last5=Guedes |first5=Carlos |last6=Cardoso |first6=Jamie dos Santos |date=2012 |title=Optical music recognition: state-of-the-art and open issues |journal=[[International Journal of Multimedia Information Retrieval]] |volume=1 |issue=3 |pages=173–190 |doi=10.1007/s13735-012-0004-6 |s2cid=12964479 |url=https://link.springer.com/content/pdf/10.1007%2Fs13735-012-0004-6.pdf |access-date=2019-07-15}}</ref> They categorized the published research and refined the OMR pipeline into four stages: preprocessing, music symbol recognition, musical notation reconstruction, and final representation construction. This framework became the de facto standard for OMR and is still used today (although sometimes with slightly different terminology). For each stage, they give an overview of techniques that are used to tackle that problem. As of 2019, this publication is the most cited paper on OMR research.
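The data flow through the four stages can be sketched as follows; the stage bodies are placeholders (the survey catalogs many concrete techniques for each), only the decomposition itself reflects the framework:

```python
# Skeleton of the four-stage OMR pipeline; every return value is a stand-in.

def preprocess(image):
    return image  # e.g. binarization, noise removal, deskewing

def recognize_symbols(image):
    return ["clef:treble", "note:C4-quarter"]  # placeholder detections

def reconstruct_notation(symbols):
    # e.g. attach symbols to staffs/voices and resolve their relationships
    return [s.split(":") for s in symbols]

def build_representation(notation):
    # e.g. serialize to MusicXML, MEI, or MIDI; here just a readable summary
    return "; ".join(f"{kind}={value}" for kind, value in notation)

def omr(image):
    return build_representation(
        reconstruct_notation(recognize_symbols(preprocess(image))))

print(omr(None))  # -> clef=treble; note=C4-quarter
```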
==Deep learning (since 2016)== |
With the advent of [[deep learning]], many computer vision problems have shifted from imperative programming with hand-crafted heuristics and feature engineering towards machine learning. In optical music recognition, the staff processing stage,<ref>{{cite journal |last1=Gallego |first1=Antonio-Javier |last2=Calvo-Zaragoza |first2=Jorge |date=2017 |title=Staff-line removal with selectional auto-encoders |journal=Expert Systems with Applications |volume=89 |pages=138–148 |doi=10.1016/j.eswa.2017.07.002 |hdl=10045/68971 |hdl-access=free}}</ref><ref>{{cite conference |last1=Castellanos |first1=Fancisco J. |last2=Calvo-Zaragoza |first2=Jorge |first3=Gabriel |last3=Vigliensoni |first4=Ichiro |last4=Fujinaga |date=2018 |title=Document Analysis of Music Score Images with Selectional Auto-Encoders |conference=19th International Society for Music Information Retrieval Conference |pages=256–263 |location=Paris, France |url=http://ismir2018.ircam.fr/doc/pdfs/93_Paper.pdf |access-date=2019-07-15}}</ref> the music object detection stage,<ref>{{cite conference |last1=Tuggener |first1=Lukas |last2=Elezi |first2=Ismail |first3=Jürgen |last3=Schmidhuber |first4=Thilo |last4=Stadelmann |date=2018 |title=Deep Watershed Detector for Music Object Recognition |conference=19th International Society for Music Information Retrieval Conference |pages=271–278 |location=Paris, France |url=http://ismir2018.ircam.fr/doc/pdfs/225_Paper.pdf |access-date=2019-07-15}}</ref><ref>{{cite conference |last1=Hajič |first1=Jan jr. |last2=Dorfer |first2=Matthias |first3=Gerhard |last3=Widmer |first4=Pavel |last4=Pecina |date=2018 |title=Towards Full-Pipeline Handwritten OMR with Musical Symbol Detection by U-Nets |conference=19th International Society for Music Information Retrieval Conference |pages=225–232 |location=Paris, France |url=http://ismir2018.ircam.fr/doc/pdfs/175_Paper.pdf |access-date=2019-07-15}}</ref><ref>{{cite journal |last1=Pacha |first1=Alexander |last2=Hajič |first2=Jan jr. 
|last3=Calvo-Zaragoza |first3=Jorge |date=2018 |title=A Baseline for General Music Object Detection with Deep Learning |journal=Applied Sciences |volume=8 |issue=9 |pages=1488–1508 |doi=10.3390/app8091488 |doi-access=free|hdl=20.500.12708/20052 |hdl-access=free }}</ref><ref>{{cite conference |last1=Pacha |first1=Alexander |last2=Choi |first2=Kwon-Young |last3=Coüasnon |first3=Bertrand |last4=Ricquebourg |first4=Yann |last5=Zanibbi |first5=Richard |last6=Eidenberger |first6=Horst |date=2018 |title=Handwritten Music Object Detection: Open Issues and Baseline Results |conference=13th International Workshop on Document Analysis Systems |pages=163–168 |doi=10.1109/DAS.2018.51 |url=https://hal.archives-ouvertes.fr/hal-01972424/file/DAS_2018_paper_59.pdf |access-date=2019-09-02}}</ref> as well as the music notation reconstruction stage<ref>{{cite conference |last1=Pacha |first1=Alexander |last2=Calvo-Zaragoza |first2=Jorge |last3=Hajič |first3=Jan jr. |date=2019 |title=Learning Notation Graph Construction for Full-Pipeline Optical Music Recognition |conference=20th International Society for Music Information Retrieval Conference|url=http://archives.ismir.net/ismir2019/paper/000006.pdf |access-date=2023-07-02}}</ref> have seen successful attempts to solve them with deep learning. |
Even completely new approaches have been proposed, including solving OMR in an end-to-end fashion with sequence-to-sequence models, that take an image of music scores and directly produce the recognized music in a simplified format.<ref>{{cite conference |last1=van der Wel |first1=Eelco |last2=Ullrich |first2=Karen |date=2017 |title=Optical Music Recognition with Convolutional Sequence-to-Sequence Models |conference=18th International Society for Music Information Retrieval Conference |location=Suzhou, China |url=https://archives.ismir.net/ismir2017/paper/000069.pdf}}</ref><ref>{{cite journal |last1=Calvo-Zaragoza |first1=Jorge |last2=Rizo |first2=David |date=2018 |title=End-to-End Neural Optical Music Recognition of Monophonic Scores |journal=Applied Sciences |volume=8 |issue=4 |pages=606 |doi=10.3390/app8040606 |doi-access=free|hdl=10251/143793 |hdl-access=free }}</ref><ref>{{cite conference |last1=Baró |first1=Arnau |last2=Riba |first2=Pau |last3=Calvo-Zaragoza |first3=Jorge |last4=Fornés |first4=Alicia |date=2017 |title=Optical Music Recognition by Recurrent Neural Networks |conference=14th International Conference on Document Analysis and Recognition |pages=25–26 |doi=10.1109/ICDAR.2017.260}}</ref><ref>{{cite journal |last1=Baró |first1=Arnau |last2=Riba |first2=Pau |last3=Calvo-Zaragoza |first3=Jorge |last4=Fornés |first4=Alicia |date=2019 |title=From Optical Music Recognition to Handwritten Music Recognition: A baseline |journal=Pattern Recognition Letters |volume=123 |pages=1–8 |doi=10.1016/j.patrec.2019.02.029 |bibcode=2019PaReL.123....1B |hdl=10045/89708 |s2cid=127170982 |hdl-access=free}}</ref> |
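Several of the end-to-end systems cited above emit one prediction per image column and decode the sequence with connectionist temporal classification (CTC). A minimal greedy CTC decoder (collapse repeated labels, drop blanks) looks like this; the frame labels below are hand-made toy data, not output of any real model:

```python
BLANK = "-"  # the CTC "no symbol here" label

def ctc_greedy_decode(frame_labels):
    """Collapse consecutive repeats and remove blanks from per-frame labels."""
    decoded, previous = [], None
    for label in frame_labels:
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return decoded

# One predicted label per image column (toy example):
frames = ["-", "clef.G", "clef.G", "-",
          "note.C4_quarter", "note.C4_quarter", "-"]
print(ctc_greedy_decode(frames))  # -> ['clef.G', 'note.C4_quarter']
```

The appeal of this formulation is that no per-symbol bounding boxes are needed for training, only image/transcription pairs.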
==Notable scientific projects== |
===Staff removal challenge=== |
For systems that were developed before 2016, staff detection and removal posed a significant obstacle. A scientific competition was organized to improve the state of the art and advance the field.<ref>{{cite conference <!-- Citation bot no -->|last1=Fornés |first1=Alicia |last2=Dutta |first2=Anjan |last3=Gordo |first3=Albert |last4=Lladós |first4=Josep |date=2013 |chapter=The 2012 Music Scores Competitions: Staff Removal and Writer Identification |title=Graphics Recognition. New Trends and Challenges |volume=7423 |publisher=Springer |pages=173–186 |doi=10.1007/978-3-642-36824-0_17 |series=Lecture Notes in Computer Science |isbn=978-3-642-36823-3| editor1= Young-Bin Kwon| editor2=Jean-Marc Ogier}}</ref> Due to excellent results and modern techniques that made the staff removal stage obsolete, this competition was discontinued. |
However, the freely available CVC-MUSCIMA dataset that was developed for this challenge is still highly relevant for OMR research as it contains 1000 high-quality images of handwritten music scores, transcribed by 50 different musicians. It has been further extended into the MUSCIMA++ dataset, which contains detailed annotations for 140 out of 1000 pages. |
===SIMSSA=== |
The Single Interface for Music Score Searching and Analysis project (SIMSSA)<ref>{{cite web |url=https://simssa.ca |title=Single Interface for Music Score Searching and Analysis project |publisher=McGill University |access-date=July 14, 2019 |website=simssa.ca}}</ref> is probably the largest project that attempts to teach computers to recognize musical scores and make them accessible. Several sub-projects have already been successfully completed, including the Liber Usualis<ref>{{cite web |url=http://liber.simssa.ca |title=Search the Liber Usualis |publisher=McGill University |access-date=July 14, 2019 |website=liber.simssa.ca}}</ref> and Cantus Ultimus.<ref>{{cite web |url=https://cantus.simssa.ca |title=Cantus Ultimus |publisher=McGill University |access-date=July 14, 2019 |website=cantus.simssa.ca}}</ref> |
===TROMPA=== |
Towards Richer Online Music Public-domain Archives (TROMPA) is an international research project sponsored by the European Union that investigates how to make public-domain digital music resources more accessible.<ref>{{cite web |url=https://trompamusic.eu |title=Towards Richer Online Music Public-domain Archives |access-date=July 14, 2019 |website=trompamusic.eu}}</ref>
==Datasets== |
The development of OMR systems benefits from test datasets of sufficient size and diversity to ensure the system being developed works under various conditions. However, for legal reasons and potential copyright violations, it is challenging to compile and publish such a dataset. The most notable datasets for OMR are referenced and summarized by the OMR Datasets project<ref>{{cite web |url=https://apacha.github.io/OMR-Datasets/ |title=Optical Music Recognition Datasets |last=Pacha |first=Alexander |access-date=July 14, 2019 |website=github.io}}</ref> and include the CVC-MUSCIMA,<ref>{{cite journal |last1=Fornés |first1=Alicia |last2=Dutta |first2=Anjan |last3=Gordo |first3=Albert |last4=Lladós |first4=Josep |date=2012 |title=CVC-MUSCIMA: A Ground-truth of Handwritten Music Score Images for Writer Identification and Staff Removal |journal=International Journal on Document Analysis and Recognition |volume=15 |issue=3 |pages=243–251 |doi=10.1007/s10032-011-0168-2 |s2cid=17946417}}</ref> MUSCIMA++,<ref>{{cite conference |last1=Hajič |first1=Jan jr. 
|last2=Pecina |first2=Pavel |date=2017 |title=The MUSCIMA++ Dataset for Handwritten Optical Music Recognition |conference=14th International Conference on Document Analysis and Recognition |pages=39–46 |doi=10.1109/ICDAR.2017.16 |location=Kyoto, Japan}}</ref> DeepScores,<ref>{{cite conference |last1=Tuggener |first1=Lukas |last2=Elezi |first2=Ismail |last3=Schmidhuber |first3=Jürgen |last4=Pelillo |first4=Marcello |last5=Stadelmann |first5=Thilo |date=2018 |title=DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects |conference=24th International Conference on Pattern Recognition |doi=10.21256/zhaw-4255 |location=Beijing, China}}</ref> PrIMuS,<ref>{{cite conference |last1=Calvo-Zaragoza |first1=Jorge |last2=Rizo |first2=David |date=2018 |title=Camera-PrIMuS: Neural End-to-End Optical Music Recognition on Realistic Monophonic Scores |conference=19th International Society for Music Information Retrieval Conference |pages=248–255 |url=http://ismir2018.ircam.fr/doc/pdfs/33_Paper.pdf |location=Paris, France |access-date=2019-07-15}}</ref> HOMUS,<ref>{{cite conference |last1=Calvo-Zaragoza |first1=Jorge |last2=Oncina |first2=Jose |date=2014 |title=Recognition of Pen-Based Music Notation: The HOMUS Dataset |conference=22nd International Conference on Pattern Recognition |pages=3038–3043 |doi=10.1109/ICPR.2014.524}}</ref> and SEILS dataset,<ref>{{cite conference |last1=Parada-Cabaleiro |first1=Emilia |last2=Batliner |first2=Anton |last3=Baird |first3=Alice |last4=Schuller |first4=Björn |date=2017 |title=The SEILS Dataset: Symbolically Encoded Scores in Modern-Early Notation for Computational Musicology |conference=18th International Society for Music Information Retrieval Conference |pages=575–581 |url=https://archives.ismir.net/ismir2017/paper/000014.pdf |location=Suzhou, China |access-date=2020-08-12}}</ref> as well as the Universal Music Symbol Collection.<ref>{{cite conference |last1=Pacha |first1=Alexander |last2=Eidenberger 
|first2=Horst |date=2017 |title=Towards a Universal Music Symbol Classifier |conference=14th International Conference on Document Analysis and Recognition |pages=35–36 |doi=10.1109/ICDAR.2017.265 |location=Kyoto, Japan}}</ref> |
French company [[Newzik]] took a different approach in the development of its OMR technology Maestria,<ref name=maestria>{{cite web |url=https://newzik.com/maestria |title=Maestria |publisher=Newzik |website=newzik.com |access-date=2021-06-24}}</ref> by using random score generation. Using synthetic data helped with avoiding copyright issues and training the artificial intelligence algorithms on musical cases that rarely occur in actual repertoire, ultimately resulting in (according to claims by the company) more accurate music recognition.<ref>{{cite AV media |title=Apprendre le solfège à des algorithmes avec Marie Chupeau, chercheuse en intelligence artificielle |url=https://www.youtube.com/watch?v=lL6ehE2YCnM |archive-url=https://ghostarchive.org/varchive/youtube/20211221/lL6ehE2YCnM |archive-date=2021-12-21 |url-status=live|language=fr |date=2021-06-21 |access-date=2021-06-24 |publisher=[[YouTube]] |website=youtube.com}}{{cbignore}}</ref> |
==Software== |
===Academic and open-source software=== |
[[Open source]] OMR projects vary significantly, from well-developed software such as [[Audiveris]] to many projects that were realized in academia, of which only a few reached a mature state and were successfully deployed to users. These systems include:
* Aruspix<ref>{{Cite web |url=http://www.aruspix.net |title=Aruspix |access-date=2019-07-15 |website=aruspix.net}}</ref> |
* [[Audiveris]]<ref>{{Cite web |url=https://github.com/audiveris |title=Audiveris |access-date=2019-07-15 |website=github.com}}</ref> |
* CANTOR<ref>{{Cite web |url=https://www.cs.waikato.ac.nz/~davidb/home.html |title=David Bainbridge (Home Page) |access-date=2019-07-15 |website=waikato.ac.nz}}</ref> |
* MusicStaves toolkit for Gamera<ref>{{Cite web |url=https://gamera.informatik.hsnr.de/addons/musicstaves/index.html |title=Gamera Addon: MusicStaves Toolkit |access-date=2019-07-15 |website=hsnr.de}}</ref> |
* DMOS<ref>{{cite conference |last1=Coüasnon |first1=Bertrand |date=2001 |title=DMOS: a generic document recognition method, application to an automatic generator of musical scores, mathematical formulae and table structures recognition systems |conference=Sixth International Conference on Document Analysis and Recognition |pages=215–220 |doi=10.1109/ICDAR.2001.953786}}</ref> |
* Oemer<ref>{{cite web | url=https://github.com/BreezeWhite/oemer | title=oemer: End-to-end Optical Music Recognition (OMR) system | website=github.com | access-date=2023-09-21}}</ref> |
* [[OpenOMR]]<ref>{{Cite web |url=https://sourceforge.net/projects/openomr/ |title=OpenOMR |access-date=2017-01-26 |website=sourceforge.net|date=April 10, 2013 }}</ref> |
* Rodan<ref>{{Cite web |url=https://github.com/DDMAL/Rodan/wiki |title=Rodan |access-date=2019-07-15 |website=github.com}}</ref> |
===Commercial software=== |
Most of the commercial desktop applications developed in the last 20 years were eventually discontinued due to a lack of commercial success, leaving only a few vendors that still develop, maintain, and sell OMR products.
Some of these products claim extremely high recognition rates with up to 100% accuracy<ref>{{Cite web |url=https://www.capella-software.com/us/index.cfm/products/capella-scan/eighth-rest-or-smudge/ |title=Eighth rest or smudge |access-date=2019-07-15 |website=capella-software.com |publisher=capella-software AG}}</ref><ref name=photoscore /> but fail to disclose how those numbers were obtained, making it nearly impossible to verify them and compare different OMR systems.
* [[Capella (notation program)#Companion products |capella-scan]]<ref name="capella-scan">{{Cite web |url=https://www.capella-software.com/us/index.cfm/products/capella-scan/info-capella-scan/ |title=capella-scan |website=capella-software.com |access-date=2021-06-24 |publisher=capella-software AG}}</ref> |
* FORTE by Forte Notation<ref name=forte>{{cite web |url=https://www.fortenotation.com/en/products/writing-scores/forte-premium/ |title=FORTE 12 Premium Edition |access-date=2021-06-24 |website=fortenotation.com |publisher=Forte Notation}}</ref> |
* MIDI-Connections Scan by Composing & Arranging Systems<ref>{{cite web |url=http://www.midi-connections.com/product_scan_e.htm |title=MIDI-Connections SCAN 2.1 |access-date=2021-06-24 |website=midi-connections.com |publisher=Composing & Arranging Systems}}</ref> |
* NoteScan bundled with Nightingale
* MP Scan by Braeburn.<ref>{{cite web | url=http://www.braeburn.co.uk/mpsinfo.htm |title=Music Publisher Scanning Edition | archive-url=https://web.archive.org/web/20130413045521/http://www.braeburn.co.uk/mpsinfo.htm | archive-date=2013-04-13 | access-date=2021-06-24}}</ref> Uses SharpEye SDK. |
* Myriad SARL
** OMeR (Optical Music easy Reader) Add-on for Harmony Assistant and Melody Assistant: Myriad Software<ref>{{Cite web |url=https://www.myriad-online.com/en/products/omer.htm |title=OMeR |access-date=2013-10-06 |website=myriad-online.com |publisher=Myriad SARL}}</ref> |
** PDFtoMusic Pro<ref>{{Cite web |url=https://www.myriad-online.com/en/products/pdftomusicpro.htm |title=PDFtoMusic Pro |access-date=2015-11-13 |website=myriad-online.com |publisher=Myriad SARL}}</ref> |
* PhotoScore by Neuratron.<ref name="photoscore">{{Cite web |url=https://www.neuratron.com/photoscore.htm |title=PhotoScore & NotateMe Ultimate |access-date=2021-06-24 |website=neuratron.com |publisher=Neuratron}}</ref> The Light version of PhotoScore is used in [[Sibelius notation program |Sibelius]]; PhotoScore uses the SharpEye SDK.
* Scorscan by npcImaging<ref>{{Cite web |url=http://www.npcimaging.com/scscinfo/scscinfo.html |title=ScorScan information |access-date=2013-10-06 |publisher=NPC Imaging |website=npcimaging.com}}</ref> |
* SharpEye by Visiv<ref>{{Cite web |url=http://www.visiv.co.uk/ |title=SharpEye |access-date=2010-08-20 |archive-date=2010-08-14 |archive-url=https://web.archive.org/web/20100814162847/http://www.visiv.co.uk/ |url-status=live }}</ref> |
*ScanScore<ref>{{Cite web |url=https://scan-score.com/en/ |title=ScanScore |website=scan-score.com |access-date=2019-11-24 |publisher=SCANSCORE}}</ref> (Also as a bundle with [[Forte (notation program) |Forte Notation]].)<ref name=forte /> |
** VivaldiScan (same as SharpEye)<ref>{{cite web | url=http://www.vivaldistudio.com/ENG/VivaldiScan.asp |title=VivaldiScan | archive-url=https://web.archive.org/web/20051224185409/http://www.vivaldistudio.com/Eng/VivaldiScan.asp | archive-date=2005-12-24 | access-date=2021-06-24}}</ref> |
* Soundslice PDF/image importer.<ref>{{Cite web |url=https://www.soundslice.com/sheet-music-scanner/ |title=Soundslice sheet music scanner |access-date=2022-12-17 |website=soundslice.com |publisher=Soundslice}}</ref> AI-based OMR system released in beta in September 2022.<ref>{{Cite web |url=https://www.soundslice.com/blog/226/pdf-and-photo-scanning-beta/ |title=Soundslice PDF and photo scanning (beta) |access-date=2022-12-17 |website=soundslice.com |publisher=Soundslice}}</ref> |
* [[SmartScore]] by Musitek. Formerly packaged as "MIDISCAN". (SmartScore Lite has been used in previous versions of [[Finale (scorewriter)|Finale]].)
* Maestria by Newzik.<ref name=maestria /> Released in May 2021, Maestria is an example of new-generation OMR technology based on deep learning. The company claims it not only brings better results but also means "it becomes more accurate with each conversion".<ref>{{Cite web |last=Rothman |first=Philip |date=2021-05-26 |title=Newzik introduces interactive LiveScores with Maestria, AI-based optical music recognition |url=https://www.scoringnotes.com/news/newzik-introduces-livescores-with-maestria/ |access-date=2021-06-24 |publisher=Scoring Notes |website=scoringnotes.com}}</ref>
===Mobile apps=== |
Better cameras and increases in processing power have enabled a range of mobile applications, both on the Google Play Store and the Apple Store. Frequently the focus is on sight-playing (see [[sight-reading]]) – converting the sheet music into sound that is played on the device. |
* iSeeNotes by Gear Up AB<ref>{{cite web |url=http://www.iseenotes.com |title=iSeeNotes |access-date=2021-06-24 |website=iseenotes.com |publisher=Gear Up AB}}</ref>
* NotateMe Now by Neuratron<ref>{{cite web |url=https://www.neuratron.com/notateme.html |title=NotateMe |publisher=Neuratron |website=neuratron.com |access-date=2021-06-24}}</ref> |
* Notation Scanner by Song Zhang<ref>{{cite web |url=https://apps.apple.com/us/app/notation-scanner-music-ocr/id1260311003 |title=Notation Scanner |publisher=[[Apple Inc.]] |website=apps.apple.com |date=March 23, 2020 |access-date=2021-06-24}}</ref> |
* PlayScore 2 by Organum Ltd<ref>{{cite web |url=https://www.playscore.co |title=PlayScore 2 |publisher=PlayScore |website=playscore.co |access-date=2021-06-24}}</ref> |
* SmartScore NoteReader by Musitek<ref>{{cite web |url=https://play.google.com/store/apps/details?id=com.musitek.notereader&hl=en_US&gl=US |title=SmartScore NoteReader |website=play.google.com |access-date=2021-06-24}}</ref> |
* Newzik app<ref>{{cite web |url=https://newzik.com/app/ |title=Newzik app |publisher=Newzik |website=newzik.com |access-date=2021-06-24}}</ref> |
==See also== |
* [[Music information retrieval]] (MIR) is the broader problem of retrieving music information from media including music scores and audio. |
* [[Optical character recognition]] (OCR) is the recognition of text which can be applied to [[document retrieval]], analogously to OMR and MIR. However, a complete OMR system must faithfully represent text that is present in music scores, so OMR is in fact a superset of OCR.<ref name=Bainbridge2001>{{Cite journal |last1=Bainbridge |first1=David |last2=Bell |first2=Tim |year=2001 |title=The challenge of optical music recognition |url=https://www.researchgate.net/publication/220147775 |journal=Computers and the Humanities |volume=35 |issue=2 |pages=95–121 |access-date=23 February 2017 |doi=10.1023/A:1002485918032 |s2cid=18602074}}</ref> |
==References== |
{{Reflist}} |
==External links== |
* [https://www.youtube.com/playlist?list=PL1jvwDVNwQke-04UxzlzY4FM33bo1CGS0 Recording of the ISMIR 2018 tutorial "Optical Music Recognition for Dummies"] |
* [http://www.music-notation.info/en/compmus/omr.html Optical Music Recognition (OMR): Programs and scientific papers] |
* [https://web.archive.org/web/20131218222736/http://www.informatics.indiana.edu/donbyrd/OMRSystemsTable.html OMR (Optical Music Recognition) Systems]: Comprehensive table of OMR (Last updated: 30 January 2007). |
{{commons category-inline |Optical music recognition}} |
[[Category:Music OCR software |Music OCR software ]] |
[[Category:Musical notation]] |
Latest revision as of 00:41, 25 October 2024
History
Optical music recognition of printed sheet music started in the late 1960s at the Massachusetts Institute of Technology when the first image scanners became affordable for research institutes.[3][4][5] Due to the limited memory of early computers, the first attempts were limited to only a few measures of music. In 1984, a Japanese research group from Waseda University developed a specialized robot, called WABOT (WAseda roBOT), which was capable of reading the music sheet in front of it and accompanying a singer on an electric organ.[6][7]
Early research in OMR was conducted by Ichiro Fujinaga, Nicholas Carter, Kia Ng, David Bainbridge, and Tim Bell. These researchers developed many of the techniques that are still being used today.
The first commercial OMR application, MIDISCAN (now SmartScore), was released in 1991 by Musitek Corporation.
The availability of smartphones with good cameras and sufficient computational power paved the way for mobile solutions where the user takes a picture with the smartphone and the device directly processes the image.
Relation to other fields
Optical music recognition relates to other fields of research, including computer vision, document analysis, and music information retrieval. It is relevant for practicing musicians and composers who could use OMR systems as a means to enter music into the computer, easing the process of composing, transcribing, and editing music. In a library, an OMR system could make music scores searchable,[8] and for musicologists it would enable quantitative musicological studies at scale.[9]
OMR vs. OCR
Optical music recognition has frequently been compared to optical character recognition.[2][10][11] The biggest difference is that music notation is a featural writing system. This means that while the alphabet consists of well-defined primitives (e.g., stems, noteheads, or flags), it is their configuration – how they are placed and arranged on the staff – that determines the semantics and how it should be interpreted.
The second major distinction is the fact that while an OCR system does not go beyond recognizing letters and words, an OMR system is expected to also recover the semantics of music: The user expects that the vertical position of a note (graphical concept) is being translated into the pitch (musical concept) by applying the rules of music notation. Notice that there is no proper equivalent in text recognition. By analogy, recovering the music from an image of a music sheet can be as challenging as recovering the HTML source code from the screenshot of a website.
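The staff-position-to-pitch rule described above can be illustrated with a short sketch. The function and constants here are purely illustrative (not taken from any OMR system), and it assumes a treble clef with no key signature or accidentals:

```python
# Illustrative sketch only: translating a notehead's vertical staff position
# (a graphical concept) into a pitch (a musical concept), assuming a treble
# clef, no key signature, and no accidentals.

STEP_NAMES = ["C", "D", "E", "F", "G", "A", "B"]  # diatonic pitch letters

def treble_position_to_pitch(position: int) -> str:
    """position 0 = bottom staff line (E4); +1 per line/space upward."""
    step_index = 2 + position          # "E" is diatonic step 2 within an octave
    octave = 4 + step_index // 7       # floor division also handles ledger lines below
    return f"{STEP_NAMES[step_index % 7]}{octave}"

# The five lines of a treble staff, bottom to top:
print([treble_position_to_pitch(p) for p in (0, 2, 4, 6, 8)])
# → ['E4', 'G4', 'B4', 'D5', 'F5']
```

Even this toy version shows why pitch is a property of the note's configuration on the staff rather than of the notehead glyph itself.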
The third difference comes from the character set used. Although writing systems like Chinese have extraordinarily complex character sets, the character set of primitives for OMR spans a much greater range of sizes, from tiny elements such as a dot to large elements that potentially span an entire page, such as a brace. Some symbols, such as slurs, have a nearly unrestricted appearance and are defined only as more-or-less smooth curves that may be interrupted anywhere.
Finally, music notation involves ubiquitous two-dimensional spatial relationships, whereas text can be read as a one-dimensional stream of information, once the baseline is established.
Approaches to OMR
The process of recognizing music scores is typically broken down into smaller steps that are handled with specialized pattern recognition algorithms.
Many competing approaches have been proposed with most of them sharing a pipeline architecture, where each step in this pipeline performs a certain operation, such as detecting and removing staff lines before moving on to the next stage. A common problem with that approach is that errors and artifacts that were made in one stage are propagated through the system and can heavily affect the performance. For example, if the staff line detection stage fails to correctly identify the existence of the music staffs, subsequent steps will probably ignore that region of the image, leading to missing information in the output.
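The pipeline architecture described above can be sketched in a few lines. The stage names and data shapes below are hypothetical placeholders, chosen only to show how each stage consumes the previous stage's output, and therefore how an early error (such as a missed staff) silently propagates to the end:

```python
# Hypothetical sketch of a classical OMR pipeline (not a real system).

def detect_staffs(image):
    # Pretend the scan contains two staffs; return their row positions.
    return {"image": image, "staffs": [10, 60]}

def detect_symbols(state):
    # Symbols are only searched for near detected staffs, so any staff the
    # previous stage missed silently loses all of its symbols here.
    return {**state, "symbols": [(row, "note") for row in state["staffs"]]}

def reconstruct_notation(state):
    return [f"note@{row}" for row, _ in state["symbols"]]

def run_pipeline(image, stages):
    result = image
    for stage in stages:
        result = stage(result)  # a failure here corrupts every later stage
    return result

print(run_pipeline("scan.png", [detect_staffs, detect_symbols, reconstruct_notation]))
# → ['note@10', 'note@60']
```

If `detect_staffs` returned only `[10]`, the final output would lose `note@60` without any later stage noticing, which is exactly the error-propagation problem described above.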
Optical music recognition is frequently underestimated due to the seemingly easy nature of the problem: if provided with a perfect scan of typeset music, the visual recognition can be solved with a sequence of fairly simple algorithms, such as projections and template matching. However, the process gets significantly harder for poor scans or handwritten music, which many systems fail to recognize altogether. Even if every symbol were detected perfectly, it would still be challenging to recover the musical semantics due to ambiguities and frequent violations of the rules of music notation (see the example of Chopin's Nocturne). Donald Byrd and Jakob Simonsen argue that OMR is difficult because modern music notation is extremely complex.[11]
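The projection idea mentioned above fits in a few lines. This is a toy sketch under strong assumptions: an ideal binary image (1 = black pixel) where staff lines appear as rows that are almost entirely black, which is exactly the "perfect scan" case that makes the problem look easy:

```python
# Toy sketch of staff-line detection by horizontal projection, assuming an
# ideal binary image (1 = black pixel). Real scans need far more robustness.

def horizontal_projection(binary_image):
    # Count black pixels per row.
    return [sum(row) for row in binary_image]

def staff_line_rows(binary_image, threshold_ratio=0.8):
    # Rows whose black-pixel count approaches the full image width are
    # candidate staff lines.
    width = len(binary_image[0])
    return [y for y, count in enumerate(horizontal_projection(binary_image))
            if count >= threshold_ratio * width]

# 6x8 toy image with "staff lines" on rows 1 and 4 and one stray symbol pixel.
image = [
    [0] * 8,
    [1] * 8,
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0] * 8,
    [1] * 8,
    [0] * 8,
]
print(staff_line_rows(image))  # → [1, 4]
```

On skewed, curved, or handwritten input this projection peak is smeared out, which is one reason the simple algorithms break down.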
Donald Byrd also collected a number of interesting examples[12] as well as extreme examples[13] of music notation that demonstrate the sheer complexity of music notation.
Outputs of OMR systems
Typical applications for OMR systems include the creation of an audible version of the music score (referred to as replayability). A common way to create such a version is by generating a MIDI file, which can be synthesised into an audio file. MIDI files, though, are not capable of storing engraving information (how the notes were laid out) or enharmonic spelling.
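The loss of enharmonic spelling follows directly from how MIDI encodes pitch as a single key number. In this small sketch (the pitch-name parser is simplified and illustrative), the distinctly notated F#4 and Gb4 collapse to the same number:

```python
# Illustrative sketch: MIDI represents pitch as one number (middle C = 60),
# so enharmonic spellings that are distinct in notation become identical.

BASE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def midi_number(name: str) -> int:
    """Parse a simple pitch name like 'F#4' or 'Gb4' into a MIDI key number."""
    letter, rest = name[0], name[1:]
    accidental = {"#": 1, "b": -1}.get(rest[0], 0)  # at most one accidental
    octave = int(rest.lstrip("#b"))
    return 12 * (octave + 1) + BASE[letter] + accidental

print(midi_number("F#4"), midi_number("Gb4"))  # → 66 66
```

Recovering the correct spelling from a MIDI file would require extra context (key, voice leading), which is why formats like MusicXML are needed for reprintability.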
If the music scores are recognized with the goal of human readability (referred to as reprintability), the structured encoding has to be recovered, which includes precise information on the layout and engraving. Suitable formats to store this information include MEI and MusicXML.
Apart from those two applications, it might also be interesting to just extract metadata from the image or enable searching. In contrast to the first two applications, a lower level of comprehension of the music score might be sufficient to perform these tasks.
General framework (2001)
In 2001, David Bainbridge and Tim Bell published their work on the challenges of OMR, where they reviewed previous research and extracted a general framework for OMR.[10] Their framework has been used by many systems developed after 2001. The framework has four distinct stages with a heavy emphasis on the visual detection of objects. They noticed that the reconstruction of the musical semantics was often omitted from published articles because the operations used were specific to the output format.
Refined framework (2012)
In 2012, Ana Rebelo et al. surveyed techniques for optical music recognition.[14] They categorized the published research and refined the OMR pipeline into the four stages: Preprocessing, Music symbols recognition, Musical notation reconstruction and Final representation construction. This framework became the de facto standard for OMR and is still being used today (although sometimes with slightly different terminology). For each block, they give an overview of techniques that are used to tackle that problem. This publication is the most cited paper on OMR research as of 2019.
Deep learning (since 2016)
With the advent of deep learning, many computer vision problems have shifted from imperative programming with hand-crafted heuristics and feature engineering towards machine learning. In optical music recognition, the staff processing stage,[15][16] the music object detection stage,[17][18][19][20] and the music notation reconstruction stage[21] have all seen successful deep-learning approaches.
Even completely new approaches have been proposed, including solving OMR in an end-to-end fashion with sequence-to-sequence models that take an image of music scores and directly produce the recognized music in a simplified format.[22][23][24][25]
Notable scientific projects
[edit]Staff removal challenge
For systems that were developed before 2016, staff detection and removal posed a significant obstacle. A scientific competition was organized to improve the state of the art and advance the field.[26] Due to excellent results and modern techniques that made the staff removal stage obsolete, this competition was discontinued.
However, the freely available CVC-MUSCIMA dataset that was developed for this challenge is still highly relevant for OMR research as it contains 1000 high-quality images of handwritten music scores, transcribed by 50 different musicians. It has been further extended into the MUSCIMA++ dataset, which contains detailed annotations for 140 out of 1000 pages.
SIMSSA
The Single Interface for Music Score Searching and Analysis project (SIMSSA)[27] is probably the largest project that attempts to teach computers to recognize musical scores and make them accessible. Several sub-projects have already been successfully completed, including the Liber Usualis[28] and Cantus Ultimus.[29]
TROMPA
Towards Richer Online Music Public-domain Archives (TROMPA) is an international research project, sponsored by the European Union, that investigates how to make public-domain digital music resources more accessible.[30]
References
- ^ Pacha, Alexander (2019). Self-Learning Optical Music Recognition (PhD). TU Wien, Austria. doi:10.13140/RG.2.2.18467.40484.
- ^ a b Calvo-Zaragoza, Jorge; Hajič, Jan jr.; Pacha, Alexander (2020). "Understanding Optical Music Recognition". ACM Computing Surveys. 53 (4): 1–35. arXiv:1908.03608. doi:10.1145/3397499. S2CID 199543265.
- ^ Optical Music Recognition for Dummies - Part 2 - Introduction and History. youtube.com. YouTube. October 3, 2018. Archived from the original on December 21, 2021. Retrieved June 24, 2021.
- ^ Pruslin, Dennis Howard (1966). Automatic Recognition of Sheet Music (PhD). Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
- ^ Prerau, David S. (1971). Computer pattern recognition of printed music. Fall Joint Computer Conference. pp. 153–162.
- ^ "WABOT – WAseda roBOT". waseda.ac.jp. Waseda University Humanoid. Retrieved July 14, 2019.
- ^ "Wabot 2". IEEE. IEEE. Retrieved July 14, 2019.
- ^ Laplante, Audrey; Fujinaga, Ichiro (2016). Digitizing Musical Scores: Challenges and Opportunities for Libraries. 3rd International Workshop on Digital Libraries for Musicology. pp. 45–48.
- ^ Hajič, Jan jr.; Kolárová, Marta; Pacha, Alexander; Calvo-Zaragoza, Jorge (2018). How Current Optical Music Recognition Systems Are Becoming Useful for Digital Libraries. 5th International Conference on Digital Libraries for Musicology. Paris, France. pp. 57–61.
- ^ a b c Bainbridge, David; Bell, Tim (2001). "The challenge of optical music recognition". Computers and the Humanities. 35 (2): 95–121. doi:10.1023/A:1002485918032. S2CID 18602074. Retrieved February 23, 2017.
- ^ a b Byrd, Donald; Simonsen, Jakob Grue (2015). "Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images". Journal of New Music Research. 44 (3): 169–195. doi:10.1080/09298215.2015.1045424.
- ^ Byrd, Donald (November 2017). "Gallery of Interesting Music Notation". indiana.edu. Retrieved July 14, 2019.
- ^ Byrd, Donald (October 2018). "Extremes of Conventional Music Notation". indiana.edu. Retrieved July 14, 2019.
- ^ Rebelo, Ana; Fujinaga, Ichiro; Paszkiewicz, Filipe; Marcal, Andre R.S.; Guedes, Carlos; Cardoso, Jamie dos Santos (2012). "Optical music recognition: state-of-the-art and open issues" (PDF). International Journal of Multimedia Information Retrieval. 1 (3): 173–190. doi:10.1007/s13735-012-0004-6. S2CID 12964479. Retrieved July 15, 2019.
- ^ Gallego, Antonio-Javier; Calvo-Zaragoza, Jorge (2017). "Staff-line removal with selectional auto-encoders". Expert Systems with Applications. 89: 138–148. doi:10.1016/j.eswa.2017.07.002. hdl:10045/68971.
- ^ Castellanos, Fancisco J.; Calvo-Zaragoza, Jorge; Vigliensoni, Gabriel; Fujinaga, Ichiro (2018). Document Analysis of Music Score Images with Selectional Auto-Encoders (PDF). 19th International Society for Music Information Retrieval Conference. Paris, France. pp. 256–263. Retrieved July 15, 2019.
- ^ Tuggener, Lukas; Elezi, Ismail; Schmidhuber, Jürgen; Stadelmann, Thilo (2018). Deep Watershed Detector for Music Object Recognition (PDF). 19th International Society for Music Information Retrieval Conference. Paris, France. pp. 271–278. Retrieved July 15, 2019.
- ^ Hajič, Jan jr.; Dorfer, Matthias; Widmer, Gerhard; Pecina, Pavel (2018). Towards Full-Pipeline Handwritten OMR with Musical Symbol Detection by U-Nets (PDF). 19th International Society for Music Information Retrieval Conference. Paris, France. pp. 225–232. Retrieved July 15, 2019.
- ^ Pacha, Alexander; Hajič, Jan jr.; Calvo-Zaragoza, Jorge (2018). "A Baseline for General Music Object Detection with Deep Learning". Applied Sciences. 8 (9): 1488–1508. doi:10.3390/app8091488. hdl:20.500.12708/20052.
- ^ Pacha, Alexander; Choi, Kwon-Young; Coüasnon, Bertrand; Ricquebourg, Yann; Zanibbi, Richard; Eidenberger, Horst (2018). Handwritten Music Object Detection: Open Issues and Baseline Results (PDF). 13th International Workshop on Document Analysis Systems. pp. 163–168. doi:10.1109/DAS.2018.51. Retrieved September 2, 2019.
- ^ Pacha, Alexander; Calvo-Zaragoza, Jorge; Hajič, Jan jr. (2019). Learning Notation Graph Construction for Full-Pipeline Optical Music Recognition (PDF). 20th International Society for Music Information Retrieval Conference. Retrieved July 2, 2023.
- ^ van der Wel, Eelco; Ullrich, Karen (2017). Optical Music Recognition with Convolutional Sequence-to-Sequence Models (PDF). 18th International Society for Music Information Retrieval Conference. Suzhou, China.
- ^ Calvo-Zaragoza, Jorge; Rizo, David (2018). "End-to-End Neural Optical Music Recognition of Monophonic Scores". Applied Sciences. 8 (4): 606. doi:10.3390/app8040606. hdl:10251/143793.
- ^ Baró, Arnau; Riba, Pau; Calvo-Zaragoza, Jorge; Fornés, Alicia (2017). Optical Music Recognition by Recurrent Neural Networks. 14th International Conference on Document Analysis and Recognition. pp. 25–26. doi:10.1109/ICDAR.2017.260.
- ^ Baró, Arnau; Riba, Pau; Calvo-Zaragoza, Jorge; Fornés, Alicia (2019). "From Optical Music Recognition to Handwritten Music Recognition: A baseline". Pattern Recognition Letters. 123: 1–8. Bibcode:2019PaReL.123....1B. doi:10.1016/j.patrec.2019.02.029. hdl:10045/89708. S2CID 127170982.
- ^ Fornés, Alicia; Dutta, Anjan; Gordo, Albert; Lladós, Josep (2013). "The 2012 Music Scores Competitions: Staff Removal and Writer Identification". In Young-Bin Kwon; Jean-Marc Ogier (eds.). Graphics Recognition. New Trends and Challenges. Lecture Notes in Computer Science. Vol. 7423. Springer. pp. 173–186. doi:10.1007/978-3-642-36824-0_17. ISBN 978-3-642-36823-3.
- ^ "Single Interface for Music Score Searching and Analysis project". simssa.ca. McGill University. Retrieved July 14, 2019.
- ^ "Search the Liber Usualis". liber.simssa.ca. McGill University. Retrieved July 14, 2019.
- ^ "Cantus Ultimus". cantus.simssa.ca. McGill University. Retrieved July 14, 2019.
- ^ "Towards Richer Online Music Public-domain Archives". trompamusic.eu. Retrieved July 14, 2019.
- ^ Pacha, Alexander. "Optical Music Recognition Datasets". github.io. Retrieved July 14, 2019.
- ^ Fornés, Alicia; Dutta, Anjan; Gordo, Albert; Lladós, Josep (2012). "CVC-MUSCIMA: A Ground-truth of Handwritten Music Score Images for Writer Identification and Staff Removal". International Journal on Document Analysis and Recognition. 15 (3): 243–251. doi:10.1007/s10032-011-0168-2. S2CID 17946417.
- ^ Hajič, Jan jr.; Pecina, Pavel (2017). The MUSCIMA++ Dataset for Handwritten Optical Music Recognition. 14th International Conference on Document Analysis and Recognition. Kyoto, Japan. pp. 39–46. doi:10.1109/ICDAR.2017.16.
- ^ Tuggener, Lukas; Elezi, Ismail; Schmidhuber, Jürgen; Pelillo, Marcello; Stadelmann, Thilo (2018). DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects. 24th International Conference on Pattern Recognition. Beijing, China. doi:10.21256/zhaw-4255.
- ^ Calvo-Zaragoza, Jorge; Rizo, David (2018). Camera-PrIMuS: Neural End-to-End Optical Music Recognition on Realistic Monophonic Scores (PDF). 19th International Society for Music Information Retrieval Conference. Paris, France. pp. 248–255. Retrieved July 15, 2019.
- ^ Calvo-Zaragoza, Jorge; Oncina, Jose (2014). Recognition of Pen-Based Music Notation: The HOMUS Dataset. 22nd International Conference on Pattern Recognition. pp. 3038–3043. doi:10.1109/ICPR.2014.524.
- ^ Parada-Cabaleiro, Emilia; Batliner, Anton; Baird, Alice; Schuller, Björn (2017). The SEILS Dataset: Symbolically Encoded Scores in Modern-Early Notation for Computational Musicology (PDF). 18th International Society for Music Information Retrieval Conference. Suzhou, China. pp. 575–581. Retrieved August 12, 2020.
- ^ Pacha, Alexander; Eidenberger, Horst (2017). Towards a Universal Music Symbol Classifier. 14th International Conference on Document Analysis and Recognition. Kyoto, Japan. pp. 35–36. doi:10.1109/ICDAR.2017.265.
- ^ a b "Maestria". newzik.com. Newzik. Retrieved June 24, 2021.
- ^ Apprendre le solfège à des algorithmes avec Marie Chupeau, chercheuse en intelligence artificielle [Teaching music theory to algorithms with Marie Chupeau, artificial intelligence researcher]. youtube.com (in French). YouTube. June 21, 2021. Archived from the original on December 21, 2021. Retrieved June 24, 2021.
- ^ "Aruspix". aruspix.net. Retrieved July 15, 2019.
- ^ "Audiveris". github.com. Retrieved July 15, 2019.
- ^ "David Bainbridge (Home Page)". waikato.ac.nz. Retrieved July 15, 2019.
- ^ "Gamera Addon: MusicStaves Toolkit". hsnr.de. Retrieved July 15, 2019.
- ^ Coüasnon, Bertrand (2001). DMOS: a generic document recognition method, application to an automatic generator of musical scores, mathematical formulae and table structures recognition systems. Sixth International Conference on Document Analysis and Recognition. pp. 215–220. doi:10.1109/ICDAR.2001.953786.
- ^ "oemer: End-to-end Optical Music Recognition (OMR) system". github.com. Retrieved September 21, 2023.
- ^ "OpenOMR". sourceforge.net. April 10, 2013. Retrieved January 26, 2017.
- ^ "Rodan". github.com. Retrieved July 15, 2019.
- ^ "Eighth rest or smudge". capella-software.com. capella-software AG. Retrieved July 15, 2019.
- ^ a b "PhotoScore & NotateMe Ultimate". neuratron.com. Neuratron. Retrieved June 24, 2021.
- ^ "capella-scan". capella-software.com. capella-software AG. Retrieved June 24, 2021.
- ^ a b "FORTE 12 Premium Edition". fortenotation.com. Forte Notation. Retrieved June 24, 2021.
- ^ "MIDI-Connections SCAN 2.1". midi-connections.com. Composing & Arranging Systems. Retrieved June 24, 2021.
- ^ "Nightingale". ngale.com. Adept Music Notation Solutions. January 11, 2008. Retrieved March 30, 2021.
- ^ "OMeR". myriad-online.com. Myriad SARL. Retrieved October 6, 2013.
- ^ "PDFtoMusic Pro". myriad-online.com. Myriad SARL. Retrieved November 13, 2015.
- ^ "ScorScan information". npcimaging.com. NPC Imaging. Retrieved October 6, 2013.
- ^ "SmartScore". musitek.com. Musitek. Retrieved June 24, 2021.
- ^ "ScanScore". scan-score.com. SCANSCORE. Retrieved November 24, 2019.
- ^ "Soundslice sheet music scanner". soundslice.com. Soundslice. Retrieved December 17, 2022.
- ^ "Soundslice PDF and photo scanning (beta)". soundslice.com. Soundslice. Retrieved December 17, 2022.
- ^ Rothman, Philip (May 26, 2021). "Newzik introduces interactive LiveScores with Maestria, AI-based optical music recognition". scoringnotes.com. Scoring Note. Retrieved June 24, 2021.
- ^ "iSeeNotes". iseenotes.com. Geer Up AB. Retrieved June 24, 2021.
- ^ "NotateMe". neuratron.com. Neuratron. Retrieved June 24, 2021.
- ^ "Notation Scanner". apps.apple.com. Apple Inc. March 23, 2020. Retrieved June 24, 2021.
- ^ "PlayScore 2". playscore.co. PlayScore. Retrieved June 24, 2021.
- ^ "SmartScore NoteReader". play.google.com. Retrieved June 24, 2021.
- ^ "Newzik app". newzik.com. Newzik. Retrieved June 24, 2021.
External links
- Recording of the ISMIR 2018 tutorial "Optical Music Recognition for Dummies"
- Optical Music Recognition (OMR): Programs and scientific papers
- OMR (Optical Music Recognition) Systems: Comprehensive table of OMR (Last updated: 30 January 2007).
Media related to Optical music recognition at Wikimedia Commons