eScholarship
Open Access Publications from the University of California

Department of Linguistics

Proposals from the Script Encoding Initiative, UC Berkeley

The “Proposals from the Script Encoding Initiative” series contains documents that propose scripts and characters for inclusion in the international standard Unicode. All documents have received technical review by the Unicode Technical Committee, and were funded (wholly or in part) by the Script Encoding Initiative in the Department of Linguistics at UC Berkeley or by its NEH-sponsored counterpart, the Universal Scripts Project.

Most of the scripts and characters in these documents have been published in the Unicode Standard. Occasionally the names, glyphs, or code points have been changed from the proposal document when published in the Unicode Standard. Comments or corrections to the contents of the proposals should be sent to the proposal author, who can be contacted through Deborah Anderson in the Dept. of Linguistics.

Consideration of the encoding of Garay with updated user feedback (revised)

(2022)

This is a proposal to add the Garay script to the international character encoding standard Unicode. The script was created by Assane Faye in Senegal and published in 1961. It is used to write the Wolof language (ISO 639-3: wol). This proposal builds on an earlier proposal by Michael Everson, which includes additional examples (see: https://unicode.org/L2/L2016/16069-n4709-garay-revision.pdf).

Financial support was provided by the National Endowment for the Humanities (PR-253360-17 and PR-268710-20) for the Universal Scripts Project, part of the Script Encoding Initiative at UC Berkeley. The Unicode Consortium hosts the document registry that contains this proposal.

The script was approved by the Unicode Technical Committee in April 2022 and will appear in a future version of the Unicode Standard.

Proposal to encode the Sunuwar script in Unicode

(2021)

This is a proposal to add the Sunuwar script to the international character encoding standard Unicode. The script is used to write the Sunuwar (or Kõinch) language (ISO 639-3: suz) of Nepal and Sikkim, India. It was developed by Krishna Bahadur Jentich in the 1940s.

Financial support was provided by the National Endowment for the Humanities (PR-268710-20) for the Universal Scripts Project, part of the Script Encoding Initiative at UC Berkeley. Additional support came from the Translation Commons project. The Unicode Consortium hosts the document registry that contains this proposal.

The script was approved by the Unicode Technical Committee in January 2022 and will appear in a future version of the Unicode Standard.

Note: There is a known error in the code chart: 11BD2 and 11BDC are swapped in the chart on page 14.

Proposal to add the Tangsa Script in the SMP

(2021)

This is a proposal to add the Tangsa script to the international character encoding standard Unicode. The Tangsa script (ISO 15924: Tnsa) is used to write the Tangsa languages (ISO 639-3: nst), which are spoken in Arunachal Pradesh, India, and in the Sagaing Region of Myanmar. The script was created in 1990 by Mr. Lakhum Mossang.

The characters were published in Unicode Standard version 14.0 in September 2021.

The proposal was written by Stephen Morey, based on an earlier version by Anshuman Pandey. Deborah Anderson assisted on the Unicode side; she received financial support from the National Endowment for the Humanities for the Universal Scripts Project (PR-253360-17), part of the Script Encoding Initiative at UC Berkeley.

Proposal for encoding the Todhri script in the SMP of the UCS

(2020)

This is a proposal to add the Todhri script to the international character encoding standard Unicode. Todhri was used in Elbasan, a region in central Albania, to write the Albanian language in the 18th and 19th centuries, and sporadically into the 20th century. The script is said to have been created by Dhaskal Todhri, whose full name was Theodor Haxhifilipi.

Financial support was provided by the National Endowment for the Humanities (PR-253360-17 and PR-268710-20) for the Universal Scripts Project, part of the Script Encoding Initiative at UC Berkeley. The Unicode Consortium hosts the document registry that contains this proposal.

This proposal was approved by the Unicode Technical Committee in April 2022 for a future version of the Unicode Standard. 

One change has been made to the proposal, involving the decomposition of two characters, but the basic character set remains the same.

Final proposal to encode Old Uyghur

(2020)

This is a proposal to encode the Old Uyghur script in the international standard Unicode. The script was published in the Unicode Standard version 14.0 in September 2021. The Old Uyghur script (ISO 15924: Ougr) flourished from the 8th to the 17th century CE. Though originally used to write medieval Turkic languages, it was later used to write languages such as Chinese, Mongolian, Tibetan, and Arabic.

Financial support was provided by the National Endowment for the Humanities for the Universal Scripts Project (PR-253360-17), part of the Script Encoding Initiative at UC Berkeley, and the Unicode Adopt-a-Character program. The Unicode Consortium hosts the document registry that contains this proposal. 

Proposal for encoding the Vithkuqi script in the SMP of the UCS

(2020)

This is a proposal to add the Vithkuqi script (sometimes referred to as Veqilharxhi, Büthakukye, or Beitha Kukju) to the international character encoding standard Unicode. The script was devised between 1824 and 1845 but did not take hold in the latter part of the 19th century. In the 21st century, however, there have been efforts to revive the script for artistic and cultural purposes. The Vithkuqi script (ISO 15924: Vith) was used to write the Albanian language (ISO 639-3: sqi).

The script was published in Unicode Standard version 14.0 in September 2021. Note: A number of the recently invented characters in this proposal have not been approved yet. 

Financial support was provided by the National Endowment for the Humanities (PR-253360-17) for the Universal Scripts Project, part of the Script Encoding Initiative at UC Berkeley. The Unicode Consortium hosts the document registry that contains this proposal.


Medieval Latin Character Recommendations

(2020)

This document provides recommendations on which Unicode characters to use for various letters, symbols, and numbers in Medieval Latin. It also lists characters that require more information before final recommendations can be made, with directions on how experts can submit feedback.

Final proposal to encode the Cypro-Minoan script in the SMP (WG2 N5135)

(2020)

This is a proposal to encode the Cypro-Minoan script in the international character encoding standard Unicode. The Cypro-Minoan script is an undeciphered syllabary used on Cyprus and in surrounding areas during the Late Bronze Age (ca. 1550-1050 BCE).

The script was published in Unicode Standard version 14.0 in September 2021. Note: An updated chart (31 Dec. 2020) with one additional character and the accepted code points is located at: https://www.unicode.org/L2/L2020/20156r-n5137r-cyprominoan-font.pdf. This latter document should be used as background to the final repertoire, which is now shown at https://www.unicode.org/charts/PDF/U12F90.pdf.

Financial support was provided by the National Endowment for the Humanities (PR-253360-17) for the Universal Scripts Project, part of the Script Encoding Initiative at UC Berkeley. The Unicode Consortium hosts the document registry that contains this proposal.


Arabic additions for Quranic orthographies  

(2019)

This is a proposal to add 39 Arabic Quranic characters to the international character encoding standard Unicode. The characters are used to represent text in minority orthographies, and many were contained in earlier documents submitted by others (cited on page 1 of this proposal). The characters are scheduled to be published in Unicode Standard version 14.0 in September 2021. A few modifications have been made to the names or locations of characters in Unicode, so users should check the code charts when the Unicode Standard is published. The charts will be accessible at: https://www.unicode.org/charts/.

Proposal for encoding the Toto script in the SMP of the UCS

(2019)

This is a proposal to encode the Toto script in the international character encoding standard Unicode. The script was published in Unicode Standard version 14.0 in September 2021. It is used to write the Toto language, spoken in a village in India near Bhutan. The ISO 15924 code is Toto.