RESOURCES FOR TEXT ANALYSIS

WEB-BASED AND DOWNLOADABLE CONCORDANCERS

  • AntConc

A multi-purpose corpus analysis toolkit designed for conducting corpus linguistics research and data-driven learning.

https://www.laurenceanthony.net/software/antconc/

  • Key-BNC: School of Liberal Arts, King Mongkut’s University of Technology Thonburi

A program for quickly performing basic keyword analyses of a target corpus against the British National Corpus (BNC).

Online version: https://key-bnc.tfiaa.com/

Offline version: http://crs2.kmutt.ac.th/Key-BNC/

  • Corpus-Based Engineering English Materials: School of Liberal Arts, King Mongkut’s University of Technology Thonburi

An interactive website offering fun and challenging activities for students learning English for engineering.

http://crs2.kmutt.ac.th/ceem

  • Sketch Engine

A corpus management and text analysis tool that can also be used for text mining applications.

http://www.sketchengine.eu/tools-for-text-analysis/

  • Wmatrix

A corpus analysis tool providing a web interface to the English USAS and CLAWS corpus annotation tools, as well as standard corpus linguistics methods such as frequency lists and concordances.

http://ucrel.lancs.ac.uk/wmatrix/

  • English-Corpora.org

A web portal giving access to a variety of widely used corpora.

https://www.english-corpora.org/corpora.asp

  • #LancsBox: Lancaster University corpus toolbox

#LancsBox is a software package, developed at Lancaster University, for the analysis of language data and corpora.

http://corpora.lancs.ac.uk/lancsbox/
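Keyword analysis, as offered by Key-BNC and several of the toolkits above, compares how often each word occurs in a study corpus with how often it occurs in a reference corpus such as the BNC. The statistic each tool uses is not stated here; the sketch below assumes log-likelihood (G2), a common keyness measure, and uses two toy sentences in place of real corpora. The names tokenize, log_likelihood and keywords are illustrative only and are not part of any listed tool.

```python
import math
import re
from collections import Counter

# Assumption: keyness is scored with log-likelihood (G2); the tools listed
# above may use other statistics or settings.


def tokenize(text: str) -> list[str]:
    """Naive English tokenizer: lower-cased runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word occurring a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study: list[str], reference: list[str], top: int = 10):
    """Return (word, study_freq, ref_freq, G2) rows for positive keywords,
    i.e. words relatively more frequent in the study corpus, highest G2 first."""
    f_study, f_ref = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    rows = []
    for word, a in f_study.items():
        b = f_ref[word]  # Counter returns 0 for unseen words
        if a / c > b / d:
            rows.append((word, a, b, log_likelihood(a, b, c, d)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]


# Toy data standing in for a study corpus and a reference corpus such as the BNC.
study = tokenize("The beam load was measured. The load on the beam exceeded the limit.")
reference = tokenize("The cat sat on the mat. The dog sat on the rug and the cat slept.")
for word, a, b, g2 in keywords(study, reference, top=3):
    print(f"{word:10} {a:3} {b:3} {g2:7.2f}")
```

With these toy inputs, "beam" and "load" score highest because each occurs twice in the study text and never in the reference text, while "the" is excluded because it is relatively slightly less frequent in the study text.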

ANNOTATION TOOLS

  • CLAWS part-of-speech tagger for English

http://ucrel.lancs.ac.uk/claws/

  • UCREL Semantic Analysis System (USAS)

http://ucrel.lancs.ac.uk/usas/

OTHER RESOURCES

The following websites offer comprehensive lists of useful tools for text linguistics and corpus linguistics.

  • All About Corpora

https://allaboutcorpora.com/corpus-software-2

  • Bodleian Libraries, University of Oxford

http://ox.libguides.com/c.php?g=422982&p=2888571

  • Corpus-analysis.com

https://corpus-analysis.com/

  • Illinois Library, University of Illinois

https://guides.library.illinois.edu/c.php?g=405110&p=5804542

  • Laurence Anthony’s Website

http://www.laurenceanthony.net/software.html

  • School of Humanities and Sciences, Stanford University

https://linguistics.stanford.edu/resourcescorpora/corpus-tools

  • W3-Corpora Project, University of Essex

https://www1.essex.ac.uk/linguistics/external/clmt/w3c/corpus_ling/content/software.html

THAI LANGUAGE ANALYSIS TOOLS

THAI WORD SEGMENTATION PROGRAMS

  • LexTo

http://www.sansarn.com/lexto/

  • TLex

http://www.sansarn.com/tlex/

  • LexTo+ (Dictionary-based, Longest matching)

https://aiforthai.in.th/service_bn.php

  • TLex+ (Machine learning, Conditional Random Fields)

https://aiforthai.in.th/service_bn.php

  • Thai word segmentation

http://161.200.50.2/wordsegment

  • Thai syllable segmentation

http://161.200.50.2/sylsegment

  • Thai chunks

http://161.200.50.2/chunk
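LexTo+ is described as dictionary-based with longest matching. The general greedy technique can be sketched as follows: scan the text from left to right and, at each position, take the longest dictionary word that starts there. This is a minimal illustration, not LexTo's implementation; the function name longest_matching and the four-word dictionary are hypothetical, and the single-character fallback for unknown text is a simplifying assumption.

```python
def longest_matching(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right longest-matching segmentation against a word list."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character as an unknown token.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary; real segmenters use far larger lexicons.
toy_dictionary = {"ตา", "ตาก", "กลม", "ลม"}
print(longest_matching("ตากลม", toy_dictionary))
```

Greedy matching picks ตาก at the first position and then ลม, giving ['ตาก', 'ลม'], even though ตา|กลม ("round eyes") is also a valid reading of ตากลม. Longest matching alone cannot resolve ambiguities of this kind.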

CORPORA

  • LST20 Corpus

A corpus of 3,164,002 words for Thai language processing, developed by the National Electronics and Computer Technology Center (NECTEC), Thailand. It offers five layers of linguistic annotation: word boundaries, part-of-speech tags, named entities, clause boundaries, and sentence boundaries.

https://aiat.or.th/lst20-corpus/

  • Thai National Corpus

A general corpus of 14 million words, designed to be comparable to the British National Corpus. It was created by the Department of Linguistics, Faculty of Arts, Chulalongkorn University.

http://www.arts.chula.ac.th/~ling/tnc3/

CONCORDANCE

http://www.arts.chula.ac.th/~ling/ThaiConc/

by Wirote Aroonmanakun, Chulalongkorn University
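A concordancer typically lists every occurrence of a search term together with its surrounding context (key word in context, KWIC). ThaiConc's own features are not described here; the sketch below shows only the basic KWIC idea over a list of tokens, assuming the Thai text has already been word-segmented (for example with one of the segmenters listed earlier). The function name kwic and the sample tokens are hypothetical.

```python
def kwic(tokens: list[str], keyword: str, width: int = 4) -> list[tuple[str, str, str]]:
    """Key word in context: for each hit, return (left context, keyword, right context)."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Hypothetical pre-segmented Thai tokens; Thai text has no spaces between words,
# so it must be segmented before a word-level concordance can be built.
tokens = ["ฉัน", "ไป", "ตลาด", "แล้ว", "เขา", "ก็", "ไป", "ด้วย"]
for left, key, right in kwic(tokens, "ไป", width=2):
    print(f"{left} | {key} | {right}")
```

The lines are separated with " | " rather than padded into fixed-width columns, because Python's string padding counts code points and Thai vowel and tone marks are combining characters, which would misalign the columns.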

SENTIMENT ANALYSIS TOOLS

  • Sentiment Analysis

https://aiforthai.in.th/service_sa.php

by NECTEC

  • S-Sense: Social sensing

http://pop.ssense.in.th/

WORDLIST

http://www.arts.chula.ac.th/ling/tnc/searchtnc/

POS TAGGERS

  • TLex++

TLex++ segments Thai text and tags each word with its part of speech, using machine learning with the Conditional Random Fields (CRF) algorithm.

https://aiforthai.in.th/service_bn.php

  • Thai POS Tagging

http://161.200.50.2/postag

SPEECH TO TEXT

  • Partii

A service that converts speech into text.

https://aiforthai.in.th/service_st.php

by NECTEC

OTHER RESOURCES FOR THAI LANGUAGE ANALYSIS

http://thainlp.wannaphong.com/p/corpus.html

https://thailang.nectec.or.th/archive/indexdca0.html?q=node/21

https://aiforthai.in.th/index.php

https://saki.siit.tu.ac.th/thainlp/

https://www.hawaii.edu/thai/tech.htm

https://saki.siit.tu.ac.th/kindml/thainest/index.php