Data Analytics

Text Analytics Minor

The Text Analytics Minor is intended for matriculated undergraduate students who want to strengthen their resumes for the job market after graduation.

We offer the Text Analytics Minor as an assessment tool for prospective employers. Students who complete the minor can apply their skills in a variety of areas:

  • in Speech Recognition and Machine Translation
  • in various applications involving text mining, text classification, or text filtering, such as sentiment analysis of product reviews or microblogs, customer service support analysis, spam detection, and litigation support
  • in Information Retrieval and Information Extraction

The minor in text analytics consists of a minimum of 15 units, to include:

  • Linguistics 571 (3 units) or Linguistics 572 (3 units)
  • Statistics 550 (3 units) or Statistics 551A (3 units) 
  • Linguistics 581 (3 units) or Computer Science 581 (3 units)
  • Linguistics 583 (3 units)
  • One of the following:
    • Linguistics 551 (3 units)
    • Biology 568 (3 units)
    • Computer Science 550 (3 units)
    • Statistics 520 (3 units)

For Statistics 550 and 551A, students must satisfy lower division calculus and linear algebra prerequisites (Mathematics 151, 252, and 254).

Courses in the minor may not be counted toward the major. A minimum of six upper division units must be completed in residence at San Diego State University.

Linguistics 551: Sociolinguistics (3 units)
Prerequisite: A course in introductory linguistics.
Investigation of the correlation of social structure and linguistic behavior.

Linguistics 571: Computational Corpus Linguistics (3 units)
Prerequisite: Upper division standing.
Practical introduction to computation with text corpora and introduction to Python. Tokenizing, part-of-speech tagging, and lemmatizing (stemming) large corpora. Writing of Python programs required.

Linguistics 572: Python Scripting for Social Science (3 units) (Same course as Big Data Analytics 572)
Prerequisite: Upper division or graduate standing.
Python scripting for social science data. Statements and expressions. Strings, lists, dictionaries, files. Python with unformatted data (regular expressions). Graphs and social networks. Spatial data and simple GIS scripts. 

Linguistics 581: Computational Linguistics (3 units) (Same course as Computer Science 581)
Prerequisite: Linguistics 571 or Linguistics 572 or Big Data Analytics 572 or Computer Science 320.
Basic concepts in computational linguistics including regular expressions, finite-state automata, finite-state transducers, weighted finite-state automata, and n-gram language models. Applications to phonology, orthography, morphology, syntax. Probabilistic models. Statistical techniques for speech recognition.

Linguistics 583: Statistical Methods in Text Analysis (3 units)
Prerequisites: Linguistics 571 or Linguistics 572 or Big Data Analytics 572; and Statistics 550 or Statistics 551A.
Statistical methods for analysis of large texts to include Bayesian classifiers, Markov models, maximum entropy models, neural nets, and support vector machines. Data collection and annotation. Applications to annotation, relation detection, sentiment analysis, and topic modeling.

Biology 568: Bioinformatics (3 units) (Same course as Bioinformatics and Medical Informatics 568)
Two lectures and three hours of laboratory.
Prerequisite: Biology 366.
Bioinformatics analysis methods and programming skills. Practical bioinformatic software for sequence analysis, bioinformatic algorithms and programming fundamentals.
Note: Writing Requirement for Undergraduates: Completion of the Graduation Writing Assessment Requirement or the eligibility to enroll in an upper division writing course is a prerequisite for all upper division biology courses numbered 450 and above.

Computer Science 550: Artificial Intelligence (3 units)
Prerequisites: Computer Science 210 and either Mathematics 245 or Mathematics 523.
Heuristic approaches to problem solving. Systematic methods of search of the problem state space. Theorem proving by machine. Resolution principle and its applications.

Statistics 520: Applied Multivariate Analysis (3 units)
Prerequisite: Statistics 350B or comparable course in statistics.
Multivariate normal distribution, multivariate analysis of variance, principal components, factor analysis, discriminant function analysis, classification, and clustering. Statistical software packages will be used for data analysis.

Statistics 550: Applied Probability (3 units)
Prerequisites: Mathematics 151 and Mathematics 254.
Computation of probabilities via enumeration and simulation, discrete and continuous distributions, moments of random variables. Markov chains, counting and queuing processes, and selected topics.

Statistics 551A: Probability and Mathematical Statistics (3 units)
Prerequisite: Mathematics 252.
Discrete and continuous random variables, probability mass functions and density functions, conditional probability and Bayes’ theorem, moments, properties of expectation and variance, joint and marginal distributions, functions of random variables, moment generating functions. Special distributions and sampling distributions.

Contact Us

Mark Gawron, Program Advisor
Email: [email protected] | Office: SHW 238

