I’ve been at my internship for almost two weeks now, but before I talk about what I’ve learned there, I’ll mention a few things I found and did beforehand.
First of all, I wanted to start learning C++ because it might come in handy, so I asked around for suggestions on where to do so, which led me to this awesome site called SoloLearn. They have several different programming language courses that are really easy to follow. I have since completed their Python course and paused on learning C++ in favor of other things. They also have a bunch of phone apps so you can learn on the go! (At least they do for Android; not sure about iPhones.)
Another thing I found is Big Data University from IBM. They have a bunch of courses and course sequences related to Data Science, Big Data, R, Hadoop, Spark, and lots of other relevant things. I finished a few intro Data Science classes before work started, and hopefully I’ll have time to go through a few more before the summer is over. The only gripe I have about these classes is that the ones I have joined don’t let you speed up the videos, and the instructors speak really slowly, but otherwise they’re great!
Also, if anyone is interested, I have been studying for the GRE (test to get into graduate schools) on an app called Prep4GRE and it’s pretty good. I like that whenever I study the flashcards or go through problems, it apparently doesn’t need an internet connection because I guess it loaded things ahead of time? That means I can study on my commute through tunnels and subways and stuff! Efficiency!
Now, about my job. It’s been pretty interesting; I’ll be working partly on some NLP stuff and partly on some machine learning stuff, which is great. I have already learned several things:
TutorialsPoint is absolutely amazing. I have learned about PL/SQL, Spring, and RESTful Web Services from there already, and I will definitely continue going to them in the future! The tutorials are very easy to follow and provide a lot of examples.
I also finished about a quarter of this awesome tutorial on Flask for Python, and it’s been very interesting. I have a tiny bit of prior experience with Django, so it was interesting to see the differences. I think Flask may come in handy next semester.
In brushing up on some of my machine learning knowledge, I went through this explanation of kd-Trees and priority queues for approximate K-Nearest Neighbors classification. I also learned a bit about Random Forests and Gradient Boosted Regression Trees, but I’m not very well versed on them yet.
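To help the idea stick, here’s a toy sketch of how I understand the kd-tree-plus-priority-queue approach: a bounded max-heap keeps the k best candidates while the tree search prunes branches that can’t contain anything closer. All the points and helper names here are made up for illustration, not from any real library.

```python
import heapq

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree from 2-D points (toy example)."""
    if not points:
        return None
    axis = depth % 2  # alternate the splitting axis at each level
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def knn_search(node, query, k, depth=0, heap=None):
    """Collect the k nearest neighbors using a bounded max-heap.

    heapq is a min-heap, so we store negative squared distances: the
    worst (farthest) of the current k candidates sits at heap[0]."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    point = node["point"]
    dist = sum((a - b) ** 2 for a, b in zip(point, query))
    if len(heap) < k:
        heapq.heappush(heap, (-dist, point))
    elif dist < -heap[0][0]:
        heapq.heapreplace(heap, (-dist, point))
    axis = depth % 2
    diff = query[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    knn_search(near, query, k, depth + 1, heap)
    # Only descend the far side if the splitting plane could still
    # hide a closer point than our current worst candidate.
    if len(heap) < k or diff ** 2 < -heap[0][0]:
        knn_search(far, query, k, depth + 1, heap)
    return heap

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
neighbors = sorted((-d, p) for d, p in knn_search(tree, (9, 2), k=3))
```

As I understand it, dropping (or loosening) that plane-distance check on the far branch is one way to make the search approximate: you mostly descend the nearer side only, trading a little accuracy for speed.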
I am going to try to write blog posts more often, in the hope that it motivates me to do and learn more things outside of work.
The semester is officially over! Time for summer break! Gonna relax for like two weeks, and then I start work and studying for the GRE and continue my Coursera classes and anything else I decide to divide my time among…
So a few days ago, I finally finished and turned in my final project for my NLP class. It was a poetry translator, meaning that if you input a prose sentence, it will output a version of that sentence translated to be more poetic. I used Dependency Parsing and a K-Nearest Neighbors classifier for this. I also used the Stanford Lexicalized Parser to create my training and test data sets. The code for all of this is on GitHub in the Poetify repository, which you can get to with a link on the right sidebar of this site. In the coming days, I’ll update the readme for it, and update this site with all that info too.
Lots of things to do as the semester draws to a close. Up until last Thursday, my groupmates and I were working almost nonstop on our databases project: a rather rudimentary stock-trading website, with the emphasis on the database design and the backend that handled it. We had to learn JSP and JDBC from scratch, since the class didn’t really teach any of that. But we got it done, and I think we got almost a full score, so that’s good. I also took my final exam in that class yesterday, so I’m done with databases for now!
And now I move on to operating systems and NLP. My OS final exam is on Friday, and my NLP final project presentation is next Tuesday. The OS exam will be very difficult, especially since I have figured out that OS is not exactly my jam. And the NLP project won’t be a walk in the park either; I don’t have a partner or group on this one, so I’m left entirely to my own devices.
The NLP project is interesting, though. I went with a project I’m calling “Poetify,” the goal of which is to take an input sentence and move its words and phrases around to make it sound more poetic. I’m using Python 3, some POS tagging, and a dependency parser based on a PCFG. I’m also using the Stanford Dependency Parser to create my training and testing data. I only have a few days left to work on it, so I’d best get cracking.
About a week ago, I finished my third homework assignment for my NLP class, which was to implement an application that does PCFG parsing on sentences. You start with a training file of parsed sentence trees and use it to train your grammar. Then you write an application that takes that grammar and a file of test sentences and parses each one. It looks like my implementation works correctly, though the accuracy is only around 70%.
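The grammar-training step boils down to relative-frequency counting of the productions that appear in the treebank. Here’s a tiny sketch of that idea (the toy treebank and function names are my own for illustration, not the actual assignment code):

```python
from collections import defaultdict

# Toy treebank: each tree is (label, child, child, ...); leaves are words.
treebank = [
    ("S", ("NP", ("DT", "the"), ("NN", "dog")),
          ("VP", ("VB", "barks"))),
    ("S", ("NP", ("DT", "the"), ("NN", "cat")),
          ("VP", ("VB", "sleeps"))),
]

def productions(tree):
    """Yield (lhs, rhs) rules from one tree, including lexical rules."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from productions(c)

def induce_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = defaultdict(int)
    lhs_counts = defaultdict(int)
    for t in trees:
        for lhs, rhs in productions(t):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

grammar = induce_pcfg(treebank)
# e.g. grammar[("DT", ("the",))] == 1.0 and grammar[("NN", ("dog",))] == 0.5
```

The parsing side (CKY with these probabilities) is where the real work of the assignment was; this just shows how the grammar falls out of the training trees.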
This was a pretty interesting assignment, and I will probably be using it for my final project for this class. Speaking of which, my class project will serve as a preliminary venture into poetry generation territory, which is what I will do for my thesis project. For this class, the idea is to build something that will take a sentence as input, and “poetify” it, moving sentence parts around so that it sounds more like poetry.
I just finished all of the lessons for the R Programming class on Coursera. The last two weeks of material covered the various *apply functions, the split function, debugging, simulating data, and the R Profiler.
I found the swirl exercises to be the best way to learn all of these things, and I think that the programming assignments were also really good to make you think about how to go about things. Unfortunately, I didn’t pay for the class, so the fact that I completed it won’t be acknowledged. But I’m all set to take the next class in the sequence as soon as it goes live.
In studying for my exam last week, I realized how thankful I am that the textbook for this class is really awesome. It explains topics very well, and I highly recommend it. It is Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky & James H. Martin. I’m using the second edition.
I also marked some of the topics I studied as interesting, so that I can go back to them later.
I turned in my Part of Speech Tagging homework a couple days ago. I didn’t get to finish it entirely: I spent a lot of my time trying to debug my taggers, so I never implemented any unknown-word heuristics as we were supposed to. It’s a shame, because I found that part of the assignment the most interesting. I was hoping to do something with verb endings (-ed, -ing), popular prefixes and suffixes (de-, inter-, -tion), and possibly something with word roots (such as “dict” in dictionary and contradictory).
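Just to jot the idea down before I forget it, here’s a toy sketch of what such an affix heuristic could look like. The affix-to-tag mapping below is completely made up for illustration, not from the assignment:

```python
# Hypothetical affix-to-tag rules for guessing the POS tag of an
# out-of-vocabulary word (Penn Treebank-style tags).
SUFFIX_TAGS = [
    ("ing", "VBG"),   # running, parsing
    ("ed", "VBD"),    # walked, tagged
    ("tion", "NN"),   # translation, attention
    ("ly", "RB"),     # quickly
]
PREFIX_TAGS = [
    ("de", "VB"),     # decode, debug
    ("inter", "JJ"),  # interstellar
]

def guess_tag(word, default="NN"):
    """Guess a POS tag for an unknown word from its affixes."""
    w = word.lower()
    for suffix, tag in SUFFIX_TAGS:
        if w.endswith(suffix):
            return tag
    for prefix, tag in PREFIX_TAGS:
        if w.startswith(prefix):
            return tag
    return default  # back off to the most common open-class tag
```

In a real HMM tagger you’d presumably fold this into the emission probabilities for unseen words rather than returning a hard tag, but the lookup logic would be the same.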
I eventually fixed one issue I was having: it turns out I was defining my emission matrices wrong. However, I still couldn’t figure out what was wrong with my implementation of the Viterbi algorithm. Hopefully I’ll have time at some point to go back and fix it all. I probably won’t for a little while, as I have a bunch of exams coming up all at once and I have to catch up on my Coursera classes.
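For when I do come back to it, here’s a minimal dictionary-based sketch of the Viterbi recurrence on a toy HMM. The states, probabilities, and observations below are the classic made-up textbook example, not my tagger’s actual data:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for an observation sequence (HMM)."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] * trans_p[r][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev
    # Backtrace from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Classic toy HMM (made-up numbers for illustration)
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
path = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
```

Writing it out like this makes the bookkeeping obvious: the backpointer table is what I suspect I was mangling, since a bad emission matrix silently corrupts every V[t][s] downstream.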