Deciding to break away from the usual, I attended the Computational Neuroscience conference COSYNE 2013 in Salt Lake City, UT. The take-aways that were the most compelling for me:
http://www.github.com/matti-kariluoma/LFManip/tree/master/html
Here is a version of LFManip that doesn’t require an executable, as described in the Intel proposal.
Code: http://github.com/matti-kariluoma-ndsu/wheat-dissem/
Live site: http://www.ag.ndsu.edu/varietyselectiontool/
Contributors.
Sometime during March 2011, I was asked if I could do some work on a web view of NDSU’s varietal trial publications, which are currently sent out yearly to farmers and available as PDFs. I threw together a demo of search results with some jQuery polish using static data, and we were off!
Like all great projects, this one needed data entry. We hired a student to do it (copying rows of numbers out of the PDFs): bullet dodged. Now that we were actually using a database, I tried to rope in our department’s systems administrator, but he wisely resisted. He did highly recommend the Django project (Python), and that’s how our framework was chosen. Since we couldn’t get the sysadmin, we hired an additional web developer, and later an artist.
After some steady progress and talks with the administrators of www.ag.ndsu.edu, we were ready to put the site live. By this point, the other developers had finished their tasks and gone back to school, and I was doing the work during my off-hours.
After a while, we decided to tweak the search results, but found the legacy code (only a year old!) getting in the way. Since summer (2012) was approaching, we decided to hire a handful of undergraduates to help with the polish and with rewriting the search-result backends.
With the fall semester underway, we’re back to a single developer doing the occasional bug fix. The one exception is a new method for calculating our LSD results, using R behind the scenes as we originally intended… it turns out the Python LSD I wrote wasn’t as robust as we’d like.
Publications:
Download: Presentation (600K), LaTeX Source Files (300K)
I presented the paper “Comparing forgetting algorithms for artificial episodic memory systems” (citation below) during my seminar on intelligent systems.
In the paper, the authors describe a simulation in which an agent must remember transaction histories in order to perform well, but its memory space is limited. They then compare a number of processes for forgetting (deleting) previous transactions in order to maximize the agent’s performance.
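As a rough illustration of that setup, here is a minimal Python sketch of a fixed-capacity episodic store with a pluggable forgetting policy. It is not the paper’s agent or its algorithms: the class name and the two policies ("oldest", which drops the earliest stored episode, and "least_used", which drops the episode retrieved least often) are hypothetical stand-ins chosen only to show where a forgetting algorithm plugs in.

```python
from collections import OrderedDict


class EpisodicMemory:
    """Fixed-capacity episode store with a pluggable forgetting policy.

    Hypothetical illustration only: "oldest" and "least_used" are simple
    stand-ins, not the forgetting algorithms compared in the paper.
    """

    def __init__(self, capacity, policy="oldest"):
        if policy not in ("oldest", "least_used"):
            raise ValueError("unknown policy: %r" % policy)
        self.capacity = capacity
        self.policy = policy
        self.episodes = OrderedDict()  # episode id -> payload, in insertion order
        self.uses = {}                 # episode id -> number of retrievals

    def store(self, episode_id, payload):
        # When the store is full, forget one episode before adding a new one.
        if episode_id not in self.episodes and len(self.episodes) >= self.capacity:
            self._forget()
        self.episodes[episode_id] = payload
        self.uses.setdefault(episode_id, 0)

    def recall(self, episode_id):
        # Retrieval counts feed the "least_used" policy.
        if episode_id not in self.episodes:
            return None
        self.uses[episode_id] += 1
        return self.episodes[episode_id]

    def _forget(self):
        if self.policy == "oldest":
            victim = next(iter(self.episodes))  # first inserted episode
        else:
            # Fewest retrievals; ties go to the oldest episode because
            # min() returns the first minimal key in insertion order.
            victim = min(self.episodes, key=lambda e: self.uses[e])
        del self.episodes[victim]
        del self.uses[victim]
```

Comparing policies then amounts to running the same transaction stream through stores built with different `policy` values and measuring how well the agent does with what each one kept.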
Nuxoll, Andrew, Dan Tecuci, Wan Ching Ho, and Ningxuan Wang. “Comparing forgetting algorithms for artificial episodic memory systems.” In Proc. of the Symposium on Human Memory for Artificial Agents. AISB, pp. 14-20. 2010.
Original Paper
AISB
Download: Proposal (6M), LaTeX Source (300K)
Intel passed on this one; it’s a shame. I planned to eventually implement most of what’s in there (minus the Kinect bit) on my own time, and this would have been a great stimulus! Oh well, file it away.
http://github.com/matti-kariluoma-ndsu/wheat-dissem/blob/master/doc/LSD.html
I wanted to try out the calculations for the Python LSD implementation in a browser instead, and was frustrated by JavaScript’s lack of an erfinv function! So, I wrote one:
// Matti Kariluoma May 2012
// http://stackoverflow.com/questions/5971830/need-code-for-inverse-error-function

// Complementary error function (Numerical Recipes' erfcc approximation,
// fractional error below about 1.2e-7).
function erfc(x) {
  const z = Math.abs(x);
  const t = 1.0 / (0.5 * z + 1.0);
  // Horner evaluation of the polynomial in t
  const poly = -1.26551223 + t * (1.00002368 + t * (0.37409196 +
    t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 +
    t * (-1.13520398 + t * (1.48851587 + t * (-0.82215223 +
    t * 0.17087277))))))));
  const a = t * Math.exp(-z * z + poly);
  // erfc(-x) = 2 - erfc(x)
  return x < 0.0 ? 2.0 - a : a;
}

function erf(x) {
  return 1.0 - erfc(x);
}

// Inverse error function: rational initial guess, refined by two Newton steps.
function erfinv(y) {
  if (y < -1.0 || y > 1.0) {
    throw new RangeError("erfinv: input out of range [-1, 1]");
  }
  // erfinv(-1) = -Infinity and erfinv(1) = +Infinity; return early so the
  // Newton steps below don't turn them into NaN.
  if (y === -1.0) return Number.NEGATIVE_INFINITY;
  if (y === 1.0) return Number.POSITIVE_INFINITY;

  let x;
  if (y < -0.7) {
    // lower tail (note: negative x)
    const z = Math.sqrt(-Math.log((1.0 + y) / 2.0));
    x = -(((1.641345311 * z + 3.429567803) * z - 1.624906493) * z - 1.970840454) /
        ((1.637067800 * z + 3.543889200) * z + 1.0);
  } else if (y < 0.7) {
    // central region
    const z = y * y;
    x = y * (((-0.140543331 * z + 0.914624893) * z - 1.645349621) * z + 0.886226899) /
        ((((0.012229801 * z - 0.329097515) * z + 1.442710462) * z - 2.118377725) * z + 1.0);
  } else {
    // upper tail (note: positive x, and (1 - y) rather than (1 + y))
    const z = Math.sqrt(-Math.log((1.0 - y) / 2.0));
    x = (((1.641345311 * z + 3.429567803) * z - 1.624906493) * z - 1.970840454) /
        ((1.637067800 * z + 3.543889200) * z + 1.0);
  }
  // Two Newton-Raphson steps, using d/dx erf(x) = 2/sqrt(pi) * exp(-x^2)
  x = x - (erf(x) - y) / (2.0 / Math.sqrt(Math.PI) * Math.exp(-x * x));
  x = x - (erf(x) - y) / (2.0 / Math.sqrt(Math.PI) * Math.exp(-x * x));
  return x;
}
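One way to sanity-check a hand-rolled erfinv is against a reference value. A minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`: it uses the identity erf(x) = 2·Φ(x·√2) − 1, so erfinv(y) = Φ⁻¹((y + 1)/2)/√2, where Φ⁻¹ is the standard normal quantile (the same relationship the qnorm() routine in the LSD code uses in the other direction). The name `erfinv_reference` is hypothetical.

```python
import math
from statistics import NormalDist


def erfinv_reference(y):
    """Inverse error function via the standard normal quantile.

    Since erf(x) = 2*Phi(x*sqrt(2)) - 1, erfinv(y) = Phi^-1((y + 1)/2) / sqrt(2).
    """
    if not -1.0 < y < 1.0:
        raise ValueError("erfinv_reference: y must lie strictly inside (-1, 1)")
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / math.sqrt(2.0)


# Round trip through math.erf: erf(erfinv(y)) should give back y.
for y in (-0.95, -0.5, 0.0, 0.3, 0.8, 0.99):
    x = erfinv_reference(y)
    print("erfinv(%+.2f) = %+.9f   erf of that = %+.9f" % (y, x, math.erf(x)))
```

Values printed here can be compared against the browser version for the same inputs, including ones in both tails (below -0.7 and above 0.7), where the two branches differ.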
Download: Report (1M), Code (6K), IJITCS Paper (300K)
Update 21 Apr 2013: We’ve been published in the International Journal of Information Technology and Computer Science (IJITCS)!
Juan Li, Justin Anderson, Matti Kariluoma, Kailash Joshi, Prateek Rajan, Pubudu Wijeyaratne, “iText – SMS-based Web Services for Low-end Cell Phones”, IJITCS, vol. 5, no. 5, pp. 22-28, 2013. DOI: 10.5815/ijitcs.2013.05.03
Paper
IJITCS
Also available on GitHub: http://www.github.com/matti-kariluoma/sms-plusplus
For my computer networks class, I convinced our group to do a project involving the SMS protocol. We agreed to build an SMS service: you’d text a particular number or email address (it’s not obvious that you can text an email address!) using a small command language, and get access to web services such as email, calendar, etc.
We ended up with web search and email (through Google’s OpenAuth and a registration website we cooked up), and managed to do a live demo in class during our report. We used a Nokia N900 and two web servers, which unfortunately aren’t live anymore; shit’s expensive.
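To make the “small command language” idea concrete, here is a minimal Python sketch of how an incoming message body might be parsed into a command before being handed to a web service. The command words (`search`, `email`, `help`) and their argument layout are hypothetical, not the syntax the project actually used, and the services themselves are left out.

```python
def parse_sms(body):
    """Split an SMS body into (command, args) for a hypothetical syntax:

        search <terms...>
        email <address> <message...>
        help
    """
    words = body.strip().split()
    if not words:
        return ("help", {})
    command, rest = words[0].lower(), words[1:]
    if command == "search" and rest:
        return ("search", {"query": " ".join(rest)})
    if command == "email" and len(rest) >= 2 and "@" in rest[0]:
        return ("email", {"to": rest[0], "text": " ".join(rest[1:])})
    # Anything unrecognised gets the help text, since an SMS user has no menu to look at.
    return ("help", {})


print(parse_sms("search wheat varieties"))
print(parse_sms("EMAIL bob@example.com running late"))
```

Keeping the grammar to a keyword followed by free text matters on low-end phones, where typing anything more structured is painful.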
Download: Presentation (300K), LaTeX Source Files (60K)
I presented the paper “Empirical evidence for the existence and uses of metacognition in computer science problem solving” (citation below). The paper details the results of a small (n=10) experiment on the methods that computer science undergraduates use during domain problem solving.
Parham, Jennifer, Leo Gugerty, and D. E. Stevenson. “Empirical evidence for the existence and uses of metacognition in computer science problem solving.” In Proceedings of the 41st ACM technical symposium on Computer science education, pp. 416-420. ACM, 2010.
Paper
ACM
http://www.github.com/matti-kariluoma-ndsu/wheat-dissem/blob/kariluoma/doc/LSD.py
The Least Significant Difference (LSD) is a measure of how much of a difference between means must be observed before one can say the means are significantly different. For example, the means
are not significantly different from one another if their LSD is 1.0, but they are significantly different if their LSD is 0.1.
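In practice the LSD is applied by comparing every pair of treatment means against it: two means are declared significantly different when the absolute difference between them exceeds the LSD. A minimal Python sketch of that comparison step, assuming the LSD value has already been computed (for instance by an LSD routine like the one in LSD.py); the function name `significant_pairs` and the sample means are made up for illustration.

```python
from itertools import combinations


def significant_pairs(means, lsd):
    """Return the pairs of treatments whose means differ by more than `lsd`.

    `means` maps a treatment name to its mean response.
    """
    pairs = []
    for (a, mean_a), (b, mean_b) in combinations(sorted(means.items()), 2):
        if abs(mean_a - mean_b) > lsd:
            pairs.append((a, b))
    return pairs


# Hypothetical variety means, not real trial data.
means = {"A": 52.1, "B": 52.6, "C": 53.0}
print(significant_pairs(means, 1.0))  # -> []
print(significant_pairs(means, 0.1))  # -> [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

With a large LSD every pairwise difference falls inside the noise; shrink the LSD and the same means start to separate.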
#!/usr/bin/python
# coding=utf8
#
# Python routines to calculate the LSD of a balanced data set.
#
# Taken line-by-line from the agricolae project
# (http://cran.r-project.org/web/packages/agricolae/index.html ,
#  http://tarwi.lamolina.edu.pe/~fmendiburu) for R
# (http://r-project.org)
#
# Matti Kariluoma Sep 2011

from math import sqrt, pi, cos, sin, exp
from scipy.special import erfinv


def qnorm(probability):
    """
    A reimplementation of R's qnorm() function.

    This function calculates the quantile function of the normal distribution.
    (http://en.wikipedia.org/wiki/Normal_distribution#Quantile_function)

    Required is the erfinv() function, the inverse error function.
    (http://en.wikipedia.org/wiki/Error_function#Inverse_function)
    """
    if probability > 1 or probability <= 0:
        raise ValueError("qnorm: probability must be in (0, 1]")
    return sqrt(2) * erfinv(2*probability - 1)


def qt(probability, degrees_of_freedom):
    """
    A reimplementation of R's qt() function.

    This function calculates the quantile function of the Student's t distribution.
    (http://en.wikipedia.org/wiki/Quantile_function#The_Student.27s_t-distribution)

    Note: as in Hill's algorithm, `probability` is the two-tailed probability,
    so qt(0.05, n) corresponds to R's qt(0.975, n).

    This algorithm has been taken (line-by-line) from Hill, G. W. (1970)
    Algorithm 396: Student's t-quantiles. Communications of the ACM, 13(10), 619-620.

    Currently unimplemented are the improvements to Algorithm 396 from
    Hill, G. W. (1981) Remark on Algorithm 396, ACM Transactions on
    Mathematical Software, 7, 250-1.
    """
    n = degrees_of_freedom
    P = probability
    t = 0
    if n < 1 or P > 1.0 or P <= 0.0:
        raise ValueError("qt: need degrees_of_freedom >= 1 and 0 < probability <= 1")
    elif n == 2:
        t = sqrt(2.0/(P*(2.0-P)) - 2.0)
    elif n == 1:
        P = P * pi/2
        t = cos(P)/sin(P)
    else:
        a = 1.0/(n-0.5)
        b = 48.0/(a**2.0)
        c = (((20700.0*a)/b - 98.0)*a - 16.0)*a + 96.36
        d = ((94.5/(b+c) - 3.0)/b + 1.0)*sqrt((a*pi)/2.0)*float(n)
        x = d*P
        y = x**(2.0/float(n))
        if y > 0.05 + a:
            # asymptotic inverse expansion about the normal
            x = qnorm(P*0.5)
            y = x**2.0
            if n < 5:
                c = c + 0.3*(float(n)-4.5)*(x+0.6)
            #c = (((0.05*d*x-5.0)*x-7.0)*x-2.0)*x+b+c
            c1 = (0.05*d*x) - 5.0
            c2 = c1*x - 7.0
            c3 = c2*x - 2.0
            c4 = c3*x + b + c
            c = c4
            #y = (((((0.4*y+6.3)*y+36.0)*y+94.5)/c-y-3.0)/b+1.0)*x
            y1 = (0.4*y+6.3)*y + 36.0
            y2 = y1*y + 94.5
            y3 = y2/c - y - 3.0
            y4 = y3/b + 1.0
            y5 = y4*x
            y = y5
            y = a*(y**2.0)
            if y > 0.002:
                y = exp(y) - 1.0
            else:
                y = 0.5*(y**2.0) + y
        else:
            #y = ((1.0/(((float(n)+6.0)/(float(n)*y)-0.089*d-0.822)*(float(n)+2.0)*3.0)+0.5/(float(n)+4.0))*y-1.0)*(float(n)+1.0)/(float(n)+2.0)+1.0/y
            y1 = float(n)+6.0
            y2 = y1/(float(n)*y)
            y3 = y2 - 0.089*d - 0.822
            y4 = y3 * (float(n)+2.0) * 3.0
            y5 = 1.0 / y4
            y6 = y5 + 0.5/(float(n)+4.0)
            y7 = y6*y - 1.0
            y8 = y7 * (float(n)+1.0)
            y9 = y8 / (float(n)+2.0)
            y10 = y9 + 1.0/y
            y = y10
        t = sqrt(float(n)*y)
    return t


def LSD(response_to_treatments, probability):
    """
    A stripped-down reimplementation of LSD.test from the agricolae package.
    (http://cran.r-project.org/web/packages/agricolae/index.html)

    Calculates the Least Significant Difference of a multiple comparisons
    trial, over a balanced dataset. `probability` is the two-tailed
    significance level, e.g. 0.05.
    """
    trt = response_to_treatments
    #model = aov(y~trt)
    #df = df.residual(model)
    # df is the residual Degrees of Freedom
    # n is the number of treatments (rows), k the number of responses
    # per treatment, i.e. blocks/replicates (columns)
    n = len(trt)
    k = len(trt[0])  # == len(trt[1]) == ... == len(trt[n-1]) iff we have a balanced design
    degrees_freedom_of_error = (n-1)*(k-1)

    treatment_means = {}
    for i in range(n):  # n == len(trt)
        total = 0.0
        for j in range(k):
            total += float(trt[i][j])
        treatment_means[i] = total/k

    block_means = {}
    for j in range(k):
        total = 0.0
        for i in range(n):
            total += float(trt[i][j])
        block_means[j] = total/n

    grand_mean = sum(treatment_means.values()) / float(n)

    # TODO: what is the difference between type I and type III SS? (http://www.statmethods.net/stats/anova.html)
    SSE = 0.0
    for i in range(n):  # n == len(trt)
        for j in range(k):
            SSE += (float(trt[i][j]) - treatment_means[i] - block_means[j] + grand_mean)**2.0

    mean_squares_of_error = SSE / degrees_freedom_of_error
    Tprob = qt(probability, degrees_freedom_of_error)
    lsd = Tprob * sqrt(2.0 * mean_squares_of_error / k)
    return lsd
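The SSE loop in LSD() is the residual sum of squares of an additive treatment + block model, i.e. a randomized complete block layout. A small stdlib-only Python sketch, using made-up yields, checks that this residual form agrees with the textbook decomposition SS_error = SS_total - SS_treatment - SS_block for a balanced table. The helper name `rcbd_sums_of_squares` and the data are hypothetical.

```python
def rcbd_sums_of_squares(table):
    """Sums of squares for a balanced table: rows = treatments, columns = blocks."""
    n, k = len(table), len(table[0])
    row_means = [sum(row) / k for row in table]
    col_means = [sum(table[i][j] for i in range(n)) / n for j in range(k)]
    grand = sum(row_means) / n

    ss_total = sum((table[i][j] - grand) ** 2 for i in range(n) for j in range(k))
    ss_treatment = k * sum((m - grand) ** 2 for m in row_means)
    ss_block = n * sum((m - grand) ** 2 for m in col_means)
    # Residual form, as used in the LSD() routine.
    ss_error = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    return ss_total, ss_treatment, ss_block, ss_error


# Hypothetical yields: 3 varieties (rows) x 4 replicate blocks (columns).
yields = [
    [50.0, 52.0, 49.0, 51.0],
    [53.0, 55.0, 52.0, 54.0],
    [48.0, 51.0, 47.0, 50.0],
]
total, trt, blk, err = rcbd_sums_of_squares(yields)
print(err, total - trt - blk)  # the two agree up to floating-point rounding
mse = err / ((len(yields) - 1) * (len(yields[0]) - 1))
print("MSE:", mse)
```

The MSE computed this way is what feeds the `sqrt(2.0 * mean_squares_of_error / k)` term of the LSD.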
References:
http://cran.r-project.org/web/packages/agricolae/index.html
http://tarwi.lamolina.edu.pe/~fmendiburu
http://en.wikipedia.org/wiki/Normal_distribution#Quantile_function
http://en.wikipedia.org/wiki/Error_function#Inverse_function
http://en.wikipedia.org/wiki/Quantile_function#The_Student.27s_t-distribution
Hill, G. W. (1970) Algorithm 396: Student’s t-quantiles. Communications of the ACM, 13(10), 619-620.
Hill, G. W. (1981) Remark on Algorithm 396, ACM Transactions on Mathematical Software, 7, 250-1.
Download: RenameAll.py (1K)
Someone wanted to use the tools I wrote for the cardboard bookscanner on a non-Windows platform… et voilà, the same functionality in Python (2.7.3):
"""
Matti Kariluoma Jun 2011

A Python rendition of RenameAll.exe
(http://www.mattikariluoma.com/files/RenameAll.exe) from the Cardboard
Bookscanner project, Jan 2010.
(http://www.instructables.com/id/Bargain-Price-Book-Scanner-From-A-Cardboard-Box/)

This script loads a directory of images, then renames them all such that
the first half and the second half are interspersed. The input directory
is expected to contain 2n or 2n+1 images, where the first n images are of
the right-hand side of a book, and the last n are the left-hand side of
the book.
"""
import os
import sys

import cv2  # current OpenCV bindings; the original used the legacy `cv` module

directory = sys.argv[1]

# The old glob "*.[j,J][p,P,pe,PE][g,G]" only matched three-letter
# extensions, so .jpeg files were skipped; filter on the extension instead.
images_filename = sorted(
    name for name in os.listdir(directory)
    if os.path.splitext(name)[1].lower() in (".jpg", ".jpeg")
)

NUM_PIC = len(images_filename)
half = NUM_PIC // 2  # integer division, as in the original Python 2 code

for n, filename in enumerate(images_filename):
    image = cv2.imread(os.path.join(directory, filename))
    if n < half:
        # right-hand pages: 000001-a.jpg, 000002-a.jpg, ...
        cv2.imwrite(os.path.join(directory, "%06d-a.jpg" % (n + 1)), image)
    else:
        # left-hand pages: 000001-b.jpg, 000002-b.jpg, ...
        cv2.imwrite(os.path.join(directory, "%06d-b.jpg" % (n + 1 - half)), image)
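The renaming rule itself doesn’t need OpenCV at all. A minimal Python sketch of just the naming scheme, written as a pure function (the name `interleaved_names` is hypothetical): given how many sorted scans there are, it returns the new name for each position, so that a plain alphabetical sort of the output interleaves the two halves as a, b, a, b, … Like the script, it assumes the right-hand pages were shot first.

```python
def interleaved_names(count):
    """New file names for `count` sorted scans, mirroring RenameAll's rule.

    The first count // 2 images become 000001-a.jpg, 000002-a.jpg, ...
    and the rest become 000001-b.jpg, 000002-b.jpg, ..., so sorting the
    new names alternates between the two halves.
    """
    half = count // 2
    names = []
    for n in range(count):
        if n < half:
            names.append("%06d-a.jpg" % (n + 1))
        else:
            names.append("%06d-b.jpg" % (n + 1 - half))
    return names


print(interleaved_names(4))
print(sorted(interleaved_names(4)))
```

With an odd count (2n+1), the extra image lands at the end of the "-b" run, matching the script’s integer division.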
Truth be told, I only wrote the tools so Windows users could play along too; I imagined non-Windows users would write some shell one-liners.