Tuesday, May 27, 2014

Wednesday, May 28th

Last lecture today, on predictor evaluation, more predictors, and clustering.

No lab after all.

This is about it for the seminar.


Tuesday, May 20, 2014

Wednesday, May 21st

We'll have a lecture on frequent pattern mining. Slides will be posted here right before the class.

No lab today.

Tuesday, May 13, 2014

Wednesday, May 14th

We will finish Lecture 6 and go over Lecture 7.

There will be lab 14:00-16:00.

Monday, April 28, 2014

Next two weeks

No lab this Wednesday the 30th, sorry. There will be fewer exercises and more labs in the second half of the course, on data mining.

If you have time, download MOA (http://moa.cms.waikato.ac.nz/) and start playing with it.

Also, I'm at a workshop next week (May 6th-9th), so no theory and no lab.

Ricard

Tuesday, April 22, 2014

Tuesday, April 8, 2014

Saturday, April 5, 2014

Glitches in lab 1, C++ source

I found a couple of glitches when doing the lab myself using the C++ routines I pointed to.

- On my system, although ints are 32 bits, the constants RAND_MAX and LONG_PRIME fit in 16 bits (so they are at most 2^15-1). This gives far too little randomness for checking large sets of items. I've reposted distrib.h, which now simulates (badly) a 32-bit random generator. Also, if this happens on your system, you may want to replace the lines

  hashes[i][0] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
  hashes[i][1] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

in the genajbj method of count_min_sketch.cpp with, for example,

  hashes[i][0] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
  hashes[i][0] *= RAND_MAX;
  hashes[i][0] += int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
  hashes[i][1] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);
  hashes[i][1] *= RAND_MAX;
  hashes[i][1] += int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

- Watch out: CountMinSketch::update takes an int, but CountMinSketch::estimate returns an unsigned int. Subtracting unsigned values can wrap around to a huge positive number instead of going negative, which gives nonsensical results; cast to a signed type before subtracting.
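Both glitches can be illustrated in a few lines. This is only a minimal sketch and not part of the lab code: the helper names combine15, rand30 and signed_diff are hypothetical, and it assumes rand() yields at least 15 usable bits (the C standard guarantees RAND_MAX >= 32767). The result is no more random than rand() itself.

```cpp
#include <cstdlib>   // std::rand

// Hypothetical helper (not in the lab sources): glue two 15-bit values
// into one 30-bit value. Only the low 15 bits of each argument are used.
constexpr unsigned int combine15(unsigned int hi, unsigned int lo) {
    return ((hi & 0x7FFFu) << 15) | (lo & 0x7FFFu);
}

// Hypothetical helper: a 30-bit pseudo-random value built from two
// rand() calls, for systems where RAND_MAX is only 2^15-1.
inline unsigned int rand30() {
    return combine15(static_cast<unsigned int>(std::rand()),
                     static_cast<unsigned int>(std::rand()));
}

// Difference of two unsigned counts (such as two estimate() results),
// computed in a wider signed type so it can go negative instead of
// wrapping around.
constexpr long long signed_diff(unsigned int a, unsigned int b) {
    return static_cast<long long>(a) - static_cast<long long>(b);
}
```

For example, with two unsigned estimates 3 and 5, the plain expression 3u - 5u wraps around to a very large unsigned value, while signed_diff(3u, 5u) is -2.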

Hopefully these things don't show up in other languages.