Last lecture today, on predictor evaluation, more predictors, and clustering.

No lab after all.

This is about it for the seminar.

# MIRI Seminar on Data Streams, spring 2014 edition

## Tuesday, May 27, 2014

## Tuesday, May 20, 2014

### Wednesday May 21st

We'll have a lecture on frequent pattern mining. Slides will be posted here right before class.

No lab today.

## Tuesday, May 13, 2014

## Monday, April 28, 2014

### Next two weeks

No lab this Wednesday the 30th, sorry. There will be fewer exercises and more labs in the second half of the course, on data mining.

If you have time, start downloading MOA http://moa.cms.waikato.ac.nz/ and playing with it.

Also, I'm at a workshop next week (May 6th-9th), so no theory and no lab.

Ricard

## Tuesday, April 22, 2014

## Tuesday, April 8, 2014

## Saturday, April 5, 2014

### Glitches in lab 1, C++ source

I found a couple of glitches when doing the lab myself with the C++ routines I pointed to.

- On my system, although ints are 32 bits, the constants RAND_MAX and LONG_PRIME fit in 16 bits (so they are at most 2^15-1). This gives far too little randomness when checking large sets of items. I've reposted distrib.h, which now simulates (badly) a 32-bit random generator. If the same happens on your system, you may also want to replace the lines

hashes[i][0] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

hashes[i][1] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

in the genajbj method of count_min_sketch.cpp with, for example,

hashes[i][0] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

hashes[i][0] *= RAND_MAX;

hashes[i][0] += int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

hashes[i][1] = int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

hashes[i][1] *= RAND_MAX;

hashes[i][1] += int(float(rand())*float(LONG_PRIME)/float(RAND_MAX) + 1);

- Watch out: CountMinSketch::update takes an int, but CountMinSketch::estimate returns an unsigned int. Subtractions involving unsigned values can wrap around and give nonsensical results, so cast appropriately.

Hopefully these things don't show up in other languages.
