Updated: 2021-09-17

Some tips for data.table that I came across while googling which might be useful.

Print

To print more rows than the default, use either:

  options(datatable.print.topn = 70)
  print(DT, topn = 70)

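Setting it through options() applies the change globally, while the print() argument only affects that one call. To keep the printout tidy when character columns hold long strings, you can truncate them with: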

  options(datatable.prettyprint.char = 80L)
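A small sketch of both settings together. The table DT and its columns are made up for illustration; it has 200 rows, so printing is abbreviated to the first and last topn rows.

```r
library(data.table)

# Hypothetical table: 200 rows, each with a 120-character string.
DT <- data.table(id = 1:200, note = strrep("x", 120))

# Truncate long strings while printing, remembering the previous setting.
old <- options(datatable.prettyprint.char = 20L)

# Capture what print() would show: the first 3 and the last 3 rows.
out <- capture.output(print(DT, topn = 3))

# Restore the previous option so the rest of the session is unaffected.
options(old)

cat(out, sep = "\n")
```

Saving the old value and restoring it afterwards is just a habit to avoid leaking a global setting; drop it if you want the option to stay.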

NA

Use nafill() to replace NA with a specified value, e.g. nafill(dt, fill = 99) will replace all NA in dt with 99 (nafill() works on numeric columns). To fill with the previous or next value of the vector instead, specify type = "locf" or type = "nocb" accordingly, i.e. locf (last observation carried forward) and nocb (next observation carried backward).
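A quick sketch of the three fill types on a made-up vector, plus setnafill(), which does the same by reference on a data.table. The names x and dt are only for illustration.

```r
library(data.table)

x <- c(1, NA, NA, 4, NA)

# Constant fill: type = "const" is the default, so only fill is needed.
filled_const <- nafill(x, fill = 99)

# Last observation carried forward.
filled_locf <- nafill(x, type = "locf")

# Next observation carried backward; the trailing NA has nothing after it and stays NA.
filled_nocb <- nafill(x, type = "nocb")

# setnafill() updates the numeric columns of a data.table in place.
dt <- data.table(a = c(NA, 2, NA), b = c(1, NA, 3))
setnafill(dt, fill = 0)
dt
```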

Joining with roll and foverlaps

For a detailed example you can read here. Basically, when specifying DT[dt, on = .(key_col), roll = TRUE] (or roll = Inf), each row of dt without an exact match is joined to the closest preceding key value in DT, i.e. where dt >= DT. Specifying roll = -Inf does the opposite and joins to the closest following value. With roll = "nearest" it rolls both ways to the nearest value. You can also specify a finite number to limit how far a value is rolled, e.g. roll = 2 only matches a DT value within 2 below the key_col value in dt.
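To make the four variants concrete, here is a minimal sketch on made-up data. DT, dt and the value column are hypothetical, and the lookup values are chosen so that none falls exactly on a roll limit.

```r
library(data.table)

# Hypothetical data: DT has values at key 1, 5 and 10; dt holds the lookups.
DT <- data.table(key_col = c(1, 5, 10), value = c("a", "b", "c"))
dt <- data.table(key_col = c(4, 6, 11))

# Roll forward: each lookup takes the closest DT key at or below it.
res_fwd <- DT[dt, on = .(key_col), roll = TRUE]

# Roll backward: each lookup takes the closest DT key at or above it.
# 11 lies beyond the last key, so it gets NA.
res_bwd <- DT[dt, on = .(key_col), roll = -Inf]

# Nearest key in either direction.
res_near <- DT[dt, on = .(key_col), roll = "nearest"]

# Roll forward, but only across a gap of up to 2: 4 is 3 away from key 1, so NA.
res_lim <- DT[dt, on = .(key_col), roll = 2]

res_fwd$value   # "a" "b" "c"
res_bwd$value   # "b" "c" NA
res_near$value  # "b" "b" "c"
res_lim$value   # NA  "b" "c"
```

In each result the key_col column holds the lookup values from dt, not the matched keys from DT.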

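Using foverlaps(DT, dt, type = "any") will join DT and dt wherever their value ranges overlap at all. Because foverlaps() matches on ranges, the second table must be keyed with setkey(), and its last two key columns must be the start and end of the range, e.g. key = c("col1", "col2"). The first table either has a matching key or you pass its range columns through by.x. In the example below both range columns are the key columns: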

  foverlaps(dt4, dt3, type = "any")

  ##    min_y max_y x dt4_y dt4_y_end
  ## 1:     0    10 c   5.7        10
  ## 2:    10    15 c   5.7        10
  ## 3:    10    15 a  11.9        13
  ## 4:    15    20 d  18.0        22
  ## 5:    20    30 d  18.0        22
  ## 6:    20    30 b  21.4        25

Specifying type = "within" instead will join only the rows of the first table whose range falls entirely within a range of the second.
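The difference between the two types can be sketched on a small made-up pair of tables. bins, obs and their columns are hypothetical and are not the dt3 and dt4 from the output above.

```r
library(data.table)

# Hypothetical reference ranges and observed ranges.
bins <- data.table(lo = c(0, 10, 20), hi = c(10, 20, 30))
obs  <- data.table(id = c("p", "q"), start = c(2, 8), end = c(5, 12))

# The second table must be keyed, with range start and end as the last two key columns.
setkey(bins, lo, hi)

# type = "any": q (8 to 12) straddles two bins, so it appears twice.
any_hits <- foverlaps(obs, bins, by.x = c("start", "end"), type = "any")

# type = "within": only p (2 to 5) sits entirely inside a bin.
# q is kept with NA in lo and hi because nomatch is left at its default.
within_hits <- foverlaps(obs, bins, by.x = c("start", "end"), type = "within")

any_hits
within_hits
```

Since obs is not keyed here, its range columns are passed through by.x.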

Create empty data.table

Sometimes I need to create an empty data.table based on existing column names. There are lots of other ways to do this, but the easiest is:

  cols <- names(DT)
  dt <- setnames(data.table(matrix(nrow = 0, ncol = length(cols))), cols)
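One thing to be aware of is that the matrix approach gives every column type logical. If you also want to keep the column types, one of the other methods is DT[0]. The sketch below compares the two on a made-up DT.

```r
library(data.table)

# Hypothetical source table with three different column types.
DT <- data.table(id = 1:3, name = c("a", "b", "c"), score = c(0.5, 1.5, 2.5))

# Matrix approach: same names, zero rows, but all columns are logical.
cols <- names(DT)
empty_logical <- setnames(data.table(matrix(nrow = 0, ncol = length(cols))), cols)

# Zero-row subset: same names and the original column types.
empty_typed <- DT[0]

sapply(empty_logical, class)
sapply(empty_typed, class)
```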