Merge pull request #1 from h2oai/master
update
Viktor-Demin authored Apr 7, 2019
2 parents abd2ebd + de7a28b commit 936d65f
Showing 199 changed files with 10,495 additions and 7,836 deletions.
8 changes: 4 additions & 4 deletions .travis.yml
@@ -7,18 +7,18 @@ compiler:
- gcc

before_install:
- - ( test -d "$LLVM4" && test "$(ls -A $LLVM4)") || ( wget -O llvm4.tar.xz "http://releases.llvm.org/4.0.0/clang+llvm-4.0.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz" && mkdir -p "$LLVM4" && tar xf llvm4.tar.xz -C "$LLVM4" --strip-components 1 )
- - sudo cp $LLVM4/lib/libomp.so /usr/lib
+ - ( test -d "$LLVM7" && test "$(ls -A $LLVM7)") || ( wget -O llvm7.tar.xz "http://releases.llvm.org/7.0.1/clang+llvm-7.0.1-x86_64-linux-gnu-ubuntu-16.04.tar.xz" && mkdir -p "$LLVM7" && tar xf llvm7.tar.xz -C "$LLVM7" --strip-components 1 )
+ - sudo cp $LLVM7/lib/libomp.so /usr/lib

env:
global:
- - LLVM4=$HOME/LLVM4 LLVM_CONFIG=$LLVM4/bin/llvm-config CLANG=$LLVM4/bin/clang LD_LIBRARY_PATH=$LLVM4/lib:$LD_LIBRARY_PATH
+ - LLVM7=$HOME/LLVM7 LLVM_CONFIG=$LLVM7/bin/llvm-config CLANG=$LLVM7/bin/clang LD_LIBRARY_PATH=$LLVM7/lib:$LD_LIBRARY_PATH

cache:
pip: true
ccache: true
directories:
- - $LLVM4
+ - $LLVM7

python:
- "3.5"
41 changes: 39 additions & 2 deletions CHANGELOG.md
@@ -45,6 +45,13 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
The method matches the entire string, not just the beginning. Thus, it
most closely resembles Python function `re.fullmatch()`.

- Added early stopping support to FTRL algo, that can now do binomial and
multinomial classification for categorical targets, as well as regression
for continuous targets.

- New function `dt.median()` can be used to compute median of a certain
column or expression, either per group or for the entire Frame (#1530).


### Fixed

@@ -64,6 +71,28 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
- The reported frame size (`sys.getsizeof(DT)`) is now more accurate; in
particular the content of string columns is no longer ignored (#1697).

- Type casting into str32 no longer produces an error if the resulting column
is larger than 2GB. Now a str64 column will be returned instead (#1695).

- Fixed memory leak during computation of a generic `DT[i, j]` expression.
Another memory leak was during generation of string columns, now also fixed
(#1705).

- Fixed crash upon exiting from a python terminal, if the user ever called
function `frame_column_rowindex().type` (#1703).

- Pandas "boolean column with NAs" (of dtype `object`) now converts into
datatable `bool8` column when pandas DataFrame is converted into a datatable
Frame (#1730).

- Fixed conversion to numpy of a view Frame which contains NAs (#1738).

- `datatable` can now be safely used with `multiprocessing`, or other modules
that perform fork-without-exec (#1758). The child process will spawn its
own thread pool that will have the same number of threads as the parent.
Adjust `dt.options.nthreads` in the child process(es) if different number
of threads is required.


### Changed

@@ -72,6 +101,14 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
`dt.options.display.interactive = True`. Alternatively, you can explore a
Frame interactively using `frame.view(True)`.

- Improved performance of type-casting a view column: now the code avoids
materializing the column before performing the cast.

- `Frame` class is now defined fully in C++, improving code robustness and
performance. The property `Frame.internal` was removed, as it no longer
represents anything. Certain internal properties of `Frame` can be accessed
via functions declared in the `dt.internal.` module.


### Deprecated

@@ -91,9 +128,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
- Thanks to everyone who helped make `datatable` more stable by discovering
and reporting bugs that were fixed in this release:

- [arno candel][] (#1619),
+ [arno candel][] (#1619, #1730, #1738),
  [antorsae][] (#1639),
- [pasha stetsenko][] (#1672, #1694, #1697)
+ [pasha stetsenko][] (#1672, #1694, #1695, #1697, #1703, #1705)



141 changes: 6 additions & 135 deletions c/column.cc
@@ -8,12 +8,11 @@
#include <cstdlib> // atoll
#include "column.h"
#include "datatablemodule.h"
#include "py_utils.h"
#include "rowindex.h"
#include "sort.h"
#include "utils.h"
#include "utils/assert.h"
#include "utils/file.h"
#include "utils/misc.h"


Column::Column(size_t nrows_)
@@ -123,24 +122,14 @@ Column* Column::new_xbuf_column(SType stype,
*/
Column* Column::new_mbuf_column(SType stype, MemoryRange&& mbuf) {
Column* col = new_column(stype);
- col->replace_buffer(std::move(mbuf));
+ xassert(mbuf.size() % col->elemsize() == 0);
+ xassert(stype == SType::OBJ? mbuf.is_pyobjects() : true);
+ col->nrows = mbuf.size() / col->elemsize();
+ col->mbuf = std::move(mbuf);
return col;
}



void Column::replace_buffer(MemoryRange&&) {
throw RuntimeError()
<< "replace_buffer(mr) not valid for Column of type " << stype();
}

void Column::replace_buffer(MemoryRange&&, MemoryRange&&) {
throw RuntimeError()
<< "replace_buffer(mr1, mr2) not valid for Column of type " << stype();
}



/**
* Create a shallow copy of the column; possibly applying the provided rowindex.
*/
@@ -193,124 +182,6 @@ size_t Column::nmodal() const { return get_stats()->nmodal(this); }



//------------------------------------------------------------------------------
// Casting
//------------------------------------------------------------------------------

Column* Column::cast(SType new_stype) const {
return cast(new_stype, MemoryRange());
}

Column* Column::cast(SType new_stype, MemoryRange&& mr) const {
if (ri) {
// TODO: implement this
throw RuntimeError() << "Cannot cast a column with rowindex";
}
Column *res = nullptr;
if (mr) {
res = Column::new_column(new_stype);
res->nrows = nrows;
res->mbuf = std::move(mr);
} else {
if (new_stype == stype()) {
return shallowcopy();
}
res = Column::new_data_column(new_stype, nrows);
}
switch (new_stype) {
case SType::BOOL: cast_into(static_cast<BoolColumn*>(res)); break;
case SType::INT8: cast_into(static_cast<IntColumn<int8_t>*>(res)); break;
case SType::INT16: cast_into(static_cast<IntColumn<int16_t>*>(res)); break;
case SType::INT32: cast_into(static_cast<IntColumn<int32_t>*>(res)); break;
case SType::INT64: cast_into(static_cast<IntColumn<int64_t>*>(res)); break;
case SType::FLOAT32: cast_into(static_cast<RealColumn<float>*>(res)); break;
case SType::FLOAT64: cast_into(static_cast<RealColumn<double>*>(res)); break;
case SType::STR32: cast_into(static_cast<StringColumn<uint32_t>*>(res)); break;
case SType::STR64: cast_into(static_cast<StringColumn<uint64_t>*>(res)); break;
case SType::OBJ: cast_into(static_cast<PyObjectColumn*>(res)); break;
default:
throw ValueError() << "Unable to cast into stype = " << new_stype;
}
return res;
}

void Column::cast_into(BoolColumn*) const {
throw ValueError() << "Cannot cast " << stype() << " into bool";
}
void Column::cast_into(IntColumn<int8_t>*) const {
throw ValueError() << "Cannot cast " << stype() << " into int8";
}
void Column::cast_into(IntColumn<int16_t>*) const {
throw ValueError() << "Cannot cast " << stype() << " into int16";
}
void Column::cast_into(IntColumn<int32_t>*) const {
throw ValueError() << "Cannot cast " << stype() << " into int32";
}
void Column::cast_into(IntColumn<int64_t>*) const {
throw ValueError() << "Cannot cast " << stype() << " into int64";
}
void Column::cast_into(RealColumn<float>*) const {
throw ValueError() << "Cannot cast " << stype() << " into float";
}
void Column::cast_into(RealColumn<double>*) const {
throw ValueError() << "Cannot cast " << stype() << " into double";
}
void Column::cast_into(StringColumn<uint32_t>*) const {
throw ValueError() << "Cannot cast " << stype() << " into str32";
}
void Column::cast_into(StringColumn<uint64_t>*) const {
throw ValueError() << "Cannot cast " << stype() << " into str64";
}
void Column::cast_into(PyObjectColumn*) const {
throw ValueError() << "Cannot cast " << stype() << " into pyobj";
}
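
The `cast` / `cast_into` code in this hunk (which the hunk header suggests is deleted from `c/column.cc`) follows a familiar dispatch pattern: the base class declares one virtual `cast_into` overload per target type, each throwing by default, and a concrete column overrides only the conversions it supports. Below is a minimal sketch of that pattern only, not the project's code. `Col`, `IntCol`, and `StrCol` are hypothetical, simplified stand-ins for the real column classes, and `std::invalid_argument` is used in place of the project's `ValueError`.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the column classes in the diff.
struct IntCol;
struct StrCol;

struct Col {
  virtual ~Col() = default;
  virtual std::string type_name() const = 0;

  // Default implementations reject the cast. Subclasses override only the
  // conversions they actually support.
  virtual void cast_into(IntCol*) const {
    throw std::invalid_argument("Cannot cast " + type_name() + " into int");
  }
  virtual void cast_into(StrCol*) const {
    throw std::invalid_argument("Cannot cast " + type_name() + " into str");
  }
};

struct IntCol : Col {
  std::vector<int> data;
  std::string type_name() const override { return "int"; }

  using Col::cast_into;  // keep the throwing overloads visible in this class
  void cast_into(StrCol* target) const override;  // int -> str is supported
};

struct StrCol : Col {
  std::vector<std::string> data;
  std::string type_name() const override { return "str"; }
  // No overrides: every cast from StrCol falls back to the throwing default.
};

// Defined after StrCol is complete, because it touches StrCol's members.
void IntCol::cast_into(StrCol* target) const {
  target->data.clear();
  for (int x : data) {
    target->data.push_back(std::to_string(x));
  }
}
```

Calling `cast_into` through a `const Col&` picks the overload from the static type of the target pointer and the override from the dynamic type of the source, so any unsupported pair falls through to the throwing default.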



//------------------------------------------------------------------------------
// Integrity checks
//------------------------------------------------------------------------------

void Column::verify_integrity(const std::string& name) const {
mbuf.verify_integrity();
ri.verify_integrity();

size_t mbuf_nrows = data_nrows();

// Check RowIndex
if (ri.isabsent()) {
// Check that nrows is a correct representation of mbuf's size
if (nrows != mbuf_nrows) {
throw AssertionError()
<< "Mismatch between reported number of rows: " << name
<< " has nrows=" << nrows << " but MemoryRange has data for "
<< mbuf_nrows << " rows";
}
}
else {
// Check that the length of the RowIndex corresponds to `nrows`
if (nrows != ri.size()) {
throw AssertionError()
<< "Mismatch in reported number of rows: " << name << " has "
<< "nrows=" << nrows << ", while its rowindex.length="
<< ri.size();
}
// Check that the maximum value of the RowIndex does not exceed the maximum
// row number in the memory buffer
if (ri.max() >= mbuf_nrows && ri.max() != RowIndex::NA) {
throw AssertionError()
<< "Maximum row number in the rowindex of " << name << " exceeds the "
<< "number of rows in the underlying memory buffer: max(rowindex)="
<< ri.max() << ", and nrows(membuf)=" << mbuf_nrows;
}
}

// Check Stats
if (stats) { // Stats are allowed to be null
stats->verify_integrity(this);
}
}
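
The three row-count checks in `verify_integrity` above can be restated as a small standalone function. This is a sketch under assumptions, not the project's implementation: `RowIndexInfo`, `NO_MAX`, and `check_column` are hypothetical names, a null pointer plays the role of `ri.isabsent()`, and `NO_MAX` plays the role of `RowIndex::NA`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sentinel meaning "the row index has no valid maximum".
static const std::size_t NO_MAX = static_cast<std::size_t>(-1);

// Hypothetical, simplified summary of a row index.
struct RowIndexInfo {
  std::size_t size;  // number of entries in the row index
  std::size_t max;   // largest row number referenced, or NO_MAX
};

// Returns an empty string when the column is consistent, otherwise a short
// description of the first problem found. `ri == nullptr` means the column
// has no row index and reads its buffer directly.
std::string check_column(std::size_t nrows, std::size_t buffer_rows,
                         const RowIndexInfo* ri) {
  if (ri == nullptr) {
    // Without a row index, nrows must equal the rows held by the buffer.
    if (nrows != buffer_rows) {
      return "nrows does not match the number of rows in the buffer";
    }
    return "";
  }
  // With a row index, nrows must equal the length of the row index.
  if (nrows != ri->size) {
    return "nrows does not match the length of the rowindex";
  }
  // Every referenced row must exist in the underlying buffer.
  if (ri->max != NO_MAX && ri->max >= buffer_rows) {
    return "max(rowindex) exceeds the number of rows in the buffer";
  }
  return "";
}
```

For instance, under these assumptions a view of 3 rows over a 5-row buffer passes when its largest index is 4 and fails when it is 5.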



//==============================================================================
@@ -323,7 +194,7 @@ SType VoidColumn::stype() const noexcept { return SType::VOID; }
size_t VoidColumn::elemsize() const { return 0; }
bool VoidColumn::is_fixedwidth() const { return true; }
size_t VoidColumn::data_nrows() const { return nrows; }
- void VoidColumn::reify() {}
+ void VoidColumn::materialize() {}
void VoidColumn::resize_and_fill(size_t) {}
void VoidColumn::rbind_impl(std::vector<const Column*>&, size_t, bool) {}
void VoidColumn::apply_na_mask(const BoolColumn*) {}