ARROW-17427: [Java] Add Windows build script that produces DLLs #14203

Merged
kou merged 16 commits into apache:master from the java-jni-windows branch on Sep 28, 2022

@kou
Member

kou commented Sep 22, 2022

No description provided.

@kou
Member Author

kou commented Sep 22, 2022

@github-actions crossbow submit java-jars

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@kou
Member Author

kou commented Sep 22, 2022

@github-actions crossbow submit java-jars

@kou
Member Author

kou commented Sep 23, 2022

@github-actions crossbow submit java-jars

@kou
Member Author

kou commented Sep 23, 2022

@github-actions crossbow submit java-jars

@kou
Member Author

kou commented Sep 23, 2022

@github-actions crossbow submit java-jars

@kou
Member Author

kou commented Sep 23, 2022

@github-actions crossbow submit java-jars

@kou
Member Author

kou commented Sep 23, 2022

@github-actions crossbow submit java-jars

@kou
Member Author

kou commented Sep 23, 2022

@github-actions crossbow submit java-jars

@kou
Member Author

kou commented Sep 23, 2022

@github-actions crossbow submit java-jars

@kou
Member Author

kou commented Sep 24, 2022

@github-actions crossbow submit java-jars

@kou
Member Author

kou commented Sep 24, 2022

@github-actions crossbow submit java-jars

@davisusanibar
Contributor

#14208

Thanks, I see these libraries included now:

Dataset:
arrow_dataset_jni.dll
arrow_dataset_jni.lib
libarrow_dataset_jni.dylib
libarrow_dataset_jni.so

C Data Interface:
arrow_cdata_jni.dll
arrow_cdata_jni.lib
libarrow_cdata_jni.dylib
libarrow_cdata_jni.so

Let me use them in some cookbook recipes to confirm that the new DLLs/LIBs for the C Data / Dataset tests work.
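To double-check which native libraries a downloaded jar bundles, one option is to list its entries. The following is only a minimal sketch using the JDK's `java.util.jar.JarFile`; the class name `NativeLibLister` and the extension filter are assumptions made for illustration and are not part of Arrow.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class NativeLibLister {
    /** Returns the sorted names of jar entries that look like native libraries. */
    public static List<String> listNativeLibs(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.stream()
                    .map(entry -> entry.getName())
                    // Assumed filter: the four extensions mentioned in this thread.
                    .filter(name -> name.endsWith(".dll") || name.endsWith(".lib")
                            || name.endsWith(".dylib") || name.endsWith(".so"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the path of a downloaded jar as the first argument.
        for (String name : listNativeLibs(Path.of(args[0]))) {
            System.out.println(name);
        }
    }
}
```

Running it against a jar prints one entry name per line, so it is easy to see whether the `.lib` files are still shipped.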

@kou
Member Author

kou commented Sep 26, 2022

Ah, we don't need .lib. I'll remove them.

@kou
Member Author

kou commented Sep 26, 2022

@github-actions crossbow submit java-jars

@github-actions

Revision: 092d458

Submitted crossbow builds: ursacomputing/crossbow @ actions-ae8c93c847

Task Status
java-jars Github Actions

@davisusanibar
Contributor

Tested on Windows 10 Home:

1.- Download the new Dataset / C Data jars locally from https://github.com/ursacomputing/crossbow/releases/tag/actions-9be4b55dea-github-java-jars

2.- Test the newly created DLLs:

# Dataset DLL
$ cygcheck.exe 'arrow_dataset_jni.dll'
  C:\Windows\system32\WINHTTP.dll
    C:\Windows\system32\ntdll.dll
    C:\Windows\system32\KERNELBASE.dll
  C:\Windows\system32\bcrypt.dll
  C:\Windows\system32\WININET.dll
    C:\Windows\system32\msvcrt.dll
  C:\Windows\system32\USERENV.dll
    C:\Windows\system32\RPCRT4.dll
  C:\Windows\system32\VERSION.dll
    C:\Windows\system32\KERNEL32.dll
  C:\Windows\system32\WS2_32.dll
  C:\Windows\system32\SHELL32.dll
    C:\Windows\system32\msvcp_win.dll
    C:\Windows\system32\USER32.dll
      C:\Windows\system32\win32u.dll
      C:\Windows\system32\GDI32.dll
  C:\Windows\system32\ole32.dll
    C:\Windows\system32\combase.dll
  C:\Windows\system32\ADVAPI32.dll
    C:\Windows\system32\SECHOST.dll
  C:\Windows\system32\MSVCP140.dll
    C:\Windows\system32\VCRUNTIME140.dll
    C:\Windows\system32\VCRUNTIME140_1.dll

# C Data Interface DLL
$ cygcheck.exe 'arrow_cdata_jni.dll'
  C:\Windows\system32\MSVCP140.dll
    C:\Windows\system32\VCRUNTIME140.dll
      C:\Windows\system32\KERNEL32.dll
        C:\Windows\system32\ntdll.dll
        C:\Windows\system32\KERNELBASE.dll
    C:\Windows\system32\VCRUNTIME140_1.dll

If you see errors, try installing the latest supported Visual C++ Redistributable: https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170

3.- Install the new Dataset / C Data jars locally:
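To see at a glance which of the listed dependencies come from the Visual C++ runtime, the `cygcheck` output can be filtered. This is a rough sketch only: the class `CygcheckDeps` is hypothetical, and treating names that start with `MSVCP140` or `VCRUNTIME140` as redistributable DLLs is an assumption based on the output above.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class CygcheckDeps {
    /** Extracts distinct DLL file names from cygcheck-style output, in order of appearance. */
    public static Set<String> dllNames(String cygcheckOutput) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : cygcheckOutput.split("\\R")) {
            String path = line.trim();
            if (!path.toLowerCase(Locale.ROOT).endsWith(".dll")) {
                continue; // skip blank lines and anything that is not a DLL path
            }
            names.add(path.substring(path.lastIndexOf('\\') + 1));
        }
        return names;
    }

    /** Keeps only names assumed to belong to the Visual C++ runtime. */
    public static Set<String> vcRuntimeDlls(Set<String> names) {
        Set<String> result = new LinkedHashSet<>();
        for (String name : names) {
            String upper = name.toUpperCase(Locale.ROOT);
            if (upper.startsWith("MSVCP140") || upper.startsWith("VCRUNTIME140")) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input copied from the arrow_cdata_jni.dll output above.
        String output = String.join("\n",
                "  C:\\Windows\\system32\\MSVCP140.dll",
                "    C:\\Windows\\system32\\VCRUNTIME140.dll",
                "      C:\\Windows\\system32\\KERNEL32.dll",
                "        C:\\Windows\\system32\\ntdll.dll",
                "        C:\\Windows\\system32\\KERNELBASE.dll",
                "    C:\\Windows\\system32\\VCRUNTIME140_1.dll");
        // prints [MSVCP140.dll, VCRUNTIME140.dll, VCRUNTIME140_1.dll]
        System.out.println(vcRuntimeDlls(dllNames(output)));
    }
}
```

If any of the printed DLLs is missing on a machine, installing the redistributable is the likely fix.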

# install dataset manually
mvn install:install-file -Dfile="C:\Users\dsusanibar\IdeaProjects\win-cookbooks\src\main\resources\files\arrow-dataset-10.0.0-SNAPSHOT.pom" -DgroupId="org.apache.arrow" -DartifactId="arrow-dataset" -Dversion="10.0.0-SNAPSHOT" -Dpackaging="pom"
mvn install:install-file -Dfile="C:\Users\dsusanibar\IdeaProjects\win-cookbooks\src\main\resources\files\arrow-dataset-10.0.0-SNAPSHOT.jar" -DgroupId="org.apache.arrow" -DartifactId="arrow-dataset" -Dversion="10.0.0-SNAPSHOT" -Dpackaging="jar"
# install c data interface manually
mvn install:install-file -Dfile="C:\Users\dsusanibar\IdeaProjects\win-cookbooks\src\main\resources\files\arrow-c-data-10.0.0-SNAPSHOT.pom" -DgroupId="org.apache.arrow" -DartifactId="arrow-c-data" -Dversion="10.0.0-SNAPSHOT" -Dpackaging="pom"
mvn install:install-file -Dfile="C:\Users\dsusanibar\IdeaProjects\win-cookbooks\src\main\resources\files\arrow-c-data-10.0.0-SNAPSHOT.jar" -DgroupId="org.apache.arrow" -DartifactId="arrow-c-data" -Dversion="10.0.0-SNAPSHOT" -Dpackaging="jar"

4.- Add the new Dataset / C Data Interface dependencies to your project (Maven/Gradle)

5.- Create a Dataset with the new Dataset jar that contains arrow_dataset_jni.dll, and read RecordBatches with the new C Data Interface jar that contains arrow_cdata_jni.dll:
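For Maven, the dependency declarations might look like the fragment below. It is a sketch that reuses the coordinates from the `mvn install:install-file` commands above (`org.apache.arrow`, `arrow-dataset`, `arrow-c-data`, `10.0.0-SNAPSHOT`) and assumes the jars were installed into the local repository as described. Any other Arrow modules your project needs are not shown.

```xml
<dependencies>
  <!-- Coordinates taken from the install:install-file commands in step 3 -->
  <dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>arrow-dataset</artifactId>
    <version>10.0.0-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>arrow-c-data</artifactId>
    <version>10.0.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```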

import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.scanner.ScanOptions;
import org.apache.arrow.dataset.scanner.Scanner;
import org.apache.arrow.dataset.source.Dataset;
import org.apache.arrow.dataset.source.DatasetFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;

import java.io.IOException;
import java.net.URISyntaxException;

public class Recipe {
    public static void main(String[] args) throws URISyntaxException {
        // File at: https://github.com/apache/arrow-cookbook/blob/main/java/thirdpartydeps/parquetfiles/data1.parquet
        String uri = "file:///C:\\Users\\dsusanibar\\IdeaProjects\\win-cookbooks\\src\\main\\resources\\files\\data1.parquet";
        ScanOptions options = new ScanOptions(/*batchSize*/ 5);
        try (
            BufferAllocator allocator = new RootAllocator();
            DatasetFactory datasetFactory = new FileSystemDatasetFactory(allocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
            Dataset dataset = datasetFactory.finish();
            Scanner scanner = dataset.newScan(options)
        ) {
            // Batch counter shared across the scan; an array lets the lambda mutate it.
            final int[] count = {1};
            scanner.scan().forEach(scanTask -> {
                try (ArrowReader reader = scanTask.execute()) {
                    while (reader.loadNextBatch()) {
                        // The reader owns this root and releases it on close, so it is not closed per batch.
                        VectorSchemaRoot root = reader.getVectorSchemaRoot();
                        System.out.println("Number of rows per batch[" + count[0]++ + "]: " + root.getRowCount());
                        System.out.println(root.contentToTSVString());
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Result:
Number of rows per batch[1]: 3
id	name
1	David
2	Gladis
3	Juan

Thanks a lot @kou

@kou
Member Author

kou commented Sep 28, 2022

That's good to know. :-)
I'll merge this.

kou merged commit 35bfeb4 into apache:master on Sep 28, 2022
kou deleted the java-jni-windows branch on September 28, 2022 03:14
@ursabot

ursabot commented Sep 28, 2022

Benchmark runs are scheduled for baseline = f3af96a and contender = 35bfeb4. 35bfeb4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.14% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.04% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 35bfeb41 ec2-t3-xlarge-us-east-2
[Failed] 35bfeb41 test-mac-arm
[Failed] 35bfeb41 ursa-i9-9960x
[Finished] 35bfeb41 ursa-thinkcentre-m75q
[Finished] f3af96a2 ec2-t3-xlarge-us-east-2
[Failed] f3af96a2 test-mac-arm
[Failed] f3af96a2 ursa-i9-9960x
[Finished] f3af96a2 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
