Avro - a simple example
When moving data from one place to another, or just storing it, there are plenty of options, from plain text to specialized binary formats. Somewhere in the middle are XML, JSON, Protocol Buffers, Thrift and a newer entry, Avro. Avro is a binary format like protobufs and Thrift, but unlike those two it also stores the schema with the file. It is easy to use, being very similar to protobufs and Thrift, or even to XSD-derived classes.
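To make the point about the schema travelling with the file concrete, here is a minimal sketch that pulls the schema JSON out of the start of an Avro data file using only the JDK. It assumes the object container file layout described in the Avro specification (the four bytes 'O', 'b', 'j', 1, followed by a metadata map that holds an avro.schema entry). The class and method names (AvroHeaderPeek, readSchemaJson) are made up for illustration; in real code the Avro library reads this header for you.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AvroHeaderPeek {

    // Avro encodes longs as zig-zag, variable-length integers (7 bits per byte).
    static long readLong(ByteBuffer buf) {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = buf.get() & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: last byte of this number
            }
            shift += 7;
        }
        return (raw >>> 1) ^ -(raw & 1); // undo the zig-zag step
    }

    // Strings and byte arrays are a length (as a long) followed by that many bytes.
    static byte[] readBytes(ByteBuffer buf) {
        byte[] out = new byte[(int) readLong(buf)];
        buf.get(out);
        return out;
    }

    // Returns the JSON stored under "avro.schema" in the header, or null if absent.
    static String readSchemaJson(byte[] fileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(fileBytes);
        if (buf.get() != 'O' || buf.get() != 'b' || buf.get() != 'j' || buf.get() != 1) {
            throw new IllegalArgumentException("not an Avro data file");
        }
        String schema = null;
        // The metadata map is written as blocks of key/value pairs, ended by a zero count.
        for (long count = readLong(buf); count != 0; count = readLong(buf)) {
            if (count < 0) {
                count = -count;
                readLong(buf); // a negative count is followed by the block size in bytes
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(buf), StandardCharsets.UTF_8);
                byte[] value = readBytes(buf);
                if (key.equals("avro.schema")) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        return schema;
    }
}
```

Given the bytes of a file written with the schema used later in this post (for example via java.nio.file.Files.readAllBytes), readSchemaJson should return the MyExample record definition as JSON, without needing the generated class.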
Follow the detail below (or the simple tutorial on the Avro pages):
Download, or use dependency management (Maven, Gradle, etc.) to get, avro-1.7.6.jar and avro-tools-1.7.6.jar, plus the Jackson JSON library: specifically the core-asl and mapper-asl jars (those are the 1.9.x jar names). Avro 1.7.x is built against Jackson 1.9.x, so the 2.x jars are not a drop-in substitute. Make sure they're all on the build path.
Create a schema (as in example.avsc):
{"namespace": "example.avro",
"type": "record", "name": "MyExample",
"fields": [
{"name": "title_of_doc", "type": "string"},
{"name": "author_name", "type": ["string", "null"]},
{"name": "number_pages", "type": ["int", "null"]}
]
}
... and run the Avro command-line tool to generate the class:
cd .../workspace/avro-example/
java -jar /path/to/avro-tools-1.7.6.jar compile schema example.avsc .
which will create a MyExample.java file in the example/avro folder.
Move the example/avro folder under src, or move the newly created file into the example.avro package under src, or add the new file to the build path.
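The generated class exposes getters and setters whose names are derived from the schema field names, so title_of_doc ends up as setTitleOfDoc. As a rough sketch of that naming rule (an approximation for illustration only, not Avro's own code; it ignores reserved words and name clashes, and the AccessorNames class is hypothetical):

```java
public class AccessorNames {

    // Drops underscores and upper-cases the character that follows each one,
    // as well as the first character, then prepends "get" or "set".
    static String accessorName(String prefix, String fieldName) {
        StringBuilder name = new StringBuilder(prefix);
        boolean upperNext = true;
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            if (c == '_') {
                upperNext = true;
            } else if (upperNext) {
                name.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                name.append(c);
            }
        }
        return name.toString();
    }

    public static void main(String[] args) {
        // The three field names from example.avsc
        System.out.println(accessorName("set", "title_of_doc"));
        System.out.println(accessorName("get", "author_name"));
        System.out.println(accessorName("set", "number_pages"));
    }
}
```

If in doubt, open the generated MyExample.java and check the actual method names there.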
Put the schema to use by pulling it in as a class and creating a few instances; note the different ways of constructing one. Then open a datum writer and a file writer to write out the data, and a datum reader and a file reader to pull it back in. That should cover the basics! Note that the reader and writer (and their corresponding file reader and file writer) can use differing schemas, in case you have versioning and want to open a file written with one schema but operate on the data with another.
package example.avro;

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

public class AvroEx {
    public static void main(String[] args) {
        // Basic constructor; the class name comes from the record name in the avsc file
        MyExample exmplDoc = new MyExample();
        exmplDoc.setTitleOfDoc("Testing for fun"); // notice title_of_doc became TitleOfDoc
        exmplDoc.setNumberPages(123);              // author_name stays null, which its union type allows

        // All-args constructor: arguments follow the field order in the schema
        MyExample exmplDoc2 = new MyExample("Growing Green Software", "Mr Green", 322);

        // Builder: every field without a default must be set, even if only to null
        MyExample exmplDoc3 = MyExample.newBuilder()
                .setTitleOfDoc("Forget Testing")
                .setAuthorName("Miss Read")
                .setNumberPages(null)
                .build();

        // Write out an Avro file
        File file = new File("Example-out-in.avro");
        // The datum writer turns MyExample objects into Avro's binary encoding
        DatumWriter<MyExample> userDatumW = new SpecificDatumWriter<MyExample>(MyExample.class);
        // The file writer wraps it; a different schema could be supplied here if necessary
        DataFileWriter<MyExample> dataFW = new DataFileWriter<MyExample>(userDatumW);
        try {
            dataFW.create(exmplDoc.getSchema(), file); // writes the schema into the file header
            dataFW.append(exmplDoc);
            dataFW.append(exmplDoc2);
            dataFW.append(exmplDoc3);
            dataFW.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Read the Avro data back in
        DatumReader<MyExample> userDR = new SpecificDatumReader<MyExample>(MyExample.class);
        try {
            // The file reader picks up the writer's schema from the file header
            DataFileReader<MyExample> dataFR = new DataFileReader<MyExample>(file, userDR);
            MyExample userReadIn = null;
            while (dataFR.hasNext()) {
                userReadIn = dataFR.next(userReadIn); // reuses the same object for each record
                System.out.println(userReadIn);
            }
            dataFR.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Run AvroEx.java as an application. It will create the Avro file, write the three records to it, then read them back in and print them out.
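On the point about reading a file with a different schema than it was written with: the rules Avro applies to record fields can be modelled in a few lines of plain Java. This is a toy model using maps, not the Avro API. It covers only three cases: a field known to both sides, a reader-only field with a default, and a writer-only field. It leaves out type promotion, aliases and union handling. The publisher field and the SchemaResolutionSketch class are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaResolutionSketch {

    // Marker meaning "the reader's schema declares no default for this field".
    static final Object NO_DEFAULT = new Object();

    // written:      field name -> value, as stored using the writer's schema
    // readerFields: field name -> default value (or NO_DEFAULT) in the reader's schema
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerFields) {
        Map<String, Object> result = new LinkedHashMap<String, Object>();
        for (Map.Entry<String, Object> field : readerFields.entrySet()) {
            String name = field.getKey();
            if (written.containsKey(name)) {
                result.put(name, written.get(name));   // both sides know the field
            } else if (field.getValue() != NO_DEFAULT) {
                result.put(name, field.getValue());    // reader-only field: use its default
            } else {
                throw new IllegalStateException("no value and no default for field: " + name);
            }
        }
        return result; // fields only the writer knew about are dropped
    }

    public static void main(String[] args) {
        Map<String, Object> written = new LinkedHashMap<String, Object>();
        written.put("title_of_doc", "Testing for fun");
        written.put("author_name", null);
        written.put("number_pages", 123);

        // A hypothetical newer reader schema: number_pages removed, publisher added
        Map<String, Object> readerFields = new LinkedHashMap<String, Object>();
        readerFields.put("title_of_doc", NO_DEFAULT);
        readerFields.put("author_name", NO_DEFAULT);
        readerFields.put("publisher", "unknown");

        System.out.println(resolve(written, readerFields));
    }
}
```

With the real library, the equivalent step is to hand the datum reader the schema you want to work with, while the file reader supplies the schema the file was written with.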