The goals of this working group are:
- Use Apache Bigtop as a foundation for proof-of-concept (POC) development
- Examples of POCs
- How to make this better
- Background skills: using Apache Bigtop to deploy Hadoop components and running tests to verify Hadoop ecosystem functionality.
- This is a good place for beginners to start: install Bigtop and ALL the components. Run the WordCount (WC) example in Hadoop, then work through the test examples for each individual component, from HBase to Solr. The group syncs up on Hadoop/HBase first.
- Add what you learn to the wiki we started two years ago:
- https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.4.0
- Import data into HDFS/HBase. Once you see how slow this is in cluster mode, use it as a baseline for comparison with other solutions such as MongoDB and Cassandra, and contrast it with non-database systems such as Twitter Storm.
- Writing system tests in Groovy: can you get interoperating components to work together in code?
- Building Apache Bigtop and adding more Big Data components to support POCs
- DataPipe (guest lecture if available), used servers (guest lecture)
- Limitations of YCSB and how to benchmark APIs (guest lecture if available)
- MongoDB (guest lecture)
- Cassandra/Astyanax (guest lecture if available)
- Twitter Storm (guest lecture)
- Porting additional components into Apache Bigtop
- Individual Projects
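Hadoop's WordCount (WC) example counts how often each word occurs in its input: the map phase emits a (word, 1) pair per token, the framework groups the pairs by word, and the reduce phase sums each group. The plain-Java sketch below imitates those three phases in memory so the data flow is visible before running the real job on a cluster. It does not use any Hadoop API; the class name `WordCountSketch` is hypothetical, and tokenizing with `StringTokenizer` (whitespace splitting) mirrors the classic Hadoop example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted, as Hadoop sorts keys before reducing).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values collected for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int sum = 0;
            for (int one : group.getValue()) {
                sum += one;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    // Run map over every input line, then shuffle and reduce the combined output.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello hadoop", "hello bigtop");
        System.out.println(wordCount(input)); // {bigtop=1, hadoop=1, hello=2}
    }
}
```

On a real cluster the same three steps run in parallel across many mappers and reducers, which is why the output of the Hadoop job is a set of part files rather than a single map.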
Tools:
- Free trial of Safari Online. You will need the O'Reilly/Manning books to work with the individual Bigtop components.
- AWS compute time
- IntelliJ: nobody used this last year, so it has been cancelled.
Syllabus:
Bigtop Session #1:
Group Goal: Develop and practice the Bigtop skills needed to write integration tests and understand Hadoop component architectures. The base skill set consists of running Bigtop in distributed mode on AWS, building Bigtop, running mvn verify, and writing Groovy integration tests. Developing proficiency will take 4+ sessions.
Creating an Apache Jira Account
Installing Apache Bigtop 0.5.0/0.6.0, updating the Bigtop Wiki
Running the components listed in bigtop.mk that are not yet covered on the wiki
Running the integration tests
Bigtop 0.5.0/0.6.0 build
Bigtop 0.5.0/0.6.0 provision in AWS, configure for distributed operation
AWSBigtopCentos6Install.docx
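Bigtop's integration tests are written in Groovy, and many smoke tests follow a simple pattern: run a command against the deployed stack, then check its exit code and output. As a language-neutral illustration of that pattern, here is a minimal Java sketch that runs a command and captures both. The `CommandCheck` class and `run` method are hypothetical names, not part of Bigtop's iTest framework, and on a real cluster the command would be something like `hadoop fs -ls /` rather than the `echo` used here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class CommandCheck {

    // Result of running one external command: its exit code and captured output.
    record Result(int exitCode, String stdout) {}

    // Run a command, merging stderr into stdout, and wait for it to finish.
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true);
        Process process = builder.start();
        String output;
        try (InputStream in = process.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        return new Result(exitCode, output);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical check; against a cluster this might be List.of("hadoop", "fs", "-ls", "/").
        Result result = run(List.of("sh", "-c", "echo bigtop"));
        if (result.exitCode() != 0 || !result.stdout().contains("bigtop")) {
            throw new AssertionError("command failed: " + result);
        }
        System.out.println("OK: " + result.stdout().trim());
    }
}
```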
Bigtop Session #2:
Integration test preparation: work through Chapters 3 and 4 of Hadoop Real-World Solutions Cookbook
RealWorldSolnsCookbookExamples.docx
pom.xml
ZooKeeper:
zookeepertelnet.docx
ZooKeeper material
Group project for next session: ZooKeeper hands-on lab
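A classic ZooKeeper telnet exercise is sending a four-letter-word command to the client port: a healthy server answers `ruok` with `imok`, then closes the connection. The minimal Java sketch below performs the same exchange over a plain socket. It assumes the default client port 2181 and a server that permits the command (ZooKeeper 3.5.3 and later only accept commands listed in `4lw.commands.whitelist`); the `FourLetterWord` class is a hypothetical helper, not a ZooKeeper API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class FourLetterWord {

    // Write the command, then read the reply until the server closes the stream.
    static String exchange(InputStream in, OutputStream out, String command) {
        if (command.length() != 4) {
            throw new IllegalArgumentException("four-letter words are exactly 4 characters: " + command);
        }
        try {
            out.write(command.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            in.transferTo(reply);
            return reply.toString(StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Open a TCP connection to a ZooKeeper server and send one command, as you would in telnet.
    static String send(String host, int port, String command) {
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(5_000); // avoid hanging forever if the server never replies
            return exchange(socket.getInputStream(), socket.getOutputStream(), command);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        // "ruok" asks "are you OK?"; a healthy server answers "imok".
        System.out.println(send(host, 2181, "ruok"));
    }
}
```

Separating `exchange` from `send` keeps the protocol logic testable with in-memory streams, without a running ensemble.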
Bigtop Session #3:
Group Goal: MongoDB and CouchDB (if available). Install in local and sharded mode. Compare bulk write and read performance with HDFS/HBase.
Guest lecture: Marshall, BenchPress?
Benchmarking beyond YCSB
MongoDB (speaker from 10gen), CouchDB (lecture if available)
Demo: MongoDB in local and sharded mode. Group project: install BenchPress.
Group project for next session: work on installing MongoDB, Cassandra, and Astyanax.
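Comparing bulk write performance across HDFS/HBase, MongoDB, and Cassandra is fairer when every store is driven by the same timing harness. A minimal sketch follows, assuming a hypothetical `RecordSink` interface that you would implement once per store with that store's own client library. The in-memory sink is only a stand-in so the harness runs on its own; a meaningful comparison would also need warm-up runs, realistic record sizes, and several repetitions.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkWriteBenchmark {

    // Hypothetical abstraction: one implementation per store (HBase, MongoDB, Cassandra, ...).
    interface RecordSink {
        void writeBatch(List<String> records);
    }

    // Stand-in sink that keeps records in memory so the harness runs without a cluster.
    static class InMemorySink implements RecordSink {
        final List<String> stored = new ArrayList<>();

        @Override
        public void writeBatch(List<String> records) {
            stored.addAll(records);
        }
    }

    // Write totalRecords records in batches of batchSize; return throughput in records per second.
    static double measure(RecordSink sink, int totalRecords, int batchSize) {
        if (totalRecords <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalRecords and batchSize must be positive");
        }
        long start = System.nanoTime();
        List<String> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < totalRecords; i++) {
            batch.add("row-" + i);
            if (batch.size() == batchSize) {
                sink.writeBatch(batch);
                batch = new ArrayList<>(batchSize); // fresh list in case the sink keeps a reference
            }
        }
        if (!batch.isEmpty()) {
            sink.writeBatch(batch); // flush the final partial batch
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return totalRecords / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        double rate = measure(sink, 100_000, 1_000);
        System.out.printf("wrote %d records at %.0f records/s%n", sink.stored.size(), rate);
    }
}
```

Keeping batch size as a parameter matters here: single-record writes and large batches can differ by orders of magnitude on the same store, so comparisons should hold it fixed.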
Bigtop Session #4:
Bigtop Build Instructions:
- make rpm
- How Bigtop works
- RPM files: HowRPMFilesWork.docx
- Apache Forrest demo: 0.7 vs. 0.9 (DC)
- Guest lecture: Matt, the Node.js dude
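Binary packages produced by an RPM build are named by RPM's name-version-release.architecture convention, for example an illustrative (not actual Bigtop) file `hadoop-hdfs-namenode-2.0.5-1.el6.x86_64.rpm`. Because the package name may contain hyphens while the version and release may not, the fields are recovered by splitting from the right. The `RpmFileName` class below is a hypothetical helper sketching that parsing; it is not part of Bigtop or the RPM tooling.

```java
public class RpmFileName {

    // Fields of a binary RPM file name: name-version-release.arch.rpm
    record Nvra(String name, String version, String release, String arch) {}

    static Nvra parse(String fileName) {
        if (!fileName.endsWith(".rpm")) {
            throw new IllegalArgumentException("not an .rpm file: " + fileName);
        }
        String base = fileName.substring(0, fileName.length() - ".rpm".length());

        // Architecture follows the last dot (the release itself may contain dots, e.g. "1.el6").
        int dot = base.lastIndexOf('.');
        // Version and release may not contain hyphens, so split the remainder from the right.
        int releaseDash = dot < 0 ? -1 : base.lastIndexOf('-', dot);
        int versionDash = releaseDash < 0 ? -1 : base.lastIndexOf('-', releaseDash - 1);
        if (dot < 0 || releaseDash < 0 || versionDash <= 0) {
            throw new IllegalArgumentException("not in name-version-release.arch form: " + fileName);
        }
        return new Nvra(
                base.substring(0, versionDash),
                base.substring(versionDash + 1, releaseDash),
                base.substring(releaseDash + 1, dot),
                base.substring(dot + 1));
    }

    public static void main(String[] args) {
        // Illustrative file name, not a specific Bigtop artifact.
        System.out.println(parse("hadoop-hdfs-namenode-2.0.5-1.el6.x86_64.rpm"));
    }
}
```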
Apache Cassandra/Astyanax: guest lecture from Netflix?
Cassandra Theory
Cassandra Bulk import performance
Cassandra Scan performance (Guest lecture Apple)
DataStax demo?
How RPM files work (DC lecture/demo)
Port Cassandra into Bigtop?
Group project for next session: Cassandra RPM build
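A central idea in Cassandra's architecture is the token ring: a row's partition key is hashed to a token, the row is stored on the first node whose token is greater than or equal to it (wrapping around the ring), and with SimpleStrategy further replicas go to the next nodes clockwise. The sketch below models that placement with a `TreeMap`. It is a simplified illustration, assuming one token per node (no virtual nodes) and using `String.hashCode()` as a stand-in for Cassandra's real partitioners (Murmur3Partitioner or RandomPartitioner); `TokenRing` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TokenRing {

    // token -> node name; sorted so the ring can be walked clockwise.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node, int token) {
        ring.put(token, node);
    }

    // Simplified stand-in for a partitioner; Cassandra would use Murmur3 or MD5 here.
    static int token(String partitionKey) {
        return partitionKey.hashCode();
    }

    List<String> replicasFor(String partitionKey, int replicationFactor) {
        return replicasForToken(token(partitionKey), replicationFactor);
    }

    // Owner of the token first, then the following nodes clockwise, wrapping past the highest token.
    List<String> replicasForToken(int token, int replicationFactor) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        int count = Math.min(replicationFactor, ring.size());
        List<String> replicas = new ArrayList<>(count);
        Map.Entry<Integer, String> entry = ring.ceilingEntry(token);
        if (entry == null) {
            entry = ring.firstEntry(); // token is past the last node: wrap around
        }
        while (replicas.size() < count) {
            replicas.add(entry.getValue());
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode("node-a", -1_000_000_000);
        ring.addNode("node-b", 0);
        ring.addNode("node-c", 1_000_000_000);
        System.out.println(ring.replicasFor("user:42", 2));
    }
}
```

The same structure explains why adding a node only moves the keys in the range it takes over from its clockwise neighbor, which is relevant when comparing Cassandra's bulk-import behavior with HBase's region splits.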
Bigtop Session #5:
MongoDB (10gen guest speaker):
Running in sharded mode
How slow is importing data into MongoDB in sharded mode?
Doing a MongoDB RPM build
Storm (DC)
Demo: how to write Storm programs with spouts and bolts
How does Storm parallelism work?
Monit example
RPM example
How to port Storm to Bigtop
Projects!!!
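In Storm, a spout emits a stream of tuples and bolts consume and transform them; parallelism comes from running several tasks of each bolt, with a stream grouping deciding which task receives each tuple. A fields grouping sends every tuple with the same value of the chosen field to the same task, which is what lets a word-count bolt keep a correct per-word total. The plain-Java simulation below sketches that routing for a sentence spout, a split bolt, and several count-bolt tasks. It does not use the Storm API, and choosing the task as the word's hash modulo the task count is a simplified assumption about how Storm picks the task.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldsGroupingSketch {

    // Fields grouping (simplified): the same word always maps to the same task index.
    static int chooseTask(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Run sentences through a "split bolt" and route each word to one of numTasks "count bolt" tasks.
    static List<Map<String, Integer>> run(List<String> sentences, int numTasks) {
        List<Map<String, Integer>> countTasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            countTasks.add(new HashMap<>()); // each task keeps its own private counts
        }
        for (String sentence : sentences) {               // the spout emits one tuple per sentence
            for (String word : sentence.split("\\s+")) { // the split bolt emits one tuple per word
                if (word.isEmpty()) {
                    continue;
                }
                Map<String, Integer> task = countTasks.get(chooseTask(word, numTasks));
                task.merge(word, 1, Integer::sum);         // the count bolt updates its running total
            }
        }
        return countTasks;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> tasks = run(List.of("the cow jumped", "over the moon"), 3);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("count task " + i + ": " + tasks.get(i));
        }
    }
}
```

With a shuffle grouping instead, "the" could land on two different tasks and each would hold only a partial count, which is the main reason word count uses a fields grouping.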