Apache Bigtop Working Group Meeting 2013

Last edited by Doug Chang 10 years, 9 months ago

The goals of this working group are:

  • Use Apache Bigtop as a foundation for proof-of-concept (POC) development
    • Examples of POCs
    • How to make Bigtop better for POC work
  • Background skills: use Apache Bigtop to deploy Hadoop components and run tests to verify Hadoop ecosystem functionality.
    • This is a good place for beginners to start: install Bigtop and install ALL the components. Run the WordCount (WC) example in Hadoop, then progress to the test examples for the individual components, from HBase to Solr. The group will sync on Hadoop/HBase first.
    • Add what you learned to the wiki we started two years ago.
    • https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.4.0
    • Import data into HDFS/HBase. Once you see how slow this is in cluster mode, we can use it as a baseline for comparison with other solutions such as MongoDB and Cassandra, and contrast it with non-database systems such as Twitter Storm.
  • Writing system tests in Groovy: can you get interoperating components to work together in code?
  • Building Apache Bigtop and adding more BigData components to facilitate POCs
  • DataPipe (guest lecture if available), used servers (guest lecture)
  • Limitations of YCSB and how to benchmark APIs (guest lecture if available)
  • MongoDB (guest lecture)
  • Cassandra/Astyanax (guest lecture if available)
  • Twitter Storm (guest lecture)
  • Porting additional components into Apache Bigtop
  • Individual projects
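The WordCount (WC) example mentioned above is the usual first MapReduce job. The sketch below is not Hadoop code. It is a minimal, single-process Python simulation of the same map, shuffle, and reduce steps, assuming plain whitespace tokenization, so you can see what the Hadoop job computes before running it on a cluster. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce.

    Hadoop also sorts keys during the shuffle; this sketch only groups them.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello bigtop"]))
```

On a Bigtop install, the real job ships in the Hadoop examples JAR. Running both on the same small input is a quick way to check that you understand what each phase contributes.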

 

 

Tools:

  • Free trial of Safari Books Online. You will need the O'Reilly/Manning books to work with the individual Bigtop components.
  • AWS compute time
  • IntelliJ: nobody used this last year, so it has been cancelled.

 

Syllabus: 

 

Bigtop Session #1:

Group goal: develop and practice the Bigtop skills needed to write integration tests and to understand Hadoop component architectures. The base skill set consists of running Bigtop in distributed mode on AWS, building Bigtop, running mvn verify, and writing Groovy integration tests. Expect 4+ sessions to develop proficiency.

 

Creating an Apache JIRA account

Installing Apache Bigtop 0.5.0/0.6.0 and updating the Bigtop wiki

Running the components listed in bigtop.mk that are not covered on the wiki

Running the integration tests

Building Bigtop 0.5.0/0.6.0

Provisioning Bigtop 0.5.0/0.6.0 on AWS and configuring it for distributed operation

 

AWSBigtopCentos6Install.docx

   

Bigtop Session #2:

 

Integration test preparation: work through Chapters 3 and 4 of Hadoop Real-World Solutions Cookbook.

RealWorldSolnsCookbookExamples.docx

pom.xml


 

 

 

 

ZooKeeper:

 zookeepertelnet.docx

 

 ZooKeeper material

 Group project: ZooKeeper hands-on lab for the next session
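The zookeepertelnet.docx handout covers talking to ZooKeeper over telnet. ZooKeeper answers short four-letter-word commands on its client port (2181 by default), for example `ruok`, which gets the reply `imok` when the server is running, and `stat`. A minimal Python equivalent of that telnet session might look like the sketch below, assuming a server reachable at localhost:2181. The function name is hypothetical. Newer releases (3.5.3 and later) only answer commands enabled in `4lw.commands.whitelist`.

```python
import socket


def four_letter_word(host: str = "localhost", port: int = 2181,
                     command: str = "ruok", timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter-word command and return the raw reply.

    This is the same exchange you would type by hand in a telnet session.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        # The server writes its reply and then closes the connection,
        # so keep reading until EOF.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


if __name__ == "__main__":
    reply = four_letter_word(command="ruok")
    print("healthy" if reply == "imok" else f"unexpected reply: {reply!r}")
```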

 

 

Bigtop Session #3: 

 Group goal: MongoDB and CouchDB (if available). Install both in local and sharded mode, and compare bulk write and read performance with HDFS/HBase.

 Guest lecture: Marshall on BenchPress?

 Benchmarking beyond YCSB

 MongoDB (speaker from 10gen), CouchDB (lecture if available)

 Demo of MongoDB in local and sharded mode; group project: install BenchPress

 Group project for the next session: work on installing MongoDB, Cassandra, and Astyanax.
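For the bulk write/read comparison, it helps to measure every store with the same harness. The sketch below assumes each store exposes simple `put(key, value)` and `get(key)` methods. `InMemoryStore` is a hypothetical stand-in that you would replace with a thin wrapper around a real MongoDB, HBase, or Cassandra client. It times one operation at a time, like a basic YCSB-style load phase, and does not use any client's batch APIs.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def throughput(op: Callable[..., object], calls: Iterable[tuple]) -> Tuple[int, float]:
    """Call op(*args) for each args tuple; return (operation count, ops/second)."""
    count = 0
    start = time.perf_counter()
    for args in calls:
        op(*args)
        count += 1
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else float("inf"))


class InMemoryStore:
    """Hypothetical stand-in; replace with a wrapper around a real client."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]


def run_benchmark(store, n: int = 10_000, value_size: int = 100) -> Dict[str, float]:
    """Bulk-write n records, then read every one back, timing each phase."""
    # Build inputs before timing so only the store calls are measured.
    records = [(f"user{i:08d}", "x" * value_size) for i in range(n)]
    keys = [(key,) for key, _ in records]
    writes, write_rate = throughput(store.put, records)
    reads, read_rate = throughput(store.get, keys)
    return {
        "writes": writes,
        "write_ops_per_sec": write_rate,
        "reads": reads,
        "read_ops_per_sec": read_rate,
    }


if __name__ == "__main__":
    print(run_benchmark(InMemoryStore()))
```

Because the in-memory stand-in has no network or disk cost, its numbers are only a ceiling. The useful comparison is between wrappers for real clusters, run with the same record count and value size.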

 

Bigtop Session #4:

Bigtop Build Instructions:

  • make rpm
  • How Bigtop works
  • RPM files: HowRPMFilesWork.docx
  • Apache Forrest demo: 0.7 vs. 0.9 (DC)
  • Guest lecture: Matt on Node.js
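Running `make rpm` produces binary packages whose file names follow RPM's `name-version-release.arch.rpm` convention. The sketch below splits such a name into its parts. The example file names are hypothetical, and the parser ignores edge cases such as epochs, which are not normally part of the file name. For authoritative metadata, query the package itself, for example with `rpm -qpi <file>`.

```python
from typing import NamedTuple


class RpmName(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


def parse_rpm_filename(filename: str) -> RpmName:
    """Split an RPM file name of the form name-version-release.arch.rpm."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an .rpm file: {filename}")
    stem = filename[: -len(".rpm")]
    # The architecture is everything after the last dot (x86_64, noarch, ...).
    nvr, _, arch = stem.rpartition(".")
    # The package name may itself contain dashes, so split from the right.
    name, version, release = nvr.rsplit("-", 2)
    return RpmName(name, version, release, arch)


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(parse_rpm_filename("hadoop-hdfs-namenode-2.0.5-1.el6.x86_64.rpm"))
```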

 

 

Apache Cassandra/Astyanax: guest lecture from Netflix?

  Cassandra Theory 

  Cassandra Bulk import performance

  Cassandra scan performance (guest lecture from Apple)

  DataStax demo?

  How RPM files work (DC lecture/demo) 

  Port Cassandra into Bigtop?

  Group project for the next session: build a Cassandra RPM
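For the Cassandra theory session, the core idea is the token ring. Each node owns the range of tokens ending at its own token, a row key is hashed to a token, and with SimpleStrategy the remaining replicas go to the next nodes clockwise. The sketch below models this with one token per node (no vnodes) and an MD5-based token loosely modelled on RandomPartitioner. Cassandra 1.2 and later default to Murmur3Partitioner. The node names and tokens are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List

RING_SIZE = 2 ** 127  # token space used by this sketch


class TokenRing:
    """Simplified Cassandra-style ring: one token per node, SimpleStrategy replicas."""

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        if not node_tokens:
            raise ValueError("ring needs at least one node")
        # Sort nodes by token so the ring can be walked clockwise.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def token_for(key: str) -> int:
        """MD5-based token, loosely modelled on RandomPartitioner."""
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % RING_SIZE

    def replicas(self, key: str, replication_factor: int) -> List[str]:
        """Primary replica plus the next nodes clockwise (SimpleStrategy)."""
        token = self.token_for(key)
        # A node owns the range (previous token, own token], so the primary is
        # the first node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self._tokens, token) % len(self._ring)
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


if __name__ == "__main__":
    # Hypothetical four-node ring with evenly spaced tokens.
    ring = TokenRing({f"node{i}": i * RING_SIZE // 4 for i in range(4)})
    for key in ["user1", "user2", "user3"]:
        print(key, ring.replicas(key, replication_factor=3))
```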

 

Bigtop Session #5:

MongoDB (10gen guest speaker):

  Running in sharded mode

  How slow is importing data into MongoDB in sharded mode?

  Doing a MongoDB RPM build
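A sharded MongoDB cluster splits a collection into chunks, which are contiguous ranges of the shard key, and mongos routes each insert to the shard holding the matching chunk. The simplified sketch below uses hypothetical split points and shard names and has no balancer, chunk splitting, or migration. It shows why the shard key matters for import speed: a monotonically increasing key, such as a timestamp, sends every new insert to the chunk holding the highest range, so one shard absorbs the whole load.

```python
import bisect
from collections import Counter
from typing import Iterable, List


class RangeRouter:
    """Hypothetical range-based router: chunk i holds keys in [split[i-1], split[i])."""

    def __init__(self, split_points: List[int], shards: List[str]) -> None:
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more chunk/shard than split points")
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = split_points
        self.shards = shards  # shards[i] holds chunk i

    def shard_for(self, shard_key: int) -> str:
        # bisect_right puts a key equal to a split point into the upper chunk,
        # matching inclusive-lower / exclusive-upper chunk bounds.
        return self.shards[bisect.bisect_right(self.split_points, shard_key)]


def insert_distribution(router: RangeRouter, keys: Iterable[int]) -> Counter:
    """Count how many inserts each shard would receive."""
    return Counter(router.shard_for(key) for key in keys)


if __name__ == "__main__":
    router = RangeRouter([1000, 2000], ["shard0", "shard1", "shard2"])
    # Keys spread over the whole range reach every shard.
    print(insert_distribution(router, range(0, 3000, 10)))
    # A monotonically increasing key past the last split point (e.g. a
    # timestamp) sends every new insert to the same shard.
    print(insert_distribution(router, range(5000, 6000)))
```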

 

Storm (DC)

 Demo: how to write Storm programs with spouts and bolts

 How does Storm parallelism work?

 Monit example 

 RPM example

 How to port Storm to Bigtop 
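In a Storm topology, spouts emit tuples, bolts process them, and each bolt runs as several parallel tasks. A fields grouping sends all tuples with the same field value to the same task, which is what lets a word-count bolt keep correct per-word totals. The single-process sketch below imitates the classic sentence → split → count topology. It is not the Storm API, and it uses `zlib.crc32` as a stand-in hash because Storm computes its own hash of the grouping fields. The function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def split_bolt(sentence: str) -> List[str]:
    """Split bolt: emit one word tuple per token in the sentence tuple."""
    return sentence.lower().split()


def fields_grouping(word: str, num_tasks: int) -> int:
    """Pick the count-bolt task for a word; the same word always maps to the same task."""
    # crc32 is stable across runs (Python's built-in str hash is salted).
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def run_topology(spout: Iterable[str], count_tasks: int = 3) -> List[Counter]:
    """Feed spout sentences through split and count bolts; return each task's state."""
    # One Counter per parallel count-bolt task.
    task_state = [Counter() for _ in range(count_tasks)]
    for sentence in spout:                   # spout emits sentence tuples
        for word in split_bolt(sentence):    # split bolt emits word tuples
            task_state[fields_grouping(word, count_tasks)][word] += 1
    return task_state


if __name__ == "__main__":
    sentences = ["the cow jumped over the moon", "the man went to the store"]
    for task_id, counts in enumerate(run_topology(sentences)):
        print(f"count-bolt task {task_id}: {dict(counts)}")
```

Raising `count_tasks` spreads the words over more counter tasks without changing the totals. That is the parallelism question from the session in miniature.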

 

Projects!!!

 

 
