RackHD Commit Review Board - December 19, 2016

Previous Meeting:

RackHD Commit Review Board - December 12, 2016

  1. Expanded Forum and Logistics
  2. API 1.1 Deprecation
  3. Messaging to the Community
  4. DataSets Data Structure, Data Logistics (GitHub Issue 417)
  5. Pending PR Requests
  6. Roundtable


Attendees:

Former user (Deleted)

Former user (Deleted)

Justin Kenney

Paul Scharlach

Amy Mullins

Leo Zhang

Tim Larson

James King

Former user (Deleted)

Former user (Deleted) 

Tom Capirchio

Thomas Sullivan

Roland Poulter

Former user (Deleted)


New Agenda Items:

  1. RackHD Prime Update
    1. Commit Team Members and Process (AI: Tim Larson and Thomas Sullivan will send out the updated write-up on the proposal)
    2. FIT/CIT Go Live (Targeting the sprint that starts on Jan 2nd and ends on Jan 13th for go live)
    3. RackHD Build and Release Go Live (Targeting the sprint that starts on Jan 2nd and ends on Jan 13th for go live)
    4. RackHD Channel Support, Rotation Concept (AI: Tim Larson and Thomas Sullivan to draft a proposal on what this will look like and how the logistics will function)
      1. The APAC channel has been raising issues that can be moved over to the SH team for dispatching
  2. Code check-in plan for Image repo server project
    1. Ted has currently implemented this feature as a separate project at https://github.com/cgx027/on-static.
      1. Discussion on where/how to check this project into the RackHD code bases.
      2. Where should this live in the RackHD GitHub community code base?
        1. On-Static is available for the GitHub.com repo.
        2. Need to validate that this doesn't conflict with any OnRack artifacts (AI: Jeanne Ohren to validate that we can deprecate the OnRack to ensure no conflicts)
      3. AI: Discuss at the Architecture Review Board (Former user (Deleted))
      4. AI: Need to determine the naming convention for this (Tom Capirchio)
  3. Timeline to move the Dependency PR forward to the production environment
    1. This is currently separate from the Jenkins harness
    2. Currently being run in parallel.
    3. Is the goal stability or actually testing dependencies? Are we stable now? (Answer is yes)
    4. When does this move into the production regression harness?
    5. The thought is to do the production enablement now (go live) to ensure we have a stable pipeline for the next round of enablement.
    6. The plan is to move forward now with enabling the production capability in this sprint (AI: Leo YJ Zhang)
  4. Design Discussion
    1. Neighborhood manager
    2. Consul
      1. What other options would be "better" for service discovery?
      2. The current need is to enable modularity for RackHD
      3. Pros: it is in place and is functional
      4. Cons: it is heavy and vendor-specific
      5. AI: Former user (Deleted) to go deeper on this in the Arch Review Board discussion (this Wed)
      6. One potential ask here is to iterate on the implementation details to ensure that we can swap out the HashiCorp tools later if needed.
      7. Proposal: Punt on service discovery for now and focus on the simple use case (config-based, static configurations). This restricts complicated deployments but does enable progress.
        1. RPC and Service Isolation now
    3. Service isolation / decomposition
      1. AI: Former user (Deleted) to go deeper on this in the Arch Review Board discussion (this Wed)
      2. Big question to be answered: how much granularity do we want?
      3. What are the trade-offs with supportability, and where does the ROI start to diminish?
      4. Optimize against functionality, feature velocity, performance, or supportability?
    4. Workflow engine optimizations
      1. Performance, Scale and "Footprint"
      2. Features:
        1. RPC mechanism
        2. Service Discovery
        3. Dispatch to Workers (Tiering)
        4. What is our baseline for evaluating the module?
      3. There is a lot to cover on this module, and further discussions will be needed.
      4. We will need to ensure that we expand on the "dashboard" metrics for the module. 
    5. Usability improvements
      1. DHCP, PXE discovery – Many problems with this have been posted to Slack. Is there anything we can change to make this easier and more reliable?
      2. More human-friendly interface for graphs / task definitions
      3. Past discussions have covered investing in support tools, tracing, debugging, etc.
      4. The assumption has been that there is a layer above, and that may have led us to make some assumptions about how support for the user will be handled.
      5. The discussions on improved UI tools may tie very heavily into this.
      6. We need to revisit our front ends for the tools. The UI needs investment (the feature roadmap is becoming important, and we need to spend time on this going forward) (AI: OLT)
      7. This will be one of our larger themes for this year.

  5. Pending PRs
  6. Roundtable