Core Committer Weekly Interlock - September 21st 2017

Attendees

Former user (Deleted)

Former user (Deleted)

Former user (Deleted)

Former user (Deleted)

Bishoy Youssef

Michael Hepfer

Former user (Deleted)

Former user (Deleted)

Former user (Deleted)

Leo Zhang



Agenda

  1. Ease of use of the Vagrant based demo
    1. Comment posted on the Slack channel the week of 8/7: "At this point I've given up on rackhd. If even the demo requires an old version of ubuntu to run an old version of virtualbox to get it working, I will stick with something simpler."
      1. Requirements agreed to by the team:
        1. The demo should be simple, convenient, and easy to bring up, use, and debug.
        2. The environment should work across different versions, including the latest versions, of its dependencies, such as MongoDB, Docker, and VirtualBox.
        3. Host OS independent: can run on a Windows, Linux, or macOS host system.
        4. Has no impact on, and no dependency on, the host network.
        5. Utilizes existing nightly RackHD images; does not require building or testing additional new images.
        6. Uses InfraSIM for vBMC nodes.
        7. Can run discovery workflows.
        8. Can run OS install workflows.
        9. Can run the FIT smoke test suite.
        10. The demo solution could technically support running in the cloud (ex: IaaS, PaaS).
        11. Includes the SMI microservice containers.
      2. CC team voted for the "docker-compose" POC effort: https://github.com/RackHD/RackHD/pull/889
        1. 7 votes
        2. New stories to be created and driven in the Veyron backlog. Epic to be created and sent via email for the team to review.
  2. How to add standalone services (ex: SMI microservices, UCS, etc.) to the Master CI/CD pipeline. Right now Master CI is strictly core RackHD.
    1. Status of on-network/on-topology and test/deployment options
      1. New services should follow 12factor.net guidelines. A REST API should be available for IPC between services.
    2. SMI Service Integration to CI

      1. Former user (Deleted) has downloaded the iDRAC simulation tool, which is currently under evaluation.
        1. The tool supports only read operations.
        2. RackHD Epic to be created that introduces workflow testing to RackHD CI/CD. This will cover SMI service testing; it does not cover "plugin" integration testing.
      2. The iDRAC simulation tool will be used for virtualized testing (PR quality gates and post-merge testing). The 13G Dell physical hardware will be introduced to the Regression-Baremetal job for SMI workflow regression testing.

    3. Michael Hepfer and Former user (Deleted) to sync up offline to stand up a Concourse environment.
  3. Architectural discussion: single entry point for services
    1. Looking for one entry point to all of the microservices. Discussed briefly how SMI services leverage Zuul today. Should this also be considered for on-taskgraph and other future RackHD services? Discussion to continue.
    2. Currently the SMI services also use the workflow engine as an entry point. Looking to eliminate the SMI/workflow engine dependency and leverage a standard API gateway.
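
As a rough illustration of the single-entry-point idea in agenda item 3, the sketch below maps URL path prefixes to backend services using longest-prefix matching, which is the basic routing step an API gateway performs. The route table, host names, and ports are hypothetical placeholders; they are not RackHD's or Zuul's actual configuration.

```python
# Hypothetical route table: path prefix -> backend base URL.
# Host names and ports are placeholders for illustration only.
ROUTES = {
    "/api/": "http://core-service:8001",
    "/api/workflows/": "http://workflow-service:8002",
    "/smi/": "http://smi-service:8003",
}


def resolve(path, routes=ROUTES):
    """Return the backend URL for `path` via longest-prefix match, or None."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None  # no service registered for this path
    best = max(matches, key=len)  # the most specific prefix wins
    return routes[best] + path


if __name__ == "__main__":
    print(resolve("/api/workflows/123"))
    print(resolve("/smi/inventory"))
```

With a table like this, clients only need the gateway address, and services can be added or moved by editing the routes rather than the callers.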


Did not get to these items below:

  1. RackHD Release Cadence

    1. As we’re moving into continuous delivery for the Concourse-based CI (i.e., deployed packages per merged PR), does that change the need for, or frequency of, weekly RackHD sprint releases? Email thread started on 9/20.

    2. If we are releasing debians and docker containers AND the demo is moved to a docker image, do we still need to provide a script in the new CI env that allows users to generate a Vagrant-based RackHD image?
  2. RackHD Tooling Updates
    1. Ubuntu to be upgraded to 16.04
      1. What has been developed to date for the Concourse env includes the 16.04 migration. Should the Jenkins-based env be upgraded as well?
      2. OVA scripts will need to be updated (passing a parameter) to move to 16.04 (covered by Felouka: RAC-5987).
    2. Node v6 is the current available version; RackHD is running v4.
      1. RackHD Epic to be created to migrate from v4 → v8.
        1. Needs to be assigned. Former user (Deleted)/Maglev team to help create the epic.
    3. RackHD Story tracking testing of the latest MongoDB version in CI (Mongo recommends using 3.X+ versions only and does not support anything in the 2.X version family).
      1. Do we want to support this in Jenkins, Concourse, or both? It will be part of the Concourse env.
        1. Concourse env tracking story: RAC-5991
        2. On the Jenkins side, this seems to be a trivial effort to support. It may bring out issues in RackHD and show whether previous issues have since been resolved.
          1. Plan to try testing with the latest version and see what the issues are. If the set of issues is trivial, move to the latest. If many issues are encountered, hold off.
          2. Maglev team to create the story and target next sprint. Any update?
  3. Process change for Master CI failures: how long can a developer work on a fix for a Master CI failure before being required to back out the change and get back to green?
    1. The developer has 1 working day to resolve the issue (i.e., up to the next Master CI run at 6:31pm EST).
    2. If not resolved, the code is backed out and the Master CI job is re-run to ensure that the pipeline returns to green.
    3. If thought to be resolved, the Master CI job is re-run to ensure that the pipeline returns to green.
  4. Review slides from Former user (Deleted) on CI security, moving CI to containers, and moving CI to the cloud.
    1. AI: CC team to review the slides, come back with feedback/answers to the questions posed in the slide deck.
  5. racadm → WSMAN tooling conversion
    1. Agreement at OLT that we will go fully WSMAN-based and eliminate racadm from workflow support.
  6. BareMetal Regression Pipeline is now created and monitored.
    1. Plan is to monitor it for a few weeks. Should it then be a gate?
    2. BareMetal OS install on real hardware currently runs every 2 hours on the nightly docker images. We will need to kick off BareMetal at the same time as CI.
    3. Do we then continue to run BareMetal every 2 hours?
    4. Should this be part of the Master CI pipeline? If so, we would need a modification to the Merge Freeze tool so that it freezes on failure of the BareMetal regression tests.
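
The proposed Master CI failure process in item 3 above can be sketched as a small decision function. This is a minimal illustration that assumes a single daily Master CI run at 6:31pm, ignores time zones and weekends, and uses hypothetical function and action names; it is not an existing RackHD or Merge Freeze tool.

```python
from datetime import datetime, time, timedelta

# Assumed from the notes: Master CI runs daily at 6:31pm.
# Time zones and weekends are deliberately ignored in this sketch.
MASTER_CI_RUN = time(18, 31)


def next_master_ci_run(failed_at):
    """Return the first scheduled Master CI run strictly after `failed_at`."""
    candidate = datetime.combine(failed_at.date(), MASTER_CI_RUN)
    if candidate <= failed_at:
        candidate += timedelta(days=1)
    return candidate


def next_action(failed_at, now, fix_ready):
    """Decide the next step for a Master CI failure under the proposed policy."""
    if fix_ready:
        return "rerun"  # fix thought to be in place: re-run to confirm green
    if now >= next_master_ci_run(failed_at):
        return "back_out"  # deadline reached without a fix: revert, then re-run
    return "keep_working"  # still inside the one-working-day window


if __name__ == "__main__":
    failed = datetime(2017, 9, 21, 20, 0)
    print(next_action(failed, datetime(2017, 9, 22, 10, 0), fix_ready=False))
```

The deadline is tied to the next scheduled run rather than a fixed number of hours, which matches the wording "up to the next Master CI run" in the proposal.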


Next meeting will be Thursday September 28.