STILL BEING EDITED, NOT FINAL
Meeting Minutes from 2012 Joint Techs in Stanford
Welcome from Ken Miller, Co-Chair of Performance Working Group
====================
Wireless Broadband Measurement in California - YoungJoon Byun, Cal State, Monterey Bay
- sponsor is CPUC (California Public Utilities Commission)
- part of ARRA grant
- state-wide testing 2x/yr through 2014
- tool dev to measure wireless performance
- goal is to objectively evaluate major providers of mobile wireless across state of California
- currently analyzing results
- updating software for second field test in fall.
- all data available.
any issues with server placement?
looks pretty good
just used EC2, with placement in east and west coast
final results could be averaged
q: when you experience server congestion in a virtual environment, can spin up another one.
are looking for more in second field trial.
q: could do one per tester.
overall capacity fine. just start of day.
q: doing anything to control who can access testers?
no protection today
where collecting data?
get stored files, and push back
look at any server data?
just client data [less accurate?]
up/down
up then down
q: curious to see if you need to do east/west thing. could be only need one, speed up by factor of 2 if only do one.
ybyun@csumb.edu
any suggestions, comments, please contact.
want a chance to meet
=========
ESnet lookup service - Sowmya Balasubramanian
gather requirements, look at use cases, and revamp design.
designed several years ago
but increased scale has stressed and looks like the trajectory is bad
add security.
list of requirements;
based on use cases and current load
10,000 to 100k records [next ten years easily]
query time < 1sec. [else user gives up] [200ms?]
registration time <1hr [4-6 hrs to propagate today]
validate services have not been forged
q: 1sec, recall studies 250ms
as long as first results 250ms...
q: assume one query?
no, but want to make sure simple query < 1sec (heard "on average")
--
To simplify API, going with REST and JSON
record management (register/edit)
query api (get stuff)
http get (pull)
pub/sub with http streaming (push)
http://odev-vm-7.es.net/lookup-service-examples
[dev vm right place]
design - data representation (so... change that design?)
well defined set of key/value pairs, but users can add too
mongo db
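[sketch] A minimal illustration of the record model described above: a record is a set of key/value pairs, a few keys are well defined, and users can add their own. The key names (`service-type`, `service-locator`, `site`), the `RecordStore` class, and the in-memory list are all assumptions for illustration; the notes do not give the actual schema or API.

```python
# Hypothetical well-defined keys; the real schema is not given in these notes.
REQUIRED_KEYS = {"service-type", "service-locator"}


class RecordStore:
    """In-memory stand-in for the document store behind a lookup service."""

    def __init__(self):
        self._records = []

    def register(self, record):
        """Accept a record that has the required keys; extra keys are allowed."""
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")
        self._records.append(dict(record))

    def query(self, filters):
        """Return records whose values equal every key/value pair in filters."""
        return [
            r for r in self._records
            if all(r.get(k) == v for k, v in filters.items())
        ]


if __name__ == "__main__":
    store = RecordStore()
    store.register({
        "service-type": "bwctl",
        "service-locator": "tcp://host-a.example.org:4823",
        "site": "west",  # user-added key, not part of the required set
    })
    print(store.query({"service-type": "bwctl"}))
```

Matching on plain equality keeps the simple-query case cheap, which is the case the < 1 sec requirement is aimed at.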
--
testing
new and old ls on same host
4500 services
new ls is 95% faster than old one
1 min -> under 1 sec
esnet using new service; and also new projects
(change? how easy to update)(if change schema/format/key-value)
watch for alpha: http://ps4.es.net:8085/lookup/services
timeframe...
few weeks
GLS.
what need to do to move current installation?
store same stuff, different format
really how it works today:
index servers
pulls from lookup servers, create csv
and use csv for initial location finding.
right now, modify script
can talk to both
then don't need a flag day
(or convert index servers to something new)
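[sketch] One way the "script that can talk to both" idea could look: normalize records from the old and new lookup servers into one row format and write the CSV the index server uses. The field names, the tuple shape assumed for old records, and the "new wins on duplicates" rule are all assumptions; fetching from the servers is left out.

```python
import csv
import io

FIELDS = ["service_type", "locator", "source"]


def normalize_old(rec):
    """Old LS record, assumed already parsed into a (type, locator) tuple."""
    service_type, locator = rec
    return {"service_type": service_type, "locator": locator, "source": "old"}


def normalize_new(rec):
    """New LS record, assumed to be a JSON-style dict with hypothetical keys."""
    return {
        "service_type": rec["service-type"],
        "locator": rec["service-locator"],
        "source": "new",
    }


def build_index_csv(old_records, new_records):
    """Merge both sources into one CSV; a new-LS entry replaces an old duplicate."""
    rows = {}
    for rec in old_records:
        row = normalize_old(rec)
        rows[(row["service_type"], row["locator"])] = row
    for rec in new_records:
        row = normalize_new(rec)
        rows[(row["service_type"], row["locator"])] = row
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for key in sorted(rows):
        writer.writerow(rows[key])
    return out.getvalue()
```

Because the merge is keyed on (type, locator), services can move from the old LS to the new one at their own pace without showing up twice in the CSV.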
q: way to bootstrap into using
people are pretty good about updating, think can do w/yum update
q: how do people do it today
old is soapy, this is json
ps tkit, registration,
upgrade switch
since consumers don't call directly, can move.
q/martin
GENI doing it the same way
new pushes to new
old ones pull old one
ericP: have compatible API for GENI uses too.
=======
SFLOW Data Network Visibility and Control
Neil McKee, InMon Corporation
sFlow: data network viz and control
know what sflow is?
sflow monitoring servers and apps.
where it's evolving, and go to questions.
probably have in network, think about turning it on
space for cisco
2. comprehensive
collects from a bunch of stuff
based on virtual network and switches
servers, hypervisors, virtual switches
2 mechanisms w/sflow that help
- de-synchronized, parallel push
auto push a full set of SNMP ifTable stuff
- packet/transaction sampling
allows you to do lots of things
ip addrs, urls, app attributes... things impossible to get all together
but need for situational analysis
sFlow samples packet headers
collector decides what to analyze
hence can get new stuff really quickly
no firmware on switches, just software collector
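[sketch] Since the agent only forwards sampled headers, the grouping is chosen at the collector. With 1-in-N sampling, each sample stands for roughly N packets, so totals are estimated by scaling. The sample field names (`src_ip`, `frame_length`) are hypothetical stand-ins for decoded header fields, not the sFlow wire format.

```python
from collections import Counter


def estimate_bytes_by_key(samples, sampling_rate, key):
    """Estimate total bytes per value of `key` from 1-in-`sampling_rate` samples.

    Each sampled frame is taken to represent `sampling_rate` frames of the
    same length, so its length is scaled up before being added to the total.
    """
    totals = Counter()
    for sample in samples:
        totals[sample[key]] += sample["frame_length"] * sampling_rate
    return dict(totals)


if __name__ == "__main__":
    decoded = [
        {"src_ip": "192.0.2.1", "frame_length": 1500},
        {"src_ip": "192.0.2.1", "frame_length": 500},
        {"src_ip": "192.0.2.2", "frame_length": 100},
    ]
    # Swapping key= for another decoded field changes the analysis
    # without touching the switch.
    print(estimate_bytes_by_key(decoded, sampling_rate=1000, key="src_ip"))
```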
--
captures packet path
where in and out of device
thread to find phys topo, and locate hosts to switch ports w/in one min
[???]
--
arch
agents as simple as possible, move stuff to collector
senders all open source & free
host sflow, sends mac addrs, so can join with packets
apps: get sockets, underlying hypervisor load, and packet paths
enough stuff to join and ..
host stuff:
host-sflow.sourceforge.net
app monitoring
that's the new stuff.
nfs/cifs (filepath, bytes, how long, socket)
web request. apache, nginx...
memcached lookups... memcache clusters...
database queries.
some playing, but add if you're motivated
have json-api. fashionable, and easy to add
so app can add information.
fire and forget.
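[sketch] What "fire and forget" could look like from the application side: serialize one transaction as JSON and send it as a single UDP datagram to a local agent, without waiting for any reply. The port number, the field names, and the message layout are placeholders chosen for illustration; they are not the agent's actual JSON schema, which these notes do not give.

```python
import json
import socket

# Placeholder address; the real port depends on how the local agent is set up.
AGENT_ADDRESS = ("127.0.0.1", 9999)


def build_message(app_name, operation, status, duration_us):
    """Serialize one transaction record (hypothetical fields) as JSON bytes."""
    record = {
        "app": app_name,
        "operation": operation,
        "status": status,
        "duration_us": duration_us,
    }
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def send_fire_and_forget(sock, address, message):
    """Send one datagram; never wait for a reply, never raise into the app."""
    try:
        sock.sendto(message, address)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    msg = build_message("webshop", "checkout", "OK", 1830)
    send_fire_and_forget(udp, AGENT_ADDRESS, msg)
    udp.close()
```

Swallowing send errors is deliberate here: the monitoring path should not be able to slow down or break the application it is observing.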
XenMotion bandwidth, how does it look
see response time in performance of memcache cluster
Brian T's netprobe(?)
monitoring web farm
and see transaction detail
and see correlations
carve by app response time, is way to correlate app performance & delivery, with underlying infrastructure conspiring to deliver that.
dip when app stops.
way to pull things apart, w/o overloading anything.
why mon everything, 2 good 1 real reason
1. troubleshooting - always have context
2. putting network and server teams on same page
so see! [cloud services!!]
3. full observability required for automated control
control theory 101.
to automate closed loop, have to report all
guy who designed sflow is control engr
sflow+openflow are complementary...
openflow can control
if you have viz at same time, opportunity to close loop
research topic, but looks promising
danger w/openflow features to use for accounting and control?
much better to use wildcards when possible to openflow controls.
3 open standards that work well
netconf xml standards to set up / configuration
forwarding openflow controls make sense
visibility sflow...
cisco has a similar story, with a proprietary system
blog.sflow.com -> Peter Phaal's musings
sflow.org to see if equipment supports it.
bgp stuff at border, vs netflow?
sflow allows full bgp + as paths to be sent
very high-value measurement, to look at as paths and peering arrangement
allow you to break down by ip addr, subnets, protocols, min by min
pull for accounting and routing perf
if routers don't support that, can peer with router and pull as paths in
and splice in to sflow/netflow feed, and do similar analysis
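[sketch] How the splice could work when the AS paths come from a separate BGP peering: look up each flow's destination with a longest-prefix match against the learned routes, then aggregate by origin AS. The route table layout and flow field names (`dst`, `bytes`) are assumptions, and the prefixes and AS numbers are documentation/private values, not real data.

```python
import ipaddress


def longest_prefix_match(routes, addr):
    """Return the AS path of the most specific prefix covering addr, or None."""
    ip = ipaddress.ip_address(addr)
    best_net, best_path = None, None
    for prefix, as_path in routes.items():
        net = ipaddress.ip_network(prefix)
        if ip.version != net.version or ip not in net:
            continue
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_path = net, as_path
    return best_path


def bytes_by_origin_as(flows, routes):
    """Sum flow bytes per origin AS (last AS in the path); None if no route."""
    totals = {}
    for flow in flows:
        path = longest_prefix_match(routes, flow["dst"])
        origin = path[-1] if path else None
        totals[origin] = totals.get(origin, 0) + flow["bytes"]
    return totals
```

The linear scan over prefixes is only for readability; a full routing table would want a trie or similar structure.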
wan pov, realtime access allows for attack analysis
so a reason to find into wan routers as well as l2 switches.
PS PXE booting (brief)
to The Challenge - Ken
Mentions -
- I2 description of the proposed speed test tool
- Penn State PXE booting pS-Toolkit
Community Updates / Open Forum
any other updates
any other questions
how many people looking at sflow
how planning/getting ready for big data challenges on networks
jim/ussc
use statseeker for all counter data
gobs of netflow
ericp: q about sflow
100G, how do that. single device how fast can go
running on brocade 100G today
turned on at SC, and it worked.
much easier for device to do sflow
sampling, decoding, aggregating, then flush out
sflow - sampling and send.
q: standard sampling rate, or all over the place
not faster than you need to
1/1000 and tweak
high level 1/40000 still get good data
security guys faster and faster
everyone else smooth and steady
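[sketch] A rough way to reason about "not faster than you need to": a commonly cited approximation for packet sampling is that the 95% confidence error for a class of traffic is at most 196 * sqrt(1/c) percent, where c is the number of samples seen in that class. Accuracy therefore depends on how many samples you collect, not on the sampling rate by itself. The traffic figures in the usage lines are made up for illustration.

```python
import math


def percent_error_bound(num_samples):
    """Approximate 95% confidence error bound, in percent, for c samples."""
    if num_samples <= 0:
        raise ValueError("need at least one sample")
    return 196.0 * math.sqrt(1.0 / num_samples)


def expected_samples(packets_per_second, sampling_rate, interval_seconds):
    """Expected number of samples for 1-in-`sampling_rate` sampling."""
    return packets_per_second * interval_seconds / sampling_rate


if __name__ == "__main__":
    # Hypothetical busy link: 1,000,000 packets/sec observed over one minute.
    for rate in (1000, 40000):
        c = expected_samples(1_000_000, rate, 60)
        print(f"1/{rate}: {c:.0f} samples, error <= {percent_error_bound(c):.2f}%")
```

This is why a coarse rate can still give good data on a fast link: the packet count is high enough that plenty of samples arrive per interval.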
6500s in core
10g across network
10g campuses
killing cpu when turned up more interfaces
so switched to brocade at borders, sflow
l2-7 info. before l3 only.
see more what's going on
initiative to look at core
sflow enabled dev on core, go to 100G.
can scale easily. afraid of what netflow would do if couldn't handle
10G links
turn netflow reshalls and some things
put switch inline to do sflow to make work :)
XMRs at border
XLMs in core
juniper allu
sup720s out of gas
yes. prototype some new cards too, more power but still a lot of cpu
sec group, has mirrored port off of border routers
use bro cluster
get every packet off router
use sampling to trigger for security too now